Decisions in Python: Comparing Strings, ASCII History

Comparing strings
• Comparing numbers is easy, but how do you compare strings?
• Remember that everything inside the computer is a number: yes, everything!
• So every character on the keyboard (and more) is represented as a number inside the computer
• Does that mean you have to memorize a lot of numbers to compare strings? No!
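
You can see these numbers in Python with the built-in ord() function (character to number) and chr() (number to character). The helper name codes below is only for illustration; this is a minimal sketch:

```python
def codes(text):
    """Hypothetical helper: return the numeric code of every character in text."""
    return [ord(ch) for ch in text]

print(codes("Hi!"))         # [72, 105, 33]
print(chr(72) + chr(105))   # Hi: chr() turns the numbers back into characters
```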

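Comparing strings
• If the operator is ==, the strings have to be exactly identical (same spacing, same case, every character the same) to give True; for identical strings, != gives False
• If the operator is one of the other relational operators (<, <=, >, >=), how do you tell if one string is less than or greater than another? By using their ASCII codes! (Python 3 actually uses Unicode codes, which match ASCII for these characters)
• The strings are compared one character at a time, left to right: the first pair of characters that differ decides the result, and if one string runs out first (like "car" and "cart"), the shorter string is less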
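
The character-by-character rule can be sketched by hand with ord(). The function less_than is hypothetical and written only to show the idea; it is not how Python implements < internally, but for plain ASCII text it should agree with Python's own answer:

```python
def less_than(a, b):
    """Hypothetical helper: a sketch of what a < b does for two strings."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first position where the strings differ decides the answer.
            return ord(ch_a) < ord(ch_b)
    # Every compared position matched, so the shorter string comes first.
    return len(a) < len(b)

# == needs an exact match: case and spacing both count.
print("Hello" == "Hello", "Hello" == "hello", "Hello" == "Hello ")  # True False False

for a, b in [("apple", "apply"), ("Zoo", "apple"), ("car", "cart"), ("cat", "cat")]:
    print(f"{a!r} < {b!r}: sketch says {less_than(a, b)}, Python says {a < b}")
```

Note that "Zoo" < "apple" is True: 'Z' has a smaller code than 'a', so the rest of the word never gets compared.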

ASCII
• American Standard Code for Information Interchange
• Early computer manufacturers each had their own code for characters
• Users didn’t care as long as it worked, until they wanted to share, trade, or sell information with someone who had a different brand of computer

ASCII
• Some small companies made a living translating data from one company’s code to another
• The computing community decided they needed a standard character code for all computers
• Having one helps the Internet send messages between dissimilar computers
• Other codes were considered, but ASCII was the winner in the mid 1960s
• ASCII defines 128 codes (0 to 127); each character is normally stored in 1 byte, and various "extended ASCII" sets use the remaining byte values (128 to 255) for extra characters
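
In Python you can check the "1 byte per character, codes below 128" idea by encoding a string with the "ascii" codec. The helper is_ascii_text is a hypothetical name for this sketch (Python 3.7+ also has a built-in str.isascii() that does the same check):

```python
def is_ascii_text(text):
    """Hypothetical helper: True if every character has an ASCII code (0 to 127)."""
    return all(ord(ch) < 128 for ch in text)

message = "Hello, world"
data = message.encode("ascii")   # one byte per ASCII character
print(len(message), len(data))   # 12 12
print(is_ascii_text("café"))     # False: 'é' is code 233, outside ASCII

try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("Not ASCII:", err.reason)  # Not ASCII: ordinal not in range(128)
```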

ASCII
• ASCII is designed so that the alphabetic characters are in numeric order too, so ‘A’ < ‘B’ < ‘C’ < ‘D’ < … < ‘Z’
• The lower case letters are also in numeric order: ‘a’ < ‘b’ < ‘c’ < ‘d’ < … < ‘z’
• The lower case letters come after the upper case letters in ASCII (an arbitrary decision), so even ‘Z’ < ‘a’
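
A short sketch of what this ordering does to sorting. Because every capital comes before every lower case letter, sorted() puts "Cherry" first; passing key=str.lower is one common way to get a case-insensitive order instead:

```python
print(ord("A"), ord("Z"))   # 65 90
print(ord("a"), ord("z"))   # 97 122
print(ord("a") - ord("A"))  # 32: each lower case letter is 32 past its capital

words = ["banana", "Cherry", "apple"]
print(sorted(words))                 # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']
```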

ASCII
• What about digits? ‘0’, ‘1’, ‘2’, ‘3’, …?
• They have codes too. ASCII is designed so the digits are also in numeric order: ‘0’ < ‘1’ < ‘2’ < ‘3’ < … < ‘9’
• The codes for the digits are less than the codes for the upper case letters (an arbitrary decision)
• Just one more! ‘ ’ (a single space) is the lowest printable character, lower than all the other characters discussed
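
One consequence worth seeing: strings made of digits are compared character by character, not by their numeric value. This sketch shows the codes and the classic "10" versus "9" surprise:

```python
print(ord("0"), ord("9"))     # 48 57
print(ord(" "))               # 32, below every digit and letter
print(" " < "0" < "A" < "a")  # True

# Strings of digits compare by character codes, not by numeric value.
print("10" < "9")             # True: '1' (49) is less than '9' (57)
print(int("10") < int("9"))   # False: convert to int to compare as numbers
```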

ASCII
Summary of the order you should know:
• First comes the space character
• Then the digits ‘0’ ‘1’ ‘2’ ‘3’ ‘4’ ‘5’ ‘6’ ‘7’ ‘8’ ‘9’
• Then the upper case letters ‘A’, ‘B’, ‘C’, ‘D’, … ‘Z’
• Then the lower case letters ‘a’, ‘b’, ‘c’, ‘d’, …, ‘z’
• There are other characters in the code (control codes, punctuation, etc.); you do NOT have to know any of those.
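
The summary order can be checked directly using the constants in Python's standard string module, and a mixed list shows all four groups sorting in that order. The sample words are made up for illustration:

```python
import string

# The summary order: space, digits, upper case letters, lower case letters.
summary = " " + string.digits + string.ascii_uppercase + string.ascii_lowercase
print(list(summary) == sorted(summary))  # True: already in code order

items = ["zebra", "Apple", "42", " note", "apple", "Zebra", "7up"]
print(sorted(items))
# [' note', '42', '7up', 'Apple', 'Zebra', 'apple', 'zebra']
```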

Unicode
• For a while, ASCII’s 128 codes were sufficient. They covered the English (Roman) alphabet, digits, and some punctuation
• What about other countries and languages?
• Eventually it was decided that a code able to represent the symbols of many more languages had to be created
• Unicode came about in the early 1990s; its first 128 codes are the ASCII codes, so ‘A’ is still 65
• Unicode has room for 1,114,112 codes; depending on the encoding, each character takes 1 to 4 bytes (UTF-8) or 2 or 4 bytes (UTF-16)
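
In Python 3 every string is Unicode, and ord() returns a character's Unicode code point. The helper describe is a hypothetical name for this sketch; it prints each code in decimal and in the usual U+ hexadecimal notation:

```python
import sys

def describe(ch):
    """Hypothetical helper: a character, its code point, and the U+ hex form."""
    code = ord(ch)
    return f"{ch} {code} U+{code:04X}"

for ch in ["A", "é", "Ω", "€"]:
    print(describe(ch))
# A 65 U+0041
# é 233 U+00E9
# Ω 937 U+03A9
# € 8364 U+20AC

print(sys.maxunicode)  # 1114111 (U+10FFFF), the largest code point
```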

What does Unicode mean to me?
• Unicode text can take up more room than a plain ASCII text file, which means more room on your hard drive and more time uploading and downloading (with UTF-8, ASCII characters still take 1 byte each, so mostly the other characters cost extra)
• As a programmer, you may run into “wide characters” or “fat characters”, which require different libraries or methods to handle
• If you get documents in foreign languages, you will be better able to get them translated correctly (Unicode does not do translation!)
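
To see how much room text takes, you can encode it and count the bytes. The helper byte_sizes is hypothetical; the "-le" encodings are used so Python does not add a byte order mark, which would make the counts harder to read:

```python
def byte_sizes(text):
    """Hypothetical helper: bytes needed to store text in a few encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32-le": len(text.encode("utf-32-le")),
    }

print(byte_sizes("hello"))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(byte_sizes("héllo"))  # {'utf-8': 6, 'utf-16-le': 10, 'utf-32-le': 20}
print(byte_sizes("€"))      # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
```

When reading or writing text files, passing encoding="utf-8" to open() makes the encoding explicit instead of relying on the platform default.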