CSC1015F Chapter 5: Strings and Input

Michelle Kuttel
mkuttel@cs.uct.ac.za

The String Data Type
Used for operating on textual information.
Think of a string as a sequence of characters.
To create string literals, enclose them in single, double, or triple quotes as follows:
    a = "Hello World"
    b = 'Python is groovy'
    c = """Computer says 'Noooo'"""

Comments and docstrings
It is common practice for the first statement of a function to be a documentation string describing its usage. For example:
    def hello():
        """Hello World function"""
        print("Hello")
        print("I love CSC1015F")
This is called a "docstring" and can be printed thus:
    print(hello.__doc__)

Comments and docstrings
Try printing the docstring for functions you have been using, e.g.:
    print(input.__doc__)
    print(eval.__doc__)

Checkpoint Str 1: Strings and loops
What does the following function do?
    def oneAtATime(word):
        for c in word:
            print("give us a '", c, "'... ", c, "!", sep='')
        print("What do you have? -", word)

Checkpoint Str 1a: Indexing examples
What does this function do?
    def str1a(word):
        for i in word:
            if i in "aeiou":
                continue
            print(i, end='')
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Some built-in string functions/methods
s.capitalize()   Capitalizes the first character.
s.count(sub)     Counts the number of occurrences of sub in s.
s.isalnum()      Checks whether all characters are alphanumeric.
s.isalpha()      Checks whether all characters are alphabetic.
s.isdigit()      Checks whether all characters are digits.
s.islower()      Checks whether all characters are lowercase.
s.isspace()      Checks whether all characters are whitespace.

Some built-in string functions/methods
s.istitle()       Checks whether the string is a titlecased string (first letter of each word capitalized).
s.isupper()       Checks whether all characters are uppercase.
s.join(t)         Joins the strings in sequence t with s as a separator.
s.lower()         Converts to lowercase.
s.lstrip([chrs])  Removes leading whitespace or characters supplied in chrs.
s.upper()         Converts a string to uppercase.

Some built-in string functions/methods
s.replace(oldsub, newsub)  Replaces all occurrences of oldsub in s with newsub.
s.find(sub)                Finds the first occurrence of sub in s.
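
A few of the methods listed above can be tried out as follows. This is only a small sketch; the sample string and variable names are chosen here for illustration and are not from the slides.

```python
# Hypothetical sample string, chosen only for illustration.
s = "hello bob"

capitalized = s.capitalize()         # 'Hello bob'
count_o = s.count("o")               # 2
is_alpha = s.isalpha()               # False: the space is not alphabetic
joined = "-".join(["a", "b", "c"])   # 'a-b-c'
replaced = s.replace("bob", "sue")   # 'hello sue'
first_l = s.find("l")                # 2 (index of the first 'l')
stripped = "   padded".lstrip()      # 'padded'

print(capitalized, count_o, is_alpha, joined, replaced, first_l, stripped)
```

Note that none of these methods changes s itself: each returns a new value.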

Built-in string functions/methods
Try printing the docstring for str functions:
    print(str.isdigit.__doc__)

The String Data Type
As a string is a sequence of characters, we can access individual characters. This is called indexing.
Form: <string>[<expr>]
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b
The last character in a string of n characters has index n-1.
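
As a quick illustration of the indexing form, here is a minimal sketch using the "Hello Bob" string from the index table (the variable names are chosen for this example):

```python
greet = "Hello Bob"

first = greet[0]               # 'H', the first character has index 0
space = greet[5]               # ' ', the space between the two words
last = greet[len(greet) - 1]   # 'b', index n-1 for a string of n characters

print(first, space, last)
```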

String functions: len
len tells you how many characters there are in a string:
    len("Jabberwocky")
    len("Twas brillig and the slithy toves did gyre and gimble in the wabe")

Checkpoint Str 2: Indexing examples
What does this function do?
    def str2(word):
        for i in range(0, len(word), 2):
            print(word[i], end='')
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

More indexing examples: indexing from the end
What is the output of these lines?
    greet = "Hello Bob"
    greet[-1]
    greet[-2]
    greet[-3]
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Checkpoint Str 3
What is the output of these lines?
    def str3(word):
        for i in range(len(word)-1, -1):
            print(word[i], end='')
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Chopping strings into pieces: slicing
The previous examples can be done much more simply with slicing.
Slicing indexes a range and returns a substring, starting at the first position and running up to, but not including, the last position.
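
A minimal sketch of slicing, using a sample word chosen for this example. Each slice runs from the first position up to, but not including, the last position:

```python
word = "Jabberwocky"

start = word[0:3]    # 'Jab', characters at indices 0, 1 and 2
middle = word[3:6]   # 'ber', index 6 is not included
head = word[:6]      # 'Jabber', an omitted start means "from the beginning"
tail = word[6:]      # 'wocky', an omitted end means "to the end"

print(start, middle, head, tail)
```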

Examples: slicing
What is the output of these lines?
    greet = "Hello Bob"
    greet[0:3]
    greet[5:9]
    greet[:5]
    greet[5:]
    greet[:]
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Checkpoint Str 4: Strings and loops
What does the following function do?
    def sTree(word):
        for i in range(len(word)):
            print(word[0:i+1])

Checkpoint Str 5: Strings and loops
What does the following code output?
    def sTree2(word):
        step = len(word)//3
        for i in range(step, step*3+1, step):
            for j in range(i):
                print(word[0:j+1])
            print("**\n")
    sTree2("strawberries")

More info on slicing
The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements.
Then i is the starting index, j is the ending index, and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included).
The stride may also be negative.
If the starting index is omitted, it is set to the beginning of the sequence if stride is positive, or the end of the sequence if stride is negative.
If the ending index j is omitted, it is set to the end of the sequence if stride is positive, or the beginning of the sequence if stride is negative.

More on slicing
Here are some examples with strides:
    a = "Jabberwocky"
    b = a[::2]     # b = 'Jbewcy'
    c = a[::-2]    # c = 'ycwebJ'
    d = a[0:5:2]   # d = 'Jbe'
    e = a[5:0:-2]  # e = 'rba'
    f = a[:5:1]    # f = 'Jabbe'
    g = a[:5:-1]   # g = 'ykcow'
    h = a[5::1]    # h = 'rwocky'
    i = a[5::-1]   # i = 'rebbaJ'
    j = a[5:0:-1]  # j = 'rebba'

Checkpoint Str 6: strides
What is the output of these lines?
    greet = "Hello Bob"
    greet[8:5:-1]
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Checkpoint Str 7: Slicing with strides
How would you do this function in one line with no loops?
    def str2(word):
        for i in range(0, len(word), 2):
            print(word[i], end='')
Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Checkpoint Str 8
What does this code display?
    # checkpointStr8.py
    def crunch(s):
        m = len(s)//2
        print(s[0], s[m], s[-1], sep='+')
    crunch("omelette")
    crunch("bug")

Example: filters
Pirate, Elmer Fudd, and Swedish Chef filters produce parodies of English speech.
How would you write one in Python?
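
One possible starting point for such a filter is a word-by-word substitution. The sketch below is only an illustration: the PIRATE_WORDS table and the pirate function are hypothetical names invented here, and a real filter would need to handle punctuation and capitalization as well.

```python
# Hypothetical substitution table, invented for this sketch.
PIRATE_WORDS = {"hello": "ahoy", "my": "me", "friend": "matey", "yes": "aye"}

def pirate(sentence):
    """Replace each known word with its pirate equivalent."""
    words = sentence.lower().split()                       # split on whitespace
    translated = [PIRATE_WORDS.get(w, w) for w in words]   # unknown words pass through
    return " ".join(translated)

print(pirate("Hello my friend"))
```

Working on whole words (rather than calling s.replace on the full sentence) avoids changing "my" inside a longer word such as "mystery".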

Example: Genetic Algorithms (GAs)
GAs attempt to mimic the process of natural evolution in a population of individuals.
They use the principles of selection and evolution to produce several solutions to a given problem.
They use biologically derived techniques such as inheritance, mutation, natural selection, and recombination.
A GA is a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions.
Over time, those genetic changes which enhance the viability of an organism tend to predominate.

Bioinformatics Example: Crossover (recombination)
Evolution works at the chromosome level through the reproductive process.
Portions of the genetic information of each parent are combined to generate the chromosomes of the offspring.
This is called crossover.

Crossover Methods
Single-point crossover: a randomly located cut is made at the pth bit of each parent and crossover occurs.
This produces 2 different offspring.
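
Single-point crossover maps naturally onto string slicing. The following is a minimal sketch, not the course's own Crossover3.py (which is not shown in the slides); the function name and the optional p parameter are assumptions made for this example.

```python
import random

def crossover(parent1, parent2, p=None):
    """Single-point crossover of two equal-length chromosome strings."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if p is None:
        # Pick a random cut point that leaves at least one gene on each side.
        p = random.randint(1, len(parent1) - 1)
    # Each child takes the head of one parent and the tail of the other.
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

print(crossover("AAAAAA", "BBBBBB", 2))
```

Passing p explicitly makes the result repeatable, which is convenient for testing; leaving it out gives the randomly located cut described above.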

Gene splicing example (for genetic algorithms)
We can now do a crossover!
Crossover3.py

Example: palindrome program
palindrome |ˈpalɪndrəʊm| noun: a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run.
In Python, write a program to check whether a word is a palindrome.
You don't need to use loops…
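
One loop-free approach is to compare the text with its reverse, obtained with a negative stride. This is a sketch of one possible solution; the function name is chosen here, and ignoring spaces and case is an assumption made so that phrases such as "nurses run" are accepted.

```python
def is_palindrome(text):
    """Return True if text reads the same backward as forward."""
    cleaned = text.replace(" ", "").lower()   # assumption: ignore spaces and case
    return cleaned == cleaned[::-1]           # [::-1] reverses the string

print(is_palindrome("madam"), is_palindrome("nurses run"), is_palindrome("python"))
```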

String representation and message encoding
On the computer hardware, strings are also represented as zeros and ones.
Computers represent characters as numeric codes, with a unique code for each character.
An entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory.
There used to be a number of different codes for storing characters, which caused serious headaches!

ASCII (American Standard Code for Information Interchange)
An important character encoding standard.
Its codes are used to represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information.
A-Z uses values in the range 65-90.
a-z uses values in the range 97-122.
In use for a long time: developed for teletypes.
American-centric.
Extended ASCII codes have been developed.

Unicode
A character set that includes all the ASCII characters plus many more exotic characters (http://www.unicode.org).
Python supports the Unicode standard.
ord returns the numeric code of a character.
chr returns the character corresponding to a code.
Figure: Unicodes for Cuneiform
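
The ord and chr functions can be used to encode a message as numbers and decode it again. A minimal sketch, with the message and variable names chosen for this example:

```python
code_A = ord("A")    # 65, the start of the A-Z range
char_a = chr(97)     # 'a', the start of the a-z range

message = "Hi"
codes = [ord(ch) for ch in message]         # encode: one number per character
decoded = "".join(chr(c) for c in codes)    # decode: back to the original text

print(code_A, char_a, codes, decoded)
```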

Characters in memory
The smallest addressable piece of memory is usually 8 bits, or a byte.
How many characters can be represented by a byte? 256 different values (2^8).
Is this enough? 256 different values is enough for ASCII (only a 7-bit code), but not enough for Unicode, with 100 000+ possible characters.
Unicode uses different schemes for packing Unicode characters into sequences of bytes.
UTF-8 is the most common: it uses a single byte for ASCII characters and up to 4 bytes for more exotic characters.
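
The variable-length nature of UTF-8 can be observed with the str.encode method, which returns the bytes for a string. The sample characters below are chosen for this sketch (written as escape codes so the example does not depend on the file's own encoding):

```python
# Sample characters chosen for illustration.
samples = {
    "a": "a",                     # plain ASCII letter
    "e-acute": "\u00e9",          # accented Latin letter
    "euro": "\u20ac",             # euro sign
    "cuneiform": "\U00012000",    # Cuneiform sign A
}

# Number of bytes each character occupies when encoded as UTF-8.
byte_counts = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_counts)
```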

Comparing strings
Conditions may compare numbers or strings.
When strings are compared, the order is lexicographic: strings are put into order based on their Unicode values, e.g.
    "Bbbb" < "bbbb"
    "B" < "a"
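
Because the comparison is based on character codes, every uppercase letter sorts before every lowercase letter. A short sketch (the variable names and the word list are chosen for this example):

```python
upper_first = "Bbbb" < "bbbb"    # True: ord("B") is 66, ord("b") is 98
b_before_a = "B" < "a"           # True: 66 < 97

# Sorting uses the same ordering, so capitalized words come first.
ordered = sorted(["banana", "cherry", "Zebra", "apple"])

print(upper_first, b_before_a, ordered)
```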

The min function…
    min(iterable[, key=func]) -> value
    min(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its smallest item.
With two or more arguments, return the smallest argument.

Checkpoint: What do these statements evaluate as?
    min("hello")
    min("983456")
    min("Peanut")

Example 2: DNA Reverse Complement Algorithm
A DNA molecule consists of two strands of nucleotides.
Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine.
Adenine always pairs with thymine and guanine always pairs with cytosine.
A pair of matched nucleotides is called a base pair.
Task: write a Python
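
A reverse complement can be sketched as follows, assuming standard Watson-Crick pairing (A with T, G with C) and a strand written with the uppercase letters A, C, G, T. The names COMPLEMENT and reverse_complement are chosen for this example and are not from the slides.

```python
# Pairing table, assuming standard base pairing: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Reverse the strand, then swap each base for its partner."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

print(reverse_complement("ATGC"))
```

A base outside ACGT raises a KeyError here, which at least makes bad input visible rather than silently passing it through.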

Scrabble letter scores
Different languages should have different scores for the letters.
How do you work this out? What is the algorithm?

Related Example: Calculating character (base) frequency
DNA has the alphabet ACGT.
BaseFrequency.py
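
The file BaseFrequency.py is not reproduced in the slides. A minimal sketch of the idea, using the count method introduced earlier, might look like this (the function name and return format are assumptions made for this example):

```python
def base_frequency(dna):
    """Return the fraction of each base (A, C, G, T) in a DNA string."""
    dna = dna.upper()
    total = len(dna)
    if total == 0:
        raise ValueError("empty sequence")
    # count() tallies the occurrences of each base; dividing gives a fraction.
    return {base: dna.count(base) / total for base in "ACGT"}

print(base_frequency("AACG"))
```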

Why would you want to do this?
You can calculate the melting temperature of DNA from its base pair percentage.
References:
Breslauer et al. Proc. Natl. Acad. Sci. USA 83, 3746-3750
Baldino et al. Methods in Enzymol. 168, 761-777

Input/Output as string manipulation
eval evaluates a string as a Python expression.
Very general and can be used to turn strings into nearly any other Python data type: the "Swiss army knife" of string conversion.
    eval("3+4")
Can also use the Python numeric type conversion functions:
    int("4")
    float("4")
But the string must be a numeric literal of the appropriate form, or you will get an error.
Can also convert numbers to strings with the str function.
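
The conversions above can be compared side by side. In this sketch the helper to_int is a hypothetical name introduced here to show what happens when the string is not a valid integer literal:

```python
total = eval("3+4")      # 7: the string is evaluated as an expression
whole = int("4")         # 4
real = float("4")        # 4.0
text = str(3.14)         # '3.14': number converted back to a string

def to_int(s):
    """Convert s to an int, or return None if s is not an integer literal."""
    try:
        return int(s)
    except ValueError:   # raised for strings such as "4.5" or "four"
        return None

print(total, whole, real, text, to_int("4.5"))
```

Because eval runs whatever expression it is given, int and float are the safer choice when the input comes from a user.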

String formatting with format
The built-in s.format() method is used to perform string formatting.
The {} are slots that show where the values will go.
You can "name" the values, or access them by their position (counting from zero).
    >>> a = "Your name is {0} and your age is {age}"
    >>> a.format("Mike", age=40)
    'Your name is Mike and your age is 40'

Example 4: Better output for calculating character (base) frequency
BaseFrequency2.py

More on format
You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment.
The general format is [[fill]align][sign][0][width][.precision][type], where each part enclosed in [] is optional.
The width specifier gives the minimum field width to use.
The align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field.
An optional fill character fill is used to pad the space.

More on format
For example:
    name = "Elwood"
    r = "{0:<10}".format(name)    # r = 'Elwood    '
    r = "{0:>10}".format(name)    # r = '    Elwood'
    r = "{0:^10}".format(name)    # r = '  Elwood  '
    r = "{0:=^10}".format(name)   # r = '==Elwood=='

format: type specifier
The type specifier indicates the type of data.

More on format
The precision part supplies the number of digits of accuracy to use for decimals.
If a leading '0' is added to the field width for numbers, numeric values are padded with leading 0s to fill the space.
    x = 42
    r = '{0:10d}'.format(x)      # r = '        42'
    r = '{0:10x}'.format(x)      # r = '        2a'
    r = '{0:10b}'.format(x)      # r = '    101010'
    r = '{0:010b}'.format(x)     # r = '0000101010'
    y = 3.1415926
    r = '{0:10.2f}'.format(y)    # r = '      3.14'
    r = '{0:10.2e}'.format(y)    # r = '  3.14e+00'
    r = '{0:+10.2f}'.format(y)   # r = '     +3.14'
    r = '{0:+010.2f}'.format(y)  # r = '+000003.14'
    r = '{0:+10.2%}'.format(y)   # r = '  +314.16%'

Example: FormatEg.py

Checkpoint: Write down the exact output for the following code
    txt = "{name}-{0}*{y}+{1}"
    print(txt.format("cat", "dog", name="hat", y="rat"))
    print(txt.format(1, 0, name=2, y=3))
    print(txt.format(2, 3))

Format to improve formatting
BaseFrequency2.py
restuarant2.py