Strings
By S. P. Siddique Ibrahim

Data Structures in Python
A data structure is a collection of values organized in a particular structure. Some of the data structures available in Python are:
• Strings
• Lists
• Sets
• Tuples
• Dictionaries

Strings
A string is a sequence of characters enclosed in quote marks. In Python, strings start and end with single quotes, double quotes, or triple quotes.
>>> "python"
'python'
>>> 'python'
'python'
>>> "93895342858"
'93895342858'
>>> 'a'
'a'
>>> 'fdsf 335'
'fdsf 335'

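>>> s = 'welcome to python'
>>> s = 'welcome "to" python'    # use single quotes outside when the text needs double quotes
>>> s
'welcome "to" python'
>>> "This is Python's power"    # use double quotes outside when the text needs a single quote
"This is Python's power"
>>> s = 'welcome' 'to' 'python'    # adjacent string literals are joined without spaces
>>> s
'welcometopython'
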
Triple Quotes
A triple-quoted string can span several lines; each line break is stored as \n.
>>> s = '''hellp
... klkdf
... dfldsjf
... fksldfl
... jlsdjklf'''
>>> s
'hellp\nklkdf\ndfldsjf\nfksldfl\njlsdjklf'

Contd.
A backslash at the end of a line inside the quotes suppresses the line break.
>>> s = '''help\
... lmlmf\
... sdfdsf\
... dfd\
... fdsf'''
>>> s
'helplmlmfsdfdsfdfdfdsf'

Some more working examples
>>> s = '''i love "python" forever'''
>>> s
'i love "python" forever'
>>> s = '''i love 'python' forever'''
>>> s
"i love 'python' forever"

Defining strings
Each string is stored in the computer’s memory as a sequence of characters. The characters in a string can be accessed one at a time through the index operator. The first character is stored at position 0, and the last character is stored at position len(string) - 1.
>>> myString = "GATTACA"
>>> myString
'GATTACA'

Accessing single characters [positive and negative index]
You can access individual characters by using indices in square brackets. Indexing starts from zero, and characters can be accessed using positive or negative indices. Negative indices start at the end of the string and move left.
>>> myString = "GATTACA"
>>> myString[0]
'G'
>>> myString[1]
'A'
>>> myString[-1]
'A'
>>> myString[-2]
'C'
>>> myString[7]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

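The indexing rules above can be sketched as follows. The helper name char_at is hypothetical and introduced only for illustration; it catches the IndexError instead of letting it stop the program.

```python
def char_at(text, index):
    """Return text[index], or None when the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return None


my_string = "GATTACA"
first = char_at(my_string, 0)     # first character
last = char_at(my_string, -1)     # last character, counted from the end
missing = char_at(my_string, 7)   # valid indices are 0..6 and -7..-1
print(first, last, missing)
```

For a string of length n, the valid positive indices are 0 to n - 1 and the valid negative indices are -n to -1.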
A Character Too Far
• You will get a Python error if you attempt to index beyond the end of a string.
• So be careful when constructing index values and slices.
>>> zot = 'abc'
>>> print(zot[5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

Basic Built-in Python Functions for Strings
min() returns the smallest character in a string (the one with the lowest character code)
max() returns the largest character in a string (the one with the highest character code)
len() returns the number of characters in a string

Strings Have Length
• There is a built-in function len that gives us the length of a string.
b a n a n a
0 1 2 3 4 5
>>> fruit = 'banana'
>>> print(len(fruit))
6

min() and max() functions
>>> check = 'kumaraguru'
>>> min(check)
'a'
>>> max(check)
'u'

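A short sketch of why min() and max() give these answers: characters are compared by their numeric code, which ord() reveals. The variable names are illustrative only.

```python
check = "kumaraguru"
smallest = min(check)
largest = max(check)
print(smallest, ord(smallest))  # a 97
print(largest, ord(largest))    # u 117

# Uppercase letters have lower codes than lowercase letters,
# so an uppercase letter wins min() in a mixed-case string.
mixed = "Python"
smallest_mixed = min(mixed)
print(smallest_mixed, ord(smallest_mixed))  # P 80
```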
String Slicing
The slicing operator returns a part of a string, called a slice, specified by two indices (i.e. start and end). The slice includes the character at the start index and stops just before the end index.
Syntax: string_variable[start_index:end_index]
Example:
>>> s = "welcome to kct"
>>> s[2:10]
'lcome to'

Accessing substrings
>>> myString = "GATTACA"
>>> myString[1:3]
'AT'
>>> myString[:3]
'GAT'
>>> myString[4:]
'ACA'
>>> myString[3:5]
'TA'
>>> myString[:]
'GATTACA'

s[::] returns the entire string, like s[:]
s[::-1] returns the string in reverse order
s[-1:0:-1] returns the characters from index -1 (the last character) back to index 1; the character at index 0 is not included

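The three slice forms above can be checked with a small sketch; the variable names are illustrative only.

```python
s = "GATTACA"

whole = s[::]               # same as s[:]
reversed_s = s[::-1]        # every character, right to left
tail_reversed = s[-1:0:-1]  # right to left, stopping before index 0

print(whole)          # GATTACA
print(reversed_s)     # ACATTAG
print(tail_reversed)  # ACATTA
```

The end index is exclusive in both directions, which is why s[-1:0:-1] drops the first character.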
Special characters
The backslash is used to introduce a special character.
>>> "He said, "Wow!""
  File "<stdin>", line 1
    "He said, "Wow!""
              ^
SyntaxError: invalid syntax
>>> "He said, 'Wow!'"
"He said, 'Wow!'"
>>> "He said, \"Wow!\""
'He said, "Wow!"'
Escape sequence    Meaning
\\                 Backslash
\'                 Single quote
\"                 Double quote
\n                 Newline
\t                 Tab

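A minimal sketch of the escape sequences in use. Each escape sequence counts as a single character in the string; the example strings are made up for illustration.

```python
quote = "He said, \"Wow!\""       # escaped double quotes inside double quotes
same_quote = 'He said, "Wow!"'    # same text, no escaping needed
two_lines = "first\nsecond"       # \n is one newline character
tabbed = "name\tvalue"            # \t is one tab character
path = "C:\\temp"                 # \\ is one backslash

print(quote)
print(two_lines)
print(tabbed)
print(path, len(path))
```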
Repeating (multiplying) strings using *
You cannot perform regular arithmetic on numbers stored as characters, but you can "multiply" (repeat) strings.
>>> "3" * 3
'333'
>>> "5" * 9
'555555555'

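To make the difference between repetition and arithmetic concrete, here is a small sketch. It converts the string with int() or float() before doing arithmetic; the variable names are illustrative only.

```python
digits = "3"

repeated = digits * 3         # string repetition
product = int(digits) * 3     # numeric multiplication after conversion
total = float("2.5") + 1      # float() handles decimal text

print(repeated)  # 333
print(product)   # 9
print(total)     # 3.5
```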
More string functionality
>>> len("GATTACA")          ← Length
7
>>> "GAT" + "TACA"          ← Concatenation
'GATTACA'
>>> "A" * 10                ← Repeat
'AAAAAAAAAA'
>>> "GAT" in "GATTACA"      ← Substring test
True
>>> "AGT" in "GATTACA"
False

Immutable/Mutable objects
Immutable objects: strings, tuples
Mutable objects: lists, dictionaries, sets

Strings are immutable
Strings cannot be modified; instead, create a new one.
>>> s = "GATTACA"
>>> s[3] = "C"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s = s[:3] + "C" + s[4:]
>>> s
'GATCACA'
>>> s = s.replace("G", "U")
>>> s
'UATCACA'

Strings are immutable
• String methods do not modify the string; they return a new string.
>>> sequence = "ACGT"
>>> sequence.replace("A", "G")
'GCGT'
>>> print(sequence)
ACGT
>>> sequence = "ACGT"
>>> new_sequence = sequence.replace("A", "G")
>>> print(new_sequence)
GCGT

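The slice-and-concatenate idiom can be wrapped in a small helper. This is only a sketch: the name replace_at is hypothetical, and it assumes a non-negative index inside the string.

```python
def replace_at(text, index, new_char):
    """Return a new string with the character at index replaced.

    Assumes 0 <= index < len(text); the original string is left untouched.
    """
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    return text[:index] + new_char + text[index + 1:]


original = "GATTACA"
changed = replace_at(original, 3, "C")
print(original)  # GATTACA
print(changed)   # GATCACA
```

Because a new string is built each time, the variable must be reassigned if the change should replace the old value.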
String with for and while loop
s = "india"
for ch in s:
    print(ch, end="")
Output: india

Contd. (Predict the output)
s = "I Love python programming"
for index in range(0, len(s), 2):    # Traverse every second character
    print(s[index], end="")
OUTPUT:

Traversing with a while loop
s = "Coimbatore"
index = 0
while index < len(s):
    print(s[index], end="")
    index = index + 1

2. String Slicing with Step Size
Previously you studied how to select a portion of a string. But how does a programmer select every second or third character from a string? This can be done using a step size.
Syntax: string_variable[start_index:end_index:step_size]

Example
>>> s = "kumaraguru college"
>>> s[0:6]
'kumara'
>>> s[0:len(s):4]
'krrog'

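As a rough sketch of what the step size does, the loop below collects the same characters that a stepped slice returns. The helper name every_nth is hypothetical and used only for illustration.

```python
def every_nth(text, step):
    """Collect every step-th character of text with an explicit loop."""
    result = ""
    for index in range(0, len(text), step):
        result += text[index]
    return result


s = "kumaraguru college"
looped = every_nth(s, 4)
sliced = s[0:len(s):4]   # equivalent shorthand: s[::4]
print(looped, sliced)    # krrog krrog
```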
String Operations [1. String Comparison]
Operators such as ==, <, >, <=, >= and != are used to compare strings.
>>> s1 = "arun"
>>> s2 = "ARUN"
>>> s1 > s2
True
Explanation: Python compares strings character by character using the numeric value of each character, i.e. the ASCII value of 'a' is 97 and that of 'A' is 65. Since 97 > 65, the comparison returns True. The comparison stops at the first position where the characters differ; later characters are examined only while the earlier ones are equal.
>>> s1 == s2
False
>>> s2 = "ARUN".lower()
>>> s2
'arun'
>>> s1 == s2
True

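The comparison rule can be explored with ord(), which returns the numeric value of a character. The example words below are made up for illustration.

```python
s1 = "arun"
s2 = "ARUN"

codes = (ord("a"), ord("A"))          # (97, 65)
greater = s1 > s2                     # decided by the first characters: 97 > 65

# The first two characters match, so the third pair decides: 'p' (112) < 'r' (114).
third_decides = "apple" < "apricot"

# When one string is a prefix of the other, the shorter one is smaller.
prefix_smaller = "app" < "apple"

# Normalizing the case first gives a case-insensitive equality test.
same_ignoring_case = s1.lower() == s2.lower()

print(codes, greater, third_decides, prefix_smaller, same_ignoring_case)
```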
String methods
In Python, a method is a function that is defined with respect to a particular object. The syntax is <object>.<method>(<parameters>)
>>> dna = "ACGT"
>>> dna.find("T")
3

String methods
>>> "GATTACA".find("ATT")
1
>>> "GATTACA".count("T")
2
>>> "GATTACA".lower()
'gattaca'
>>> "gattaca".upper()
'GATTACA'
>>> "GATTACA".replace("G", "U")
'UATTACA'
>>> "GATTACA".replace("C", "U")
'GATTAUA'
>>> "GATTACA".replace("AT", "**")
'G**TACA'
>>> "GATTACA".startswith("G")
True
>>> "GATTACA".startswith("g")
False

String format()
Python supports the %s placeholder with the % operator. The % between the template and the values is required; without it, Python tries to call the string as a function.
Example:
>>> "I am %s and I have completed my B.E Degree from %s" % ("Arun", "Anna University")
'I am Arun and I have completed my B.E Degree from Anna University'
>>> "I am %s and I have completed my B.E Degree from %s"("Arun", "Anna University")
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    "I am %s and I have completed my B.E Degree from %s"("Arun", "Anna University")
TypeError: 'str' object is not callable

For more complex formatting, Python provides the string method format() (available since Python 2.6 and 3.0). Here, instead of %s we can use {0}, {1} and so on.
Syntax: template.format(p0, p1, ..., pn)

Example
>>> "{} plus {} equals {}".format(4, 6, "Ten")
'4 plus 6 equals Ten'
>>> "[] plus [] equals =".format(2, 3, "Five")
'[] plus [] equals ='
>>> "I am {0} and I have completed my B.E Degree from {1}".format("Arun", "Anna University")
'I am Arun and I have completed my B.E Degree from Anna University'

Explanation: Each empty {} is replaced with the arguments in order: the first {} is replaced with the first argument, and so on. Only curly brackets are placeholders, so the square brackets in the second example are left unchanged. The index of the first argument to format() is zero. We can also give the position of an argument inside the curly brackets; an index with no matching argument raises an IndexError (the exact message varies with the Python version).
>>> "I am {1} and I have completed my B.E Degree from {0}".format("Arun", "Anna University")
'I am Anna University and I have completed my B.E Degree from Arun'
>>> "I am {0} and I have completed my B.E Degree from {2}".format("Arun", "Anna University")
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    "I am {0} and I have completed my B.E Degree from {2}".format("Arun", "Anna University")
IndexError: tuple index out of range

Keyword arguments and the format() method
We can also put names inside the curly braces alongside numeric indexes. However, each name has to match a keyword argument passed to the format() method.
Example:
>>> "I am {0} years exp. in python. but not in {c} programming".format(2, c="C++")
'I am 2 years exp. in python. but not in C++ programming'

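The formatting styles covered so far can be compared side by side. This sketch reuses the slide's example text; the keyword names name and university are made up for illustration.

```python
# Positional placeholders, filled by index.
by_position = "I am {0} and I have completed my B.E Degree from {1}".format(
    "Arun", "Anna University")

# Named placeholders, filled by keyword arguments.
by_keyword = "I am {name} and I have completed my B.E Degree from {university}".format(
    name="Arun", university="Anna University")

# The older % operator with a tuple of values.
old_style = "I am %s and I have completed my B.E Degree from %s" % (
    "Arun", "Anna University")

# Positional and keyword arguments can be mixed in one call.
mixed = "I am {0} years exp. in python. but not in {c} programming".format(2, c="C++")

print(by_position)
print(mixed)
```

All three of the first templates produce the same sentence; named placeholders tend to stay readable when a template has many values.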
The split() method
It breaks a string up into smaller strings and returns them as a list. With no arguments, it splits on whitespace and returns all the words in the string.
Example:
>>> s = " The computer programming languages like C C++ java python"
>>> s.split()
['The', 'computer', 'programming', 'languages', 'like', 'C', 'C++', 'java', 'python']

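A short sketch of split() with and without a separator, plus join() as its counterpart. The comma-separated line is a made-up example.

```python
sentence = " The computer programming languages like C C++ java python"

words = sentence.split()          # split on any whitespace, ignoring the leading space
csv_line = "C,C++,java,python"
fields = csv_line.split(",")      # split on an explicit separator
rejoined = " ".join(words)        # join() glues a list back into one string

print(words)
print(fields)
print(rejoined)
```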
String summary
Basic string operations:
S = "AATTGG"        # assignment - or use single quotes ' '
s1 + s2             # concatenate
s2 * 3              # repeat string
s2[i]               # index character at position 'i'
s2[x:y]             # index a substring
len(S)              # get length of string
int(S)              # turn a string into an integer, or use float(S) for a floating point decimal
Methods:
S.upper()
S.lower()
S.count(substring)
S.replace(old, new)
S.find(substring)
S.startswith(substring), S.endswith(substring)
Printing:
print(var1, var2, var3)        # print multiple variables
print("text", var1, "text")    # print a combination of explicit text (strings) and variables

Sample problem #1
Write a program called dna2rna.py that reads a DNA sequence from the first command line argument, and then prints it as an RNA sequence. Make sure it works for both uppercase and lowercase input.
> python dna2rna.py ACTCAGT
ACUCAGU
> python dna2rna.py actcagt
acucagu
> python dna2rna.py ACTCagt
ACUCagu
First get it working just for uppercase letters.

Two solutions
import sys
sequence = sys.argv[1]
new_sequence = sequence.replace("T", "U")
newer_sequence = new_sequence.replace("t", "u")
print(newer_sequence)

import sys
print(sys.argv[1].replace("T", "U").replace("t", "u"))
It is legal (but not always desirable) to chain together multiple methods on a single line.

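One possible way to organize the same idea is to put the conversion in a function, so it can be checked without the command line. The function name dna_to_rna is hypothetical and not part of the original solutions.

```python
import sys


def dna_to_rna(sequence):
    """Replace T with U and t with u, leaving every other character as is."""
    return sequence.replace("T", "U").replace("t", "u")


upper_case = dna_to_rna("ACTCAGT")
lower_case = dna_to_rna("actcagt")
mixed_case = dna_to_rna("ACTCagt")
print(upper_case, lower_case, mixed_case)

# When run as a script with an argument, convert that argument as well.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(dna_to_rna(sys.argv[1]))
```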
Sample problem #2
Write a program get-codons.py that reads the first command line argument as a DNA sequence and prints the first three codons, one per line, in uppercase letters.
> python get-codons.py TTGCAGTCG
TTG
CAG
TCG
> python get-codons.py TTGCAGTCGATC
TTG
CAG
TCG
> python get-codons.py tcgatcgac
TCG
ATC
GAC

Solution #2
import sys
sequence = sys.argv[1]
upper_sequence = sequence.upper()
print(upper_sequence[:3])
print(upper_sequence[3:6])
print(upper_sequence[6:9])

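The three hand-written slices can be generalized with a loop over a stepped range. This is a sketch under stated assumptions: the name first_codons and its count parameter are hypothetical, and incomplete trailing codons are dropped.

```python
def first_codons(sequence, count=3):
    """Return up to count complete codons (3-letter groups) in uppercase."""
    upper_sequence = sequence.upper()
    # Start positions 0, 3, 6, ... that still leave room for three letters.
    codons = [upper_sequence[start:start + 3]
              for start in range(0, len(upper_sequence) - 2, 3)]
    return codons[:count]


for codon in first_codons("tcgatcgac"):
    print(codon)
```

For the input above this prints TCG, ATC and GAC on separate lines, matching the third example in the problem statement.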
Sample problem #3 (optional)
Write a program that reads a protein sequence as a command line argument and prints the location of the first cysteine residue. The location is a 0-based index; -1 means that no cysteine was found.
> python find-cysteine.py MNDLSGKTVIITGGARGLGAEAARQAVAAGARVVLADVLDEEGAATARELGDAARYQHLDVTIEEDWQRVCAYAREEFGSVDGL
70
> python find-cysteine.py MNDLSGKTVIITGGARGLGAEAARQAVAAGARVVLADVLDEEGAATARELGDAARYQHLDVTIEEDWQRVVAYAREEFGSVDGL
-1

Solution #3
import sys
protein = sys.argv[1]
upper_protein = protein.upper()
print(upper_protein.find("C"))

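The same lookup can be sketched as a function and checked against the two sequences from the problem statement. The function name first_cysteine is hypothetical; find() returns a 0-based index, or -1 when the letter is absent.

```python
def first_cysteine(protein):
    """Return the 0-based index of the first 'C' (case-insensitive), or -1."""
    return protein.upper().find("C")


with_cysteine = ("MNDLSGKTVIITGGARGLGAEAARQAVAAGARVVLADVLDEEGAATARELGDAARYQHLDVTI"
                 "EEDWQRVCAYAREEFGSVDGL")
without_cysteine = ("MNDLSGKTVIITGGARGLGAEAARQAVAAGARVVLADVLDEEGAATARELGDAARYQHLDVTI"
                    "EEDWQRVVAYAREEFGSVDGL")

found = first_cysteine(with_cysteine)
not_found = first_cysteine(without_cysteine)
print(found, not_found)
```

If a 1-based residue number is wanted instead, add 1 to the result after checking that it is not -1.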
Reading Chapters 5 and 8 of Learning Python by Lutz.