COSC 1306: COMPUTER SCIENCE AND PROGRAMMING, PYTHON FUNCTIONS • Jehan-François Pâris • jfparis@uh.edu
Module Overview • We will learn how to read, create and modify files – Pay special attention to pickled files • They are very easy to use!
The file system • Provides long-term storage of information • Stores data in stable storage (disk) • Cannot be RAM because: – Dynamic RAM loses its contents when powered off – Static RAM is too expensive – System crashes can corrupt the contents of main memory
Overall organization • Data managed by the file system are grouped in user-defined data sets called files • The file system must provide a mechanism for naming these data – Each file system has its own set of conventions – All modern operating systems use a hierarchical directory structure
Windows solution • Each device and each disk partition is identified by a letter – A: and B: were used by the floppy drives – C: is the first disk partition of the hard drive – If the hard drive has no other disk partition, D: denotes the DVD drive
Windows solution • [Figure: separate directory trees for C: (containing Users, Windows and Program Files), D: (second disk) and F: (flash drive)]
UNIX/LINUX organization • Each device and disk partition has its own directory tree – Disk partitions are glued together through the mount operation to form a single tree • A typical user does not know where her files are stored
UNIX/LINUX organization • [Figure: the root partition holds / with subdirectories such as usr and bin; the magic mount attaches a second partition at /usr, so its contents can be accessed as /usr]
Mac OS organization • Similar to Windows – Disk partitions are not merged – Represented by separate icons on the desktop
Accessing a file (I) • Your Python programs are stored in a folder, AKA directory – On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python • All files in that directory can be directly accessed through their names – "myfile.txt"
Accessing a file (II) • Files in subdirectories can be accessed by specifying the subdirectory first – Windows style: • "test\\sample.txt" – Note the double backslash – Linux/Unix/Mac OS X style: • "test/sample.txt" – Generally also works on Windows
Why the double backslash? • The backslash is an escape character in Python – Combines with its successor to represent non-printable characters • '\n' represents a newline • '\t' represents a tab – Must use '\\' to represent a plain backslash
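Python also accepts raw strings, marked with an r prefix, in which backslashes are not treated as escapes. These slides do not cover them. The sketch below compares the two spellings and builds the same path with pathlib.PureWindowsPath, a standard-library class. The file name is just the slides' example.

```python
from pathlib import PureWindowsPath

escaped = "test\\sample.txt"   # backslash escaped by doubling it
raw = r"test\sample.txt"       # raw string: the backslash is kept as-is

print(escaped == raw)          # True: both hold a single backslash
print(len("\\"))               # 1: '\\' is one character
print(len("\n"))               # 1: '\n' is one newline character

# PureWindowsPath joins components with a backslash on any operating system
built = str(PureWindowsPath("test", "sample.txt"))
print(built == escaped)        # True
```

One caveat is that a raw string cannot end with a single backslash. For example, r"C:\temp\" is a syntax error.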
Accessing a file (III) • For other files, must use the full pathname – Windows style: • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
Accessing file contents • Two-step process: – First we open the file – Then we access its contents • Read • Write • When we are done, we close the file
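The following is a minimal sketch of the open/access/close cycle using a with statement. This is standard Python, but these slides do not use it. The file is closed automatically when the block ends. The scratch file name cosc1306_demo.txt is an assumption made for the example.

```python
import os
import tempfile

# Hypothetical scratch file in the system temp directory
path = os.path.join(tempfile.gettempdir(), "cosc1306_demo.txt")

# Open for writing and write; "with" closes the file at the end of the block
with open(path, "w") as fh:
    fh.write("To be or not to be that is the question\n")

# Open again for reading; again closed automatically
with open(path, "r") as fh:
    contents = fh.read()

print(fh.closed)          # True: the with statement closed the file
print(contents, end="")
```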
What happens at open() time? • The system verifies – That you are an authorized user – That you have the right permission • Read permission • Write permission • Execute permission exists but does not apply here • If the checks succeed, open() returns a file handle (a file object)
The file handle • Gives the user – Direct access to the file • No directory lookups – Authority to execute the file operations whose permissions have been requested
Python open() • open(name, mode='r', buffering=-1) where – name is the name of the file – mode is the permission requested • Default is 'r' for read only – buffering selects the buffering policy • The default, -1, lets Python choose; 1 selects line buffering (text mode only)
The modes • Can request – 'r' for read-only – 'w' for write-only • Always overwrites the file (creates it if it does not exist) – 'a' for append • Writes at the end – 'r+' or 'a+' for updating (read + write/append)
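The sketch below exercises the 'w', 'a' and 'r' modes on a hypothetical scratch file. It shows that a second 'w' discards the earlier contents, while 'a' keeps them and adds at the end.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "cosc1306_modes.txt")  # hypothetical scratch file

with open(path, "w") as fh:      # 'w' creates the file or empties an existing one
    fh.write("first\n")
with open(path, "w") as fh:      # opening with 'w' again discards "first"
    fh.write("second\n")
with open(path, "a") as fh:      # 'a' writes at the end of the file
    fh.write("third\n")
with open(path, "r") as fh:      # 'r' (the default) reads it back
    result = fh.read()

print(result, end="")            # second, then third, each on its own line
```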
Examples
    f1 = open("myfile.txt")  # same as f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfi
Reading a file • Three ways: – Global reads – Line by line – Pickled files
Global reads • fh.read() – Returns the whole contents of the file specified by file handle fh – File contents are stored in a single string that might be very large
Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # not required
Output of example • To be or not to be that is the question Now is the winter of our discontent – Exact contents of file 'test\sample.txt'
Line-by-line reads
    for line in fh:  # do not forget the colon
        ...  # anything you want
    fh.close()  # not required
Example
    f3 = open("test/sample.txt", "r")
    for line in f3:  # do not forget the colon
        print(line)
    f3.close()  # not required
Output • To be or not to be that is the question Now is the winter of our discontent – With one or more extra blank lines
Why? • Each line ends with an end-of-line marker • print(…) adds an extra end-of-line
Trying to remove blank lines
    print('--------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:  # do not forget the colon
        print(line[:-1])  # remove last char
    f5.close()  # not required
    print('--------------------------')
The output
    --------------------------
    To be or not to be that is the question
    Now is the winter of our disconten
    --------------------------
• The last line did not end with an EOL, so its last real character was removed!
A smarter solution (I) • Only remove the last character if it is an EOL
    if line[-1] == '\n':
        print(line[:-1])
    else:
        print(line)
A smarter solution (II)
    print('--------------------------')
    fh = open("test/sample.txt", "r")
    for line in fh:  # do not forget the colon
        if line[-1] == '\n':
            print(line[:-1])  # remove last char
        else:
            print(line)
    fh.close()  # not required
    print('--------------------------')
It works!
    --------------------------
    To be or not to be that is the question
    Now is the winter of our discontent
    --------------------------
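str.rstrip('\n') is a common one-call alternative to testing line[-1], because it removes a trailing newline only when one is present. The sketch uses io.StringIO as a stand-in for an open text file. This assumption keeps the example self-contained.

```python
import io

# io.StringIO behaves like a text file opened for reading
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")   # no final EOL

lines = [line.rstrip('\n') for line in fh]   # strip the EOL only if present
for line in lines:
    print(line)
```

Another option is print(line, end=''), which stops print() from adding its own end-of-line.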
Making sense of file contents • Most files contain more than one data item per line – COSC 713-743-3350 – UHPD 713-743-3333 • Must split lines – mystring.split(sepchar) where sepchar is a separation character • Returns a list of items • With no argument, split() splits on runs of whitespace
Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1, 'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', " 'Baker", " Andy'", ' 83', ' 89', ' 85']
• Not what we wanted!
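The standard csv module can parse this record correctly. This is a sketch that assumes the record's quote character is the single quote and that the blank after each comma should be ignored. quotechar and skipinitialspace are real csv.reader options.

```python
import csv

record = "1, 'Baker, Andy', 83, 89, 85"
# csv.reader accepts any iterable of strings; here a one-element list.
# quotechar="'" because this record uses single quotes;
# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```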
Example
    # how2split.py
    print('--------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:
        words = line.split()
        for xxx in words:
            print(xxx)
    f5.close()  # not required
    print('--------------------------')
Other separators (I) • Commas – CSV (comma-separated values), a format Excel can read and write • Values are separated by commas • Strings are stored without quotes – Unless they contain a comma • "Doe, Jane",freshman,90 – Quotes within strings are doubled
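The csv module applies these rules automatically. The rows in this sketch are hypothetical. lineterminator='\n' is chosen to keep the output simple, since the module's default is '\r\n'.

```python
import csv
import io

out = io.StringIO()   # in-memory text "file"
writer = csv.writer(out, lineterminator="\n")   # default quoting: QUOTE_MINIMAL
writer.writerow(["Doe, Jane", "freshman", 90])              # comma -> field is quoted
writer.writerow(['Aguirre, "Lalo" Eduardo', "senior", 85])  # inner quotes are doubled
print(out.getvalue(), end="")
```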
Other separators (II) • Tabs ('\t') – Advantages: • Your fields will appear nicely aligned • Spaces, commas, … are not an issue – Disadvantage: • You do not see them – They look like spaces
Why it is important • When you must pick your file format, you should decide how the data inside the file will be used: – People will read them – Other programs will use them – Will be used by people and machines
An exercise • Converting our output to CSV format – Replacing tabs by commas • Easy – Will use the string replace() method
First attempt
    fh_in = open('grades.txt', 'r')  # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')
The output • Tab-separated input: – Alice 90 90 90 – Bob 85 85 85 – Carol 75 75 75 • becomes – Alice,90,90,90 – Bob,85,85,85 – Carol,75,75,75
Dealing with commas (I) • Work line by line • For each line – split the input into fields using TAB as separator – store the fields in a list • "Alice\t90\t90\t90\n" becomes ['Alice', '90', '90', '90\n']
Dealing with commas (II) – Put within double quotes any entry containing one or more commas – Output list entries separated by commas • ['"Baker, Alice"', '90', '90', '90'] becomes "Baker, Alice",90,90,90
Dealing with commas (III) • Our troubles are not over: – Must store somewhere all lines until we are done – Store them in a list
Dealing with double quotes • Before wrapping items that contain commas in double quotes, replace – All double quotes by pairs of double quotes – 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'
General organization (I) • linelist = [ ] • for line in file – itemlist = line.split(…) – linestring = '' # empty string – for each item in itemlist • remove any trailing newline • double all double quotes • if item contains a comma, wrap it in double quotes • add to linestring
General organization (II) • for line in file – … – for each item in itemlist • double all double quotes • if item contains a comma, wrap it in double quotes • add to linestring – append linestring to linelist
General organization (III) • for line in file – … – remove last comma of linestring – add newline at end of linestring – append linestring to linelist • for linestring in linelist – write linestring into output file
The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')  # input file
    linelist = [ ]  # global data structure
    for line in fh:  # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist))  # just for debugging
        linestring = ''  # start afresh
The program (II)
        for item in itemlist:  # inner loop
            item = item.replace('"', '""')  # for quotes
            if item.endswith('\n'):  # remove it (endswith also copes with empty items)
                item = item[:-1]
            if ',' in item:  # wrap item
                linestring += '"' + item + '"' + ','
            else:  # just append
                linestring += item + ','
        # end of inside for loop
The program (III)
        # must replace last comma by newline
        linestring = linestring[:-1] + '\n'
        linelist.append(linestring)
    # end of outside for loop
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist:
        fhh.write(line)
    fhh.close()
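For comparison, the whole conversion can be delegated to the csv module. This is not the course's method. The sketch assumes that double quotes in the tab-separated input are ordinary characters, which is why the reader uses quoting=csv.QUOTE_NONE. tsv_to_csv is a hypothetical helper name, and the sample rows are made up.

```python
import csv
import io

def tsv_to_csv(src, dst):
    """Copy tab-separated rows from src to dst in CSV format.

    src and dst are open text files (or io.StringIO objects).
    QUOTE_NONE makes the reader treat double quotes as ordinary characters;
    the writer then quotes fields and doubles inner quotes as needed.
    """
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# Hypothetical sample input: two tab-separated lines
src = io.StringIO('Doe, Jane\t90\t80\t75\nFulano, Eduardo "Lalo"\t90\t90\t85\n')
dst = io.StringIO()
tsv_to_csv(src, dst)
print(dst.getvalue(), end="")
```

With real files, the csv documentation recommends opening them with newline='', for example open('great.csv', 'w', newline='').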
Notes • Most print statements used for debugging were removed – Space considerations • Observe that the inner loop adds a comma after each item – Wanted to remove the last one • Must also add a newline at end of each line
The input file • Alice 90 90 Bob 85 85 Carol 75 75 Doe, Jane 90 90 90 Fulano, Eduardo "Lalo" 90 90 85 75 80 90 70 90 90
The output file
    Alice,90,90,90
    Bob,85,85,85
    Carol,75,75,75
    "Doe, Jane",90,80,75
    "Fulano, Eduardo ""Lalo""",90,90
Mistakes being made (I) • Mixing lists and strings: – An earlier draft of the program declared • linestring = [ ] and did • linestring.append(item) – Outcome was • ['Alice,', '90,', … ] instead of • 'Alice,90,…'
Mistakes being made (II) • Forgetting to add a newline – Output was a single line • Doing the append inside the inner loop: – Output was • Alice, 90, 90 …
Mistakes being made (III) • Forgetting that strings are immutable: – Trying to do • linestring[-1] = '\n' instead of • linestring = linestring[:-1] + '\n' – Bigger issue: • Do we have to remove the last
Could we have done better? (I) • Make the program more readable by decomposing it into functions – A function to process each line of input • do_line(line) –Input is a string ending with newline –Output is a string in CSV format –Should call a function processing
Could we have done better? (II) – A function to process individual items • do_item(item) –Input is a string –Returns a string • With double quotes "doubled" • Without a newline • Within quotes if it contains a comma
The new program (I)
    def do_item(item):
        item = item.replace('"', '""')
        if item.endswith('\n'):
            item = item[:-1]
        if ',' in item:
            item = '"' + item + '"'
        return item
The new program (II)
    def do_line(line):
        itemlist = line.split('\t')
        linestring = ''  # start afresh
        for item in itemlist:
            linestring += do_item(item) + ','
        linestring = linestring[:-1] + '\n'  # replace last comma by newline
        return linestring
The new program (III)
    fh = open('grades.txt', 'r')
    linelist = [ ]
    for line in fh:
        linelist.append(do_line(line))
    fh.close()
The new program (IV)
    fhh = open('great.csv', 'w')
    for line in linelist:
        fhh.write(line)
    fhh.close()
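One possible refinement, not taken from the slides, is to build each line with ','.join(). This method places commas only between items, so there is no trailing comma to strip. do_item is repeated here so the block runs on its own.

```python
def do_item(item):
    """Return item in CSV form: quotes doubled, EOL removed, wrapped if it has a comma."""
    item = item.replace('"', '""')
    if item.endswith('\n'):
        item = item[:-1]
    if ',' in item:
        item = '"' + item + '"'
    return item

def do_line(line):
    """Convert one tab-separated line into one CSV line ending with a newline."""
    # join() inserts ',' only between items, so no last comma needs removing
    return ','.join(do_item(item) for item in line.split('\t')) + '\n'

print(do_line('Doe, Jane\t90\t80\t75\n'), end='')   # "Doe, Jane",90,80,75
```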
Why it is better • The program is decomposed into small modules that are much easier to understand – Each fits on a PowerPoint slide
The break statement • Makes the program exit the loop it is in • In the next example, we are looking for the first instance of a string in a file – Can exit as soon as it is found
Example (I)
    searchstring = input('Enter search string: ')
    found = False
    fh = open('grades.txt')
    for line in fh:
        if searchstring in line:
            print(line)
            found = True
            break
Example (II)
    if found == True:
        print("String %s was found" % searchstring)
    else:
        print("String %s NOT found" % searchstring)
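Python's for ... else construct can replace the found flag, although these slides do not use it. The else branch runs only when the loop finishes without a break. find_first is a hypothetical helper, and the list of lines stands in for an open file.

```python
def find_first(lines, searchstring):
    """Return the first line containing searchstring, or None if there is none."""
    for line in lines:
        if searchstring in line:
            result = line
            break               # stop at the first match
    else:                       # runs only if the loop was NOT ended by break
        result = None
    return result

lines = ["Alice\t90\n", "Bob\t85\n"]    # hypothetical file contents
print(repr(find_first(lines, "Bob")))
print(find_first(lines, "Zoe"))
```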
Flags • A variable like found – That can either be True or False – That is used in a condition for an if or a while is often referred to as a flag
A dumb mistake • Unlike C and its family of languages, Python does not let you write – if found = True instead of – if found == True • Python reports a SyntaxError • There are still cases where we can make mistakes!
HANDLING EXCEPTIONS
When a wrong value is entered • When the user is prompted for – number = int(input("Enter a number: ")) and enters – a non-numerical string, a ValueError exception is raised and the program terminates • Python programs can catch such errors
The try… except pair (I)
    try:
        <statements being tried>
    except Exception as ex:
        <statements catching the exception>
• Observe – the colons – the indentation
The try… except pair (II)
    try:
        <statements being tried>
    except Exception as ex:
        <statements catching the exception>
• If an exception occurs while the program executes the statements between the try and the except, control is immediately transferred to the statements after the except line
A better example
    done = False
    while not done:
        filename = input("Enter a file name: ")
        try:
            fh = open(filename)
            done = True
        except FileNotFoundError:
            print('File %s does not exist' % filename)
    print(fh.read())
An Example (I)
    done = False
    while not done:
        try:
            number = int(input('Enter a number: '))
            done = True
        except ValueError:
            print('You did not enter a number')
    print("You entered %.2f." % number)
    input("Hit enter when done with
A simpler solution
    done = False
    while not done:
        myinput = input('Enter a number: ')
        if myinput.isdigit():
            number = int(myinput)
            done = True
        else:
            print('You did not enter a number')
    print("You entered %.2f." % number)
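str.isdigit() also rejects valid numbers such as '-5' or '3.5'. The following is a sketch of the try/except approach that assumes floating-point input is acceptable. parse_number is a hypothetical helper.

```python
def parse_number(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:      # raised for strings such as 'abc'
        return None

for text in ['42', '-5', '3.5', 'abc']:
    # isdigit() accepts only '42'; float() accepts all but 'abc'
    print(repr(text), text.isdigit(), parse_number(text))
```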
PICKLED FILES
Pickled files • import pickle – Provides a way to save complex data structures in a file – Sometimes said to provide a serialized representation of Python objects
Basic primitives (I) • pickle.dump(object, fh) – appends a serialized representation of object to the file with file handle fh – object can be virtually any Python object – fh is the handle of a file that must have been opened in 'wb' mode • The b option lets you write or read binary data
Basic primitives (II) • target = pickle.load(filehandle) – assigns to target the next pickled object stored in filehandle – target can be virtually any Python object – filehandle is the handle of a file that was opened in 'rb' mode
Example (I)
    >>> mylist = [2, 'Apples', 5, 'Oranges']
    >>> mylist
    [2, 'Apples', 5, 'Oranges']
    >>> fh = open('testfile', 'wb')  # b is for BINARY
    >>> import pickle
    >>> pickle.dump(mylist, fh)
    >>> fh.close()
Example (II)
    >>> fhh = open('testfile', 'rb')  # b is for BINARY
    >>> theirlist = pickle.load(fhh)
    >>> theirlist
    [2, 'Apples', 5, 'Oranges']
    >>> theirlist == mylist
    True
What was stored in testfile? • Some binary data containing the strings 'Apples' and 'Oranges'
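The same round trip can be written with with statements, which close both files automatically. Placing testfile in the system temp directory is an assumption of this sketch. pickle.dumps() and pickle.loads() are the in-memory counterparts of dump() and load().

```python
import os
import pickle
import tempfile

mylist = [2, 'Apples', 5, 'Oranges']
path = os.path.join(tempfile.gettempdir(), 'testfile')  # assumed location

with open(path, 'wb') as fh:          # 'wb': pickled data is binary
    pickle.dump(mylist, fh)
with open(path, 'rb') as fhh:         # 'rb' to read it back
    theirlist = pickle.load(fhh)
print(theirlist == mylist)            # True

data = pickle.dumps(mylist)           # a bytes object, no file involved
print(pickle.loads(data) == mylist)   # True
```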
Using ASCII format • Can request a pickled representation of objects that only contains printable characters – Must specify protocol=0 • Advantage: – Easier to debug • Disadvantage: – Takes more space
Example
    import pickle
    mydict = {'Alice': 22, 'Bob': 27}
    fh = open('asciifile.txt', 'wb')  # MUST be 'wb'
    pickle.dump(mydict, fh, protocol=0)
    fh.close()
    fhh = open('asciifile.txt', 'rb')
    theirdict = pickle.load(fhh)
    fhh.close()
    print(mydict)
    print(theirdict)
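The output • {'Alice': 22, 'Bob': 27} printed twice, once by print(mydict) and once by print(theirdict) – Python 3.7 and later keep insertion order; older versions could print {'Bob': 27, 'Alice': 22}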
What is inside asciifile. txt? • (dp 0 VBobp 1 L 27 Ls. VAlicep 2 L 22 Ls.
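The exact protocol 0 bytes vary with the Python version. A more reliable approach is to check the properties that matter: the data is plain ASCII text, and it loads back unchanged. The following is a minimal sketch using pickle.dumps().

```python
import pickle

mydict = {'Alice': 22, 'Bob': 27}
text_form = pickle.dumps(mydict, protocol=0)   # a bytes object

print(text_form.decode('ascii'))               # decodes: only ASCII characters
print(pickle.loads(text_form) == mydict)       # True
```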
Dumping multiple objects (I)
    import pickle
    fh = open('asciifile.txt', 'wb')
    for k in range(3, 6):
        mylist = [i for i in range(1, k)]
        print(mylist)
        pickle.dump(mylist, fh, protocol=0)
    fh.close()
Dumping multiple objects (II)
    fhh = open('asciifile.txt', 'rb')
    lists = [ ]  # initializing list of lists
    while True:  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError:
            break
    fhh.close()
    print(lists)
Dumping multiple objects (III) • Note the way we test for end-of-file (EOF)
    while True:  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError:
            break
• pickle.load() raises EOFError when no pickled object is left
The output
    [1, 2]
    [1, 2, 3]
    [1, 2, 3, 4]
    [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
What is inside asciifile. txt? • (lp 0 L 1 La. L 2 La. L 3 La. (lp 0 L 1 La. L 2 La. L 3 La. L 4 La.
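The EOF loop can also be packaged as a generator. In this sketch, load_all is a hypothetical helper name, and io.BytesIO stands in for a file opened in 'rb' mode.

```python
import io
import pickle

def load_all(fh):
    """Yield the objects pickled one after another in fh, until end-of-file."""
    while True:
        try:
            obj = pickle.load(fh)
        except EOFError:        # no pickled object left
            return
        yield obj

buffer = io.BytesIO()
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer, protocol=0)
buffer.seek(0)                  # rewind before reading

print(list(load_all(buffer)))   # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```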
Practical considerations • You rarely pick the format of your input files – May have to do format conversion • You often have to use specific formats for your output files – Often dictated by the program that will use them • Otherwise stick with pickled files!