COSC 1306 - COMPUTER SCIENCE AND PROGRAMMING
PYTHON FUNCTIONS
Jehan-François Pâris
jfparis@uh.edu

Module Overview
• We will learn how to read, create and modify files
  – Pay special attention to pickled files
    • They are very easy to use!

The file system
• Provides long-term storage of information
• Will store data in stable storage (disk)
• Cannot be RAM because:
  – Dynamic RAM loses its contents when powered off
  – Static RAM is too expensive
  – System crashes can corrupt the contents of main memory

Overall organization
• Data managed by the file system are grouped in user-defined data sets called files
• The file system must provide a mechanism for naming these data
  – Each file system has its own set of conventions
  – All modern operating systems use a hierarchical directory structure

Windows solution
• Each device and each disk partition is identified by a letter
  – A: and B: were used by the floppy drives
  – C: is the first disk partition of the hard drive
  – If the hard drive has no other disk partition, D: denotes the DVD drive

Windows solution
[Figure: drives C:, D: and F:, with the labels "Second disk" and "Flash drive"; folders Users, Windows and Program Files]

UNIX/LINUX organization
• Each device and disk partition has its own directory tree
  – Disk partitions are glued together through the mount operation to form a single tree
• Typical user does not know where her files are stored

UNIX/LINUX organization
[Figure: root partition (/) and other partition, with directories usr and bin, joined by "the magic mount"]
• Second partition can be accessed as /usr

Mac OS organization
• Similar to Windows
  – Disk partitions are not merged
  – Represented by separate icons on the desktop

Accessing a file (I)
• Your Python programs are stored in a folder, AKA directory
  – On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python
• All files in that directory can be directly accessed through their names
  – "myfile.txt"

Accessing a file (II)
• Files in subdirectories can be accessed by specifying first the subdirectory
  – Windows style:
    • "test\\sample.txt"
      – Note the double backslash
  – Linux/Unix/Mac OS X style:
    • "test/sample.txt"
      – Generally works for Windows

Why the double backslash?
• The backslash is an escape character in Python
  – Combines with its successor to represent non-printable characters
    • '\n' represents a newline
    • '\t' represents a tab
  – Must use '\\' to represent a plain backslash
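The escape rules above can be checked with a short sketch. It only builds strings and opens no file; the name new.txt is a hypothetical example chosen because it starts with an n.

```python
# Three ways to spell a Windows-style relative path.
escaped = "test\\sample.txt"   # each \\ stands for one backslash
raw = r"test\sample.txt"       # raw string: backslashes are kept as typed
broken = "test\new.txt"        # \n is read as a newline, not as \ followed by n

print(escaped == raw)                    # both spellings give the same string
print(len("\n"), len("\t"), len("\\"))   # each escape is a single character
print(repr(broken))                      # repr() makes the embedded newline visible
```

Raw strings are one way to avoid doubling every backslash; forward slashes are another.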

Accessing a file (III)
• For other files, must use full pathname
  – Windows style:
    • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"

Accessing file contents
• Two-step process:
  – First we open the file
  – Then we access its contents
    • Read
    • Write
• When we are done, we close the file

What happens at open() time?
• The system verifies
  – That you are an authorized user
  – That you have the right permission
    • Read permission
    • Write permission
    • Execute permission exists but doesn't apply
  and returns a file handle /file

The file handle
• Gives the user
  – Direct access to the file
    • No directory lookups
  – Authority to execute the file operations whose permissions have been requested

Python open()
• open(name, mode = 'r', buffering = -1) where
  – name is the name of the file
  – mode is the permission requested
    • Default is 'r' for read only
  – buffering specifies the buffer size
    • Default is -1, which lets Python pick the buffer size

The modes
• Can request
  – 'r' for read-only
  – 'w' for write-only
    • Always overwrites the file
  – 'a' for append
    • Writes at the end
  – 'r+' or 'a+' for updating (read + write/append)
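A minimal sketch of the difference between 'w' and 'a', assuming a throwaway temporary directory and a hypothetical file name demo.txt so that no real file is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")   # hypothetical file name

    f = open(path, "w")    # 'w' creates the file or erases its contents
    f.write("first\n")
    f.close()

    f = open(path, "w")    # opening with 'w' again discards "first"
    f.write("second\n")
    f.close()

    f = open(path, "a")    # 'a' keeps the contents and writes at the end
    f.write("third\n")
    f.close()

    f = open(path, "r")    # 'r' is the default mode
    contents = f.read()
    f.close()

print(contents)
```

Only "second" and "third" survive, because the second open with 'w' emptied the file before writing.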

Examples
• f1 = open("myfile.txt") same as f1 = open("myfile.txt", "r")
• f2 = open("test\\sample.txt", "r")
• f3 = open("test/sample.txt", "r")
• f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfi

Reading a file
• Three ways:
  – Global reads
  – Line by line
  – Pickled files

Global reads
• fh.read()
  – Returns whole contents of file specified by file handle fh
  – File contents are stored in a single string that might be very large

Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # not required
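As a hedged alternative to calling close() by hand, the with statement closes the file automatically when its block ends. The sketch below first writes its own two-line sample file into a temporary directory (an assumption made so that the example runs anywhere) and then reads it back in one global read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")

    # Create the sample file; its last line has no end-of-line marker.
    with open(path, "w") as out:
        out.write("To be or not to be that is the question\n")
        out.write("Now is the winter of our discontent")

    # Global read: the whole file ends up in a single string.
    with open(path) as fh:
        bigstring = fh.read()
    # fh is already closed here; no explicit close() is needed.

print(bigstring)
```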

Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
  – Exact contents of file 'test\sample.txt'

Line-by-line reads
    for line in fh :  # do not forget the colon
        # anything you want
    fh.close()  # not required

Example
    f3 = open("test/sample.txt", "r")
    for line in f3 :  # do not forget the colon
        print(line)
    f3.close()  # not required

Output
    To be or not to be that is the question
    Now is the winter of our discontent
  – With one or more extra blank lines

Why?
• Each line ends with an end-of-line marker
• print(…) adds an extra end-of-line

Trying to remove blank lines
    print('--------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5 :  # do not forget the colon
        print(line[:-1])  # remove last char
    f5.close()  # not required
    print('--------------------------')

The output
    --------------------------
    To be or not to be that is the question
    Now is the winter of our disconten
    --------------------------
• The last line did not end with an EOL!

A smarter solution (I)
• Only remove the last character if it is an EOL
    if line[-1] == '\n' :
        print(line[:-1])
    else :
        print(line)

A smarter solution (II)
    print('--------------------------')
    fh = open("test/sample.txt", "r")
    for line in fh :  # do not forget the colon
        if line[-1] == '\n' :
            print(line[:-1])  # remove last char
        else :
            print(line)

It works!
    --------------------------
    To be or not to be that is the question
    Now is the winter of our discontent
    --------------------------
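The same effect can be sketched with the string method rstrip('\n'), which removes a trailing newline only when there is one. Here io.StringIO stands in for an open text file (an assumption made so that the example runs without test/sample.txt).

```python
import io

# io.StringIO behaves like a file opened for reading.
# The last line deliberately has no end-of-line marker.
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

cleaned = []
for line in fh:
    cleaned.append(line.rstrip("\n"))   # strips the EOL only if it is present

for line in cleaned:
    print(line)
```

Passing end='' to print() is another option: it keeps the lines unchanged and stops print() from adding its own end-of-line.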

Making sense of file contents
• Most files contain more than one data item per line
    COSC 713-743-3350
    UHPD 713-743-3333
• Must split lines
  – mystring.split(sepchar) where sepchar is a separation character
    • returns a list of items

Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy',83,89,85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", '83', '89', '85']
• Not what we wanted!
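One possible way around this problem is the standard csv module, which understands quoted fields. The sketch below assumes a record quoted with single quotes and therefore passes quotechar="'"; the record itself is a stand-in for one line of a file.

```python
import csv

record = "1,'Baker, Andy',83,89,85"

# Plain split() cuts the quoted name in two.
naive = record.split(",")

# csv.reader takes any iterable of lines; here a one-element list.
# quotechar tells it that single quotes delimit a field.
fields = next(csv.reader([record], quotechar="'"))

print(naive)
print(fields)
```

The quoted name comes back as a single item, with its quotes removed.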

Example
    # how2split.py
    print('--------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5 :
        words = line.split()
        for xxx in words :
            print(xxx)
    f5.close()  # not required
    print('--------------------------')

Other separators (I)
• Commas
  – CSV Excel format
    • Values are separated by commas
    • Strings are stored without quotes
      – Unless they contain a comma
        • "Doe, Jane",freshman,90
      – Quotes within strings are doubled

Other separators (II)
• Tabs ('\t')
  – Advantages:
    • Your fields will appear nicely aligned
    • Spaces, commas, … are not an issue
  – Disadvantage:
    • You do not see them
      – They look like spaces

Why it is important
• When you must pick your file format, you should decide how the data inside the file will be used:
  – People will read them
  – Other programs will use them
  – Will be used by people and machines

An exercise
• Converting our output to CSV format
  – Replacing tabs by commas
    • Easy
      – Will use string replace function

First attempt
    fh_in = open('grades.txt', 'r')  # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')

The output
    Alice   90   90   90
    Bob     85   85   85
    Carol   75   75   75
  becomes
    Alice,90,90,90
    Bob,85,85,85
    Carol,75,75,75

Dealing with commas (I)
• Work line by line
• For each line
  – split input into fields using TAB as separator
  – store fields into a list
    • Alice 90 90 90 becomes ['Alice', '90', '90', '90']

Dealing with commas (II)
  – Put within double quotes any entry containing one or more commas
  – Output list entries separated by commas
    • ['"Baker, Alice"', 90, 90, 90] becomes "Baker, Alice",90,90,90

Dealing with commas (III)
• Our troubles are not over:
  – Must store somewhere all lines until we are done
  – Store them in a list

Dealing with double quotes
• Before wrapping items that contain commas in double quotes, replace
  – All double quotes by pairs of double quotes
  – 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'

General organization (I)
• linelist = [ ]
• for line in file
  – itemlist = line.split(…)
  – linestring = ''  # empty string
  – for each item in itemlist
    • remove any trailing newline
    • double all double quotes
    • if item contains comma, wrap
    • add to linestring

General organization (II)
• for line in file
  – …
  – for each item in itemlist
    • double all double quotes
    • if item contains comma, wrap
    • add to linestring
  – append linestring to linelist

General organization (III)
• for line in file
  – …
  – remove last comma of linestring
  – add newline at end of linestring
  – append linestring to linelist
• for linestring in linelist
  – write linestring into output file

The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')  # input file
    linelist = [ ]  # global data structure
    for line in fh :  # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist))  # just for debugging
        linestring = ''  # start afresh

The program (II)
        for item in itemlist :  # inner loop
            item = item.replace('"', '""')  # for quotes
            if item[-1] == '\n' :  # remove it
                item = item[:-1]
            if ',' in item :  # wrap item
                linestring += '"' + item + '"' + ','
            else :  # just append
                linestring += item + ','
        # end of inside for loop

The program (III)
        # must replace last comma by newline
        linestring = linestring[:-1] + '\n'
        linelist.append(linestring)
    # end of outside for loop
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Notes
• Most print statements used for debugging were removed
  – Space considerations
• Observe that the inner loop adds a comma after each item
  – Wanted to remove the last one
• Must also add a newline at end of each line

The input file
• Alice 90 90 Bob 85 85 Carol 75 75 Doe, Jane 90 90 90 Fulano, Eduardo "Lalo" 90 90 85 75 80 90 70 90 90

The output file
    Alice,90,90,90
    Bob,85,85,85
    Carol,75,75,75
    "Doe, Jane",90,80,75
    "Fulano, Eduardo ""Lalo""",90,90

Mistakes being made (I)
• Mixing lists and strings:
  – Earlier draft of program declared
    • linestring = [ ]
    and did
    • linestring.append(item)
  – Outcome was
    • ['Alice,', '90,', … ]
    instead of
    • 'Alice,90,…'

Mistakes being made (II)
• Forgetting to add a newline
  – Output was a single line
• Doing the append inside the inner loop:
  – Output was
    • Alice, 90, 90 …

Mistakes being made
• Forgetting that strings are immutable:
  – Trying to do
    • linestring[-1] = '\n'
    instead of
    • linestring = linestring[:-1] + '\n'
  – Bigger issue:
    • Do we have to remove the last

Could we have done better? (I)
• Make the program more readable by decomposing it into functions
  – A function to process each line of input
    • do_line(line)
      – Input is a string ending with newline
      – Output is a string in CSV format
      – Should call a function processing

Could we have done better? (II)
  – A function to process individual items
    • do_item(item)
      – Input is a string
      – Returns a string
        • With double quotes "doubled"
        • Without a newline
        • Within quotes if it contains a comma

The new program (I)
    def do_item(item) :
        item = item.replace('"', '""')
        if item[-1] == '\n' :
            item = item[:-1]
        if ',' in item :
            item = '"' + item + '"'
        return item

The new program (II)
    def do_line(line) :
        itemlist = line.split('\t')
        linestring = ''  # start afresh
        for item in itemlist :
            # each item, including the last one, is followed by a comma
            linestring += do_item(item) + ','
        linestring += '\n'
        return linestring

The new program (III)
    fh = open('grades.txt', 'r')
    linelist = [ ]
    for line in fh :
        linelist.append(do_line(line))
    fh.close()

The new program (IV)
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Why it is better
• Program is decomposed into small modules that are much easier to understand
  – Each fits on a PowerPoint slide
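For comparison, here is a sketch of the same conversion using the standard csv module instead of hand-written quoting. This is a swapped-in technique, not the program from the slides. The function name tabs_to_csv and the sample rows are made up for illustration, and io.StringIO replaces the output file so that the example runs anywhere.

```python
import csv
import io

def tabs_to_csv(tab_text):
    """Convert tab-separated text to CSV text (illustrative helper)."""
    out = io.StringIO()
    # csv.writer quotes fields containing commas and doubles embedded quotes.
    writer = csv.writer(out, lineterminator="\n")
    for line in tab_text.splitlines():      # splitlines() drops the newlines
        writer.writerow(line.split("\t"))
    return out.getvalue()

# Made-up sample rows in the style of grades.txt
sample = ('Alice\t90\t90\t90\n'
          'Doe, Jane\t90\t80\t75\n'
          'Fulano, Eduardo "Lalo"\t90\t90\t90\n')

csv_text = tabs_to_csv(sample)
print(csv_text, end="")
```

The writer puts no comma after the last field, so there is no trailing comma to remove.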

The break statement
• Makes the program exit the loop it is in
• In next example, we are looking for first instance of a string in a file
  – Can exit as soon as it is found

Example (I)
    searchstring = input('Enter search string: ')
    found = False
    fh = open('grades.txt')
    for line in fh :
        if searchstring in line :
            print(line)
            found = True
            break

Example (II)
    if found == True :
        print("String %s was found" % searchstring)
    else :
        print("String %s NOT found " % searchstring)
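The search can also be sketched as a function that needs no keyboard input. The name find_first is hypothetical, and io.StringIO stands in for grades.txt with made-up contents.

```python
import io

def find_first(fh, searchstring):
    """Return the first line of fh containing searchstring, or None."""
    found_line = None
    for line in fh:
        if searchstring in line:
            found_line = line
            break               # no need to read the rest of the file
    return found_line

# Made-up stand-in for the grades file
grades = io.StringIO("Alice\t90\t90\t90\nBob\t85\t85\t85\nCarol\t75\t75\t75\n")

match = find_first(grades, "Bob")
if match is not None:
    print("String %s was found" % "Bob")
else:
    print("String %s NOT found" % "Bob")
```

Returning None when nothing matches plays the role of the found flag.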

Flags
• A variable like found
  – That can either be True or False
  – That is used in a condition for an if or a while
  is often referred to as a flag

A dumb mistake
• Unlike C and its family of languages, Python does not let you write
  – if found = True
  for
  – if found == True
• There are still cases where we can make mistakes!

HANDLING EXCEPTIONS


When a wrong value is entered
• When user is prompted for
  – number = int(input("Enter a number: "))
  and enters
  – a non-numerical string
  a ValueError exception is raised and the program terminates
• Python lets programs catch errors

The try… except pair (I)
    try:
        <statements being tried>
    except Exception as ex:
        <statements catching the exception>
• Observe
  – the colons
  – the indentation

The try… except pair (II)
    try:
        <statements being tried>
    except Exception as ex:
        <statements catching the exception>
• If an exception occurs while the program executes the statements between the try and the except, control is immediately transferred to the statements after the except

A better example
    done = False
    while not done :
        filename = input("Enter a file name: ")
        try :
            fh = open(filename)
            done = True
        except Exception as ex:
            print('File %s does not exist' % filename)
    print(fh.read())
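Catching Exception also hides unrelated errors. A hedged variant catches only FileNotFoundError, the built-in exception raised when the named file does not exist. The helper name read_if_present and the file names are assumptions for illustration, and a temporary directory replaces keyboard input.

```python
import os
import tempfile

def read_if_present(filename):
    """Return the contents of filename, or None if it does not exist."""
    try:
        fh = open(filename)
    except FileNotFoundError:     # only the "no such file" case is caught
        return None
    contents = fh.read()
    fh.close()
    return contents

with tempfile.TemporaryDirectory() as tmpdir:
    present = os.path.join(tmpdir, "grades.txt")
    missing = os.path.join(tmpdir, "no_such_file.txt")
    with open(present, "w") as out:
        out.write("Alice\t90\n")
    results = (read_if_present(present), read_if_present(missing))

print(results)
```

Other problems, such as a permission error, would still stop the program with their own message instead of being reported as a missing file.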

An Example (I)
    done = False
    while not done :
        try :
            number = int(input('Enter a number: '))
            done = True
        except Exception as ex:
            print('You did not enter a number')
    print("You entered %.2f." % number)
    input("Hit enter when done with

A simpler solution
    done = False
    while not done :
        myinput = input('Enter a number: ')
        if myinput.isdigit() :
            number = int(myinput)
            done = True
        else :
            print('You did not enter a number')
    print("You entered %.2f." % number)
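The two approaches do not accept exactly the same inputs. The sketch below compares isdigit() with a try/except around int() on a few sample strings; the helper name to_int_or_none is hypothetical.

```python
def to_int_or_none(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

samples = ["42", "-5", " 7 ", "3.5", "abc", ""]
for s in samples:
    # Columns: the input, what isdigit() says, what int() accepts
    print(repr(s), s.isdigit(), to_int_or_none(s))
```

isdigit() rejects "-5" and " 7 ", which int() would convert, so the simpler solution turns away negative numbers and inputs padded with spaces.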

PICKLED FILES


Pickled files
• import pickle
  – Provides a way to save complex data structures in a file
  – Sometimes said to provide a serialized representation of Python objects

Basic primitives (I)
• dump(object, fh)
  – appends a sequential representation of object into file with file handle fh
  – object is virtually any Python object
  – fh is the handle of a file that must have been opened in 'wb' mode
    • b is a special option that allows writing or reading binary data

Basic primitives (II)
• target = load(filehandle)
  – assigns to target the next pickled object stored in filehandle
  – target is virtually any Python object
  – filehandle is the handle of a file that was opened in 'rb' mode

Example (I)
    >>> mylist = [2, 'Apples', 5, 'Oranges']
    >>> mylist
    [2, 'Apples', 5, 'Oranges']
    >>> fh = open('testfile', 'wb')  # b is for BINARY
    >>> import pickle
    >>> pickle.dump(mylist, fh)
    >>> fh.close()

Example (II)
    >>> fhh = open('testfile', 'rb')  # b is for BINARY
    >>> theirlist = pickle.load(fhh)
    >>> theirlist
    [2, 'Apples', 5, 'Oranges']
    >>> theirlist == mylist
    True

What was stored in testfile?
• Some binary data containing the strings 'Apples' and 'Oranges'

Using ASCII format
• Can require a pickled representation of objects that only contains printable characters
  – Must specify protocol = 0
• Advantage:
  – Easier to debug
• Disadvantage:
  – Takes more space
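The space trade-off can be sketched with pickle.dumps(), which returns the pickled bytes instead of writing them to a file. The list of 1000 integers and the choice of protocol 2 for the binary side are assumptions made for this comparison; exact sizes depend on the data and the Python version.

```python
import pickle

data = list(range(1000))      # made-up test data

ascii_bytes = pickle.dumps(data, protocol=0)    # printable representation
binary_bytes = pickle.dumps(data, protocol=2)   # a binary protocol

print(len(ascii_bytes), len(binary_bytes))   # the ASCII form is the larger one
print(ascii_bytes[:20])                      # readable in a text editor

# Both representations restore the same list.
same = pickle.loads(ascii_bytes) == pickle.loads(binary_bytes) == data
print(same)
```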

Example
    import pickle
    mydict = {'Alice': 22, 'Bob' : 27}
    fh = open('asciifile.txt', 'wb')  # MUST be 'wb'
    pickle.dump(mydict, fh, protocol = 0)
    fh.close()
    fhh = open('asciifile.txt', 'rb')
    theirdict = pickle.load(fhh)
    print(mydict)
    print(theirdict)

The output • {'Bob': 27, 'Alice': 22}


What is inside asciifile.txt?
• (dp 0 VBobp 1 L 27 Ls. VAlicep 2 L 22 Ls.

Dumping multiple objects (I)
    import pickle
    fh = open('asciifile.txt', 'wb')
    for k in range(3, 6) :
        mylist = [i for i in range(1, k)]
        print(mylist)
        pickle.dump(mylist, fh, protocol = 0)
    fh.close()

Dumping multiple objects (II)
    fhh = open('asciifile.txt', 'rb')
    lists = [ ]  # initializing list of lists
    while 1 :  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
    fhh.close()
    print(lists)

Dumping multiple objects (III)
• Note the way we test for end-of-file (EOF)
    while 1 :  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
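The end-of-file test can be wrapped in a small generator so that callers simply loop over the stored objects. This is a sketch: the name load_all and the file name lists.pkl are hypothetical, and a temporary directory keeps the example self-contained.

```python
import os
import pickle
import tempfile

def load_all(fh):
    """Yield every pickled object stored in the binary file fh."""
    while True:
        try:
            yield pickle.load(fh)
        except EOFError:        # raised when no pickled object is left
            return

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lists.pkl")   # hypothetical file name

    with open(path, "wb") as fh:
        for k in range(3, 6):
            pickle.dump(list(range(1, k)), fh)

    with open(path, "rb") as fhh:
        lists = list(load_all(fhh))

print(lists)
```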

The output
    [1, 2]
    [1, 2, 3]
    [1, 2, 3, 4]
    [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

What is inside asciifile.txt?
• (lp 0 L 1 La. L 2 La. L 3 La. (lp 0 L 1 La. L 2 La. L 3 La. L 4 La.

Practical considerations
• You rarely pick the format of your input files
  – May have to do format conversion
• You often have to use specific formats for your output files
  – Often dictated by program that will use them
• Otherwise stick with pickled files!