COSC 1306
COMPUTER SCIENCE AND PROGRAMMING
Jehan-François Pâris
jfparis@uh.edu
Fall 2016
THE ONLINE BOOK CHAPTER XI FILES
Chapter Overview
- We will learn how to read, create and modify files
  - Essential if we want to store our program inputs and results
- Pay special attention to pickled files
  - They are very easy to use!
Accessing file contents
- Two-step process:
  - First we open the file
  - Then we access its contents
    - read
    - write
- When we are done, we close the file
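A minimal sketch of that open / access / close cycle in Python. The scratch file name demo.txt is hypothetical, and it is created in a fresh temporary folder so the sketch does not touch your own files. The last part uses Python's with statement, which closes the file automatically when its block ends.

```python
import os
import tempfile

# hypothetical scratch file in a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fh = open(path, "w")          # 1. open (write permission requested)
fh.write("first line\n")      # 2. access the contents
fh.write("second line\n")
fh.close()                    # 3. close

fh = open(path, "r")          # same cycle, reading this time
contents = fh.read()
fh.close()
print(contents, end="")

# the with statement closes the file automatically at the end of the block
with open(path, "r") as fh:
    contents_again = fh.read()
print(fh.closed)              # True: the file is already closed
```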
What happens at open() time?
- The system verifies
  - That you are an authorized user
  - That you have the right permission
    - Read permission
    - Write permission
    - Execute permission exists but does not apply here
- If both checks succeed, it returns a file handle (file descriptor)
The file handle
- Gives the user
  - Fast, direct access to the file
    - No folder lookups
  - Authority to execute the file operations whose permissions have been requested
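When one of these checks fails, Python's open() does not return a handle; it raises an exception instead, typically FileNotFoundError when the file does not exist or PermissionError when the permission is missing (both are subclasses of OSError). The sketch below catches both; the helper name try_open() is hypothetical.

```python
import os
import tempfile

def try_open(name):
    """Hypothetical helper: return a read handle for name, or None if open() fails."""
    try:
        return open(name, "r")
    except FileNotFoundError:
        print("No such file:", name)
    except PermissionError:
        print("Permission denied:", name)
    return None

# a name inside an empty temporary folder, so the file cannot exist
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
fh = try_open(missing)        # prints "No such file: ..." and returns None
```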
Python open()
- open(name, mode='r', buffering=-1)
  where
  - name is the name of the file
  - mode is the permission requested
    - Default is 'r' for read-only
  - buffering specifies the buffer size
    - The default value -1 uses the system default
The modes
- Can request
  - 'r' for read-only
  - 'w' for write-only
    - Always overwrites the file, erasing its previous contents
  - 'a' for append
    - Writes at the end
  - 'r+' or 'a+' for updating (read + write/append)
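The difference between 'w' and 'a' is easy to check. The sketch below, which uses a hypothetical scratch file log.txt in a temporary folder, writes twice in 'w' mode and once in 'a' mode; each with block closes its file automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")   # hypothetical scratch file

with open(path, "w") as fh:    # 'w' creates the file
    fh.write("one\n")
with open(path, "w") as fh:    # 'w' again: "one" is erased
    fh.write("two\n")
with open(path, "a") as fh:    # 'a' writes after the existing contents
    fh.write("three\n")

with open(path, "r") as fh:
    result = fh.read()
print(result, end="")          # two, then three
```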
Examples
    f1 = open("myfile.txt")          # same as f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")
The file system
- Provides long-term storage of information
- Will store data in stable storage (disk)
- Cannot be RAM because:
  - Dynamic RAM loses its contents when powered off
  - Static RAM is too expensive
  - System crashes can corrupt the contents of main memory
Overall organization
- Data managed by the file system are grouped in user-defined data sets called files
- The file system must provide a mechanism for naming these data
  - Each file system has its own set of conventions
  - All modern operating systems use a hierarchical directory structure
Windows solution
- Each device and each disk partition is identified by a letter
  - A: and B: were used by the floppy drives
  - C: is the first disk partition of the hard drive
  - If the hard drive has no other disk partition, D: denotes the DVD drive
- Each device and each disk partition has its own hierarchy of folders
Windows solution
[Figure: each drive letter has its own folder tree: C: (with Users, Windows and Program Files), plus a second disk and a flash drive (D:, F:)]
Linux organization
- Inherited from Unix
- Each device and disk partition has its own directory tree
  - Disk partitions are glued together through the mount operation to form a single tree
    - A typical user does not know where her files are stored
- Uses "/" as a separator
UNIX/LINUX organization
[Figure: the root partition (/) holds directories such as bin and usr; "the magic mount" attaches a second partition at usr, so that partition can be accessed as /usr]
Mac OS organization
- Similar to Windows
  - Disk partitions are not merged
  - Represented by separate icons on the desktop
Accessing a file (I)
- Your Python programs are stored in a folder, AKA directory
  - On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python
- All files in that folder can be accessed directly through their names
  - "myfile.txt"
[Figure: the path to x.txt, from the root through Users, J.-F. Paris, Documents, Courses, 1306 and Python]
Accessing a file (II)
- Files in folders inside that folder (subfolders) can be accessed by specifying the subfolder first
  - Windows style:
    - "test\\sample.txt"
  - Note the double backslash
  - Linux/Unix/Mac OS X style:
    - "test/sample.txt"
  - Generally works for Windows too
Why the double backslash?
- The backslash is an escape character in Python
  - It combines with its successor to represent non-printable characters
    - '\n' represents a newline
    - '\t' represents a tab
  - Must use '\\' to represent a plain backslash
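A quick check of these rules: each escape sequence is a single character, and a raw string (prefix r) keeps backslashes exactly as typed, so it is an alternative to doubling them. Forgetting both can fail in surprising ways: in "test\new.txt" the \n becomes a newline, and "C:\Users" is a syntax error in Python 3 because \U starts a Unicode escape.

```python
windows_path = "test\\sample.txt"   # \\ stands for ONE backslash
raw_path = r"test\sample.txt"       # raw string: backslashes kept as typed

print(windows_path == raw_path)     # True
print(len(windows_path))            # 15 characters, only one backslash
print(len("\n"), len("\t"), len("\\"))   # 1 1 1: each escape is one character
```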
Accessing a file (III)
- For other files, must use the full pathname
  - Windows style:
    - "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
  - Linux and Mac:
    - "/Users/Jehan-Francois Paris/Documents/Courses/1306/Python/myfile.txt"
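Python's standard pathlib module can assemble such pathnames from their parts and insert the right separator. This sketch uses the PureWindowsPath and PurePosixPath classes only so that both styles can be shown on any computer; a real program would normally use pathlib.Path, which follows the conventions of the system it runs on.

```python
from pathlib import PurePosixPath, PureWindowsPath

# folder names taken from the slide; the Pure* classes only manipulate names,
# they never touch the disk
parts = ("Users", "Jehan-Francois Paris", "Documents",
         "Courses", "1306", "Python", "myfile.txt")

win = PureWindowsPath("C:\\", *parts)
unix = PurePosixPath("/", *parts)
relative = PureWindowsPath("test", "sample.txt")

print(win)        # backslash-separated Windows pathname
print(unix)       # slash-separated Linux / Mac pathname
print(relative)   # test\sample.txt
```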
Reading a file
- Four ways:
  - Line by line
  - Global reads
  - Within a while loop
    - Also works with other languages
  - Pickled files
Line-by-line reads
    for line in fh :  # special for loop
        pass          # anything you want
    fh.close()  # optional
Example
    f3 = open("test/sample.txt", "r")
    for line in f3 :
        print(line)
    f3.close()  # optional
Output
    To be or not to be that is the question

    Now is the winter of our discontent
- With one or more extra blank lines
Why?
- Each line ends with a newline
- print(…) adds an extra newline
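The stray newlines become visible with repr(). The sketch below recreates a two-line sample.txt in a temporary folder (contents assumed from the output above; the last line is given no trailing newline, as the next slides point out) and prints the repr() of each line.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fh:
    fh.write("To be or not to be that is the question\n")
    fh.write("Now is the winter of our discontent")   # no final newline

with open(path, "r") as fh:
    lines = list(fh)            # iterating over fh yields one line at a time

for line in lines:
    print(repr(line))           # repr() makes the '\n' visible
```

Another standard way to avoid the blank lines is print(line, end=''), since end='' stops print() from adding its own newline.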
Trying to remove blank lines
    print('-----')
    f5 = open("test/sample.txt", "r")
    for line in f5 :
        print(line[:-1])  # remove last char
    f5.close()  # optional
    print('------')
The output
    -----
    To be or not to be that is the question
    Now is the winter of our disconten
    ------
- The last line did not end with a newline!
A smarter solution (I)
- Only remove the last character if it is a newline
    if line[-1] == '\n' :
        print(line[:-1])
    else :
        print(line)
A smarter solution (II)
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh :
        if line[-1] == '\n' :
            print(line[:-1])  # remove last char
        else :
            print(line)
    print('------')
    fh.close()  # optional
It works!
    -------
    To be or not to be that is the question
    Now is the winter of our discontent
    ------
We can do better
- Use the rstrip() Python method
  - astring.rstrip() removes all trailing whitespace (spaces, tabs, newlines) from astring
  - astring.rstrip('\n') removes only the trailing newlines from astring
Examples
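A few interpreter-style examples of rstrip(), with repr() used so that the remaining whitespace is visible. The sample strings are made up for illustration; strip() is the related method that trims both ends.

```python
s = "Now is the winter  \t\n"     # trailing spaces, a tab and a newline

print(repr(s.rstrip()))           # 'Now is the winter'
print(repr(s.rstrip("\n")))       # 'Now is the winter  \t'
print(repr("abc\n\n".rstrip("\n")))   # 'abc': every trailing newline goes
print(repr("  padded  ".strip()))     # 'padded': strip() trims both ends
```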
The simplest solution
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh :
        print(line.rstrip('\n'))
    print('------')
    fh.close()  # optional
- This will remove all trailing newlines, even the ones we should keep
Global reads
- fh.read()
  - Returns the whole contents of the file specified by file handle fh
  - File contents are stored in a single string that might be very large
Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # optional
Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
- Exact contents of file test\sample.txt followed by an extra return
fh.read() and fh.read(n)
- fh.read() reads in the whole fh file and returns its contents as a single string
- fh.read(n) reads the next n characters of file fh (the next n bytes if the file was opened in binary mode)
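A small sketch of reading a file in fixed-size pieces with fh.read(n). It relies on read() returning an empty string once the end of the file is reached, which is also what ends the loop; the file name and its one-line contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fh:
    fh.write("To be or not to be")

chunks = []
with open(path, "r") as fh:
    chunk = fh.read(5)          # at most 5 characters per call
    while chunk:                # read() returns '' at end of file
        chunks.append(chunk)
        chunk = fh.read(5)
print(chunks)
```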
Reading within a loop
- Standard method for C/C++
    infile = open("test/sample.txt", "r")
    line = infile.readline()  # priming read
    while line :              # false if empty
        print(line.rstrip("\n"))
        line = infile.readline()
    infile.close()
Making sense of file contents
- Most files contain more than one data item per line
  - COSC 713-743-3350
    UHPD 713-743-3333
- Must split lines
  - mystring.split(sepchar) where sepchar is a separation character
    - returns a list of items
Splitting strings
    >>> txt = "Four score and seven years ago"
    >>> txt.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1, 'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', " 'Baker", " Andy'", ' 83', ' 89', ' 85']
- Not what we wanted!
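One way out, sketched with Python's standard csv module instead of split(): csv.reader can be told that single quotes group a field (quotechar="'") and that the blank after each comma should be skipped (skipinitialspace=True). This is a swapped-in technique, not the one used in these slides.

```python
import csv

record = "1, 'Baker, Andy', 83, 89, 85"
# quotechar="'": single quotes group 'Baker, Andy' into one field
# skipinitialspace=True: ignore the blank after each comma
fields = next(csv.reader([record], quotechar="'", skipinitialspace=True))
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```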
Example
    # how2split.py
    print('-----')
    fh = open("test/sample.txt", "r")
    for line in fh :
        words = line.split()
        for xxx in words :
            print(xxx)
    fh.close()  # optional
    print('-----')
Output
    -----
    To
    be
    …
    of
    our
    discontent
    -----
- Spurious newlines are gone
Standard way to access a file
    # preprocessing
    # set up counters, strings and lists
    sepchar = ' '   # often a space
    fh = open("input.txt", "r")
    for line in fh :
        words = line.split(sepchar)
        for xxx in words :
            pass    # do something
    fh.close()  # optional
    # postprocessing
    # print results
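A concrete instance of this template, counting words in a small made-up input.txt created in a temporary folder. It calls split() with no argument, which splits on any run of whitespace and drops the trailing newline.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as fh:
    fh.write("To be or not to be\nthat is the question\n")

# preprocessing: set up counters
nwords = 0
counts = {}
fh = open(path, "r")
for line in fh:
    words = line.split()            # no argument: split on any whitespace
    for xxx in words:
        nwords += 1
        counts[xxx] = counts.get(xxx, 0) + 1
fh.close()
# postprocessing: print results
print(nwords, "words;", "'be' appears", counts["be"], "times")
```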
Example
- List of expenditures with dates:
    Rent 11/2/16 $850
    Latte 11/2/16 $4.50
    Food 11/2/16 $35.47
    Latte 11/3/16 $4.50
    Outing 11/4/16 $27.00
- Want to know how much money was spent on latte
First attempt
- Read line by line
- Will split all lines such as
  - "Food 11/2/16 $35.47" into
  - ["Food", "11/2/16", "$35.47"]
- Will use the first and last entries of each line list
First attempt
    total = 0  # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh :
        words = line.split(" ")
        if words[0] == 'Latte' :
            total += words[2]  # increment
    fh.close()  # optional
    print("you spent %.2f on latte" % total)
- It does not work! words[2] is a string such as '$4.50\n', and Python cannot add a string to a number
Second attempt
- Must first remove the offending '$'
- Must also convert the string to a float
    def price2float(s) :
        """ remove leading dollar sign"""
        if s[0] == "$" :
            return float(s[1:])
        else :
            return float(s)
Second attempt
    total = 0  # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh :
        words = line.split(" ")
        if words[0] == 'Latte' :
            total += price2float(words[2])
    fh.close()  # optional
    print("You spent $%.2f on latte" % total)
Output:
    You spent $13.50 on latte
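A self-contained version of the second attempt, run on only the five sample lines listed in the Example slide (written to a temporary expenses.txt). One small variation from the slide: split() with no argument is used, which also copes with repeated spaces and drops the trailing newline.

```python
import os
import tempfile

def price2float(s):
    """Remove a leading dollar sign, then convert to float."""
    if s[0] == "$":
        return float(s[1:])
    else:
        return float(s)

path = os.path.join(tempfile.mkdtemp(), "expenses.txt")
with open(path, "w") as fh:
    fh.write("Rent 11/2/16 $850\n"
             "Latte 11/2/16 $4.50\n"
             "Food 11/2/16 $35.47\n"
             "Latte 11/3/16 $4.50\n"
             "Outing 11/4/16 $27.00\n")

total = 0
with open(path, "r") as fh:
    for line in fh:
        words = line.split()      # split() also drops the trailing '\n'
        if words[0] == "Latte":
            total += price2float(words[2])
print("You spent $%.2f on latte" % total)
```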
Picking the right separator (I)
- Commas
  - CSV Excel format
    - Values are separated by commas
    - Strings are stored without quotes
      - Unless they contain a comma
        - "Doe, Jane",freshman,90
      - Quotes within strings are doubled
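Python's standard csv module reads these rules back: the quotes around a field containing a comma are removed, and doubled quotes inside a field become single ones. The two CSV lines below are hypothetical samples written to follow the rules above.

```python
import csv

# hypothetical CSV lines following the rules above
rows = ['"Doe, Jane",freshman,90',
        '"Aguirre, ""Lalo"" Eduardo",sophomore,85']

parsed = list(csv.reader(rows))
for fields in parsed:
    print(fields)
```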
Picking the right separator (II)
- Tabs ('\t')
  - Advantages:
    - Your fields will appear nicely aligned
    - Spaces, commas, … are not an issue
  - Disadvantage:
    - You do not see them
      - They look like spaces
Why it is important
- When you must pick your file format, you should decide how the data inside the file will be used:
  - Will people read them?
  - Will other programs use them?
  - Or will they be used by both people and machines?
An exercise
- Converting tab-separated data to CSV format
  - Replacing tabs by commas
    - Easy
      - Will use the string replace() method
Possible input lines (fields separated by tabs)
    Alice                   85  92  95
    Doe, Jane               87  88  90
    Doe, John               82  91  77
    Kingsman, Edward "Ted"  75  87  89
First attempt
    fh_in = open('grades.txt', 'r')
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')
The output
- The tab-separated lines
    Alice  90  90  90
    Bob    85  85  85
    Carol  75  75  75
- become
    Alice,90,90,90
    Bob,85,85,85
    Carol,75,75,75
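The simple replace() works here only because none of these names contains a comma. With an input line such as the "Doe, Jane" line listed earlier, the comma inside the name can no longer be told apart from the separators:

```python
tab_line = 'Doe, Jane\t87\t88\t90'       # four tab-separated fields
naive = tab_line.replace('\t', ',')

print(naive)                  # Doe, Jane,87,88,90
print(naive.split(','))       # five pieces: the name has been cut in two
print(tab_line.split('\t'))   # four pieces, as intended
```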
Dealing with commas (I)
- Work line by line
- For each line
  - split input into fields using TAB as separator
  - store fields into a list
    - Alice 90 90 90 becomes ['Alice', '90', '90', '90']
Dealing with commas (II)
  - Put within double quotes any entry containing one or more commas
  - Output list entries separated by commas
    - ['"Baker, Ann"', '90', '90', '90'] will later become "Baker, Ann",90,90,90
Dealing with commas (III)
- Our troubles are not over:
  - Must store all lines somewhere until we are done
  - Store them in a list
Dealing with double quotes
- Before wrapping items that contain commas in double quotes, replace
  - All double quotes by pairs of double quotes
    - 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'
Order matters (I)
- We must double the inside double quotes before wrapping the string in double quotes:
  - From 'Aguirre, "Lalo" Eduardo' go to 'Aguirre, ""Lalo"" Eduardo' then to '"Aguirre, ""Lalo"" Eduardo"'
Order matters (II)
- Otherwise:
  - We go from 'Aguirre, "Lalo" Eduardo' to '"Aguirre, "Lalo" Eduardo"' then to '""Aguirre, ""Lalo"" Eduardo""' with all double quotes doubled
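Both orders side by side, using the same name as the slides:

```python
name = 'Aguirre, "Lalo" Eduardo'

right = '"' + name.replace('"', '""') + '"'     # double first, then wrap
wrong = ('"' + name + '"').replace('"', '""')   # wrap first, then double

print(right)   # "Aguirre, ""Lalo"" Eduardo"
print(wrong)   # ""Aguirre, ""Lalo"" Eduardo""
```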
General organization (I)
    linelist = []  # new file contents
    for line in file :
        itemlist = line.split(…)
        linestring = ''  # start with empty line
        for item in itemlist :
            remove any trailing newline
            double all double quotes
            if item contains comma, wrap
            add to linestring
        append linestring to linelist
General organization (II)
    for line in file :
        …
        remove last comma of linestring
        add newline at end of linestring
        append linestring to linelist
    for linestring in linelist :
        write linestring into output file
The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')  # input file
    linelist = [ ]  # global data structure
    for line in fh :  # process an input line
        itemlist = line.split('\t')
        # print(str(itemlist))  # for debugging
        linestring = ''  # start afresh
The program (II)
        for item in itemlist :  # process item
            # double all double quotes
            item = item.replace('"', '""')
            if item.endswith('\n') :  # remove it
                item = item[:-1]
            if ',' in item :  # wrap item
                linestring += '"' + item + '",'
            else :  # just append
                linestring += item + ','
        # end of item loop
The program (III)
        # replace last comma by newline
        linestring = linestring[:-1] + '\n'
        linelist.append(linestring)
    # end of line loop
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()
Notes
- Most print statements used for debugging were removed
  - Space considerations
- Observe that the inner loop adds a comma after each item
  - We want to remove the last one
- Must also add a newline at the end of each line
The input file Alice 90 90 90 Bob 85 85 Carol 75 75 75 Doe, Jane 90 90 90 Fulano, Eduardo "Lalo" 90 85 75 80 90 90 75 70 90 90 90
The output file n Alice, 90, 90, 90 Bob, 85, 85, 85 Carol , 75, 75, 75 "Doe, Jane", 90, 90 , 80 , 75 "Fulano, Eduardo ""Lalo""", 90, 90
Mistakes being made (I)
- Mixing lists and strings:
  - An earlier draft of the program declared
    - linestring = [ ] and did
    - linestring.append(item)
  - Outcome was
    - ['Alice,', '90,', … ] instead of
    - 'Alice,90,…'
Mistakes being made (II)
- Forgetting to add a newline
  - Output was a single line
- Doing the append inside the inner loop:
  - Output was
    - Alice,90,90 …
Mistakes being made (III)
- Forgetting that strings are immutable:
  - Trying to do
    - linestring[-1] = '\n' instead of
    - linestring = linestring[:-1] + '\n'
  - Bigger issue:
    - Do we have to remove the last comma?
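Both points can be checked directly. Assigning to one character of a string raises TypeError, while slicing builds a new string. As for the last comma, one alternative (not the one used in these slides) is the string method join(), which inserts the separator only between items, so there is no trailing comma to remove.

```python
linestring = 'Alice,90,90,90,'     # what the inner loop builds

try:
    linestring[-1] = '\n'          # strings are immutable
    changed_in_place = True
except TypeError as err:
    changed_in_place = False
    print("TypeError:", err)

linestring = linestring[:-1] + '\n'   # build a NEW string instead

# join() puts the separator only BETWEEN items: no last comma to remove
items = ['Alice', '90', '90', '90']
joined = ','.join(items) + '\n'
print(linestring == joined)        # True
```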
Could we have done better? (I)
- Make the program more readable by decomposing it into functions
  - A function to process each line of input
    - do_line(line)
      - Input is a string ending with a newline
      - Output is a string in CSV format
      - Should call a function processing individual items
Could we have done better? (II)
  - A function to process individual items
    - do_item(item)
      - Input is a string
      - Returns a string
        - With double quotes "doubled"
        - Without a newline
        - Within quotes if it contains a comma
The new program (I)
    def do_item(item) :
        item = item.replace('"', '""')
        if item.endswith('\n') :
            item = item[:-1]
        if ',' in item :
            item = '"' + item + '"'
        return item
The new program (II)
    def do_line(line) :
        itemlist = line.split('\t')
        linestring = ''  # start afresh
        for item in itemlist :
            linestring += do_item(item) + ','
        if linestring != '' and linestring[-1] == ',' :
            linestring = linestring[:-1]
        linestring += '\n'
        return linestring
The new program (III)
    fh = open('grades.txt', 'r')
    linelist = [ ]
    for line in fh :
        linelist.append(do_line(line))
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()
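Because do_item() and do_line() work on strings, they can be tried without any files. The sketch below repeats the two functions (using item.endswith('\n') so that an empty field does not cause an IndexError) and feeds them three of the possible input lines shown earlier, with their tabs written as '\t'.

```python
def do_item(item):
    item = item.replace('"', '""')     # double all double quotes
    if item.endswith('\n'):            # remove the trailing newline
        item = item[:-1]
    if ',' in item:                    # wrap items containing a comma
        item = '"' + item + '"'
    return item

def do_line(line):
    linestring = ''
    for item in line.split('\t'):
        linestring += do_item(item) + ','
    if linestring != '' and linestring[-1] == ',':
        linestring = linestring[:-1]   # drop the last comma
    return linestring + '\n'

samples = ['Alice\t85\t92\t95\n',
           'Doe, Jane\t87\t88\t90\n',
           'Kingsman, Edward "Ted"\t75\t87\t89\n']
for line in samples:
    print(do_line(line), end='')
```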
Why it is better
- The program is decomposed into small modules that are much easier to understand
  - Each fits on a PowerPoint slide
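For comparison, Python's standard csv module already implements these quoting rules. The sketch below is a swapped-in alternative to the hand-written do_item()/do_line(), not the method of these slides; tsv_to_csv() is a hypothetical name. One visible difference: the csv writer also wraps fields that contain a double quote but no comma, which keeps the output valid CSV.

```python
import csv
import io

def tsv_to_csv(text):
    """Convert tab-separated text to CSV text using the csv module."""
    out = io.StringIO()
    # QUOTE_NONE on the reader: treat every '"' in the input as ordinary text
    reader = csv.reader(io.StringIO(text), delimiter='\t', quoting=csv.QUOTE_NONE)
    # lineterminator='\n' to match the slides (the default is '\r\n')
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        writer.writerow(row)   # quotes and doubles only where needed
    return out.getvalue()

grades = 'Alice\t85\t92\t95\nKingsman, Edward "Ted"\t75\t87\t89\n'
print(tsv_to_csv(grades), end='')
```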
The break statement
- Makes the program exit the loop it is in
- In the next example, we are looking for the first instance of a string in a file
  - Can exit as soon as it is found
Example (I)
    searchStr = input('Enter search string: ')
    found = False
    fh = open('grades.txt')
    for line in fh :
        if searchStr in line :
            print(line)
            found = True
            break
Example (II)
    if found == True :
        print("String %s was found" % searchStr)
    else :
        print("String %s NOT found " % searchStr)
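Python also has a for ... else construct: the else branch runs only when the loop finishes without executing break, so it can replace the found flag. A sketch with a hypothetical helper find_first() and a list of strings standing in for the lines of grades.txt (an open file handle would work the same way, since iterating over it yields lines):

```python
def find_first(lines, search_str):
    """Return the first line containing search_str, or None."""
    for line in lines:
        if search_str in line:
            break            # stop at the first match
    else:
        return None          # the loop ended without a break
    return line

grades = ["Alice\t90\t90\t90\n", "Bob\t85\t85\t85\n", "Carol\t75\t75\t75\n"]
print(find_first(grades, "Bob"), end="")
print(find_first(grades, "Zoe"))
```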
Flags
- A variable like found
  - That can be either True or False
  - That is used in a condition for an if or a while
- is often referred to as a flag
PICKLED FILES (NOT ON THE QUIZ)
Pickled files
- import pickle
  - Provides a way to save complex data structures in a file
  - Sometimes said to provide a serialized representation of Python objects
Basic primitives (I)
- pickle.dump(object, fh)
  - appends a sequential representation of object to the file with file handle fh
  - object is virtually any Python object
  - fh is the handle of a file that must have been opened in 'wb' mode
    - b is a special option that allows us to write or read binary data
Basic primitives (II)
- target = pickle.load(filehandle)
  - assigns to target the next pickled object stored in filehandle
  - target is virtually any Python object
  - filehandle is the handle of a file that was opened in 'rb' mode
Example (I)
    >>> mylist = [2, 'Apples', 5, 'Oranges']
    >>> mylist
    [2, 'Apples', 5, 'Oranges']
    >>> fh = open('afile', 'wb')  # b = BINARY
    >>> import pickle
    >>> pickle.dump(mylist, fh)
    >>> fh.close()
Example (II)
    >>> fhh = open('afile', 'rb')  # b = BINARY
    >>> theirlist = pickle.load(fhh)
    >>> theirlist
    [2, 'Apples', 5, 'Oranges']
    >>> theirlist == mylist
    True
What was stored in afile?
- Some binary data containing the strings 'Apples' and 'Oranges'
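The same bytes can be inspected without a file: pickle.dumps() returns what pickle.dump() would write, and pickle.loads() rebuilds the object from those bytes.

```python
import pickle

mylist = [2, 'Apples', 5, 'Oranges']
data = pickle.dumps(mylist)     # same bytes dump() would write to a file

print(type(data))               # <class 'bytes'>
print(b'Apples' in data, b'Oranges' in data)   # the strings are visible
print(pickle.loads(data) == mylist)            # and the list comes back
```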
Using ASCII format
- Can request a pickled representation of objects that only contains printable characters
  - Must specify protocol = 0
- Advantage:
  - Easier to debug
- Disadvantage:
  - Takes more space
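The trade-off can be measured with pickle.dumps(), which returns the pickled bytes instead of writing them to a file. The sketch pickles a list of 1,000 integers (an arbitrary choice) with protocol 0 and with the default binary protocol; for very small objects the size difference can be negligible.

```python
import pickle

numbers = list(range(1000))
text_pickle = pickle.dumps(numbers, protocol=0)   # ASCII-only representation
binary_pickle = pickle.dumps(numbers)             # default binary protocol

print(text_pickle.isascii(), binary_pickle.isascii())
print(len(text_pickle), len(binary_pickle))       # protocol 0 is larger here
print(pickle.loads(text_pickle) == numbers)       # both load back the same way
```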
Example
    import pickle
    mydict = {'Alice': 22, 'Bob' : 27}
    fh = open('asciifile.txt', 'wb')
    pickle.dump(mydict, fh, protocol = 0)
    fh.close()
    fhh = open('asciifile.txt', 'rb')
    theirdict = pickle.load(fhh)
    print(mydict)
    print(theirdict)
The output
    {'Bob': 27, 'Alice': 22}
What is inside asciifile. txt? n (dp 0 VBobp 1 L 27 Ls. VAlicep 2 L 22 Ls.
Dumping multiple objects (I)
    import pickle
    fh = open('asciifile.txt', 'wb')
    for k in range(3, 6) :
        mylist = [i for i in range(1, k)]
        print(mylist)
        pickle.dump(mylist, fh, protocol = 0)
    fh.close()
Dumping multiple objects (II)
    fhh = open('asciifile.txt', 'rb')
    lists = [ ]  # initializing list of lists
    while 1 :  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
    fhh.close()
    print(lists)
Dumping multiple objects (III)
- Note the way we test for end-of-file (EOF)
    while 1 :  # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
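The same end-of-file test can be packaged in a reusable function. load_all() is a hypothetical helper name; the sketch writes to a scratch file in a temporary folder, dumps the lists produced by the same range(3, 6) loop as the program above, and reads them back in order.

```python
import os
import pickle
import tempfile

def load_all(fh):
    """Return every object pickled into fh, in the order they were dumped."""
    objects = []
    while True:
        try:
            objects.append(pickle.load(fh))
        except EOFError:          # raised once the file is exhausted
            return objects

path = os.path.join(tempfile.mkdtemp(), "lists.pickle")
with open(path, "wb") as fh:
    for k in range(3, 6):
        pickle.dump(list(range(1, k)), fh, protocol=0)

with open(path, "rb") as fh:
    lists = load_all(fh)
print(lists)
```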
The output n [1, 2] [1, 2, 3, 4] [[1, 2], [1, 2, 3, 4]]
What is inside asciifile. txt? n (lp 0 L 1 La. L 2 La. L 3 La. (lp 0 L 1 La. L 2 L a. L 3 La. L 4 La.
Practical considerations
- You rarely pick the format of your input files
  - May have to do format conversion
- You often have to use specific formats for your output files
  - Often dictated by the program that will use them
- Otherwise stick with pickled files!