COSC 1306 COMPUTER SCIENCE AND PROGRAMMING Jehan-François Pâris jfparis@uh.edu Fall 2016

THE ONLINE BOOK CHAPTER XI FILES

Chapter Overview
- We will learn how to read, create and modify files
  - Essential if we want to store our program inputs and results
  - Pay special attention to pickled files
    - They are very easy to use!

Accessing file contents
- Two-step process:
  - First we open the file
  - Then we access its contents
    - read
    - write
- When we are done, we close the file
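The open, access, close sequence can also be written with Python's with statement, which closes the file automatically when the indented block ends. This is a minimal sketch, not part of the original slides; the file name notes.txt and the temporary folder are made up for the illustration.

```python
import os
import tempfile

# Work in a throwaway folder so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")    # hypothetical file name

    # Step 1: open.  Step 2: access (here, write).  Step 3: close (automatic).
    with open(path, "w") as fh:
        fh.write("hello\n")
    closed_after_write = fh.closed              # True: the with block closed the file

    with open(path, "r") as fh:
        text = fh.read()

print(closed_after_write)
print(repr(text))
```

The slides call close() optional; closing explicitly, or letting with do it, makes sure everything written actually reaches the file before the program moves on.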

What happens at open() time?
- The system verifies
  - That you are an authorized user
  - That you have the right permission
    - Read permission
    - Write permission
    - Execute permission exists but does not apply here
- It then returns a file handle (file descriptor)

The file handle
- Gives the user
  - Fast direct access to the file
    - No folder lookups
  - Authority to execute the file operations whose permissions have been requested

Python open()
- open(name, mode = 'r', buffering = -1) where
  - name is the name of the file
  - mode is the permission requested
    - Default is 'r' for read-only
  - buffering specifies the buffer size
    - Use the system default value (code -1)

The modes
- Can request
  - 'r' for read-only
  - 'w' for write-only
    - Always overwrites the file
  - 'a' for append
    - Writes at the end
  - 'r+' or 'a+' for updating (read + write/append)
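The difference between 'w' and 'a' is easiest to see by running both on the same file. The sketch below assumes a made-up file name (demo.txt) inside a temporary folder; only the mode strings come from the slide.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")    # hypothetical file name

    fh = open(path, "w")        # 'w' creates the file (or overwrites it)
    fh.write("first line\n")
    fh.close()

    fh = open(path, "w")        # 'w' again: the previous contents are lost
    fh.write("fresh start\n")
    fh.close()

    fh = open(path, "a")        # 'a' keeps the contents and writes at the end
    fh.write("appended\n")
    fh.close()

    fh = open(path)             # no mode given: same as 'r'
    contents = fh.read()
    fh.close()

print(contents)
```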

Examples
    f1 = open("myfile.txt")    # same as f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

The file system
- Provides long-term storage of information
- Will store data in stable storage (disk)
- Cannot be RAM because:
  - Dynamic RAM loses its contents when powered off
  - Static RAM is too expensive
  - System crashes can corrupt the contents of main memory

Overall organization
- Data managed by the file system are grouped in user-defined data sets called files
- The file system must provide a mechanism for naming these data
  - Each file system has its own set of conventions
  - All modern operating systems use a hierarchical directory structure

Windows solution
- Each device and each disk partition is identified by a letter
  - A: and B: were used by the floppy drives
  - C: is the first disk partition of the hard drive
  - If the hard drive has no other disk partition, D: denotes the DVD drive
- Each device and each disk partition has its own hierarchy of folders

Windows solution (figure): drives C:, D: and F: (a second disk and a flash drive), each with its own folder tree; C: holds Users, Windows and Program Files

Linux organization
- Inherited from Unix
- Each device and disk partition has its own directory tree
  - Disk partitions are glued together through the mount operation to form a single tree
    - Typical user does not know where her files are stored
  - Uses "/" as a separator

UNIX/LINUX organization (figure): the root partition / and another partition; after the "magic" mount, the second partition can be accessed as /usr

Mac OS organization
- Similar to Windows
  - Disk partitions are not merged
  - Represented by separate icons on the desktop

Accessing a file (I)
- Your Python programs are stored in a folder, AKA directory
  - On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python
- All files in that folder can be directly accessed through their names
  - "myfile.txt"

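Figure: folder tree from the root through Users, J.-F. Paris, Documents, Courses, 1306 and Python down to x.txt; depending on the starting folder, the file is reached as Courses\1306\Python\x.txt, 1306\Python\x.txt, Python\x.txt or x.txt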

Accessing a file (II)
- Files in folders inside that folder (subfolders) can be accessed by specifying first the subfolder
  - Windows style:
    - "test\\sample.txt"
    - Note the double backslash
  - Linux/Unix/Mac OS X style:
    - "test/sample.txt"
    - Generally works for Windows

Why the double backslash?
- The backslash is an escape character in Python
  - Combines with its successor to represent non-printable characters
    - '\n' represents a newline
    - '\t' represents a tab
  - Must use '\\' to represent a plain backslash
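A short sketch of what goes wrong with a single backslash. The file names are made up; raw strings (the r prefix) are standard Python but are not mentioned in the slides, so treat that line as an optional extra.

```python
plain = "test\\sample.txt"     # '\\' stands for ONE backslash character
broken = "test\temp.txt"       # '\t' is silently turned into a tab
raw = r"test\temp.txt"         # raw string: backslashes are kept as typed

print(len("\\"))               # a double backslash has length 1
print(plain)
print("\t" in broken)          # the tab is really there
print(raw == "test\\temp.txt")
```

Forward slashes, as in "test/sample.txt", avoid the issue altogether.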

Accessing a file (III)
- For other files, must use full pathname
  - Windows style:
    - "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
  - Linux and Mac:
    - "/Users/Jehan-Francois Paris/Documents/Courses/1306/Python/myfile.txt"

Reading a file
- Four ways:
  - Line by line
  - Global reads
  - Within a while loop
    - Also works with other languages
  - Pickled files

Line-by-line reads
    for line in fh :    # special for loop
        # anything you want
    fh.close()    # optional

Example
    f3 = open("test/sample.txt", "r")
    for line in f3 :
        print(line)
    f3.close()    # optional

Output
    To be or not to be that is the question Now is the winter of our discontent
- With one or more extra blank lines

Why?
- Each line ends with a newline
- print(…) adds an extra newline

Trying to remove blank lines
    print('-----')
    f5 = open("test/sample.txt", "r")
    for line in f5 :
        print(line[:-1])    # remove last char
    f5.close()    # optional
    print('------')

The output
    -----
    To be or not to be that is the question Now is the winter of our disconten
    ------
- The last line did not end with a newline!

A smarter solution (I)
- Only remove the last character if it is a newline
      if line[-1] == '\n' :
          print(line[:-1])
      else :
          print(line)

A smarter solution (II)
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh :
        if line[-1] == '\n' :
            print(line[:-1])    # remove last char
        else :
            print(line)
    print('------')
    fh.close()    # optional

It works!
    -------
    To be or not to be that is the question Now is the winter of our discontent
    ------

We can do better
- Use the rstrip() Python method
  - astring.rstrip() removes all trailing whitespace (spaces, tabs, newlines) from astring
  - astring.rstrip('\n') removes all trailing newlines from astring
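The three ways of trimming a line seen so far can be compared side by side. The two strings below are made-up stand-ins for a line read from a file and for a last line that has no newline.

```python
line = "To be or not to be  \n"       # ordinary line: two spaces, then a newline
last = "of our discontent"            # last line of the file, no newline

print(repr(line[:-1]))                # blindly drops the last character
print(repr(last[:-1]))                # ... and eats a real letter here
print(repr(line.rstrip("\n")))        # drops trailing newlines only
print(repr(last.rstrip("\n")))        # nothing to strip: unchanged
print(repr(line.rstrip()))            # drops the trailing spaces as well
```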

The simplest solution
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh :
        print(line.rstrip('\n'))
    print('------')
    fh.close()    # optional
- This will remove all trailing newlines, even the ones we should keep

Global reads
- fh.read()
  - Returns the whole contents of the file specified by file handle fh
  - File contents are stored in a single string that might be very large

Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()    # optional

Output of example
    To be or not to be that is the question Now is the winter of our discontent
- Exact contents of file 'test\sample.txt' followed by an extra return

fh.read() and fh.read(n)
- fh.read() reads in the whole file fh and returns its contents as a single string
- fh.read(n) reads the next n characters of file fh (the next n bytes if the file was opened in binary mode)
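Successive calls to read(n) continue where the previous one stopped. The sketch below uses io.StringIO, an in-memory object that behaves like a file opened in text mode, so that it runs without any file on disk; the text is borrowed from the sample file.

```python
import io

# Stand-in for fh = open("test/sample.txt", "r")
fh = io.StringIO("To be or not to be\nthat is the question\n")

first = fh.read(5)      # the first 5 characters
second = fh.read(5)     # the 5 characters after those
rest = fh.read()        # everything that is left
at_end = fh.read()      # at end of file, read() returns an empty string

print(repr(first), repr(second), repr(rest), repr(at_end))
```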

Reading within a loop
- Standard method for C/C++
    infile = open("test\\sample.txt", "r")
    line = infile.readline()    # priming read
    while line :    # false if empty
        print(line.rstrip("\n"))
        line = infile.readline()
    infile.close()

Making sense of file contents
- Most files contain more than one data item per line
      COSC 713-743-3350
      UHPD 713-743-3333
- Must split lines
  - mystring.split(sepchar) where sepchar is a separation character
    - returns a list of items

Splitting strings
    >>> txt = "Four score and seven years ago"
    >>> txt.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy',83,89,85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", '83', '89', '85']
- Not what we wanted!

Example
    # how2split.py
    print('-----')
    fh = open("test/sample.txt", "r")
    for line in fh :
        words = line.split()
        for xxx in words :
            print(xxx)
    fh.close()    # optional
    print('-----')

Output
    -----
    To
    be
    …
    of
    our
    discontent
    -----
- Spurious newlines are gone

Standard way to access a file
    # preprocessing
    # set up counters, strings and lists
    fh = open("input.txt", "r")
    for line in fh :
        words = line.split(sepchar)    # often space
        for xxx in words :
            # do something
    fh.close()    # optional
    # postprocessing
    # print results

Example
- List of expenditures with dates:
      Rent 11/2/16 $850
      Latte 11/2/16 $4.50
      Food 11/2/16 $35.47
      Latte 11/3/16 $4.50
      Outing 11/4/16 $27.00
- Want to know how much money was spent on latte

First attempt
- Read line by line
- Will split all lines such as
  - "Food 11/2/16 $35.47" into
  - ["Food", "11/2/16", "$35.47"]
- Will use first and last entries of each list

First attempt
    total = 0    # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh :
        words = line.split(" ")
        if words[0] == 'Latte' :
            total += words[2]    # increment
    fh.close()    # optional
    print("you spent %.2f on latte" % total)
- It does not work!
  - words[2] is a string such as '$4.50' and cannot be added to a number

Second attempt
- Must first remove the offending '$'
- Must also convert string to float
    def price2float(s) :
        """ remove leading dollar sign"""
        if s[0] == "$" :
            return float(s[1:])
        else :
            return float(s)

Second attempt
    total = 0    # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh :
        words = line.split(" ")
        if words[0] == 'Latte' :
            total += price2float(words[2])
    fh.close()    # optional
    print("You spent $%.2f on latte" % total)
- Output:
    You spent $13.50 on latte

Picking the right separator (I)
- Commas
  - CSV Excel format
    - Values are separated by commas
    - Strings are stored without quotes
      - Unless they contain a comma
        - "Doe, Jane",freshman,90
      - Quotes within strings are doubled
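These quoting rules are what Python's standard csv module applies by default when it reads a file. The sketch below is not from the slides: the two records are made up (the second row borrows a name used later in the chapter), and io.StringIO stands in for an open file.

```python
import csv
import io

text = ('"Doe, Jane",freshman,90\n'
        '"Aguirre, ""Lalo"" Eduardo",senior,85\n')    # made-up sample data

# A plain split(',') cuts the quoted name in two ...
naive = text.splitlines()[0].split(',')

# ... while csv.reader honors the quotes and undoes the doubling.
rows = list(csv.reader(io.StringIO(text)))

print(naive)
for row in rows:
    print(row)
```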

Picking the right separator (II)
- Tabs ('\t')
  - Advantages:
    - Your fields will appear nicely aligned
    - Spaces, commas, … are not an issue
  - Disadvantage:
    - You do not see them
      - They look like spaces

Why it is important
- When you must pick your file format, you should decide how the data inside the file will be used:
  - People will read them
  - Other programs will use them
  - Will be used by people and machines

An exercise
- Converting tab-separated data to CSV format
  - Replacing tabs by commas
    - Easy
      - Will use string replace function

Possible input lines (fields are separated by tabs)
    Alice                     85   92   95
    Doe, Jane                 87   88   90
    Doe, John                 82   91   77
    Kingsman, Edward "Ted"    75   87   89

First attempt
    fh_in = open('grades.txt', 'r')
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')

The output
- The input
      Alice   90   90   90
      Bob     85   85   85
      Carol   75   75   75
- becomes
      Alice,90,90,90
      Bob,85,85,85
      Carol,75,75,75

Dealing with commas (I)
- Work line by line
- For each line
  - split input into fields using TAB as separator
  - store fields into a list
    - Alice 90 90 becomes ['Alice', '90', '90']

Dealing with commas (II)
  - Put within double quotes any entry containing one or more commas
  - Output list entries separated by commas
    - ['"Baker, Ann"', '90', '90', '90'] which will later become "Baker, Ann",90,90,90

Dealing with commas (III)
- Our troubles are not over:
  - Must store somewhere all lines until we are done
  - Store them in a list

Dealing with double quotes
- Before wrapping items containing commas in double quotes, replace
  - All double quotes by pairs of double quotes
    - 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'

Order matters (I)
- We must double the inside double quotes before wrapping the string into double quotes:
  - From 'Aguirre, "Lalo" Eduardo' go to 'Aguirre, ""Lalo"" Eduardo' then to '"Aguirre, ""Lalo"" Eduardo"'

Order matters (II)
- Otherwise:
  - We go from 'Aguirre, "Lalo" Eduardo' to '"Aguirre, "Lalo" Eduardo"' then to '""Aguirre, ""Lalo"" Eduardo""' with all double quotes doubled
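Both orders can be tried on the same name. The function names quote_right and quote_wrong are made up for this sketch; the check with csv.reader at the end is an extra that is not in the slides.

```python
import csv

def quote_right(item):
    """Double the inner quotes first, then wrap."""
    item = item.replace('"', '""')
    return '"' + item + '"'

def quote_wrong(item):
    """Wrap first, then double: the wrapping quotes get doubled too."""
    item = '"' + item + '"'
    return item.replace('"', '""')

name = 'Aguirre, "Lalo" Eduardo'
good = quote_right(name)
bad = quote_wrong(name)
print(good)
print(bad)

# A CSV reader turns the correctly quoted field back into the original name.
recovered = next(csv.reader([good]))[0]
print(recovered == name)
```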

General organization (I)
    linelist = []    # new file contents
    for line in file :
        itemlist = line.split(…)
        linestring = ''    # start with empty line
        for item in itemlist :
            remove any trailing newline
            double all double quotes
            if item contains comma, wrap
            add to linestring
        append linestring to linelist

General organization (II)
    for line in file :
        …
        remove last comma of linestring
        add newline at end of linestring
        append linestring to linelist
    for linestring in linelist :
        write linestring into output file

The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')    # input file
    linelist = [ ]    # global data structure
    for line in fh :    # process an input line
        itemlist = line.split('\t')
        # print(str(itemlist))    # for debugging
        linestring = ''    # start afresh

The program (II)
        for item in itemlist :    # process item
            # double all double quotes
            item = item.replace('"', '""')
            if item[-1] == '\n' :    # remove it
                item = item[:-1]
            if ',' in item :    # wrap item
                item = '"' + item + '"'
            # just append
            linestring += item + ','
        # end of item loop

The program (III)
        # replace last comma by newline
        linestring = linestring[:-1] + '\n'
        linelist.append(linestring)
    # end of line loop
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Notes
- Most print statements used for debugging were removed
  - Space considerations
- Observe that the inner loop adds a comma after each item
  - Wanted to remove the last one
- Must also add a newline at end of each line

The input file (fields are separated by tabs)
    Alice                     90   90   90
    Bob                       85   85   85
    Carol                     75   75   75
    Doe, Jane                 …
    Fulano, Eduardo "Lalo"    …

The output file
    Alice,90,90,90
    Bob,85,85,85
    Carol,75,75,75
    "Doe, Jane",…
    "Fulano, Eduardo ""Lalo""",…

Mistakes being made (I)
- Mixing lists and strings:
  - Earlier draft of program declared
    - linestring = [ ]
  - and did
    - linestring.append(item)
  - Outcome was
    - ['Alice,', '90,', … ]
  - instead of
    - 'Alice,90,…'

Mistakes being made (II)
- Forgetting to add a newline
  - Output was a single line
- Doing the append inside the inner loop:
  - Output was
    - Alice,90,90 …

Mistakes being made (III)
- Forgetting that strings are immutable:
  - Trying to do
    - linestring[-1] = '\n'
  - instead of
    - linestring = linestring[:-1] + '\n'
  - Bigger issue:
    - Do we have to remove the last comma?
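One possible answer to that last question, offered as a sketch rather than as the author's solution: the string method join() puts the separator between items only, so there is no last comma to remove. The sample values are made up.

```python
linestring = 'Alice,90,90,90,'          # what the inner loop has built

try:
    linestring[-1] = '\n'               # strings are immutable: raises TypeError
except TypeError as err:
    print('cannot modify in place:', err)

# What the program does: build a NEW string without the last comma.
fixed = linestring[:-1] + '\n'

# Alternative: join() never produces the trailing comma in the first place.
itemlist = ['Alice', '90', '90', '90']
joined = ','.join(itemlist) + '\n'

print(fixed == joined)
```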

Could we have done better? (I)
- Make the program more readable by decomposing it into functions
  - A function to process each line of input
    - do_line(line)
      - Input is a string ending with newline
      - Output is a string in CSV format
      - Should call a function processing individual items

Could we have done better? (II)
  - A function to process individual items
    - do_item(item)
      - Input is a string
      - Returns a string
        - With double quotes "doubled"
        - Without a newline
        - Within quotes if it contains a comma

The new program (I)
    def do_item(item) :
        item = item.replace('"', '""')
        if item[-1] == '\n' :
            item = item[:-1]
        if ',' in item :
            item = '"' + item + '"'
        return item

The new program (II)
    def do_line(line) :
        itemlist = line.split('\t')
        linestring = ''    # start afresh
        for item in itemlist :
            linestring += do_item(item) + ','
        if linestring != '' and linestring[-1] == ',' :
            linestring = linestring[:-1]
        linestring += '\n'
        return linestring

The new program (III)
    fh = open('grades.txt', 'r')
    linelist = [ ]
    for line in fh :
        linelist.append(do_line(line))
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Why it is better
- Program is decomposed into small modules that are much easier to understand
  - Each fits on a PowerPoint slide
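For comparison only: Python's standard csv module already knows the quoting rules that do_item() implements by hand. The sketch below is not the approach taken in the slides; the rows are made-up sample data and io.StringIO stands in for the output file.

```python
import csv
import io

rows = [                                        # made-up sample data
    ['Alice', '90', '90', '90'],
    ['Doe, Jane', '90', '90', '90'],
    ['Fulano, Eduardo "Lalo"', '90', '85', '75'],
]

out = io.StringIO()                             # stand-in for open('great.csv', 'w')
writer = csv.writer(out, lineterminator='\n')   # default dialect ends lines with '\r\n'
writer.writerows(rows)
result = out.getvalue()
print(result)
```

One difference worth knowing: the writer also wraps any field that contains a double quote, even when the field has no comma, whereas do_item() wraps only on commas.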

The break statement
- Makes the program exit the loop it is in
- In next example, we are looking for first instance of a string in a file
  - Can exit as soon as it is found

Example (I)
    searchStr = input('Enter search string: ')
    found = False
    fh = open('grades.txt')
    for line in fh :
        if searchStr in line :
            print(line)
            found = True
            break

Example (II)
    if found == True :
        print("String %s was found" % searchStr)
    else :
        print("String %s NOT found " % searchStr)

Flags
- A variable like found
  - That can either be True or False
  - That is used in a condition for an if or a while
- is often referred to as a flag

PICKLED FILES (NOT ON THE QUIZ)

Pickled files
- import pickle
  - Provides a way to save complex data structures in a file
  - Sometimes said to provide a serialized representation of Python objects

Basic primitives (I)
- dump(object, fh)
  - appends a sequential representation of object into the file with file handle fh
  - object is virtually any Python object
  - fh is the handle of a file that must have been opened in 'wb' mode
    - b is a special option allowing to write or read binary data

Basic primitives (II)
- target = load(filehandle)
  - assigns to target the next pickled object stored in filehandle
  - target is virtually any Python object
  - filehandle is the file handle of a file that was opened in 'rb' mode
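A minimal round trip through dump() and load(). To keep the sketch free of files on disk it uses io.BytesIO, an in-memory object that behaves like a file opened in binary mode; the two objects are borrowed from the chapter's examples.

```python
import io
import pickle

buffer = io.BytesIO()               # stand-in for open('afile', 'wb')
pickle.dump([2, 'Apples', 5, 'Oranges'], buffer)
pickle.dump({'Alice': 22, 'Bob': 27}, buffer)    # a second object, appended after the first

buffer.seek(0)                      # rewind: plays the role of reopening in 'rb' mode
first = pickle.load(buffer)         # objects come back in the order they were dumped
second = pickle.load(buffer)

print(first)
print(second)
```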

Example (I)
    >>> mylist = [2, 'Apples', 5, 'Oranges']
    >>> mylist
    [2, 'Apples', 5, 'Oranges']
    >>> fh = open('afile', 'wb')    # b = BINARY
    >>> import pickle
    >>> pickle.dump(mylist, fh)
    >>> fh.close()

Example (II)
    >>> fhh = open('afile', 'rb')    # b = BINARY
    >>> theirlist = pickle.load(fhh)
    >>> theirlist
    [2, 'Apples', 5, 'Oranges']
    >>> theirlist == mylist
    True

What was stored in afile?
- Some binary data containing the strings 'Apples' and 'Oranges'

Using ASCII format
- Can require a pickled representation of objects that only contains printable characters
  - Must specify protocol = 0
- Advantage:
  - Easier to debug
- Disadvantage:
  - Takes more space
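The space cost can be measured with pickle.dumps(), which returns the pickled bytes instead of writing them to a file. The list of 1000 integers is an arbitrary choice for this sketch; the exact byte counts depend on the Python version, so none are quoted here.

```python
import pickle

data = list(range(1000))                        # arbitrary sample object

ascii_bytes = pickle.dumps(data, protocol=0)    # printable representation
binary_bytes = pickle.dumps(data)               # default (binary) protocol

preview = ascii_bytes.decode('ascii')[:20]      # protocol 0 output decodes as plain ASCII here
print(repr(preview))
print(len(ascii_bytes), len(binary_bytes))      # the ASCII form is the larger one

# Both representations load back to the same object.
print(pickle.loads(ascii_bytes) == pickle.loads(binary_bytes) == data)
```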

Example
    import pickle
    mydict = {'Alice': 22, 'Bob' : 27}
    fh = open('asciifile.txt', 'wb')
    pickle.dump(mydict, fh, protocol = 0)
    fh.close()
    fhh = open('asciifile.txt', 'rb')
    theirdict = pickle.load(fhh)
    print(mydict)
    print(theirdict)

The output {'Bob': 27, 'Alice': 22}

What is inside asciifile.txt?
    (dp 0 VBobp 1 L 27 Ls. VAlicep 2 L 22 Ls.

Dumping multiple objects (I)
    import pickle
    fh = open('asciifile.txt', 'wb')
    for k in range(3, 6) :
        mylist = [i for i in range(1, k)]
        print(mylist)
        pickle.dump(mylist, fh, protocol = 0)
    fh.close()

Dumping multiple objects (II)
    fhh = open('asciifile.txt', 'rb')
    lists = [ ]    # initializing list of lists
    while True :    # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
    fhh.close()
    print(lists)

Dumping multiple objects (III)
- Note the way we test for end-of-file (EOF)
    while True :    # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
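The same end-of-file test can be packaged as a small reusable function. The name load_all is made up for this sketch, and io.BytesIO again stands in for a file opened in 'rb' mode.

```python
import io
import pickle

def load_all(fh):
    """Return a list with every pickled object that is left in fh."""
    objects = []
    while True:                     # means forever
        try:
            objects.append(pickle.load(fh))
        except EOFError:            # raised by load() when nothing is left to read
            break
    return objects

buffer = io.BytesIO()               # stand-in for the pickled file
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer, protocol=0)

buffer.seek(0)                      # rewind before reading
lists = load_all(buffer)
leftover = load_all(buffer)         # already at end of file: empty list
print(lists)
print(leftover)
```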

The output
    [1, 2]
    [1, 2, 3]
    [1, 2, 3, 4]
    [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

What is inside asciifile.txt?
    (lp 0 L 1 La. L 2 La. L 3 La. (lp 0 L 1 La. L 2 L a. L 3 La. L 4 La.

Practical considerations
- You rarely pick the format of your input files
  - May have to do format conversion
- You often have to use specific formats for your output files
  - Often dictated by the program that will use them
- Otherwise stick with pickled files!