CSV Comma Separated Values

Goals for these videos
• Understand the distinction between a schema and a database instance
• Understand three commonly used file formats

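Comma Separated Values
• Delimited flat file
• Stores tabular data (numbers and text) in plain text
• Each line is a record
• Each record is a list of fields, separated by commas
• No binding standard, only convention (RFC 4180 documents the common convention, but it is informational rather than a formal standard)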
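A minimal sketch of the record/field model in Python, using the standard library's `csv` module. The sample text is made up for illustration and contains no quoted fields.

```python
import csv
import io

# Made-up sample: one record per line, fields separated by commas.
text = "name,age,gender\njames,21,m\nlauren,19,f\n"

# csv.reader yields each record as a list of field strings.
records = list(csv.reader(io.StringIO(text)))

for record in records:
    print(record)
```

Every field comes back as a string: the CSV format itself carries no type information, which is one reason a separate schema is useful later on.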

CSV Edge Cases
• Fields can be put in double quotes, e.g. "josh" or "2016"
• Embedded double quotes must be preceded by an additional double quote (i.e. written as "")
• Fields containing an embedded comma (,), double quote (") or newline character must be in double quotes, e.g. "Nahum, Josh" and "Josh said, ""Hi"" to us!"
• The first line of the file may be a header, which contains the column names. You need contextual information to tell whether this is the case.
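The quoting rules above can be seen in a short Python sketch using the standard `csv` module: the writer adds quotes and doubles embedded quotes, and the reader undoes both. The two field values are taken from the edge cases; everything else is illustrative.

```python
import csv
import io

# Two field values from the edge cases: one holds a comma,
# the other holds a comma plus embedded double quotes.
row = ["Nahum, Josh", 'Josh said, "Hi" to us!']

# csv.writer (default QUOTE_MINIMAL) quotes only fields that need it
# and doubles every embedded double quote.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
line = buf.getvalue()
print(line, end="")   # "Nahum, Josh","Josh said, ""Hi"" to us!"

# Splitting naively on commas tears the quoted fields apart.
naive = line.rstrip("\n").split(",")
print(len(naive))     # 4, not 2

# csv.reader honours the quoting and recovers the original two fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)  # True
```

The header question cannot be settled from the bytes alone: `csv.Sniffer().has_header()` only makes a heuristic guess, so contextual knowledge is still needed.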

CSV Example
Table Contents
    To             | Subject        | Message
    josh@msu.edu   | Sign Up        | Do it, Do it now
    tyler@msu.edu  | "Scare" Quotes | Are they allowed?
CSV Contents
    To,Subject,Message
    josh@msu.edu,Sign Up,"Do it, Do it now"
    tyler@msu.edu,"""Scare"" Quotes","Are they allowed?"
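As a sketch, Python's `csv.DictReader` can turn the example's CSV contents back into the table rows, treating the first line as the header. This assumes the fields are separated by bare commas with no extra spaces, as in the CSV contents shown.

```python
import csv
import io

# The example's CSV contents; the first line is a header.
text = (
    'To,Subject,Message\n'
    'josh@msu.edu,Sign Up,"Do it, Do it now"\n'
    'tyler@msu.edu,"""Scare"" Quotes","Are they allowed?"\n'
)

# DictReader uses the header line as the keys for every later row.
rows = list(csv.DictReader(io.StringIO(text)))

for row in rows:
    print(row["To"], "|", row["Subject"], "|", row["Message"])
# josh@msu.edu | Sign Up | Do it, Do it now
# tyler@msu.edu | "Scare" Quotes | Are they allowed?
```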

Well-Formed CSV
Which of these lines are well-formed (legal) lines in a CSV file?
    Josh, Nahum, 48823
    Hi Class!, Friday, 2016
    ""Stop" he said", Josh
    New York City, 40° 42'46"N, 74° 00'21"W
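Because CSV has no single standard, "well-formed" depends on which rules you adopt. The sketch below assumes strict rules: the quoting rules from the edge-case slide, plus the RFC 4180 rule that an unquoted field may not contain a double quote, with spaces counted as ordinary characters. `is_well_formed` is a hypothetical helper, not a library function; more lenient parsers may accept lines it rejects.

```python
def is_well_formed(line: str) -> bool:
    """Check one CSV line against strict quoting rules (a sketch).

    Assumed rules: a field is either unquoted and free of '"', ',' and
    line breaks, or wrapped in double quotes with any embedded quote
    doubled (""). Spaces count as ordinary field characters.
    """
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: scan to the closing quote, skipping "" pairs.
            i += 1
            while True:
                if i >= n:
                    return False          # quoted field never closed
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2            # escaped quote
                        continue
                    i += 1                # closing quote
                    break
                i += 1
            # A closing quote must be followed by a comma or end of line.
            if i == n:
                return True
            if line[i] != ",":
                return False
            i += 1
        else:
            # Unquoted field: runs to the next comma, may not contain quotes.
            while i < n and line[i] != ",":
                if line[i] in '"\r\n':
                    return False
                i += 1
            if i == n:
                return True
            i += 1                        # skip the comma, parse next field


quiz = [
    "Josh, Nahum, 48823",
    "Hi Class!, Friday, 2016",
    '""Stop" he said", Josh',
    "New York City, 40° 42'46\"N, 74° 00'21\"W",
]
for line in quiz:
    print(is_well_formed(line), line)
```

Under these rules the first two lines pass; the third fails because a closing quote is followed by `S` instead of a comma, and the fourth fails because a double quote appears inside an unquoted field.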

CSV Schema 1.0
CSV Schema defines a textual language that can be used to define the data structure, types, and rules for a data format. For instance, we may want to constrain which values are legal in a given column. The CSV format itself is very permissive, so we need a second document to define what constitutes "valid" data. There is a working draft of a CSV schema by The National Archives of the UK, found here: http://digitalpreservation.github.io/csvschema/

Example CSV Schema
    version 1.0
    @totalColumns 3
    name: notEmpty
    age: range(0, 120)
    gender: is("m") or is("f")
Valid CSV Data
    name,age,gender
    james,21,m
    lauren,19,f
    simon,57,m
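To make the schema rules concrete, here is a hand-written Python check that mimics this one schema. It is not a CSV Schema processor and does not parse the schema language; `in_range` and `check_row` are hypothetical helpers, and treating `range(0, 120)` as an inclusive numeric bound is an assumption.

```python
import csv
import io
from typing import List


def in_range(value: str, low: float, high: float) -> bool:
    """True if value parses as a number within [low, high] (assumed inclusive)."""
    try:
        return low <= float(value) <= high
    except ValueError:
        return False


def check_row(row: List[str]) -> List[str]:
    """Return the schema rules one data row breaks (empty list if valid)."""
    if len(row) != 3:                             # @totalColumns 3
        return [f"expected 3 columns, got {len(row)}"]
    name, age, gender = row
    errors = []
    if name == "":                                # name: notEmpty
        errors.append("name is empty")
    if not in_range(age, 0, 120):                 # age: range(0, 120)
        errors.append(f"age {age!r} out of range")
    if gender not in ("m", "f"):                  # gender: is("m") or is("f")
        errors.append(f"gender {gender!r} not allowed")
    return errors


data = "name,age,gender\njames,21,m\nlauren,19,f\nsimon,57,m\n"
reader = csv.reader(io.StringIO(data))
header = next(reader)  # the first line is a header, not data
results = [check_row(row) for row in reader]
print(results)  # [[], [], []]

# Well-formed CSV that is still not valid against this schema:
bad = next(csv.reader(io.StringIO(",150,x\n")))
print(check_row(bad))
```

The last row parses without any trouble, so it is well-formed CSV, yet it breaks all three column rules.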

Well-Formed versus Valid
• Well-formed means the data conforms to the file format (e.g. CSV).
• Valid means the data conforms to a schema, which is more restrictive than the format.

Whitespace
Do these two lines represent the same record/content?
    Josh, Nahum, 48823
• Yes
• No
• Depends
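One reason the answer is "Depends": readers differ on whether a space after a comma belongs to the field. In Python's `csv` module this is controlled by the `skipinitialspace` dialect option. The compact line `Josh,Nahum,48823` below is a hypothetical comparison line, not necessarily the second line from the slide.

```python
import csv

spaced = "Josh, Nahum, 48823"
compact = "Josh,Nahum,48823"   # hypothetical comparison line

# Default dialect: spaces after a comma are kept as part of the field.
default = next(csv.reader([spaced]))
print(default)   # ['Josh', ' Nahum', ' 48823']

# skipinitialspace=True drops whitespace immediately after each delimiter.
skipped = next(csv.reader([spaced], skipinitialspace=True))
print(skipped)   # ['Josh', 'Nahum', '48823']

plain = next(csv.reader([compact]))
print(plain == default, plain == skipped)  # False True
```

So the two lines hold the same record only if the reader is told to ignore leading whitespace; a schema or other contextual information has to say which interpretation is intended.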