LIS 508 lecture 2
Thomas Krichel
2003-10-07

today's lecture
• Recap on what we did last week.
• Encoding mark-up
• Databases

Recap
• Computers deal with on/off signals called bits.
• Collections of these bits are binary numbers.
• Texts are (basically) strings of characters. To represent text, we need to represent characters.
• To make a character understandable to a computer, we associate a number with each character. The result is a character set.
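A minimal Python sketch of the recap above: each character is associated with a number, and each number is a collection of bits. The sample string is only an illustration, and the sketch assumes Unicode code points (what Python's built-in `ord` returns), which is one particular character set among many.

```python
# Map each character of a short text to its number and to its bits.
# Assumption: Unicode code points, which agree with ASCII for these letters.
text = "LIS"

codes = [ord(ch) for ch in text]                 # character -> number
bits = [format(code, "08b") for code in codes]   # number -> eight on/off signals

for ch, code, b in zip(text, codes, bits):
    print(ch, code, b)

# Going back from numbers to characters recovers the original text.
decoded = "".join(chr(code) for code in codes)
```

A different character set would assign different numbers to some characters, which is why the character set has to be known before the bits can be read as text.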

Beyond characters
• There is more to text than a string of characters.
• There is layout
  – titles
  – abstracts
  – mathematical formula spacing

Layout
• Layout can be conveyed by additional text that has special meaning. Examples
  – LaTeX
  – HTML
  – PostScript
• Another way is to do non-textual layout by adding some other digital signals. Examples
  – DVI
  – MS Word
  – MS PowerPoint
  These cannot be shown in these slides!

Example: LaTeX
\bigskip\textbf{Class structure}
Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.
\begin{tabular}{@{}llll@{}}
0&2003--09--23&introduction to the course &\\
1&2002--09--30&bits bytes and characters &\\
2&2003--10--07&databases and markup languages&\\

Example: HTML
<p><strong>Class structure</strong><p>Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.
<p>Class details:
<p><center><table width=100% border=1>
<tr><td align=left> 0 </td><td align=left> 2003–09–23 </td><td align=left><a href="lis508w03a-00.ppt">introduction to the course</a> </td></tr>
<td align=left> 1 </td><td align=left> 2002–09–30 </td><td align=left><a href="lis508w03a-01.ppt">bits bytes and characters</a> </td>
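The HTML excerpt mixes two kinds of text: the content itself and the additional text (tags) that carries the layout. A small Python sketch using the standard library's `html.parser` can pull the two apart. The class name `MarkupSplitter` is a hypothetical helper, and the input is a shortened piece of the excerpt above.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect tag names and plain text separately (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []   # the "additional text with special meaning"
        self.text = []   # the characters a reader actually sees

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


splitter = MarkupSplitter()
splitter.feed("<p><strong>Class structure</strong><p>Classes will be held "
              "in the computer lab")
splitter.close()

print(splitter.tags)
print(" ".join(splitter.text))
```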

Example: PostScript
Fc(Class)g(structur)o(e)-104 3956 y
Fd(Classes)26 b(will)g(be)e(held)g(in)h(the)f(computer)f(lab)i(in)f(the)h(P)o(almer)f(School)g(between)f(18:15)h(and)g(20:45.)36 b(An)25 b(optional)e(practice)h(session)-104 4055 y
(will)d(last)g(until)f(21:15.)-104 4155 y
(Class)i(details:)-104 4307 y
(0)141 b(2003\22609\22623)94 b(introduction)18 b(to)i(the)h(course)-104 4407 y
(1)141 b(2002\22609\22630)94 b(bits)21 b(bytes)f(and)g(characters)-104 4507 y
(2)141 b(2003\22610\22607)94 b(databases)20 b(and)g(markup)e(languages)-

DVI (rendition, "class structure")
1659: fntnum27 current font is ptmb8t
1660: setchar67 h:=-820459+473168=-347291, hh:=-22
1661: setchar108 h:=-347291+182183=-165108, hh:=-10
1662: setchar97 h:=-165108+327680=162572, hh:=11
1663: setchar115 h:=162572+254928=417500, hh:=27
1664: setchar115 h:=417500+254928=672428, hh:=43
1665: right3 163840 h:=672428+163840=836268, hh:=53
1669: setchar115 h:=836268+254928=1091196, hh:=69
1670: setchar116 h:=1091196+218232=1309428, hh:=83
1671: setchar114 h:=1309428+290976=1600404, hh:=101
1672: setchar117 h:=1600404+364376=1964780, hh:=124
1673: setchar99 h:=1964780+290976=2255756, hh:=142
1674: setchar116 h:=2255756+218232=2473988, hh:=156
1675: setchar117 h:=2473988+364376=2838364, hh:=179
1676: setchar114 h:=2838364+290976=3129340, hh:=197
1677: right2 -11792 h:=3129340-11792=3117548, hh:=196
1680: setchar101 h:=3117548+290976=3408524, hh:=214
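The listing can be read back into text with a short Python sketch. It assumes that the character codes of this font agree with ASCII for plain letters, and it interprets the first "right" command (a positive move) as a word space and the second (a small negative move) as a spacing adjustment between two letters. Both readings are assumptions made for illustration.

```python
# Character codes from the setchar commands of the listing, in order.
# The "right" commands only move the position and print nothing.
first_word = [67, 108, 97, 115, 115]
second_word = [115, 116, 114, 117, 99, 116, 117, 114, 101]


def decode(codes):
    """Turn a list of character numbers back into a string."""
    return "".join(chr(code) for code in codes)


rendered = decode(first_word) + " " + decode(second_word)
print(rendered)

# Each setchar also advances the horizontal position h by the width of the
# character just set; this is the first step of the listing.
h_after_first_char = -820459 + 473168
print(h_after_first_char)
```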

Databases
• Databases are collections of data with some organization to them.
• The classic example is the relational database.
• But not all databases need to be relational databases.

Relational databases
• A relational database is a set of tables. There may be relations between the tables.
• Each table has a number of records. Each record has a number of fields.
• When the database is being set up, we fix
  – the size of each field
  – relationships between tables

Example: Movie database
ID | title | director | date
M1 | Gone with the wind | F. Ford Coppola | 1963
M2 | Room with a view | Coppola, F Ford | 1985
M3 | High Noon | Woody Allan | 1974
M4 | Star Wars | Steve Spielberg | 1993
M5 | Alien | Allen, Woody | 1987
M6 | Blowing in the Wind | Spielberg, Steven | 1962
• Single table
• No relations between tables, of course

Problem with this database
• All the data are wrong, but this is just for illustration.
• Names are recorded inconsistently. There is no way to find films by Woody Allen without going through all spelling variations.
• Mistakes are difficult to correct. We have to wade through all records, a masochist's pleasure.
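The naming problem can be made concrete with a small Python sketch over the single-table data from the movie database slide. The function name `films_by` is a hypothetical helper. The director strings are copied as they appear on the slide, including the inconsistent spellings.

```python
# The single-table movie database, one tuple per record:
# (ID, title, director, date). Values are the slide's illustrative data.
movies = [
    ("M1", "Gone with the wind", "F. Ford Coppola", 1963),
    ("M2", "Room with a view", "Coppola, F Ford", 1985),
    ("M3", "High Noon", "Woody Allan", 1974),
    ("M4", "Star Wars", "Steve Spielberg", 1993),
    ("M5", "Alien", "Allen, Woody", 1987),
    ("M6", "Blowing in the Wind", "Spielberg, Steven", 1962),
]


def films_by(director_string):
    """Return titles whose director field matches the string exactly."""
    return [title for _, title, director, _ in movies
            if director == director_string]


# One spelling finds only one of the two films.
exact = films_by("Woody Allan")

# Finding both requires knowing every spelling variation in advance.
variants = {"Woody Allan", "Allen, Woody"}
complete = [title for _, title, director, _ in movies if director in variants]

print(exact)
print(complete)
```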

Better movie database
ID | title | director | year
M1 | Gone with the wind | D1 | 1963
M2 | Room with a view | D2 | 1985
M3 | High Noon | D3 | 1974
M4 | Star Wars | | 1993
M5 | Alien | | 1987
M6 | Blowing in the Wind | | 1962

ID | director name | birth year
D1 | Ford Coppola, Francis | 1942
D2 | Allan, Woody | 1957
D3 | Spielberg, Steven | 1942

Relational database
• We have a one-to-many relationship between directors and films
  – Each film has one director
  – Each director may have directed many films
• Here it becomes possible for the computer
  – To know which films have been directed by Woody Allen
  – To find which films have been directed by a director born in 1942
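A minimal sketch of the second question, using SQLite through Python's `sqlite3` module. The table and column names are hypothetical. The sketch loads only films M1 to M3 and assumes they pair with directors D1 to D3 in order, because the slide lists just three director references. Names and years are the slide's illustrative data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE director (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE film (id TEXT PRIMARY KEY, title TEXT, year INTEGER,
                       director_id TEXT REFERENCES director(id));
""")
conn.executemany("INSERT INTO director VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
# Assumption: M1, M2, M3 are directed by D1, D2, D3 respectively.
conn.executemany("INSERT INTO film VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", 1963, "D1"),
    ("M2", "Room with a view", 1985, "D2"),
    ("M3", "High Noon", 1974, "D3"),
])

# Films directed by a director born in 1942: follow the relation from
# film to director and filter on the director's birth year.
born_1942 = [row[0] for row in conn.execute(
    "SELECT film.title FROM film "
    "JOIN director ON film.director_id = director.id "
    "WHERE director.birth_year = ? ORDER BY film.id", (1942,))]
print(born_1942)
```

Because each director's name is stored once, correcting a misspelling means changing a single record in the director table.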

Many-to-many relationships
• Each film has one director, but many actors star in it. The relationship between actors and films is a many-to-many relationship.
• Here are a few actors
ID | sex | actor name | birth year
A1 | f | Brigitte Bardot | 1972
A2 | m | George Clooney | 1927
A3 | f | Marilyn Monroe | 1934

Actor/Movie table
actor id | movie id
A1 | M4
A2 | M3
A3 | M2
A1 | M5
A1 | M3
A2 | M6
A3 | M4
… as many lines as required
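The actor/movie table can be read in both directions, which is what makes the relationship many-to-many. A plain Python sketch using the seven pairs shown on the slide; the function names are hypothetical helpers.

```python
# One (actor id, movie id) pair per line of the actor/movie table.
actor_movie = [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4"),
]


def movies_of(actor_id):
    """All movies in which the given actor appears."""
    return [movie for actor, movie in actor_movie if actor == actor_id]


def actors_in(movie_id):
    """All actors who appear in the given movie."""
    return [actor for actor, movie in actor_movie if movie == movie_id]


print(movies_of("A1"))   # one actor, several movies
print(actors_in("M4"))   # one movie, several actors
```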

SQL
• Once we have the relational database, we can ask sophisticated questions:
  – Which director has had the most female actors working for him?
  – In which years were films shot that starred actors born between 1926 and 1935?
• Such questions can be encoded in a language known as "structured query language" or SQL. All relational database vendors implement a dialect of SQL.
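As a sketch, the second question can be written in SQL and run with SQLite through Python's `sqlite3` module. SQLite is just one dialect, and the table and column names here are hypothetical. The data are the illustrative films, actors, and actor/movie pairs from the earlier slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (id TEXT PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE actor (id TEXT PRIMARY KEY, name TEXT, sex TEXT,
                        birth_year INTEGER);
    CREATE TABLE actor_film (actor_id TEXT REFERENCES actor(id),
                             film_id TEXT REFERENCES film(id));
""")
conn.executemany("INSERT INTO film VALUES (?, ?, ?)", [
    ("M1", "Gone with the wind", 1963), ("M2", "Room with a view", 1985),
    ("M3", "High Noon", 1974), ("M4", "Star Wars", 1993),
    ("M5", "Alien", 1987), ("M6", "Blowing in the Wind", 1962),
])
conn.executemany("INSERT INTO actor VALUES (?, ?, ?, ?)", [
    ("A1", "Brigitte Bardot", "f", 1972),
    ("A2", "George Clooney", "m", 1927),
    ("A3", "Marilyn Monroe", "f", 1934),
])
conn.executemany("INSERT INTO actor_film VALUES (?, ?)", [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4"),
])

# In which years were films shot that starred actors born 1926 to 1935?
query = """
    SELECT DISTINCT film.year
    FROM film
    JOIN actor_film ON actor_film.film_id = film.id
    JOIN actor ON actor.id = actor_film.actor_id
    WHERE actor.birth_year BETWEEN 1926 AND 1935
    ORDER BY film.year
"""
years = [row[0] for row in conn.execute(query)]
print(years)
```

The query follows the actor/movie table from actors to films, so no film record needs to store its list of actors.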

databases in libraries
• Relational databases dominate the world of structured data
• But not so popular in libraries
  – Slow on very large databases (such as catalogs)
  – Library data has nasty ad-hoc relationships, e.g.
    • Translation of the first edition of a book
    • CD supplement that comes with the print version
    These are difficult to deal with in a system where all relations and fields have to be set up at the start and cannot be changed easily later.

http://openlib.org/home/krichel
Thank you for your attention!