Integrated Information Management and Access - new chances for museums, archives and libraries

Martin Doerr
Center for Cultural Informatics, Institute of Computer Science
Foundation for Research and Technology - Hellas (ICS-FORTH)
Singapore, August 1, 2008

Integrated Information Management: Overview
• Information Integration - a utility perspective
• Museum and Library Information
• Key-words, Finding Aids and Thesauri
• Do we talk about the same thing?
• Understanding events, contexts and stories
• CIDOC CRM, implementations

Information Integration Management: A Perspective of Utility
Memory institutions maintain Digital Repositories (“Digital Memories”):
• Information systems preserving and providing access to primary information sources, scientific and scholarly information and literature, such as digital libraries of publications, indices of archives of social or scientific activities, or documentation of physical collections.
• Digital Repositories are necessarily heterogeneous to optimize their function for different information forms and access needs, but the knowledge they contain forms a logical whole.
• To get information and learn from information we need
  - uniform access,
  - retrieval by human criteria and
  - connection of disparate information assets (e.g., painting & biography)

Information Integration Management: A Perspective of Utility
Information integration provides a syntactically and semantically homogeneous layer on top, be it physical or virtual, manual or automated.
• Multiple standard formats can coexist, if information can be transformed or merged. One format does not ensure that the information is connected!
• Standardization and transformation go hand in hand. For both, documentation (metadata) needs to be provided, adapted or “cleaned”:
  - legacy data to standard form, from one standard to another, “tune” data so that they can be transformed.
• Ultimate integration cost: manual creation/adaptation of metadata.
• Better integration is not always more work, but needs more foresight.
• Bad decisions cost most.

Information Integration Management: A Perspective of Utility
• Levels of Integration: From one platform, I can…
  1. read everything, if I have the ID: syntactic integration, the Web
  2. get everything that refers to the words X, Y, Z: Google and others
  3. get everything about a particular person, thing, place, fact, or concept
  4. learn if there are things or facts with given characteristics
  5. learn about associations and contexts of things across documents
For instance:
  - What species is this object?
  - Which professions had the relatives of van Gogh? Who were the clients of van Gogh’s paintings?
  - Were German soldiers in Russia before WWII?
  - Which antique art objects may Michelangelo have seen? (25 years project!)

Information Integration Management: A Perspective of Utility
• The traditional library task:
  - Collect and preserve documents and provide finding aids
  - The job is solved when the (one, best) document is handed out. “All you need is in this document”.
• But understanding lives from relationships. Museum information has complex relationships. Relationships may be categorical or factual:
  - Categorical (e.g., “smoking causes cancer”): Richly exploited by Semantic Web technology. Use and integration limited to research results. Not useful for primary research itself.
  - Factual associations concatenate information assets to meaningful (“epistemic”) networks (“stories”): support context-based hypothesis building, cross-disciplinary search etc. (e.g., “John smoked at 20, … 30, … 40”; “John had lung cancer at 60”)

Information Integration Management: Library, Archive, Museum Information
• The typical library contents: “The whole stories”
  - Secondary literature (research results)
  - Facts brought into causal context
  - Categorical: theories and hypotheses
  - Fiction.
• The typical archive contents: “The needle in the haystack”
  - Primary sources, “bits and pieces” (letters, legal documents, administration acts, images, scientific records).
  - Factual, kept in the sequence of creation, as by the creator or responsible.
• The typical museum information: “Museum objects rarely talk”
  - Factual documentation of properties and context per object, references, classification
  - Highly heterogeneous, disparate.

Museum Information: “A Monet is not like a Dinosaur”
Museum objects may be:
• Unique in form, valuable out of context
  - Valued art objects: “La Pie by Monet”, aesthetic minerals, exceptional life forms, curiosities.
• Unique by particular context, not valuable out of context, valuable only as illustration or symbol
  - Historical heirlooms, relics of saints, “John Lennon’s T-Shirt”
• Not unique, not particularly valuable. Used as example of a category out of the particular context
  - Most objects in natural history, ethnology, archeology.
• Unique by rarity, valuable as evidence out of a particular context
  - Most objects in paleontology, many unique archeological objects: “6th left rib from a T. Rex”

Information Integration Management: The Museum Information Problem
• The ultimate goal of users seeking information is not to get an “object” but to understand a topic.
• Understanding lives from relationships:
  - objects are interpreted by context (e.g., bone finds in Evans’s “bathtubs”)
  - contexts are interpreted by objects (e.g., many arrowheads in Troy IV)
  - objects are interpreted by categories (e.g., Evans’s Minoan “bathtubs”)
  - categories are supported by examples (e.g., the shape of a kris)
  - categories may be based on rare evidence (e.g., a hominid tooth)
• We need to integrate museums, archives and libraries in a sensible way to find integrated knowledge and produce new knowledge, to provide evidence for new hypotheses or verify or challenge old hypotheses.

Information Integration Management: Library and Museum Information
• Museum and library information has complex interrelations. Museum and library information overlaps, and otherwise is different.
  - Libraries document literature in order to facilitate access to it.
  - Museum documentation classifies and describes museum objects, their context and relevance. It refers to literature. Museums regularly produce (secondary) literature.
  - Museum objects are referred to and published in literature. Literature may describe museum objects, their context and theories about and related to them. Literature describes concepts that are exemplified or illustrated by museum objects. No standard documentation format yet for that!
  - Libraries may also produce literature. Libraries may document and curate rare objects as museums do. Most museums maintain libraries.

Information Integration Management: Archive, Library and Museum Information
[Diagram relating Libraries, Museums, Archives, Books, Objects and primary Documents. Edge labels: provide finding aids; publish; document features & context; illustrate, exemplify using; refer to; are about; make narratives from.]

Key-words, Finding Aids and Thesauri: The second level of integration
• Why is Google (i.e., search engines!) good?
  - Low cost, no data tuning, scalable
  - Easily finds secondary literature, esp. if abundant
  - Finds things by usual category names
  - No user training, no access language
  => Recommendation: You should always provide a good search engine!
• Why is Google bad?
  - User must know all synonyms
  - Names are not things: Rare things are covered under frequent names (e.g., “George Bush”, a S/W called “Volcano”)
  - Relations only by aggregation of terms appearing in the source (e.g., “First known Turkish - Greek marriage in Crete” (1635))
  - No control on relevance, no statistics possible, no related sources

Key-words, Finding Aids and Thesauri: The second level of integration
• Finding Aids:
  - Assumption: User knows a topic, characterized by a noun, or knows associations of the topic uncorrelated to the problem to be solved (e.g., “organic farming” for “host-parasite studies”, an author for a topic, or: search object by date of acquisition, because I don’t remember the name)
• The Dublin Core Metadata Element Set makes 15 relationships to terms explicit (type, classification, creator, publisher, date, format etc.)
  - It increases precision
  - It increases recall if additional terms are added in the metadata

Key-words, Finding Aids and Thesauri: The second level of integration
• Is Dublin Core better than Google?
  - Literature search by Author-Title: Google is sufficient or better
  - Type, format, subject, coverage: DC only better if terms not in the content
  - Relationship: DC better if not connected by relevant term cluster
  - Non-verbose, non-digital objects: DC provides the minimal metadata!
  - By Shakespeare or about Shakespeare: DC disambiguates!
• What does Dublin Core not do?
  - Not appropriate for museum objects (no place, finding info, material)
  - No typed relationships, no context information
  - No notion of identity (separation of URI and name, American library tradition)
  => DC has significant benefit for non-verbose digital objects.
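
The “by Shakespeare or about Shakespeare” point can be illustrated with a minimal Python sketch. The two records are invented for illustration; only the field names (title, creator, subject) follow Dublin Core elements, and both helper functions are hypothetical, not part of any library.

```python
# Two invented records; only the field names follow Dublin Core elements.
records = [
    {"title": "Hamlet", "creator": ["Shakespeare, William"], "subject": ["Tragedy"]},
    {"title": "A hypothetical biography", "creator": ["Doe, Jane"],
     "subject": ["Shakespeare, William"]},
]


def keyword_search(recs, term):
    """Search-engine style: match the term anywhere in the record."""
    term = term.lower()
    hits = []
    for rec in recs:
        text = " ".join([rec["title"]] + rec["creator"] + rec["subject"])
        if term in text.lower():
            hits.append(rec["title"])
    return hits


def field_search(recs, field, term):
    """Metadata style: match the term only inside one named element."""
    term = term.lower()
    return [rec["title"] for rec in recs
            if any(term in value.lower() for value in rec[field])]


print(keyword_search(records, "shakespeare"))           # both records match
print(field_search(records, "creator", "shakespeare"))  # works by Shakespeare
print(field_search(records, "subject", "shakespeare"))  # works about Shakespeare
```

The keyword search cannot tell the two records apart, while the fielded search separates authorship from subject, which is the disambiguation the slide attributes to DC.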

Key-words, Finding Aids and Thesauri: The second level of integration
• Thesauri of controlled terms (categories)
  - Subjects, object types, place types, person roles, event types
  - Good for secondary literature search, metadata fields (libraries!)
  - Bad: A “new language” users must learn, expensive to create
  - Invisible thesauri enhance search engines
• “Museums do not like thesauri”:
  - Not suited for factual knowledge!!
  - Cultural terminology is a dynamic research tool (“every Ph.D. a new typology”) to conclude from form to function or time etc.
  - Only few high-level terms are stable and useful for finding aids
Recommendation: Small thesauri for museums (that users can see on one page) increase power of metadata and improve search results.
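
One way to read “invisible thesauri enhance search engines” is query expansion: the user types a single term and the engine silently adds its synonyms. The sketch below assumes a tiny hand-made thesaurus; the synonym rings, the documents and the function names are all invented for illustration.

```python
# Illustrative synonym rings; the spellings are assumptions chosen for the example.
SYNONYM_RINGS = [
    {"kris", "keris"},
    {"arrowhead", "arrowheads"},
]


def expand(term, rings):
    """Return the term together with every synonym that shares a ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in rings:
        if term in ring:
            expanded |= ring
    return expanded


def search(documents, term, rings=()):
    """Return documents containing the term or, if rings are given, a synonym."""
    wanted = expand(term, rings)
    return [doc for doc in documents if wanted & set(doc.lower().split())]


docs = ["Keris from Java", "Bronze arrowheads from Troy", "Minoan bathtub"]
print(search(docs, "kris"))                 # plain keyword search finds nothing
print(search(docs, "kris", SYNONYM_RINGS))  # expanded search finds the keris
```

The user never sees the thesaurus and does not have to learn a “new language”; the cost of creating and maintaining the rings remains.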

Do we talk about the same Thing?
Co-reference can connect documents! Such networks hide stories! (complementary information)

Do we talk about the same Thing?
Hypertext is wrong: Documents contain links!
Linking documents by co-reference
[Diagram labels: CIDOC CRM Core Ontology; Deductions; Instance of; Integration by Factual Relations; real world nodes (KOS): Donald Johanson's Expedition, Cleveland Museum of Natural History, Discovery of Lucy, AL 288-1, Lucy, Ethiopia, Hadar; Documents in Digital Libraries. Legend: primary link corresponding to one document.]

Do we talk about the same Thing?
Co-reference links via authority files
Join across sources by transitivity of co-reference: find “friends of a friend”. Not scalable!
[Diagram labels: join query with input “Martin” (Source 1) and output “George” (Source 2); local ids in the content of each source; link table in the authority service matching “Κώστας” / “Kostas”; dynamic link.]

Do we talk about the same Thing?
Co-reference links without authority files
Join across sources by transitivity of co-reference: find “friends of a friend”.
[Diagram labels: join query with input “Martin” (Source 1) and output “George” (Source 2); local ids in the content of each source; “Κώστας” / “Kostas” matched directly; make a co-reference.]

Do we talk about the same Thing? The third level of integration
• Do we talk about the same thing?
  - Documents are connected if they refer to the same things, people, places, events = “Co-reference”. The hypertext model is wrong.
  - Authority files cannot catch up; they simplify the procedure but do not solve it. The scale is incredible.
  - Curation of direct co-reference links (co-reference clusters) needed.
  - Not more expensive than a search engine index
  - Duplicate detection, data cleaning and Web 2.0 methods can help massively generate co-reference links
Recommendation: Prepare for co-reference in documentation practice! (tag names, link locally etc.)
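
The “join across sources by transitivity of co-reference” can be sketched with a union-find structure that keeps co-reference clusters of (source, local id) pairs. This is only one possible representation, not the method used in the talk; the class, the function and the source contents below are assumptions built around the “Martin” / “Κώστας” / “George” labels of the diagrams.

```python
class CoreferenceClusters:
    """Union-find over (source, local id) pairs; one cluster per real-world item."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps the trees shallow.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def link(self, a, b):
        """Record a curated co-reference link: a and b denote the same thing."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


# Invented content: each source relates its own local ids.
source1 = [("Martin", "Kostas")]   # Source 1: Martin is related to Kostas
source2 = [("Κώστας", "George")]   # Source 2: Κώστας is related to George

clusters = CoreferenceClusters()
clusters.link(("source1", "Kostas"), ("source2", "Κώστας"))


def friends_of_friend(name):
    """Join Source 1 and Source 2 wherever two local ids co-refer."""
    return [far for start, mid in source1 if start == name
            for near, far in source2
            if clusters.same(("source1", mid), ("source2", near))]


print(friends_of_friend("Martin"))
```

No central authority file is consulted here: a single direct link between two local ids is enough for the join, and further links extend the same cluster by transitivity.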

Understanding Events, Contexts, Stories: The Fourth Level of Integration
• So far, nothing is learned by integration beyond what I manually collect from each source.
• Co-reference: Allows for tracing stories, but not for querying stories.
• Understanding lives from relationships.
  - Is there a global model of relationships? (social, economic, material, geographic, biological relations…, thousands of documentation formats)
  - Dominance of the mesoscopic, human activity scale.
  - Identification, classification, part-whole, reference, participation in meetings => these relations integrate museum and library information!
  - Confirmed by museums, e-science, historians.

Information Integration Management: Context as a network of related “meetings”
[Space-time diagram. Axes: time; space (Greece, Rome, Germany). Labels: “LAOKOON” and unknown Roman; “LAOKOON” (copy) (in Vatican museum) and unknown Roman; published copies of “Laokoon” (in a library); Winkelmann’s mother and Winkelmann’s birth (archive information); Winkelmann sees “Laokoon” (archive information); Inference; Winkelmann writes “…noble simplicity, silent grandeur…” (in a library); Winkelmann’s death.]

The CIDOC CRM: ISO 21127
The CIDOC Conceptual Reference Model (ISO 21127:2006)
  - Developed by the CRM Special Interest Group of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM), following an initiative of ICS-FORTH, Heraklion, Crete.
  - Is an extensible core ontology describing the underlying semantics of over a hundred database schemata and structures from all museum disciplines, archives and libraries. (Now extended by FRBRoo, modeling IFLA’s FRBR.)
  - It is the result of 15 years of interdisciplinary work and agreement.
  - In essence, it is a generic model of recording of “what has happened” in human scale, i.e. a class of discourse.
  - By it we can generate huge, meaningful networks of knowledge by a simple abstraction: history as meetings of people, things and information.
  - It bears surprise: Minimal or no specialization allows for covering new domains.

The CIDOC CRM: Historical Archives…
Metadata
  Type:           Text
  Title:          Protocol of Proceedings of Crimea Conference
  Title.Subtitle: II. Declaration of Liberated Europe
  Date:           February 11, 1945.
  Creator:        The Premier of the Union of Soviet Socialist Republics
                  The Prime Minister of the United Kingdom
                  The President of the United States of America
  Publisher:      State Department
  Subject:        Postwar division of Europe and Japan
About…
Documents
  “The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert… …. and to ensure that Germany will never again be able to disturb the peace of the world……”

The CIDOC CRM: Images, non-verbose objects…
Metadata
  Type:       Image
  Title:      Allied Leaders at Yalta
  Date:       1945
  Publisher:  United Press International (UPI)
  Source:     The Bettmann Archive
  Copyright:  Corbis
  References: Churchill, Roosevelt, Stalin
About…
Photos, Persons

The CIDOC CRM: Places and Objects
  TGN Id:    7012124
  Names:     Yalta (C, V), Jalta (C, V)
  Types:     inhabited place (C), city (C)
  Position:  Lat: 44 30 N, Long: 034 10 E
  Hierarchy: Europe (continent) <- Ukrayina (nation) <- Krym (autonomous republic)
  Note:      …Site of conference between Allied powers in WW II in 1945; ….
  Source:    TGN, Thesaurus of Geographic Names
About…
Places, Objects
  Title:     Yalta, Crimean Peninsula
  Publisher: Kurgan-Lisnet
  Source:    Liaison Agency

The CIDOC CRM: Explicit Events, Object Identity, Symmetry
[Graph diagram. Nodes: E7 Activity “Crimea Conference”; E65 Creation Event; E31 Document “Yalta Agreement”; E38 Image; E39 Actor (three instances); E53 Place 7012124; E52 Time-Span “February 1945”; E52 Time-Span “11-2-1945”. Property labels: P11 participated in; P7 took place at; P82 at some time within; P81 ongoing throughout; P86 falls within; P14 performed; P94 has created; P67 is referred to by.]

The CIDOC CRM: Data Example (e.g. from Extraction)
Epitaphios GE 34604 (entity E22 Man-Made Object)
  P30 custody transferred through, P24 changed ownership through
    Transfer of Epitaphios GE 34604 (entity E10 Transfer of Custody, E8 Acquisition Event)   [Multiple Instantiation!]
      P28 custody surrendered by
        Metropolitan Church of the Greek Community of Ankara (entity E39 Actor)
      P23 transferred title from
        Metropolitan Church of the Greek Community of Ankara (entity E39 Actor)
      P29 custody received by
        Museum Benaki (entity E39 Actor)
      P22 transferred title to
        Exchangeable Fund of Refugees (entity E40 Legal Body)
          P2 has type
            national foundation (entity E55 Type)
      P14 carried out by
        Exchangeable Fund of Refugees (entity E39 Actor)
      P4 has time-span
        GE 34604_transfer_time (entity E52 Time-Span)
          P82 at some time within
            1923 - 1928 (entity E61 Time Primitive)
      P7 took place at
        Greece (entity E53 Place)   [TGN data]
          P2 has type
            nation (entity E55 Type)
            republic (entity E55 Type)
          P89 falls within
            Europe (entity E53 Place)
              P2 has type
                continent (entity E55 Type)
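
A few of the statements in this example can be written as (subject, property, object) triples and queried by following property paths. The sketch below is an assumption about representation only: the plain-tuple encoding and the helper names are hypothetical, while the labels are copied from the slide.

```python
# Statements copied from the slide, written as (subject, property, object).
triples = [
    ("Epitaphios GE 34604", "P30 custody transferred through",
     "Transfer of Epitaphios GE 34604"),
    ("Transfer of Epitaphios GE 34604", "P28 custody surrendered by",
     "Metropolitan Church of the Greek Community of Ankara"),
    ("Transfer of Epitaphios GE 34604", "P29 custody received by", "Museum Benaki"),
    ("Transfer of Epitaphios GE 34604", "P7 took place at", "Greece"),
    ("Greece", "P89 falls within", "Europe"),
]


def objects(subject, prop):
    """All objects of statements with the given subject and property."""
    return [o for s, p, o in triples if s == subject and p == prop]


def follow(start, *props):
    """Follow a chain of properties from a start node; return all end nodes."""
    nodes = [start]
    for prop in props:
        nodes = [o for node in nodes for o in objects(node, prop)]
    return nodes


# Who received custody of the object?
print(follow("Epitaphios GE 34604",
             "P30 custody transferred through", "P29 custody received by"))
# In which wider place did the transfer happen?
print(follow("Epitaphios GE 34604",
             "P30 custody transferred through", "P7 took place at", "P89 falls within"))
```

Both questions are answered through the event node: the object itself carries no “owner” or “place” field, which is the point of modelling the transfer explicitly.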

The CIDOC CRM: Top-level Entities relevant for Integration
[Diagram. Entities: E41 Appellations (refer to / identify); E55 Types (refer to / refine); E39 Actors (participate in); E28 Conceptual Objects and E18 Physical Thing (affect or / refer to; location); E2 Temporal Entities (within E52 Time-Spans, at E53 Places).]

The CIDOC CRM: Example: The Temporal Entity Hierarchy
[Figure: the Temporal Entity hierarchy]

The CIDOC CRM: A Classification of its Relationships
  - Identification of real world items by real world names.
  - Classification of real world items.
  - Part-decomposition and structural properties of Conceptual & Physical Objects, Periods, Actors, Places and Times.
  - Participation of persistent items in temporal entities: creates a notion of history, “world-lines” meeting in space-time.
  - Location of periods in space-time and physical objects in space.
  - Influence of objects on activities and products and vice-versa.
  - Reference of information objects to any real-world item.

The CIDOC CRM: What is an ontology?
• Ontologies are formalized knowledge: clearly defined concepts and relationships about real possible states of affairs of a domain. “Semantics” is the world they refer to (“ontological commitment”), and not a set of logical rules! (e.g., what is an event?)
• Ontologies describe a reality, independent from context and performance! Information models are not ontologies! They abbreviate, denormalize, select. E.g.: “DC.creator”, “DC.Date”, “birthday/birthplace”, “destination” in the MIDAS schema (UK monuments records).
• Ontologies can be understood by people and processed by machines to enable data exchange, data integration, query mediation:
  - Local information systems may export information in a CRM compatible form (CRM Core or more).
  - Local information systems may answer queries by a subset of CRM concepts.
  - Exported information may be merged in another database (“data warehouse”). Complementary information can thus be easily integrated.
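
As a rough illustration of “export in a CRM compatible form” and merging in a “data warehouse”, the sketch below expands the abbreviated fields of two local records into explicit statements and merges them by set union. The mapping itself is an assumption made for this example, not an official Dublin Core to CRM mapping; the record values and property labels are taken from the Yalta slides, and the function names are hypothetical.

```python
# Hypothetical local records; the values are taken from the Yalta slides.
document_record = {
    "title": "Yalta Agreement",
    "date": "11-2-1945",
    "creator": [
        "The Premier of the Union of Soviet Socialist Republics",
        "The Prime Minister of the United Kingdom",
        "The President of the United States of America",
    ],
}
image_record = {
    "title": "Allied Leaders at Yalta",
    "references": ["Churchill", "Roosevelt", "Stalin"],
}


def export_document(record):
    """Expand the abbreviations 'creator' and 'date' into an explicit creation event."""
    doc = record["title"]
    event = f"Creation of {doc}"
    statements = {
        (event, "P94 has created", doc),
        (event, "P4 has time-span", record["date"]),
    }
    for creator in record["creator"]:
        statements.add((event, "P14 carried out by", creator))
    return statements


def export_image(record):
    """Turn the 'references' field into explicit reference statements."""
    return {(name, "P67 is referred to by", record["title"])
            for name in record["references"]}


# Merging exports is a set union; repeated statements collapse automatically.
warehouse = export_document(document_record) | export_image(image_record)
for statement in sorted(warehouse):
    print(statement)
```

Note that the document names its creators by office while the image names persons, so the merged statements stay unconnected until co-reference links between those names are added.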

Interoperability of Museum Information: towards a network of knowledge
  - There cannot be one database schema for all ALM (archives, libraries, museums) information. A global core ontology is a high-level explanation, not a format, allowing for automated correlation, mediation, transformation, generation of integrated views.
  - A particular installation should have a core schema, compatible with the core ontology, following an informed decision about its integration and access capabilities, for instance, CRM Core, MuseumDat, or a similar CRM-compatible schema. DC and CRM Core can be combined.
  - With CRM, we know at any time what extension to more functionality means, e.g., FRBRoo/FRBRCore. (DC extension simply failed!)
  - CRM Core (or MuseumDat): A low-cost entry to CRM compatibility.
    - As easy as Dublin Core, but appropriate to relate ALM
    - start with finding aids
    - add co-reference: manual, automated, Web 2.0
    - add NLP to recover more events
    - add more sophisticated relationships

Interoperability of Museum Information: CRM Core metadata elements
[Figure: CRM Core metadata elements]

Interoperability of Museum Information: Integration with CRM Core (Network View)
[Graph diagram. Nodes and their properties:
  E84 Information Carrier: The “Monument to Balzac” (plaster); P2 has type E55 Type: plaster; P62 depicts E21 Person: Honoré de Balzac
  E84 Information Carrier: The “Monument to Balzac” (S 1296); P2 has type E55 Type: bronze; P62 depicts E21 Person: Honoré de Balzac
  E12 Production: Rodin making “Monument to Balzac” in 1898; P4 has time-span E52 Time-Span: 1898; P7 took place at E53 Place: France (nation); P14 carried out by E21 Person: Auguste Rodin
  E12 Production: Bronze casting “Monument to Balzac” in 1925; P4 has time-span E52 Time-Span: 1925; P14 carried out by E40 Legal Body: Rudier (Vve Alexis) et Fils (P2 has type E55 Type: companies); P120B occurs after E69 Death: Rodin’s death
  E21 Person: Auguste Rodin; P2 has type E55 Type: sculptors; P98B was born E67 Birth: Rodin’s birth (P4 has time-span E52 Time-Span: 1840); P100B died in E69 Death: Rodin’s death (P4 has time-span E52 Time-Span: 1917)
 Further edge labels: P108B was produced by (carrier to production); P16B was used for (plaster to bronze casting); P134 continued (between the two productions).]

Metadata View

Artist (CRM Core)
  Category = E21 Person
  Classification = artists
  Classification = sculptors
  Identification = Rodin, Auguste
  Identification = ID: 500016619
  Event
    Role in Event = P98B was born
    Identification = Rodin's birth
    Event Type = E67 Birth
    Date = 1840
  Event
    Role in Event = P100B died in
    Identification = Rodin's death
    Event Type = E69 Death
    Date = 1917
    Related event
      Role in Event = P120 occurs before
      Identification = Bronze casting Monument to Balzac in 1925

Work (CRM Core)
  Category = E84 Information Carrier
  Classification = sculpture (visual work)
  Classification = plaster
  Identification = The Monument to Balzac (plaster)
  Description = Commissioned to honor one of France's greatest novelists, Rodin spent seven years preparing for Monument to Balzac. When the plaster original was exhibited in Paris in 1898, it was widely attacked. Rodin retired the plaster model to his home in the Paris suburbs. It was not cast in bronze until years after his death.
  Event
    Role in Event = P108B was produced by
    Identification = Rodin making Monument to Balzac in 1898
    Event Type = E12 Production
    Participant
      Identification = Rodin, Auguste
      Identification = ID: 500016619
      Participant Type = artists
      Participant Type = sculptors
    Date = 1898
    Place = France (nation)
    Related event
      Role in Event = P134B was continued by
      Identification = Bronze casting Monument to Balzac in 1925
  Event
    Role in Event = P16B was used for
    Identification = Bronze casting Monument to Balzac in 1925
    Event Type = E12 Production
    Participant
      Identification = Rudier (Vve Alexis) et Fils
      Participant Type = companies
    Thing Present
      Identification = The Monument to Balzac (S. 1296)
      Thing Present Type = bronze
      Thing Present Type = sculpture (visual work)
    Date = 1925
    Related event
      Role in Event = P120B occurs after
      Identification = Rodin's death
  Relation
    To = Honoré de Balzac
    Relation type = refers to
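
The “occurs before / occurs after” relations in this record can be derived from the event dates instead of being entered by hand. The sketch below is a deliberate simplification: it compares whole years only, whereas time-spans in general are intervals; the dictionary and function names are hypothetical, and the dates are those given on the slide.

```python
# Event dates as given in the metadata view; comparing whole years is a simplification.
events = {
    "Rodin's birth": 1840,
    "Rodin making Monument to Balzac in 1898": 1898,
    "Rodin's death": 1917,
    "Bronze casting Monument to Balzac in 1925": 1925,
}


def occurs_before(first, second):
    """True if the first event's year lies strictly before the second's."""
    return events[first] < events[second]


def events_after(reference):
    """All events dated after the reference event, in chronological order."""
    later = [name for name in events if occurs_before(reference, name)]
    return sorted(later, key=events.get)


def within_lifetime(event, birth="Rodin's birth", death="Rodin's death"):
    """An event the person could have taken part in lies between birth and death."""
    return occurs_before(birth, event) and occurs_before(event, death)


print(events_after("Rodin's death"))
print(within_lifetime("Rodin making Monument to Balzac in 1898"))
print(within_lifetime("Bronze casting Monument to Balzac in 1925"))
```

The last check reproduces a deduction the network view supports: the bronze casting falls after Rodin's death, so it was carried out by someone else, here the foundry named in the record.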

The CIDOC CRM: Why an Integration layer on Top?
• Information acquisition needs:
  - sequence and order, completeness, case-specific language and constraints to guide and control data entry.
  - ergonomic documentation units, optimized to specialist needs
  - work-flow on series of analogous items, item-centric.
  - Low interoperability needs (capability to be mapped!)
• Integration / comprehension needs epistemic networks:
  - break up document boundaries, relate facts to wider context,
  - match shared identifiers of items, aggregate alternatives
  - no preference direction of search, no cardinality constraints.
  - High interoperability needs (mapping to a global schema)
• Interpretation, story-telling, hypothesis building
  - explore context, paths, analogies (orthogonal to data acquisition)
  - present in order, resolve alternatives (enforce constraints)
  - deduction and induction

Epistemic Networks on DLs: Metadata at sources and indirect co-reference links
• Easy update
• Scalable, peer-to-peer
• Slow querying
• Concatenation of facts
• Alternatives management
[Diagram labels: Core Ontology (e.g., CIDOC CRM); surrogate nodes: Donald Johanson, Johanson's Expedition, Hadar, Lucy, Ethiopia?; extracted, normalized metadata; Sources; indirect co-reference links.]

Interoperability of Museum Information: Conclusions
• Historical information is factual and contextual. Metadata formats for cultural heritage data must be adequate to the scientific discourse.
• We need small thesauri for museums. Better invest in gazetteers (place names) and authority files.
• CRM Core already captures a first sensible Museum-Archive-Library connection. Immense benefit over Dublin Core, with similar effort.
• The co-reference problem is widely ignored (or even feared?). Its scale is extraordinary. Traditional KOS and data cleaning are not enough. We need Web 2.0 methods.
• Capacity to link and transform information is crucial to integrate information in the long term, beyond platforms. The CRM shows how to do that. Understand the historical perspective of information.