Group 2
Charge: identify and briefly describe four most important computational challenges for data citation; give examples / use cases. (We did not spend time on negative identifications.)
Participants: Altman, Cohen-Boulakia, Davidson, Duerr, Fan, Goble, Groth, Howe, Martone, Tannen
1. Modeling the referent of a data citation
– Define a formal framework that different fields can use to give their respective definitions of a referent
– Referents can be very different things: a set of tuples, a bitstream, a landing page; they can be extensional or intensional (see next)
– Relevant to all three categories of users: data exporters, data citers, citation consumers
2. Handling intensional referents
– Extensional referent: a data set that exists as such somewhere. Intensional: defined by computational means, e.g., a query, a workflow
– Use case: when existing extensional referents are too large/complicated
– Relevant to all three categories of users: data exporters, data citers, citation consumers
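The extensional/intensional distinction can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the choice of a set of tuples as the underlying data, and the predicate standing in for a query. The group did not propose a concrete model.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, value)


@dataclass(frozen=True)
class ExtensionalReferent:
    """A data set that exists as such: here, a fixed set of tuples."""
    rows: FrozenSet[Row]

    def resolve(self) -> FrozenSet[Row]:
        return self.rows


@dataclass(frozen=True)
class IntensionalReferent:
    """A data set defined by computation: here, a filter over a source."""
    source: ExtensionalReferent
    predicate: Callable[[Row], bool]

    def resolve(self) -> FrozenSet[Row]:
        # The cited data is only materialized when the citation is resolved.
        return frozenset(r for r in self.source.resolve() if self.predicate(r))


def fingerprint(rows: FrozenSet[Row]) -> str:
    """Order-independent SHA-256 digest of a resolved referent."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


source = ExtensionalReferent(frozenset({("a", 1), ("b", 5), ("c", 9)}))
query = IntensionalReferent(source, lambda row: row[1] > 3)
cited_rows = query.resolve()
```

Citing `query` rather than a materialized copy keeps the citation small when the source is large. The cost is that the citation only stays meaningful if the source and the query semantics remain stable, which is where a fingerprint of the resolved rows could help.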
3. Information closure, e.g., attribution stacking
– When the referent contains, explicitly or implicitly, citations or links to other referents it depends on
– Limit how far (deep?) you go (until you hit ground?)
– Use cases: closure in sources, in attribution, and in time (e.g., the history of a referent)
– Relevant to data exporters and data citers (make it transparent to citation consumers)
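One way to read "limit how far you go" is as a depth-bounded traversal of the dependency graph between referents. The sketch below assumes the dependencies are already available as a plain dictionary; the function name, the `max_depth` parameter, and the example identifiers are all hypothetical.

```python
from collections import deque


def citation_closure(depends_on, start, max_depth=None):
    """Breadth-first closure of `start` over a dependency mapping.

    Returns {referent: depth}. With max_depth=None the walk continues
    until it "hits ground" (referents with no further dependencies).
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # also protects against cycles
                seen[dep] = depth + 1
                queue.append(dep)
    return seen


# Hypothetical dependency graph between referents.
depends_on = {
    "derived_table": ["raw_survey", "gazetteer"],
    "raw_survey": ["instrument_log"],
    "gazetteer": [],
    "instrument_log": [],
}

shallow = citation_closure(depends_on, "derived_table", max_depth=1)
full = citation_closure(depends_on, "derived_table")
```

Here `shallow` stops at the direct sources, while `full` also reaches `instrument_log` at depth 2. Closure in time could be treated the same way if each version of a referent lists its predecessor as a dependency.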
4. Automatic detection of referent relationship
– Can we automatically detect and verify whether the (external) referents of two different citations are related/overlapping?
– Related to “fixity”?
– Use case: citations to fragments of Facebook
– Relevant to citation consumers
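In the simplest case, where both referents can be resolved to sets of record identifiers, overlap detection reduces to set arithmetic. The sketch below makes exactly that assumption, which will often not hold in practice (for example, when a referent is a landing page or cannot be resolved at all). The function name and the identifiers are illustrative only.

```python
def referent_overlap(a, b):
    """Return (shared_ids, jaccard) for two referents given as id collections."""
    a, b = set(a), set(b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Two hypothetical citations, each resolved to a set of record identifiers.
citation_1 = {"rec-1", "rec-2", "rec-3"}
citation_2 = {"rec-2", "rec-3", "rec-4"}

shared, score = referent_overlap(citation_1, citation_2)
```

A score of 0 means the referents are disjoint and 1 means they are identical. Verifying the result would additionally require that both referents are fixed, which is one reading of the link to "fixity" noted above.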
5. Collecting citations during data processing
– During the execution of an ensemble of workflows using multiple data sets
– Calculating which of the data sets used are significant enough to cite
– Use case: GBIF
– Relevant to data citers
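A minimal sketch of collecting citations while workflows run, assuming "significance" is measured as a data set's share of all records consumed. That metric, the class name, and the threshold are assumptions; the slide does not say how significance should be computed.

```python
from collections import Counter


class CitationCollector:
    """Accumulates per-data-set usage as workflow steps execute."""

    def __init__(self):
        self.records_used = Counter()

    def record(self, dataset_id, n_records):
        """Called by a workflow step each time it reads from a data set."""
        self.records_used[dataset_id] += n_records

    def significant(self, min_share):
        """Data sets contributing at least `min_share` of all records used."""
        total = sum(self.records_used.values())
        if total == 0:
            return []
        return sorted(
            dataset_id
            for dataset_id, n in self.records_used.items()
            if n / total >= min_share
        )


collector = CitationCollector()
collector.record("dataset_a", 600)  # step 1 of a hypothetical workflow
collector.record("dataset_b", 95)
collector.record("dataset_a", 300)  # step 2 reads dataset_a again
collector.record("dataset_c", 5)

to_cite = collector.significant(min_share=0.05)
```

With these numbers `dataset_a` (90%) and `dataset_b` (9.5%) pass the 5% threshold and `dataset_c` (0.5%) does not. A record-count share is only one candidate measure; a data set that contributes few records can still be decisive for the result.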
6. A language for specifying levels of granularity
– Whenever possible, automatically infer levels of granularity
– Reconcile conflicts between the data exporters and the data citers
– Relevant to data exporters and data citers
7. Citing semantically unique data sets that have multiple syntactic/physical representations
– Semantic resolution: a big problem everywhere, not just in data citation; we hope to find computationally tractable instances
– Related to “fixity”?
– Use case: DBpedia has multiple serializations
– Relevant to data exporters, data citers, and citation consumers
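One computationally tractable instance is a data set that can be reduced to a set of ground triples: parse each serialization into the same in-memory form, then fingerprint a canonical ordering. The sketch below uses two toy serializations (JSON and CSV) as stand-ins; the function names and data are hypothetical, and real RDF canonicalization is harder, notably because of blank nodes.

```python
import csv
import hashlib
import io
import json


def triples_from_json(text):
    """Parse a JSON array of [subject, predicate, object] arrays."""
    return {tuple(triple) for triple in json.loads(text)}


def triples_from_csv(text):
    """Parse one subject,predicate,object triple per CSV row."""
    return {tuple(row) for row in csv.reader(io.StringIO(text)) if row}


def semantic_fingerprint(triples):
    """SHA-256 over the sorted triples, independent of the serialization."""
    digest = hashlib.sha256()
    for triple in sorted(triples):
        digest.update(json.dumps(list(triple)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# The same two facts in two serializations, listed in different orders.
as_json = '[["Berlin", "capitalOf", "Germany"], ["Paris", "capitalOf", "France"]]'
as_csv = "Paris,capitalOf,France\nBerlin,capitalOf,Germany\n"

fp_json = semantic_fingerprint(triples_from_json(as_json))
fp_csv = semantic_fingerprint(triples_from_csv(as_csv))
```

Both fingerprints agree because they are computed over the parsed content rather than the bytes of either file, so a citation could carry the fingerprint as a representation-independent check.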
- Slides: 8