Getting to know the data, Getting to know all about the data

iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Examples of data
• Observational – Recording that you saw a species
  • Can be crowdsourced, provides data over time
  • Assumes that you accurately ID the species and that you record it correctly
• Environmental – Recording an abiotic variable
  • Can be automated, done with a tool
  • Depends on accuracy and precision of tool
• Modeled – Input large quantities of data
  • Useful for prediction
  • Robustness dependent on the input data
• Other? What kinds of data do you use in research?

Collections data*

Pros
• Verifiable
• Old
  – Baseline data
  – Data for research on topics not yet known
  – Comparison over time
• DNA Individual Species
• Often have associated text in field books
• Not just full specimens (e.g., sounds, genetic info, fossils)
• Standards-based databases

Cons
• Biases
  – Geographic
  – Temporal (years and seasonal)
  – Research-based
  – Taxonomic
  – Phenological
  – Duplication
• Post-collection errors
  – Illegible handwriting
  – Incomplete label data
  – Poor preservation

*including characteristics that are not necessarily unique to collections
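A few of the cons listed above can be screened for before analysis. The sketch below is only an illustration: the records are invented, the choice of required fields is an assumption, and duplication is approximated as a repeated catalogNumber, which is a simplification of what duplication means in collections.

```python
from collections import Counter

# Hypothetical records for illustration only. The field names are Darwin Core
# terms, but the values are invented and do not come from any real collection.
records = [
    {"catalogNumber": "A-001", "scientificName": "Quercus alba", "year": "1902", "stateProvince": "Florida"},
    {"catalogNumber": "A-002", "scientificName": "Quercus alba", "year": "1902", "stateProvince": "Florida"},
    {"catalogNumber": "A-002", "scientificName": "Quercus alba", "year": "1902", "stateProvince": "Florida"},
    {"catalogNumber": "A-003", "scientificName": "Quercus alba", "year": "", "stateProvince": "Georgia"},
]

# Assumed set of fields a record needs before it is usable; adjust per study.
REQUIRED = ("scientificName", "year", "stateProvince")


def incomplete(recs):
    """Return catalog numbers of records with any required field empty or missing."""
    return [r["catalogNumber"] for r in recs if any(not r.get(f) for f in REQUIRED)]


def duplicated(recs):
    """Return catalog numbers that appear more than once, sorted."""
    counts = Counter(r["catalogNumber"] for r in recs)
    return sorted(c for c, n in counts.items() if n > 1)


def records_per_year(recs):
    """Count records per collection year, skipping records with no year."""
    return Counter(r["year"] for r in recs if r.get("year"))


print("Incomplete:", incomplete(records))
print("Duplicated:", duplicated(records))
print("Per year:", dict(records_per_year(records)))
```

A strongly uneven per-year count is one simple hint of temporal bias; geographic or taxonomic bias could be tallied the same way with a different field.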

Darwin Core
The Darwin Core is a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. (https://en.wikipedia.org/wiki/Darwin_Core)
http://www.canadensys.net/publication/darwin-core
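To make the idea of a shared glossary of terms concrete, here is a minimal sketch of one specimen record keyed by Darwin Core term names. The term names are taken from the Darwin Core standard; the values, the institution code "EXAMPLE", and the helper `describe` are invented for illustration.

```python
# One specimen record keyed by Darwin Core term names.
# The keys are Darwin Core terms; every value below is invented.
record = {
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EXAMPLE",
    "catalogNumber": "12345",
    "scientificName": "Quercus alba",
    "eventDate": "1902-05-14",
    "country": "United States",
    "stateProvince": "Florida",
    "decimalLatitude": 29.65,
    "decimalLongitude": -82.32,
}


def describe(rec):
    """Build a one-line summary from a record that uses Darwin Core keys."""
    return (
        f'{rec["scientificName"]} '
        f'({rec["institutionCode"]} {rec["catalogNumber"]}), '
        f'collected {rec["eventDate"]}'
    )


print(describe(record))
```

Because the keys are standardized, the same `describe` function would work on a record from any collection that publishes these terms.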

iDigBio portal search results
Each row represents a specimen housed in a collection

iDigBio portal search results
Same Darwin Core format for all species, localities, types of specimen, etc.
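The one-row-per-specimen layout with shared columns can be sketched as follows. The slides do not describe the portal's actual download format, so the CSV layout here is an assumption, and the three sample rows are invented; only the column names are Darwin Core terms.

```python
import csv
import io

# Invented sample standing in for a table of search results. The CSV layout
# is an assumption and is not taken from the portal itself.
SAMPLE = """scientificName,basisOfRecord,country,eventDate
Quercus alba,PreservedSpecimen,United States,1902-05-14
Tyrannosaurus rex,FossilSpecimen,United States,1990-08-12
Danaus plexippus,PreservedSpecimen,Mexico,1975-01-09
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per specimen row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE)

# Every row exposes the same columns, whatever the taxon or specimen type.
columns = {tuple(row.keys()) for row in rows}

for row in rows:
    print(row["scientificName"], "|", row["basisOfRecord"], "|", row["country"])
```

The plant, the fossil, and the insect all share one set of columns, which is what lets a single script process them together.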

As with applications of other data sources, it’s all about appropriately accounting for the characteristics of the data

These are critical aspects of data literacy for undergrads in all data-heavy STEM fields!

Get to know the data and the applications are limitless!

Get involved!
idigbio.org/wiki
facebook.com/iDigBio
twitter.com/iDigBio
vimeo.com/iDigBio
idigbio.org/rss-feed.xml
idigbio.org/events-calendar/export.ics