FITS: The File Information Tool Set
Background
- FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS 2), which supports content models and METS/PREMIS object descriptors
- Developed in Fall 2008
- First public release in Spring 2009: http://fits.googlecode.com

Why?
- We needed an automatic way to identify and extract metadata for a wide range of file types
- No single file analysis tool satisfied our needs

Design Goals
- Act as a wrapper around other open source tools
- Extensible
- Work as a standalone command line tool and also provide an API
- Allow priority setting for tools
- Open source

The Tools
- Current tools:
  - Jhove 1.5
  - Exiftool
  - National Library of New Zealand Metadata Extractor (NLNZ)
  - DROID
  - FFIdent
  - File Utility
- 3 categories:
  - File identification (all of them)
  - Metadata extraction (Jhove, Exiftool, NLNZ)
  - Format validation (Jhove)
Process
Features
- Conflict management
- Value normalization
  - “inches” vs “2”
- Tool prioritization
- Format tree for understanding more specific format identities
  - PDF/A is a more specific version of PDF
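The features above can be sketched in a few lines of Python. This is an illustration only, not FITS code (FITS itself is a Java tool): the `Report` class, the `NORMALIZE` and `FORMAT_PARENT` tables, and the `AGREE` and `CONFLICT` status names are assumptions made for the example. The one fixed fact it relies on is that the TIFF/Exif resolution-unit code 2 means inches, which is why one tool can report “inches” and another “2” for the same file.

```python
from dataclasses import dataclass

# Hypothetical normalization table: tool-specific spellings -> canonical value.
# In TIFF/Exif, resolution unit code 2 means inches.
NORMALIZE = {"inches": "inches", "inch": "inches", "2": "inches"}

# Hypothetical format tree: more specific format -> its more general parent.
FORMAT_PARENT = {"PDF/A": "PDF"}


@dataclass
class Report:
    tool: str   # name of the tool that produced the value
    value: str  # raw value as that tool reported it


def normalize(raw):
    """Map a raw tool value to its canonical form, if one is known."""
    cleaned = raw.strip()
    return NORMALIZE.get(cleaned.lower(), cleaned)


def is_more_specific(candidate, other):
    """True if `candidate` is a descendant of `other` in the format tree."""
    parent = FORMAT_PARENT.get(candidate)
    while parent is not None:
        if parent == other:
            return True
        parent = FORMAT_PARENT.get(parent)
    return False


def consolidate(reports, priority):
    """Return (value, status) for one field reported by several tools.

    `priority` lists tool names, most preferred first; tools that are not
    listed rank last. The winning value comes from the best-ranked tool.
    """
    rank = {name: i for i, name in enumerate(priority)}
    ordered = sorted(reports, key=lambda r: rank.get(r.tool, len(rank)))
    distinct = {normalize(r.value) for r in reports}
    if len(reports) == 1:
        status = "SINGLE_RESULT"
    elif len(distinct) == 1:
        status = "AGREE"      # hypothetical status name
    else:
        status = "CONFLICT"   # hypothetical status name
    return normalize(ordered[0].value), status
```

With this sketch, a report of “inches” from one tool and “2” from another consolidates to a single agreed value, while “PDF” versus “PDF/A” is flagged as a conflict that `is_more_specific` could then resolve in favor of the more specific format.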
Example Output

<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
      ...
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
    ...
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
      ...
    </image>
  </metadata>
</fits>
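A consumer of this output might read it as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; `SAMPLE` is a trimmed copy of the example with the elided parts dropped, it assumes the tool reference inside `<identity>` is an element named `tool`, and `summarize` is a hypothetical helper, not part of FITS. If real output declares an XML namespace, the lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example output; elided ("...") parts are left out.
SAMPLE = """<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
    </image>
  </metadata>
</fits>"""


def summarize(xml_text):
    """Pull the headline facts out of a FITS-style output document."""
    root = ET.fromstring(xml_text)
    identity = root.find("identification/identity")
    height = root.find("metadata/image/height")
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "size": int(root.findtext("fileinfo/size")),
        "well_formed": root.findtext("filestatus/well-formed") == "true",
        "valid": root.findtext("filestatus/valid") == "true",
        # Each value carries the tool that reported it as attributes.
        "height": (int(height.text), height.get("toolname")),
    }
```

Calling `summarize(SAMPLE)` yields the format name, MIME type, file size, validity flags, and the image height together with the name of the tool that reported it.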
Configuration
- All settings are in the fits.xml config file
- Enable/disable tools (available in the API too)
- Prevent tools from processing files with specific file extensions
- Set tool priority
- Add new tools
- Use your own consolidator code
- Report or ignore conflicts
- Options to display original tool output
Sample Configuration File

<fits_configuration>
  <!-- Order of the tools determines preference -->
  <tools>
    <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng, mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng, wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt, wps, vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
    <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng, zip, odb, ott, odg, otg, odp, otp, ods, ots, odc, otc, odi, oti, odf, otf, odm, oth"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
    <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng, wps, vsd"/>
  </tools>
  <output>
    <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
    <display-tool-output>true</display-tool-output>
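The two rules stated in the comments of this file (tool order sets preference, and `exclude-exts` lists extensions a tool should skip) can be illustrated with a short Python sketch. It is not how FITS itself loads its configuration: `CONFIG` is a shortened, self-closed fragment of the sample, and `load_tools` and `tools_for` are hypothetical helpers written for this example.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the sample configuration, closed so that it parses.
CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng, mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt, wps, vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>"""


def load_tools(config_xml):
    """Return [(class_name, excluded_extensions)] in document order.

    Document order is kept because the order of the tools determines
    preference.
    """
    root = ET.fromstring(config_xml)
    tools = []
    for element in root.findall("tools/tool"):
        raw = element.get("exclude-exts", "")
        excluded = {ext.strip().lower() for ext in raw.split(",") if ext.strip()}
        tools.append((element.get("class"), excluded))
    return tools


def tools_for(filename, tools):
    """List the tool classes allowed to process `filename`, in priority order."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return [cls for cls, excluded in tools if ext not in excluded]
```

For a file named `notes.txt`, this sketch skips Exiftool and keeps Jhove and FileInfo, in that order.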
Some Limitations...
- Speed
- Technical metadata is only returned if the tool that reported it is in the first <identity> block
- FITS considers a successful identification to be a combination of the format name and MIME type

Future Plans
- More tools
  - Apache Tika (text document formats)
  - Jhove 2
  - Aduna Aperture (text, documents, email formats)
  - Mediainfo (audio and video formats)
- Better audio and video format support as we add object support for them to DRS 2

Wrap Up
- http://fits.googlecode.com
- http://ots-schemas.googlecode.com
  - Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata
- More information on DRS 2: http://hul.harvard.edu/ois/systems/drs/enhancements.html
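The last two limitations interact, and a small Python model makes that visible. This is a sketch under stated assumptions, not FITS code: it assumes reports arrive in tool priority order, that each distinct (format name, MIME type) pair becomes its own identity block, and that a tool reporting only one of the two is not counted as having identified the file. The function names are hypothetical.

```python
def group_identities(reports):
    """Group tools by the (format, mimetype) pair they reported.

    `reports` is a list of (tool, format, mimetype) tuples in priority order.
    A report missing either part does not count as an identification.
    Returns [((format, mimetype), [tools])] in first-seen order.
    """
    groups = {}
    for tool, fmt, mime in reports:
        if fmt and mime:
            groups.setdefault((fmt, mime), []).append(tool)
    return list(groups.items())


def metadata_tools(reports):
    """Tools whose technical metadata would be kept: those in the first block."""
    groups = group_identities(reports)
    return groups[0][1] if groups else []
```

In this model, a tool that names the format slightly differently from the preferred tools lands in a second identity block, so any technical metadata it extracted would be left out.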