Digitization Workflow Management System for Massive Digitization Projects

The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006)
Mohamed Yakout (mohamed.yakout@bibalex.org)
Noha Adly (noha.adly@bibalex.org)
Magdy Nagi (magdy.nagi@bibalex.org)
Bibliotheca Alexandrina
November 19, 2006

Goals
• Automate, track and manage the digitization workflow.
• Flexibility in defining digitization workflow Phases.
• Support dynamic evolution and deviations, with history tracking.
• Flexible integration with the LIS and the Library Digital Repository.
• Accept external, partially digitized Jobs and start them in the proper Phase of the digitization workflow.
• Simultaneous management of multiple projects with a diversity of materials (books, journals, manuscripts, audio, video, slides, etc.).

Related Work
• Manual workflow management using several software packages (MS Excel, MS SharePoint, MS Project).
• Simple workflow tracking systems with limited capabilities.
• Software that integrates several digitization activities (digital capturing, image processing, OCRing, …) in one package:
    • DOCWorks from CCS
    • BookRestorer from i2s
    • OUPS
Limitations:
• Tightly coupled with certain tools; other tools cannot easily be integrated.
• No resource management (e.g. workstations and users).
• Lack of project and collection management.
• Manual file handling between the storage server and clients.
• Workflow exceptions, dynamic evolution and deviations can be handled only through manual intervention.

System Data Model

The object being digitized:
• Book by Naguib Mahfouz
• Photos of an event
• Map of Alexandria
• Music sheet by Omar Khayrat

All types of materials in the system:
• Book
• Map
• Audio
• Manuscripts
• Journals
• Video

A task that should be applied within the digitization process:
• Scanning
• OCRing
• Publishing
• Processing
• Encoding
• Zipping for archiving

The system users, with several roles:
• Digital lab operators
• Shift operators
• Administrator

A logical grouping for the Jobs:
• Nasser
• AlexMed
• AMEEL

The computer used to perform the Phase.
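The entities described on the data model slides could be sketched roughly as follows. This is only an illustration: the class and field names (`Job`, `JobType`, `Phase`, `Project`, `User`, `Workstation`, `current_phase`, and so on) are assumptions chosen to match the descriptions above, not the system's actual schema, and the example project name is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobType:
    """A type of material in the system, e.g. Book, Map, Audio."""
    name: str


@dataclass
class Phase:
    """A task applied within the digitization process, e.g. Scanning."""
    name: str


@dataclass
class Project:
    """A logical grouping for Jobs."""
    name: str


@dataclass
class User:
    """A system user with a role (hypothetical field names)."""
    name: str
    role: str  # e.g. "Digital lab operator", "Shift operator", "Administrator"


@dataclass
class Workstation:
    """The computer used to perform a Phase."""
    hostname: str
    phases: List[Phase] = field(default_factory=list)


@dataclass
class Job:
    """The object being digitized."""
    title: str
    job_type: JobType
    project: Project
    current_phase: Optional[Phase] = None


# Illustrative values only.
job = Job(
    title="Book by Naguib Mahfouz",
    job_type=JobType("Book"),
    project=Project("Example project"),
    current_phase=Phase("Scanning"),
)
```

Keeping `current_phase` optional reflects the idea that a Job may exist in the system before it has been assigned to any Phase.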

System Architecture

System Handlers
• XML Phases Definition Handler
    • Pre-Phase and Post-Phase
    • Physical section
    • Database section
    • Reflection Call

    <Phase Name="Book Arabic OCR">
      <PrePhase>
        <Physical Mode="UnRestricted">
          <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restricted">
            <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
          </Folder>
          ..
        </Physical>
      </PrePhase>
      <PostPhase>
        <Physical Mode="UnRestricted">
          <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restricted">
            <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
            <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
          </Folder>
        </Physical>
        <Database>
          <Field Name="Font" DisplayName="Font Family: "/>
          <Field Name="LrnPage" DisplayName="Learn Page : "/>
          ..
        </Database>
        <ReflectionCall Method="packageName.doSomething"/>
      </PostPhase>
    </Phase>
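To make the Physical section more concrete, here is a minimal Python sketch of how a handler might check a Job's folders against such a definition. It is not the system's implementation. The embedded XML is a copy of the slide's example trimmed to the attributes the sketch reads, and the interpretation of `Count` ("+" meaning one or more files, a number meaning exactly that many) is an assumption, as are the function names `count_ok` and `check_section`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example; only Name, Type and Count are read here.
PHASE_XML = """
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical>
      <Folder Name="OTIFF">
        <File Type="tif" Count="+"/>
      </Folder>
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical>
      <Folder Name="TXT">
        <File Type="frf" Count="1"/>
        <File Type="art" Count="1"/>
      </Folder>
    </Physical>
  </PostPhase>
</Phase>
"""


def count_ok(rule: str, actual: int) -> bool:
    """Assumed semantics: '+' is one or more, a number is an exact count."""
    if rule == "+":
        return actual >= 1
    return actual == int(rule)


def check_section(phase_xml: str, section: str, listing: dict) -> list:
    """Return a list of problems for 'PrePhase' or 'PostPhase'.

    `listing` maps a folder name to the file names found in that folder.
    """
    root = ET.fromstring(phase_xml.strip())
    problems = []
    for folder in root.findall(f"./{section}/Physical/Folder"):
        name = folder.get("Name")
        files = listing.get(name)
        if files is None:
            problems.append(f"missing folder {name}")
            continue
        for rule in folder.findall("File"):
            ext = rule.get("Type")
            expected = rule.get("Count")
            actual = sum(1 for f in files if f.lower().endswith("." + ext))
            if not count_ok(expected, actual):
                problems.append(
                    f"{name}: expected {expected} .{ext} file(s), found {actual}"
                )
    return problems
```

For example, `check_section(PHASE_XML, "PostPhase", {"TXT": ["book.frf"]})` reports that the `.art` file is missing, while a listing containing both files yields an empty list.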

System Modules
• Check-In
    • Plug-in based for integration.
    • Creates the Job in the system.
    • Assigns the Job to any Phase.
• Check-Out
    • Uses the Java Reflection Call section of the XML Phases Definition.
    • Ingests the Job's digital objects into the repository.
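The slides state that Check-Out relies on a Java reflection call named in the Phase definition. As a rough analogue only, the same idea can be sketched in Python, where a dotted name read from the XML is resolved to a callable at run time. The helper names `resolve` and `run_reflection_calls` are hypothetical, and the real system's call signature is not given in the slides.

```python
import importlib
import xml.etree.ElementTree as ET


def resolve(dotted: str):
    """Resolve 'module.path.function' to a callable at run time."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def run_reflection_calls(post_phase_xml: str, *args) -> list:
    """Invoke every <ReflectionCall Method="..."/> directly under the root.

    The arguments passed to each call are an assumption of this sketch.
    """
    root = ET.fromstring(post_phase_xml.strip())
    return [
        resolve(call.get("Method"))(*args)
        for call in root.findall("ReflectionCall")
    ]


# Stand-in target: a standard-library function instead of a repository ingest.
EXAMPLE = '<PostPhase><ReflectionCall Method="os.path.basename"/></PostPhase>'
results = run_reflection_calls(EXAMPLE, "/jobs/0001/TXT")
```

Because the method name lives in the definition file, a different repository plug-in can be wired in by editing the XML rather than the workflow code, which is presumably the motivation for the reflection-based design.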

System Modules
• Phases Manager
    • Request a new Job.
    • Download the Job's folders and files.
    • Submit the Job back to the system so it can continue through other Phases.
    • Reject a Job and recommend another Phase, specifying the reasons.
    • Redirect a Job away from the default Phase Sequence.
    • Provide file-level information to help solve problems.
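The submit, reject and redirect operations listed above can be pictured as moves that each leave a history entry. The sketch below is an assumption-laden illustration, not the system's code: the class `TrackedJob`, its fields and the phase names in the usage are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrackedJob:
    title: str
    sequence: List[str]            # default Phase Sequence for the Job Type
    phase: Optional[str] = None    # Phase the Job is currently in
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def _move(self, action: str, target: Optional[str], note: str = "") -> None:
        # Record (action, "from -> to", note) before changing the Phase.
        self.history.append((action, f"{self.phase} -> {target}", note))
        self.phase = target

    def submit(self) -> None:
        """Finish the current Phase and continue with the next one, if any."""
        nxt = None
        if self.phase in self.sequence:
            i = self.sequence.index(self.phase)
            if i + 1 < len(self.sequence):
                nxt = self.sequence[i + 1]
        self._move("submit", nxt)

    def reject(self, recommended: str, reason: str) -> None:
        """Send the Job to a recommended Phase and record the reason."""
        self._move("reject", recommended, reason)

    def redirect(self, target: str) -> None:
        """Deviate from the default Phase Sequence."""
        self._move("redirect", target)


job = TrackedJob("Example book", ["Scanning", "Processing", "OCRing"], phase="Scanning")
job.submit()                                   # Scanning -> Processing
job.submit()                                   # Processing -> OCRing
job.reject("Scanning", "skewed pages")         # back to Scanning, with a reason
job.redirect("OCRing")                         # skip ahead, off the default order
```

Storing the reason alongside each move is what lets later reports explain why a Job deviated from its default sequence.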

System Modules (Contd)
• Reporting
    • Workflow tracking
    • Pending items
    • Late Jobs
    • Operator rates
    • Build customized reports
• Archiving
    • On different media of different sizes, and on online storage
• Administration
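As an illustration of the kind of queries behind the reporting module, the sketch below computes pending items, late Jobs and a per-operator count from a small in-memory table. Everything here is assumed for the example: the row layout, the made-up rows, the function names, and the reading of "operator rates" as the number of finished Phases per operator.

```python
from collections import Counter
from datetime import date

# Made-up rows: (job, operator, phase, due date, finish date or None if pending)
ROWS = [
    ("J1", "op_a", "Scanning", date(2006, 11, 1), date(2006, 11, 1)),
    ("J2", "op_a", "OCRing", date(2006, 11, 3), date(2006, 11, 6)),
    ("J3", "op_b", "Scanning", date(2006, 11, 5), None),
    ("J4", "op_b", "Scanning", date(2006, 11, 20), None),
]


def pending_items(rows):
    """Jobs with at least one Phase not yet finished."""
    return sorted({job for job, _, _, _, done in rows if done is None})


def late_jobs(rows, today):
    """Jobs finished after their due date, or still pending past it."""
    return sorted({job for job, _, _, due, done in rows if (done or today) > due})


def operator_rates(rows):
    """Number of finished Phases per operator (assumed meaning of 'rate')."""
    return Counter(op for _, op, _, _, done in rows if done is not None)
```

A customized report would then be a matter of combining such filters, for example restricting `late_jobs` to the rows of a single project.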

BA Digitization Workflow

Quality Assurance
• Supported at two different stages:
    • QA information is maintained at the file level while a Job moves from one Phase to another.
    • A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving.

Achieving Flexibility Using DWMS
• The Phase Sequence defined for a Job Type is a guide rather than a prescription.
• A Phase may or may not be part of the Phase Sequence; the operator can assign the Job to any of the available Phases.
• Jobs can be forwarded dynamically to another Phase in the Phase Sequence.
• Changes to the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution.
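One way to read "a guide rather than a prescription" is that the sequence only suggests the next Phase, the operator's choice overrides it, and the sequence is looked up at the moment a Job moves so that edits apply to Jobs already in progress. The following sketch shows that reading under stated assumptions: the function names and the example sequence are hypothetical, and only the phase names come from the slides.

```python
from typing import Dict, List, Optional


def suggested_next(sequences: Dict[str, List[str]], job_type: str,
                   current: str) -> Optional[str]:
    """Suggest the next Phase, reading the sequence at call time."""
    seq = sequences[job_type]
    if current not in seq:
        return None  # Phase outside the sequence: the operator decides
    i = seq.index(current)
    return seq[i + 1] if i + 1 < len(seq) else None


def forward(sequences: Dict[str, List[str]], job_type: str, current: str,
            chosen: Optional[str] = None) -> Optional[str]:
    """Return the Phase a Job moves to; an explicit choice wins."""
    if chosen is not None:
        return chosen
    return suggested_next(sequences, job_type, current)


# Hypothetical sequence for illustration.
sequences = {"Book": ["Scanning", "Processing", "OCRing", "Publishing"]}

default_move = forward(sequences, "Book", "Scanning")             # follows the guide
deviation = forward(sequences, "Book", "Scanning", "OCRing")      # operator override

# Editing the sequence changes the suggestion for Jobs already in "Scanning".
sequences["Book"].insert(1, "Encoding")
after_change = forward(sequences, "Book", "Scanning")
```

Because Jobs hold only their current Phase and not a private copy of the sequence, a change to the shared definition takes effect for them immediately.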

Job Life Cycle

Future Work
• Check-out plug-in for Fedora.
• Check-in plug-ins will be implemented to support various metadata standard formats (MODS, DC, VAR, etc.).
• Enhance the software interface with graphical tools to help design and follow the digitization process.

Thank You
mohamed.yakout@bibalex.org