Filigranes pour tous (Watermarks For All): a new project based on deep-learning technology and crowdsourcing
Marc H. Smith, École nationale des chartes / Centre Jean-Mabillon, Paris Sciences & Lettres
Watermarks in digital collections, 4th International Conference, Vienna, 19-20 October 2017

« Science des données, données de la science » (Data science, the data of science)
IRIS – Initiative de recherche interdisciplinaire et stratégique (Interdisciplinary and Strategic Research Initiative)
– École nationale des chartes: Christine Bénévent, Olivier Poncet, Marc Smith
– École des Ponts ParisTech: Mathieu Aubry
– INRIA (Institut national de recherche en informatique et en automatique): Josef Sivic
– IRHT (Institut de recherche et d'histoire des textes): François Bougard, Bruno Bon

Repertories of watermarks: evolution and limitations
– From drawings to photographs
– From paper to digital
– From single/national corpora to portals and interoperability
Limitations:
– Identifying watermarks: image > word > image (a watermark must be described in words to retrieve reference images)
– Number of reference images: more often “similar” than identical
– Closed data, from producer to user

Filigranes pour tous
Identification: image to image
– Deep-learning technology for image comparison
– Initial corpus: French watermarks > international collaboration?
– User interaction: image matching and database augmentation > multiple images of (identical or variant) watermarks

Test corpus
A set of homogeneous watermarks from French archives
– Notarial records from the Archives nationales (1650)
– 4 different watermarks × 61 photographs, taken with 3 light sheets and 3 smartphones
– Minimal guidelines for framing
– Pages with and without writing
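
These counts fit the split reported on the first-results slide: of the 61 photographs of each watermark, 50 go to training and 11 to control, giving 200 and 44 images in total. A minimal Python sketch of such a per-watermark split follows. The watermark labels are hypothetical, and assigning photographs at random within each watermark is an assumption, since the slides give only the counts.

```python
import random

# Hypothetical labels for the four watermarks (the slides do not name them).
WATERMARKS = ["wm_A", "wm_B", "wm_C", "wm_D"]
PHOTOS_PER_WATERMARK = 61
TRAIN_PER_WATERMARK = 50  # 50 training + 11 control = 61 per watermark


def split_corpus(seed=0):
    """Split each watermark's photographs into training and control sets.

    Assumption: photographs are assigned at random within each watermark,
    so every watermark contributes the same number of images to each set.
    """
    rng = random.Random(seed)
    train, control = [], []
    for wm in WATERMARKS:
        photos = [(wm, i) for i in range(PHOTOS_PER_WATERMARK)]
        rng.shuffle(photos)
        train.extend(photos[:TRAIN_PER_WATERMARK])
        control.extend(photos[TRAIN_PER_WATERMARK:])
    return train, control


train, control = split_corpus()
print(len(train), len(control))  # 200 44
```

Splitting within each watermark, rather than over the pooled 244 photographs, keeps all four watermarks equally represented in both sets.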

Test sample: four watermarks

Test sample: four watermarks

Random sample of multiple occurrences of a watermark

Random sample of multiple occurrences of a watermark

Image capture and pre-processing
[Figure: annotated 1/6 and 1/6; images processed at 300 × 300 pixels]
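
A minimal sketch of the resizing step in plain Python, assuming a grayscale image stored as a list of pixel rows. The slides state only the 300 × 300 pixel target. The nearest-neighbour resampling and the optional `margin` crop are assumptions. The "1/6" annotations might describe such a crop, but the slide does not say so, which is why `margin` defaults to 0.

```python
def preprocess(image, size=300, margin=0.0):
    """Optionally crop a margin, then resize to size x size pixels.

    image: list of rows of grayscale values (0-255).
    margin: hypothetical fraction trimmed from each side (must be < 0.5).
    Resampling is nearest-neighbour: each output pixel copies one source pixel.
    """
    h, w = len(image), len(image[0])
    top, left = int(h * margin), int(w * margin)
    bottom, right = h - top, w - left
    if bottom <= top or right <= left:
        raise ValueError("margin too large for this image")
    cropped = [row[left:right] for row in image[top:bottom]]
    ch, cw = len(cropped), len(cropped[0])
    # Map each output coordinate back to the nearest source coordinate.
    return [
        [cropped[y * ch // size][x * cw // size] for x in range(size)]
        for y in range(size)
    ]


photo = [[(x + y) % 256 for x in range(900)] for y in range(600)]  # synthetic 900 x 600 image
small = preprocess(photo)
print(len(small), len(small[0]))  # 300 300
```

A production pipeline would use an image library with proper interpolation. Nearest-neighbour sampling keeps the sketch dependency-free.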

Deep learning
Convolutional neural network:
• Iteration of simple operations with many parameters
• Parameters are optimized on training data, producing a different result for each watermark
[Diagram: Image > Layer 1 > Layer 2 > … > classifier]
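
To make "iteration of simple operations" concrete, here is a toy forward pass in plain Python. It repeats convolution, ReLU and max-pooling twice (Layer 1, Layer 2) and ends with a linear classifier that outputs one score per watermark. This is a sketch under stated assumptions, not the project's network: it uses a single channel, a 16 × 16 input and random, untrained parameters. Training, which optimizes the parameters on the training set, is not shown.

```python
import random


def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + i][x + j] * kernel[i][j] for i in range(k) for j in range(k))
         for x in range(w - k + 1)]
        for y in range(h - k + 1)
    ]


def relu(fmap):
    """Element-wise non-linearity: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, s=2):
    """Keep the maximum of each non-overlapping s x s block."""
    return [
        [max(fmap[y + i][x + j] for i in range(s) for j in range(s))
         for x in range(0, len(fmap[0]) - s + 1, s)]
        for y in range(0, len(fmap) - s + 1, s)
    ]


def forward(img, kernels, weights):
    """Image > layer 1 > layer 2 > classifier scores (one per watermark)."""
    x = max_pool(relu(conv2d(img, kernels[0])))  # layer 1: 16x16 -> 14x14 -> 7x7
    x = max_pool(relu(conv2d(x, kernels[1])))    # layer 2: 7x7 -> 5x5 -> 2x2
    flat = [v for row in x for v in row]         # 4 features
    return [sum(w * v for w, v in zip(ws, flat)) for ws in weights]


rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]  # 4 watermarks x 4 features
img = [[rng.random() for _ in range(16)] for _ in range(16)]
scores = forward(img, kernels, weights)
predicted = max(range(len(scores)), key=scores.__getitem__)  # index of the best-scoring watermark
```

Every parameter lives in `kernels` and `weights`. Training would adjust exactly these numbers so that the highest score lands on the correct watermark.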

Elementary operation of a single ‘neuron’ (x = input, w = parameters)
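
The slide's formula did not survive extraction. A standard formulation of a single neuron is y = f(w · x + b) = f(Σᵢ wᵢxᵢ + b). It assumes a bias term b and an activation function f, neither of which is named on the slide. A minimal Python version:

```python
import math


def neuron(x, w, b=0.0, activation="relu"):
    """One 'neuron': weighted sum of inputs x with parameters w, plus an
    (assumed) bias b, passed through an (assumed) non-linearity."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    if activation == "relu":
        return max(0.0, z)
    if activation == "sigmoid":
        return 1.0 / (1.0 + math.exp(-z))
    raise ValueError(f"unknown activation: {activation}")


print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # z = 0.5 - 2.0 + 0.75 = -0.75, so ReLU gives 0.0
```

A convolutional layer applies this same operation at every image position, reusing one set of weights w. That is why a network can have "many parameters" yet be built from a single simple step.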

Image matching: first results
– Training set: 200 images (50 per watermark): 100% correct matching
– Control set: 44 images (11 per watermark): 95% correct matching (42/44)
Caution: the “black box” syndrome: is the matching actually based on the watermarks?
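
The slides do not say exactly how a photograph is matched to a watermark, and the network diagram ends in a classifier. One common alternative, shown here as a swapped-in technique rather than the project's method, is nearest-neighbour matching. A feature vector from the network is compared with reference vectors by cosine similarity, and the closest reference gives the match. The toy vectors below are made up. For comparison, the control-set result on the slide is 42 / 44 ≈ 0.955.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match(query, references):
    """Return the label of the reference most similar to the query.

    references: list of (label, feature_vector) pairs.
    """
    return max(references, key=lambda ref: cosine(query, ref[1]))[0]


def accuracy(queries, references):
    """Fraction of (true_label, features) queries matched to the right label."""
    hits = sum(match(feats, references) == label for label, feats in queries)
    return hits / len(queries)


refs = [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]
queries = [("A", [0.9, 0.1]), ("B", [0.2, 0.8]), ("A", [0.1, 0.9])]  # last one is a miss
print(accuracy(queries, refs))  # 2 of 3 correct
```

Nearest-neighbour matching also bears on the "black box" caution. The closest reference image can be shown to the user, who can then check whether the two photographs really share the same watermark.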

Further development: the app
– Tools for image capture: ruler and framing mask, to provide scale
– Real-time uploading and image comparison
– User-uploaded images and metadata added to the database
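
A minimal sketch of how a ruler in the frame could give scale to an uploaded photograph before it is added to the database. The `WatermarkDatabase` class, its `add_upload` method and the record fields are hypothetical. They stand in for whatever storage the app actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    image_id: str
    mm_per_pixel: float
    metadata: dict = field(default_factory=dict)


def mm_per_pixel(ruler_px, ruler_mm):
    """Scale derived from a ruler in the photo: ruler_mm spans ruler_px pixels."""
    if ruler_px <= 0:
        raise ValueError("ruler length in pixels must be positive")
    return ruler_mm / ruler_px


class WatermarkDatabase:
    """Hypothetical in-memory stand-in for the app's shared database."""

    def __init__(self):
        self.records = []

    def add_upload(self, image_id, ruler_px, ruler_mm, **metadata):
        """Store a user upload together with its scale and free-form metadata."""
        rec = Record(image_id, mm_per_pixel(ruler_px, ruler_mm), dict(metadata))
        self.records.append(rec)
        return rec

    def size_mm(self, rec, length_px):
        """Convert a length measured in pixels on this photo into millimetres."""
        return length_px * rec.mm_per_pixel


db = WatermarkDatabase()
rec = db.add_upload("IMG_0001", ruler_px=200, ruler_mm=50,
                    archive="Archives nationales", year=1650)
print(db.size_mm(rec, 160))  # 160 px * 0.25 mm/px = 40.0 mm
```

Storing the scale with each record lets photographs taken at different distances, or with different smartphones, be compared in millimetres rather than pixels.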

Open questions
– Expanding the data set: how will the software adapt?
– Minimum training data set? (a single image?)
– Fragmentary or partially visible watermarks (sub-folio quires)
– Capture: close-ups vs full pages, at only 300 × 300 pixels!
– Comparing photographs and drawings?
– Stimulating crowdsourcing

Research questions
– Quantitative measurements
– Watermark variants and evolution: copies, deterioration, etc.
– Paper history: from production to circulation and consumption
– Functional distribution of formats and quality: books vs documents vs art…

marc.smith@enc-sorbonne.fr