DOST Dataset: Downtown Osaka Scene Text Dataset. Masakazu Iwamura, Takahiro Matsuda, Naoyuki Morimoto, Hitomi Sato, Yuki Ikeda and Koichi Kise, Osaka Prefecture University

Agenda 1. Introduction 2. Unique Features of DOST Dataset 3. Construction of DOST Dataset 4. Known Issues 5. Evaluation 6. Conclusion

Recent Improvement of Scene Text Recognition [Figure: bar chart of recognition accuracy (%), axis 0 to 100, for methods published between 2011 and 2016 on IIIT 5K (50, 1k, None), SVT (50, None) and ICDAR 2003 (50, Full, 50k)] • Recent results are 80+% or even 90+% • This does not mean these methods can read a wide variety of text in the real environment

Text in Real Environment: “Scene Text in the Wild” • We mean: • Text captured without intention (as much as possible) • Text not screened so as to be easily read (with regard to resolution, capture angle and so on)

We present the DOST dataset

Unique Features of DOST Dataset 1. Aim: evaluation of methods in the real environment • Not aimed at training classifiers, unlike the MJSynth and SynthText datasets 2. Captured completely unintentionally • The most similar dataset is the ICDAR 2015 Challenge 4 “incidental scene text” dataset, captured with Google Glass • DOST is free even from the influence of face direction

Unique Features of DOST Dataset 3. Video dataset captured with an omnidirectional camera • ICDAR 2013 & 2015 Challenge 3: single direction • YouTube Video Text (YVT) dataset: YouTube videos 4. Contains multiple images of a single word

Unique Features of DOST Dataset 5. Large scale • Contains the largest number of word images • Excluding synthesized datasets (MJSynth and SynthText) • Excluding a dataset containing numbers only (Google Street View House Numbers dataset)

No. of Images Contained in Existing Datasets
Image DB: ICDAR 2003: 509 • ICDAR 2013 Chal. 2: 462 • ICDAR 2015 Chal. 4: 1,670 • NEOCR: 659 • KAIST: 3,000 • SVT: 349 • IIIT 5K: 5,000 • COCO-Text: 63,686
Video DB: ICDAR 2013 Chal. 3: 15,277 • ICDAR 2015 Chal. 3: 27,824 • YVT: 11,791 • DOST: 32,147
Chart annotation: “Almost double”

No. of Word Images Contained in Existing Datasets
Image DB: ICDAR 2003: 2,268 • ICDAR 2013 Chal. 2: 2,524 • ICDAR 2015 Chal. 4: 17,548 • NEOCR: 5,238 • KAIST: 3,000 • SVT: 904 • IIIT 5K: 5,000 • COCO-Text: 173,589
Video DB: ICDAR 2013 Chal. 3: 93,598 • ICDAR 2015 Chal. 3: 125,141 • YVT: 16,620 • DOST: 797,919 (x 4.6)
Images were captured in shopping streets where a lot of text exists

No. of Word Sequences in Existing Video Datasets
ICDAR 2013 Chal. 3: 1,962 • ICDAR 2015 Chal. 3: 3,562 • YVT: 245 • DOST: 22,398 (x 6.3)
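
The “x 4.6” and “x 6.3” annotations on the two charts above appear to compare DOST with the next-largest dataset in each chart. A minimal sketch of that arithmetic, using the counts from the slides; the function name `ratio_to_runner_up` is hypothetical and the interpretation of the annotations is an assumption:

```python
# Counts copied from the slides (word images and word sequences).
WORD_IMAGES = {
    "ICDAR 2003": 2268, "ICDAR 2013 Chal. 2": 2524, "ICDAR 2015 Chal. 4": 17548,
    "NEOCR": 5238, "KAIST": 3000, "SVT": 904, "IIIT 5K": 5000,
    "COCO-Text": 173589, "ICDAR 2013 Chal. 3": 93598,
    "ICDAR 2015 Chal. 3": 125141, "YVT": 16620, "DOST": 797919,
}
WORD_SEQUENCES = {
    "ICDAR 2013 Chal. 3": 1962, "ICDAR 2015 Chal. 3": 3562,
    "YVT": 245, "DOST": 22398,
}


def ratio_to_runner_up(counts, target="DOST"):
    """Return (largest other dataset, target count / that dataset's count)."""
    others = {name: n for name, n in counts.items() if name != target}
    runner_up = max(others, key=others.get)
    return runner_up, counts[target] / others[runner_up]


if __name__ == "__main__":
    for label, counts in [("word images", WORD_IMAGES),
                          ("word sequences", WORD_SEQUENCES)]:
        name, ratio = ratio_to_runner_up(counts)
        print(f"{label}: DOST is x{ratio:.1f} of {name}")
```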

Unique Features of DOST Dataset 6. Contains Japanese characters • It also contains a lot of non-Japanese words

No. of Ground Truthed Characters per Category
Alphabet: 837,489 • Kanji: 723,805 • Katakana: 696,697 • Hiragana: 355,158 • Digit: 324,742 • Symbol: 22,802
Kanji, Katakana and Hiragana are Japanese characters

No. of Ground Truthed Characters per Category (example characters)
Kanji: 日本店円大中四業房会北月千元年間販売酒家取台止
Hiragana: あいうえおかきくけこさしすせそたちつてと
Katakana: アイウエオカキクケコサシスセソタチツテト
Symbol: ～！＃＆（）＊，-．／：？×’↑→★、。々〇」・
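
The six categories above can be illustrated with a small character classifier. This is only a sketch under stated assumptions: the Unicode ranges and the function name `char_category` are choices made here, not the rules used by the DOST annotators. Anything outside the letter ranges falls into Symbol, which happens to agree with the slide listing 々, 〇 and ・ as symbols.

```python
from collections import Counter


def char_category(ch: str) -> str:
    """Map one character to a DOST-style category (assumed Unicode ranges)."""
    cp = ord(ch)
    if ch.isascii() and ch.isalpha():
        return "Alphabet"
    if ch.isascii() and ch.isdigit():
        return "Digit"
    if 0x3041 <= cp <= 0x3096:   # hiragana letters
        return "Hiragana"
    if 0x30A1 <= cp <= 0x30FA:   # katakana letters (excludes the middle dot)
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs
        return "Kanji"
    return "Symbol"              # everything else, including full-width forms


def count_categories(text: str) -> Counter:
    """Count characters per category, ignoring whitespace."""
    return Counter(char_category(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    print(count_categories("DOST 大阪 アイ あ 2016 ！"))
```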

Construction of DOST Dataset 1. Image capture (completed in 2012) • Point Grey Research Ladybug3 • 1,200 x 1,600 pixels, 6.5 fps

Construction of DOST Dataset 2. Manual ground truthing (we spent more than 1,500 man-hours) • Most of the GT policies are shared with the ICDAR 2013 & 2015 Challenge 3 datasets • GT software was developed • It reuses GT information from neighboring frames 3. Privacy preservation • Faces were blurred
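
The slides do not say how GT information from neighboring frames is reused. One common approach in video annotation tools is to annotate a box on two keyframes and interpolate the frames in between, leaving the annotator to correct only where needed. The sketch below illustrates that idea only; `Box` and `interpolate_boxes` are hypothetical names, and this is not a description of the authors' GT software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box: top-left corner, width and height in pixels."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(start: Box, end: Box, n_between: int) -> List[Box]:
    """Linearly interpolate n_between boxes between two annotated keyframes."""
    boxes = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)  # fraction of the way from start to end
        boxes.append(Box(
            x=start.x + t * (end.x - start.x),
            y=start.y + t * (end.y - start.y),
            w=start.w + t * (end.w - start.w),
            h=start.h + t * (end.h - start.h),
        ))
    return boxes


if __name__ == "__main__":
    first = Box(100, 50, 40, 20)
    last = Box(130, 50, 40, 20)
    for box in interpolate_boxes(first, last, n_between=2):
        print(box)
```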

Known Issues (we will improve them) • Ground truths are not perfect • Bounding boxes of text regions are not tight enough • Ground truthing of “Don’t care” is not comprehensive (“Don’t care” is marked in illegible regions) • Some word sequences are broken • Relationship between other cameras • Word images in other cameras are not followed

Evaluation: Methods • Text detection • OpenCV API • Matsuda’s method based on NAT method • End-to-end text recognition • Google Vision API

Evaluation: Datasets • Image datasets • ICDAR 2003 • ICDAR 2013 Chal. 2 • ICDAR 2015 Chal. 4 • SVT • COCO-Text • Video datasets • ICDAR 2015 Chal. 3 • YVT • DOST Latin: subset of DOST containing words that consist of alphabetic characters and digits • Data were sampled
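
The DOST Latin subset is described as the words consisting of alphabets and digits. A minimal sketch of such a filter follows, assuming “alphabets” means ASCII letters; the names `is_latin_word` and `latin_subset` are hypothetical and the exact rule used by the authors is not given in the slides.

```python
import re

# ASCII letters and digits only. str.isalnum() is not used here because it
# also returns True for kanji and kana.
_LATIN_WORD = re.compile(r"[A-Za-z0-9]+")


def is_latin_word(word: str) -> bool:
    """True if the word is non-empty and made only of ASCII letters/digits."""
    return _LATIN_WORD.fullmatch(word) is not None


def latin_subset(words):
    """Keep only the words that qualify for a Latin-only subset."""
    return [w for w in words if is_latin_word(w)]


if __name__ == "__main__":
    print(latin_subset(["Osaka", "大阪", "2012", "カメラ", "Wi-Fi", "DOST2016"]))
```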

Text Detection by OpenCV API
[Figure: bar chart of F-measure [%], axis 0 to 30. Image DB: ICDAR 2003, ICDAR 2013 Chal. 2, ICDAR 2015 Chal. 4, SVT, COCO-Text. Video DB: ICDAR 2015 Chal. 3, YVT, DOST Latin. Bar labels: 18.7, 6.1, 13, 19, 11.9, 8.5, 2.4, 1.2]
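
The evaluation charts report F-measure, the harmonic mean of precision and recall. A minimal sketch of the computation from match counts is shown below; how detections are matched to ground truth (for example, the overlap threshold) is not stated in the slides, so the counts are taken as given and the function name `f_measure` is hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure from true positives, false positives and false negatives.

    precision = tp / (tp + fp), recall = tp / (tp + fn),
    F = 2 * precision * recall / (precision + recall).
    Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Example: 8 correct detections, 2 false alarms, 8 missed words.
    print(f"F-measure: {100 * f_measure(tp=8, fp=2, fn=8):.1f} %")
```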

End-to-end Text Recognition by Google Vision API
[Figure: bar chart of F-measure [%], axis 0 to 100. Image DB: ICDAR 2003, ICDAR 2013 Chal. 2, ICDAR 2015 Chal. 4, SVT, COCO-Text. Video DB: ICDAR 2015 Chal. 3, YVT, DOST Latin. Bar labels: 81.8, 71.3, 48.5, 24.2, 17.1, 44.1, 37.7, 2.7, 11.2; one bar is annotated “Recognized in Japanese mode”]

Conclusion • The DOST dataset is presented • It has unique features • It is more challenging than existing datasets

Thank you for your attention!!