The A2iA Multi-lingual Text Recognition System at the second Maurdor Evaluation Bastien Moysset, Th´eodore Bluche, Maxime Knibbe Mohamed Faouzi Benzeghiba, Ronaldo Messina, J´erˆome Louradour Christopher Kermorvant www.a2ialab.com www.a2ialab.com 1/17 The Maurdor database 8,774 pages fully annotated : layout, images, graphics, languages, textual content, logical order. www.a2ialab.com 2/17 The Maurdor challenge Text recognition given the position, type (printed/handwritten) and language HW-AR! HW-EN!13 146 11 745 HW-FR! 24 515 PRN-FR! 79 248 PRN-AR! 28 595 PRN-EN! 35 028 Text zone statistics per language and type www.a2ialab.com 3/17 The A2iA system for this challenge Same system for Printed/Handwritten, French/English/Arabic : Test Images Image pre-processing Text line segmentation Text detection Train2+Dev2 text Language models IRSTLM Text line hypothesis decoding graph Lexicon Train Images Optical Model RNN training on text line images RNN character recognition hypothesis character lattice Language Model decoding with language model KALDI Paragraph transcription Best transcription selection Postprocessing www.a2ialab.com 4/17 Decision PRN AR 90 58 23 HW FR 110 72 22 HW EN 137 59 35 125 58 30 The A2iA system for this challenge HW AR Performance improvement 140 Word Error Rate (%) 120 12/2012 04/2013 12/2013 100 80 60 40 20 0 PRN FR PRN EN PRN AR HW FR HW EN Error rate divided by 3 to 5 www.a2ialab.com 5/17 HW AR Keys of success Consider several line detection hypothesis Text detection Optical Model Model word and character sequences Language Model Train using curriculum www.a2ialab.com 6/17 Text line detection Difficult to find the text line, even if the text bloc is given www.a2ialab.com 7/17 Text line detection Difficult to find the text line, even if the text bloc is given Generate hypothesis : use 2 different line detectors try with and without deskew try line compression and stretching normalize or not the line height re-order the text line try not to detect the line Recognize all the hypothesis, keep the best www.a2ialab.com 7/17 HW-EN 43 35 HW-AR 33 30 Text line detection Results without alternatives with alternatives 50 43 Word Error Rate (%) 40 35 30 20 0 26 23 23 20 10 11 PRN-FR 33 30 28 22 13 PRN-EN PRN-AR HW-FR HW-EN HW-AR up to 50% word error rate reduction www.a2ialab.com 8/17 Optical model : LSTM neural network The RNN predicts hypothesis of characters : Latin letters with accent, upper and lower cases 120 shapes of Arabic forms punctuation, inter-word space and not a character www.a2ialab.com 9/17 Optical model : LSTM neural network Training Curriculum training : start training with with easy samples and gradually increase the difficulty simple database at word level (Rimes, IAM, OpenHaRT) simple database at line level difficult database at line level (Maurdor) www.a2ialab.com 10/17 Optical model Improvements RNN training set Single lines First step of automatic location Second step of automatic location Total number of lines (without location) # of training lines 7310 10570 Word error rate 54.7% 43.8% 10925 35.2% 11608 - 40% word error rate reduction for handwritten English www.a2ialab.com 11/17 Language models Prepare the data : explicitly model the inter-word space separate punctuation signs and digit sequences (reduce the vocabulary) clean the textual data from unknown characters www.a2ialab.com 12/17 Language models Prepare the data : explicitly model the inter-word space separate punctuation signs and digit sequences (reduce the vocabulary) clean the textual data from unknown characters Train the language model : Out-of-vocabulary words modelling with character n-gram In-vocabulary words sequence modelling with word n-gram for Arabic, use part-of-Arabic word decomposition www.a2ialab.com 12/17 Language models Language models evaluation : Language English French Arabic Type PPL %OOV PRN HWR PRN HWR PRN HWR 111 66 48 73 146 134 3.9 4.5 3.6 3.9 11.2 8.4 www.a2ialab.com 3-gram 37.1 49.7 54.7 48.6 31.8 29.5 %Hit-Ratio 2-gram 41.8 35.1 31.9 36.6 33.9 44.9 13/17 1-gram 21.1 15.2 13.4 14.7 34.3 25.6 PRN AR 23 HW FR 37 22 HW EN 48 35 HW AR 44 30 Maurdor evaluation Maurdor second evaluation results : 50 48 Word Error Rate on Test set (%) second best A2iA 44 40 37 35 30 20 30 10 0 23 21 11 *PRN FR www.a2ialab.com 22 20 13 *PRN EN PRN AR HW FR 14/17 HW EN HW AR Now ? Layout Analysis is the weak link Need for document understanding www.a2ialab.com 15/17 The future of document processing ? Initial Maurdor workflow www.a2ialab.com 16/17 The future of document processing ? Initial Maurdor workflow www.a2ialab.com Non linear workflow 16/17 Thank you ! www.a2ialab.com www.a2ialab.com 17/17
© Copyright 2024