A multi-stage HMM-based text recognition system for handwritten Arabic is presented. The system employs a novel representation of Arabic characters: the core shapes are separated from the diacritics, and the core shapes are then represented by smaller units, which we term sub-core shapes. This greatly reduces the number of models that need to be trained for the text recognition task. Further, contextual HMM modeling utilizing these sub-core shapes is presented. It demonstrates that using sub-core shapes as models improves the contextual HMM system in comparison with a contextual HMM system employing the standard Arabic character shapes as models, while also yielding a significantly more compact recognizer. Furthermore, multi-stream contextual sub-core-shape HMMs are presented, in which the features computed from a sliding window form one stream and their horizontal derivative features form the second stream, with each stream assigned its own weight. The system is evaluated on two publicly available databases for different text recognition tasks, including conditions where little training data is available. The presented system outperforms the standard character-shape system on all text recognition tasks on both databases.
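As a rough illustration of the two-stream idea, the sketch below derives a second feature stream as the horizontal (frame-to-frame) derivative of sliding-window features and combines the per-stream log-likelihoods with stream weights. It is a minimal sketch under stated assumptions, not the presented system: the central-difference derivative, the single diagonal Gaussian per stream (practical recognizers typically use mixtures), the example weights, and all function names are hypothetical choices that the abstract does not specify.

```python
import math


def horizontal_delta(frames):
    """Central-difference derivative across neighbouring frames.

    `frames` is a list of feature vectors, one per sliding-window position.
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    deltas = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return deltas


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_emission(streams, models, weights):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    streams: one feature vector per stream (e.g. [window_features, delta_features])
    models:  one (mean, var) pair per stream
    weights: one weight per stream
    """
    return sum(
        w * log_gaussian(x, mean, var)
        for x, (mean, var), w in zip(streams, models, weights)
    )


# Toy usage: three frames of one-dimensional window features.
window_features = [[0.0], [2.0], [4.0]]
delta_features = horizontal_delta(window_features)

# Hypothetical per-stream state models and stream weights.
state_models = [([2.0], [1.0]), ([2.0], [1.0])]
stream_weights = [0.7, 0.3]

score = multistream_log_emission(
    [window_features[1], delta_features[1]], state_models, stream_weights
)
```

Summing weighted log-likelihoods is equivalent to raising each stream's likelihood to the power of its weight before multiplying, so a larger weight lets that stream influence the state score more strongly.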
Short bio of presenter:
Dr. Irfan Ahmad is an Assistant Professor in the Department of Information & Computer Science at KFUPM. He received his PhD in Computer Science from TU Dortmund, Germany, in 2017 and his MS in Computer Science from KFUPM. He is actively involved in research in the areas of Pattern Recognition and Machine Learning and has worked on several research projects funded by KACST and KFUPM. Dr. Ahmad has published several articles in high-quality journals and international conferences, has published a book chapter, and holds three US patents. He regularly reviews articles for well-known journals and conferences in his area of research and serves as a program committee member for several reputed international conferences.