Proposal Number: 5326

Domain: Information & Communication Technology

Theme(s): Multimodal, Multilingual and Cross-lingual Interfaces

Supporting Central Government Agency:

Budget (Rs. Lakhs): 400.00

Principal Investigator: Prof. Prabir Kumar Biswas

Principal Investigator Institute: Dept. of E & ECE, IIT Kharagpur

Prof. Jayanta Mukhopadhyay, Dept. of CSE, IIT Kharagpur
Prof. Santanu Chaudhury, CEERI Pilani
Prof. Bhabotosh Chanda, ISI Kolkata
Prof. Shamik Sural, Dept. of CSE, IIT Kharagpur
Dr. C. V. Jawahar, IIIT Hyderabad
Prof. P. P. Das, Dept. of CSE, IIT Kharagpur
Prof. Gaurav Harit, IIT Jodhpur

Project Title

Information Access from Document Images of Indian languages

Web Abstract

We propose to develop content-aware image processing algorithms for robust and efficient recognition of, and retrieval from, Indian-language document images. Our image processing algorithms aim to improve the quality of document images by removing noise and low-resolution artifacts with content-aware, shape-based morphological filters. A set of recognizers will be built using state-of-the-art machine learning techniques, such as deep learning, for handwritten, typewritten and low-resolution document images, where existing technologies are insufficient. For difficult and noisy handwritten documents, we propose holistic keyword-spotting techniques that reduce the search space and complement the recognition-based approaches. We will also build and demonstrate information access and retrieval schemes over a joint space of image features and noisy text, enabling a set of immediate practical applications. The methods will be validated on two focussed collections during the project.