Proposal Number: 5326
Domain: Information & Communication Technology
Theme(s): Multimodal, Multilingual and Cross-lingual Interfaces
Supporting Central Government Agency:
Budget (Rs. Lakhs): 400.00
Principal Investigator: Prof. Prabir Kumar Biswas
Principal Investigator Institute: Dept. of E & ECE, IIT Kharagpur
Co-Investigators:
Prof. Jayanta Mukhopadhyay, Dept. of CSE, IIT Kharagpur
Prof. Santanu Chaudhury, CEERI Pilani
Prof. Bhabotosh Chanda, ISI Kolkata
Prof. Shamik Sural, Dept. of CSE, IIT Kharagpur
Dr. C. V. Jawahar, IIIT Hyderabad
Prof. P. P. Das, Dept. of CSE, IIT Kharagpur
Prof. Gaurav Harit, IIT Jodhpur
Project Title
Information Access from Document Images of Indian languages
Web Abstract
We propose to develop content-aware image processing algorithms for robust and efficient recognition and retrieval from Indian language document images. The image processing algorithms aim to improve the quality of document images by removing noise and low-resolution artifacts using content-aware, shape-based morphological filters. A set of recognizers will be built using state-of-the-art machine learning techniques, such as deep learning, for handwritten, typewritten and low-resolution document images, where existing technologies are insufficient. For hard and noisy handwritten documents, we propose holistic keyword spotting techniques to reduce the search space and complement the recognition-based approaches. We will also build and demonstrate information access and retrieval schemes over a joint space of image features and noisy text, so as to enable a set of immediate practical applications. The methods will be validated on two different focussed collections during the project.