
Development of Speech Interface for Form-filling application in Five Indian languages

Primary Information

Domain

Information & Communication Technology

Project No.

5283

Sanction and Project Initiation

Sanction No: F. No.:3-18/2015-T.S.-I (Vol.III)

Sanction Date: 22/03/2017

Project Initiation date: 20/04/2017

Project Duration: 24

Partner Ministry/Agency/Industry

MHRD & MeitY

 

Role of partner: The project was sanctioned under the ICT domain with financial support from MHRD and MeitY. The role of the agencies is to provide financial support and to monitor and review progress towards the objectives of the project proposal.

 

Support from partner: MHRD supports 50% of the budget. MeitY has agreed to support the remaining 50% of the budget but is yet to release it.

Principal Investigator


Dr. K. Sreenivasa Rao
Indian Institute of Technology Kharagpur

Host Institute

Co-PIs


Dr. S. R. M. Prasanna
Indian Institute of Technology Guwahati


Dr. Priyankoo Sarmah
Indian Institute of Technology Guwahati


Dr. Shrivastava Abhishek
Indian Institute of Technology Guwahati


Dr. M. Sabarimalai Manikandan
Indian Institute of Technology Bhubaneswar


Dr. K. Sri Rama Murty
Indian Institute of Technology Hyderabad


Dr. Pabitra Mitra
Indian Institute of Technology Kharagpur

 

Scope and Objectives

The objective of this project is to develop a modular framework for building speech interfaces for form-filling applications (SiFA). In this project, we implement and deploy SiFA systems in five Indian languages: Assamese, Bengali, Hindi, Odia and Telugu. Such a framework can easily be extended to other Indian languages as well. The project aims to provide an integrated framework for developing IVRS, ASR and TTS systems across the languages by providing language models and dictionaries specific to various forms and domains. Finally, we demonstrate the developed system with selected e-governance application forms. The specific scope of this project includes the following:

  • Voice-Enabled Application Form Filling Kiosk
  • National Train Ticket Reservation Kiosk
  • Service-Oriented Appointment Request Kiosk
  • Voice Enabled Education Admission Form Portal
  • Pincode Based Information Search and Delivery Kiosk or App

Deliverables

Major outcomes of this project are summarized below:

The proposed field-specific and/or application-specific ASR systems can automate the spoken filling of fields in application forms such as ID card issue, admission, service-request, and service-feedback forms. Such forms are submitted daily by native and non-native speakers in a wide variety of service sectors, including Aadhaar Card, PAN Card, Gas Connection, Driving License, National Train Ticket Reservation, Education, Healthcare, Tourism, Banking, and other public and private service sectors.

The proposed ASR and TTS technologies have great potential for developing the following indigenous voice-enabled products in different regional languages:

  • Voice Enabled Train Ticket Reservation Kiosk (Stand-Alone Kiosk): This kiosk generates a voice-based train ticket reservation slip with the following essential fields: names and/or codes of the source and destination stations, reservation category, class of journey, date of journey, passenger details (number of passengers, name of each passenger, age, gender, berth), mobile numbers, and identity card numbers. It also supports reservation form preview, online submission, print, and OTP verification.
  • Voice Enabled Query-Oriented Feedback Submission Kiosk: This kiosk collects feedback (Yes/No) from customers, consumers and individuals on queries related to the services provided to them. It enables remote monitoring of the quality of services. The ASR system recognizes the spoken words and then prints Yes, No or No Answer, or the tick and X marks.
  • Voice Assisted Text Reading System in Five Languages: A smart voice assistant tab is developed for reading printed text or recognized text in five languages. The tab includes text processing tools and a TTS system.
  • Pincode Based Information Search and Delivery App: This app can be used to search for specific information based on the spoken pincode, such as nearby police stations, hospitals, schools, tourist places and so on. The app can recognize the pincode and fields such as police station and school, and deliver the information in the local languages.
  • Quality-Aware ASR Engine for Outdoor Voice-Enabled Applications: The quality-aware ASR engine can reduce the energy consumption of the portable ASR device and bandwidth utilization costs, and can also improve ASR performance on noisy recordings.
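As an illustration of the pincode field described above, the following is a minimal sketch of how a recognized digit sequence might be turned into a validated pincode. It assumes the recognizer returns one word per spoken digit; the English vocabulary `DIGIT_WORDS` and the function `words_to_pincode` are hypothetical names introduced for this example, and a deployed system would use digit words in each target language. The validity check relies only on the fact that an Indian PIN code has six digits and does not start with zero.

```python
from typing import Optional

# Hypothetical English digit vocabulary (an assumption for illustration only);
# a real kiosk would carry one such table per supported language.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_pincode(hypothesis: str) -> Optional[str]:
    """Convert a recognized digit-word string to a 6-digit pincode.

    Returns None when any word is outside the digit vocabulary or when
    the result is not a plausible Indian PIN code.
    """
    digits = []
    for token in hypothesis.lower().split():
        if token not in DIGIT_WORDS:
            return None  # out-of-vocabulary word: reject the whole hypothesis
        digits.append(DIGIT_WORDS[token])
    pin = "".join(digits)
    # Indian PIN codes have exactly six digits and never start with 0.
    if len(pin) == 6 and pin[0] != "0":
        return pin
    return None
```

Rejecting the whole hypothesis on any unexpected word is a deliberate choice in this sketch: for a closed field such as a pincode, prompting the speaker again is usually safer than guessing.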

 



Scientific Output

Each partner institute has collected a minimum of 5 hours of annotated speech in its local language (Telugu by IIT Hyderabad, Odia by IIT Bhubaneswar, Bengali by IIT Kharagpur and Assamese by IIT Guwahati). Using the collected annotated speech, acoustic models were built with GMM-HMM (monophone, tri-1, tri-2 and tri-3) and DNNs.

 


Results and outcome till date

The outcomes of this project are summarized below:

  • Online interface for speech recording over mobile phones
  • Speech corpora in Assamese, Hindi, English and Telugu
  • ASR modules for Hindi, English and Telugu
  • First versions of TTS systems for Hindi, English, Telugu, Odia, Assamese and Bengali
  • ASR and TTS systems integrated for English

 


Societal benefit and impact anticipated

The aim of this project is to design and develop field-specific and/or application-specific ASR and TTS systems for five Indian languages in addition to English. The project mainly focuses on implementing ASR and TTS systems for recognizing the most important fields, including Name, Gender, Age, Date of Birth, Pincode, Place of Birth, Telephone Number, Marital Status, Address, Nationality, E-mail ID, Education, ID Card, Reservation Category, Caste, Religion, Train Station Code, Train Station Name, Reservation Quota, Class of Journey, Berth, and so on. In addition to these fields, feedback fields such as Yes/No and Grading/Scoring (Excellent, Very Good, Good, Bad, Very Bad) are considered for developing a spoken feedback form-filling system.

The societal benefits of the outcomes of this project are summarized below:

The proposed field-specific and/or application-specific ASR systems can automate the spoken filling of fields in application forms such as ID card issue, admission, service-request, and service-feedback forms. Such forms are submitted daily by native and non-native speakers in a wide variety of service sectors, including Aadhaar Card, PAN Card, Gas Connection, Driving License, National Train Ticket Reservation, Education, Healthcare, Tourism, Banking, and other public and private service sectors.

The proposed ASR and TTS technologies have great potential for developing the following indigenous voice-enabled products in different regional languages:

  • Voice-Enabled Application Form Filling Kiosk
  • National Train Ticket Reservation Kiosk
  • Service-Oriented Appointment Request Kiosk
  • Voice Enabled Education Admission Form Portal
  • Pincode Based Information Search and Delivery Kiosk or App
  • Voice Enabled Query-Oriented Feedback Submission Kiosk: It can be used for remotely monitoring and managing the quality of services provided to citizens who speak their regional language(s) and/or Hindi.
  • Multilingual Information Delivery System: It can be used for delivering multilingual information related to Healthcare, Agriculture, Tourism, and Weather in the regional language(s).
  • Smart Voice Assistant: The multilingual TTS systems can be used to develop a voice-assisted text reading system that helps visually impaired persons by reading any printed text aloud.
  • Online Speech Recording (OSR) Interface: The OSR interface was developed for collecting large-scale speech corpora remotely for building speech-enabled technologies such as automatic speech recognizers and synthesizers. A snapshot of the OSR interface is provided in the progress report.

In the last review meeting, the voice-enabled Aadhaar card enrollment form-filling interface was demonstrated based on the outcomes of our proposed ASR and TTS technologies. The recognition performance of the baseline ASR technology was evaluated against publicly available ASR systems on the same set of fields. Recognition accuracy for the name field was found to be poor, and recognition performance degraded under noisy recording conditions. Publicly available ASR and TTS systems are subject to intellectual property rights, and their recognition runs in the cloud, which demands reliable network connectivity. Our project proposal therefore aims to develop indigenous, lightweight, application-specific ASR and TTS tools for stand-alone kiosk, portable device, and web-server based voice-enabled technologies.
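For closed feedback fields such as the Grading/Scoring scale mentioned above, one simple way to exploit the small vocabulary is to snap a noisy recognizer hypothesis onto the nearest allowed value. The sketch below is an assumption for illustration, not the project's method: it uses the Python standard library's `difflib.get_close_matches`, and the function name `match_grade` and the similarity cutoff of 0.75 are hypothetical choices.

```python
import difflib
from typing import Optional

# The five grading options named in the feedback form description.
GRADES = ["excellent", "very good", "good", "bad", "very bad"]


def match_grade(hypothesis: str, cutoff: float = 0.75) -> Optional[str]:
    """Map a recognizer hypothesis to the closest allowed grade.

    Returns None when nothing in the closed vocabulary is similar enough,
    so the caller can treat the answer as "No Answer" or re-prompt.
    """
    # Normalize case and whitespace before comparing.
    text = " ".join(hypothesis.lower().split())
    matches = difflib.get_close_matches(text, GRADES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Character-level string similarity is only a stand-in here; a field-specific recognizer would normally constrain the vocabulary inside the decoder itself rather than correcting its output afterwards.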

Next steps

In the second phase, we mainly focus on the development of the following domain-specific voice-enabled technological products:

  • Voice Enabled Train Ticket Reservation Kiosk (Stand-Alone Kiosk): This kiosk generates a voice-based train ticket reservation slip with the following essential fields: names and/or codes of the source and destination stations, reservation category, class of journey, date of journey, passenger details (number of passengers, name of each passenger, age, gender, berth), mobile numbers, and identity card numbers. It also supports reservation form preview, online submission, print, and OTP verification.
  • Voice-Enabled Application Form Filling Kiosk
  • Voice Enabled Query-Oriented Feedback Submission Kiosk: This kiosk collects feedback (Yes/No) from customers, consumers and individuals on queries related to the services provided to them. It enables remote monitoring of the quality of services. The ASR system recognizes the spoken words and then prints Yes, No or No Answer, or the tick and X marks.
  • Voice Assisted Text Reading System in Five Languages: A smart voice assistant tab is developed for reading printed text or recognized text in five languages. The tab includes text processing tools and a TTS system.
  • Pincode Based Information Search and Delivery App: This app can be used to search for specific information based on the spoken pincode, such as nearby police stations, hospitals, schools, tourist places and so on. The app can recognize the pincode and fields such as police station and school, and deliver the information in the local languages.
  • Quality-Aware ASR Engine for Outdoor Voice-Enabled Applications: The quality-aware ASR engine can reduce the energy consumption of the portable ASR device and bandwidth utilization costs, and can also improve ASR performance on noisy recordings.
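This report does not describe how the quality-aware ASR engine assesses speech quality. Purely as an illustration of the idea, the sketch below gates recognition on a crude signal-to-noise estimate, so that very noisy recordings are neither decoded nor transmitted. Everything in it is an assumption: the function names, the 160-sample frame length, the use of the quietest and loudest 10% of frames as noise and speech estimates, and the 10 dB threshold.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples: Sequence[float], frame_len: int = 160) -> float:
    """Crude SNR estimate: loudest 10% of frames over quietest 10% of frames."""
    energies = sorted(frame_energies(samples, frame_len))
    if not energies:
        raise ValueError("recording is shorter than one frame")
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k    # quietest frames approximate the noise floor
    speech = sum(energies[-k:]) / k  # loudest frames approximate speech level
    eps = 1e-12                      # avoids division by zero on digital silence
    return 10.0 * math.log10((speech + eps) / (noise + eps))


def should_recognize(samples: Sequence[float], min_snr_db: float = 10.0) -> bool:
    """Return True when the recording looks clean enough to send to the ASR."""
    return estimate_snr_db(samples) >= min_snr_db
```

A kiosk using such a gate could ask the speaker to repeat the field instead of spending battery and bandwidth on a recording that is unlikely to be recognized correctly.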

Publications and reports

1. Saurabhchand Bhati, Hermann Kamper and K Sri Rama Murty, Phoneme-based embedded K-Means for unsupervised term discovery, in Proc. ICASSP-2018, Calgary, Canada, Apr. 2018.
2. Shekhar Nayak, Saurabhchand Bhati and K Sri Rama Murty, An Investigation into Instantaneous Frequency Estimation Methods for Improved Speech Recognition Features, in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Nov. 14-16, 2017, Montreal, Canada.
3. Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi and V Ramasubramanian, Indian languages ASR: A multilingual phone recognition framework with IPA based common phone-set, predicted articulatory features and feature fusion, in INTERSPEECH (ISCA), 2018, Hyderabad, India, Sep. 2018.
4. M. Kiran Reddy and K. Sreenivasa Rao, Inverse filter based excitation model for HMM-based speech synthesis system, IET Signal Processing, Vol. 12, pp. 544-548, 2018.
5. M. Kiran Reddy and K. Sreenivasa Rao, Robust pitch extraction method for HMM-based speech synthesis system, IEEE Signal Processing Letters, Vol. 24 (8), pp. 1133-1137, 2017.
6. Kishore Kumar Ravi, Lokendra Birla and K Sreenivasa Rao, A Robust Unsupervised Pattern Discovery and Clustering of Speech Signals, Pattern Recognition Letters (Elsevier), 2018, (Accepted).

Patents

The following patent documents are under preparation:

  • Method and System for Speech Quality Aware Automatic Speech Recognition System for improving recognition accuracy and battery lifetime of Automatic Speech Recognition devices
  • Interactive Online Speech Recording Apparatus for remotely collecting large-scale speech corpora for building speech enabled technologies
  • Method and System for Lightweight Form Filling Application and Query-Based Feedback Form Interfaces
  • Method and System for Interactive Single Icon Based Voice Enabled Application-Form and Query-Feedback Design Studio
  • Method and Apparatus for Interactive User-Defined Application Specific Automatic Speech Recognition System

Scholars and Project Staff

1. Abhash Deka, Assistant Project Engineer, 1 yr, IIT Guwahati
2. Shruthi B.S., Assistant Project Engineer, 1 yr, IIT Guwahati
3. Priya Dharshini G, Senior Scientific Officer, 2 yrs, IIT KGP
4. Sudhakar P, Senior Scientific Officer, 2 yrs, IIT KGP
5. Madhu Keerthana, Senior Scientific Officer, 2 yrs, IIT KGP
6. Aravinda Reddy PN, Senior Scientific Officer, 2 yrs, IIT KGP
7. Palli Mishra, Project Staff, 1 yr, IIT BBS
8. Lipsa Routray, Project Staff, 1 yr, IIT BBS
9. Devi Moonchandh, Project Staff, 1 yr, IIT BBS
10. Saurabhchand Bhati, Senior Research Fellow, 7 months, IITH
11. A Shiv Ganesh, Senior Research Fellow, 10 months, IITH
12. V. Venkatesh, Intern, 9 months, IITH
13. C. Sivakumar, Intern, 9 months, IITH
14. S Sreekanth, Intern, 12 months, IITH
15. G Ramesh, Intern, 12 months, IITH
16. G Naveen, Intern, 12 months, IITH
17. R Gowriprasad, Intern, 10 months, IITH

Challenges faced

Some of the difficulties are summarized below:

  • The word error rate (WER) of the automatic speech recognition system is higher for named entities. We are currently focusing on improving named-entity recognition accuracy.
  • The performance of the ASR system needs to be further improved for noisy speech recorded in outdoor environments.
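For reference, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following minimal sketch implements that standard definition with a word-level edit distance; the function name `word_error_rate` and the example sentences are illustrative and not taken from the project's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because a misrecognized name counts as a full word error however close the spelling is, name-heavy fields tend to score worse on this metric than fields with a small closed vocabulary.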

Other information

Non-disbursal of 50% of the budget is the primary hurdle of the project. Owing to the uncertainty arising from the paucity of funds, the PI and the co-PIs were cautious in spending the funds already disbursed, so equipment was purchased with only half the planned budget in some of the institutions.

Financial Information

  • Total sanction: Rs. 37680000

  • Amount received: Rs. 18840000

  • Amount utilised for Equipment: Rs. 2967329

  • Amount utilised for Manpower: Rs. 3574006

  • Amount utilised for Consumables: Rs. 187473

  • Amount utilised for Contingency: Rs. 454252

  • Amount utilised for Travel: Rs. 649173

  • Amount utilised for Other Expenses: Rs. 1122318

  • Amount utilised for Overheads: Rs. 3138776
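The figures above can be cross-checked with a few lines of arithmetic. The sketch below only restates the amounts listed in this section; the variable names are illustrative, and treating the difference between the amount received and the listed heads as an unspent balance is an assumption, since the report does not state a balance.

```python
# Amounts in rupees, copied from the financial information listed above.
TOTAL_SANCTION = 37_680_000
AMOUNT_RECEIVED = 18_840_000

UTILISED = {
    "Equipment": 2_967_329,
    "Manpower": 3_574_006,
    "Consumables": 187_473,
    "Contingency": 454_252,
    "Travel": 649_173,
    "Other Expenses": 1_122_318,
    "Overheads": 3_138_776,
}

total_utilised = sum(UTILISED.values())
received_share = AMOUNT_RECEIVED / TOTAL_SANCTION
# Assumption: anything received but not listed under a head is still unspent.
balance = AMOUNT_RECEIVED - total_utilised

print(f"Share of sanction received: {received_share:.0%}")
print(f"Total utilised: Rs. {total_utilised}")
print(f"Received but not yet utilised: Rs. {balance}")
```

On these figures, the amount received is exactly half of the sanction, which matches the 50% disbursal reported elsewhere on this page, and the listed heads add up to Rs. 12093327.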

Equipment and facilities

 

  • GPU-based servers and 3 low-end GPU-based systems for real-time implementation of ASR and TTS systems
  • GPU-based workstation with 128 GB RAM