Identification of covert Geo-locations in a scene, using a Hyper-Classifier based Intelligent Robotic Visual Perception system

Primary Information


Security & Defence

Project No.


Sanction and Project Initiation

Sanction No: 3-18/2015-T.S-I(Vol.IV)

Sanction Date: 17/05/2017

Project Initiation date: 22/08/2017

Project Duration: 36 months

Partner Ministry/Agency/Industry



Role of partner: Advise on the implementation of: (i) programming and use of mobile robots for online execution of programs developed on desktop platforms, including capturing scene data and transferring it to the onboard GPU/CPU while on the move; (ii) scene ontological presentation; (iii) help and directions for identifying suitable targets for data capture and potential user agencies.


Support from partner: Two technical meetings and discussions have been held between the PI and the partner/collaborator: (i) one recently on site at IIT Madras; (ii) the other at the outset (last year) at CAIR (DRDO), Bangalore. The next meeting is being scheduled within the next 2 months.

Principal Investigator

Prof. Sukhendu Das
Indian Institute of Technology, Madras

Host Institute


Scope and Objectives

The objectives involve designing software modules for: i. Scene segmentation, object saliency detection, 3-D depth maps, reconstruction and recognition of objects, and scene ontology. ii. A Latent SVM and SOM based Hyper-classifier that learns and identifies spatial locations for concealment by fusing information/cues; scene ontological descriptors will be developed for this purpose. iii. A Faster R-CNN based deep learning model trained end-to-end directly from labeled image samples for detection of covert geo-locations. iv. Deployment on a mobile robot for verification of the prototype design. These objectives aim to impart intelligent visual perception to the robotic eye for visual exploration of hazardous/lethal places.


Deliverables (projected output, as in main proposal): (i) Software prototypes that analyze multiple video/image frames to identify potential scene locations for concealment. The software will have modules for stitching, segmentation, object detection/recognition, and depth maps from video. (ii) Scene ontological extraction and representation (with support from CAIR, DRDO). (iii) An algorithm designed to train and test a Hyper-classifier that fuses heterogeneous inputs (blobs, depth maps, object categories, scene ontologies) as cues to deliver predictions of geo-locations for concealment. (iv) Annotated video and image datasets will be developed for: (a) scene ontology creation; and (b) training a Hyper-classifier with ground-truth labels of obscured target geo-locations for concealment. The categories of object classes and the usage scenarios are to be identified upfront. (v) Deep learning models based on Caffe and YOLO will be used to perform end-to-end training of a modified Faster R-CNN architecture. (vi) Software modules developed at IIT Madras are to be deployed on a programmable mobile robot with sensors (RGBD and IR); this task is to be executed with technical support from, and in collaboration with, CAIR (DRDO), Bangalore. This will aid visual navigation and detection of covert geo-locations by a robot. The deliverables also include support for development of PC-based solutions on Microsoft Windows platforms, as well as Matlab.
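
As a rough illustration of the cue fusion described in deliverable (iii), the sketch below combines per-region cues into a single concealment score and ranks regions by it. It is not the project's Hyper-classifier: a fixed-weight logistic score stands in for the Latent SVM/SOM combination, and the names `RegionCues`, `fuse_cues`, `rank_regions`, the cue fields and the weights are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RegionCues:
    """Hypothetical per-region cues, each scaled to [0, 1]."""
    saliency: float        # output of a salient-object detector
    depth: float           # normalized distance from the camera
    occlusion: float       # fraction of the region hidden behind detected objects
    ontology_prior: float  # prior taken from a scene ontology (assumed available)


# Illustrative weights only (an assumption); a trained classifier would learn these.
WEIGHTS = {"saliency": -1.0, "depth": 1.5, "occlusion": 2.0, "ontology_prior": 2.5}
BIAS = -3.0


def fuse_cues(cues: RegionCues) -> float:
    """Return a concealment score in (0, 1) from a weighted sum of the cues."""
    z = BIAS + sum(w * getattr(cues, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_regions(regions: dict[str, RegionCues]) -> list[tuple[str, float]]:
    """Sort region names by descending concealment score."""
    scored = [(name, fuse_cues(c)) for name, c in regions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With two made-up regions, for example `"open_floor": RegionCues(0.9, 0.1, 0.0, 0.0)` and `"behind_sofa": RegionCues(0.1, 0.8, 0.9, 0.9)`, `rank_regions` places `behind_sofa` first. Replacing the fixed weights with a learned model is where the annotated datasets of deliverable (iv) would come in.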


Scientific Output

Intermediate stages of development at the initial phase consist of: (i) panorama from video (our own method); (ii) salient object detection (T-PAMI '17) from image/panorama; (iii) simple object detection using the SSD (CVPR '17) algorithm; (iv) segmentation (DeepLab-ResNet in TensorFlow) of indoor scenes into components.


Results and outcome till date

Intermediate stages of development currently consist of: (i) generating panorama from video (our own method); (ii) improved results of salient object detection (T-PAMI '17) from image/panorama scenes; (iii) fine-tuned object detection using the YOLO (CVPR '18) algorithm; (iv) semantic segmentation (Lightweight DeepLabV3+, ECCV '18) of indoor scenes into components; (v) novel segmentation of night or low-light scenes (our method, ICIP '19); (vi) SLAM output using ORB-SLAM (ECCV '18), running on desktop and laptop GPUs/CPUs; (vii) a novel annotated dataset for in-house object detection and covert scene (geo-)locations (annotation of the latter is ongoing).


Societal benefit and impact anticipated

Since this is a military application, its long-term benefits for society are beyond question. Clandestine operations are necessary in many sensitive areas and under severely adverse conditions, where human lives are in danger. If properly used by the Indian defence forces, this outcome will benefit them in the long run and protect our nation from greater damage in the future.

Next steps

1. Annotation work (started last month) on the captured datasets, for training deep and shallow networks. 2. Capture more data in large lounges, seminar halls, suite rooms, etc. 3. Survey the state of the art in object detection, depth estimation, SLAM-based reconstruction and semi-supervised attention-based saliency algorithms; implement them and verify their performance. 4. Once annotation is in place (both for object identification and for covert geo-locations), deep and shallow networks will be trained on a large set of samples.
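
To make the two annotation targets in steps 1 and 4 concrete, here is a minimal sketch of what one annotated frame might look like, with a basic sanity check on box coordinates. The record layout, the class names `Box` and `FrameAnnotation`, and the `"covert"` label are assumptions for illustration; the project's actual annotation format is not described here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str  # object category, or "covert" for a concealment location


@dataclass
class FrameAnnotation:
    """Annotations for one image or video frame."""
    frame_id: str
    width: int
    height: int
    objects: list[Box] = field(default_factory=list)
    covert_locations: list[Box] = field(default_factory=list)

    def invalid_boxes(self) -> list[Box]:
        """Return boxes that are empty or extend outside the frame."""
        def bad(b: Box) -> bool:
            return not (0 <= b.x_min < b.x_max <= self.width
                        and 0 <= b.y_min < b.y_max <= self.height)
        return [b for b in self.objects + self.covert_locations if bad(b)]
```

Running `invalid_boxes()` over every frame before training is a cheap way to catch annotation slips, such as a box drawn past the image border.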

Publications and reports

1. "Moving Object Segmentation for Jittery Videos, by Clustering of Stabilized Latent Trajectories"; Geethu Miriam Jacob and Sukhendu Das; Image and Vision Computing (Elsevier) (Impact Factor 2.94); Volume 64, August 2017, pp. 10-22; DOI:10.1016/j.imavis.2017.05.002.
2. "Panorama from Representative Frames of Unconstrained Videos Using DiffeoMeshes"; Geethu Miriam Jacob and Sukhendu Das; 14th Asian Conference on Computer Vision (ACCV), Perth, WA, Australia, December 2-6, 2018, pp. 166-182; DOI:10.1007/978-3-030-20893-6_11.
3. "What's there in the Dark"; Sauradip Nag, Saptakatha Adak and Sukhendu Das; 26th IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, September 22-25, 2019, pp. 2996-3000; DOI:10.1109/ICIP.2019.8803299.
4. "Visual Saliency Detection via Convolutional Gated Recurrent Units"; Sayanti Bardhan, Sukhendu Das and Shibu Jacob; accepted in 26th International Conference on Neural Information Processing (ICONIP), Sydney, Australia, December 12-15, 2019.



Scholars and Project Staff

Project Officers: Kitty Varghese - July '18 - Nov. '19.
Project Associates: Sadbhavana Babar M - July '18 - Nov. '19; Sauradip Nag - June '18 - Nov. '19; Binoy Kumar Saha - July '19 - July '20.
Project Assistants/Attendants: N. Mahalakshmi - Jan. 2019 - Jan 2020

Challenges faced

Since trained manpower is generally not available in this high-end, technology-intensive sector, we were compelled to recruit mostly fresh B.Tech graduates and train them through various courses (Deep Learning, GPU programming, Vision, Machine Learning, etc.) running in the Deptt. The staff/scholars are being trained and will be able to deliver output mostly from mid-2019 onwards. This is the main cause of the mild delay in progress, as it appears now.

Other information

Planned tasks (next 4-6 months), currently pursued: (i) exploring the latest techniques (deep CNN, latent SVM, multi-task or DA-based identification) to improve the performance of scene segmentation and saliency, object detection and recognition (normal and low lighting), and depth from a single image/panorama; (ii) SLAM-based 3D reconstruction, including scene structure (3D) from the point cloud generated by SLAM; (iii) combining cues from depth, saliency, object detection, etc., for identification of covert geo-locations, using the annotated samples being prepared; (iv) an interface for sensors and GPU with the onboard system on robots, for which help is expected from CAIR (DRDO), Bangalore, as per the proposal; (v) scene ontology generation (initiated): scene graph generation and a query language format, for limited content. For the latest results and demos/videos, see our site: vplab/IMPRINT_2019_index.html (insert a tilde character before vplab/...).
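
Task (v) mentions scene graph generation and a query format. The sketch below shows one simple way such a graph could be stored and queried, as subject-predicate-object triples with wildcard matching. It is an assumption made for illustration, not the project's ontology or query language; `Relation`, `SceneGraph` and the example predicates are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One edge of a scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


class SceneGraph:
    """A tiny scene graph stored as a list of relation triples."""

    def __init__(self) -> None:
        self.relations: list[Relation] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record that `subject` stands in relation `predicate` to `obj`."""
        self.relations.append(Relation(subject, predicate, obj))

    def query(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Relation]:
        """Return relations matching every field that is not None."""
        return [
            r for r in self.relations
            if (subject is None or r.subject == subject)
            and (predicate is None or r.predicate == predicate)
            and (obj is None or r.obj == obj)
        ]


# Example scene: a sofa against a wall, with a gap behind it.
graph = SceneGraph()
graph.add("sofa", "against", "wall")
graph.add("gap", "behind", "sofa")
graph.add("table", "in_front_of", "sofa")
hidden = graph.query(predicate="behind")
```

Here `hidden` holds the single relation `gap behind sofa`. A query such as "everything related to the sofa" would be `graph.query(obj="sofa")`, which returns both the gap and the table.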

Financial Information

  • Total sanction: Rs. 187.02 lakhs

  • Amount received: Rs. 142.59 lakhs

  • Amount utilised for Equipment: Rs. 44.62 lakhs

  • Amount utilised for Manpower: Rs. 30.98 lakhs

  • Amount utilised for Consumables: Rs. 14.04 lakhs

  • Amount utilised for Contingency: Rs. 0.92 lakhs

  • Amount utilised for Travel: Rs. 11.62 lakhs

  • Amount utilised for Other Expenses: Rs. 0 lakhs

  • Amount utilised for Overheads: Rs. 23.12 lakhs
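
The figures above can be cross-checked with a few lines of arithmetic. The sketch below sums the utilisation heads and derives the unspent balance and the amount yet to be received. It assumes the listed heads are exhaustive and non-overlapping; the variable names are illustrative.

```python
from decimal import Decimal

# Figures copied from the list above, in lakhs of rupees.
TOTAL_SANCTION = Decimal("187.02")
AMOUNT_RECEIVED = Decimal("142.59")
UTILISED = {
    "Equipment": Decimal("44.62"),
    "Manpower": Decimal("30.98"),
    "Consumables": Decimal("14.04"),
    "Contingency": Decimal("0.92"),
    "Travel": Decimal("11.62"),
    "Other Expenses": Decimal("0"),
    "Overheads": Decimal("23.12"),
}

# Decimal avoids binary floating-point rounding in the sums.
total_utilised = sum(UTILISED.values(), Decimal("0"))
unspent_balance = AMOUNT_RECEIVED - total_utilised
yet_to_be_received = TOTAL_SANCTION - AMOUNT_RECEIVED

print(f"Total utilised:     Rs. {total_utilised} lakhs")
print(f"Unspent balance:    Rs. {unspent_balance} lakhs")
print(f"Yet to be received: Rs. {yet_to_be_received} lakhs")
```

Under that assumption, the total utilised comes to Rs. 125.30 lakhs, leaving Rs. 17.29 lakhs of the received amount unspent and Rs. 44.43 lakhs of the sanction still to be received.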

Equipment and facilities


Equipment purchased: (i) Quad GPU Machine - 1 No.; (ii) Dual GPU Machine - 2 Nos.; (iii) Single GPU Machine - 2 Nos.; (iv) High-end workstations - 2 Nos.; (v) ROBOT (Indoor) OX-DELTA 4 WD Research Platform - 1 No.; (vi) ROBOT (Outdoor) Fire Bird VI Robotics Research Platform - 1 No.; (vii) CAMERA Nikon D7200 - 24 MP DSLR, with Tripod & Dolly - 1 No. Facility created: VICOGSA (VIsio-COGnitive Smart Agent) Laboratory set up in the Deptt. of CS&E, IIT Madras, for design and experiments with visual-perceptive intelligent robots.