Dr. Nabin Sharma is a Senior Lecturer in the School of Software, Faculty of Engineering & IT, at UTS. He graduated with a PhD from the School of ICT, Griffith University, Queensland, Australia. His research focuses on video and image processing, pattern recognition, and machine learning techniques for object detection and recognition. He has more than 14 years of experience in research & development and academia, and substantial industry experience in software design and development gained while working on various projects at IBM India Private Ltd.
He has published over 45 papers in refereed books, conferences and journals. His Google Scholar profile reports 941 citations, an i10-index of 16 and an h-index of 13. He has secured research grants exceeding AUD$341K for projects in collaboration with industry and academia. He received the runner-up award in ‘The Young IT Professional Award Competition 2006’ organized by the Computer Society of India (CSI), East Zone, and was also nominated for the prestigious ‘Young Scientist Award 2006’ of the Indian Science Congress Association. He received the ‘Spirit of GovHack 2015’ award at GovHack 2015, Griffith University Gold Coast Campus. He was a winner of iAward NSW 2018 in the following three major categories for the SharkSpotter project:
- Research and Development Project of the Year,
- Artificial Intelligence or Machine Learning Innovation of the Year, and
- Community Service Markets
The Australian Information Industry Association (AIIA) NSW iAwards is the nation’s leading awards program for innovation in the digital economy.
He also won the National iAwards 2018 for the SharkSpotter project in the Artificial Intelligence or Machine Learning Innovation of the Year category.
He received a Merit Award at the Asia Pacific ICT Alliance Awards (APICTA) 2018 for the SharkSpotter project in the Artificial Intelligence Technology of the Year category.
- Senior Member of the Institute of Electrical and Electronics Engineers (IEEE)
- Member of the Association for Computing Machinery (ACM)
- Life Member of the Indian Unit for Pattern Recognition and Artificial Intelligence (IUPRAI) (IAPR)
- Member of the Australian Water Association (AWA)
- Pattern Recognition (Elsevier)
- Pattern Recognition Letters (Elsevier)
- ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
- Malaysian Journal of Computer Science
- IPSJ Transactions on Computer Vision and Applications (Springer)
- Frontiers of Computer Science
- Connection Science
- International Journal of Pattern Recognition and Artificial Intelligence
- Karbala International Journal of Modern Science
- Journal of Imaging
- International Conference on Pattern Recognition (ICPR) 2012, 2014, 2016
- Digital Image Computing: Techniques and Applications (DICTA) 2016, 2017, 2018
- International Conference on Document Analysis and Recognition (ICDAR) 2013
- International Conference on Frontiers in Handwriting Recognition (ICFHR) 2014, 2016
- International Conference on Intelligent Systems Design and Applications (ISDA) 2012
- International Conference on Hybrid Intelligent Systems (HIS) 2012
- IAPR International Workshop on Document Analysis Systems (DAS) 2012
- International Workshop on Camera-Based Document Analysis and Recognition (CBDAR) 2013
- International Conference on Advances in Pattern Recognition (ICAPR) 2017
- Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI) 2016, 2018
- IAPR International Workshop on Document Analysis Systems (DAS) 2012
- Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI) 2016
- International Conference on Man-Machine Interactions (ICMMI) 2017
- Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI) 2018
- International Conference on Digital Image Computing: Techniques and Applications (DICTA) 2017
- 16th Australasian Symposium on Parallel and Distributed Computing (AusPDC) 2018
- Organizing Chair: 15th International Conference on Document Analysis and Recognition (ICDAR) 2019, Sydney.
- 2014 – Present: Associate Editor - Gate to Computer Vision and Pattern Recognition (gtCVPR) journal.
Conference Special Session organizer:
- Deep learning for computer vision: theory and applications, International Conference on Neural Information Processing (ICONIP) 2017.
- Feature extraction and learning on image and text data, 2018 IEEE World Congress on Computational Intelligence, IJCNN.
- Advances in Document Analysis and Recognition, 2018 IEEE World Congress on Computational Intelligence, IJCNN.
Can supervise: YES
Nabin's research interests include object detection, document analysis, handwritten character recognition, biometrics, machine learning, image processing, and pattern recognition. Nabin has published several research papers and book chapters based on his work.
His current research focuses on marine animal detection and beach surveillance from aerial imagery, human gesture recognition, crowd analysis, and the use of deep learning techniques to solve object detection and classification problems, to mention a few.
Fundamentals of Software Development (32555)
.Net Application Development (32998)
Application Development with .Net (31927)
Deep Learning and Convolutional Neural Networks (42028)
- Programming: Java, Python, C, C++
- Deep Learning and Convolutional Neural Networks
- Data Structures and Algorithms
- System Analysis and Design
Saqib, M, Khan, SD, Sharma, N & Blumenstein, M 2019, 'Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks', IEEE Access, vol. 7, pp. 35317-35329.
© 2013 IEEE. Crowd counting and density estimation is an important and challenging problem in the visual analysis of crowds. Most existing approaches use regression on density maps to obtain the crowd count from a single image. However, these methods cannot localize individual pedestrians and therefore cannot estimate the actual distribution of pedestrians in the environment. Detection-based methods, on the other hand, detect and localize pedestrians in the scene, but their performance degrades in high-density situations. To overcome the limitations of pedestrian detectors, we propose a motion-guided filter (MGF) that exploits spatial and temporal information between consecutive frames of the video to recover missed detections. Our framework is based on a deep convolutional neural network (DCNN) for crowd counting in low-to-medium density videos. We employ various state-of-the-art network architectures, namely Visual Geometry Group (VGG16), Zeiler and Fergus (ZF), and VGGM, in the framework of a region-based DCNN for detecting pedestrians. After pedestrian detection, the proposed motion-guided filter is applied. We evaluate the performance of our approach on three publicly available datasets. The experimental results demonstrate the effectiveness of our approach, which significantly improves the performance of state-of-the-art detectors.
Chou, K-P, Prasad, M, Wu, D, Sharma, N, Li, D-L, Lin, Y-F, Blumenstein, M, Lin, W-C & Lin, C-T 2018, 'Robust Feature-Based Automated Multi-View Human Action Recognition System', IEEE Access, vol. 6, pp. 15283-15296.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2015, 'Piece-wise linearity based method for text frame classification in video', Pattern Recognition, vol. 48, no. 3, pp. 862-881.
Pal, U, Jayadevan, R & Sharma, N 2012, 'Handwriting recognition in indian regional scripts: A survey of offline techniques', ACM Transactions on Asian Language Information Processing, vol. 11, no. 1.
Offline handwriting recognition in Indian regional scripts is an interesting area of research as almost 460 million people in India use regional scripts. The nine major Indian regional scripts are Bangla (for Bengali and Assamese languages), Gujarati, Kannada, Malayalam, Oriya, Gurumukhi (for Punjabi language), Tamil, Telugu, and Nastaliq (for Urdu language). A state-of-the-art survey of the techniques available in the area of offline handwriting recognition (OHR) in Indian regional scripts will be of great aid to researchers in the subcontinent, and hence a sincere attempt is made in this article to discuss the advancements reported in this regard during the last few decades. The survey is organized into different sections. A brief introduction is given initially about automatic recognition of handwriting and official regional scripts in India. The nine regional scripts are then categorized into four subgroups based on their similarity and evolution information. The first group contains Bangla, Oriya, Gujarati and Gurumukhi scripts. The second group contains Kannada and Telugu scripts and the third group contains Tamil and Malayalam scripts. The fourth group contains only Nastaliq script (Perso-Arabic script for Urdu), which is not an Indo-Aryan script. Various feature extraction and classification techniques associated with the offline handwriting recognition of the regional scripts are discussed in this survey. As it is important to identify the script before the recognition step, a section is dedicated to handwritten script identification techniques. A benchmarking database is very important for any pattern recognition related research. The details of the datasets available in different Indian regional scripts are also mentioned in the article. A separate section is dedicated to the observations made, future scope, and existing difficulties related to handwriting recognition in Indian regional scripts. We hope that this survey will serve as a compendi...
Saqib, M, Daud Khan, S, Sharma, N, Scully-Power, P, Butcher, P, Colefax, A & Blumenstein, M 2019, 'Real-Time Drone Surveillance and Population Estimation of Marine Animals from Aerial Imagery', International Conference Image and Vision Computing New Zealand.
© 2018 IEEE. Video analysis is being rapidly adopted by marine biologists to assess the population and migration of marine animals. Manual analysis of videos by human observers is labor-intensive and prone to error. The automatic analysis of videos using state-of-the-art deep learning object detectors provides a cost-effective way to study marine animal populations and their ecosystem. However, video analysis poses many challenges, such as background clutter, illumination, occlusions, and deformation. Due to the high density of objects in the images and severe occlusion, current state-of-the-art object detectors often produce multiple detections. Therefore, a customized Non-Maxima Suppression is applied after detection to suppress false positives, which significantly improves the counting and the mean average precision of the detections. An end-to-end deep learning framework, Faster R-CNN, was adopted for detection with the base architectures VGG16, VGGM and ZF.
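For readers unfamiliar with Non-Maxima Suppression, the sketch below shows the standard greedy, IoU-based variant that such customized versions typically build on. It is an illustrative assumption, not the customized NMS from the paper: the function names, the `(x1, y1, x2, y2)` box format and the default IoU threshold of 0.5 are all hypothetical choices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring remaining box, discard every box
    that overlaps it by more than iou_threshold, and repeat.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

In dense scenes the threshold trades missed animals (too low) against duplicate counts (too high), which is presumably the kind of tuning a customized variant addresses.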
Wu, D, Sharma, N & Blumenstein, M 2019, 'An End-to-End Hierarchical Classification Approach for Similar Gesture Recognition', International Conference Image and Vision Computing New Zealand.
© 2018 IEEE. Human action recognition from RGB video is widely applied in various real-world applications. Much work has been done by researchers in computer vision and machine learning to address the challenges and complexity involved in video-based human action recognition. Deep learning approaches, including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), have been introduced in human action recognition research. However, due to the drawbacks of CNNs, recognizing actions with similar gestures and describing complex actions are still very challenging. Hence, this paper proposes an end-to-end hierarchical classification architecture to resolve the confusion between similar gestures. In stage 1, the proposed approach classifies the whole dataset and computes the accuracy for each class. Based on the confusion matrix obtained from stage 1, stage 2 combines the most confused similar gesture pairs into one class and classifies them along with all other classes. In stage 3, the similar gesture pairs are classified by binary classifiers, which increases the performance of each class and the overall accuracy. We apply and evaluate the developed models to recognize similar human actions on both the KTH and UCF101 datasets. The results show that the proposed approach boosts classification performance on both datasets. The proposed architecture is robust, and any classification technique can be used in stages 1 and 2.
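The step from stage 1 to stage 2 can be illustrated with a small sketch: pick the class pair with the largest symmetric confusion and relabel it as one merged class. This is a simplified assumption for illustration only. It merges just the single most confused pair, whereas the paper may merge several, and the function names are hypothetical.

```python
from typing import List, Tuple


def most_confused_pair(cm: List[List[int]]) -> Tuple[int, int]:
    """Return (i, j) with i < j that maximizes cm[i][j] + cm[j][i],
    i.e. the pair of classes most often mistaken for each other.
    Rows are true classes, columns are predicted classes; needs >= 2 classes."""
    n = len(cm)
    best, best_score = (0, 1), -1
    for i in range(n):
        for j in range(i + 1, n):
            score = cm[i][j] + cm[j][i]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def merge_pair(labels: List[int], pair: Tuple[int, int]) -> List[int]:
    """Stage-2 relabelling: map class pair[1] onto pair[0] so that the
    confused pair is trained and predicted as a single merged class."""
    keep_id, merged_id = pair
    return [keep_id if y == merged_id else y for y in labels]


# Toy stage-1 confusion matrix: classes 1 and 2 are often confused.
cm = [[8, 1, 1],
      [0, 6, 4],
      [0, 3, 7]]
pair = most_confused_pair(cm)
print(pair)                               # (1, 2)
print(merge_pair([0, 1, 2, 2, 1], pair))  # [0, 1, 1, 1, 1]
```

A stage-3 binary classifier would then be trained only on the samples whose original labels are in `pair`, to separate the two merged classes again.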
Wu, D, Sharma, N & Blumenstein, M 2019, 'Similar Gesture Recognition using Hierarchical Classification Approach in RGB Videos', 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018.
© 2018 IEEE. Recognizing human actions from video streams has become a very popular research area in computer vision and deep learning in recent years. Action recognition is widely used in many real-life scenarios, such as surveillance, robotics, healthcare, video indexing and human-computer interaction. The challenges and complexity involved in developing a video-based human action recognition system are manifold. In particular, recognizing actions with similar gestures and describing complex actions is a very challenging problem. To address these issues, we study the problem of classifying human actions using Convolutional Neural Networks (CNN) and develop a hierarchical 3DCNN architecture for similar gesture recognition. As a stage-1 classification, the proposed model first combines similar gesture pairs into one class and classifies them along with all other classes. In stage 2, similar gesture pairs are classified individually, which reduces the problem to binary classification. We apply and evaluate the developed models to recognize similar human actions on the HMDB51 dataset. The results show that the proposed model achieves high performance compared with state-of-the-art methods.
Saqib, M, Daud Khan, S, Sharma, N & Blumenstein, M 2018, 'Extracting descriptive motion information from crowd scenes', International Conference Image and Vision Computing New Zealand, pp. 1-6.
© 2017 IEEE. An important contribution that automated analysis tools can make to pedestrian management and crowd safety is the detection of large conflicting pedestrian flows: this kind of movement pattern may lead to dangerous situations and potential threats to pedestrians' safety. For this reason, detecting dominant motion patterns and summarizing motion information from the scene are essential for crowd management. In this paper, we develop a framework that extracts motion information from the scene by generating point trajectories using a particle advection approach. The trajectories obtained are then clustered using an unsupervised hierarchical clustering algorithm, where similarity is measured by the Longest Common Sub-sequence (LCS) metric. The resulting motion patterns in the scene are summarized and represented using color-coded arrows, where the speeds of the different flows are encoded with colors, the width of an arrow represents the density (the number of people belonging to a particular motion pattern), and the arrowhead represents the direction. This novel representation of a crowded scene provides a clutter-free visualization that helps crowd managers understand the scene. Experimental results show that our method outperforms state-of-the-art methods.
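The trajectory-clustering step described above can be sketched with a common form of the LCS measure for point trajectories, in which two points "match" when their coordinates differ by at most a tolerance `eps`. The tolerance, the normalization by the shorter trajectory, the single-linkage rule and all function names are assumptions for illustration; the paper's exact metric and linkage may differ. Single-linkage clustering cut at a distance threshold is computed here as connected components of the "close enough" graph, which gives the same clusters.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def lcss_length(a: Sequence[Point], b: Sequence[Point], eps: float) -> int:
    """Longest common subsequence length, where two points match if both
    coordinates differ by at most eps (dynamic programming, O(len(a)*len(b)))."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) <= eps and abs(ay - by) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_distance(a: Sequence[Point], b: Sequence[Point], eps: float) -> float:
    """1 minus the LCSS length normalized by the shorter trajectory (0 = match)."""
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))


def cluster_single_linkage(trajs: List[Sequence[Point]], eps: float,
                           max_dist: float) -> List[int]:
    """Single-linkage clustering cut at max_dist, via union-find over all pairs
    whose LCSS distance is <= max_dist. Returns one cluster label per trajectory."""
    parent = list(range(len(trajs)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(trajs)):
        for j in range(i + 1, len(trajs)):
            if lcss_distance(trajs[i], trajs[j], eps) <= max_dist:
                parent[find(i)] = find(j)
    labels: Dict[int, int] = {}
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(len(trajs))]


t0 = [(0, 0), (1, 0), (2, 0), (3, 0)]          # moving right
t1 = [(0, 0.2), (1, 0.1), (2, 0.2), (3, 0.1)]  # moving right, slightly offset
t2 = [(0, 0), (0, 1), (0, 2), (0, 3)]          # moving up
print(cluster_single_linkage([t0, t1, t2], eps=0.5, max_dist=0.3))  # [0, 0, 1]
```

Each resulting cluster corresponds to one dominant flow, which could then be summarized as a single arrow.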
Sharma, N, Sengupta, A, Sharma, R, Pal, U & Blumenstein, M 2018, 'Pincode detection using deep CNN for postal automation', International Conference Image and Vision Computing New Zealand, pp. 1-6.
© 2017 IEEE. Postal automation has been a topic of research for over a decade. The challenges and complexity involved in developing a postal automation system for a multi-lingual and multi-script country like India are manifold. The most challenging characteristics of Indian postal documents include their multi-lingual nature, unconstrained handwritten addresses, and structured/unstructured envelopes and postcards. This paper examines state-of-the-art deep CNN architectures for detecting pin-codes in both structured and unstructured postal envelopes and documents. Region-based Convolutional Neural Networks (RCNN) are used for detecting the various significant regions in a postal document, namely pin-code blocks/regions, the destination address block, seals and stamps. Three network architectures, namely Zeiler and Fergus (ZF), Visual Geometry Group (VGG16), and VGG-M, were considered for analysis and for identifying their potential. A dataset consisting of 2300 multi-lingual Indian postal documents of three different categories was developed and used for experiments. The VGG-M architecture with Faster R-CNN performed better than the others, and promising results were obtained.
Saqib, M, Khan, SD, Sharma, N & Blumenstein, M 2018, 'Person Head Detection in Multiple Scales Using Deep Convolutional Neural Networks', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.
© 2018 IEEE. Person detection is an important problem in computer vision with many real-world applications. Detecting a person is still a challenging task due to variations in pose, occlusions and lighting conditions. The purpose of this study is to detect human heads in natural scenes acquired from a publicly available dataset of Hollywood movies. In this work, we have used state-of-the-art object detectors based on deep convolutional neural networks. These include region-based convolutional neural networks, which use region proposals for detection, and single-shot object detectors, which look at the image only once to make detections. We have used transfer learning to fine-tune networks already trained on massive amounts of data. During fine-tuning, the models with high mean Average Precision (mAP) are used to evaluate the test dataset. Experimental results show that Faster R-CNN and SSD MultiBox with VGG16 perform better than YOLO and also demonstrate significant improvements over several baseline approaches.
Sharma, N, Scully-Power, P & Blumenstein, M 2018, 'Shark detection from aerial imagery using region-based CNN, a study', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 224-236.
© Springer Nature Switzerland AG 2018. Shark attacks have been a very sensitive issue for Australia and many other countries. Thus, providing safety and security around beaches is fundamental in the current climate. Safety for both human beings and underwater creatures (sharks, whales, etc.) is essential while people continue to visit and use beaches heavily for recreation and sports. Hence, an efficient, automated and real-time monitoring approach on beaches for detecting various objects (e.g. human activities, large fish, sharks, whales, surfers, etc.) is necessary to avoid unexpected casualties and accidents. The use of technologies such as drones and machine learning techniques is a promising direction in such challenging circumstances. This paper investigates the potential of Region-based Convolutional Neural Networks (R-CNN) for detecting various marine objects, and sharks in particular. Three network architectures, namely Zeiler and Fergus (ZF), Visual Geometry Group (VGG16), and VGG_M, were considered for analysis and for identifying their potential. A dataset consisting of 3957 video frames was used for experiments. The VGG16 architecture with Faster R-CNN performed better than the others, with an average precision of 0.904 for detecting sharks.
Sharma, N, Mandal, R, Sharma, R, Pal, U & Blumenstein, M 2018, 'Signature and logo detection using deep CNN for document image retrieval', Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR, pp. 416-422.
© 2018 IEEE. Signatures and logos are important queries for content-based document image retrieval from a scanned document repository. This paper deals with signature and logo detection from a repository of scanned documents, which can be used for document retrieval using signature or logo information. The large intra-category variance among signature and logo samples poses challenges to traditional approaches based on hand-crafted feature extraction. Hence, the potential of the deep learning-based object detectors Faster R-CNN and YOLOv2 was examined for automatic detection of signatures and logos from scanned administrative documents. Four different network models, namely ZF, VGG16, VGG-M, and YOLOv2, were considered for analysis and for identifying their potential in document image retrieval. The experiments were conducted on the publicly available 'Tobacco-800' dataset. The proposed approach detects signatures and logos simultaneously. The results obtained from the experiments are promising and on par with existing methods.
Wu, D, Sharma, N & Blumenstein, M 2017, 'Recent Advances in Video Based Human Action Recognition Using Deep Learning: A Review', Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), International Joint Conference on Neural Networks, IEEE, Anchorage, Alaska, USA, pp. 2865-2872.
Coluccia, A, Ghenescu, M, Piatrik, T, De Cubber, G, Schumann, A, Sommer, L, Klatte, J, Schuchert, T, Beyerer, J, Farhadi, M, Amandi, R, Aker, C, Kalkan, S, Saqib, M, Sharma, N, Makkah, SDK & Blumenstein, M 2017, 'Drone-vs-Bird detection challenge at IEEE AVSS2017', Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE, Lecce, Italy, pp. 1-6.
© 2017 IEEE. Small drones are a rising threat due to their possible misuse for illegal activities, in particular smuggling and terrorism. The project SafeShore, funded by the European Commission under the Horizon 2020 program, has launched the 'drone-vs-bird detection challenge' to address one of the many technical issues arising in this context. The goal is to detect a drone appearing at some point in a video where birds may also be present: the algorithm should raise an alarm and provide a position estimate only when a drone is present, while not issuing alarms on birds. This paper reports on the challenge proposal, evaluation, and results.
Saqib, M, Daud Khan, S, Sharma, N & Blumenstein, M 2017, 'A study on detecting drones using deep convolutional neural networks', Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE, Lecce, Italy.
© 2017 IEEE. Object detection is a challenging problem in computer vision with various potential real-world applications. The objective of this study is to evaluate deep learning-based object detection techniques for detecting drones. In this paper, we have conducted experiments with different Convolutional Neural Network (CNN) based network architectures, namely Zeiler and Fergus (ZF), Visual Geometry Group (VGG16), etc. Due to the sparse training data available, networks are trained starting from pre-trained models using transfer learning. Snapshots of the trained models are saved at regular intervals during training. The best model (with the highest mean Average Precision, mAP) for each network architecture is used for evaluation on the test dataset. The experimental results show that VGG16 with Faster R-CNN performs better than the other architectures on the training dataset. Visual analysis of the test dataset is also presented.
Cheng, EJ, Prasad, M, Puthal, D, Sharma, N, Prasad, OK, Chin, PH, Lin, CT & Blumenstein, M 2017, 'Deep Learning Based Face Recognition with Sparse Representation Classification', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 665-674.
© 2017, Springer International Publishing AG. Feature extraction is an essential step in solving real-world pattern recognition and classification problems. The accuracy of face recognition depends heavily on the features extracted to represent a face. Traditional algorithms use geometric techniques, with feature values including the distances and angles between geometric points (eye corners, mouth extremities, and nostrils). These features are sensitive to factors such as illumination, pose variation and facial expression, to mention a few. Recently, deep learning techniques have been very effective for feature extraction, and deep features are considerably tolerant of varying conditions and unconstrained environments. This paper proposes a two-layer deep convolutional neural network (CNN) for face feature extraction and applies sparse representation for face identification. The sparsity and selectivity of deep features can strengthen the sparseness of the sparse representation solution, which generally improves the recognition rate. The proposed method outperforms other feature extraction and classification methods in terms of recognition accuracy.
Sharma, N, Mandal, R, Sharma, R, Pal, U & Blumenstein, M 2015, 'Bag-of-Visual Words for word-wise video script identification: A study', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Killarney, Ireland.
© 2015 IEEE. The use of multiple scripts for communicating information through various media is quite common in a multilingual country. Optical character recognition (OCR) of such document images or videos assists in indexing them for effective information retrieval. Since no single OCR system can handle multiple scripts, script identification from multi-lingual documents/images is a necessary step for selecting the appropriate OCR. Script identification from printed as well as handwritten documents is a well-researched area, but script identification from video frames has not been explored much. Low resolution, blur and noisy backgrounds, to mention a few, are the major bottlenecks when processing video frames, and make script identification from video images a challenging task. This paper examines the potential of Bag-of-Visual Words based techniques for word-wise script identification from video frames. Two different approaches, namely Bag-of-Features (BoF) and Spatial Pyramid Matching (SPM), using patch-based SIFT descriptors, were considered for the current study. An SVM classifier was used for analysing three popular south Indian scripts, namely Tamil, Telugu and Kannada, in combination with English and Hindi. A comparative study of Bag-of-Visual Words against traditional script identification techniques involving gradient-based features (e.g. HoG) and texture-based features (e.g. LBP) is presented. Experimental results show that patch-based features along with SPM outperformed the traditional techniques, and promising accuracies were achieved on 2534 words from the five scripts. The study reveals that patch-based features can be used for script identification in order to overcome the inherent problems with video frames.
Sharma, N, Mandal, R, Sharma, R, Roy, PP, Pal, U & Blumenstein, M 2015, 'Multi-lingual text recognition from video frames', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, Nancy, France, pp. 951-955.
© 2015 IEEE. Text recognition from video frames is a challenging task due to low resolution, blur, complex and coloured backgrounds, and noise, to mention a few. Consequently, traditional text recognition methods designed for scanned documents with simple backgrounds fail when applied to video text. Although various techniques are available for text recognition from handwritten and printed documents with simple backgrounds, text recognition from video frames has not been comprehensively investigated, especially for multi-lingual videos. In this paper, we present a technique for multi-lingual video text recognition that involves script identification in the first stage, followed by word and character recognition; finally, the results are refined using a post-processing technique. Considering the inherent problems in videos, a Spatial Pyramid Matching (SPM) based technique, using patch-based SIFT descriptors and an SVM classifier, is employed for script identification. In the next stage, a Hidden Markov Model (HMM) based approach that utilizes context information is used for word and character recognition. Finally, a lexicon-based post-processing technique is applied to verify and refine the word recognition results. The proposed method was tested on a dataset comprising 4800 words from three different scripts, namely Roman (English), Hindi and Bengali. The script identification results obtained are encouraging. The word and character recognition results are also encouraging considering the complexity and problems associated with video text processing.
Sharma, N, Mandal, R, Sharma, R, Pal, U & Blumenstein, M 2015, 'ICDAR2015 Competition on Video Script Identification (CVSI 2015)', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 1196-1200.
© 2015 IEEE. This paper presents the final results of the ICDAR 2015 Competition on Video Script Identification. The participating systems and their performance in the competition are described. The general objective of the competition is to evaluate and benchmark the available methods for word-wise video script identification. It also provides a platform for researchers around the globe to address the video script identification problem in particular and video text recognition in general. The competition was organised around four different tasks involving various combinations of scripts, comprising tri-script and multi-script scenarios. The dataset used in the competition comprised ten different scripts. In total, six systems were received from five participants across the tasks offered. This report details the competition dataset specifications, the evaluation criteria, a summary of the participating systems and their performance across the different tasks. The systems submitted by Google Inc. were the winners of the competition for all tasks, whereas the systems received from Huazhong University of Science and Technology (HUST) and Computer Vision Center (CVC) were very close competitors.
Shivakumara, P, Sharma, N, Pal, U, Blumenstein, M & Tan, CL 2014, 'Gradient-angular-features for word-wise video script identification', Proceedings - International Conference on Pattern Recognition, International Conference on Pattern Recognition, IEEE, Sweden, pp. 3098-3103.
© 2014 IEEE. Script identification at the word level is challenging because of the complex backgrounds and low resolution of video. The presence of graphics and scene text in video makes the problem even more challenging. In this paper, we employ gradient angle segmentation on words from video text lines. This paper presents new Gradient-Angular-Features (GAF) for identifying six scripts in video, namely Arabic, Chinese, English, Japanese, Korean and Tamil. This work makes it possible to select an appropriate OCR when a frame contains words in multiple scripts. We employ gradient directional features to segment words from video text lines. For each segmented word, we analyse the gradient information to identify text candidates. The skeletons of the text candidates are analysed to identify Potential Text Candidates (PTC) by filtering out unwanted text candidates. We propose novel GAF for the PTC to capture the structure of the components in terms of cursiveness and softness. Histograms of the GAF are computed in different ways to obtain discriminative features. The method is evaluated in terms of classification rate on 760 words from six scripts with low contrast, complex backgrounds, different font sizes, etc., and is compared with an existing method to show its effectiveness. We achieve an average classification rate of 88.2%.
Sharma, N, Pal, U & Blumenstein, M 2014, 'A study on word-level multi-script identification from video frames', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Beijing, China, pp. 1827-1833.
© 2014 IEEE. The presence of multiple scripts in multi-lingual document images makes Optical Character Recognition (OCR) of such documents a challenging task. Since no single OCR system is available that can handle multiple scripts, script identification becomes an essential step for choosing the appropriate OCR. Although various techniques are available for script identification from handwritten and printed documents with simple backgrounds, script identification from video frames has seldom been explored. Video frames are coloured and suffer from low resolution, blur, complex backgrounds and noise, among other problems, which make script identification a challenging task. This paper presents a study of various combinations of features and classifiers to explore whether traditional script identification techniques can be applied to video frames. A texture-based feature, namely Local Binary Pattern (LBP), and gradient-based features, namely Histogram of Oriented Gradients (HoG) and Gradient Local Auto-Correlation (GLAC), were used in the study. Combinations of these features with SVMs and ANNs were used for classification. Three popular scripts, namely English, Bengali and Hindi, were considered in the present study. Due to the inherent problems with video, a super-resolution technique was applied as a pre-processing step. Experiments show that the GLAC feature performed better than the other features, and an accuracy of 94.25% was achieved when testing on 1271 words from three different scripts. The study also reveals that gradient features are more suitable than texture features for script identification when traditional script identification techniques are applied to video frames.
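To make the texture feature mentioned above concrete, here is a minimal sketch of the basic 3 × 3, 8-neighbour LBP histogram on a grayscale word image stored as a list of lists. This is the textbook LBP operator; the neighbour ordering, the normalisation and the function names are assumptions, and the study may have used a different LBP variant or parameters.

```python
# Minimal sketch of the basic 8-neighbour Local Binary Pattern (LBP) descriptor.
# Textbook 3x3 LBP; not necessarily the exact variant used in the study.

# Clockwise neighbour offsets starting at the top-left pixel (an arbitrary fixed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit code: bit k is set when neighbour k is >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over interior pixels, normalised to sum to 1."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    word_patch = [
        [10, 10, 10, 10],
        [10, 200, 200, 10],
        [10, 200, 200, 10],
        [10, 10, 10, 10],
    ]
    hist = lbp_histogram(word_patch)
    print([i for i, h in enumerate(hist) if h > 0])
```

The resulting 256-dimensional vector is the kind of fixed-length descriptor that could then be passed to an SVM or ANN classifier.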
Sharma, N, Chanda, S, Pal, U & Blumenstein, M 2013, 'Word-wise script identification from video frames', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, Washington, DC, USA, pp. 867-871.
Script identification is an essential step for the efficient use of the appropriate OCR in multilingual document images. Various techniques are available for script identification from printed and handwritten document images, but script identification from video frames has not been explored much. This paper presents a study of some pre-processing techniques and features for word-wise script identification from video frames. Traditional features, namely Zernike moments, Gabor and gradient features, have performed well for handwritten and printed documents with simple backgrounds and adequate resolution for OCR. Video frames, however, are mostly coloured and suffer from low resolution, blur and background noise, among other problems. In this paper, we explore whether traditional script identification techniques can be useful for video frames. Three feature extraction techniques, namely Zernike moments, Gabor and gradient features, were combined with SVM classifiers to analyse three popular scripts, namely English, Bengali and Hindi. Pre-processing techniques such as super resolution and skeletonization of the original word images were used to overcome the inherent problems with video. Experiments show that super resolution combined with gradient features performed well, and an accuracy of 87.5% was achieved when testing on 896 words from three different scripts. The study also reveals that proper pre-processing can help in applying traditional script identification techniques to video frames. © 2013 IEEE.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2013, 'A new method for character segmentation from multi-oriented video words', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, USA, pp. 413-417.
This paper presents a two-stage method for multi-oriented video character segmentation. In the present work, words segmented from video text lines are considered for character segmentation. Words can contain isolated (non-touching) characters as well as touching characters, so character segmentation can be viewed as a two-stage problem. In the first stage, the text cluster is identified and isolated (non-touching) characters are segmented. The orientation of each word is computed, and segmentation paths are found in the direction perpendicular to that orientation. Candidate segmentation points computed using the top distance profile are used to find the segmentation path between characters, taking the background cluster into account. In the second stage, the segmentation results are verified and a check is performed to determine whether each word component contains touching characters. The average width of the components is used to identify touching-character components. To segment touching characters, segmentation points are then found using average stroke width information along with the top and bottom distance profiles. The proposed method was tested on a large dataset and evaluated in terms of precision, recall and f-measure. A comparative study with existing methods shows the superiority of the proposed method. © 2013 IEEE.
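The idea of a top distance profile can be illustrated with a small sketch. It assumes a binary, already horizontal word image (1 = text pixel), so it skips the orientation estimation described above, and it uses a hypothetical rule that reports local maxima of the profile as candidate cut columns. The function names and the local-maximum rule are assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: top distance profile of a binary word image and
# candidate segmentation columns. Assumes a horizontal word (no orientation
# correction) and a simple local-maximum rule; not the paper's exact method.

def top_distance_profile(binary):
    """For each column, the distance from the top edge to the first text pixel
    (the image height if the column contains no text)."""
    rows = len(binary)
    profile = []
    for c in range(len(binary[0])):
        dist = rows
        for r in range(rows):
            if binary[r][c]:
                dist = r
                break
        profile.append(dist)
    return profile


def candidate_cuts(profile):
    """Columns where the profile peaks, i.e. the text dips away from the top,
    as happens in gaps or thin joins between characters (plateaus allowed)."""
    cuts = []
    for c in range(1, len(profile) - 1):
        left, mid, right = profile[c - 1], profile[c], profile[c + 1]
        if mid >= left and mid >= right and (mid > left or mid > right):
            cuts.append(c)
    return cuts


if __name__ == "__main__":
    # Two 2-pixel-wide strokes joined only in the bottom two rows (touching characters).
    word = [
        [1, 1, 0, 1, 1],
        [1, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
    ]
    profile = top_distance_profile(word)
    print(profile, candidate_cuts(profile))
```

A bottom distance profile can be computed the same way by scanning each column from the bottom edge upwards.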
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2012, 'A new method for arbitrarily-oriented text detection in video', Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, International Workshop on Document Analysis Systems, IEEE, Institute of Electrical and Electronics Engineers, Gold Coast, Australia, pp. 74-78.
Text detection in video frames plays a vital role in enhancing the performance of information extraction systems, because the text in video frames helps in indexing and retrieving video efficiently and accurately. This paper presents a new method for arbitrarily-oriented text detection in video, based on dominant text pixel selection, text representatives and region growing. The method uses the gradient direction and magnitude at the Sobel edge pixels of the input frame to obtain dominant text pixels. Edge components in the Sobel edge map that correspond to dominant text pixels are then extracted; we call these text representatives. Broken segments are eliminated from each text representative to obtain candidate text representatives. The perimeter of each candidate text representative then grows along the text direction in the Sobel edge map to group neighboring text components into what we call word patches. The word patches are used to find the direction of the text lines, and are then expanded in the same direction in the Sobel edge map to group neighboring word patches and restore missing text information. This results in the extraction of arbitrarily-oriented text from the video frame. To evaluate the method, we considered arbitrarily-oriented data, non-horizontal data, horizontal data, Hua's data and the ICDAR-2003 competition data (camera images). The experimental results show that the proposed method outperforms the existing method in terms of recall and f-measure. © 2012 IEEE.
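The first step above, computing Sobel gradient direction and magnitude and selecting strong-gradient pixels, can be sketched as follows. The selection rule here (keep pixels whose magnitude is at least `ratio` times the frame maximum) is a hypothetical stand-in: the paper combines direction and magnitude in its own way, and the later stages (text representatives, region growing) are not shown.

```python
import math

# Hypothetical sketch of the first stage only: Sobel gradient magnitude and
# direction, then a simple magnitude threshold standing in for the paper's
# dominant-text-pixel rule. Images are grayscale lists of lists.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(img):
    """Return (magnitude, direction) maps; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.atan2(gy, gx)  # radians, in (-pi, pi]
    return mag, ang


def dominant_pixels(img, ratio=0.5):
    """Hypothetical rule: keep pixels whose gradient magnitude is at least
    `ratio` times the maximum magnitude in the frame."""
    mag, _ = sobel(img)
    peak = max(max(row) for row in mag)
    if peak == 0:
        return []
    return [(r, c) for r, row in enumerate(mag)
            for c, m in enumerate(row) if m >= ratio * peak]


if __name__ == "__main__":
    # A vertical dark-to-bright edge between columns 1 and 2.
    frame = [[0, 0, 255, 255, 255] for _ in range(5)]
    print(dominant_pixels(frame))
```

The direction map returned by `sobel` is what a later grouping step could use to grow regions along the text direction.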
Sharma, N, Pal, U & Blumenstein, M 2012, 'Recent advances in video based document processing: A review', Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, International Workshop on Document Analysis Systems, IEEE, Institute of Electrical and Electronics Engineers, Gold Coast, Australia, pp. 63-68.
Extraction and recognition of text present in video has become a very popular research area in the last decade. Text in video frames generally varies in size, orientation and style, and appears against complex backgrounds with noise, low resolution and low contrast. These factors make automatic text extraction and recognition in video frames a challenging task. A large number of techniques have been proposed by various researchers in the recent past to address the problem. This paper presents a review of various state-of-the-art techniques proposed for the different stages (e.g. detection, localization, extraction) of text information processing in video frames. Given the growing popularity of, and recent developments in, the processing of text in video frames, this review details current trends and potential directions for further research to assist researchers. © 2012 IEEE.
Pal, U, Sharma, N, Wakabayashi, T & Kimura, F 2008, 'Handwritten character recognition of popular South Indian scripts', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 251-264.
India is a multi-lingual, multi-script country, yet considerably less work has been done on handwritten character recognition of Indian languages than of other languages. In this paper we propose a quadratic classifier based scheme for the recognition of off-line handwritten characters of three popular South Indian scripts: Kannada, Telugu and Tamil. The features used here are mainly obtained from directional information. For feature computation, the bounding box of a character is segmented into blocks, and directional features are computed in each block. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Two sets of features were used: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition. A five-fold cross-validation technique was used to compute the results, and with the 400-dimensional features we obtained accuracy rates of 90.34%, 90.90% and 96.73% for Kannada, Telugu and Tamil characters, respectively. © 2008 Springer-Verlag Berlin Heidelberg.
Sharma, N, Pal, U & Kimura, F 2007, 'Recognition of handwritten Kannada numerals', Proceedings - 9th International Conference on Information Technology, ICIT 2006, pp. 133-136.
This paper presents a quadratic classifier based scheme for the recognition of off-line handwritten numerals of Kannada, an important Indian script. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks, and a chain code histogram is computed in each block. Both 64-dimensional and 100-dimensional features were used for a comparative study of the recognition accuracy of the proposed system. These chain code features are fed to the quadratic classifier for recognition. We tested our scheme on 2300 data samples and, using a five-fold cross-validation technique, obtained 97.87% and 98.45% recognition accuracy with the 64-dimensional and 100-dimensional features, respectively. © 2006 IEEE.
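A block-wise chain code histogram can be sketched as below. The sketch assumes the contour is given as an ordered, closed list of 8-connected (row, col) points relative to the character's bounding box, and that opposite Freeman directions are merged into four orientations. Under that assumption a 4 × 4 grid gives 64 features and a 5 × 5 grid gives 100, which matches the two dimensionalities above, but this decomposition and the function name are assumptions rather than details stated in the abstract.

```python
# Hypothetical sketch: block-wise chain code histogram of a character contour.
# Assumes an ordered, closed, 8-connected contour in bounding-box coordinates,
# and merges opposite Freeman directions into 4 orientations (an assumption).

# Freeman chain code: step (d_row, d_col) between consecutive points -> 0..7
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_code_features(contour, height, width, grid=4):
    """Return grid * grid * 4 direction counts (64 for grid=4, 100 for grid=5)."""
    feats = [0] * (grid * grid * 4)
    n = len(contour)
    for i in range(n):
        r0, c0 = contour[i]
        r1, c1 = contour[(i + 1) % n]            # wrap around: the contour is closed
        code = FREEMAN[(r1 - r0, c1 - c0)]
        orient = code % 4                        # codes k and k+4 share an orientation
        br = min(r0 * grid // height, grid - 1)  # block row of the starting point
        bc = min(c0 * grid // width, grid - 1)   # block column of the starting point
        feats[(br * grid + bc) * 4 + orient] += 1
    return feats


if __name__ == "__main__":
    # Contour of a 2 x 2 pixel square, traced clockwise from the top-left.
    square = [(0, 0), (0, 1), (1, 1), (1, 0)]
    feats = chain_code_features(square, height=2, width=2)
    print(len(feats), [i for i, v in enumerate(feats) if v])
```

In practice the counts would usually be normalised (for example by contour length) before being passed to a quadratic classifier.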
Pal, U, Wakabayashi, T, Sharma, N & Kimura, F 2007, 'Handwritten numeral recognition of six popular Indian scripts', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 749-753.
India is a multi-lingual, multi-script country, but relatively little work has been done on handwritten character recognition of Indian languages. In this paper we propose a modified quadratic classifier based scheme for the recognition of off-line handwritten numerals of six popular Indian scripts: Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil. The features used in the classifier are obtained from the directional information of the numerals. For feature computation, the bounding box of a numeral is segmented into blocks and directional features are computed in each block. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Two sets of features were used: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition. A five-fold cross-validation technique was used to compute the results, and we obtained 99.56%, 98.99%, 99.37%, 98.40%, 98.71% and 98.51% accuracy for the Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil scripts, respectively.
Pal, U, Sharma, N, Wakabayashi, T & Kimura, F 2007, 'Off-line handwritten character recognition of devnagari script', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 496-500.
In this paper we present a system for the recognition of off-line handwritten characters of Devnagari, the most popular script in India. The features used for recognition are mainly based on directional information obtained from the arc tangent of the gradient. To compute the features, a 2 x 2 mean filter is first applied four times to the gray-level image, and non-linear size normalization is performed on the image. The normalized image is then segmented into 49 x 49 blocks, and a Roberts filter is applied to obtain the gradient image. Next, the arc tangent of the gradient (the gradient direction) is initially quantized into 32 directions, and the gradient strength is accumulated for each quantized direction. Finally, the blocks and the directions are down-sampled using a Gaussian filter to obtain a 392-dimensional feature vector. A modified quadratic classifier is applied to these features for recognition. We used 36172 handwritten samples to test our system and obtained 94.24% accuracy using a 5-fold cross-validation scheme. © 2007 IEEE.
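The gradient-direction step described above (Roberts filter, arc tangent of the gradient, quantization into 32 directions, accumulation of gradient strength) can be sketched for a single region as follows. This is a simplified illustration under stated assumptions: it treats the whole input as one block, omits the mean filtering, size normalization and Gaussian down-sampling, and the function names are hypothetical.

```python
import math

# Hypothetical sketch of one block's direction histogram: Roberts cross
# gradient, arc tangent of the gradient quantized into n_dirs bins, and
# gradient strength accumulated per bin. Pre-processing and Gaussian
# down-sampling from the paper are not shown.


def roberts_gradient(img, r, c):
    """Roberts cross differences on the 2x2 neighbourhood at (r, c).
    The resulting angle is measured in the rotated (diagonal) Roberts frame."""
    gx = img[r][c] - img[r + 1][c + 1]   # main-diagonal difference
    gy = img[r][c + 1] - img[r + 1][c]   # anti-diagonal difference
    return gx, gy


def direction_histogram(img, n_dirs=32):
    """Accumulate gradient strength into n_dirs quantized directions."""
    hist = [0.0] * n_dirs
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            gx, gy = roberts_gradient(img, r, c)
            strength = math.hypot(gx, gy)
            if strength == 0:
                continue                                  # flat region: no direction
            angle = math.atan2(gy, gx) % (2 * math.pi)    # arc tangent in [0, 2*pi)
            bin_idx = int(angle / (2 * math.pi) * n_dirs) % n_dirs
            hist[bin_idx] += strength
    return hist


if __name__ == "__main__":
    print(direction_histogram([[10, 0], [0, 0]])[:9])
    print(direction_histogram([[0, 10], [0, 0]])[:9])
```

Applying this per block and concatenating the histograms gives a block-by-direction feature array, which is the kind of structure the Gaussian down-sampling step would then reduce.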