Dr. Nabin Sharma is a Senior Lecturer in the School of Software, Faculty of Engineering & IT at UTS. He received his PhD from the School of ICT, Griffith University, Queensland, Australia. His research focuses on video and image processing, pattern recognition, and machine learning techniques for object detection and recognition. He has more than 14 years of experience in research & development and academia, including substantial industry experience in software design and development gained on various projects at IBM India Private Ltd.
He has published over 38 papers in refereed books, conferences and journals. His Google Scholar profile reports 904 citations, with an i10-index of 15 and an h-index of 13. He has secured research grants exceeding AUD$275K for projects in collaboration with industry and academia. He received the runner-up award in ‘The Young IT Professional Award Competition 2006’ organized by the Computer Society of India (CSI), East Zone, and was also nominated for the prestigious ‘Young Scientist Award 2006’ of The Indian Science Congress Association. He received the ‘Spirit of GovHack 2015’ award at GovHack 2015, Griffith University Gold Coast Campus. Winner of the NSW iAwards 2018 for the SharkSpotter project in the following three major categories:
- Research and Development Project of the Year,
- Artificial Intelligence or Machine Learning Innovation of the Year, and
- Community Service Markets
The Australian Information Industry Association (AIIA) NSW iAwards is the nation’s leading awards program for innovation in the digital economy.
Winner of National iAwards 2018 for the SharkSpotter project in the Artificial Intelligence or Machine Learning Innovation of the Year category.
Received the Merit Award at the Asia Pacific ICT Alliance (APICTA) Awards 2018 for the SharkSpotter project in the Artificial Intelligence Technology of the Year category.
- Senior Member of Institute of Electrical and Electronics Engineers (IEEE)
- Member of Association for Computing Machinery (ACM)
- Life Member of the Indian Unit for Pattern Recognition and Artificial Intelligence (IUPRAI), the Indian affiliate of the IAPR
- Member of Australian Water Association (AWA)
- Pattern Recognition (Elsevier)
- Pattern Recognition Letters (Elsevier)
- ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
- Malaysian Journal of Computer Science
- IPSJ Transactions on Computer Vision and Applications (Springer)
- Frontiers of Computer Science
- Connection Science
- International Journal of Pattern Recognition and Artificial Intelligence.
- Karbala International Journal of Modern Science
- Journal of Imaging
- International Conference on Pattern Recognition (ICPR), 2012, 2014, 2016.
- Digital Image Computing: Techniques and Applications (DICTA), 2016, 2017, 2018.
- International Conference on Document Analysis and Recognition (ICDAR) 2013.
- International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014, 2016.
- International Conference on Intelligent Systems Design and Applications (ISDA), 2012
- International Conference on Hybrid Intelligent Systems (HIS) 2012.
- IAPR International Workshop on Document Analysis Systems (DAS) 2012.
- International Workshop on Camera-Based Document Analysis and Recognition (CBDAR) 2013.
- International Conference on Advances in Pattern Recognition (ICAPR), 2017
- Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI) 2016, 2018.
- International Conference on Man-Machine Interactions (ICMMI), 2017
- The International Conference on Digital Image Computing: Techniques and Applications (DICTA) 2017.
- 16th Australasian Symposium on Parallel and Distributed Computing (AusPDC) 2018.
- Organizing Chair: 15th International Conference on Document Analysis and Recognition (ICDAR) 2019, Sydney.
- 2014 – Present: Associate Editor - Gate to Computer Vision and Pattern Recognition (gtCVPR) journal.
Conference Special Session organizer:
- Deep learning for computer vision: theory and applications, International Conference on Neural Information Processing (ICONIP) 2017.
- Feature extraction and learning on image and text data, 2018 IEEE World Congress on Computational Intelligence, IJCNN.
- Advances in Document Analysis and Recognition, 2018 IEEE World Congress on Computational Intelligence, IJCNN.
Can supervise: YES
Nabin's research interests include object detection, document analysis, handwritten character recognition, biometrics, machine learning, image processing, and pattern recognition. He has published several research papers and book chapters based on his work.
His current research focus is on marine animal detection from aerial imagery, beach surveillance from aerial imagery, human gesture recognition, crowd analysis, and the use of deep learning techniques for solving object detection and classification problems, to mention a few.
Fundamentals of Software Development (32555)
.Net Application Development (32998)
Application Development with .Net (31927)
- Programming: Java, Python, C, C++
- Deep Learning and Convolutional Neural Networks
- Data Structures and Algorithms
- System Analysis and Design
Chou, KP, Prasad, M, Wu, D, Sharma, N, Li, DL, Lin, YF, Blumenstein, M, Lin, WC & Lin, CT 2018, 'Robust Feature-Based Automated Multi-View Human Action Recognition System', IEEE Access, vol. 6, pp. 15283-15296.
© 2013 IEEE. Automated human action recognition has the potential to play an important role in public security, for example, in relation to the multiview surveillance videos taken in public places, such as train stations or airports. This paper compares three practical, reliable, and generic systems for multiview video-based human action recognition, namely, the nearest neighbor classifier, Gaussian mixture model classifier, and the nearest mean classifier. To describe the different actions performed in different views, view-invariant features are proposed to address multiview action recognition. These features are obtained by extracting the holistic features from different temporal scales which are modeled as points of interest which represent the global spatial-temporal distribution. Experiments and cross-data testing are conducted on the KTH, WEIZMANN, and MuHAVi datasets. The system does not need to be retrained when scenarios are changed which means the trained database can be applied in a wide variety of environments, such as view angle or background changes. The experiment results show that the proposed approach outperforms the existing methods on the KTH and WEIZMANN datasets.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2015, 'Piece-wise linearity based method for text frame classification in video', Pattern Recognition, vol. 48, no. 3, pp. 862-881.
© 2014 Elsevier Ltd. All rights reserved. The aim of a text frame classification technique is to label a video frame as text or non-text before text detection and recognition. It is an essential step prior to text detection because text detection methods assume the input to be a text frame. Consequently, when a non-text frame is subjected to text detection, the precision of the text detection method decreases because of false positives. In this paper a new text frame classification approach based on component linearity is proposed. The method first obtains probable text clusters from the gradient values of the RGB images of an input video frame. The Sobel edges corresponding to the text cluster are then extracted and used for further processing. Next, the method eliminates false text components before undertaking a linearity check, where the linearity of the text components is determined using their centroids in a piece-wise manner. If the components in a frame satisfy the defined linearity condition, then the frame is considered a text frame; otherwise it is considered a non-text frame. The proposed method has been tested on standard text and non-text datasets of different orientations to demonstrate that it is independent of orientation. A comparative study with the existing method shows that the proposed method is superior in terms of classification rate and processing time.
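As a rough illustration of the piece-wise linearity idea described in this abstract, the sketch below checks whether component centroids lie approximately on a line over sliding triples. The tolerance and the triple-based windowing are illustrative assumptions for demonstration, not the parameters or exact formulation used in the paper.

```python
# Hypothetical sketch of a piece-wise linearity check on component
# centroids, in the spirit of the text-frame classification idea.
# `tol` (pixels) is an illustrative assumption, not the paper's value.

def collinearity_error(p, q, r):
    """Perpendicular distance of q from the line through p and r."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    num = abs((y3 - y1) * x2 - (x3 - x1) * y2 + x3 * y1 - y3 * x1)
    den = ((y3 - y1) ** 2 + (x3 - x1) ** 2) ** 0.5
    return num / den if den else 0.0

def is_piecewise_linear(centroids, tol=2.0):
    """Declare the component centroids linear if every sliding
    triple is collinear to within `tol` pixels."""
    if len(centroids) < 3:
        return True
    return all(
        collinearity_error(centroids[i], centroids[i + 1], centroids[i + 2]) <= tol
        for i in range(len(centroids) - 2)
    )
```

A frame whose candidate components pass this kind of check would be labelled a text frame; a cloud of scattered centroids would fail it.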
Pal, U, Jayadevan, R & Sharma, N 2012, 'Handwriting recognition in Indian regional scripts: A survey of offline techniques', ACM Transactions on Asian Language Information Processing, vol. 11, no. 1.
Offline handwriting recognition in Indian regional scripts is an interesting area of research as almost 460 million people in India use regional scripts. The nine major Indian regional scripts are Bangla (for Bengali and Assamese languages), Gujarati, Kannada, Malayalam, Oriya, Gurumukhi (for Punjabi language), Tamil, Telugu, and Nastaliq (for Urdu language). A state-of-the-art survey about the techniques available in the area of offline handwriting recognition (OHR) in Indian regional scripts will be of a great aid to the researchers in the subcontinent and hence a sincere attempt is made in this article to discuss the advancements reported in this regard during the last few decades. The survey is organized into different sections. A brief introduction is given initially about automatic recognition of handwriting and official regional scripts in India. The nine regional scripts are then categorized into four subgroups based on their similarity and evolution information. The first group contains Bangla, Oriya, Gujarati and Gurumukhi scripts. The second group contains Kannada and Telugu scripts and the third group contains Tamil and Malayalam scripts. The fourth group contains only Nastaliq script (Perso-Arabic script for Urdu), which is not an Indo-Aryan script. Various feature extraction and classification techniques associated with the offline handwriting recognition of the regional scripts are discussed in this survey. As it is important to identify the script before the recognition step, a section is dedicated to handwritten script identification techniques. A benchmarking database is very important for any pattern recognition related research. The details of the datasets available in different Indian regional scripts are also mentioned in the article. A separate section is dedicated to the observations made, future scope, and existing difficulties related to handwriting recognition in Indian regional scripts. We hope that this survey will serve as a compendi...
Saqib, M, Daud Khan, S, Sharma, N & Blumenstein, M 2018, 'Extracting descriptive motion information from crowd scenes', International Conference Image and Vision Computing New Zealand, pp. 1-6.
© 2017 IEEE. An important contribution that automated analysis tools can make to the management of pedestrians and crowd safety is the detection of conflicting large pedestrian flows: this kind of movement pattern may lead to dangerous situations and potential threats to pedestrians' safety. For this reason, detecting dominant motion patterns and summarizing motion information from the scene are essential for crowd management. In this paper, we develop a framework that extracts motion information from the scene by generating point trajectories using a particle advection approach. The trajectories obtained are then clustered using an unsupervised hierarchical clustering algorithm, where similarity is measured by the Longest Common Sub-sequence (LCS) metric. The motion patterns found in the scene are summarized and represented using colour-coded arrows, where the speeds of the different flows are encoded with colours, the width of an arrow represents the density (number of people belonging to a particular motion pattern), and the arrowhead represents the direction. This novel representation of a crowded scene provides a clutter-free visualization which helps crowd managers in understanding the scene. Experimental results show that our method outperforms state-of-the-art methods.
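The LCS similarity mentioned in this abstract can be sketched for 2-D point trajectories as below. The point-matching threshold `eps` and the min-length normalisation are common conventions assumed here for illustration; the paper's exact metric and its hierarchical clustering step are not reproduced.

```python
# Illustrative sketch of trajectory similarity via the Longest Common
# Sub-sequence (LCS), as used when clustering advected point trajectories.
# `eps` (the per-axis match tolerance) is an assumption for demonstration.

def lcs_length(a, b, eps=1.0):
    """LCS of two 2-D trajectories; points match if both coordinates
    agree to within eps (standard dynamic-programming recurrence)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            (x1, y1), (x2, y2) = a[i - 1], b[j - 1]
            if abs(x1 - x2) <= eps and abs(y1 - y2) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcs_similarity(a, b, eps=1.0):
    """Normalised LCS similarity in [0, 1]; 1.0 means one trajectory
    is (within tolerance) a sub-sequence of the other."""
    return lcs_length(a, b, eps) / min(len(a), len(b))
```

Pairwise similarities computed this way would then feed an agglomerative clustering step to group trajectories into dominant motion patterns.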
Coluccia, A, Ghenescu, M, Piatrik, T, De Cubber, G, Schumann, A, Sommer, L, Klatte, J, Schuchert, T, Beyerer, J, Farhadi, M, Amandi, R, Aker, C, Kalkan, S, Saqib, M, Sharma, N, Makkah, SDK & Blumenstein, M 2017, 'Drone-vs-Bird detection challenge at IEEE AVSS2017', Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE, Lecce, Italy, pp. 1-6.
© 2017 IEEE. Small drones are a rising threat due to their possible misuse for illegal activities, in particular smuggling and terrorism. The project SafeShore, funded by the European Commission under the Horizon 2020 program, has launched the 'drone-vs-bird detection challenge' to address one of the many technical issues arising in this context. The goal is to detect a drone appearing at some point in a video where birds may be also present: the algorithm should raise an alarm and provide a position estimate only when a drone is present, while not issuing alarms on birds. This paper reports on the challenge proposal, evaluation, and results1.
Saqib, M, Daud Khan, S, Sharma, N & Blumenstein, M 2017, 'A study on detecting drones using deep convolutional neural networks', Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE, Lecce, Italy.
© 2017 IEEE. Object detection is a challenging problem in computer vision with various potential real-world applications. The objective of this study is to evaluate deep learning based object detection techniques for detecting drones. In this paper, we have conducted experiments with different Convolutional Neural Network (CNN) based network architectures, namely Zeiler and Fergus (ZF), Visual Geometry Group (VGG16), etc. Due to the sparse data available for training, the networks are trained with pre-trained models using transfer learning. Snapshots of the trained models are saved at regular intervals during training. The best models, having the highest mean Average Precision (mAP) for each network architecture, are used for evaluation on the test dataset. The experimental results show that VGG16 with Faster R-CNN performs better than the other architectures on the training dataset. A visual analysis of the test dataset is also presented.
Wu, D, Sharma, N & Blumenstein, M 2017, 'Recent Advances in Video Based Human Action Recognition Using Deep Learning: A Review', Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), International Joint Conference on Neural Networks, IEEE, Anchorage, Alaska, USA, pp. 2865-2872.
In this work, we consider the problem of robust principal component analysis (RPCA) for streaming noisy data that has been highly compressed. This problem is prominent when one deals with high-dimensional and large-scale data and data compression is necessary. To solve this problem, we propose an online compressed RPCA algorithm to efficiently recover the low-rank components of raw data. Though data compression incurs severe information loss, we provide deep analysis on the proposed algorithm and prove that the low-rank component can be asymptotically recovered under mild conditions. Compared with other recent works on compressed RPCA, our algorithm reduces the memory cost significantly by processing data in an online fashion and reduces the communication cost by accepting sequential compressed data as input.
Cheng, EJ, Prasad, M, Puthal, D, Sharma, N, Prasad, OK, Chin, PH, Lin, CT & Blumenstein, M 2017, 'Deep Learning Based Face Recognition with Sparse Representation Classification', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 665-674.
© 2017, Springer International Publishing AG. Feature extraction is an essential step in solving real-world pattern recognition and classification problems. The accuracy of face recognition highly depends on the features extracted to represent a face. Traditional algorithms use geometric techniques, comprising feature values such as the distances and angles between geometric points (eye corners, mouth extremities, and nostrils). These features are sensitive to elements such as illumination, pose variation and varying expressions, to mention a few. Recently, deep learning techniques have been very effective for feature extraction, and deep features have considerable tolerance for various conditions and unconstrained environments. This paper proposes a two-layer deep convolutional neural network (CNN) for face feature extraction and applies sparse representation for face identification. The sparsity and selectivity of deep features can strengthen the sparseness of the sparse representation solution, which generally improves the recognition rate. The proposed method outperforms other feature extraction and classification methods in terms of recognition accuracy.
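The identification step in this abstract assigns a query to the class whose training dictionary reconstructs it best. The sketch below is a deliberately simplified variant: it uses per-class least squares (a nearest-subspace approximation) rather than the l1-minimisation of full sparse representation classification, and the dictionaries in the test are toy assumptions.

```python
# Simplified sketch of classification by reconstruction residuals,
# in the spirit of sparse representation classification (SRC).
# NOTE: per-class least squares replaces the l1 solver here, so this
# is a nearest-subspace approximation, not the exact SRC formulation.

import numpy as np

def classify_by_residual(dicts, y):
    """Assign y to the class whose dictionary reconstructs it best.

    dicts: {label: (d, n_atoms) array of training feature columns}
    y:     (d,) query feature vector (e.g. a deep CNN feature)
    """
    best, best_res = None, float("inf")
    for label, D in dicts.items():
        # Least-squares coefficients for y over this class's atoms.
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        res = np.linalg.norm(y - D @ coef)  # reconstruction residual
        if res < best_res:
            best, best_res = label, res
    return best
```

With real SRC, the coefficients are additionally encouraged to be sparse across all classes jointly, which is where the "sparsity and selectivity" of deep features helps.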
Sharma, N, Mandal, R, Sharma, R, Pal, U & Blumenstein, M 2015, 'Bag-of-Visual Words for word-wise video script identification: A study', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Killarney, Ireland.
© 2015 IEEE. The use of multiple scripts for information communication through various media is quite common in a multilingual country. Optical character recognition of such document images or videos assists in indexing them for effective information retrieval. Hence, script identification from multi-lingual documents/images is a necessary step for selecting the appropriate OCR, due to the absence of a single OCR system capable of handling multiple scripts. Script identification from printed as well as handwritten documents is a well-researched area, but script identification from video frames has not been explored much. Low resolution, blur and noisy backgrounds, to mention a few, are the major bottlenecks when processing video frames, and they make script identification from video images a challenging task. This paper examines the potential of Bag-of-Visual-Words based techniques for word-wise script identification from video frames. Two different approaches, namely Bag-of-Features (BoF) and Spatial Pyramid Matching (SPM), using patch-based SIFT descriptors, were considered for the current study. An SVM classifier was used for analysing the three popular south Indian scripts, namely Tamil, Telugu and Kannada, in combination with English and Hindi. A comparative study of Bag-of-Visual-Words with traditional script identification techniques involving gradient-based features (e.g. HoG) and texture-based features (e.g. LBP) is presented. Experimental results show that patch-based features along with SPM outperformed the traditional techniques, and promising accuracies were achieved on 2534 words from the five scripts. The study reveals that patch-based features can be used for script identification in order to overcome the inherent problems with video frames.
Sharma, N, Mandal, R, Sharma, R, Pal, U & Blumenstein, M 2015, 'ICDAR2015 Competition on Video Script Identification (CVSI 2015)', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 1196-1200.
© 2015 IEEE. This paper presents the final results of the ICDAR 2015 Competition on Video Script Identification. A description and performance of the participating systems in the competition are reported. The general objective of the competition is to evaluate and benchmark the available methods on word-wise video script identification. It also provides a platform for researchers around the globe to particularly address the video script identification problem and video text recognition in general. The competition was organised around four different tasks involving various combinations of scripts comprising tri-script and multi-script scenarios. The dataset used in the competition comprised ten different scripts. In total, six systems were received from five participants over the tasks offered. This report details the competition dataset specifications, evaluation criteria, summary of the participating systems and their performance across different tasks. The systems submitted by Google Inc. were the winner of the competition for all the tasks, whereas the systems received from Huazhong University of Science and Technology (HUST) and Computer Vision Center (CVC) were very close competitors.
Sharma, N, Mandal, R, Sharma, R, Roy, PP, Pal, U & Blumenstein, M 2015, 'Multi-lingual text recognition from video frames', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, Nancy, France, pp. 951-955.
© 2015 IEEE. Text recognition from video frames is a challenging task due to low resolution, blur, complex and coloured backgrounds, and noise, to mention a few. Consequently, the traditional ways of text recognition from scanned documents having simple backgrounds fail when applied to video text. Although there are various techniques available for text recognition from handwritten and printed documents with simple backgrounds, text recognition from video frames has not been comprehensively investigated, especially for multi-lingual videos. In this paper, we present a technique for multi-lingual video text recognition which involves script identification in the first stage, followed by word and character recognition, and finally the results are refined using a post-processing technique. Considering the inherent problems in videos, a Spatial Pyramid Matching (SPM) based technique, using patch-based SIFT descriptors and an SVM classifier, is employed for script identification. In the next stage, a Hidden Markov Model (HMM) based approach is used for word and character recognition, which utilizes context information. Finally, a lexicon-based post-processing technique is applied to verify and refine the word recognition results. The proposed method was tested on a dataset comprising 4800 words from three different scripts, namely, Roman (English), Hindi and Bengali. The script identification results obtained are encouraging. The word and character recognition results are also encouraging considering the complexity and problems associated with video text processing.
Sharma, N, Pal, U & Blumenstein, M 2014, 'A study on word-level multi-script identification from video frames', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Beijing, China, pp. 1827-1833.
© 2014 IEEE. The presence of multiple scripts in multi-lingual document images makes Optical Character Recognition (OCR) of such documents a challenging task. Due to the unavailability of a single OCR system which can handle multiple scripts, script identification becomes an essential step for choosing the appropriate OCR. Although there are various techniques available for script identification from handwritten and printed documents having simple backgrounds, script identification from video frames has seldom been explored. Video frames are coloured and suffer from low resolution, blur, complex backgrounds and noise, to mention a few, which makes the script identification process a challenging task. This paper presents a study of various combinations of features and classifiers to explore whether traditional script identification techniques can be applied to video frames. A texture-based feature, namely the Local Binary Pattern (LBP), and gradient-based features, namely the Histogram of Oriented Gradients (HoG) and Gradient Local Auto-Correlation (GLAC), were used in the study. Combinations of the features with SVMs and ANNs were used for classification. Three popular scripts, namely English, Bengali and Hindi, were considered in the present study. Due to the inherent problems with video, a super-resolution technique was applied as a pre-processing step. Experiments show that the GLAC feature performed better than the other features, and an accuracy of 94.25% was achieved when testing on 1271 words from three different scripts. The study also reveals that gradient features are more suitable for script identification than texture features when using traditional script identification techniques on video frames.
Shivakumara, P, Sharma, N, Pal, U, Blumenstein, M & Tan, CL 2014, 'Gradient-angular-features for word-wise video script identification', Proceedings - International Conference on Pattern Recognition, International Conference on Pattern Recognition, IEEE, Sweden, pp. 3098-3103.
© 2014 IEEE. Script identification at the word level is challenging because of complex backgrounds and low resolution of video. The presence of graphics and scene text in video makes the problem more challenging. In this paper, we employ gradient angle segmentation on words from video text lines. This paper presents new Gradient-Angular-Features (GAF) for video script identification, namely, Arabic, Chinese, English, Japanese, Korean and Tamil. This work enables us to select an appropriate OCR when the frame has words of multi-scripts. We employ gradient directional features for segmenting words from video text lines. For each segmented word, we study the gradient information in effective ways to identify text candidates. The skeleton of the text candidates is analyzed to identify Potential Text Candidates (PTC) by filtering out unwanted text candidates. We propose novel GAF for the PTC to study the structure of the components in the form of cursiveness and softness. The histogram operation on the GAF is performed in different ways to obtain discriminative features. The method is evaluated on 760 words of six scripts having low contrast, complex background, different font sizes, etc. in terms of the classification rate and is compared with an existing method to show the effectiveness of the method. We achieve 88.2% average classification rate.
Sharma, N, Chanda, S, Pal, U & Blumenstein, M 2013, 'Word-wise script identification from video frames', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, Washington, DC, USA, pp. 867-871.
Script identification is an essential step for the efficient use of the appropriate OCR in multilingual document images. There are various techniques available for script identification from printed and handwritten document images, but script identification from video frames has not been explored much. This paper presents a study of some pre-processing techniques and features for word-wise script identification from video frames. Traditional features, namely Zernike moments, Gabor and gradient, have performed well for handwritten and printed documents having simple backgrounds and adequate resolution for OCR. Video frames are mostly coloured and suffer from low resolution, blur, background noise, to mention a few. In this paper, an attempt has been made to explore whether the traditional script identification techniques can be useful in video frames. Three feature extraction techniques, namely Zernike moments, Gabor and gradient features, and SVM classifiers were considered for analyzing three popular scripts, namely English, Bengali and Hindi. Some pre-processing techniques such as super resolution and skeletonization of the original word images were used in order to overcome the inherent problems with video. Experiments show that the super resolution technique with gradient features has performed well, and an accuracy of 87.5% was achieved when testing on 896 words from three different scripts. The study also reveals that the use of proper pre-processing approaches can be helpful in applying traditional script identification techniques to video frames. © 2013 IEEE.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2013, 'A new method for character segmentation from multi-oriented video words', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, International Conference on Document Analysis and Recognition (ICDAR), IEEE, USA, pp. 413-417.
This paper presents a two-stage method for multi-oriented video character segmentation. Words segmented from video text lines are considered for character segmentation in the present work. Words can contain isolated or non-touching characters, as well as touching characters. Therefore, the character segmentation problem can be viewed as a two stage problem. In the first stage, text cluster is identified and isolated (non-touching) characters are segmented. The orientation of each word is computed and the segmentation paths are found in the direction perpendicular to the orientation. Candidate segmentation points computed using the top distance profile are used to find the segmentation path between the characters considering the background cluster. In the second stage, the segmentation results are verified and a check is performed to ascertain whether the word component contains touching characters or not. The average width of the components is used to find the touching character components. For segmentation of the touching characters, segmentation points are then found using average stroke width information, along with the top and bottom distance profiles. The proposed method was tested on a large dataset and was evaluated in terms of precision, recall and f-measure. A comparative study with existing methods reveals the superiority of the proposed method. © 2013 IEEE.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2012, 'A new method for word segmentation from arbitrarily-oriented video text lines', 2012 International Conference on Digital Image Computing Techniques and Applications, DICTA 2012.
Word segmentation has become a research topic to improve OCR accuracy for video text recognition, because a video text line suffers from arbitrary orientation, complex background and low resolution. Therefore, for word segmentation from arbitrarily-oriented video text lines, in this paper, we extract four new gradient directional features for each Canny edge pixel of the input text line image to produce four respective pixel candidate images. The union of the four pixel candidate images is performed to obtain a text candidate image. The sequence of the components in the text candidate image according to the text line is determined using nearest neighbor criteria. Then we propose a two-stage method for segmenting words. In the first stage, for the distances between the components, we apply K-means clustering with K=2 to get probable word and non-word spacing clusters. The words are segmented based on probable word spacing, and all other components are passed to the second stage for segmenting the correct words. For the segmented and un-segmented words passed to the second stage, the method repeats all the steps up to the K-means clustering step to find probable word and non-word spacing clusters. The method then considers cluster nature, and the height and width of the components, to identify the correct word spacing. The method is tested extensively on video curved text lines, non-horizontal straight lines, horizontal straight lines and text lines from the ICDAR-2003 competition data. Experimental results and a comparative study show the results are encouraging and promising. © 2012 IEEE.
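The first-stage spacing split in this abstract can be sketched as 1-D K-means with K=2 over the gaps between consecutive components. The min/max centre initialisation and the gap values below are illustrative assumptions; the paper's second-stage refinement (cluster nature, component height/width) is not reproduced.

```python
# Hypothetical sketch of the first-stage split: 1-D K-means (K=2)
# on gaps between consecutive text components, separating probable
# word spacing from non-word (inter-character) spacing.
# Initialisation with min/max is an assumption for demonstration.

def two_means_1d(values, iters=20):
    """Cluster scalars into two groups; returns (labels, centers)."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each gap to the nearest centre, then update centres.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels, (c0, c1)

def word_breaks(gaps):
    """Indices of gaps in the wider (probable word-spacing) cluster."""
    labels, (c0, c1) = two_means_1d(gaps)
    word_label = 0 if c0 > c1 else 1
    return [i for i, l in enumerate(labels) if l == word_label]
```

For a text line whose consecutive component gaps are, say, [2, 3, 2, 15, 2, 3, 14, 2] pixels, the two large gaps fall into the word-spacing cluster and mark the word boundaries.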
Sharma, N, Pal, U & Blumenstein, M 2012, 'Recent advances in video based document processing: A review', Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, International Workshop on Document Analysis Systems, IEEE, Institute of Electrical and Electronics Engineers, Gold Coast, Australia, pp. 63-68.
Extraction and recognition of text present in video has become a very popular research area in the last decade. Generally, text present in video frames varies in size, orientation, style, etc., and appears against complex backgrounds with noise, low resolution and low contrast. These factors make automatic text extraction and recognition in video frames a challenging task. A large number of techniques have been proposed by various researchers in the recent past to address the problem. This paper presents a review of various state-of-the-art techniques proposed for the different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames. Given the growing popularity of, and recent developments in, the processing of text in video frames, this review details current trends and potential directions for further research to assist researchers. © 2012 IEEE.
Sharma, N, Shivakumara, P, Pal, U, Blumenstein, M & Tan, CL 2012, 'A new method for arbitrarily-oriented text detection in video', Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, International Workshop on Document Analysis Systems, IEEE, Institute of Electrical and Electronics Engineers, Gold Coast, Australia, pp. 74-78.
Text detection in video frames plays a vital role in enhancing the performance of information extraction systems, because the text in video frames helps in indexing and retrieving video efficiently and accurately. This paper presents a new method for arbitrarily-oriented text detection in video, based on dominant text pixel selection, text representatives and region growing. The method uses the gradient pixel direction and magnitude corresponding to Sobel edge pixels of the input frame to obtain dominant text pixels. Edge components in the Sobel edge map corresponding to dominant text pixels are then extracted; we call them text representatives. We eliminate broken segments of each text representative to obtain candidate text representatives. Then the perimeter of each candidate text representative is grown along the text direction in the Sobel edge map to group neighboring text components into what we call word patches. The word patches are used to find the direction of the text lines, and the word patches are then expanded in the same direction in the Sobel edge map to group neighboring word patches and restore missing text information. This results in the extraction of arbitrarily-oriented text from the video frame. To evaluate the method, we considered arbitrarily-oriented data, non-horizontal data, horizontal data, Hua's data and ICDAR-2003 competition data (camera images). The experimental results show that the proposed method outperforms the existing method in terms of recall and f-measure. © 2012 IEEE.
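The dominant-pixel step above relies on the Sobel gradient magnitude and direction at each pixel. A minimal sketch of that computation is given below; the explicit 3×3 convolution loop and the 0.5 magnitude threshold are illustrative assumptions, not the paper's actual selection rule.

```python
import numpy as np

# Sketch of Sobel gradient magnitude/direction, with high-magnitude pixels
# kept as dominant-pixel candidates. Kernel application is written out
# explicitly for clarity; the 0.5 threshold is an illustrative assumption.

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_gradients(img):
    """Return (magnitude, direction) arrays for the interior pixels of img."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * KX).sum()
            gy[i, j] = (patch * KY).sum()
    return np.hypot(gx, gy), np.arctan2(gy, gx)

img = np.zeros((5, 5))
img[:, 3:] = 255.0                             # vertical step edge
mag, direction = sobel_gradients(img)
dominant = mag > mag.max() * 0.5               # keep strong-gradient pixels
print(int(dominant.sum()))  # → 6
```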
Pal, U, Sharma, N, Wakabayashi, T & Kimura, F 2008, 'Handwritten character recognition of popular South Indian scripts', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 251-264.
India is a multi-lingual, multi-script country. Considerably less work has been done on handwritten character recognition of Indian languages than of other languages. In this paper we propose a quadratic classifier based scheme for the recognition of off-line handwritten characters of three popular south Indian scripts: Kannada, Telugu, and Tamil. The features used here are mainly obtained from directional information. For feature computation, the bounding box of a character is segmented into blocks, and the directional features are computed in each block. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Here, we used two sets of features: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition. A five-fold cross-validation technique was used for result computation, and we obtained 90.34%, 90.90%, and 96.73% accuracy rates for Kannada, Telugu, and Tamil characters, respectively, using the 400-dimensional features. © 2008 Springer-Verlag Berlin Heidelberg.
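The modified quadratic classifier used in this line of work is a refinement of the plain quadratic (Gaussian) discriminant, which can be sketched as below. This is a generic stand-in under assumed names and synthetic data: the "modified" variant additionally truncates or smooths minor covariance eigenvalues, which this sketch omits.

```python
import numpy as np

# A plain quadratic discriminant as an illustrative stand-in for the
# modified quadratic classifier (MQDF) mentioned above. Per-class Gaussian
# models; the class with the smallest discriminant score wins.

def fit_qda(X, y):
    """Estimate per-class mean and covariance from features X, labels y."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def predict(params, x):
    """Assign x to the class with the smallest quadratic discriminant score."""
    best, best_score = None, np.inf
    for c, (mu, cov) in params.items():
        d = x - mu
        score = d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov))
        if score < best_score:
            best, best_score = c, score
    return best

# Synthetic two-class demo: Gaussian blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_qda(X, y)
print(predict(params, np.array([4.8, 5.1])))  # → 1
```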
Pal, U, Sharma, N, Wakabayashi, T & Kimura, F 2007, 'Off-line handwritten character recognition of devnagari script', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 496-500.
In this paper we present a system for the recognition of off-line handwritten characters of Devnagari, the most popular script in India. The features used for recognition are mainly based on directional information obtained from the arc tangent of the gradient. To obtain the features, a 2 x 2 mean filter is first applied 4 times to the gray-level image and a non-linear size normalization is performed on the image. The normalized image is then segmented into 49 x 49 blocks and a Roberts filter is applied to obtain the gradient image. Next, the arc tangent of the gradient (the direction of the gradient) is quantized into 32 directions and the strength of the gradient is accumulated for each of the quantized directions. Finally, the blocks and the directions are down-sampled using a Gaussian filter to obtain a 392-dimensional feature vector. A modified quadratic classifier is applied to these features for recognition. We used 36,172 handwritten samples for testing our system and obtained 94.24% accuracy using a 5-fold cross-validation scheme. © 2007 IEEE.
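The direction-quantization step described above — mapping the arc tangent of the gradient into 32 bins and accumulating the gradient strength per bin — can be sketched as follows. The function name, bin boundaries, and sample gradients are illustrative assumptions.

```python
import math

# Sketch of the quantization step: each gradient vector's direction is
# mapped to one of 32 bins over [0, 2*pi) and its strength (magnitude) is
# accumulated in that bin.

def accumulate_directions(gradients, bins=32):
    """gradients: iterable of (gx, gy) pairs; returns per-bin accumulated strength."""
    hist = [0.0] * bins
    for gx, gy in gradients:
        angle = math.atan2(gy, gx) % (2 * math.pi)        # map to 0 .. 2*pi
        b = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist[b] += math.hypot(gx, gy)                     # gradient strength
    return hist

# Two horizontal gradients (bin 0) and one vertical gradient (bin 8).
hist = accumulate_directions([(1.0, 0.0), (0.0, 2.0), (1.0, 0.0)])
print(hist[0], hist[8])  # → 2.0 2.0
```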
Pal, U, Wakabayashi, T, Sharma, N & Kimura, F 2007, 'Handwritten numeral recognition of six popular Indian scripts', Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 749-753.
India is a multi-lingual, multi-script country, but there has not been much work towards handwritten character recognition of Indian languages. In this paper we propose a modified quadratic classifier based scheme for the recognition of off-line handwritten numerals of six popular Indian scripts. Here we consider the Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil scripts for our experiment. The features used in the classifier are obtained from the directional information of the numerals. For feature computation, the bounding box of a numeral is segmented into blocks and the directional features are computed in each of the blocks. These blocks are then down-sampled by a Gaussian filter and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Here we have used two sets of features: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition in our proposed system. A five-fold cross-validation technique was used for result computation and we obtained 99.56%, 98.99%, 99.37%, 98.40%, 98.71% and 98.51% accuracy for the Devnagari, Bangla, Telugu, Oriya, Kannada, and Tamil scripts, respectively.
Sharma, N, Pal, U & Kimura, F 2007, 'Recognition of handwritten Kannada numerals', Proceedings - 9th International Conference on Information Technology, ICIT 2006, pp. 133-136.
This paper deals with a quadratic classifier based scheme for the recognition of off-line handwritten numerals of Kannada, an important Indian script. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks and the chain code histogram is computed in each of the blocks. Here we have used 64-dimensional and 100-dimensional features for a comparative study of the recognition accuracy of our proposed system. These chain code features are fed to the quadratic classifier for recognition. We tested our scheme on 2300 data samples and, using a five-fold cross-validation technique, obtained 97.87% and 98.45% recognition accuracy with the 64-dimensional and 100-dimensional features, respectively. © 2006 IEEE.
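The chain code histogram underlying the features above encodes each step between consecutive contour points as one of 8 Freeman directions and counts the codes. A minimal sketch under assumed conventions is shown below; the paper computes such histograms per block, which this single-histogram example skips.

```python
# Sketch of an 8-direction (Freeman) chain-code histogram over a contour.
# The (dx, dy) -> code mapping assumes image coordinates with y growing
# downward; this convention is an illustrative assumption.

CODES = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
         (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code_histogram(contour):
    """contour: list of (x, y) points visited in order; returns 8-bin histogram."""
    hist = [0] * 8
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        hist[CODES[(x1 - x0, y1 - y0)]] += 1
    return hist

# A unit square traversed clockwise: right, down, left, up.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code_histogram(square))  # → [1, 0, 1, 0, 1, 0, 1, 0]
```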