Stuart Perry has over 20 years of experience conducting research into image processing, psychophysics, signal processing, image quality, and models for the quantification of image preference and aesthetics for government, industry and academia. After receiving a PhD from the University of Sydney in 1999, he began his research career studying the application of image processing to the detection of underwater objects in sector-scan sonar imagery for the Maritime Operations Division, Defence Science and Technology Organisation (DSTO). During this time he represented Australia on the Computer-Aided Detection and Classification Specialist Group, Technical Panel 13 (Mine Warfare and High Frequency Acoustics), Maritime Group, The Technical Cooperation Program (TTCP).
From 2003 to 2016, he worked for Canon Information Systems Research Australia (CiSRA), a Canon group company and one of the Canon Group's largest R&D facilities outside of Japan. During this time he worked on camera white balancing technologies, and led research teams working on print quality measurement, document security and perceptual quality measurement for various consumer devices.
In 2016 he joined the Faculty of Engineering and IT's Perceptual Imaging Laboratory (PILab) conducting research into colour and perceptual quality in 3D environments.
- Member of Institute of Electrical and Electronics Engineers (IEEE) from 1996
- Member of Society for Photo-Optical Instrumentation Engineers (SPIE) from 1998
- Member of Society for Imaging Science and Technology (IS&T)
Membership on Scholarly Committees:
- Australian Point of Contact for the Computer-Aided Detection and Classification Specialist Group, Technical Panel 13 (Mine Warfare and High Frequency Acoustics), Maritime Group, The Technical Cooperation Program (TTCP). (2001–2003)
- Local Arrangements and Registration Chair: 2000 IEEE Workshop on Neural Networks in Signal Processing, December 2000, University of Sydney, NSW, Australia.
- Session Chair and Local Arrangements and Registration Chair: First IEEE Pacific Rim Conference on Multimedia, December 2000, University of Sydney, NSW, Australia.
- Organising Committee Member and Session Chair: First Computer-Aided Detection and Computer-Aided Classification Conference (CADCAC 2001), 12-14th of November 2001, Halifax, Nova Scotia, Canada.
- Technical Committee Member and Session Chair, 2004 IEEE International Conference on Image Processing (ICIP 2004), 24th–26th of September 2004, Singapore.
- Technical Committee Member and Session Chair, 2013 IEEE International Conference on Image Processing (ICIP 2013), 15th–18th of September 2013, Melbourne, Australia.
- Program Committee Member, Image Quality and System Performance, IS&T/SPIE Electronic Imaging, 2013–current.
- Program Committee Member, Optics, Photonics and Digital Technologies for Imaging Applications, SPIE Optics + Photonics, 2014–current.
- Committee Member, Standards Australia Committee MS-65, Australian Mirror Committee for ISO/TC 42 Photography.
- Committee Member, Standards Australia Committee IT-029-01, Australian Mirror Committee for ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information. He is an editor of ISO/IEC 21794 Part 1, the upcoming ISO standard on compression of light field images, as well as chair of the JPEG Pleno Ad Hoc Group on Point Cloud Compression.
- “Electrical Engineering Foundation Award for Excellence in Teaching (Tutoring) 1998”, awarded by the School of Electrical and Information Engineering, University of Sydney, Australia.
- “IS&T Service Award 2014”, awarded by the Society for Imaging Science and Technology (IS&T) for contributions to the Data Analytics Task Force in 2013.
Australian Patent Applications:
2015201623 Choosing optimal images with preference distributions, Perry, Stuart William
2014201797 Method, apparatus and system for determining chromatic difference, Perry, Stuart William; Bonnier, Nicolas Pierre Marie Frederic
2013276980 Chroma structure affecting chroma perception, Woolfe, Geoffrey John; Bonnier, Nicolas Pierre Marie Frederic; Pakulski, Peter Jan; Perry, Stuart William; Rich, Anina Nicole; Williams, Mark Alexander; Weldon, Kimberly
2013273630 Observer preference model, Perry, Stuart William
2009238260 Forgery detection using finger print, Perry, Stuart William; Gupta, Amit Kumar
2009203182 Document authentication using handheld device, Fields, Andrew James; DeQiang, Eugene Cai; Gibson, Ian Richard; Perry, Stuart William
2008264191 Detecting and marking incidental test patches in a printed document for the purpose of print quality analysis, Yenson, Brendon; Degros, Francois; Perry, Stuart William
2008260092 Document authentication and workflow, Drake, Barry James; Gibson, Ian Richard; Amielh, Myriam Elisa Lucie; Hardy, Stephen James; Perry, Stuart William
2008252022 Colour printing with achromatic substance, Perry, Stuart William
2007254658 Positional alignment accuracy of a printer, Larkin, Kieran Gerard; Duggan, Matthew Christian; Perry, Stuart William
2007254655 Authenticating partially transparent medium, Rudkin, Scott Alexander; Ecob, Stephen Edward; Perry, Stuart William
2007254624 A method for recommending quality analysis techniques for test targets in the process of designing test charts, Degros, Francois; Perry, Stuart William; Tot, Robert; Duggan, Matthew Christian
2006202198 Method of reviewing multiple images, Dorrell, Andrew James; Gibson, Richard Ian; Perry, Stuart William; Chan, Woei
2005242227 Camera System Implementing Flash-No-Flash Processing Mode, Dorrell, Andrew James; Gibson, Ian Richard; Perry, Stuart William; Chan, Woei
2005203381 White balance adjustment, Dorrell, Andrew James; Perry, Stuart William; Chan, Woei
2004906703 Selection of Images for White Balance Adjustment, Dorrell, Andrew James; Perry, Stuart William; Chan, Woei
2004906020 Post-capture fill flash, Dorrell, Andrew James; Perry, Stuart William; Chan, Woei
2004904409 White Balance Adjustment, Dorrell, Andrew James; Perry, Stuart William; Chan, Woei
US Granted Patents:
7,551,797, White Balance Adjustment, Andrew James Dorrell, Stuart William Perry, Woei Chan
US Patent Applications:
US2015/0169982, Observer Preference Model, Stuart William Perry
Can supervise: YES
Stuart is currently interested in many aspects of image processing and enabling technologies that allow machines to sense and understand their environment. This includes adaptive image processing, adaptive image restoration, noise removal, filtering, and general image processing, as well as the detection and classification of objects using statistical and machine learning techniques.
Stuart is very interested in psychophysics and mathematical models that describe the human visual system and model perceptual image quality and other subjective human responses to imagery and objects such as material appearance, aesthetics and three dimensionality. Stuart believes that perceptual image quality remains an unsolved problem and effective models of perceptual image quality have the potential to improve a variety of image processing algorithms and open up new applications. In addition, new aspects of human perception have begun to be examined such as aesthetics and material appearance. These new aspects as well as the continuing need for effective image quality/preference measures represent exciting new directions in the image processing field. In the last few years, Stuart's attention has been shifting to the design and statistical analysis of psychophysical experiments to support research into the human perception of imagery. He has a keen interest in experimental design methodologies, machine learning, big data techniques and statistical analysis and regression and how these tools might be applied to problems in human perception, including the perception of aesthetics, three-dimensionality, material properties and quality.
31256 Image Processing and Pattern Recognition
31261 Internetworking Project
Yap, K-H, Guan, L, Perry, SW & Wong, HS 2009, Adaptive Image Processing: A Computational Intelligence Perspective, Second Edition, CRC Press.
© 2018 by Taylor & Francis Group, LLC. Illustrating essential aspects of adaptive image processing from a computational intelligence viewpoint, the second edition of Adaptive Image Processing: A Computational Intelligence Perspective provides an authoritative and detailed account of computational intelligence (CI) methods and algorithms for adaptive image processing in regularization, edge detection, and early vision. With three new chapters and updated information throughout, the new edition of this popular reference includes substantial new material that focuses on applications of advanced CI techniques in image processing applications. It introduces new concepts and frameworks that demonstrate how neural networks, support vector machines, fuzzy logic, and evolutionary algorithms can be used to address new challenges in image processing, including low-level image processing, visual content analysis, feature extraction, and pattern recognition. Emphasizing developments in state-of-the-art CI techniques, such as content-based image retrieval, this book continues to provide educators, students, researchers, engineers, and technical managers in visual information processing with the up-to-date understanding required to address contemporary challenges in image content processing and analysis.
Pekka, A, da Silva Cruz, LA, da Silva, EAB, Ebrahimi, T, Freitas, PG, Gilles, A, Oh, K-J, Pagliari, C, Pereira, F, Perra, C, Perry, S, Pinheiro, AMG, Schelkens, P, Seidel, I & Tabus, I 2020, 'JPEG PLENO: STANDARDIZING A CODING FRAMEWORK AND TOOLS FOR PLENOPTIC IMAGING MODALITIES', ITU Journal: ICT Discoveries, vol. 3, no. 1, pp. 1-1.
JPEG Pleno is an upcoming standard from the ISO/IEC JTC 1/SC 29/WG 1 (JPEG) Committee. It aims to provide a standard framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image modalities addressed by the current standardization activities are light field, holography, and point clouds, where these image modalities describe different sampled representations of the plenoptic function. The applications that may benefit from these emerging image modalities range from supporting varying capture platforms, interactive content viewing, cultural environments exploration and medical imaging to more immersive browsing with novel special effects and more realistic images. These use cases come with a set of requirements addressed by the JPEG Pleno standard. Main requirements envision high compression efficiency, random access, scalability, error-resilience, low complexity, and metadata support. This paper presents a synopsis of the status of the standardization process and provides technical insights as well as the latest performance evaluation results.
Wu, L, Xu, M, Wang, J & Perry, S 2020, 'Recall What You See Continually Using GridLSTM in Image Captioning', IEEE Transactions on Multimedia, vol. 22, no. 3, pp. 808-818.
The goal of image captioning is to automatically describe an image with a sentence, and the task has attracted research attention from both the computer vision and natural-language processing research communities. The existing encoder–decoder model and its variants, which are the most popular models for image captioning, use the image features in three ways: first, they inject the encoded image features into the decoder only once at the initial step, which does not enable the rich image content to be explored sufficiently while gradually generating a text caption; second, they concatenate the encoded image features with text as extra inputs at every step, which introduces unnecessary noise; and, third, they use an attention mechanism, which increases the computational complexity due to the introduction of extra neural nets to identify the attention regions. Unlike the existing methods, in this paper we propose a novel network, the Recall Network, for generating captions that are consistent with the images. The Recall Network selectively involves the visual features by using a GridLSTM and, thus, is able to recall image contents while generating each word. By importing the visual information as the latent memory along the depth dimension LSTM, the decoder is able to admit the visual features dynamically through the inherent LSTM structure without adding any extra neural nets or parameters. The Recall Network efficiently prevents the decoder from deviating from the original image content. To verify the efficiency of our model, we conducted exhaustive experiments on full and dense image captioning. The experimental results clearly demonstrate that our Recall Network outperforms the conventional encoder–decoder model by a large margin and that it performs comparably to the state-of-the-art methods.
The importance of three-dimensional (3D) point cloud technologies in the field of agricultural and environmental research has increased in recent years. Obtaining dense and accurate 3D reconstructions of plants and urban areas provides useful information for remote sensing. In this paper, we propose a novel strategy for the enhancement of 3D point clouds from a single 4D light field (LF) image. Using a light field camera in this way provides an easy way to obtain 3D point clouds from one snapshot and enables diversity in monitoring and modelling applications for remote sensing. Taking an LF image and its associated depth map as input, we first apply histogram equalization and histogram stretching to enhance the separation between depth planes. We then apply multi-modal edge detection by using feature matching and fuzzy logic on the central sub-aperture LF image and the depth map. These two steps of depth map enhancement are significant parts of the novelty of this work. After combining the two previous steps and transforming the point–plane correspondence, we can obtain the 3D point cloud. We tested our method with synthetic and real-world image databases. To verify the accuracy of our method, we compared our results with two different state-of-the-art algorithms. The results showed that our method can reliably mitigate noise and had the highest level of detail compared to other existing methods.
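The histogram equalization and stretching steps mentioned in this abstract can be illustrated with a minimal sketch. It assumes the depth map is a 2D list of integers in the range 0 to 255; the function names `equalize` and `stretch` are hypothetical, and the code shows the textbook operations only, not the authors' implementation.

```python
def equalize(depth, levels=256):
    """Textbook histogram equalization of an integer-valued depth map."""
    flat = [v for row in depth for v in row]
    if not flat:
        return []
    # Histogram and cumulative distribution of the depth values.
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        # Constant image: nothing to equalize.
        return [list(row) for row in depth]
    # Map each level through the normalised cumulative distribution.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in depth]


def stretch(depth, out_max=255):
    """Linear contrast stretch so the values span 0..out_max."""
    flat = [v for row in depth for v in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in depth]
    scale = out_max / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in depth]


depth_map = [[10, 20], [30, 40]]
enhanced = stretch(equalize(depth_map))
```

Both operations spread depth values over the available range, which is one plausible way to increase the separation between depth planes before edge detection.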
Feng, X, Wan, W, Xu, RYD, Perry, S, Li, P & Zhu, S 2018, 'A novel spatial pooling method for 3D mesh quality assessment based on percentile weighting strategy', Computers & Graphics, vol. 74, pp. 12-22.
Feng, X, Wan, W, Xu, RYD, Perry, S, Zhu, S & Liu, Z 2018, 'A new mesh visual quality metric using saliency weighting-based pooling strategy', Graphical Models, vol. 99, pp. 1-12.
© 2018 Elsevier Inc. Several metrics have been proposed to assess the visual quality of 3D triangular meshes during the last decade. In this paper, we propose a mesh visual quality metric by integrating mesh saliency into mesh visual quality assessment. We use the Tensor-based Perceptual Distance Measure metric to estimate the local distortions for the mesh, and pool local distortions into a quality score using a saliency weighting-based pooling strategy. Three well-known mesh saliency detection methods are used to demonstrate the superiority and effectiveness of our metric. Experimental results show that our metric with any of three saliency maps performs better than state-of-the-art metrics on the LIRIS/EPFL general-purpose database. We generate a synthetic saliency map by assembling salient regions from individual saliency maps. Experimental results reveal that the synthetic saliency map achieves better performance than individual saliency maps, and the performance gain is closely correlated with the similarity between the individual saliency maps.
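As a rough illustration of saliency weighting-based pooling, the sketch below combines per-vertex distortion values into a single score using a saliency-weighted mean. The weighted-mean form and the name `saliency_weighted_pool` are assumptions made for illustration; the abstract does not give the exact pooling formula used in the paper.

```python
from typing import Sequence


def saliency_weighted_pool(distortions: Sequence[float],
                           saliency: Sequence[float]) -> float:
    """Pool local distortions into one score, weighting each by its saliency."""
    if len(distortions) != len(saliency):
        raise ValueError("distortions and saliency must have the same length")
    if not distortions:
        raise ValueError("at least one vertex is required")
    total_weight = sum(saliency)
    if total_weight <= 0:
        # No salient vertex at all: fall back to a plain average.
        return sum(distortions) / len(distortions)
    return sum(d * s for d, s in zip(distortions, saliency)) / total_weight


# A distortion on a highly salient vertex dominates the pooled score.
score = saliency_weighted_pool([0.2, 0.8], [1.0, 3.0])
```

With uniform saliency the score reduces to the ordinary mean, so the saliency map only changes the result where it distinguishes between regions.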
Perry, SW, Guan, L & Varjavandi, P 2006, 'Incorporating local statistics in image error measurement for adaptive image restoration', OPTICAL ENGINEERING, vol. 45, no. 3.
Perry, SW & Guan, L 2004, 'Pulse-length-tolerant features and detectors for sector-scan sonar imagery', IEEE JOURNAL OF OCEANIC ENGINEERING, vol. 29, no. 1, pp. 138-156.
Perry, SW & Guan, L 2004, 'A recurrent neural network for detecting objects in sequences of sector-scan sonar images', IEEE JOURNAL OF OCEANIC ENGINEERING, vol. 29, no. 3, pp. 857-871.
Lo, KW, Perry, SW & Ferguson, BG 2002, 'Aircraft flight parameter estimation using acoustical Lloyd's mirror effect', IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, vol. 38, no. 1, pp. 137-151.
Perry, SW & Guan, L 2002, 'A pulse length tolerant neural network‐based detector for sector‐scan sonar', The Journal of the Acoustical Society of America, vol. 112, no. 5, pp. 2307-2307.
Perry, SW & Wyber, RJ 2000, 'Hopfield neural network approach for the reconstruction of wide-bandwidth sonar data', Neural Networks for Signal Processing - Proceedings of the IEEE Workshop, vol. 2, pp. 876-885.
Sonar systems with small physical apertures are easier to mount on small vessels and remotely operated vehicles (ROVs). Such systems however are limited in terms of angular resolution. Although wide-bandwidth signals may be used to increase the range resolution of a sonar system, angular resolution is unaffected. Such limitations can be overcome if the region of interest in the underwater environment is insonified from a number of different angles, and this low resolution information reconstructed into a high resolution image of the region. This paper proposes a reconstruction approach based on a Hopfield neural network. This approach is shown to perform better than the Inverse Radon Transform for image reconstruction under both noisy and noise-less conditions. To verify these claims, results are presented using both real and simulated sonar data.
Sutton, JP, Sha, DD, Perry, S & Guan, L 1999, 'Enhancing mine signatures in sonar images using nested neural networks', DETECTION AND REMEDIATION TECHNOLOGIES FOR MINES AND MINELIKE TARGETS IV, PTS 1 AND 2, vol. 3710, pp. 570-577.
Perry, SW & Guan, L 1996, 'A partitioned modified Hopfield neural network algorithm for real-time image restoration', REAL-TIME IMAGING, vol. 2, no. 4, pp. 215-224.
Perry, SW 2018, 'Image and Video Noise: An Industry Perspective' in Bertalmio, M (ed), Denoising of Photographic Images and Video: Fundamentals, Open Challenges and New Trends, Springer, Switzerland, pp. 207-234.
This unique text/reference presents a detailed review of noise removal for photographs and video. An international selection of expert contributors provide their insights into the fundamental challenges that remain in the field of denoising, examining how to properly model noise in real scenarios, how to tailor denoising algorithms to these models, and how to evaluate the results in a way that is consistent with perceived image quality. The book offers comprehensive coverage from problem formulation to the evaluation of denoising methods, from historical perspectives to state-of-the-art algorithms, and from fast real-time techniques that can be implemented in-camera to powerful and computationally intensive methods for off-line processing.
Topics and features: describes the basic methods for the analysis of signal-dependent and correlated noise, and the key concepts underlying sparsity-based image denoising algorithms; reviews the most successful variational approaches for image reconstruction, and introduces convolutional neural network-based denoising methods; provides an overview of the use of Gaussian priors for patch-based image denoising, and examines the potential of internal denoising; discusses selection and estimation strategies for patch-based video denoising, and explores how noise enters the imaging pipeline; surveys the properties of real camera noise, and outlines a fast approximation of nonlocal means filtering; proposes routes to improving denoising results via indirectly denoising a transform of the image, considering the right noise model and taking into account the perceived quality of the outputs.
This concise and clearly written volume will be of great value to researchers and professionals working in image processing and computer vision. The book will also serve as an accessible reference for advanced undergraduate and graduate students in computer science, applied mathematics, and related fields.
Luu, V-H, Dao, M-S, Nguyen, TN-T, Perry, S & Zettsu, K 2019, 'Semi-supervised Convolutional Neural Networks for Flood Mapping using Multi-modal Remote Sensing Data', 2019 6th NAFOSTED Conference on Information and Computer Science, Hanoi, Vietnam.
When floods hit populated areas, quick detection of flooded areas is crucial for initial response by local government, residents, and volunteers. Space-borne polarimetric synthetic aperture radar (PolSAR) is an authoritative data source for flood mapping since it can be acquired immediately after a disaster, even at night or in cloudy weather. Conventionally, a lot of domain-specific heuristic knowledge has been applied for PolSAR flood mapping, but the performance of such methods still suffers from confusing pixels caused by irregular reflections of radar waves. Optical images are another data source that can be used to detect flooded areas due to their high spectral correlation with the open water surface. However, they are often affected by day, night, or severe weather conditions (e.g., cloud). This paper presents a convolutional neural network (CNN) based multimodal approach utilizing the advantages of both PolSAR and optical images for flood mapping. First, reference training data is retrieved from optical images by manual annotation. Since clouds may appear in the optical image, only areas clearly visible as flooded or non-flooded are annotated. Then, a semi-supervised polarimetric-features-aided CNN is utilized for flood mapping using PolSAR data. The proposed model not only can handle the issue of learning with incomplete ground truth but also can leverage a large portion of unlabelled pixels for learning. Moreover, our model takes advantage of expert knowledge on scattering interpretation to incorporate polarimetric features as the input. Experimental results are given for the flood event that occurred in Sendai, Japan, on 12th March 2011. The experiments show that our framework can map flooded areas with high accuracy (F1 = 96.12) and outperform conventional flood mapping methods.
Farhood, H, Perry, S, Cheng, E & Kim, J 2020, '3D point cloud reconstruction from a single 4D light field image', Optics, Photonics and Digital Technologies for Imaging Applications VI, Optics, Photonics and Digital Technologies for Imaging Applications VI, SPIE.
Akter, N, Li, A, Shi, R, Phu, J, Perry, S, Fletcher, J & Roy, M 2019, 'A feature agnostic based glaucoma diagnosis from OCT images with deep learning technique', 2019 Meeting of the American Academy of Optometry, Orlando, Florida, USA.
© 2019 The Author(s). In this paper, data from OCT images are extracted and statistically analyzed, and an image processing task is performed on optic nerve head images to optimize the features used in the diagnosis of glaucoma.
Cong, HP, Perry, S & HoangVan, X 2019, 'A low complexity Wyner-Ziv coding solution for Light Field image transmission and storage', 2019 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, IEEE, Jeju, Korea.
Compressing Light Field (LF) imaging data is a challenging but very important task for both LF image transmission and storage applications. In this paper, we propose a novel coding solution for LF images using the well-known Wyner-Ziv (WZ) information theorem. First, the LF image is decomposed into a four-dimensional LF (4D-LF) data format. Using a spiral scanning procedure, a pseudo-sequence of 4D-LF is generated. This sequence is then compressed in a distributed coding manner as specified in the WZ theorem. Second, a novel adaptive frame skipping algorithm is introduced to further exploit the high correlation between 4D-LF pseudo-sequences. Experimental results show that the proposed LF image compression solution achieves a significant performance improvement, notably a bitrate saving of around 54% compared with the standard High Efficiency Video Coding (HEVC) Intra benchmark, while requiring less computational complexity.
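The spiral scanning step can be pictured with a small sketch that orders the sub-aperture views of a light field into a pseudo-sequence. It assumes the views form a rows × cols grid, that the scan starts at the central view, and that it turns right, down, left, then up; the starting point, direction and the name `spiral_order` are illustrative assumptions, since the abstract does not specify them.

```python
def spiral_order(rows, cols):
    """Return (row, col) view indices in an outward spiral from the centre."""
    r, c = rows // 2, cols // 2  # assumed starting point: the central view
    order = []
    if 0 <= r < rows and 0 <= c < cols:
        order.append((r, c))
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, d = 1, 0
    while len(order) < rows * cols:
        # Each step length is walked twice before it grows by one.
        for _ in range(2):
            dr, dc = directions[d % 4]
            for _ in range(step):
                r, c = r + dr, c + dc
                # Positions outside the grid are skipped, not emitted.
                if 0 <= r < rows and 0 <= c < cols:
                    order.append((r, c))
            d += 1
        step += 1
    return order


# Views of a 3 x 3 light field, listed in pseudo-sequence order.
sequence = spiral_order(3, 3)
```

Consecutive entries in such an ordering are always neighbouring views, which is what makes the resulting pseudo-sequence resemble a video with small frame-to-frame changes.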
Perry, S, Pinheiro, A, Dumic, E & da Silva Cruz, LA 2019, 'Study of Subjective and Objective Quality Evaluation of 3D Point Cloud Data by the JPEG Committee', Image Quality and System Performance XVI, IS&T, Burlingame, CA, USA, pp. 312-1-312-1.
The SC29/WG1 (JPEG) Committee within ISO/IEC is currently working on developing standards for the storage, compression and transmission of 3D point cloud information. To support the creation of these standards, the committee has created a database of 3D point clouds representing various quality levels and use-cases and examined a range of 2D and 3D objective quality measures. The examined quality measures are correlated with subjective judgments for a number of compression levels. In this paper we describe the database created, tests performed and key observations on the problems of 3D point cloud quality assessment.
Pham, T, Takalkar, M, Xu, M, Hoang, DT, Truong, HA, Dutkiewicz, E & Perry, S 2019, 'Airborne Object Detection Using Hyperspectral Imaging: Deep Learning Review', Computational Science and Its Applications – ICCSA 2019, International Conference on Computational Science and Its Applications, Springer, Saint Petersburg, Russia, pp. 306-321.
Hyperspectral images have become increasingly important in object detection applications, especially in remote sensing scenarios. Machine learning algorithms have become emerging tools for hyperspectral image analysis. The high dimensionality of hyperspectral images and the availability of simulated spectral sample libraries make deep learning an appealing approach. This report reviews recent data processing and object detection methods in the area, including hand-crafted and automated feature extraction based on deep learning neural networks. The accuracy performances were compared according to existing reports as well as our own experiments (i.e., re-implementing and testing on new datasets). CNN models provided reliable performance of over 97% detection accuracy across a large set of HSI collections. A wide range of data were used: a rural area (Indian Pines data), an urban area (Pavia University), a wetland region (Botswana), an industrial field (Kennedy Space Center), and a farm site (Salinas). Note that the Botswana set was not reviewed in recent works, so selected high-accuracy methods were newly compared in this work. A plain CNN model was also found to be able to perform comparably to its more complex variants in target detection applications.
Matthews, L, Perin, G, Perry, S, Bone, D & Culpepper, J 2018, 'Novel Disruptive Methods: Pattern Adaptations for Military Structures', International Conference on Science and Innovation for Land Power 2018, Department of Defence, Australian Government, Adelaide, SA, Australia.
Recent research reveals that signature disruption strategies of detection delay and disguise can provide effective counter-surveillance techniques for contemporary low-altitude Uninhabited Aerial Vehicle (UAV) or drone detection platforms. As the first in a series of tiered tests, a virtual 3D model of selected 'scaled-up' HVS-based (Human Visual System based) algorithmic patterns and 3D biological nanostructures was found to disrupt a camera sensor when mirrored in a physical surface. Further prototype and field tests will be conducted to corroborate these findings, with the ultimate aim of proposing an effective, controllable and disruptive mechanism to overhead UAV surveillance technology.
Nguyen, N, Le, TH, Perry, S & Nguyen, TT 2018, 'Pavement crack detection using convolutional neural network', International Symposium on Information and Communication Technology, Association for Computing Machinery, Danang, Vietnam, pp. 251-256.
Pavement crack detection is an important problem in road maintenance. Many processing methods, both traditional and modern, address this issue. Traditional methods use edge detection or other digital image processing techniques for crack detection, but these approaches are sensitive to many types of noise and unwanted objects on the road. To increase accuracy, many of these techniques require image pre-processing. Recently, some techniques that utilize deep learning to detect cracks in images have achieved high accuracy without pre-processing. However, some of them are very complicated, some make use of manually collected data and some still need some form of pre-processing. In this paper, we propose a method that applies a convolutional neural network to detect cracks in pavement images. Our research uses two data sets, one public data set and the other collected by ourselves. We also experimentally compare our method with some existing methods, and the experiments show that the proposed approach achieves high accuracy and generates stable models.
Cong, HP, Perry, SW, Vu, TA & Hoang, XV 2017, 'Joint exploration model based light field image coding: A comparative study', 2017 4th NAFOSTED Conference on Information and Computer Science NICS 2017 Proceedings, 2017 4th NAFOSTED Conference on Information and Computer Science, IEEE, Hanoi, Vietnam, pp. 308-313.
Light field imaging technology has recently been attracting a lot of interest due to its potential applications in a large number of areas including Virtual Reality and Augmented Reality (VR/AR), Teleconferencing, and E-learning. Light Field (LF) data is able to provide rich visual information such as scene rendering with changes in depth of field, viewpoint, and focal length. However, Light Field data usually comes with a critical problem: its massive data volume. Therefore, compressing LF data is one of the main challenges in LF research. In this context, we present in this paper a comparative study for compressing LF data with not only the widely used image/video coding standards, such as JPEG-2000, H.264/AVC, HEVC and Google/VP9, but also the most recent image/video coding solution, the Joint Exploration Model. In addition, this paper also proposes an LF image coding flow, which can be used as a benchmark for future LF compression evaluation. Finally, the compression efficiency of these coding solutions is thoroughly compared across a rich set of test conditions.
We present an adaptive weighted temporal averaging filter with implicit motion-compensation for effective object enhancement in sector scan sonar image sequences. Visual blurring artifacts introduced by the temporal filtering process due to motion of the sonar platform are minimized by accurate motion estimation and compensation. An algorithm is proposed to perform object boundary extraction for better motion estimation. Motion estimation is performed directly on polar image sequences using cross-correlation followed by a Minimum Mean Square Error (MMSE) method. Each pixel of the filtered image is computed as the weighted average of the image pixel values over successive frames after motion compensation. The performance of the proposed filter is tested using real sector scan sonar image sequences and the results are compared with those obtained using the temporal averaging and motion compensated temporal averaging filters. © 2010 IEEE.
Pham, TQ, Perry, SW & Fletcher, PA 2009, 'Paper fingerprinting using alpha-masked image matching', 2009 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2009), 11th Conference on Digital Image Computing: Techniques and Applications, IEEE, Melbourne, AUSTRALIA, pp. 439-446.
Perry, SW, Varjavandi, P & Guan, L 2004, 'Adaptive image restoration using a perception based error measurement', Canadian Conference on Electrical and Computer Engineering, pp. 1585-1588.
This paper deals with image restoration; we have developed a novel, perceptually inspired image restoration method which takes human perception knowledge into consideration to reverse the effects of blur. Instead of using a conventional greyscale based error measurement such as the MSE, we compare local statistical information about regions in two images using a new error measure. The new method provides a better appraisal of image quality in terms of human vision. We extended the popular constrained least square error cost function by incorporating this novel image error measure. Using the well known Karush-Kuhn-Tucker theorem, we have mathematically verified that there exists an optimal solution to this non-linear constrained optimization problem in terms of the Hopfield neural network. We will show that the new restoration algorithm visually restores images as well as the previously presented LVMSE-based algorithm.
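To illustrate what comparing local statistics rather than grey levels can look like, here is a minimal sketch of a local-variance error in the spirit of the LVMSE mentioned above. It is not the measure proposed in the paper: the square window, its radius, and the function names are assumptions made for illustration.

```python
def local_variance(img, r, c, radius=1):
    """Variance of the pixels in a square window centred on (r, c),
    clipped at the image border."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def mse(a, b):
    """Conventional pixel-wise mean squared error."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def local_variance_mse(a, b, radius=1):
    """Mean squared difference between the local variances of two images."""
    rows, cols = len(a), len(a[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            diff = local_variance(a, r, c, radius) - local_variance(b, r, c, radius)
            total += diff ** 2
    return total / (rows * cols)


flat_10 = [[10.0] * 3 for _ in range(3)]
flat_12 = [[12.0] * 3 for _ in range(3)]
checker = [[0.0, 20.0, 0.0], [20.0, 0.0, 20.0], [0.0, 20.0, 0.0]]
```

Two flat images that differ only in brightness have a non-zero MSE but a zero local-variance error, while a flat image and a textured one differ strongly in local variance. This shows how such a measure responds to changes in local structure rather than to uniform grey-level offsets.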
Perry, SW & Guan, L 2001, 'Detection of small man-made objects in sector scan imagery using neural networks', OCEANS 2001 MTS/IEEE: AN OCEAN ODYSSEY, VOLS 1-4, CONFERENCE PROCEEDINGS, Annual Conference of the Marine-Technology-Society, MARINE TECHNOLOGY SOC, HONOLULU, HI, pp. 2108-2114.
Lo, KW, Perry, SW & Ferguson, BG 1999, 'An image processing approach for aircraft flight parameter estimation using the acoustical Lloyd's mirror effect', ISSPA 1999 - Proceedings of the 5th International Symposium on Signal Processing and Its Applications, pp. 503-506.
A time-frequency analysis of the output of an acoustic sensor located above the ground during the transit of an aircraft shows an interference (or fringe) pattern on the time-frequency plane. This interference pattern, referred to as the Lloyd's mirror effect, is caused by the temporal variations of the constructive/destructive interference frequencies of the direct and ground-reflected aircraft sound fields at the sensor. A model has been developed to describe the temporal variations of the destructive-interference frequencies for an aircraft in level flight over a hard ground. This paper describes two methods to estimate the aircraft flight parameters based on this model. In both methods, the time-frequency distribution of the sensor output is treated as an image. This image is pre-processed to enhance the destructive-interference pattern and then the flight parameters are extracted from the resultant image by optimising a cost function. The effectiveness of the methods is verified using real acoustic data. © 1999 IEEE.
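The origin of the destructive-interference frequencies can be illustrated with a simplified static geometry. The sketch below is not the model developed in the paper: it ignores aircraft motion and Doppler shift, treats the ground as a perfect hard reflector (no phase change on reflection), and assumes a sound speed of 340 m/s. Under those assumptions the direct and ground-reflected paths cancel when their length difference is an odd number of half wavelengths, giving f_n = (2n - 1) c / (2 Δ). The function name and parameters are hypothetical.

```python
import math


def destructive_frequencies(h_aircraft, h_sensor, horiz_range, n_max=3, c=340.0):
    """First n_max destructive-interference frequencies in Hz.

    Static snapshot: source at height h_aircraft, sensor at height h_sensor,
    separated horizontally by horiz_range (all in metres).
    """
    # Direct path, and reflected path via the image source below the ground.
    direct = math.hypot(horiz_range, h_aircraft - h_sensor)
    reflected = math.hypot(horiz_range, h_aircraft + h_sensor)
    delta = reflected - direct
    # Cancellation when delta equals an odd multiple of half a wavelength.
    return [(2 * n - 1) * c / (2.0 * delta) for n in range(1, n_max + 1)]


overhead = destructive_frequencies(h_aircraft=100.0, h_sensor=1.0, horiz_range=0.0)
distant = destructive_frequencies(h_aircraft=100.0, h_sensor=1.0, horiz_range=500.0)
```

With the aircraft directly overhead the path difference is 2 m, so the nulls fall near 85, 255 and 425 Hz. At 500 m horizontal range the path difference shrinks and the nulls move to higher frequencies, which is the kind of temporal variation that produces the fringe pattern on the time-frequency plane.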
Guan, L, Perry, S, Romagnoli, R, Wong, HS & Kong, HS 1998, 'Neural vision system and applications in image processing and analysis', PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 98), IEEE, SEATTLE, WA, pp. 1245-1248.
Perry, SW & Guan, L 1998, 'A statistics-based weight assignment in a Hopfield neural network for adaptive image restoration', IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 2nd IEEE World Congress on Computational Intelligence (WCCI 98), IEEE, ANCHORAGE, AK, pp. 922-927.
Perry, SW & Guan, L 1998, 'Perception based adaptive image restoration', PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 98), IEEE, SEATTLE, WA, pp. 2893-2896.
Guan, L, Perry, S & Wong, H 1997, 'A recursive low level vision system', SMC '97 CONFERENCE PROCEEDINGS - 1997 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1997 IEEE International Conference on Systems, Man, and Cybernetics - Computational Cybernetics and Simulation (SMC 97), IEEE, ORLANDO, FL, pp. 637-642.
Perry, SW & Guan, L 1997, 'Adaptive constraint restoration and error analysis using a neural network', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 87-95.
© Springer-Verlag Berlin Heidelberg 1997. In this paper we present a restoration technique aimed at correcting image degradations by consideration of human visual criteria. A neural network model with an adaptive constraint factor is used. By considering local statistical information about regions within an image, a value of the constraint factor can be selected that produces an optimal trade-off between noise suppression and edge preservation in each statistically homogeneous region. In addition, a novel image error measure is presented which takes into account the statistical matching of homogeneous regions and its effect on human visual appraisal of image quality.
Wong, EPK, Guan, L & Perry, SW 1996, 'A neural network implementation of the SMSE filter for image processing', REAL-TIME IMAGING, Conference on Real-Time Imaging, SPIE - INT SOC OPTICAL ENGINEERING, SAN JOSE, CA, pp. 77-85.
Perry, SW & Guan, L 1995, 'Restoration of images degraded by space-variant distortion using a neural network', 1995 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS PROCEEDINGS, VOLS 1-6, 1995 IEEE International Conference on Neural Networks (ICNN 95), IEEE, UNIV W AUSTRALIA, PERTH, AUSTRALIA, pp. 2067-2070.
Timmerer, C, Baraković, S, Baraković Husić, J, Bech, S, Bosse, S, Botev, J, Brunnström, K, Cruz, L, De Moor, K, de Polo Saibanti, A, Durnez, W, Egger-Lampl, S, Engelke, U, Falk, TH, Hameed, A, Hines, A, Kojic, T, Kukolj, D, Liotou, E, Milovanovic, D, Möller, S, Murray, N, Naderi, B, Pereira, M, Perry, S, Pinheiro, A, Pinilla, A, Raake, A, Agrawal, SR, Reiter, U, Rodrigues, R, Schatz, R, Schelkens, P, Schmidt, S, Sabet, SS, Singla, A, Skorin-Kapov, L, Suznjevic, M, Uhrig, S, Vlahović, S, Voigt-Antons, J-N & Zadtootaghaj, S QUALINET 2020, QUALINET White Paper on Definitions of Immersive Media Experience (IMEx), pp. 1-1, 14th QUALINET meeting (online).
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving the directionality of the future of immersive media experiences (IMEx) and all related matters.
The aim of this white paper is to provide a survey of definitions of immersion and presence which leads to a definition of immersive media experience (IMEx). The Quality of Experience (QoE) for immersive media is described by establishing a relationship between the concepts of QoE and IMEx followed by application areas of immersive media experience. Influencing factors on immersive media experience are elaborated as well as the assessment of immersive media experience. Finally, standardization activities related to IMEx are highlighted and the white paper is concluded with an outlook related to future developments.
Perry, SW DSTO Aeronautical and Maritime Research Laboratory, Department of Defence, Commonwealth of Australia 2000, Applications of Image Processing to Mine Warfare Sonar, no. DSTO-GD-0237, Sydney, Australia.
Information from various mine warfare sonar systems is often presented to the operator in a visual form. To obtain the optimum performance of these systems, it is desirable to apply intelligent processing techniques to the corresponding imagery. This report examines image processing techniques which may have the potential to improve either system or operator performance. The types of mine warfare sonar imagery examined in this report are sector-scan, side-scan, and the AMI project imagery. For each of these three types of imagery, applicable image processing concepts and techniques are examined with reference to techniques recorded in the literature.