Aliyu, A, El-Sayed, H, Abdullah, AH, Alam, I, Li, J & Prasad, M 2019, 'Video streaming in urban vehicular environments: Junction-aware multipath approach', Electronics (Switzerland), vol. 8, no. 11.
In multipath video streaming, selecting the best vehicle for video packet forwarding around a junction is a challenging task because of the several diversions in the junction area. Vehicles in the junction area change direction at these diversions, which leads to video packet drops. Existing works do not explicitly consider the different positions within the junction area when selecting the forwarding vehicle. To address these challenges, a Junction-Aware vehicle selection for Multipath Video Streaming (JA-MVS) scheme is proposed. The JA-MVS scheme considers three cases in the junction area, namely the vehicle after the junction, before the junction and inside the junction area, and evaluates the vehicle signal strength using the signal to interference plus noise ratio (SINR); the scheme builds on the multipath data forwarding concept using greedy-based geographic routing. The performance of the proposed scheme is evaluated using the Packet Loss Ratio (PLR), Structural Similarity Index (SSIM) and End-to-End Delay (E2ED) metrics. JA-MVS is compared against two baseline schemes, Junction-Based Multipath Source Routing (JMSR) and Adaptive Multipath geographic routing for Video Transmission (AMVT), in urban Vehicular Ad-Hoc Networks (VANETs).
State-of-the-art visual detectors use object proposals as references to candidate objects to achieve higher efficiency. However, the number of proposals needed to ensure full coverage of potential objects is still large, because the proposals are generated indiscriminately, which exposes proposal computation as a bottleneck. This paper presents a complementary technique that works with any existing proposal-generating system, amending the workflow from "propose-assess" to "propose-adjust-assess". Inspired by biological processing, we propose to improve the quality of object proposals by analyzing visual contexts and gradually focusing proposals on targets. In particular, the proposed method can be employed with existing proposal-generation algorithms based on both hand-crafted features and Convolutional Neural Network (CNN) features. For the former, we realize the focusing function with two learning-based transformation models trained to identify generic objects using image cues. For the latter, a Focus Proposal Net (FoPN) with cascaded layers, which can be injected directly into CNN models in an end-to-end manner, is developed as the implementation of the focusing operation. Experiments on real-life image data sets demonstrate that the proposed technique improves proposal quality. It also reduces the number of proposals needed to achieve a high object recall rate with both hand-crafted and CNN features, and can boost the performance of state-of-the-art detectors.
Financial time series forecasting is a crucial measure for improving and making more robust financial decisions throughout the world. Noisy data and non-stationarity are the two key challenges in financial time series prediction. This paper proposes twin support vector regression for financial time series prediction to deal with noisy and non-stationary data. Financial time series datasets across a wide range of industries, such as information technology, the stock market, the banking sector, and the oil and petroleum sector, are used for numerical experiments. Further, to assess the accuracy of the time series predictions, the root mean squared error and the standard deviation are computed, which clearly indicate the usefulness and applicability of the proposed method. Twin support vector regression is also computationally faster than standard support vector regression on the given 44 datasets.
Capsule networks (CapsNet) are recently proposed neural network models containing a newly introduced processing layer specialized in entity representation and discovery in images. CapsNet is motivated by a parse-tree-like information processing mechanism and employs an iterative routing operation that dynamically determines connections between layers composed of capsule units; information ascends through different levels of interpretation, from raw sensory observation to semantically meaningful entities represented by active capsules. The CapsNet architecture is plausible and has been proven effective in some image data processing tasks, but the newly introduced routing operation is mainly required for determining the capsules' activation status during the forward pass, and its influence on model fitting and the resulting representation is barely understood. In this work, we investigate the following: 1) how routing affects CapsNet model fitting; 2) how the capsule representation helps discover global structures in the data distribution; and 3) how the learned data representation adapts and generalizes to new tasks. Our investigation yielded results, some of which were mentioned in the original CapsNet paper: 1) the routing operation determines the certainty with which a layer of capsules passes information to the layer above, and the appropriate level of certainty is related to model fitness; 2) in a designed experiment on data with a known 2D structure, capsule representations enable a more meaningful 2D manifold embedding than neurons in a standard convolutional neural network (CNN) do; and 3) compared with neurons of a standard CNN, capsules in successive layers are less coupled and more adaptive to new data distributions.
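The iterative routing operation described above can be sketched in a few lines of NumPy. This is an illustrative implementation of routing-by-agreement following the commonly described softmax/squash scheme of the original CapsNet, not code from this study; the prediction tensor `u_hat`, the shapes and `n_iters=3` are assumptions for the demonstration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity: short vectors shrink towards 0, long vectors approach unit length.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: predictions from lower capsules, shape (n_lower, n_upper, dim)."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                          # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per upper capsule
        v = squash(s)                                         # upper capsule outputs
        b += np.einsum('ijk,jk->ij', u_hat, v)                # increase logits by agreement
    return v, c

rng = np.random.RandomState(0)
u_hat = rng.randn(8, 3, 4)        # 8 lower capsules route to 3 upper capsules of dim 4
v, c = dynamic_routing(u_hat)
```

The coupling coefficients `c` sum to one over the upper capsules for each lower capsule, which is exactly the "certainty of passing information upward" that the routing iterations sharpen.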
Patel, OP, Bharill, N, Tiwari, A, Patel, V, Gupta, O, Cao, J, Li, J & Prasad, M 2019, 'Advanced quantum based neural network classifier and its application for objectionable web content filtering', IEEE Access, vol. 7, pp. 98069-98082.
In this paper, an Advanced Quantum-based Neural Network Classifier (AQNN) is proposed and used to build an objectionable Web content filtering system (OWF). The aim is to design a neural network with a small number of hidden-layer neurons and with optimal connection weights and neuron thresholds. The proposed algorithm combines concepts from quantum computing and genetic algorithms to evolve the connection weights and neuron thresholds. Quantum computing uses the qubit, the smallest unit of information in quantum computing, as a probabilistic representation. The algorithm also introduces a threshold boundary parameter to find the optimal neuron threshold. The resulting neural network architecture forms an objectionable Web content filtering system that detects objectionable Web requests by users. To judge the performance of the proposed AQNN, the contents of 2000 Websites (1000 objectionable + 1000 non-objectionable) have been used. The results of AQNN are also compared with QNN-F and well-known classifiers such as backpropagation, support vector machine (SVM), multilayer perceptron, decision tree, and artificial neural network. The results show that AQNN performs better than existing classifiers. The performance of the proposed OWF is also compared with well-known objectionable Web filtering software and existing models; the proposed OWF is found to filter objectionable content better than existing solutions.
Zhang, Y, Lu, X & Li, J 2019, 'Single-sample face recognition under varying lighting conditions based on logarithmic total variation', Signal, Image and Video Processing, vol. 13, pp. 657-665.
The logarithmic total variation (LTV) algorithm is a classical algorithm proposed to address illumination interference in face recognition. Some state-of-the-art techniques based on LTV assume that the illumination component mainly lies in the low-frequency features of face images. However, these techniques adopt unsuitable methods to process the low-frequency features, resulting in unsatisfactory final recognition rates. In this paper, we propose an improved illumination normalization method based on LTV, called the RETINA&TH-LTV algorithm. In this algorithm, the retina model is utilized to eliminate most of the illumination component in the low-frequency features. Then, an advanced contrast-limited adaptive histogram equalization technique is proposed to remove the residual lighting component. At the same time, threshold-value filtering on the high-frequency features enhances the facial features. Finally, the processed frequency features are combined into a robust holistic feature image, which is then used for recognition. Insufficient training images in face recognition are also taken into consideration in this research. Comparative experiments for single-sample face recognition are conducted on the YALE B, CMU PIE and our self-built driver databases, with the nearest neighbor classifier and the extended sparse representation classifier employed as classification methods. The results indicate that the RETINA&TH-LTV algorithm has promising performance, especially under severe illumination and with insufficient training samples.
Zhang, Y, Lv, P, Lu, X & Li, J 2019, 'Face detection and alignment method for driver on highroad based on improved multi-task cascaded convolutional networks', Multimedia Tools and Applications, vol. 78, no. 18, pp. 26661-26679.
Driver face detection and alignment in Intelligent Transportation Systems (ITS) under unconstrained environments are challenging issues, and solving them is conducive to supervising traffic order and maintaining public safety. This paper proposes improved Multi-task Cascaded Convolutional Networks (ITS-MTCNN) to realize accurate face region detection and feature alignment for drivers' faces on highways, predicting face and feature locations in a coarse-to-fine pattern. Moreover, an improved regularization method and an effective online hard sample mining technique are proposed for ITS-MTCNN. The training model and contrast experiments are conducted on our self-built traffic driver face database. The effectiveness of ITS-MTCNN is validated by comparative experiments and verified under various complex highway conditions. Average alignment errors on the left eye, right eye, nose, left mouth corner and right mouth corner are also reported. Experimental results show that the ITS-MTCNN model achieves satisfactory performance compared to other state-of-the-art techniques for driver face detection and alignment, remaining robust to occlusion, varying pose and extreme illumination on highways.
Fan, X, Zhao, J, Ren, F, Wang, Y, Feng, Y, Ding, L, Zhao, L, Shang, Y, Li, J, Ni, J, Jia, B, Liu, Y & Chang, Z 2018, 'Dimerization of p15RS mediated by a leucine zipper–like motif is critical for its inhibitory role on Wnt signaling', Journal of Biological Chemistry, vol. 293, no. 20, pp. 7618-7628.
We previously demonstrated that p15RS, a newly discovered tumor suppressor, inhibits Wnt/β-catenin signaling by interrupting the formation of the β-catenin–TCF4 complex. However, it remained unclear how p15RS exerts this inhibitory effect on Wnt signaling based on its molecular structure. In this study, we report that dimerization of p15RS is required for its inhibition of the transcriptional regulation of Wnt-targeted genes. We found that p15RS forms a dimer through a highly conserved leucine zipper–like motif in the coiled-coil terminus domain. In particular, residues Leu-248 and Leu-255 were identified as responsible for p15RS dimerization, as mutating these two leucines into prolines disrupted homodimer formation of p15RS and weakened its suppression of Wnt signaling. Functional studies further confirmed that mutations of p15RS at these residues diminish its inhibition of cell proliferation and tumor formation. We therefore conclude that dimerization of p15RS, governed by the leucine zipper–like motif, is critical for its inhibition of Wnt/β-catenin signaling and tumorigenesis.
Kang, G, Li, J & Tao, D 2018, 'Shakeout: A New Approach to Regularized Deep Neural Network Training', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1245-1258.
Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. Although robustness to local corruptions has improved, prevailing trackers are still sensitive to large-scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique based on a subspace-learning appearance model. Our contributions are twofold. First, mask templates produced by frame differencing are introduced into our template dictionary. Since the mask templates contain abundant structural information about the corruptions, the model can encode corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by adopting system dynamics, which account for the moving tendency of the object. Second, we provide a theoretical guarantee that, with the modulated template dictionary, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially under challenges such as pose variation, occlusion, and illumination changes.
Li, J, Mei, X, Prokhorov, D & Tao, D 2017, 'Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene', IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 690-703.
Li, J, Lin, X, Rui, X, Rui, Y & Tao, D 2015, 'A Distributed Approach Toward Discriminative Distance Metric Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2111-2122.
Distance metric learning (DML) is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm, develop a distributed scheme learning metrics on moderate-sized subsets of data, and aggregate the results into a global solution. The technique leverages the power of parallel computation. The algorithm of the aggregated DML (ADML) scales well with the data size and can be controlled by the partition. We theoretically analyze and provide bounds for the error induced by the distributed treatment. We have conducted experimental evaluation of the ADML, both on specially designed tests and on practical image annotation tasks. Those tests have shown that the ADML achieves the state-of-the-art performance at only a fraction of the cost incurred by most existing methods.
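The divide-and-aggregate idea can be illustrated with a toy sketch: each data partition learns a simple discriminative Mahalanobis metric (here, the regularized inverse of the within-class scatter, an LDA-style choice) and the partial metrics are averaged into a global one. This is a hypothetical simplification for illustration only, not the actual ADML algorithm or its error-bounded aggregation.

```python
import numpy as np

def local_metric(X, y, reg=1e-3):
    # A simple discriminative metric learned on one partition:
    # inverse of the (regularized) within-class scatter matrix.
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        Sw += np.cov(Xc, rowvar=False) * (len(Xc) - 1)
    Sw /= len(X)
    return np.linalg.inv(Sw + reg * np.eye(d))

def aggregate_metrics(partitions):
    # Aggregate per-partition metrics into a single global metric by averaging.
    Ms = [local_metric(X, y) for X, y in partitions]
    return sum(Ms) / len(Ms)

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = rng.randint(0, 3, 300)
# Three moderate-sized partitions, learned independently (parallelizable).
parts = [(X[i::3], y[i::3]) for i in range(3)]
M = aggregate_metrics(parts)
```

Each `local_metric` call is independent, so the partitions can be processed in parallel; the averaging step is the cheap global aggregation.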
Factorization-based techniques explain arrays of observations using a relatively small number of factors and provide an essential arsenal for multi-dimensional data analysis. Most factorization models are, however, developed on general arrays of scalar v
Li, J & Tao, D 2013, 'Exponential Family Factors For Bayesian Factor Analysis', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 6, pp. 964-976.
Expressing data as linear functions of a small number of unknown variables is a useful approach employed by several classical data analysis methods, e.g., factor analysis, principal component analysis, or latent semantic indexing. These models represent the data using the product of two factors. In practice, one important concern is how to link the learned factors to relevant quantities in the context of the application. To this end, various specialized forms of the factors have been proposed to improve interpretability. Toward developing a unified view and clarifying the statistical significance of the specialized factors, we propose a Bayesian model family. We employ exponential family distributions to specify various types of factors, which provide a unified probabilistic formulation. A Gibbs sampling procedure is constructed as a general computation routine. We verify the model by experiments, in which the proposed model is shown to be effective in both emulating existing models and motivating new model designs for particular problem settings.
Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a general-type data population by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model, which deals with general-type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection: a preserved principal component must sufficiently correlate with data variance that is uncorrelated with the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas, when the imaging device/environment is limited. Traditional manual or limited automatic colour assignment involves intensive human
Li, J & Tao, D 2012, 'On Preserving Original Variables in Bayesian PCA with Application to Image Analysis', IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4830-4843.
Principal component analysis (PCA) computes a succinct data representation by converting the data to a few new variables while retaining maximum variation. However, the new variables are difficult to interpret, because each one is combined with all of the original input variables and has obscure semantics. Under the umbrella of Bayesian data analysis, this paper presents a new prior to explicitly regularize combinations of input variables. In particular, the prior penalizes pair-wise products of the coefficients of PCA and encourages a sparse model. Compared to the commonly used ℓ1-regularizer, the proposed prior encourages the sparsity pattern in the resultant coefficients to be consistent with the intrinsic groups in the original input variables. Moreover, the proposed prior can be explained as recovering a robust estimation of the covariance matrix for PCA. The proposed model is suited for analyzing visual data, where it encourages the output variables to correspond to meaningful parts in the data. We demonstrate the characteristics and effectiveness of the proposed technique through experiments on both synthetic and real data.
For image analysis, an important extension to principal component analysis (PCA) is to treat an image as multiple samples, which helps alleviate the small sample size problem. Various schemes of transforming an image to multiple samples have been proposed. Although shown effective in practice, the schemes are mainly based on heuristics and experience. In this paper, we propose a probabilistic PCA model, in which we explicitly represent the transformation scheme and incorporate the scheme as a stochastic component of the model. Therefore, fitting the model automatically learns the transformation. Moreover, the learned model allows us to distinguish regions that can be well described by the PCA model from those that need further treatment. Experiments on synthetic images and face data sets demonstrate the properties and utility of the proposed model.
In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of component- or part-based methods of object recognition in computer vision, in which a visual object is described by a number of characteristic parts and the relations among the parts. Applying this idea to time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include a single segment or multiple ordered segments. Subsequently, each time series is transformed into a standard feature vector, i.e. the SBF, where each entry in the SBF is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series.
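The segment-to-feature-vector step described above can be sketched briefly. This sketch simplifies interest-point detection with scale selection to dense fixed-length windows and uses negative Euclidean distance as the similarity; the names `sbf_features` and `seg_len` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sbf_features(series, prototypes, seg_len=8):
    """Map a time series to a segment-based feature vector.
    Each entry is the maximum similarity between one feature prototype
    and any segment of the series."""
    # Candidate segments: dense sliding windows of fixed length.
    segs = np.array([series[i:i + seg_len]
                     for i in range(len(series) - seg_len + 1)])
    feats = []
    for proto in prototypes:
        # Similarity = negative Euclidean distance; take the max over segments.
        d = np.linalg.norm(segs - proto, axis=1)
        feats.append(-d.min())
    return np.array(feats)

rng = np.random.RandomState(0)
ts = np.sin(np.linspace(0, 6 * np.pi, 100))
protos = [ts[10:18], rng.randn(8)]   # one prototype that occurs in ts, one random
f = sbf_features(ts, protos)
```

The prototype that actually occurs in the series attains the maximum possible response (distance zero), while the random prototype scores strictly lower, which is the discriminative signal the SBF entries carry.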
Li, J, Chen, Z & Ma, Z 2018, 'Learning Colours from Textures by Effective Representation of Images' in Yurish, SY (ed), Advances in Signal Processing: Reviews, International Frequency Sensor Association (IFSA) Publishing, Spain, pp. 277-304.
Arguably the majority of existing image and video analytics are based on texture. However, the other important aspect, colour, must also be considered for comprehensive analytics. Colours not only make images feel more vivid to viewers, they also contain important visual clues about the image [20, 54, 24]. Although a modern point-and-shoot digital camera can easily capture colour images, there are circumstances where we need to recover the chromatic information in an image. For example, photography in the old days was monochrome and provided only gray-scale images. Adding colours can rejuvenate these old pictures, making them more adorable as personal memoirs and more accessible as archival documents for public or educational purposes. For a colour image, re-colourisation may be necessary if the white balance was poorly set when shooting the picture. In this case, a particular colour channel can be severely over- or under-exposed, making it infeasible to adjust the white balance based on the recorded colours. A possible rescue of the picture is to keep only the luminance and re-colourise the image. Another application of colourisation arises from the area of specialised imaging, where the sensors capture signals outside the visible spectrum of light, e.g. X-ray, MRI and near-infrared images. Pseudo colours make these images more readily interpretable by human experts, and can also indicate potentially interesting regions.
Chen, Z, You, X & Li, J 2017, 'Learning to focus for object proposals', 2017 International Conference on Security, Pattern Analysis, and Cybernetics, SPAC 2017, International Conference on Security, Pattern Analysis, and Cybernetics, IEEE, Shenzhen, China, pp. 439-444.
Object proposal generators address the wasteful exhaustive search of the sliding-window scheme in visual object detection and have been shown to be effective. However, the number of candidate windows is still large in order to ensure full coverage of potential objects. This paper presents a complementary technique that works with any proposal-generating system, amending the workflow from 'propose-assess' to 'propose-adjust-assess'. The adjustment serves as an auto-focus mechanism for the system and reduces the number of object proposals to be processed. The auto-focus is realized by two learning-based transformation models, one translating and the other deforming the windows towards better alignment with the objects, both trained to identify generic objects using image cues. Experiments on real-life image data sets show that the proposed technique can reduce the number of proposals without loss of performance.
Chivukula, AS, Li, J & Liu, W 2018, 'Discovering granger-causal features from deep learning networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), The 31st Australasian Joint Conference on Artificial Intelligence, Wellington, New Zealand, pp. 692-705.
In this research, we propose deep networks that discover Granger causes from multivariate temporal data generated in financial markets. We introduce a Deep Neural Network (DNN) and a Recurrent Neural Network (RNN) that discover Granger-causal features for bivariate regression on bivariate time series distributions. These features are subsequently used to discover Granger-causal graphs for multivariate regression on multivariate time series distributions. The supervised feature learning process in the proposed deep regression networks has favourable F-tests for feature selection and t-tests for model comparisons. The experiments, minimizing root mean squared error in the regression analysis on real stock market data obtained from Yahoo Finance, demonstrate that our causal features significantly improve existing deep learning regression models.
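The bivariate Granger test underlying such features can be sketched directly: regress y on its own lags (restricted model), then on lags of both y and x (unrestricted model), and compare residual sums of squares with an F-statistic. This is the textbook construction on synthetic data, not the paper's deep-network variant; `granger_f` and the simulated series are illustrative.

```python
import numpy as np

def granger_f(x, y, lag=2):
    """F-statistic for 'x Granger-causes y' with `lag` lags of each series."""
    n = len(y)
    Yt = y[lag:]
    # Lag matrices: column k holds the series delayed by k steps.
    ylags = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    xlags = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    ones = np.ones((n - lag, 1))
    Xr = np.hstack([ones, ylags])          # restricted: y's own past only
    Xu = np.hstack([ones, ylags, xlags])   # unrestricted: adds x's past
    rss = lambda X: np.sum((Yt - X @ np.linalg.lstsq(X, Yt, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    df_num, df_den = lag, (n - lag) - Xu.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

rng = np.random.RandomState(0)
x = rng.randn(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.randn()

f_causal = granger_f(x, y)   # x drives y, so F should be large
f_null = granger_f(y, x)     # x is white noise, so F should be small
```

A large F relative to the F(lag, df_den) critical value rejects the null that x's past adds nothing beyond y's own past, which is the selection signal the paper's F-tests exploit.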
Fu, L, Li, J, Zhou, L, Ma, Z, Liu, S, Lin, Z & Prasad, M 2018, 'Utilizing Information from Task-Independent Aspects via GAN-Assisted Knowledge Transfer', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.
Observed data often have multiple labels with respect to different aspects. For example, a picture can have one label specifying its contents in terms of object category, such as aeroplane, building or cat, and at the same time another label describing its image style, such as photo-realistic or artistic. The central idea of this work is that every annotation of the data contains precious knowledge and is not to be forgone: an analytic task focusing on one aspect of the data can benefit from knowledge transferred from the other aspects. We propose a passive knowledge transfer scheme for deep neural network training based on generative adversarial nets (GANs). The adversarial training scheme encourages the nets to encode data into representations that are both discriminative for the target aspect and invariant with respect to the irrelevant aspects. Using the theory linking the GAN framework to the Wasserstein metric on distribution spaces, we show that the scheme mixes the conditional distributions of the encoded data on the irrelevant aspects. Moreover, we empirically verified the method by i) classifying images despite the influence of geometric transforms and ii) recognizing movements (geometric transforms) regardless of image contents.
Li, M, Puthal, D, Yang, C, Luo, Y, Zhang, J & Li, J 2018, 'Stock market analysis using social networks', Proceedings of the Australasian Computer Science Week Multiconference, Australasian Computer Science Week Multiconference, ACM, Brisbane, Queensland, Australia.
Nowadays, the use of social media has reached unprecedented levels. Among all social media, Twitter, with its popular micro-blogging service, enables users to share short messages in real time about events or express their own opinions. In this paper, we examine the effectiveness of various machine learning techniques on a retrieved tweet corpus. A machine learning model is deployed to predict tweet sentiment and to gain insight into the correlation between Twitter sentiment and stock prices. Specifically, tweets are mined using Twitter's search API and processed for further analysis. To determine tweet sentiment, two types of machine learning techniques are adopted: Naïve Bayes classification and support vector machines. Evaluating each model, we find that the support vector machine gives higher accuracy under cross validation. After predicting tweet sentiment, we mine historical stock data using the Yahoo Finance API; the feature matrix designed for stock market prediction includes the positive, negative, neutral and total sentiment scores and the stock price for each day. To evaluate the direct correlation between tweet sentiment and stock market prices, the same machine learning algorithms are applied in our empirical study.
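The Naïve Bayes half of such a sentiment pipeline can be sketched with a tiny multinomial classifier using Laplace smoothing; the toy tweets, labels and helper names here are illustrative assumptions, not the paper's corpus or features.

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Train a tiny multinomial Naive Bayes sentiment classifier.
    Returns a predict(text) function."""
    vocab = set(w for d in docs for w in d.split())
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d.split())

    def predict(text):
        scores = {}
        for c in counts:
            total = sum(counts[c].values())
            s = math.log(priors[c] / len(docs))          # log prior
            for w in text.split():
                # Laplace-smoothed log likelihood over the shared vocabulary.
                s += math.log((counts[c][w] + 1) / (total + len(vocab)))
            scores[c] = s
        return max(scores, key=scores.get)

    return predict

tweets = ["stock up great gains", "bullish great rally",
          "bad loss down", "bearish drop bad"]
sent = ["pos", "pos", "neg", "neg"]
predict = train_nb(tweets, sent)
```

In the full pipeline the per-day counts of predicted `pos`/`neg`/`neutral` tweets would then form the sentiment columns of the feature matrix alongside the stock price.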
Lin, A, Li, J, Zhang, L, Ma, Z & Luo, W 2018, 'Multiple-task learning and knowledge transfer using generative adversarial capsule nets', AI 2018: Advances in Artificial Intelligence (LNAI), Australasian Joint Conference on Artificial Intelligence, Springer, Wellington, New Zealand, pp. 669-680.
It is common for practical data to have multiple attributes of interest. For example, a picture can be characterized in terms of its content, e.g. the categories of the objects in the picture, while its image style, such as photo-realistic or artistic, is also relevant. This work is motivated by taking advantage of all available sources of information about the data, including those not directly related to the target of analytics. We propose an explicit and effective knowledge representation and transfer architecture for image analytics by employing capsules for deep neural network training based on generative adversarial nets (GAN). The adversarial scheme helps discover capsule representations of data with different semantic meanings in the respective dimensions of the capsules. The data representation includes a subset of variables particularly specialized for the target task, obtained by eliminating information about the irrelevant aspects. We theoretically show this elimination via the mixing of conditional distributions of the represented data. Empirical evaluations show the proposed method is effective for both standard transfer-domain recognition tasks and zero-shot transfer.
Lin, A, Li, J, Zhang, L, Shi, L & Ma, Z 2018, 'A new family of generative adversarial nets using heterogeneous noise to model complex distributions', AI 2018: Advances in Artificial Intelligence (LNAI), Australasian Joint Conference on Artificial Intelligence, Springer, Wellington, New Zealand, pp. 706-717.
Generative adversarial nets (GANs) are an effective framework for constructing data models and enjoy desirable theoretical justification. On the other hand, realizing GANs for practical, complex data distributions often requires careful configuration of the generator, discriminator, objective function and training method, and can involve much non-trivial effort. We propose a novel family of generative adversarial nets in which both continuous noise and random binary codes are employed in the generating process. The binary codes in the new GAN model (named BGAN) play the role of categorical latent variables, which helps improve model capability and training stability when dealing with complex data distributions. BGAN has been evaluated and compared with existing GANs trained with state-of-the-art methods on both synthetic and practical data. The empirical evaluation shows the effectiveness of BGAN.
Pan, J, Li, J, Han, X & Jia, K 2018, 'Residual MeshNet: Learning to deform meshes for single-view 3D reconstruction', Proceedings - 2018 International Conference on 3D Vision, 3DV 2018, International Conference on 3D Vision, IEEE, Verona, Italy, pp. 719-727.View/Download from: Publisher's site
© 2018 IEEE. This work presents a novel architecture of deep neural networks to generate meshes approximating the surface of a 3D object from a single image. Compared to existing learning-based 3D reconstruction models, our architecture is characterized by (1) deep mesh deformation stacks with a residual network design, where a simple mesh is transformed to approximate the target surface and undergoes multiple deformation steps to progressively refine the result and reduce the residuals, and (2) parallel paths per deformation step, which can exponentially enrich the generated meshes using a deeper structure and more model parameters. We also propose a novel regularization scheme that encourages the meshes to be both globally complementary, so as to cover the target surface, and locally consistent with each other. Empirical evaluations on benchmark datasets show the advantage of the proposed architecture over existing methods.
Zhang, L, Li, J, Huang, T, Ma, Z, Lin, Z & Prasad, M 2018, 'GAN2C: Information Completion GAN with Dual Consistency Constraints', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.View/Download from: Publisher's site
© 2018 IEEE. This paper proposes an information completion technique, GAN2C, which imposes dual consistency constraints (2C) on a closed-loop encoder-decoder architecture based on generative adversarial nets (GAN). When adopting deep neural networks as function approximators, GAN2C enables highly effective multi-modality image conversion with sparse observations in the target modes. For empirical demonstration and model evaluation, we show that the trained deep neural networks in GAN2C can infer colours for grayscale images, as well as estimate rich 3D information of a scene by densely predicting depths. The experimental results show that in both tasks GAN2C, as a generic framework, matches or advances the state-of-the-art performance achieved by highly specialized systems. Code is available at https://github.com/AdalinZhang/GAN2C.
Zhang, Y, Hu, C, Lu, X & Li, J 2018, 'A novel illumination normalization method in face recognition based on logarithmic total variation', Proceedings of SPIE - The International Society for Optical Engineering, International Conference on Digital Image Processing, SPIE, Shanghai, China.View/Download from: Publisher's site
© 2018 SPIE. Varying illumination is a tricky issue in face recognition. In this paper, we improve the logarithmic total variation (LTV) algorithm to handle varying illumination in face images. First, logarithmic total variation (LTV) is adopted to separate the face image into high-frequency and low-frequency features. Then, a novel illumination normalization method, founded on advanced contrast limited adaptive histogram equalization (CLAHE), is proposed to handle the low-frequency feature. Furthermore, threshold-value filtering is utilized to enhance the high-frequency feature. Finally, the normalized face image is formed from the normalized low-frequency feature and the enhanced high-frequency feature. We conduct comparative experiments on the Yale B database, covering three types of techniques. The final results show that the CLA&TH-LTV algorithm achieves excellent recognition performance compared to other state-of-the-art algorithms.
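A rough sketch of a pipeline in the spirit of the description above, with loudly hedged substitutions: a Gaussian low-pass split stands in for the LTV decomposition, plain global histogram equalization stands in for CLAHE, and the sigma and threshold values are arbitrary illustrative choices, not the paper's:

```python
import numpy as np

def normalize_face(img, sigma=3.0, thresh=0.02):
    """Illustrative illumination-normalization sketch (not the paper's exact
    algorithm): split the log image into low/high-frequency bands, equalize
    the low band, suppress tiny high-band coefficients, and recombine."""
    log_img = np.log1p(img.astype(float))

    # crude separable Gaussian blur as the low-frequency (illumination) estimate
    k = int(3 * sigma) * 2 + 1
    x = np.arange(k) - k // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    low = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, log_img)
    low = np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, low)

    high = log_img - low                              # high-frequency (facial detail)

    # global histogram equalization of the low band (stand-in for CLAHE)
    flat = low.ravel()
    ranks = flat.argsort().argsort()
    low_eq = (ranks / (flat.size - 1)).reshape(low.shape)
    low_eq = low_eq * (low.max() - low.min()) + low.min()

    high[np.abs(high) < thresh] = 0.0                 # threshold-value filtering
    return np.expm1(low_eq + high)
```

The per-tile clipping that distinguishes CLAHE from plain equalization is deliberately omitted to keep the sketch short.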
Chen, Z, Li, J, Chen, Z & You, X 2017, 'Generic pixel level object tracker using bi-channel fully convolutional network', Neural Information Processing (LNCS), International Conference on Neural Information Processing, Springer, Guangzhou, China, pp. 666-676.View/Download from: Publisher's site
© Springer International Publishing AG 2017. While most object tracking algorithms predict bounding boxes to cover the target, pixel-level tracking methods provide a better description of the target. However, it remains challenging for a tracker to precisely identify detailed foreground areas of the target. In this work, we propose a novel bi-channel fully convolutional neural network to tackle the generic pixel-level object tracking problem. By capturing and fusing both low-level and high-level temporal information, our network is able to produce a pixel-level foreground mask of the target accurately. In particular, our model neither updates parameters to fit the tracked target nor requires prior knowledge about the category of the target. Experimental results show that the proposed network achieves compelling performance on challenging videos in comparison with competitive tracking algorithms.
Kang, G, Li, J & Tao, D 2016, 'Shakeout: A New Regularized Deep Neural Network Training Scheme', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), AAAI Conference on Artificial Intelligence, AAAI, Phoenix, USA, pp. 1751-1757.
Recent years have witnessed the success of deep neural networks in dealing with a variety of practical problems. The invention of effective training techniques largely contributes to this success. The so-called "Dropout" training scheme is one of the most powerful tools to reduce over-fitting. From a statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights. In this paper, we present a new training scheme: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, our method randomly chooses to enhance or reverse the contribution of each unit to the next layer. We show that our scheme leads to a combination of L1 regularization and L2 regularization imposed on the weights, a combination whose effectiveness has been proved by Elastic Net models in practice. We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on the real-life image datasets MNIST and CIFAR-10 show that Shakeout deals with over-fitting effectively.
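The implicit-regularization contrast above can be made concrete: Dropout corresponds to an L2-type weight penalty, while Shakeout corresponds to an elastic-net (L1 + L2) penalty. A small sketch, where the penalty coefficients are illustrative placeholders rather than the paper's derived values:

```python
import numpy as np

def dropout_penalty(W, l2=0.01):
    """Dropout implicitly imposes an L2-style penalty on the weights."""
    return l2 * (W ** 2).sum()

def shakeout_penalty(W, l1=0.01, l2=0.01):
    """Shakeout leads to a combined L1 + L2 (elastic-net) penalty,
    encouraging weights that are both small and sparse."""
    return l1 * np.abs(W).sum() + l2 * (W ** 2).sum()

W = np.array([[0.5, -0.2],
              [0.0,  1.0]])
# the extra L1 term penalizes every nonzero weight, driving sparsity
print(dropout_penalty(W), shakeout_penalty(W))
```

The added L1 term is what drives the sparse network weights reported in the paper's experiments.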
Zhang, Z, Huang, K, Tan, T, Yang, P & Li, J 2016, 'ReD-SFA: Relation Discovery Based Slow Feature Analysis for Trajectory Clustering', Proceedings for the Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, USA, pp. 752-760.View/Download from: Publisher's site
For spectral embedding/clustering, it is still an open problem how to construct a relation graph that reflects the intrinsic structures in data. In this paper, we propose an approach, named Relation Discovery based Slow Feature Analysis (ReD-SFA), for feature learning and graph construction simultaneously. Given an initial graph with only a few nearest but most reliable pairwise relations, new reliable relations are discovered under an assumption of reliability preservation, i.e., the reliable relations will preserve their reliabilities in the learnt projection subspace. We formulate the idea as a cross entropy (CE) minimization problem to reduce the discrepancy between two Bernoulli distributions parameterized by the updated distances and the existing relation graph, respectively. Furthermore, to overcome the imbalanced distribution of samples, a boosting-like strategy is proposed to balance the discovered relations over all clusters. To evaluate the proposed method, extensive experiments are performed on various trajectory clustering tasks, including motion segmentation, time series clustering and crowd detection. The results demonstrate that ReD-SFA can discover reliable intra-cluster relations with high precision, and competitive clustering performance can be achieved in comparison with state-of-the-art methods.
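The cross-entropy objective mentioned in the abstract compares two Bernoulli distributions elementwise. A minimal sketch of that discrepancy; the mapping from projected distances and graph weights to the Bernoulli parameters is omitted, so `p` and `q` are assumed to be given probabilities:

```python
import numpy as np

def bernoulli_cross_entropy(p, q, eps=1e-12):
    """Elementwise cross entropy between Bernoulli distributions with
    parameters p (e.g. from the relation graph) and q (e.g. from the
    updated distances). Minimizing it pulls q toward p."""
    q = np.clip(q, eps, 1 - eps)   # guard against log(0)
    return -(p * np.log(q) + (1 - p) * np.log(1 - q))

# a confident relation (p=1) matched by an uncertain one (q=0.5)
# costs log(2) nats
ce = bernoulli_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
```
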
Li, J & Tao, D 2013, 'A Bayesian factorised covariance model for image analysis', International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, AAAI, Beijing, China, pp. 1465-1471.
Li, J & Tao, D 2012, 'Sampling Normal Distribution Restricted on Multiple Regions', International Conference on Neural Information Processing, International Conference on Neural Information Processing, Springer-Verlag, Doha, Qatar, pp. 492-500.View/Download from: Publisher's site
We develop an accept-reject sampler for probability densities that have a form similar to a normal density function but are supported on restricted regions. Compared to existing techniques, the proposed method deals with multiple disjoint regions, truncated on one or both sides. For the original problem of sampling from one region, the efficiency is enhanced as well. We verify the desirable attributes of the proposed algorithm by both theoretical analysis and simulation studies.
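For context, the naive accept-reject baseline for a normal restricted to disjoint intervals looks as follows. Note this is the simple approach whose efficiency the paper improves upon, not the proposed sampler itself:

```python
import math
import random

def sample_restricted_normal(regions, mu=0.0, sigma=1.0, rng=None):
    """Naive accept-reject sampling: draw from N(mu, sigma^2) and keep the
    draw only if it falls in one of the disjoint `regions`, given as
    (low, high) pairs (use +/- math.inf for one-sided truncation)."""
    rng = rng or random.Random(0)
    while True:
        x = rng.gauss(mu, sigma)
        if any(lo <= x <= hi for lo, hi in regions):
            return x

# e.g. a standard normal restricted to (-inf, -1] U [2, 3]
xs = [sample_restricted_normal([(-math.inf, -1.0), (2.0, 3.0)])
      for _ in range(100)]
```

When the restricted regions carry little probability mass, this loop wastes most draws, which is exactly the inefficiency a purpose-built sampler targets.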
Li, J & Tao, D 2011, 'Wisdom of Crowds: Single Image Super-resolution from the Web', Workshop on Large Scale Visual Analytics with the IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE Computer Society, Vancouver, Canada, pp. 812-816.View/Download from: Publisher's site
This paper addresses the problem of learning-based single image super-resolution. Previous research on this problem employs a human user to provide a set of images similar to the target image as references. Then the super-resolution algorithm can learn from the provided reference images to predict the high-resolution details of the target image. We propose a fully automatic scheme, which leverages the knowledge of the entire visual world to query relevant references from the Internet. The proposed scheme is free of human supervision, and the performance compromise is small. We conduct experiments to show the effectiveness of the method.
Li, J & Tao, D 2011, 'A Probabilistic Model for Discovering High Level Brain Activities from fMRI', Lecture Notes in Computer Science, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 329-336.View/Download from: Publisher's site
Functional magnetic resonance imaging (fMRI) has provided an invaluable method of investigating real-time neuron activities. Statistical tools have been developed to recognise the mental state from a batch of fMRI observations over a period. However, an interesting question is whether it is possible to estimate the real-time mental state at each moment during the fMRI observation. In this paper, we address this problem by building a probabilistic model of brain activity. We model the tempo-spatial relations among the hidden high-level mental states and the observable low-level neuron activities. We verify our model by experiments on practical fMRI data. The model also yields interesting clues about the task-responsible regions in the brain.
Li, J, Bian, W, Tao, D & Zhang, C 2011, 'Learning Colours from Textures by Sparse Manifold Embedding', Lecture Notes in Artificial Intelligence, AI 2011: Advances in Artificial Intelligence, 24th Australasian Joint Conference, Australasian Joint Conference on Artificial Intelligence, Springer-Verlag Berlin / Heidelberg, Perth, Australia, pp. 600-608.View/Download from: Publisher's site
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas where the imaging device or environment is limited. Traditional colour assignment involves intensive human effort. Automatic methods have been proposed to establish relations between image textures and the corresponding colours, with existing research mainly focusing on linear relations. In this paper, we employ sparse constraints in the model of the texture-colour relationship. The technique is developed on a locally linear model, which adopts the manifold assumption on the distribution of the image data. Given the texture of an image patch, the learned model transfers colours to the patch by combining the known colours of similar texture patches. The sparse constraint restrains the contributing factors in the model and helps improve the stability of the colour transfer. Experiments show that our method gives superior results to those of previous work.
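A locally linear sketch of the texture-to-colour transfer described above. The paper's sparse constraint is omitted here in favour of plain LLE-style reconstruction weights, and the feature dimensions are assumptions for illustration:

```python
import numpy as np

def transfer_colour(gray_patch, train_textures, train_colours, k=5, reg=1e-3):
    """Predict a colour for `gray_patch` by combining the known colours of
    its k nearest texture patches with locally-linear reconstruction weights.
    `train_textures`: (n, d) grayscale patch features; `train_colours`: (n, c)."""
    d2 = ((train_textures - gray_patch) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]                 # k nearest texture patches
    Z = train_textures[idx] - gray_patch     # centre neighbours on the query
    G = Z @ Z.T + reg * np.eye(k)            # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                             # reconstruction weights sum to 1
    return w @ train_colours[idx]            # combine the neighbours' colours
```

The sum-to-one constraint makes the prediction invariant to translations of the colour space, which is the usual motivation for LLE-style weights.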
Li, J & Tao, D 2010, 'An Exponential Family Extension to Principal Component Analysis', International Conference on Neural Information Processing 2011, International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 1-9.
In this paper, we present a unified probabilistic model for constrained factorisation models, which employs exponential family distributions to represent the constrained factors. Our main objective is to provide a versatile framework, on which prototype models with various constraints can be implemented effortlessly. For learning the proposed stochastic model, Gibbs sampling is employed for model inference. We also demonstrate the utility and versatility of the model by experiments.
Li, J & Tao, D 2010, 'Boosted Dynamic Cognitive Activity Recognition from Brain Images', Proceedings - The 9th International Conference on Machine Learning and Applications, ICMLA 2010, International Conference on Machine Learning and Applications, IEEE, Washington, D.C., USA, pp. 361-366.View/Download from: Publisher's site
Functional Magnetic Resonance Imaging (fMRI) has become an important diagnostic tool for measuring brain haemodynamics. Previous research on analysing fMRI data mainly focuses on detecting low-level neuron activation from the ensuing haemodynamic activities. An important recent advance is to show that high-level cognitive status is recognisable from a period of fMRI records. Nevertheless, it would also be helpful to reveal the dynamics of cognitive activities during the period. In this paper, we tackle the problem of discovering dynamic cognitive activities by proposing an algorithm of boosted structure learning. We employ a statistical model of random fields to represent the dynamics of the brain. To exploit the rich fMRI observations with reasonable model complexity, we build multiple models, where each model links the cognitive activities to only a fraction of the fMRI observations. We combine the simple models by using an altered AdaBoost scheme for multi-class structure learning and show theoretical justification of the proposed scheme. Empirical tests show the method effectively links the physiological and the psychological activities of the brain.
Li, J & Tao, D 2010, 'Simple exponential family PCA', Journal of Machine Learning Research, pp. 453-460.
Bayesian principal component analysis (BPCA), a probabilistic reformulation of PCA with Bayesian model selection, is a systematic approach to determining the number of essential principal components (PCs) for data representation. However, it assumes that data are Gaussian distributed and thus it cannot handle all types of practical observations, e.g. integers and binary values. In this paper, we propose simple exponential family PCA (SePCA), a generalised family of probabilistic principal component analysers. SePCA employs exponential family distributions to handle general types of observations. By using Bayesian inference, SePCA also automatically discovers the number of essential PCs. We discuss techniques for fitting the model, develop the corresponding mixture model, and show the effectiveness of the model based on experiments.
Manifold learning algorithms are promising data analysis tools. However, to fit an unseen point into a learned model, the point must be located relative to the training set, which limits scalability. In this paper, we discuss how to select landmarks from the data to help locate the test points. Our method is designed for data on manifolds: the way the landmarks represent the data in the ambient space should resemble the way they represent the data on the manifold. Compared to previous research, (i) our test foregoes the requirement of knowing the intrinsic manifold dimension and thus is more applicable and robust; (ii) our selection implies a provable topology preservation property; and (iii) we also provide a way to improve existing landmarks. Experiments on both synthetic and real data support the proposed properties and algorithms.
Li, J, Hao, P & Zhang, C 2008, 'Transferring colours to grayscale images by locally linear embedding', BMVC 2008 - Proceedings of the British Machine Vision Conference 2008.View/Download from: Publisher's site
In this paper, we propose a learning-based method for adding colours to grayscale images. In contrast to many previous computer-aided colourizing methods, which require intensive and accurate human intervention, our method only needs the user to provide a colour image with content similar to that of the grayscale image. We adopt the "image manifold" assumption and apply manifold learning methods to model the relations between the chromatic channels and the gray levels in the training images. Then we synthesize the objective chromatic channels using the learned relations. Experiments show that our method gives superior results to those of previous work.