Zhang, H & Xu, M 2020, 'Improving the generalization performance of deep networks by dual pattern learning with adversarial adaptation', Knowledge-Based Systems, vol. 200.
Hu, S, Xu, M, Zhang, H, Xiao, C & Gui, C 2019, 'Affective Content-aware Adaptation Scheme on QoE Optimization of Adaptive Streaming over HTTP', ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 15, no. 3s, pp. 100-118.
This article presents a novel affective content-aware adaptation scheme (ACAA) to optimize Quality of Experience (QoE) for dynamic adaptive video streaming over HTTP (DASH). Most existing DASH adaptation schemes adapt the video bit-rate based on an estimate of available network resources, ignoring user preferences for the affective content (AC) embedded in the streamed video. Since personal demand for AC varies widely among viewers, satisfying individual affective demand is critical to improving QoE in commercial video services. However, the results of video affective analysis cannot be applied directly in current adaptive streaming schemes. By correlating the AC distribution in a user's viewing history with each segment being streamed, affective relevancy can be inferred as an affective metric for each AC-related segment. The proposed ACAA scheme then optimizes QoE for the user's desired affective content while taking into account both network status and affective relevancy. We implemented ACAA in a realistic trace-based evaluation and compared its network performance and QoE with those of Probe and Adapt (PANDA), buffer-based adaptation (BBA), and Model Predictive Control (MPC). Experimental results show that ACAA preserves available buffer time for upcoming segments carrying a viewer's preferred affective content, achieving better QoE on affective content than on normal content while keeping overall QoE satisfactory.
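To illustrate the idea, here is a minimal sketch of affective-relevancy-weighted bitrate selection. The abstract does not give the scheme's formulas, so every name here (affective_relevancy, the utility weights alpha and beta, the segment duration) is a hypothetical stand-in for whatever ACAA actually uses, not the paper's method.

```python
# Hypothetical sketch: bias DASH bitrate choice toward affectively relevant
# segments. All weights and names are illustrative, not from the ACAA paper.

def affective_relevancy(history_dist, segment_dist):
    """Cosine similarity between the AC distribution of the user's viewing
    history and the AC distribution of the candidate segment."""
    num = sum(h * s for h, s in zip(history_dist, segment_dist))
    den = (sum(h * h for h in history_dist) ** 0.5
           * sum(s * s for s in segment_dist) ** 0.5)
    return num / den if den else 0.0

def choose_bitrate(bitrates, throughput, buffer_s, relevancy,
                   seg_dur=4.0, alpha=1.0, beta=0.5):
    """Pick the bitrate maximizing a toy QoE utility: quality counts for more
    on affectively relevant segments (alpha bonus), while a rebuffering
    penalty (beta) discourages exceeding the available throughput."""
    best, best_u = bitrates[0], float("-inf")
    for r in bitrates:
        download_s = r * seg_dur / throughput       # expected fetch time
        rebuffer = max(0.0, download_s - buffer_s)  # stall if buffer runs dry
        utility = (1.0 + alpha * relevancy) * r - beta * rebuffer * max(bitrates)
        if utility > best_u:
            best, best_u = r, utility
    return best

print(choose_bitrate([0.5, 1.0, 2.5, 5.0], throughput=3.0,
                     buffer_s=8.0, relevancy=0.7))
```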
Rao, T, Li, X, Zhang, H & Xu, M 2019, 'Multi-level region-based Convolutional Neural Network for image emotion classification', Neurocomputing, vol. 333, pp. 429-439.
Analyzing the emotional information of visual content has attracted growing attention as internet users increasingly share their feelings through images and videos online. In this paper, we investigate the problem of affective image analysis, which is challenging due to its complexity and subjectivity. Previous research reveals that image emotion is related to visual features ranging from low level to high level, drawn from both global and local views, yet most current approaches focus on improving emotion recognition using single-level visual features from a global view. Aiming to utilize different levels of visual features from both global and local views, we propose a multi-level region-based Convolutional Neural Network (CNN) framework to discover the sentimental response of local regions. We first employ a Feature Pyramid Network (FPN) to extract multi-level deep representations. Then, an emotional region proposal method generates appropriate local regions and removes excessive non-emotional regions for image emotion classification. Finally, to deal with the subjectivity of emotional labels, we propose a multi-task loss function that takes into account the probabilities of an image belonging to different emotion classes. Extensive experiments show that our method outperforms state-of-the-art approaches on commonly used benchmark datasets.
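A minimal sketch of such a multi-task loss is given below, assuming the label side provides both a hard dominant-emotion class and a soft probability distribution over classes (e.g. the fraction of annotators choosing each emotion). The weighting `lam` and the KL-divergence form are assumptions for illustration; the paper's exact formulation may differ.

```python
# Hypothetical multi-task emotion loss: cross-entropy on the dominant class
# plus KL divergence to the annotator probability distribution.
import torch
import torch.nn.functional as F

def multi_task_emotion_loss(logits, hard_labels, label_dist, lam=0.5):
    """logits: (B, C); hard_labels: (B,); label_dist: (B, C) soft targets."""
    ce = F.cross_entropy(logits, hard_labels)
    kl = F.kl_div(F.log_softmax(logits, dim=1), label_dist,
                  reduction="batchmean")
    return ce + lam * kl

logits = torch.randn(4, 8)                  # batch of 4, 8 emotion classes
hard = torch.tensor([0, 3, 3, 7])           # dominant emotion per image
dist = F.softmax(torch.randn(4, 8), dim=1)  # annotator vote distribution
print(multi_task_emotion_loss(logits, hard, dist))
```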
Zhang, H & Xu, M 2018, 'Recognition of Emotions in User-Generated Videos with Kernelized Features', IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2824-2835.
Recognition of emotions in user-generated videos has attracted increasing research attention. Most existing approaches are based on spatial features extracted from video frames. However, due to the broad affective gap between the spatial features of images and high-level emotions, the performance of existing approaches is limited. To bridge the affective gap, we propose recognizing emotions in user-generated videos with kernelized features. We reformulate the discrete Fourier transform as a linear kernel function and construct a polynomial kernel function on top of this linear kernel. The polynomial kernel is applied to the spatial features of video frames to generate kernelized features, which show superior discriminative capability compared with the spatial features themselves. Moreover, we are the first to apply sparse representation to reduce the impact of noise contained in videos, which further improves performance. Extensive experiments on two challenging benchmark datasets, VideoEmotion-8 and Ekman-6, demonstrate that the proposed method achieves state-of-the-art performance.
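The construction can be sketched as follows, assuming the reading suggested by the abstract: each DFT coefficient is an inner product of the feature sequence with a Fourier basis vector (a linear kernel evaluation), and a polynomial kernel built on that inner product yields the kernelized features. The constants c and d and the magnitude step are illustrative assumptions.

```python
# Hypothetical sketch of kernelized features from frame-level descriptors.
import numpy as np

def kernelized_features(frames, c=1.0, d=2):
    """frames: (T, D) array of per-frame spatial features.
    Returns (T, D) polynomial-kernelized features along the time axis."""
    T = frames.shape[0]
    k = np.arange(T)[:, None] * np.arange(T)[None, :]
    basis = np.exp(-2j * np.pi * k / T)  # DFT basis, one row per frequency
    linear = basis @ frames              # linear kernel <b_k, x_t> per dim
    poly = (linear + c) ** d             # polynomial kernel on top
    return np.abs(poly)                  # magnitude as a real-valued feature

feats = np.random.rand(30, 128)          # 30 frames, 128-D CNN features
print(kernelized_features(feats).shape)  # (30, 128)
```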
Takalkar, MA, Zhang, H & Xu, M 2019, 'Improving Micro-expression Recognition Accuracy Using Twofold Feature Extraction', MultiMedia Modeling (LNCS), International Conference on Multimedia Modeling, Springer, Thessaloniki, Greece, pp. 652-664.
Micro-expressions are generated involuntarily on a person's face and are usually a manifestation of repressed feelings. They are characterised by short duration, involuntariness and low intensity, which makes them difficult to perceive and interpret correctly and profoundly challenging to identify and categorise automatically. Previous work on micro-expression recognition has used hand-crafted features such as LBP-TOP, Gabor filters, HOG and optical flow; recent work has also demonstrated the potential of deep learning. This paper is the first to combine a hand-crafted feature descriptor with a deep feature descriptor for the micro-expression recognition task. The aim is to extract features with both descriptors and integrate them into one large feature vector describing a video. Through experiments on the CASME, CASME II and CASME+2 databases, we demonstrate that the proposed method achieves promising micro-expression recognition accuracy given larger training samples.
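The fusion step the abstract describes can be sketched simply: normalize each modality's vector so neither dominates, concatenate, and feed the result to a classifier. The feature extractors are stubbed with random data, and the dimensions and the linear SVM are illustrative assumptions, not the paper's pipeline.

```python
# Hypothetical twofold feature fusion: hand-crafted + deep descriptors.
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
handcrafted = rng.random((100, 177))  # stub LBP-TOP histograms per video
deep = rng.random((100, 512))         # stub CNN embeddings per video
labels = rng.integers(0, 3, 100)      # 3 micro-expression classes

# L2-normalize each modality, then concatenate into one large vector.
fused = np.hstack([normalize(handcrafted), normalize(deep)])
clf = LinearSVC().fit(fused, labels)
print(clf.score(fused, labels))
```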
Shi, Z, Xu, M, Pan, Q, Yan, B & Zhang, H 2018, 'LSTM-based Flight Trajectory Prediction', International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.
Safety ranks first in Air Traffic Management (ATM). Accurate trajectory prediction helps ATM forecast potential dangers and issue effective instructions for safe travel. Most trajectory prediction algorithms target land traffic; they rely on points of interest (POIs) and suit only stationary road conditions. Compared with land traffic prediction, flight trajectory prediction is difficult because way-points are sparse and flight envelopes are heavily affected by external factors. In this paper, we propose a flight trajectory prediction model based on a Long Short-Term Memory (LSTM) network. The four interacting layers of an LSTM's repeating module enable it to connect long-term dependencies to the current prediction task. Applying sliding windows to the LSTM input maintains continuity and preserves the dynamic dependencies between adjacent states in long-term sequences, which improves trajectory prediction accuracy. Taking the time dimension into consideration, both 3-D (time stamp, latitude and longitude) and 4-D (time stamp, latitude, longitude and altitude) trajectories are predicted to demonstrate the efficiency of our approach. The dataset was collected by ADS-B ground stations. We evaluate our model with widely used measures: mean absolute error (MAE), mean relative error (MRE), root mean square error (RMSE) and dynamic time warping (DTW). As Markov models are the most popular tools for time-series processing, we compare our model against the Markov Model (MM) and the weighted Markov Model (wMM). Our model outperforms both and provides a strong basis for anomaly detection and decision-making.
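A minimal sketch of the sliding-window setup follows, assuming 4-D points (time stamp, latitude, longitude, altitude) and a window of the previous W points predicting the next point. The hidden size, window length and the single-layer architecture are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sliding-window LSTM for trajectory prediction.
import torch
import torch.nn as nn

class TrajLSTM(nn.Module):
    def __init__(self, dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, x):             # x: (batch, W, dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict the next trajectory point

def sliding_windows(track, w=10):
    """Split one trajectory (T, dim) into (T-w) input windows and targets."""
    xs = torch.stack([track[i:i + w] for i in range(len(track) - w)])
    ys = track[w:]
    return xs, ys

track = torch.randn(200, 4)           # stub ADS-B trajectory
xs, ys = sliding_windows(track)
model = TrajLSTM()
loss = nn.MSELoss()(model(xs), ys)
loss.backward()                       # gradients for one training step
print(loss.item())
```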
Zhang, H & Xu, M 2016, 'Modeling Temporal Information Using Discrete Fourier Transform for Recognizing Emotions in User-generated Videos', Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), IEEE, Phoenix, Arizona, USA, pp. 629-633.
With the widespread sharing of user-generated Internet videos, emotion recognition in such videos has attracted increasing research effort. However, most existing works are based on frame-level visual features and/or audio features, which may fail to model temporal information, e.g. characteristics accumulated over time. To capture video temporal information, we propose analysing features in the frequency domain obtained by the discrete Fourier transform (DFT features). Frame-level features are first extracted by a pre-trained deep convolutional neural network (CNN). The time-domain features are then interpolated and transformed into DFT features. CNN and DFT features are further encoded and fused for emotion classification. In this way, static image features extracted from a pre-trained deep CNN and temporal information represented by DFT features are jointly considered for video emotion recognition. Experimental results demonstrate that combining DFT features effectively captures temporal information and thereby improves emotion recognition performance. Our approach achieves state-of-the-art performance on the largest video emotion dataset (VideoEmotion-8), improving accuracy from 51.1% to 55.6%.
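The pipeline can be sketched as below, assuming frame-level CNN descriptors are interpolated to a fixed length, transformed with the FFT, and fused with mean-pooled CNN features. The target length, the number of retained low-frequency coefficients and the mean-pooling fusion are illustrative assumptions; the paper's encoding may differ.

```python
# Hypothetical DFT-feature pipeline for video emotion recognition.
import numpy as np

def dft_features(frame_feats, target_len=64, keep=16):
    """frame_feats: (T, D). Interpolate each dimension to target_len samples,
    apply the DFT along time, keep low-frequency magnitudes: (keep, D)."""
    T, D = frame_feats.shape
    grid = np.linspace(0, T - 1, target_len)
    interp = np.stack([np.interp(grid, np.arange(T), frame_feats[:, d])
                       for d in range(D)], axis=1)
    spectrum = np.abs(np.fft.rfft(interp, axis=0))
    return spectrum[:keep]

frames = np.random.rand(37, 128)  # stub CNN features for 37 frames
video_vec = np.concatenate([frames.mean(axis=0),            # static pooling
                            dft_features(frames).ravel()])  # temporal DFT
print(video_vec.shape)            # (128 + 16*128,) = (2176,)
```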