- Matrix and Tensor Factorization
- Deep learning
- Multi-label learning
- Class imbalance and dimensionality reduction
- Data science in Bioinformatics and Structural health monitoring
- Introduction to Data Analytics
- Cloud Computing and Software as a Service
- Fundamentals of Software Development
The performance of a wind turbine is profoundly affected by wind conditions. Small wind turbines usually meet the demand for electricity in rural areas. The shape of the blade greatly influences the performance of the wind turbine. The present study aims to optimize the performance of a 20 kW horizontal-axis wind turbine (HAWT) under local wind conditions at Deniliquin, New South Wales, Australia. ANSYS Fluent was used to investigate the aerodynamic performance of the 20 kW HAWT. The effects of four Reynolds-averaged Navier-Stokes (RANS) turbulence models on predicting the flow over the wind turbine under separation conditions were examined. The Transition SST model had the best agreement with NREL CER data and was used to investigate the mechanical output at different rotational speeds and variable pitch angles. Then the aerodynamic shape of the rotor of the wind turbine was optimized to maximize the annual energy production (AEP) in the Deniliquin region. Statistical wind analysis was applied to define the Weibull shape and scale parameters, which were 2.096 and 5.042 m/s, respectively. HARP_Opt was enhanced with design variables concerning the shape of the blade, rated rotational speed, and pitch angle. Pitch angle remained at 0° while the rising wind speed increased rotor speed to 148.4482 rpm at rated speed. This optimization improved the AEP by 9.068% compared to the original NREL design.
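As an illustration of how the two Weibull parameters quoted above describe a site's wind resource, the following minimal sketch evaluates the Weibull density, the mean wind speed, and the expected hours per year in a wind-speed band. Reading 2.096 as the dimensionless shape parameter and 5.042 m/s as the scale parameter is an assumption based on their units, and the function names are hypothetical. This is not the study's AEP calculation, which also requires the turbine's power curve.

```python
import math

# Parameters quoted in the abstract. Treating 2.096 as the shape (k) and
# 5.042 m/s as the scale (c) is an assumption based on their units.
K_SHAPE = 2.096
C_SCALE = 5.042  # m/s


def weibull_pdf(v, k=K_SHAPE, c=C_SCALE):
    """Probability density of wind speed v (m/s) under a Weibull(k, c) model."""
    if v < 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def weibull_cdf(v, k=K_SHAPE, c=C_SCALE):
    """Probability that the wind speed is below v (m/s)."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))


def weibull_mean(k=K_SHAPE, c=C_SCALE):
    """Mean wind speed of the distribution: c * Gamma(1 + 1/k)."""
    return c * math.gamma(1.0 + 1.0 / k)


def hours_per_year_between(v_lo, v_hi, k=K_SHAPE, c=C_SCALE):
    """Expected hours per year with wind speed in [v_lo, v_hi)."""
    return 8760.0 * (weibull_cdf(v_hi, k, c) - weibull_cdf(v_lo, k, c))


if __name__ == "__main__":
    print(f"mean wind speed: {weibull_mean():.2f} m/s")
    print(f"hours/year between 4 and 6 m/s: {hours_per_year_between(4, 6):.0f}")
```

An AEP estimate would weight each wind-speed band's hours by the turbine's power output in that band; the power curve is not given here, so that step is left out.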
Braytee, A, Liu, W, Anaissi, A & Kennedy, PJ 2019, 'Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance', ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 5.
Altaee, A, Braytee, A, Millar, GJ & Naji, O 2019, 'Energy efficiency of hollow fibre membrane module in the forward osmosis seawater desalination process', Journal of Membrane Science, vol. 587.
Chacon, D, Braytee, A, Huang, Y, Thoms, J, Subramanian, S, Sauerland, MC, Bohlander, SK, Braess, J, Wörmann, BJ, Berdel, WE, Hiddemann, W, Gabrys, B, Metzeler, KH, Herold, T, Pimanda, J & Beck, D 2019, 'Prospective Identification of Acute Myeloid Leukemia Patients Who Benefit from Gene-Expression Based Risk Stratification', Blood, vol. 134, no. Supplement_1, pp. 1397-1397.
Background: Acute myeloid leukemia (AML) is a highly heterogeneous malignancy and risk stratification based on genetic and clinical variables is standard practice. However, current models incorporating these factors accurately predict clinical outcomes for only 64-80% of patients and fail to provide clear treatment guidelines for patients with intermediate genetic risk. A plethora of prognostic gene expression signatures (PGES) have been proposed to improve outcome predictions but none of these have entered routine clinical practice and their role remains uncertain.
Methods: To clarify clinical utility, we performed a systematic evaluation of eight highly cited PGES, i.e. Marcucci-7, Ng-17, Li-24, Herold-29, Eppert-LSCR-48, Metzeler-86, Eppert-HSCR-105, and Bullinger-133. We investigated their constituent genes, methodological frameworks and prognostic performance in four cohorts of non-FAB M3 AML patients (n = 1175). All patients received intensive anthracycline and cytarabine based chemotherapy and were part of studies conducted in the United States of America (TCGA), the Netherlands (HOVON) and Germany (AMLCG).
Results: There was a minimal overlap of individual genes and component pathways between different PGES and their performance was inconsistent when applied across different patient cohorts. Concerningly, different PGES often assigned the same patient into opposing adverse- or favorable-risk groups (Figure 1A: Rand index analysis; RI = 1 if all patients were assigned to equal risk groups and RI = 0 if all patients were assigned to different risk groups). Differences in the underlying methodological framework of different PGES and the molecular heterogeneity between AMLs contributed to these low-fidelity risk assignments. However, all PGES consistently assigned a significant subset of patients into the same adverse- or favorable-risk groups (40%-70%; Figure 1B: Principal component analysis...
Krivtsov, AV, Evans, K, Gadrey, JY, Eschle, BK, Hatton, C, Uckelmann, HJ, Ross, KN, Perner, F, Olsen, SN, Pritchard, T, McDermott, L, Jones, CD, Jing, D, Braytee, A, Chacon, D, Earley, E, McKeever, BM, Claremon, D, Gifford, AJ, Lee, HJ, Teicher, BA, Pimanda, JE, Beck, D, Perry, JA, Smith, MA, McGeehan, GM, Lock, RB & Armstrong, SA 2019, 'A Menin-MLL Inhibitor Induces Specific Chromatin Changes and Eradicates Disease in Models of MLL-Rearranged Leukemia', Cancer Cell, vol. 36, no. 6, pp. 660-673.
Inhibition of the Menin (MEN1) and MLL (MLL1, KMT2A) interaction is a potential therapeutic strategy for MLL-rearranged (MLL-r) leukemia. Structure-based design yielded the potent, highly selective, and orally bioavailable small-molecule inhibitor VTP50469. Cell lines carrying MLL rearrangements were selectively responsive to VTP50469. VTP50469 displaced Menin from protein complexes and inhibited chromatin occupancy of MLL at select genes. Loss of MLL binding led to changes in gene expression, differentiation, and apoptosis. Patient-derived xenograft (PDX) models derived from patients with either MLL-r acute myeloid leukemia or MLL-r acute lymphoblastic leukemia (ALL) showed dramatic reductions of leukemia burden when treated with VTP50469. Multiple mice engrafted with MLL-r ALL remained disease free for more than 1 year after treatment. These data support rapid translation of this approach to clinical trials.
Mustapha, S, Braytee, A & Ye, L 2018, 'Multisource data fusion for classification of surface cracks in steel pipes', Journal of Nondestructive Evaluation, Diagnostics and Prognostics of Engineering Systems, vol. 1, no. 2.
Copyright © 2018 by ASME. This paper focuses on the development and validation of a robust framework for surface crack detection and assessment in steel pipes based on measured vibration responses collected using a network of piezoelectric (PZT) wafers. The pipe structure considered in this study contained multiple progressive cracks occurring at different locations and with various orientations (along the circumference or length). The fusion of data collected from multiple PZT wafers was investigated based on two approaches: (a) combining the raw data from all sensors before establishing a statistical model for damage classification and (b) combining the features from each sensor after applying multiclass support vector machine recursive feature elimination (MCSVM-RFE) for dimensionality reduction, and taking the union of discriminative features among the different sources of data. An MCSVM learning algorithm was employed to train on the data and generate a statistical classifier. The dataset comprised ten classes: nine damage cases and the healthy state. Both fusion approaches achieved high prediction accuracy, exceeding 95%, but the number of features needed to reach that accuracy differed between the two approaches. Furthermore, the performance and precision of the classifier were evaluated when the data from only a single sensor was used, compared with the combined data from all the sensors within the network. Very promising results in the classification of damage were obtained, based on the case study that included multiple damage scenarios with different lengths and orientations.
Gill, AQ, Braytee, A & Hussain, FK 2017, 'Adaptive service e-contract information management reference architecture', VINE Journal of Information and Knowledge Management Systems, vol. 47, no. 3, pp. 395-410.
© 2017, Emerald Publishing Limited. Purpose: The aim of this paper is to report on the adaptive e-contract information management reference architecture using the systematic literature review (SLR) method. Enterprises need to effectively design and implement complex adaptive e-contract information management architecture to support dynamic service interactions or transactions. Design/methodology/approach: The SLR method is three-fold and was adopted as follows. First, a customized literature search with relevant selection criteria was developed, which was then applied to initially identify a set of 1,573 papers. Second, 55 of 1,573 papers were selected for review based on the initial review of each identified paper title and abstract. Finally, based on the second review, 24 papers relevant to this research were selected and reviewed in detail. Findings: This detailed review resulted in the adaptive e-contract information management reference architecture elements including structure, life cycle and supporting technology. Research limitations/implications: The reference architecture elements could serve as a taxonomy for researchers and practitioners to develop context-specific service e-contract information management architecture to support dynamic service interactions for value co-creation. The results are limited to the number of selected databases and papers reviewed in this study. Originality/value: This paper offers a review of the body of knowledge and novel e-contract information management reference architecture, which is important to support the emerging trends of internet of services.
Anaissi, A, Goyal, M, Catchpoole, DR, Braytee, A & Kennedy, PJ 2016, 'Ensemble Feature Learning of Genomic Data Using Support Vector Machine', PLoS ONE, vol. 11, no. 6, pp. 1-17.
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models using randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the ranking of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE, and 5% better than a random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data.
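The two ensemble ingredients described above, balanced bootstrap sampling and aggregation of several feature rankings before backward elimination, can be sketched as follows. This is only an illustration under stated assumptions: the SVM training that would produce each ranking is omitted, the mean-rank aggregation rule is an assumption (the abstract does not say how rankings are combined), and all function names are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Draw a bootstrap sample of indices with the same number of draws per class.

    Each class is sampled with replacement, using the size of the smallest
    class as the per-class sample size (an assumed balancing rule).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_per_class = min(len(idxs) for idxs in by_class.values())
    sample = []
    for idxs in by_class.values():
        sample.extend(rng.choice(idxs) for _ in range(n_per_class))
    return sample


def aggregate_rankings(rankings):
    """Combine several rankings (best feature first) by mean rank position."""
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: mean_rank[f])


def eliminate_worst(rankings, n_drop=1):
    """One backward-elimination step: drop the n_drop lowest-ranked features."""
    if n_drop < 1:
        raise ValueError("n_drop must be at least 1")
    aggregated = aggregate_rankings(rankings)
    return aggregated[: max(len(aggregated) - n_drop, 0)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["relapse"] * 2 + ["remission"] * 8
    print(balanced_bootstrap(labels, rng))
    # Rankings as they might come from three models trained on three samples.
    rankings = [["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g1", "g3", "g2"]]
    print(eliminate_worst(rankings))
```

In a full pipeline each ranking would come from a model fitted to one balanced bootstrap sample, and the elimination step would repeat until the desired number of features remains.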
Anaissi, A, Goyal, M, Catchpoole, DR, Braytee, A & Kennedy, PJ 2015, 'Case-based retrieval framework for gene expression data', Cancer Informatics, vol. 14, pp. 21-31.
BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. METHODS: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. RESULTS: The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. CONCLUSION: The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
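The retrieval step described in the METHODS sentence, a k-nearest-neighbour search under a feature-weighted similarity, can be sketched as below. The weighted Euclidean distance, the toy gene profiles, and the function names are assumptions for illustration; the abstract does not state which distance or weighting scheme the framework uses.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; a larger weight makes a gene matter more."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve_similar(cases, query, weights, k=3):
    """Return the k stored cases closest to the query profile.

    cases is a list of (case_id, expression_profile, label) tuples.
    """
    ranked = sorted(cases, key=lambda c: weighted_distance(c[1], query, weights))
    return ranked[:k]


def predict_label(cases, query, weights, k=3):
    """Majority vote over the labels of the k retrieved cases."""
    neighbours = retrieve_similar(cases, query, weights, k)
    votes = Counter(label for _, _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-gene profiles (hypothetical values).
    cases = [
        ("p1", [1.0, 5.0], "A"),
        ("p2", [1.2, 0.0], "A"),
        ("p3", [4.0, 5.1], "B"),
        ("p4", [4.2, 0.2], "B"),
    ]
    # A weight of zero on the second gene means only the first gene is compared.
    print(retrieve_similar(cases, [1.1, 5.0], weights=[1.0, 0.0], k=2))
```

Changing the weights changes which past patients count as similar, which is why feature selection and weighting sit inside the retrieval process rather than before it.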
Naji, M, Braytee, A, Al-Ani, A, Anaissi, A, Goyal, M & Kennedy, PJ 2020, 'Design of airport security screening using queueing theory augmented with particle swarm optimisation', Service Oriented Computing and Applications, pp. 119-133.
© 2020, Springer-Verlag London Ltd., part of Springer Nature. Designing an efficient and reliable airport security screening system is a critical and challenging task. It is an essential element of airline and passenger safety which aims to provide the expected level of confidence and to ensure the safety of passengers and the aviation industry. In recent years, security at airports has gone through noticeable improvements with the utilisation of advanced technology and highly trained security officers. However, for many airports, it is important to find the best compromise between the capacity of the security area, the number of passengers and the number of screening machines and officers to maintain a high level of security and to ensure that the cost and waiting times for passengers and airlines are at acceptable levels. This paper proposes a novel method based on queueing theory augmented with particle swarm optimisation (QT-PSO) to predict passenger waiting times in a security screening context. This model consists of multiple servers operating in parallel and takes into consideration the complete scenario such as normal, slow and express lanes. Such an approach has the potential to be a reliable model that is able to assimilate the effect of variations in the number of passengers, security officers and security machines on the service time. To evaluate our proposed method, we collected real-world security screening data from an Australian airport from December to March for the two consecutive years of 2016 and 2017. The results show that our proposed QT-PSO method predicts the average waiting time of passengers better than the state of the art.
Naji, M, Braytee, A, Anaissi, A, Sianaki, OA & Al-Ani, A 2020, 'Optimizing the Waiting Time for Airport Security Screening Using Multiple Queues and Servers', Advances in Intelligent Systems and Computing, pp. 496-507.
© 2020, Springer Nature Switzerland AG. Airport security screening processes are essential to ensure the safety of passengers and the aviation industry. Security at airports has improved noticeably in recent years through the utilisation of state-of-the-art technologies and highly trained security officers. However, maintaining a high level of security can be costly to operate and implement and can cause delays for passengers and airlines. In optimising a security process it is essential to strike a balance between time delays, security and reduced operation cost. This paper uses queueing theory as a method to study the impact of queue formation and the size of the security area on the average waiting time for the case of multi-lane parallel servers. An experiment is conducted to validate the proposed approach.
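For readers unfamiliar with how queueing theory yields an average waiting time for parallel servers, the sketch below uses the textbook M/M/c model (Poisson arrivals, exponential service times, one shared queue feeding c identical lanes) and the Erlang C formula. Whether the paper uses this model is not stated, so treat it as an assumed baseline; the function names and example rates are hypothetical.

```python
import math


def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving passenger has to wait (Erlang C formula)."""
    offered_load = arrival_rate / service_rate
    utilisation = offered_load / servers
    if utilisation >= 1.0:
        raise ValueError("queue is unstable: arrivals exceed total capacity")
    # Unnormalised weight of the states in which at least one lane is free.
    below_capacity = sum(
        offered_load ** n / math.factorial(n) for n in range(servers)
    )
    # Unnormalised weight of the states in which every lane is busy.
    at_capacity = offered_load ** servers / (
        math.factorial(servers) * (1.0 - utilisation)
    )
    return at_capacity / (below_capacity + at_capacity)


def mean_wait_in_queue(arrival_rate, service_rate, servers):
    """Average time spent queueing, in the time unit of the supplied rates."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical rates: 10 passengers/min arriving, 3 passengers/min per lane.
    for lanes in (4, 5, 6):
        wait = mean_wait_in_queue(10.0, 3.0, lanes)
        print(f"{lanes} lanes: mean wait {wait:.3f} min")
```

Adding a lane reduces the expected wait, which is the trade-off between operating cost and delay that the abstract describes.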
Abdo, P, Phuoc Huynh, B, Braytee, A & Taghipour, R 2019, 'Effect of phase change material on temperature in a room fitted with a windcatcher', ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE).
Copyright © 2019 ASME. Global warming and climate change have been considered as major challenges over the past few decades. Sustainable and renewable energy sources are nowadays needed to overcome the undesirable consequences of rapid development in the world. Phase change materials (PCM) are substances with high latent heat storage capacity which absorb heat from or release heat to the surrounding environment. They change from solid to liquid and vice versa. PCMs could be used as a passive cooling method which enhances energy efficiency in buildings. Integrating PCM with natural ventilation is investigated in this study by exploring the effect of phase change material on the temperature in a room fitted with a windcatcher. A chamber made of acrylic sheets fitted with a windcatcher is used to monitor the temperature variations. The dimensions of the chamber are 1250 × 1000 × 750 mm³. Phase change material is integrated respectively at the walls of the room, its floor and ceiling and within the windcatcher's inlet channel. Temperature is measured at different locations inside the chamber. Wind is blown through the room using a fan with heating elements.
Naji, M, Al-Ani, A, Braytee, A, Anaissi, A & Kennedy, P 2019, 'Queue Formation Augmented with Particle Swarm Optimisation to Improve Waiting Time in Airport Security Screening', Advances in Intelligent Systems and Computing, Workshops of the 33rd International Conference on Advanced Information Networking and Applications, Springer, Japan, pp. 923-935.
© 2019, Springer Nature Switzerland AG. Airport security screening processes are essential to ensure the safety of both passengers and the aviation industry. Security at airports has improved noticeably in recent years through the utilisation of state-of-the-art technologies and highly trained security officers. However, maintaining a high level of security can be costly to operate and implement. It may also lead to delays for passengers and airlines. This paper proposes a novel queue formation method based on a queueing theory model augmented with a particle swarm optimisation method known as QQT-PSO to improve the average waiting time in airport security areas. Extensive experiments were conducted using real-world datasets collected from Sydney airport. Compared to the existing system, our method significantly reduces the average waiting time and operating cost, with an 11.89% reduction relative to the one-queue formation.
Braytee, A, Anaissi, A & Kennedy, PJ 2018, 'Sparse feature learning using ensemble model for highly-correlated high-dimensional data', Neural Information Processing (LNCS), International Conference on Neural Information Processing, Springer, Siem Reap, Cambodia, pp. 423-434.
© Springer Nature Switzerland AG 2018. High-dimensional highly correlated data exist in several domains such as genomics. Many feature selection techniques consider correlated features redundant and therefore remove them. Several studies investigate the interpretation of the correlated features in domains such as genomics, but investigating the classification capabilities of the correlated feature groups is a point of interest in several domains. In this paper, a novel method is proposed by integrating the ensemble feature ranking and co-expression networks to identify the optimal features for classification. The main advantage of the proposed method is that it does not consider the correlated features redundant; instead, it shows the importance of the selected correlated features for improving classification performance. A series of experiments on five high-dimensional highly correlated datasets with different levels of imbalance ratios show that the proposed method outperformed the state-of-the-art methods.
Anaissi, A, Braytee, A & Naji, M 2018, 'Gaussian Kernel Parameter Optimization in One-Class Support Vector Machines', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.
© 2018 IEEE. The one-class support vector machine (OCSVM) with a Gaussian kernel function is a promising machine learning method which has been employed extensively in the area of anomaly detection. However, generalization performance of OCSVM is profoundly influenced by its Gaussian model parameter σ. This paper proposes a new algorithm named Edged Support Vector (ESV) for tuning the Gaussian model parameter. The semantic idea of this algorithm is based on inspecting the spatial locations of the selected support vector samples. The algorithm selects the optimal value of σ which leads to a decision boundary whose support vectors all reside on the surface of the training data (i.e. edged support vectors). A support vector is identified as an edge sample by constructing a hyperplane with its k-nearest neighbour samples using a hard margin linear support vector machine. The algorithm was successfully validated using two real-world sensing datasets, one collected from a lab specimen which replicated a jack arch from the Sydney Harbour Bridge, and another one collected from sensors mounted on vehicles for road condition assessment. Results show that the designed ESV algorithm is an appropriate choice to identify the optimal value of σ for OCSVM.
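To illustrate why the Gaussian parameter σ matters so much, the sketch below computes the kernel between a few points for a very small and a very large σ. It assumes the common parameterisation K(x, y) = exp(-||x - y||² / (2σ²)), which the abstract does not spell out, and it does not implement the ESV algorithm itself. Function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel, assuming K = exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mean_off_diagonal_similarity(samples, sigma):
    """Average kernel value over all pairs of distinct samples."""
    n = len(samples)
    values = [
        gaussian_kernel(samples[i], samples[j], sigma)
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for sigma in (0.1, 1.0, 100.0):
        similarity = mean_off_diagonal_similarity(samples, sigma)
        print(f"sigma={sigma}: mean similarity {similarity:.4f}")
```

With a very small σ every sample looks unrelated to every other, so the boundary tends to wrap each point individually; with a very large σ all samples look alike and the boundary becomes too loose. Tuning σ is about finding the range between those extremes.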
Braytee, A, Liu, W & Kennedy, PJ 2017, 'Supervised context-aware non-negative matrix factorization to handle high-dimensional high-correlated imbalanced biomedical data', Proceedings of the International Joint Conference on Neural Networks, 2017 International Joint Conference on Neural Networks, Anchorage, AK, USA, pp. 4512-4519.
© 2017 IEEE. Traditional feature selection techniques are used to identify a subset of the most useful features, and consider the rest as unimportant, redundant or noisy. In the presence of highly correlated features, many variable selection methods treat correlated features as redundant and remove them. In this paper, a novel supervised feature selection algorithm SCANMF is proposed by jointly integrating correlation analysis and structural analysis of the balanced supervised non-negative matrix factorization (NMF). Furthermore, an ℓ2,1-norm minimization constraint is incorporated into the objective function to guarantee sparsity in the feature matrix rows and reduce noisy features. Our algorithm exploits the discriminative information, feature combinations, and the original features in the context of a supervised NMF method which can be beneficial for both classification and interpretation. An efficient iterative algorithm is designed to solve the constrained optimization problem with guaranteed convergence. Finally, a series of extensive experiments are conducted on 8 complex datasets. Promising results using multiple classifiers demonstrate the effectiveness and efficiency of our algorithm over state-of-the-art methods.
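The ℓ2,1-norm mentioned above is the sum of the Euclidean norms of a matrix's rows, which is why penalising it pushes whole rows (features) towards zero. The short sketch below computes it and compares it with the Frobenius norm on two small matrices; the matrices are made-up examples and this is not the SCANMF objective.

```python
import math


def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm(matrix):
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


if __name__ == "__main__":
    # Same entries, arranged so that one matrix has an all-zero row.
    row_sparse = [[3.0, 4.0], [0.0, 0.0]]
    spread_out = [[3.0, 0.0], [0.0, 4.0]]
    for name, m in (("row-sparse", row_sparse), ("spread-out", spread_out)):
        print(name, "l2,1 =", l21_norm(m), "Frobenius =", frobenius_norm(m))
```

Both matrices have the same Frobenius norm, but the row-sparse one has the smaller ℓ2,1-norm, so a minimiser of the ℓ2,1 penalty prefers solutions in which entire feature rows vanish.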
Braytee, A, Liu, W, Catchpoole, DR & Kennedy, PJ 2017, 'Multi-label feature selection using correlation information', International Conference on Information and Knowledge Management, Proceedings, ACM on Conference on Information and Knowledge Management, ACM, Singapore, Singapore, pp. 1649-1656.
© 2017 ACM. High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the classification performance in machine learning. In multi-label learning, to select the discriminative features among multiple labels, several challenges should be considered: interdependent labels, different instances may share different label correlations, correlated features, and missing and flawed labels. This work is part of a project at The Children's Hospital at Westmead (TB-CHW), Australia to explore the genomics of childhood leukaemia. In this paper, we propose a CMFS (Correlated- and Multi-label Feature Selection method), based on non-negative matrix factorization (NMF) for simultaneously performing feature selection and addressing the aforementioned challenges. Significantly, a major advantage of our research is to exploit the correlation information contained in features, labels and instances to select the relevant features among multiple labels. Furthermore, ℓ2,1-norm regularization is incorporated in the objective function to undertake feature selection by imposing sparsity on the feature matrix rows. We employ CMFS to decompose the data and multi-label matrices into a low-dimensional space. To solve the objective function, an efficient iterative optimization algorithm is proposed with guaranteed convergence. Finally, extensive experiments are conducted on high-dimensional multi-labeled datasets. The experimental results demonstrate that our method significantly outperforms state-of-the-art multi-label feature selection methods.
Anaissi, A, Khoa, NLD, Mustapha, S, Alamdari, MM, Braytee, A, Wang, Y & Chen, F 2017, 'Adaptive one-class support vector machine for damage detection in structural health monitoring', Advances in Knowledge Discovery and Data Mining (LNAI), Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer, Jeju, South Korea, pp. 42-57.
© 2017, Springer International Publishing AG. Machine learning algorithms have been employed extensively in the area of structural health monitoring to compare new measurements with baselines to detect any structural change. One-class support vector machine (OCSVM) with Gaussian kernel function is a promising machine learning method which can learn only from one class data and then classify any new query samples. However, generalization performance of OCSVM is profoundly influenced by its Gaussian model parameter σ. This paper proposes a new algorithm named Appropriate Distance to the Enclosing Surface (ADES) for tuning the Gaussian model parameter. The semantic idea of this algorithm is based on inspecting the spatial locations of the edge and interior samples, and their distances to the enclosing surface of OCSVM. The algorithm selects the optimal value of σ which generates a hyperplane that is maximally distant from the interior samples but close to the edge samples. The sets of interior and edge samples are identified using a hard margin linear support vector machine. The algorithm was successfully validated using sensing data collected from the Sydney Harbour Bridge, in addition to five public datasets. The designed ADES algorithm is an appropriate choice to identify the optimal value of σ for OCSVM especially in high dimensional datasets.
Mustapha, S, Braytee, A & Ye, L 2017, 'Detection of Surface Cracking in Steel Pipes based on Vibration Data using a Multi-Class Support Vector Machine Classifier', Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2017, Conference on Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems, SPIE-INT SOC OPTICAL ENGINEERING, Portland, OR.
Braytee, A, Catchpoole, DR, Kennedy, PJ & Liu, W 2016, 'Balanced Supervised Non-Negative Matrix Factorization for Childhood Leukaemia Patients', CIKM '16 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Indianapolis, Indiana, USA.
Supervised feature extraction methods have received considerable attention in the data mining community due to their capability to improve the classification performance of the unsupervised dimensionality reduction methods. With increasing dimensionality, several methods based on supervised feature extraction have been proposed to achieve a feature ranking, especially on microarray gene expression data. This paper proposes a method with twofold objectives: it implements a balanced supervised non-negative matrix factorization (BSNMF) to handle the class imbalance problem in supervised non-negative matrix factorization techniques. Furthermore, it proposes an accurate gene ranking method based on our proposed BSNMF for microarray gene expression datasets. To the best of our knowledge, this is the first work to handle the class imbalance problem in supervised feature extraction methods. This work is part of a Human Genome project at The Children's Hospital at Westmead (TB-CHW), Australia. Our experiments indicate that the components factorized using the supervised feature extraction approach have more classification capability than those from the unsupervised one, but that the supervised approach fails drastically in the presence of class imbalance. Our proposed method outperforms the state-of-the-art methods and shows promise in overcoming this concern.
Braytee, A, Liu, W & Kennedy, P 2016, 'A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data', Springer International Publishing, International Conference on Neural Information Processing, Springer International Publishing, Kyoto, Japan, pp. 78-86.
In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for handling the problem of feature extraction from imbalanced data. The presence of highly imbalanced data misleads existing feature extraction techniques into producing biased features, which results in poor classification performance, especially for the minority class. To solve this problem, we propose a cost-sensitive learning strategy for feature extraction techniques that uses the imbalance ratio of classes to discount the majority samples. This strategy is adapted to popular feature extraction methods such as PCA and NMF. The main advantage of the proposed methods is that they are able to lessen the inherent bias of the extracted features towards the majority class in existing PCA and NMF algorithms. Experiments on twelve public datasets with different levels of imbalance ratios show that the proposed methods outperformed the state-of-the-art methods on multiple classifiers.
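One way to picture "using the imbalance ratio to discount the majority samples" is to give each sample a weight inversely proportional to its class size and then compute weighted statistics, as in the sketch below. The specific weighting rule (smallest class size divided by the sample's class size) and the function names are assumptions for illustration; the abstract does not give the exact CSPCA or CSNMF formulation.

```python
from collections import Counter


def imbalance_weights(labels):
    """Weight each sample by (smallest class size) / (its own class size)."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return [smallest / counts[label] for label in labels]


def weighted_mean(rows, weights):
    """Per-feature mean in which each row counts in proportion to its weight."""
    total = sum(weights)
    dims = len(rows[0])
    return [
        sum(w * row[d] for row, w in zip(rows, weights)) / total
        for d in range(dims)
    ]


def weighted_covariance(rows, weights):
    """Weighted covariance matrix around the weighted mean."""
    mean = weighted_mean(rows, weights)
    total = sum(weights)
    dims = len(mean)
    cov = [[0.0] * dims for _ in range(dims)]
    for row, w in zip(rows, weights):
        centred = [row[d] - mean[d] for d in range(dims)]
        for i in range(dims):
            for j in range(dims):
                cov[i][j] += w * centred[i] * centred[j] / total
    return cov


if __name__ == "__main__":
    labels = ["majority"] * 4 + ["minority"]
    rows = [[0.0], [0.0], [0.0], [0.0], [10.0]]
    weights = imbalance_weights(labels)
    print("weights:", weights)
    print("weighted mean:", weighted_mean(rows, weights))
```

A PCA variant built on this idea would take the leading eigenvectors of the weighted covariance matrix, so that the directions it finds are not dominated by the majority class.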
Braytee, A, Gill, AQ, Kennedy, PJ & Hussain, FK 2015, 'A Review and comparison of service E-Contract Architecture Metamodels', Neural Information Processing (LNCS), International Conference on Neural Information Processing, Springer, Istanbul, Turkey, pp. 583-595.
© Springer International Publishing Switzerland 2015. An adaptive service e-contract is an electronic agreement which is required to enable adaptive or agile service sourcing and provisioning. There are a number of e-contract metamodels that can be used to create a context-specific adaptive service e-contract. The challenge is which one to choose and adopt for adaptive services. This paper presents a review and comparison of well-known e-contract metamodels using the architecture theory. The architecture theory allows the analysis of the e-contract metamodels using a three-dimension analytical lens: structure, behavior and technology. The results of this paper highlight the metamodels' structural, behavioral and technological differences and similarities. This paper will help researchers and practitioners to observe whether the existing e-contract metamodels are appropriate to the adaptive services or if there is a need to merge and integrate the concepts of these metamodels to propose a new unifying adaptive service e-contract metamodel. This paper is limited to the number of compared metamodels.
Braytee, A, Hussain, F, Anaissi, A & Kennedy, PJ 2015, 'ABC-Sampling for balancing imbalanced datasets based on Artificial Bee Colony algorithm', Proceedings 2015 IEEE 14th International Conference on Machine Learning and Applications ICMLA 2015, International Conference on Machine Learning and Applications, IEEE, Miami, Florida, pp. 594-599.
Class imbalanced data is a common problem for predictive modelling in domains such as bioinformatics. It occurs when the distribution of classes is not uniform among samples and results in a biased prediction of learning towards majority classes. In this study, we propose the ABC-Sampling algorithm based on a swarm optimization method called Artificial Bee Colony, which models the natural foraging behaviour of honeybees. Our algorithm lessens the effects of imbalanced classes by selecting the most informative majority samples using a forward search and storing them in a ranked subset. Then we construct a balanced dataset with a planned undersampling strategy to extract the most frequent majority samples from the top ranked subset and combine them with all minority samples. Our algorithm is superior to a state-of-the-art method on nine benchmark datasets with various levels of imbalance ratios.
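The final balancing step described above, picking the majority samples that occur most often across the top-ranked subsets and combining them with all minority samples, can be sketched as follows. The Artificial Bee Colony search that produces the ranked subsets is not implemented here; the input subsets, the choice to keep as many majority samples as there are minority samples, and the function name are all assumptions for illustration.

```python
from collections import Counter


def balanced_undersample(ranked_subsets, minority_ids, top_n=None):
    """Build a balanced set of sample ids from ranked majority subsets.

    ranked_subsets: lists of majority sample ids, best subset first
        (assumed to come from a separate search step, not shown).
    minority_ids: ids of all minority samples, which are always kept.
    top_n: how many of the best subsets to consider (all if None).
    """
    subsets = ranked_subsets if top_n is None else ranked_subsets[:top_n]
    frequency = Counter(i for subset in subsets for i in subset)
    # Keep as many majority samples as there are minority samples,
    # preferring those that appear in the most subsets.
    chosen_majority = [i for i, _ in frequency.most_common(len(minority_ids))]
    return chosen_majority + list(minority_ids)


if __name__ == "__main__":
    ranked_subsets = [[1, 2, 3], [2, 3, 4], [3, 5, 2]]
    minority_ids = [10, 11]
    print(balanced_undersample(ranked_subsets, minority_ids))
```

Samples 2 and 3 appear in every subset here, so they are the majority samples retained alongside the two minority samples.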