Many contemporary predictive methods take it for granted that future data will arise from the same underlying distributions as past data. In the real world, however, the existence of adversaries and intruders means that future data can be deliberately constructed to maliciously increase the error rate of prediction models. A prominent example is the detection of spam emails, where words and images are intelligently transformed by the senders of spam in an effort to deceive spam filters.
AAi data scientist and lecturer Dr Wei Liu is investigating the use of game theory principles to address these challenges and solve adversarial learning problems in big data analytics. Dr Liu's research incorporates the behaviour of potential adversaries into the model training process, producing learners (classifiers) that make robust future predictions. His work provides business solutions that take proactive steps against fraudulent insurance claims, spam, fake product reviews, false service ratings and similar threats.
The roles of the adversary and the data miner are modelled as two players competing in a game. The interactions between these two players are formulated as sequential Stackelberg games, in which the Nash equilibrium indicates the most robust classifier the data miner should build.
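To make the alternating structure of such a game concrete, the sketch below is a minimal, hypothetical illustration (not Dr Liu's actual formulation): an adversary repeatedly shifts its malicious examples toward the benign class within a fixed budget, and the learner retrains on the shifted data. The synthetic dataset, the budget and the feature dimensions are invented for illustration; only standard NumPy and scikit-learn calls are used.

```python
# Minimal sketch of a learner-vs-adversary game (illustrative only):
# the adversary moves malicious points to evade detection, the learner
# retrains, and the two moves are iterated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_benign = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_malicious = rng.normal(loc=3.0, scale=1.0, size=(200, 5))
y = np.concatenate([np.zeros(200), np.ones(200)])   # 0 = benign, 1 = malicious

budget = 0.3          # how far the adversary may move each point per round
X_adv = X_malicious.copy()

for round_ in range(10):
    # Learner's move: fit a classifier to the current (possibly shifted) data.
    clf = LogisticRegression().fit(np.vstack([X_benign, X_adv]), y)

    # Adversary's move: push malicious points against the learner's weight
    # vector, i.e. toward the benign side of the boundary, within the budget.
    w = clf.coef_.ravel()
    X_adv = X_adv - budget * w / np.linalg.norm(w)

    evasion = 1.0 - clf.predict(X_adv).mean()
    print(f"round {round_}: fraction of malicious points evading = {evasion:.2f}")
```

In a real application the adversary's move would be a constrained optimisation over feasible feature transformations (for example, word substitutions in spam), but the alternating best-response structure sketched here is the same.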
The learning theories produced by this research are applicable to a wide range of practical problems. They provide business solutions for taking proactive action against:
- spam filtering,
- fraudulent insurance claims,
- fraud detection,
- fake product review analysis, and
- false service ratings.
The results of this research have been published in the prestigious journal Machine Learning as well as top-tier conferences. In a recent paper presented at the 2014 IEEE ICDM Conference, Dr Liu (in collaboration with the University of Sydney) proposed an advanced sparse modelling algorithm that identifies which components (out of thousands) of a learning system are the most vulnerable under attack, and suggested appropriate remedial strategies. In their empirical evaluations on digital image classification tasks, the paper's authors found that adding a single extra pixel in the centre-left of a digital image of a "7" can deceive a well-trained standard SVM (support vector machine) classifier, causing it to misclassify the digit as a "9".
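As a rough illustration of that kind of brittleness (not the paper's algorithm, dataset or results), the sketch below trains a linear SVM to separate "7" from "9" on scikit-learn's built-in 8x8 digits and then changes a single pixel, chosen from the SVM's own weight vector, just enough to flip a correctly classified "7" into a "9". Pixel-intensity bounds are ignored for simplicity, so this shows only how fragile one coordinate of a standard classifier can be.

```python
# Hypothetical one-pixel attack on a linear SVM (illustrative only).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

digits = load_digits()
mask = np.isin(digits.target, [7, 9])
X, y = digits.data[mask], digits.target[mask]

clf = LinearSVC(dual=False).fit(X, y)       # decision_function > 0 means "9"
w = clf.coef_.ravel()

# Pick a "7" that the model currently classifies correctly.
x = X[(y == 7) & (clf.predict(X) == 7)][0].copy()
score = clf.decision_function([x])[0]       # negative for a "7"

# Attack the single most influential pixel: the smallest change to that one
# coordinate which pushes the decision value just past zero.
pixel = int(np.argmax(np.abs(w)))
x_attacked = x.copy()
x_attacked[pixel] += (-score + 1e-3) / w[pixel]

print("prediction before attack:", clf.predict([x])[0])           # 7
print("prediction after attack: ", clf.predict([x_attacked])[0])  # 9
```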

Figure: The sparse adversarial learning model identifies the most significant feature distinguishing "7" from "9", and keeps most of the original pixels intact after the attack.
"Intruders are becoming more intelligent and are capable of reverse-engineering static learning systems," Dr Liu said. "Existing efforts in big data research largely concentrate on the scalability of learning algorithms, but in practice, how useful the patterns and rules learned from historical data will be for future events deserves more attention."
Dr Liu can be reached at wei.liu@uts.edu.au.