
Who said what, where? Big Data’s geo-textual footprint

19 June 2017

Analysing large-scale, multi-dimensional data streams for geo-textual information. Image: Unsplash

Dr Ying Zhang from the Centre for Artificial Intelligence, Faculty of Engineering and IT, is the latest UTS ARC Future Fellow.

With a research focus on Big Data analytics, which is fundamental to data science, Dr Zhang has led the Centre’s database group since he joined the Faculty in 2014. The Fellowship supports research into efficient query processing and analytics on large-scale, multi-dimensional data streams produced by the growing volume of user-generated content on social media platforms.

The four-year project - Effective, Efficient and Scalable Processing of Big Geo-textual Streams - has a grant of more than $800k to examine geo-textual data through two types of information from publicly available content on social media: the text produced, and the location of the producer.

Advances in geo-positioning technologies and location-based social network services are generating rapidly growing amounts of data with both spatial and textual information, such as geo-tagged tweets, check-ins and user comments on points of interest.
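To make the idea concrete, a geo-textual record pairs a piece of text with the location it came from, and a basic query filters on both at once. The Python sketch below is only an illustration under stated assumptions: the `GeoPost` type, the `matching_posts` function and the sample posts are hypothetical and are not taken from the project. It scans every record, which is the brute-force baseline that more scalable techniques aim to improve on.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class GeoPost:
    """Hypothetical geo-textual record: some text plus where it was posted."""
    text: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matching_posts(stream, keyword, lat, lon, radius_km):
    """Yield posts that mention `keyword` and lie within `radius_km` of (lat, lon).

    Brute-force scan over the stream: no index, no approximation.
    """
    keyword = keyword.lower()
    for post in stream:
        near = haversine_km(post.lat, post.lon, lat, lon) <= radius_km
        if near and keyword in post.text.lower():
            yield post


# Made-up sample data, for illustration only.
SAMPLE_POSTS = [
    GeoPost("Smoke from a bushfire near the ridge", -33.70, 150.30),
    GeoPost("Great coffee this morning", -33.88, 151.20),
    GeoPost("Bushfire warning on the radio", -37.81, 144.96),
]

# Posts mentioning "bushfire" within 150 km of central Sydney.
hits = list(matching_posts(SAMPLE_POSTS, "bushfire", -33.87, 151.21, 150.0))
for post in hits:
    print(post.text)
```

A linear scan like this becomes impractical at the volumes and speeds of real social media streams, which is where indexing and approximate or distributed processing come in.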

“Geo-textual data has tremendous potential for the discovery of new and useful knowledge in many key applications including cybersecurity, public safety, e-marketing and social media analytics,” said Dr Zhang.

“This project will use database management, pattern recognition and data mining to develop effective, efficient and scalable query processing techniques for the big geo-textual streams Big Data is generating.”

The techniques developed will enable authorities to effectively collect, analyse and deliver urgent information, such as bushfire alerts and terrorist threats, for the sake of public safety. They can also help predict behaviours, as was evident in the recent US elections, where a high volume of Tweets about presidential candidate Trump was generated in an area that he was subsequently predicted to win, and did, despite it being a long-term Democrat stronghold. They could also be as simple as allowing a business to identify interest and promote the proximity of its own goods or services, perhaps with an e-coupon for an immediate discount or encouragement.

The project expects to address three key challenges brought by the massive volumes and high speeds of big geo-textual streams: better user experiences, increased efficiency and greater scalability in query processing. Meeting them will require new query semantics, machine learning techniques for indexing and query processing, and advanced approximate and distributed algorithms, as well as a system prototype.

“We expect to deliver technical innovations to tackle the distinct challenges faced by query processing in big geo-textual streams,” said Dr Zhang, who will lead the research team, which includes collaborators from UNSW, Simon Fraser University in Canada, the Chinese University of Hong Kong and Microsoft Research Asia.