
How hard can it be to get trains to run on time? Anyone waiting impatiently for a delayed train anywhere in the world might be tempted to think they could get things running smoothly.

But the reality is that coordinating a public transport system is a huge challenge involving a complex mix of working parts.

That’s why Sydney Trains turned to data science to gain a more comprehensive understanding of the causes of delays on the network.

They engaged a UTS research team led by Distinguished Professor Fang Chen to crunch data using the latest machine learning technology, hoping to better understand what causes delays and to help find solutions.

A world leader in AI and data science, Professor Chen is the Executive Director of Data Science UTS. A Eureka Prize-winner in 2018, she has worked extensively with industry to produce innovative AI solutions for big problems.

Although she started out as a student of electrical engineering and remote sensing, she eventually switched to the “cool” discipline of artificial intelligence, applying the blossoming field of data science to problems in a range of industries including water storage, civil infrastructure and public transport systems.


Sydney Trains engaged her and her team for a project it hoped would help evaluate the wider impact of delays, which are costly for both operations and customer experience.

For example, what might be the knock-on effect of a mere 20-second delay across an entire system? And what could be done about it?

“Trains come into the station, open their doors, more people get on, the door can't close, the train can't depart,” Professor Chen explains.

“We call it primary delay: where does it start and when is it going to escalate into secondary delay, where this train can't depart and the next train can't come in? So, one delay in one train causes a ripple effect across an entire network.

“There might be a storm, some trains are undergoing maintenance at a depot or any other myriad of issues. All of a sudden you have an influx of customers on the platform because of the bad weather. So, there are delays and the problems amplify across the network.”
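The ripple effect Professor Chen describes can be illustrated with a toy model. The sketch below is not the UTS team's method: it assumes a single platform, a fixed minimum gap between departures (the hypothetical `min_headway`), and made-up timetable values. A delay that originates at one train is counted as primary; any extra wait forced on a following train by that gap is counted as secondary.

```python
from dataclasses import dataclass


@dataclass
class Train:
    name: str
    scheduled_departure: int  # seconds after the start of the period
    primary_delay: int = 0    # delay originating at this train, e.g. doors held open


def propagate_delays(trains, min_headway):
    """Return {name: (actual_departure, total_delay, secondary_delay)}.

    Assumption (illustrative only): a train cannot depart until at least
    `min_headway` seconds after the train ahead of it has departed.
    """
    results = {}
    previous_departure = None
    for train in sorted(trains, key=lambda t: t.scheduled_departure):
        # Earliest possible departure given only this train's own delay.
        earliest = train.scheduled_departure + train.primary_delay
        if previous_departure is None:
            actual = earliest
        else:
            actual = max(earliest, previous_departure + min_headway)
        secondary = actual - earliest  # waiting caused by the train ahead
        results[train.name] = (actual, actual - train.scheduled_departure, secondary)
        previous_departure = actual
    return results


# Hypothetical timetable: a 20-second primary delay on train A.
trains = [
    Train("A", 0, primary_delay=20),
    Train("B", 120),
    Train("C", 240),
    Train("D", 420),
]
result = propagate_delays(trains, min_headway=120)
for name, (actual, total, secondary) in result.items():
    print(name, actual, total, secondary)
```

In this made-up timetable, the 20 seconds lost by train A is passed on in full to trains B and C, which run right at the minimum gap, and is only absorbed when train D's schedule has a minute of slack. A real network has many lines, junctions and shared tracks, so the same effect spreads in more than one direction.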

Challenges of retrieving scattered data

Bringing the power of AI, machine learning and next-gen computational analysis to the rail system was, in fact, a significant challenge for the UTS team.

Sydney Trains has access to a data goldmine – from Opal card information and signal records to CCTV footage, staff rosters and track information. And then there are weather records, too, as storms and extreme heat can affect punctuality.

Professor Chen says that this “scattered” data drawn from diverse systems, while powerful fuel for the complex analysis her team does, needed to be standardised before work could begin.

The difficulty of extracting usable data from CCTV footage illustrates the point.

Dr Ruimin Li, the project’s lead for Sydney Trains, is an expert in intelligent transport systems and is now Director of Intelligence and Enablement for Greater Sydney operations at Transport for NSW.

“CCTV footage is primarily obtained for passenger safety, not data science,” she says.

“Prior to the project, if we wanted to use CCTV footage to detect start and stop times, it would mean we needed to have somebody review the footage manually, which is a highly time-consuming task.”

The UTS team’s solution was to develop an algorithm that automatically processes station CCTV footage, giving Sydney Trains the information it needed to determine exact arrival and departure times, as well as passenger numbers.

The UTS team came up with a smarter way to accurately detect the train stop and start times.

Dr Ruimin Li, Transport for NSW

When your modelling matches reality

With the data finally standardised, the team used sophisticated AI modelling to predict the system’s performance and to identify likely bottlenecks and the stations most likely to cause delays.

They tested their predictions against the real-time workings of the train system and were gratified to find that their AI analysis was accurate and much more effective than manual methods.

“When you see your forecasted delay is really matched with the reality you know that you mastered the art,” laughs Professor Chen. “That is a nice outcome.”

The UTS modelling gives Sydney Trains a software ‘toolkit’ to assist with timetable updates, train maintenance scheduling and staff rostering at congested locations.

Of course, big changes to the transport system cost big dollars, and even the infrastructure changes needed to collect easy-to-use big data are extremely costly.

Dr Li is pleased the UTS researchers were able to integrate all the diverse information to deliver a big system picture affordably.

Frustratingly, the two-year project ended shortly before the COVID-19 pandemic dramatically reduced public transport use and introduced new challenges such as social distancing.

But Dr Li says the information is still extremely valuable and that the project has lived up to expectations.

“It shows us the potential of data science and insights to help the business make better decisions, which will improve our customers’ experience,” she says.

