Deep industry collaborations in nlPUG are driving transformative outcomes across a wide range of sectors and applications.
Our research
Movie subtitle translation for Amazon Prime Video
As the recipient of an Amazon research grant, we are carrying out a collaborative project with Amazon Prime Video to improve the quality and style of automatically translated movie subtitles. The outcomes of the project are a set of technologies that can improve movie subtitles in a variety of languages, including those with fewer digital resources (commonly referred to as “low-resource languages”), extending access to diverse speaker populations. For more details, please refer to https://aclanthology.org/2026.findings-eacl.209.pdf
Unstructured text analysis for the Transport Accident Commission (TAC)
Using static and dynamic topic modelling, our team has helped predict the recovery trajectory of the clients of the Victorian Government’s accident compensation agency, the TAC. We have also tackled the complexity of the TAC’s internal documentation, generating an informative taxonomy of their document collections and building a customised search facility. This collaborative industry project, carried out as part of a CRC, has funded a PhD scholarship and a research associate position at UTS, and the deliverables have been used in the TAC’s broader framework for client analysis and needs prediction.
Multi-document summarisation for RoZetta Technology
As part of a close collaboration with Sydney-based data science company RoZetta Technology, our team has developed an automated multi-document summarisation tool which can generate informative and fluent summaries from clusters of related documents. The tool could be used, for instance, to generate real-time summaries of financial news at the beginning of a trading day. The project has funded a PhD position and an adjunct researcher.
Named-entity recognition in Persian
Our researchers have developed a novel approach to named-entity recognition (NER) in Persian, collaborating with Australian company Sintelix. The project funded a PhD scholarship, and the resultant software – also modified to enable Arabic NER – is now used by Sintelix. Because Persian has fewer annotated resources than other mainstream languages, one of the project’s key contributions has been the public release of the first NER-annotated Persian dataset. The project has also delivered four different word embeddings trained over unannotated corpora for a comprehensive Persian dictionary of nearly 50,000 unique words.