We are the robots

10 October 2024

One measure of how deeply the breakneck rollout of generative AI (genAI) is reaching into the fundamentals of the internet is the raging battle over the humble robots.txt file. A vital but little-known feature of websites and search engine optimisation, robots.txt tells web crawlers which parts of a website they may crawl.

Since ChatGPT's breakout moment, AI companies have taken to training their large language models (LLMs) on pretty much everything they can find on the internet. They use bots to crawl the web, downloading content as they go. Crawling itself is not new: Google has long crawled the web to build its search index, categorising the World Wide Web for users to search.

But AI is different. As AI companies like Perplexity.ai build multimillion-dollar business models on content other people have paid to produce (raising still-unresolved copyright issues), they have trampled on an internet protocol that dates to 1994: the robots.txt file, which tells bots 'do not crawl this' as a guideline rather than an enforceable rule.

The robots.txt protocol is an agreement, a kind of handshake under the hood of how websites are searched, crawled and indexed: well-behaved crawlers read the file before fetching pages, but nothing technically forces them to obey it. Robots.txt files used to contain little more than a pointer to the site's sitemap and, if the site admin was pedantic, a few specific 'do not crawl' instructions. Not anymore.
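
To make the 'old-style' file concrete, here is a minimal sketch of what such a robots.txt might look like: a Sitemap line plus a couple of Disallow rules that apply to every bot. The domain, paths and the `allowed` helper are hypothetical. The file is read with Python's standard-library `urllib.robotparser`, which reports what the rules ask for; whether a crawler honours that answer is entirely up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical "old-style" robots.txt (example.com is a placeholder):
# a pointer to the sitemap plus a few 'do not crawl' rules for every bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Hypothetical helper: True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("AnyBot", "https://www.example.com/news/story"))   # True
    print(allowed("AnyBot", "https://www.example.com/admin/login"))  # False
    print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```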

The robots.txt files of Australian media outlets today read like a roll call of bots (identified by their user-agent names) that are told to go elsewhere for training data. Our survey of the robots.txt files of 34 major Australian publishers found everything from sites blocking no bots at all to one site that blocked 19.
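
A roll-call file of this kind can be sketched as one group per named bot, each told `Disallow: /`, followed by a catch-all group that still admits everyone else. In this sketch the tokens in `AI_BOTS` are illustrative examples of publicly documented AI crawler names, not the list found in our survey, and `build_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Illustrative AI-crawler user-agent tokens (an assumption for this sketch,
# not the bots found in the survey).
AI_BOTS = ["GPTBot", "CCBot", "Google-Extended", "PerplexityBot", "ClaudeBot"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Hypothetical helper: one 'Disallow: /' group per blocked agent,
    followed by a catch-all group that still lets every other bot in."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /\n")
    # Joining with "\n" leaves a blank line between groups.
    return "\n".join(groups)


if __name__ == "__main__":
    rules = build_robots_txt(AI_BOTS)
    print(rules)

    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # A compliant AI crawler is told to stay out; a search crawler is not.
    print(parser.can_fetch("GPTBot", "https://www.example.com/news/story"))     # False
    print(parser.can_fetch("Googlebot", "https://www.example.com/news/story"))  # True
```

The parser only reports what the file requests. A crawler that ignores the file, or disguises its user agent, is not stopped by it.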

Perplexity.ai was caught out ignoring the protocol, only to respond with a proposed revenue-sharing model that pays publishers a proportion of the revenue earned when one of their articles features in an answer to a query. There are concerns this could lead publishers to prioritise content that will 'align with algorithmic demands', much as search and social media have driven the growth of clickbait journalism.
 
According to the fine print, Perplexity's Publisher Program offers participants a share of revenue when someone lands on their content through a Perplexity search. As yet, however, there is no detail on how to join the program.

The six launch publishers are an odd mix, ranging from huge outlets to community-level ones: TIME, Der Spiegel, Fortune, Entrepreneur, The Texas Tribune and WordPress.com, all with wildly divergent business models and levels of capitalisation. It is hard to see how the non-profit Texas Tribune will marshal the resources needed to 'create their own custom answer engine on their website', one of the 'key components' of Perplexity's program.

In recent weeks, content delivery network provider Cloudflare has rolled out a free product that lets site admins monitor bot traffic in real time, including bots that try to camouflage their behaviour, as Perplexity was found to be doing. Cloudflare has gone a step further with a tool that lets customers pick and choose which bots to block or permit. Next, Cloudflare plans to build a marketplace where site owners can negotiate terms of use with LLM platforms by setting a price for restricted sections of their sites that LLMs may then crawl.

It remains to be seen whether these enhancements to the venerable protocol will restore some balance to the publisher-platform relationship.

Miguel D'Souza


University of Technology Sydney