Science in Focus: Big Data

27 minutes
 
So, big data – the future is here. Indeed, the future is approaching very fast, faster than any of us can imagine, in fact. I'm a mathematician, as Alison said. Maths. Let me show you a picture of a mathematician: the author at work in his private study, aided by the Isolator.
 
[Laughter]
 
No outside noises; the worker can concentrate with ease upon the subject at hand. Actually, he's probably thinking about something rather beautiful, like the shape of soap bubbles. But mathematicians can't actually be like that all the time, and they've done a lot of really positive things. On the positive side, you can think about the internet, computers and data, Google, Facebook, sequencing the human genome, TomToms – in fact mathematics is infiltrating almost every area of our present life, and a lot of really positive things have come of it. But on the other hand, disaster movies sell well, so let me tell you some disaster stories. The theme is really maths and data: how the two things can get together, and what happens when you put them together.

On February 25th, 1991 – I'm going to start with some rocket science – the US Army barracks at Dhahran was hit by a Scud missile. Twenty-eight people were killed and around 100 injured, and yet the base was protected by the Patriot missile defence system; somehow one of Saddam's Scud missiles sailed straight through it. Let me tell you why that happened. Computers think in base 2, and if you write the number one-tenth out in base 2 you get 0.0001100110011…, with the block 0011 repeating forever. They chopped it off after 24 places; then they let the tracking system run for 30 hours, and by that time there was a two-second difference because of the chop-off after 24 binary places, and so it got the position of the incoming Scud missile wrong; it sailed straight through the defence. It's an example of where the people programming the thing didn't understand the implications of the mathematical model. So 28 people died and 100 people were injured, despite the supposedly impregnable Patriot defence system.

Now, I'd like to tell you about Ariane. No, not that Ariane – this Ariane. [background noise] Décollage – it means 'lift-off' in French. Ariane was a European space mission. It blew up on June the 4th, 1996, because of an elementary mathematical error. What happened there was that the tracking system produced a number with 24 digits, and only 16 digits had been allowed for in the computer program where they wanted to store it. They stored it in the wrong place. As a result, 500 million euros and over a decade of development were wasted. So we have to be fairly careful when we're doing maths and data together.

Here's another, more practical problem, about cyclones. In 2011, Cyclone Yasi hit Queensland. I'd like to show you what happens. So here's the cyclone – it's starting off near Fiji somewhere, and it's coming towards Australia, and we're trying to figure out where it's going to cross the coastline. And actually, 100 metres of difference at Fiji makes 200 kilometres of difference in where it crosses the coastline. You have to know this thing that accurately – and can you see it just crossing the coastline there? A lot of damage was done at Port Douglas, which looked like this beforehand, and like that afterwards. With better, more accurate data analysis and assimilation of the data into our weather models, we could have known where that was going to happen and evacuated Port Douglas before it hit. But as it was, it was just unpredictable.

Computers, mathematics and data – what could go wrong? Is it a dangerous mix that we should avoid? Well, we can't avoid it, because it's happening more and more the whole time.
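The rounding error behind the Patriot story can be sketched in a few lines of Python. This is a minimal illustration of the mechanism only, assuming a simple chop of the binary expansion of one-tenth after 24 places and the 30-hour run time quoted in the talk; published analyses of the incident use slightly different register conventions and a roughly 100-hour uptime, so the exact figures differ, but the point is that a tiny per-tick error grows linearly with how long the system has been running.

```python
from fractions import Fraction

# One-tenth has no finite base-2 representation: 0.0001100110011... repeats
# forever. Chop the expansion after 24 binary places, as described in the talk.
BITS = 24
exact_tenth = Fraction(1, 10)
chopped_tenth = Fraction((2 ** BITS) // 10, 2 ** BITS)

per_tick_error = float(exact_tenth - chopped_tenth)   # error in every 0.1 s clock tick

hours = 30                       # run time quoted in the talk
ticks = hours * 3600 * 10        # one clock tick every tenth of a second
clock_drift = per_tick_error * ticks

scud_speed = 1700                # m/s, roughly, for an incoming Scud (assumed figure)
print(f"error per tick : {per_tick_error:.2e} s")
print(f"clock drift    : {clock_drift:.4f} s after {hours} hours")
print(f"position error : {clock_drift * scud_speed:.0f} m")
```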
The last half century has seen an immense growth in computing power. It's almost unbelievable. Are any of you old enough to remember the old disk drive? So a computer – this machine has no brain; please use your own. Disk drive? Does that bring back a memory?
 
[Laughter]
 
People used to drive cars around like this in the United States. We've come a long way since the disk drive. In fact, every decade we get about a thousand times as much computing capacity. In 1965, Gordon Moore, who later headed Intel, suggested that every year we would fit twice as many components onto a chip – twice as many transistors – which was good for Intel's business, and he thought that might last for a decade. And that got translated into there being twice as much information available each year. Well, actually, that has continued to be true every decade since. So, twice as much per year: two to the tenth is 1024 – call it 1000, round it a bit. That means every decade we get 1000 times as much information as the previous decade. People thought, oh yeah, it's just going to last until 1975, but then in 1985 it had happened again; in 1995 it had happened again; in 2005 it had happened again; in 2015 they were saying 'it's starting to level off', but it actually doesn't show any signs of levelling off. So what's happening is that each decade, 1000 times more information is being pumped into the world.

So we're now measuring our information in something called petabytes. A petabyte is nothing to do with a pet; it's actually a very large number – it's 1000 to the fifth power, 1000 multiplied by itself five times, so that's 10 to the 15th: a one with 15 zeros after it. To give you some idea of the scale of that, it's thought that the amount of memory in the human brain is about 2.5 petabytes, and one petabyte of typically encoded mp3 songs would take about 2,000 years to play, so if you were somehow able to record your memory onto an mp3, you could play it back to your descendants – it would last about 5,000 years. It could still be playing in the nth generation, and here is a dead person with their memory being played back as an mp3. So that's a petabyte – a memory playing for 5,000 years; it's a huge amount of data. The next step is the exabyte, which is 1000 petabytes, and according to reliable estimates, we'll have about 40,000 exabytes of data by 2020. That's more than 5,000 gigabytes for every person on the planet.

So what's this all about? Well, from the strategic point of view, big businesses are looking at this and thinking: wow, money. Big businesses have seen competitive pricing push their prices down, they've offshored everything possible to India – people are even sending their documents to be typed in India these days, because people there are well educated and will type them far more cheaply – they've tried economies of scale, they've squeezed it all to the limit, and they see that their data, their dataset, is the one remaining thing they can make money from. So: big data. Big data was a term invented by McKinsey, who were trying to sell their data analytics services to US companies and to the US Government. Big data has now gained priority at government level in Australia, in the UK, in the US and in Europe. All governments are saying 'we need to understand this big data'. All businesses are saying 'we want to make money from big data'. 'How can I monetise it?' my friends ask.
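As a quick sanity check on the arithmetic above – doubling once a year really is roughly a thousandfold per decade, and a petabyte of mp3 audio really does run to millennia of playback – here is a small Python sketch. The 128 kbit/s bit rate is an assumed typical encoding, and the 2.5-petabyte figure for human memory is the talk's round estimate, not a precise value.

```python
# Doubling once a year: after a decade that's 2**10 = 1024, call it 1000x.
per_decade = 2 ** 10
print(f"growth per decade : about {per_decade}x")

# How long would one petabyte of mp3 audio take to play back?
PETABYTE = 1000 ** 5                 # 10**15 bytes
bytes_per_second = 128_000 / 8       # assuming a typical ~128 kbit/s mp3 encoding
seconds = PETABYTE / bytes_per_second
years = seconds / (3600 * 24 * 365)
print(f"1 PB of mp3 audio : about {years:,.0f} years of playback")

# The talk's ~2.5 PB estimate for human memory would play for about 2.5x as long.
print(f"2.5 PB            : about {2.5 * years:,.0f} years")
```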
So what is big data? Well, nobody can quite tell you, but it has the following properties. It's got volume – that means there's a lot of it; velocity – that means it comes in very fast; variety – that means it comes in not just on your annual profit and loss statement but from a lot of different sources, like Twitter and so forth; and veracity – which means you'd better believe it, baby. These are the things that attract companies to think about monetising their big data. It needs very large computers to store, and it's costly to move from one site to another, which is good for the computer companies, and handling it requires the employment of people called data scientists. We're trying to educate some of those at UTS – in fact, here's a plug for our new analytics degree, which we're offering for the first time this year and the second time next year; we'll be training people to actually go out and make some sense of these big data things.

Here's a graph from Forbes magazine which demonstrates why companies using big data are making more money than others. On the horizontal scale, the x axis, we've got time in months, and up the vertical axis is the amount of money you might make if you invest something – these are millions of dollars. With big data, they reckon you can get the blue line amount. Look at the difference between the top of the red graph and the top of the blue graph, and the interesting thing is to look at the scale on the x axis. After three years, you can see an incredible gap between the red line, which is without big data – the traditional data warehouse – and the blue line, with big data analytics. No wonder all the companies are rushing to do big data.

Scientists and government agencies are also starting to use a lot of big data – everybody's using it. CERN – this is from the CERN computing centre, the Centre de Calcul – produced about 200 petabytes of data from more than 800 trillion collisions while looking for the Higgs boson, which they finally found. That's just a mind-boggling amount of data. We've got the Square Kilometre Array here in Australia, looking for very small and very rapid events in the night sky; they've gone from looking for very long events to looking for short events in all the data they're collecting. The human genome – that's a picture of a genome somewhere – requires a huge amount of data to analyse. In the UK, where I just was, the NHS is planning to save the genetic profile of every UK resident in order to have a better diagnostic health tool, so when you go to the doctor they don't just look at your general overall health; they look at your individual genetic profile and can prescribe the medicine that you need, differently for each person. They need to save around 200 petabytes of data to make that happen.

Closer to home, Woolies has rewards cards, and every time you shop at Woolies they record what you buy and then direct ads to you, to your computer and so on, to tell you what's on offer. There was a great story somebody told me about a guy who lent his rewards card to his daughter, and then he started getting little ads for baby products, and he didn't know that his daughter was pregnant until …
 
[Laughter]
 
… he started getting these things. Anyway, it turned out she had bought some hand cream without perfume in it, suitable for ladies who are expecting a baby. So we have to be a bit careful with these data things. But data is coming, and everybody's using it.

People are thinking about the shape of data. While we're rushing headlong into large datasets, data science, I think, is still at a very early stage, and we really have to avoid Ariane-style disasters with our health systems. We don't want to use the wrong algorithm and prescribe the wrong medicine. We'd ideally like our fathers not to discover that we're pregnant by these processes.

So let me talk a little bit about datasets. They can be incomplete, inaccurate, inconsistent or redundant. None of these are good things, but basically, when data comes in with great velocity and volume, this is the kind of thing you have to face. Mathematical and statistical techniques can help fix them. Numerical analysis and interpolation can fill in incomplete data, and probability theory can also be used to fill in gaps between data points (there's a small sketch of this gap-filling idea a little further down). Statistical methods can remove inaccuracies and inconsistencies, and even geometry and topology can find and remove redundancies in data. In fact, many of the best ideas from maths over the past 50 years are now being used for data.

We've all heard of the butterfly effect, where a small butterfly waving its wings can cause a cyclone – we just saw that with the tropical cyclone: a very small variation (it wasn't quite a butterfly) over a rather short period can cause a huge difference. And fractals and chaos seem natural in many phenomena. Here's an example of a fractal set – it's actually a cabbage. The idea of a fractal is that if you look at it at large scale and then at smaller scale, you see nearly the same thing, and when you look at it at tiny scale, you keep seeing the same kind of structure. You can see that if you take the big picture and then focus down on a smaller area, it looks much the same as the big picture. This is a very different way of thinking about data from thinking about a grid of points and interpolating between them, so we need to think about it. The brain actually processes data in a different way – it can see many scales simultaneously – and we need to develop this kind of idea when we're dealing with our data. So a data or computer model can be understood as the discrete version of a continuous kind of reality, and this is a way of trying to understand how that discrete version relates to something continuous.

I'd like to tell you about some maths that has happened very recently, which gives us a whole new way of thinking about data. It's called the maths of barcodes – no, not that kind of barcode, not the commercial ones. Robert Ghrist is a mathematician from the University of Pennsylvania; here he is thinking about the blue-eyed islander problem. That's a completely different problem that I'd love to tell you about some time – it's the riddle of the islanders who commit mass suicide once they work out their own eye colour. Anyway, it's a great problem; Google it sometime. But he wasn't thinking about that when he invented the theory of barcodes.
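The gap-filling idea mentioned a moment ago can be seen in a few lines. This is a minimal sketch, not anything from the talk itself: a made-up series of hourly readings with two missing values, filled in by linear interpolation with NumPy.

```python
import numpy as np

# A toy sensor record with two missing readings (NaN), e.g. hourly temperatures.
hours = np.arange(8)
temps = np.array([18.2, 18.9, np.nan, 21.4, 22.0, np.nan, 20.1, 19.3])

# Fill the gaps by linear interpolation between the known points.
known = ~np.isnan(temps)
filled = temps.copy()
filled[~known] = np.interp(hours[~known], hours[known], temps[known])

print(filled)
# [18.2  18.9  20.15 21.4  22.   21.05 20.1  19.3 ]
```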
Here's a dataset, and I'd like to show you what you can do with it – a new way of thinking about it that has this kind of scale invariance and gives you a new way of looking at data. So, the dataset might be the weights and heights of the population of Sydney. Well, the real population is obviously a much larger set, but let me illustrate it with this smaller one. I'm going to do something like balloons – I'm going to blow the points up. I'm drawing little circles around them, and then I'm going to make those circles bigger and bigger until they form one amorphous mass, which is what the last slide was. So imagine just blowing up each of these data points, putting little circles around them – for three-dimensional data it would be spheres around them, but anyway.

Let's see what happens. See how those ones there have joined up? There's a little shape you can start seeing there – a circle; there aren't any other circles yet, but there's a circle there. So I'm going to isolate that little circle and put it over there as a little closed section in my graph, yeah? Now I'll make them a bit bigger and I get that shape. If I keep going, I'm going to represent my dataset like this: whenever two circles join up, I draw a little line between them, and so I get a little shape like that. Now, that shape occurs just at the radius I've blown them up to. If I blow them up a bit more, I'll get different shapes, and the shape will change as I blow them up a little more, yeah? And finally they'll start not only making little cycles, but filling right in, and then I fill them in like that. So this is a way of thinking about datasets, but with the scale changing at each point. Let me just keep doing it. And so I get these wonderful graphs of all the data joined up – it looks a little bit like that bubble pattern the mathematician was looking at in the first place, doesn't it?

So, Ghrist's barcodes: I'm going to count the number of blobs and the number of circles I get. H0 is the zeroth homology group – homology groups are something you can compute about topological shapes – and I write a bar for each blob. As my little radius, which is called epsilon – every mathematician has to have an epsilon in his talk – gets bigger and bigger, I see how many blobs there are and how many circles there are. Here there's one three-dimensional thing with it all filled in; there are circles; so at the various stages of epsilon I get different graphs, and the dataset I started off with is represented by these barcodes here. We've started with the dataset and transformed it into something else we can think about, and whether there are cycles or not tells you how well the data is representing the reality. It's a really cool idea; it's using some quite advanced maths, and it's a completely new way of thinking about data. So people are doing this kind of thing. And there you can see that, at the end stage, you've only got one blob left, because they've all joined up. At that stage there's still an H1; at this stage you've got the big blob, but you've also got a cycle or two, and so on. I'm trying to get the engineering faculty to represent this kind of thing in their data warehouse; I think I'm having some success, but it's taking a little while to sink in. But this is a great, different way of thinking about data, which uses some really sophisticated and fun maths, and which is actually going to tell you how to handle your data.
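For readers who would like to see the balls-and-barcodes idea in code, here is a minimal sketch of just the H0 part – blobs being born and merging as epsilon grows – using a small made-up dataset and a hand-rolled union-find rather than a topology library. Detecting the H1 cycles described in the talk would need a persistent homology package such as Ripser or GUDHI; this only reproduces the 'blobs merging' bars.

```python
import itertools
import math

# A toy 2D dataset (hypothetical points standing in for, say, heights and weights).
points = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.1), (4.0, 4.0), (4.8, 4.3), (4.4, 5.0)]

# Union-find: track which points have merged into the same blob as epsilon grows.
parent = list(range(len(points)))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path halving
        i = parent[i]
    return i

# Each pairwise distance is an epsilon at which two balls first touch.
pairs = sorted(
    (math.dist(points[i], points[j]), i, j)
    for i, j in itertools.combinations(range(len(points)), 2)
)

# Every point is born as its own blob at epsilon = 0; a blob "dies" when it
# merges into another. The resulting intervals are the H0 barcode.
bars = []
for eps, i, j in pairs:
    ri, rj = find(i), find(j)
    if ri != rj:
        bars.append((0.0, eps))         # one blob dies at this epsilon
        parent[rj] = ri
bars.append((0.0, math.inf))            # the final blob never dies

for birth, death in sorted(bars, key=lambda bar: bar[1]):
    end = "inf" if math.isinf(death) else f"{death:.2f}"
    print(f"[{birth:.2f}, {end})")      # two long bars => two well-separated clusters
```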
So – my real point in doing this is partly that I get the joy of showing a general audience some really nice maths, but partly that there are clever people thinking about how to analyse data better and how to use it better. The human brain can use data in many more ways than the traditional approach of applied maths and computing, and computer scientists are studying how the human brain processes data. In artificial intelligence, people are trying to mimic the brain's approach to treating large amounts of data from varied inputs, and actually asking whether there are cycles in the data is kind of what your brain does. So some of the methods and approaches I've been talking about are key mathematical techniques to align with AI. I think the next 20 years are going to see a complete revolution in what we can do with data, and it's going to need increasingly sophisticated mathematical techniques to help us do it.

Unfortunately, at the same time, there's a kind of maths phobia happening in the West, certainly at the level of maths studied at school. We're seeing fewer and fewer people taking four-unit maths; they're all going down to lower and lower levels of maths. So there's a schism growing up in society where people are afraid of it, and yet it's being used more and more to run our health, our shopping, our internet – everything relies on some reasonably sophisticated mathematical model. There's been a lot of public debate about the sinister use of data, and a lot of this is really about the mathematical algorithms, which are everywhere and which are [inaudible] behind what people are concluding about the data. My maths friends and I don't share this sinister view, but then we have a feeling for how the algorithms work – we understand what's really going on, or at least mostly we do, and when we don't, I'm sure we can find out. We need to understand the algorithms, what their limitations are, how they're likely to develop and what they're likely to show.

My conclusion is that we actually need a more mathematical society. We need more people to understand more of these things so that there's less mystery about them. So this is a plea for you to tell your kids to do higher-level maths at school; otherwise this thousandfold growth in data every decade is just going to overwhelm us, there will be very few people who understand it, and the rest of us will just be slaves to the whole industry. That's my conclusion from this talk, and it's been really fun presenting it to you. Thank you very much.
 
[Applause]

 

7 December 2016 27:00

Tags: big data, digital data, maths, mathematics, applied maths, applied mathematics, data analytics, data analysis, data scientist, data analyst, mathematician

Have you ever wondered, when Coles or Woolworths email you their weekly specials, how they know what’s on your shopping list? Is this a coincidence or pure luck?

Neither, I'm afraid. It's a clever use of information and an understanding of big data by businesses who want to better target their customers' buying behaviours. Information is power, and more and more businesses are now recognising the opportunities that big data brings. In fact, Forbes recently listed 'data analyst' as one of the hottest jobs of 2016!

So, what is big data? What information is being collected? Does that mean my information is readily out there? What do businesses do with all this information? How do they store it? Is it secure? How can understanding data help solve problems? Or is data controlling our lives? 

About the speaker

 

Anthony Dooley

Professor Anthony Dooley is a mathematician and Head of School at UTS Science's School of Mathematical and Physical Sciences. Professor Dooley is passionate about communicating the usefulness and importance of mathematics to the public, and believes that mathematics can help us better understand the almost infinite amount of digital data around us.

UTS Science in Focus is a free public lecture series showcasing the latest research from prominent UTS scientists and researchers.

Related video

The paradox of probability

Data

In this talk, Dr Stephen Woodcock takes us through some 'probability paradoxes', explaining how surprising, counterintuitive and often misleading results can arise. With so much data and information around us, understanding statistical models and their correct interpretation is becoming incredibly important.

Watch the paradox of probability video

 

