Science in Focus: The paradox of probability 

24 minutes 24 seconds
 
I was going to open with the great maxim of always opening with a picture of a man more attractive than yourself. That guy. Does anybody know who he is? He's got a computer; he must be clever.
 
[Laughter]
 
Anybody?
 
[Audience member: Nate Silver?]
 
Nate Silver, thank you. Yeah, he's been in the news a lot lately, obviously; he made his name on the back of the 2008 and 2012 election predictions. Not so accurate this time, but actually more accurate than many, in the sense that while he was still predicting a Clinton victory, he was one of the few mainstream analysts who gave Trump a fair chance. Anyway, my point is not to do with his election predictions; it's to do with his data science blog, FiveThirtyEight. In it, he's done a meta-analysis of published scientific work – peer-reviewed, so it should have been properly done – on whether or not certain foods increase or decrease your cancer risk. The further to the right you are, the red points are increasing your cancer risk; the ones to the left – the green ones – are decreasing. What you notice is that apart from bacon increasing your risk and olives decreasing it, everything else is all over the shop – particularly if you look at milk. Milk doubles your cancer risk, but it also halves it.
 
[Laughter]
 
Tea and coffee are similar – basically there is an amazing lack of consensus on these things, and how can that be? If people are doing the experiments properly, we've got this data, and we're analysing it properly, in theory – how come?

So I'm going to go on another diversion – it's always good to come and listen to an expert talk and then hear somebody talk about something they know nothing about. I know nothing whatsoever about baseball, which is why I'm going to start by talking about it. The very little I understand is that they record a batting average for players, which is exactly how many times somebody went into bat and got a base hit – hit the ball and got at least to first base – divided by how many times they went into bat. So over a period you can calculate a batting average, and you can think of it as a probability: if I get a long list of every time the guy went into bat and pick one of those occasions at random, all equally likely, the probability I pick an occasion he got a base hit is equal to his batting average. So a batting average of .250 means one quarter of the time, the player got at least to first base.

There's a famous example involving two very famous teammates – allegedly two of the greatest players ever to play the game: Babe Ruth and Lou Gehrig of the New York Yankees in the early 1920s. Lou Gehrig's probably best known for the fact that he died of motor neurone disease, which a lot of Americans still refer to as Lou Gehrig's disease. Have you seen the film The Pride of the Yankees? Gary Cooper played him, anyway – dig it out. If you look at their batting averages over the first three years they were teammates: in 1923 Babe Ruth's batting average was .393, so 39.3 per cent of the times he went into bat, he hit the ball and got to at least first base. So there are the batting averages, and the question seems stupid and barely worth asking: who was the better batsman over that three-year period? It seems obvious: in '23, Gehrig was better; in '24, Gehrig was better; in '25, Gehrig was better. So the idea of who had the better batting average seems trivial.

But if I actually write down how those figures are calculated – how many times they got a base hit divided by how many times they went into bat each year – you can see that if I want to know the average over the whole period, I'm adding up the three numbers on top for each player and adding up the three numbers on the bottom: the total number of base hits over the total number of times they went into bat. So Babe Ruth's figure would be 205 plus 200 is 405, plus 104 is 509 base hits out of about 1400 at-bats – about .361. And what's noticeable when you work out the averages? It's completely counterintuitive. Every single year, [inaudible] Lou Gehrig had the better batting average, but over the three years, Babe Ruth had the better batting average. If you keep the data in its pairs, you get completely the opposite signal out of it, and this is the famous effect known as Simpson's paradox: when you've got paired or grouped data, whether or not you regard it as grouped can completely change the analysis.
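The reversal is easy to check numerically. The talk gives Ruth's hit totals (205, 200, 104) and roughly 1,400 at-bats; Gehrig's season-by-season figures below are the commonly cited records, so treat them as an assumption of this sketch rather than something read from the speaker's slide:

```python
# Simpson's paradox with the Ruth/Gehrig batting averages.
# Ruth's hit totals are stated in the talk; Gehrig's season figures
# are the commonly cited records (an assumption of this sketch).
ruth   = {1923: (205, 522), 1924: (200, 529), 1925: (104, 359)}  # (hits, at-bats)
gehrig = {1923: (11, 26),   1924: (6, 12),    1925: (129, 437)}

for year in ruth:
    (rh, rab), (gh, gab) = ruth[year], gehrig[year]
    print(f"{year}: Ruth {rh/rab:.3f}  Gehrig {gh/gab:.3f}")   # Gehrig wins every year

# Pool the three seasons: total hits over total at-bats
pooled = lambda d: sum(h for h, _ in d.values()) / sum(ab for _, ab in d.values())
print(f"1923-25: Ruth {pooled(ruth):.3f}  Gehrig {pooled(gehrig):.3f}")  # Ruth wins overall
```

The pooled figures come out around .361 for Ruth and .307 for Gehrig, even though Gehrig's average is higher in each individual season: Gehrig's best seasons came in far fewer at-bats, so they barely move his pooled total.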
Now, I've just invented this data – it doesn't mean anything. There's a horizontal axis, a vertical axis, and a whole load of data points down there, and you can see the general trend. So if I gave that to somebody to analyse, they'd go 'Oh yeah, I can see the trend.' But somebody else might come in and say 'No, no, no, that isn't one dataset – that's five distinct sub-groups; we should be regarding it as five distinct sub-groups.' And if we do that – look, those data points are exactly the same, but I've now grouped them as if they were five distinct sub-groups. So the first person's analysis will say 'Oh yeah, the trend's going down.' The second person will go 'No, no, no, it's five sub-groups, and in each of the sub-groups the trend's going up.' And this is basically what we got to with the baseball example. So part of the hype around big data is this idea that the dataset is king, and that if we gather enough information and just blindly chuck whatever algorithm at it, it will tell us the right answer. And that is what's technically known as nonsense.
 
[Laughter]
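That five-sub-groups picture is easy to reproduce – a minimal sketch with synthetic data standing in for the slide (the numbers are invented, just as the speaker's were):

```python
import numpy as np

rng = np.random.default_rng(0)
xs, ys = [], []
for g in range(5):
    # Within each sub-group, y rises with x (slope +1)...
    x = rng.uniform(0, 2, 30) + 2 * g
    y = (x - 2 * g) - 3 * g + rng.normal(0, 0.2, 30)  # ...but each group sits lower down
    xs.append(x)
    ys.append(y)

# Pooled fit: the between-group drop dominates, so the overall trend is negative
print("pooled slope:", np.polyfit(np.concatenate(xs), np.concatenate(ys), 1)[0])

# Per-group fits: every sub-group trends upward
for g in range(5):
    print(f"group {g} slope:", np.polyfit(xs[g], ys[g], 1)[0])
```

The same points give a pooled slope of about −1.4 and within-group slopes of about +1: grouping the data flips the sign of the answer.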
 
Whatever answer you get out of your analysis, whether that's a mathematical model or a standard statistical test, there are assumptions being made. Whether they're explicit or implicit in the model, and whether you do or don't understand them, they are there, and the answer you get is an artefact of those assumptions – so they should be properly chosen. More dangerously – [inaudible] of baseball – think of a new drug on the market. If the drug has different effects in male and female patients, you could have a drug which is less effective for male patients and less effective for female patients, but if there's a big difference between the gender groups, you could assemble a dataset with the right numbers of males and females to show the drug is better overall, even though it's worse in both halves. I'm not suggesting that this is done, but it could be done. So you need to make very judicious choices before you start throwing these things into algorithms or standard tests and believing the output.

The other famous example of this – the same problem in reverse – is the UC Berkeley gender discrimination case. It's often wrongly said that it was a lawsuit; I don't think it was. I think the university picked it up and did the analysis to cover their own backs, legally. They noticed that their grad school had a much higher rate of offering places to male applicants than to female applicants – 44 per cent of male applicants versus 35 per cent of female applicants – so is this evidence of bias? They dug deeper and looked at it on a department-by-department basis, and the message couldn't have been more different. If I just plot it up – the males are the bars with the diagonal lines, and the more solid bars are the females – you can see there's a much higher rate of offers being made to male applicants. So a bog-standard statistical test for [inaudible] in two proportions would say, very strongly based on this sample, yes, there's a bias. But split by department, what you can actually see is that of the six departments added together to make that figure, in four of them – A, B, D and F – females were actually offered a place at a higher rate, not a lower rate, and in the two where females were offered places at a lower rate – C and E – the rates are basically comparable. So if females were being offered places at a higher rate in most departments and at about the same rate in the others, how are they getting a lower offer rate overall? If you actually look at the numbers of applicants, there were enormous skews. I don't know which departments these were; I would guess that A and B might well be engineering and the physical sciences, which are traditionally male dominated. The male applicants were applying in large numbers to departments that were easy to get into – departments that maybe didn't have enough quality applicants – whereas the largest number of female applicants was to department C, which had a much lower acceptance rate anyway. So if you ignored the department split, you would get completely the wrong picture here.
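The department-level counts aren't read out in the talk; the sketch below plugs in the figures published in Bickel, Hammel and O'Connell's 1975 analysis of this case, which show exactly the pattern described (women ahead in A, B, D and F, comparable in C and E):

```python
# (admitted, applicants) per department - published 1975 figures, not from the talk
men   = {"A": (512, 825), "B": (353, 560), "C": (120, 325),
         "D": (138, 417), "E": (53, 191),  "F": (22, 373)}
women = {"A": (89, 108),  "B": (17, 25),   "C": (202, 593),
         "D": (131, 375), "E": (94, 393),  "F": (24, 341)}

for d in men:
    m_adm, m_app = men[d]
    w_adm, w_app = women[d]
    print(f"{d}: men {m_adm/m_app:.0%}, women {w_adm/w_app:.0%}")

# Pooling over departments makes the gap reappear. (The 44% vs 35% quoted in
# the talk is the aggregate over all departments, not just these six.)
pooled = lambda t: sum(a for a, _ in t.values()) / sum(n for _, n in t.values())
print(f"pooled: men {pooled(men):.0%}, women {pooled(women):.0%}")
```

Department by department, women's admission rates are equal or higher almost everywhere; pooled, men come out well ahead, purely because women applied in large numbers to the most selective departments.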
This is the dangerous thing. I have no idea how those studies that Nate Silver dug up were done, but there's this idea that there is just a statistical test, a standard machine where you keep turning the handle and the answer drops out. You might get a great visualisation; you might get a headline-grabbing paper because nobody's ever thought of that effect before; but if you've completely ignored and violated all the assumptions of your model, it's crazy. So just returning to that: in this case I would argue that the UC Berkeley data, on the right, should be kept separate. Nobody changed the laws of baseball, and nobody changed the laws of physics governing how a baseball moves, between '23 and '25, as far as I know. So in the baseball case you probably should pool the years together, whereas in the UC Berkeley case you probably shouldn't pool the departments. There are all sorts of strong implications here for how big data is gathered. People think of big data and they think: oh, Facebook – I'll scrape data off Facebook, or I'll trawl through Twitter, or I'll buy some information from the Fly Buys database. These are all wrapped up with artefacts of how that data is gathered and who it represents, because it may well be that people who are willing to hand over their data to Fly Buys fit a particular profile, or that people who have very loose privacy settings on Facebook fit another. I mean, if you got your data from Instagram, you'd believe the human race did nothing but photograph its own brunch.
 
[Laughter]
 
Anyway. Here's another famous paradox that can get people caught in a tangle, where what they think they're looking at isn't necessarily what they're actually looking at. It's a deliberately [inaudible] fallacious argument. Suppose you've got three normal coins, three fair coins, equally [inaudible] heads or tails. You flip all three, you don't look where they land, and your mate comes along, looks at them and says 'I can tell you at least two of them – I'm not telling you how they landed, but at least two of them are showing the same thing.' So you go: well, two of them are matching, and there's still one coin I don't know, so the probability that the one I don't know matches is 50 per cent, so there's a 50 per cent chance that all three show the same thing. But write out all the possibilities – there are eight ways three heads-or-tails coins could land. If I flip them, I could see young Lizzie Windsor three times, or I could get heads, heads, tails, or heads, tails, heads, or tails, heads, heads, or the mirror images of those – that's the eight. Now, the conditional information was that my mate looked and said 'I can tell you at least two of those show the same thing.' At least two of them always show the same thing – it's useless information. You can't flip three coins and get three different outcomes, so the condition doesn't rule out any of the eight possibilities; it tells you nothing. So all you're left with is: what's the chance all three land the same? There are two such possibilities out of the eight, so it's two out of eight, or one out of four – a quarter, not a half.

Now, Galton was not an idiot; he obviously knew this. The problem with the argument is that people hear 'at least two coins' and start saying 'Well, I'll say that's the first coin and the second coin.' They've already imposed extra information that wasn't in the question. The question says at least two coins landed the same; deciding that it was coin one and coin two changes the information. It's a bias, the effect of an assumption that actually violates what you're told – because seeing some pattern out of several possible ones, and seeing one specific pattern named in advance, are not the same thing. If the question had said your mate looked at the coins and told you 'Coin two and coin three have landed the same – what's the chance all three landed the same?', then yes, 50 per cent is correct, because that would rule out four options – the four in the middle of that eight – leaving two all-same options out of four, and two out of four is a half. But that is not what was said.

It was Galton himself who realised this – a very impressive man; he was Charles Darwin's cousin. As well as doing a lot of work in probability, he did a lot of work in early forensics, looking at fingerprints. He also did a lot of work on how to brew the perfect cup of tea – very impressive man. He also did a lot of work in eugenics, but I won't talk about that.
 
[Laughter]
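The coin argument is quick to check by brute-force enumeration – a minimal sketch:

```python
from itertools import product

flips = list(product("HT", repeat=3))   # the eight equally likely outcomes

# "At least two match" is true of every outcome, so it conditions on nothing:
given = [f for f in flips if len(set(f)) <= 2]
print(len(given))                                         # 8 - nothing excluded
print(sum(len(set(f)) == 1 for f in given) / len(given))  # 2/8 = 0.25

# Naming the pair really is extra information: condition on coins 2 and 3 matching
given23 = [f for f in flips if f[1] == f[2]]
print(sum(len(set(f)) == 1 for f in given23) / len(given23))  # 2/4 = 0.5
```

The quarter-versus-half gap is exactly the difference between "some pair matches" (always true) and "this particular pair matches" (true half the time).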
 
I will talk about a great entrepreneur – one of the great sort of both industrial and business and scientific leaders of our time, Charles Montgomery Burns of Springfield. 
 
[Laughter]
 
Oh, they're not moving [inaudible]. Oh, there they are – there we go. The famous thought experiment: a monkey with a typewriter. If you've got a monkey with a typewriter, the assumption is that the monkey is not some great literary figure – it will just randomly throw its hands at the keys and make marks: 'q', seven, spacebar, apostrophe, [inaudible], whatever else – it's just producing randomly selected keystrokes. But once in a while, it will write something intelligible. It'll produce a word now and then; it might even produce a sentence now and then. That's not the monkey producing anything meaningful; it's just chance. But the theory goes that if you increase the amount of time you're willing to wait, and you increase the number of monkeys – if you've got a billion monkeys working for a billion years – the probability that you haven't ever reproduced anything meaningful, say the works of Shakespeare, shrinks and shrinks and shrinks the longer you wait and the more peanuts and bananas you feed them. In the limit, as you wait an infinitely long time with your infinitely large number of monkeys, the probability that you haven't reproduced a work of Shakespeare tends towards zero, so you tend towards absolute certainty that one of the monkeys has just banged out Othello. Now, it seems like a flippant exercise, but there's actually a very relevant point, because even if a monkey does produce a work of Shakespeare, we shouldn't go around hailing that monkey as a literary genius and waiting for its next masterwork. Just because it happened to type 'now is the winter of our discontent', its next move is just as likely to be trying to chew off the spacebar, or climbing a tree and throwing its faeces at you – because it's a monkey. It's not going to sit there and kind of [inaudible]. It's not going to do that. So we shouldn't mistake a pattern we merely stumbled across for one we were specifically looking for. 'I think monkey 37 is a genius; watch him for the next month and he'll write a great book' – that's a prior hypothesis that a small amount of evidence could support, versus suddenly celebrating whichever monkey happens to do it first and assuming that's that. Because if you have enough monkeys and enough time, it becomes certain that something will appear significant, even if the whole thing is just blind, dumb chance. And it's surprising how rapidly things become certain from seemingly small samples.

So, the birthday problem – it's sometimes called a paradox; I've deliberately not used the word paradox here, because I [inaudible] in quite the same way. It looks at how likely people are to share a common birthday. I'm assuming everybody's equally likely to be born on any day of the year, and I'm ignoring the 29th of February, because those people don't deserve a birthday present – they make these questions harder. So nobody's born on the 29th of February here, are they?
 
[Audience murmur; laughter]
 
Really? Well, I'm not buying you a birthday present either. Anyway, so we cross Greg off our Christmas list as well. So: there are 365 days in the year, so if you meet a random person, you've got a 1 in 365 chance that you share their birthday. I have actually asked this in class, and somebody said 'What's the chance that somebody in this room shares a birthday? One: it's certain.' Obviously it's not – there's only 20 people in the room. What's the chance? 'One.' No it's not. 'Yeah, he's my twin.' That actually ruined it. So I'm assuming an equal distribution [inaudible]. You've got just over a quarter of a per cent chance that one randomly selected person shares your birthday. So what happens if three people meet – what's the chance that at least two of them share a birthday? Well, there are three possible pairings: person A with person B, A with C, and B with C, so it should be about three times as likely. And sure enough, when I work it out, it is about three times as likely: the first person can take any birthday; the second can take any one of the 364 remaining days out of 365 and still be unique; the next can take any one of the remaining 363; and so on. Work that out and it's just under one per cent. But enlarge the group to even as few as 23 people – well, the first [inaudible] – and you can see the numbers on top declining every time somebody crosses off a different birthday: each person, if birthdays are to stay unique, has one fewer option to pick from, so I've got 364 times 363 times 362 times 361 and so on over the top. At 23, the chance of a shared birthday comes out at more than 50 per cent. Twenty-three is the tipping point. If you get 23 people in a room with randomly distributed birthdays, it is more likely than not that two of them share a birthday. People [inaudible], 'but it's very unlikely somebody shares my birthday – that's one in 365', and 23 people seems like an amazingly small number of possibilities. If I actually plot the graph: with 10 people in the room, the chance that nobody shares a birthday is over 90 per cent; by the time I get to 40 people, it's less than 10 per cent. It drops off amazingly quickly, and by the time you've got, say, 50 or 60 people in a room, it's damn nearly certain that there will be repeated birthdays. People think: there are 365 days, and 50 people doesn't seem like a big sample. What they overlook is quite how many multiple comparisons they're making. For the number of pairs among 23 people, you could choose any one of the 23 to be the first and any one of the remaining 22 to be the second, so that's 23 times 22; but person A sharing a birthday with person B is the same as B sharing one with A, so you've got to halve that number or you've double-counted everything. Nonetheless, that's still 253 pairs. So when there are fewer than 400 days in the year and you've got over 250 potential pairs, it doesn't seem that surprising. And that's what people miss with multiple comparison tests. If you simply don't have a hypothesis and you're throwing that many variables at the data, you will see things correlate; you will see things come out by chance alone. Throw a thousand variables at something and that's half a million pairs that might, and even though every single pair is exceptionally unlikely, you've got a hell of a lot of exceptionally unlikely things.
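Both the birthday numbers and the everything-against-everything effect are a few lines to verify – a sketch, with the correlation half using simulated noise (and an arbitrary |r| > 0.3 cut-off) rather than any real dataset:

```python
import math
import numpy as np

def p_shared_birthday(n, days=365):
    """Chance at least two of n people share a birthday (uniform birthdays)."""
    p_all_unique = 1.0
    for k in range(n):
        p_all_unique *= (days - k) / days   # 365/365 * 364/365 * 363/365 * ...
    return 1 - p_all_unique

for n in (3, 10, 23, 40, 60):
    print(f"{n:>2} people: {p_shared_birthday(n):.3f}")   # 23 people: ~0.507
print("pairs among 23 people:", math.comb(23, 2))          # 253

# The same multiple-comparisons effect with pure noise: 1,000 unrelated
# series give ~500,000 pairs, so "significant" correlations appear for free.
rng = np.random.default_rng(42)
noise = rng.normal(size=(1000, 100))       # 1,000 series, 100 points each
r = np.corrcoef(noise)[np.triu_indices(1000, k=1)]
print(f"{(np.abs(r) > 0.3).sum()} of {r.size} noise pairs have |r| > 0.3")
```

With 100 data points per series, around a thousand of the half-million noise pairs clear that threshold by chance alone, which is exactly the monkeys-and-typewriters point in miniature.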
It's a bit like saying 'Oh, I'll never win the lottery – I've got a one in 10 million chance of winning. I did buy 8 million tickets, though.' Yes, every individual ticket is unlikely, but you've got so many tickets in the raffle that winning becomes reasonably likely. And this really is the problem with a lot of shock headline results. If you go in without a hypothesis, you really need to change your bar for what constitutes proof and what doesn't. I'm not a statistician; I'm an applied mathematician and probabilist, and I know the need for multiple comparisons tests – I don't understand all of them, and it terrifies me when I see people who understand less than me going through some of these things – but we actually need to understand how to adjust the burden of proof in cases where we are making multiple comparisons. There's an excellent website, you may know it, run by a guy named Tyler Vigen – not sure how to pronounce his name – where he gathers time series datasets from anywhere he can find them. He's got a bank of tens of thousands, hundreds of thousands, and he chucks them together to see which ones correlate with each other. And he produces such scientifically meaningful studies as that. Divorce rate in Maine …
 
[Laughter]
 
… correlates amazingly well. And both the number of letters in the winning word of the national spelling bee and the number of people killed by venomous spiders peaked around 2004. Who knew? I don't even know where to start with that one.
 
[Laughter]
 
I have no idea how you murder somebody with steam or hot vapours – I'd like to find out, but I don't currently know. And my personal favourite: the number of people drowned by falling into swimming pools in a year, against the number of films Nicolas Cage has appeared in that year.
 
[Laughter]
 
I've seen Leaving Las Vegas – it's not life affirming, but you don't throw yourself into a pool afterwards. Nonetheless, the point is that it does correlate, but it's actually meaningless – it's not useful as a predictor. I can't use that trend and say 'Well, I see Nic Cage made 10 films this year; better start dredging the pools.' Whilst it's true in that dataset, it's a non-reproducible result, and research is of no value if it's just one statistical quirk falling out of one particular dataset. Research is about 'I've noticed this trend; it's meaningful; I can exploit it, gain new knowledge, gain new insights, because this is something I will see again and again if I do the same thing.' It's not saying 'I'm waiting for a monkey to do this'; it's saying 'That monkey over there is a genius – he will do this again.'

So I just want to end with another bit from Charles Montgomery Burns and the monkeys. In the episode of The Simpsons where he has that bank of monkeys, he walks over to the first monkey he sees to check what it's written, reads 'it was the best of times, it was the blurst of times', and gets furious with the monkey for producing gibberish. But the genuine quote from Dickens's A Tale of Two Cities – it was the best of times, it was the worst of times – I think genuinely sums up my excitement and trepidation about data science. As Tony said, there are amazing possibilities, possibilities that we as a society are going to have to exploit, and exploit well, moving forwards. But it's also the worst of times, because there's all sorts of marketing buzz around this, and I've seen online courses offering to make you a data scientist in two weeks. I mean, that's like me saying 'I'm going to Google what is dentistry!' and offering to do a root canal on you.
 
[Laughter]
 
That's a serious offer – I'll do it cash in hand if nobody tells the ATO. But it's this idea that these things can be learnt quickly. Data analysis is not about who has the flashiest visualisations, and it's not about who has produced the most shocking result – people who wear red shoes on a Thursday are 70 per cent more likely to eat chips, or whatever. Meaningless. It might have happened in that dataset, but it's not a real signal. We need to get towards extracting wisdom from these datasets without being foolish about it – without being so determined to get a headline result, something flashy, that we're actually just mining through meaninglessness and, again, pulling out the monkey that produced it and hailing him as a genius. We need results that we can actually believe in, that we know are reproducible, and that are actually science. I use that term very broadly: whether you work in the sciences, the social sciences, humanities, marketing, anything like that, science means knowledge. We need to know that what we've got is something we can believe in, something scientifically and statistically valid, because otherwise we're at the point of incredibility and incredulity, just disbelieving what we've got. And that's a dangerous – well, maybe not dangerous, but a complete waste of an enormous potential. So thank you.
 
[Applause]

7 December 2016

Tags: big data, maths, mathematics, applied maths, applied mathematics, data analytics, data analysis, data scientist, data analyst, mathematician, applied mathematician, probability, statistics, odds, chances

Barely a week passes without a headline proclaiming that some common food or behaviour is associated with an increased or a decreased health risk – often both, in contradictory reports. How can seemingly rigorous scientific studies produce exactly opposite conclusions?

In this talk, Dr Stephen Woodcock takes us through some 'probability paradoxes', explaining how surprising, counterintuitive and often misleading results can arise. With so much data and information around us, understanding statistical models and their correct interpretation is becoming increasingly important.

About the speaker

Dr Stephen Woodcock is an applied mathematician whose research is motivated by a drive to develop solutions and models for solving real problems in both natural and engineered systems. He is a regular contributor to school outreach programs and is very passionate about science communication. He frequently writes for The Conversation, where he applies his skills to issues as diverse as traffic congestion, game theory applied to footy tipping and cricket scoring, and negative gearing.

He is currently working on a diverse range of projects including coral health, facial recognition software, fertility outcomes for chlamydia patients and modelling the physical development of elite youth sportsmen. 

UTS Science in Focus is a free public lecture series showcasing the latest research from prominent UTS scientists and researchers.

Related video

Big data: The future is here


Information is power, and more and more businesses are now recognising the opportunities that big data brings. In fact, Forbes recently listed 'data analyst' as one of the hottest jobs of 2016. But what is big data? What information is being collected? Does that mean my information is readily out there? Is data controlling our lives? In this talk, Professor Anthony Dooley explains how mathematics is being used to better understand big data.

Watch the big data video

 
