Facilitator: Since I'm here to talk about the beauty of mathematics, you're probably expecting me to talk about perhaps fractals or, if not fractals, perhaps trying to explain to you how you can calculate pi to a gazillion decimal places or some esoteric aspect of number theory, but in fact what I'm going to talk to you about is the field that I love, which is a very practical part of mathematics: probability and statistics.
So probability and statistics, particularly probability theory, has its origins in gaming and gambling, and in fact I've often thought that it's not that surprising that it's a field that I ended up with because my grandfather was a country vet and a racehorse trainer, and in fact I have very fond memories growing up of Saturday afternoons with my family listening to the races. My sister and I - I'm sure it was quite illegal, but my sister and I both had radio accounts with the local TAB and we'd call in our bets. I don't think they knew they were talking to seven year olds, but we both had an account of about $15.00 and it'd go up and down and eventually it would run out. If we were good mum would give us a top up of our TAB accounts [laughs].
But if you think about it, particularly those of us of my generation, growing up what did we use to do particularly on rainy days. You play Monopoly, you play cards, you play dice, all sorts of games, so we actually grow up with all sorts of intuition about probability theory, and it's something that actually starts to make a lot of sense to us. In fact, it's a lot of fun.
We actually are all a lot better at probability theory than we really give ourselves credit for, and there's lots of fun party games. You can be the life of the party, even a mathematician can be the life of the party.
You're probably all familiar with the famous birthday problem. I don't know if people have heard of that one. So if you have even as few as 23 people in the room, the chance of at least two of them sharing a birthday is about 50 per cent. It's quite amazing.
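The 23-person figure can be checked directly. The following is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name is made up for illustration.

```python
def shared_birthday_probability(n_people: int, days_in_year: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes every day of the year is equally likely (an idealisation).
    """
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days_in_year - i) / days_in_year
    return 1.0 - p_all_distinct


p22 = shared_birthday_probability(22)
p23 = shared_birthday_probability(23)
print(f"22 people: {p22:.3f}, 23 people: {p23:.3f}")
```

Under these assumptions, 23 is the smallest group size for which the probability passes one half.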
Even earlier this week I was coming down from Brisbane on a Qantas flight and I was reading through the magazine and I turned to the puzzles page as I tend to do, and I found this really interesting puzzle. So I thought I'd pose it to you all here.
So the puzzle said the [Gelber] family is known to have two children. One day Mr Gelber is seen leaving the house with one of the children who's a boy. So the question is what's the chance of the other child being a boy?
Does anybody want to be brave and hazard a guess? [Graham]? Fifty/50, who thinks 50/50? Maybe, yeah, maybe half, maybe. Any other suggestions? Two-thirds, okay. 60/30, 50/40.
Facilitator: One over four, okay well let's see. Actually I've just realised to my horror I think I've got the answer wrong.
Facilitator: I've had the flu all week that's my excuse, but anyway the answer - yes I do have it wrong [laughs]. Anyway let me give you the logic. The answer's actually two-thirds, so somebody said two-thirds, very good. So how does that work? Well starting out there's four possibilities: boy/boy, boy/girl, girl/boy and girl/girl. As soon as we see Mr Gelber walking out with the one child who's a boy we get to rule out the girl/girl. There's three opportunities left and so the two sons, boy/boy - oh no, I was right - it's one of the three remaining possibilities so it's one in three.
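The counting argument above can be written out as a short enumeration. This is a sketch that follows the reading used in the talk, namely that the observation only rules out girl/girl and leaves the other three combinations equally likely; that reading is an assumption about how the puzzle is meant to be interpreted.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older child, younger child) combinations.
families = list(product("BG", repeat=2))

# Assumption from the talk: seeing a boy only rules out girl/girl.
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_other_is_boy = Fraction(len(both_boys), len(at_least_one_boy))
print(p_other_is_boy)  # 1/3
```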
It's a great little example, it's fun. As I said, it was in the [Qantas] puzzles section, but it gets you thinking and it's just fun and there's lots and lots of examples of this kind of little puzzle that you can play around with.
So as I was then starting to grow up and start to think about what did I want to do with my life, I was definitely drawn to probability and statistics because I often used to find as a child that the world could be a little bit large and overwhelming, a lot of chaos and sometimes it didn't make a whole lot of sense.
Probability and statistics appealed to me because it was a way, a framework of thinking, that allowed you to find patterns in what could otherwise seem like chaos. So you were starting to make some sense out of all of the noise of the world.
I started out studying actuarial studies but I quickly found that the finance world wasn't igniting my passion. I wanted to do something where I felt like I could really connect to something that would make a difference for people. I gradually found myself ending up in the field of biostatistics. So biostatistics is all about using the tools of probability and statistics to try to make sense of health data and to better understand the factors that influence human health.
So I spent some time as a clinical trial statistician. The kind of things we do there are working with the clinicians to design studies so we can figure out what's the best treatment for somebody who's got such and such a disease, but after a while I found myself really wanting to understand what are the things that cause a person to get the disease in the first place.
So I gradually gravitated towards the field of environmental health, and it's an area where it's - I'll show you a couple of examples in a moment, but it's an area where you really need quite sophisticated thinking around probability and statistics because there's lots and lots of noise. The world is a complicated place. Quantifying environmental effects is complicated and you do need those cutting edge tools. So I want to take you through a few examples and then hopefully after the session's over we can talk about them in a little bit more detail.
So this next picture it's quite a difficult picture to look at but this is a very famous picture called Tomoko in Her Bath and it's an award-winning photo by a photojournalist W Eugene Smith. The family had actually approached him - and this is from Minamata in Japan - the family had approached him and offered to make themselves available for this photojournalism because they wanted to bring to worldwide awareness the really horrendous health effects that happened when mothers who were pregnant were exposed to high levels of methylmercury due to toxic chemical spills in Minamata Bay in Japan.
So Tomoko she died when she was 21. I think about - it must have been about 20 or so years ago now, but it was really very generous of the family to make these photos available and very touching and the look between the mother and the child is incredibly touching.
But this is typical of what happens in environmental epidemiology where there'll be some large poisoning and where there'll be an environmental exposure, a large contamination, where it becomes really obvious that the exposure is dangerous and high level exposures cause these horrendous health effects.
But then the more subtle question, which takes a long time and a lot of work to answer, is this: we know high levels of methylmercury exposure are very dangerous, but what about the more low level chronic exposures that we're actually all exposed to?
Mercury is a by-product of combustion, so because we're in a world that depends on fossil fuels, there's power plants all over the world burning fossil fuels and emitting lots of toxic exposures including mercury. So that gets into the atmosphere, gets into the waterways, it works its way up through the food chain and particularly those of us who like to eat fish and I think a lot of us do, we probably all have low level chronic exposures to methylmercury.
So the big question is what is the danger associated with those low level chronic exposures, and it's a really tricky question because in trying to come up with guidance about what to do about those low level exposures, the Environmental Protection Agencies have to weigh a lot of considerations. You can't just shut down all the power plants, we need power. You can't say to people don't eat fish it's dangerous because fish is good for you, even for a pregnant woman. It's quite a heady thing to be able to say to women no you shouldn't eat fish while you're pregnant. Probably some fish is good, but you have to be careful about which fish, how much fish and so on, so getting that balance right is quite difficult.
So I was really fortunate. One of the most enjoyable, stimulating experiences of my career I think was working on this National Academy of Sciences report on the health effects of methylmercury. It was a great experience because what the National Academy does is they pull together a group of experts, so maybe it's 15 or so people, and you've got people who are expert environmental health people, people who know all about mercury and environmental emissions, people who know about neurology, people like myself who might be a biostatistician.
You've got this interdisciplinary group of experts who come together and meet over a course of maybe a year every couple of - every few weeks they meet and basically pull together all the information, integrate their knowledge and come up with a recommendation, write a report, and that then goes - in this case it was the Environmental Protection Agency in the United States. They were really trying to figure out what should be our guidance around the health effects of low level chronic exposure to methylmercury.
So my role in this committee was to try to take some of the available data and try to make sense of it, and it was a really interesting context because in those days there were two big studies: one was from the Seychelles Islands and one was from the Faroe Islands. There were two very opinionated investigators who'd run these two studies. They were both really good studies, nice big studies, well designed, really careful measures of environmental exposure and everything, but one study found an effect. They used this magic p < 0.05 means yes there's an effect; p > 0.05 means no effect. So one study found the effect and the other study didn't.
These two investigators were so convinced that they had the answer, they couldn't even be in the same room together. We tried to have a workshop to try to hear from all the investigators. These two wouldn't even agree to attend if the other one was attending, that's how difficult the relationship had gotten between them.
So my job was to try to make sense of it from a quantitative perspective. So here's the story if I translate it into statistical terms. This is just a graph showing the results from the two studies, the Faroe Islands study and the Seychelles Islands study. There was also a smaller study from New Zealand. This dotted line here is the no effect line. What you see over here is the Faroe Islands study and you can see that they were estimating a negative effect, in other words as the level of methylmercury increased the IQ of the kids, or it was a measure of IQ in young kids, the IQ went down, so that's why we're getting a negative effect. The more methylmercury the lower the IQ.
So the Faroe Islands study was estimating a negative effect. These vertical lines are what's called the confidence interval. If that confidence interval crosses zero it means you can't rule out no effect, so you conclude there's no effect.
The Faroe Islands study, the confidence interval just excluded zero so they're like yes, p < 0.05, we're getting a significant effect. Here's the Seychelles Islands; they actually got a very similar effect, but their confidence intervals were a little bit wider. They included zero so they said oh p > 0.05, no effect. Then New Zealand, it was a much smaller study, the confidence interval was much wider.
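The "does the interval cross zero" rule can be made concrete with a small sketch. The estimates and standard errors below are hypothetical numbers chosen only to mimic the pattern described (similar effects, slightly different widths); they are not the values from the actual studies, and the normal-approximation 95% interval is an assumption.

```python
def confidence_interval(estimate: float, std_error: float, z: float = 1.96):
    """Approximate 95% confidence interval under a normal approximation."""
    return estimate - z * std_error, estimate + z * std_error


def excludes_zero(interval) -> bool:
    """True if the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical studies: similar negative effects, different precision.
study_a = confidence_interval(estimate=-0.50, std_error=0.20)
study_b = confidence_interval(estimate=-0.45, std_error=0.25)

print(study_a, excludes_zero(study_a))  # narrower interval, excludes zero
print(study_b, excludes_zero(study_b))  # wider interval, includes zero
```

Two nearly identical estimates land on opposite sides of the p < 0.05 line purely because one interval is a little wider, which is the situation described above.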
So my job was to try to integrate all of this information. When I looked at it I thought well, you know, actually the two studies are not that far apart. The information there is actually very similar. They're pointing to a similar conclusion. So what we ended up doing was we built a larger model that had all of the separate studies as special cases within this larger model, and furthermore we pulled in additional data, so we pulled in not just the IQ tests but we also pulled in some other end points that measure young children's developmental effects.
So we looked at something called the Boston Naming Test and the California Verbal Learning Test, and what you see here is that when you put it all together you can see that it's all pointing to a fairly similar overall impression. So when we did this thing, it's sometimes called a meta-analysis. When we did this analysis pulling together information from the three studies, different kinds of developmental outcomes, we were able to say with a reasonably high level of confidence that yes, there is this adverse effect of methylmercury exposure.
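To give a flavour of how pooling sharpens the picture, here is a sketch of the simplest kind of meta-analysis, a fixed-effect inverse-variance average. This is a deliberately simpler technique than the larger model described in the talk, and the three (estimate, standard error) pairs are hypothetical, not the committee's data.

```python
import math

# Hypothetical (estimate, standard error) pairs for three studies.
studies = {
    "Study A": (-0.50, 0.20),
    "Study B": (-0.45, 0.25),
    "Study C": (-0.60, 0.50),  # small study, wide interval
}

# Inverse-variance weights: more precise studies count for more.
weights = {name: 1.0 / se**2 for name, (_, se) in studies.items()}
total_weight = sum(weights.values())

pooled_estimate = (
    sum(weights[name] * est for name, (est, _) in studies.items()) / total_weight
)
pooled_se = math.sqrt(1.0 / total_weight)
ci_lower = pooled_estimate - 1.96 * pooled_se
ci_upper = pooled_estimate + 1.96 * pooled_se

print(f"pooled estimate {pooled_estimate:.3f}, "
      f"95% CI ({ci_lower:.3f}, {ci_upper:.3f})")
```

With these made-up inputs the pooled interval is narrower than any single study's interval and lies entirely below zero, even though two of the three individual intervals include zero.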
The Environmental Protection Agency was then able to use our report and they in fact used that information to impose much stronger standards on the coal burning power plants and force them to lower their emissions, because we were able to quantitatively show that yes, these higher levels of methylmercury do result in detrimental effects on the kids.
Now which one is this one? Okay, next example I wanted to show you, this is not a diamond although it looks a little bit like a rough diamond I suppose. This is an example of arsenite which is a - it's arsenic which is - we always think of it as a poison but it's actually a naturally occurring mineral. It occurs fairly naturally particularly in mountainous areas in the world, fairly deep down and so on, so we don't normally come into contact with it except maybe if you're living next door to a big mine perhaps; you might get some arsenic exposure.
But we know that when you've got very high levels, acute doses of arsenic - we've all seen that movie or maybe you haven't actually, you're all too young to have seen the movie, Arsenic and Old Lace, where it's a good source of if you want to knock off your next door neighbour or your associate in research or something like that. It's a good fairly subtle way to do it. So we know that high levels of exposure are bad, but it's the same kind of thing that happens. What about those chronic low level exposures, that's what we don't quite know about.
Here's another horrible picture. Sorry, but luckily the resolution's not quite so good on this one. People started to realise that arsenic in the more chronic exposures was dangerous around about the mid-20th century when increasing populations in developing countries such as India, Bangladesh and so on, increasing populations led to a need to access more sources of water.
Anyway, so what would happen is that the villagers would put in what's called a tube well and it's a very simple well. They basically just dig a big tube down till they access the water table and access that water. Now the trouble is some of the places that they were doing this when they dug down deep enough to access the water, they were tapping into some of these rocks that had high natural levels of arsenic. So all of a sudden you had these populations that were starting to get chronic exposures to arsenic. Not enough to kill people immediately but after a while they started to see some of these effects, such as this example here of hyperkeratosis and all sorts of effects that were impacting on people's extremities and it could get quite nasty.
But somebody - there was a famous researcher in Taiwan, C.J. Chen, who started studying this in a Taiwanese population. What he also started to notice was that these people were also getting cancer. That's a little bit harder to study because cancer has a long latency period and so on. But he did some really groundbreaking work and showed that these villagers that were digging these deep tube wells to access the water not only were they getting these sorts of symptoms, but they also had much higher rates of cancer than the general population.
So once this information started to emerge, people realised that first of all there was a really acute health problem in countries such as Bangladesh, India, Taiwan that were using these tube wells, but also countries such as the US which maybe didn't have quite the same level of concern because the levels were lower, but they still had the question of what is the impact of these chronic low level exposures to arsenic. There's lots of small water companies in the US that were providing water, but not necessarily having the same stringent requirements for water safety as the big metropolitan water companies.
So again the Environmental Protection Agency in the US was really struggling with the question of what should we specify as a safe level of arsenic in drinking water, and it got complicated because it becomes political quite quickly because the water companies they don't want stronger standards because it costs money to lower the levels of arsenic in drinking water. You can't say we don't want any arsenic because it's pretty much impossible to get it out of the water. So what you have to do is try and find an acceptably safe level. So that's where somebody like me comes in.
I was again part of a National Academy committee where our responsibility was to try to look at all the available information, try to quantify the effects and make a recommendation to the Environmental Protection Agency.
So at some level you might say well look how hard can it be, do a regression, do a linear regression or something like that, but it's actually quite complicated. Just to give you a little bit of a flavour of the kinds of things that we had to do, what this graph shows here, it shows as a function of the concentration of arsenic in drinking water, this is micrograms per litre, it shows the excess lifetime risk of cancer.
Keep in mind that usually the Environmental Protection Agency consider a safe level - quote safe level - as a level that doesn't increase the lifetime risk of cancer by more than one in a million. So we had risks here that were 10 and 12 per cent so that's a massive risk. This was based on the Taiwanese data extrapolating back to the US data.
So my job was to think about how do I fit a dose-response curve here to this data, but I had to try to take account of all sorts of uncertainty issues that arose. So for example, we didn't really have good data from Taiwan; all we knew was how much the arsenic levels were in the wells in each village. We didn't have a survey that said Lisa how much water do you usually drink in a day, Graham how much do you drink, and take into account exactly how much water each person drinks. All we knew was this person lived in a village where the average concentration of the wells in those villages is say, 75 parts per billion or something like that.
So there was all of that uncertainty. We didn't really know how much people were getting exposed to. The other thing is we were trying to extrapolate from a Taiwanese population to a US population. The Taiwanese population it was a rural area, people were out in the fields working hard each day, probably drinking a lot more water than the typical US person might be who might be going to an office and drinking lattes instead of drinking water.
The other thing that was one of the things that I got particularly interested in was this whole question of what we call model uncertainty, because remember - well actually I haven't told you, but this goes from zero, 500, 1,000 and so on and this was the range of the doses that we saw in the Taiwanese villages. The World Health Organisation standard was 10 parts per billion; the US standard was 50 parts per billion. We're talking way down this end of the curve.
So what turned out to happen - and this was something that I got very interested in. I had a PhD student do her thesis on this and we ended up publishing quite a few papers on it - is that which particular shape of dose response you choose really impacts on the decision that you make. If I choose a curve that's on the log scale you get one answer, if I choose a curve that's on the linear scale you get a different answer, and there's all sorts of curves you can choose so how do I make those decisions?
So that was just one of the sources of uncertainty that we had to try to make sense of, and this particular graph here what it shows it's using a technique called Bayesian model averaging and what it's doing, it's the same kind of principle in a way as what I was talking about before: it's building a big model that's got special cases, all of the particular models that you could consider. So you build this big meta-model that considers as a special case the log model or the linear model or the this model or the that model.
So what this green part of the curve shows you is the uncertainty that goes along with not knowing which is actually the correct model, because getting the models right down at these really, really low dose levels is an extremely hard problem and it's something that a lot of people have worried about.
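The idea of averaging over candidate dose-response shapes can be sketched as follows. Everything here is an illustrative assumption: the data are synthetic, only two candidate shapes (linear and log) are considered, and the model weights use the common BIC approximation to posterior model probabilities rather than whatever scheme was used in the published work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dose (micrograms per litre) and excess-risk data, for illustration only.
dose = np.linspace(0.0, 1000.0, 40)
risk = 0.004 * np.sqrt(dose) + rng.normal(0.0, 0.01, dose.size)


def linear_design(d):
    return np.column_stack([np.ones_like(d), d])


def log_design(d):
    return np.column_stack([np.ones_like(d), np.log1p(d)])


def fit(design, y):
    """Least-squares fit; returns coefficients and BIC."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    n, k = design.shape
    bic = n * np.log(rss / n) + k * np.log(n)
    return coef, bic


designs = {"linear": linear_design, "log": log_design}
fits = {name: fit(make(dose), risk) for name, make in designs.items()}

# Approximate posterior model weights from BIC differences.
bics = np.array([fits[name][1] for name in designs])
raw = np.exp(-0.5 * (bics - bics.min()))
model_weights = raw / raw.sum()

# Predict at a low dose, far below most of the observed range.
low_dose = np.array([10.0])
predictions = np.array(
    [(designs[name](low_dose) @ fits[name][0])[0] for name in designs]
)
averaged_prediction = float(model_weights @ predictions)

for name, w, p in zip(designs, model_weights, predictions):
    print(f"{name}: weight {w:.3f}, predicted risk at 10 ug/L {p:.4f}")
print(f"model-averaged prediction: {averaged_prediction:.4f}")
```

The spread between the individual predictions at the low dose is a rough picture of the model uncertainty that the green band in the talk's graph represents.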
So that's just another of the examples that I've worked on where we've been able to contribute to the practical question, but also push the statistical methodology as well by trying to come up with these methodologies that allow you to quantify these kinds of uncertainties.
I've got one more problem to tell you about and this is a little bit closer to home, and this is a picture of coal trains passing. Obviously this is a loaded coal train going down from the mines up in the Hunter Valley down to the port down in Newcastle. This is the empty coal train going back up to fill up with coal again to get it back down to the port.
So this was a project that's actually an ongoing project. We were approached by the New South Wales Environmental Protection Agency to help out with some analysis of some data that had been collected at the request of a citizens' group in the Hunter Valley area where they were very, very concerned about some of the health effects associated with coal dust that they were being exposed to because of the high volume of trains going backwards and forwards through the Hunter Valley region.
So they had done a study or gotten a study done where they had monitors of air pollution levels - it was called air particle or total suspended particulate, PM10, PM2.5, so different sized particles in the air - and they had it measured over a two month period every six seconds. So if you can multiply that out, you can probably do it more quickly than I can, that's a lot of data.
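Multiplying it out is straightforward. The sketch below assumes "two months" means 60 days of uninterrupted monitoring, which is an assumption; the actual monitoring period may have had gaps.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLING_INTERVAL_SECONDS = 6
DAYS_MONITORED = 60  # assumption: "two months" taken as 60 days

readings_per_day = SECONDS_PER_DAY // SAMPLING_INTERVAL_SECONDS
readings_per_series = readings_per_day * DAYS_MONITORED

print(readings_per_day)     # 14400
print(readings_per_series)  # 864000
```

That is roughly 864,000 readings for each particle size fraction, before adding the train and wind records.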
So what they also had was in addition to the particulate levels being measured every six seconds, they also had data on was there a train passing, what kind of train, was it a passenger train, was it a coal train, was it a loaded coal train, was it a freight train and there were also a few unknown kinds of trains. We knew how fast the train was going, how long it took to go past, we knew what the wind direction was, how fast the - what the speed of the wind was, and the question was is there a higher level of air pollution when the trains are passing compared to when they're not passing. Also, if that's the case, is there a higher level of air pollution associated with the coal trains compared to the other sorts of trains. Fairly simple questions, but once you start to dig down into things a little bit more it gets complicated.
One of the first things we did of course was just to look at the train data and I have to say I was absolutely astounded at how many trains there are passing through this one monitoring station in the Hunter Valley. So what you can see here is that any typical day you get anywhere between 27 and 47 empty coal trains, 24 to 41 loaded coal trains, anywhere between 55 and 90 or so passenger trains, a few freight trains and a couple of unknowns and that's a lot of trains on a particular day. I had no idea there were so many trains.
The durations are interesting as well. You can see the loaded coal trains. They come past really slow because they're really heavy; they've got lots of coal in there. The passenger trains - the passenger trains that I'm on never seem to quite go this fast but these ones seem to go very fast. They go by in like two seconds. I think this particular site is at a point where they're not stopping for a station or anything so they just zoom past.
So they go past quite quickly and you can see it's also reflected over here in the speeds. The loaded coal trains are slow. The empty coal trains are much faster so they go past in a shorter period of time, and so on. So that was our information about the trains.
Here was some of our data. I'm not quite sure how well it shows up at the back, but first thing we did was just to try to do some visualisations. What you can see here this is just for one particular period from about three o'clock till about eight o'clock in the afternoon, and we've got here - we're showing on the log scale because the data were very spiky - this is the total suspended particulate. The black dots show the particulate levels when there's no train going by; the green dots, here you can see the various green dots, various places there, that's what happens - that's the particle levels when there's a loaded coal train going past; the brown dots here is freight train; the blue is an empty coal train and so on.
So what we had to try to do was tease this out and get the signal from the noise and figure out what is the extra particle exposure levels associated with these train passings, but you can kind of see as you look at this, you can see the complexity in the data. It's not just nice obvious pattern, there's all sorts of things going on. So for example, you see that there's a lot of change throughout the day.
Now we thought well maybe this is the kind of effect that you see every day, but in fact if you look at the different days each day is fairly unique and you see different sorts of patterns. So these are the kind of things that might be explained by say, was it raining, was the wind coming in this direction or that direction, so we had to try to build a model that allowed us to try to take account of all of those things, and that's where it gets quite interesting from my perspective from the maths side.
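A first-pass version of such a model could be a regression of log particulate level on train-type indicators plus a weather covariate. The sketch below runs on synthetic data with made-up effect sizes, and it ignores the strong time dependence in six-second readings, so it illustrates the general idea only and is not the analysis done for the EPA.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic records: which kind of train (if any) is passing, and wind speed.
train_type = rng.choice(
    ["none", "loaded_coal", "empty_coal", "passenger"],
    size=n,
    p=[0.7, 0.1, 0.1, 0.1],
)
wind_speed = rng.uniform(0.0, 10.0, n)

# Made-up effects on the log scale, relative to "no train passing".
true_effect = {"none": 0.0, "loaded_coal": 0.3, "empty_coal": 0.2, "passenger": 0.1}
log_tsp = (
    3.0
    + np.array([true_effect[t] for t in train_type])
    - 0.05 * wind_speed
    + rng.normal(0.0, 0.5, n)
)

# Design matrix: intercept, one indicator per train type, wind speed.
levels = ["loaded_coal", "empty_coal", "passenger"]
design = np.column_stack(
    [np.ones(n)]
    + [(train_type == level).astype(float) for level in levels]
    + [wind_speed]
)
coef, *_ = np.linalg.lstsq(design, log_tsp, rcond=None)
estimated = dict(zip(["intercept"] + levels + ["wind_speed"], coef))

for name, value in estimated.items():
    print(f"{name}: {value:.3f}")
```

Because the outcome is on the log scale, exponentiating a train-type coefficient gives the multiplicative change in particulate level relative to periods with no train passing, holding wind speed fixed.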
So it's been an interesting and an ongoing project. It's been great because it's been helpful I believe to the EPA in terms of they're trying to define the next steps in terms of putting in place the right kind of standards to make sure that the Hunter Valley health considerations are appropriately taken into account and that the mining companies put the right protections on in terms of the coal dust coming from the trains.
But we've got next steps going, we're trying to get some more data, but also it's a really interesting thing trying to think about what is the signature if you've got a train coming along. It's not like okay here's the train, the level just jumps up and stays even like that. We've all had that experience where you're standing on a platform and you're waiting for your train and a freight train comes by. You're standing there and you feel this rush of air as it's coming towards you, and then you feel the air stirring up, stirring up and then the train comes past and you step back because you're afraid you might get sucked into the vortex as it goes past you.
It's a very complicated turbulent process of air as it goes past, so we're interested to try to see what we can do to characterise how that changes over the course of time as that train passes, but it's quite complicated.
I have a student now, he might even be here. I'm not sure if Alan's here or not, but we have an honours student who's working on this. He's a UTS honours student who's doing this for his thesis and trying to figure out what is the best way to analyse this kind of data. So it's turned out to be very satisfying that it's something where we can answer a real world question, but we can really have some fun doing some nice maths at the same time.
So I've got one more slide to finish up and that's this one, where I just basically want to I think try to encapsulate what I think I do as a statistician, an applied statistician. I think one of the things I love about being a statistician is that we get to collaborate a lot. One of my heroes, statistical heroes, John Tukey, who died a few years ago, very, very talented statistician who actually started out as a chemist, but one of the things that he said was the great thing about being a statistician is that you get to play in other people's backyards.
So you get to dabble in all sorts of interesting, very interesting, topics, so you get to collaborate and work with subject matter experts. You design the studies, you find the patterns in the data, you come up with interesting models, you quantify the uncertainty. Then hopefully you take all of this chaos and all of this noise and all of this confusion and hopefully you put it through these methods, these models, these analyses and you come out with insight that will hopefully then impact on the real world, make the world a better place, change policy and contribute to general knowledge, not only in that subject matter area, but also in terms of the mathematical sciences as well, the statistical methods that we use. It's that duality of solving real world problems but getting to play with the maths and probability theory that we love at the same time.
So I'll finish there, thanks very much.
26 June 2014
UTS Science in Focus is a free public lecture series showcasing the latest research from prominent UTS scientists and researchers.