Slashdot: News for Nerds

Google Flu Trends Gets It Wrong Three Years Running

timothy posted about 5 months ago | from the coughedword-coughedword dept.

Stats 64

wabrandsma writes with this story from NewScientist: "Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study of Google's much-hyped flu tracker has found that it has consistently overestimated flu cases in the US for years. It's a failure that highlights the danger of relying on big data technologies.

Evan Selinger, a technology ethicist at Rochester Institute of Technology in New York, says Google Flu's failures hint at a larger problem with the algorithmic approach taken by technology companies to deliver services we all want to use. The problem is with the assumption that either the data that is gathered about us, or the algorithms used to process it, are neutral. Google Flu Trends has been discussed at slashdot before: When Google Got Flu Wrong."

64 comments

Error, Error, Excuses (1)

Sponge Bath (413667) | about 5 months ago | (#46478025)

Google Flu: That unit is defective. Its thinking is chaotic. Absorbing it unsettled me.

Counterpoint (0, Flamebait)

Anonymous Coward | about 5 months ago | (#46478033)

Google aren't "masters of data wrangling", they're masters of PR: advertisers think all that data will help them shovel their shit.

Big Data Fail (2, Insightful)

Anonymous Coward | about 5 months ago | (#46478045)

Not surprising, most analysis on huge data sets is incorrect, that's why the NSA thing is scary! They get it wrong and you end up with a missile through your window! Oops...

Re:Big Data Fail (0)

Anonymous Coward | about 5 months ago | (#46478123)

No, they use big data to find your cell phone and send a drone to destroy the holder, who happens to be an innocent unconnected child at the time...

Re:Big Data Fail (1)

Luckyo (1726890) | about 5 months ago | (#46478247)

No such thing any more. If it's a male, he's a terrorist and legitimate target nowadays. If she's a female, she's terrorist's family member and future suicide bomber. Win/win.

Re:Big Data Fail (2)

Tough Love (215404) | about 5 months ago | (#46478711)

An excellent example is Li's copula, [wired.com] widely credited for triggering the 08 financial crisis.

Re:Big Data Fail (2)

u38cg (607297) | about 5 months ago | (#46480867)

No, neither Li's copula nor any other bogeyman formula (B-S, etc.) was to blame. Failure to understand and interpret models, assumptions and limits was.

Re:Big Data Fail (1)

BitZtream (692029) | about 5 months ago | (#46482431)

The failure was the morons who didn't stop to think 'if we're all making so much money, who is losing it?'

And then someone did. And that's when all hell broke loose, when someone realized that they were about to default on a bond, which uncovered the whole mountain of failure across the industry and a few others that it took out along the way!

Re:Big Data Fail (1)

Tough Love (215404) | about 4 months ago | (#46488849)

Exactly. The same as abuse of big data. My point.

I don't think it takes into account (1)

geekoid (135745) | about 5 months ago | (#46478091)

action put into place after showing the data.
You can see a trend and make a forecast. Then take action to slow the trend based on the forecast and then the prediction will be wrong.

Re:I don't think it takes into account (4, Insightful)

rmdingler (1955220) | about 5 months ago | (#46478289)

You can see a trend and make a forecast.

Agreed. Very similar to a weather forecast, but without the hundred odd years of daily data to study and manufacture predictive models on.

It is, however, necessary and noble research... they'll just need more flu seasons under their belt to tweak the variables.

Re:I don't think it takes into account (2)

TapeCutter (624760) | about 5 months ago | (#46479661)

Weather forecasts are NOT based on trends found in any data set, they are based on the laws of physics and chemistry, they use the same "finite element analysis" techniques found in numerical wind tunnels and other engineering models that are used to build everything from bridges to aircraft. Archival data is used to test the "skill" of the model by making "hindcasts" and comparing them to the instrumental record.

Climate is basically the long term statistics of weather - meaning a hundred year trend in temperature is a climate statistic, not a weather trend. Climate models and weather models are more or less the same thing using different spatial and temporal parameters, climate statistics such as temperature trends are a completely separate line of evidence to climate modelling.

Re:I don't think it takes into account (3, Insightful)

khchung (462899) | about 5 months ago | (#46480083)

Exactly, the correct comparison should be "technical analysis" in stock markets, which can be applied to any stock you like with the same level of (un)success.

Without an underlying theory of how things work, which also needs to be somewhat correct, trying to predict future trends simply by using past data is just dumb curve fitting - with a curve of enough degrees of freedom, you can fit any data, but that doesn't mean its prediction would be any better than random guess.
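The curve-fitting point can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the eight data points are invented (a straight line y = x plus small alternating noise), and the helper names `lagrange_predict` and `line_predict` are hypothetical. A degree-7 polynomial passes through all eight points exactly, yet its forecast for the next point is far worse than that of a plain least-squares line.

```python
# Toy data: the underlying trend is y = x, with small invented noise added.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.3, 0.6, 2.5, 2.8, 4.4, 4.5, 6.2, 6.7]


def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial through every (xs, ys) point at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def line_predict(xs, ys, x):
    """Evaluate the ordinary least-squares straight line at x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return mean_y + (sxy / sxx) * (x - mean_x)


next_x = 8          # one step beyond the data
true_next = 8.0     # what the underlying trend y = x would give

poly_forecast = lagrange_predict(xs, ys, next_x)
line_forecast = line_predict(xs, ys, next_x)

print("perfect-fit polynomial forecast:", poly_forecast)
print("least-squares line forecast:   ", line_forecast)
print("value on the underlying trend: ", true_next)
```

The polynomial has zero error on the data it was fitted to, which is exactly why the fit says nothing about how well it predicts.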

Re:I don't think it takes into account (1)

akozakie (633875) | about 5 months ago | (#46481429)

...which hasn't stopped anyone from using it - rationality is for the weak. We're wired for "eureka" moments - the curve fits so well, it MUST be right!

OTOH, technical analysis is also not a very good model of this, because economy is not a good model of anything in the real world due to an exceptionally strong positive feedback loop between the model and the modeled. A successful technical analysis "method" (meaning it worked for someone, that's statistically probable no matter how stupid the method is) may become actively used by enough investors to change the actual process, becoming a self-fulfilling prophecy. A given shape may have no meaning as such, but if enough people believe it signifies a change of trend... their reaction may cause the trend to appear, at least in short term.

In case of flu the feedback, if it exists at all, will be negative - overestimation may increase popularity of the vaccine, resulting in reduced number of cases. The reverse is also true, low estimates decrease the willingness to vaccinate. That of course completely ignores the question of how effective flu vaccine really is - we have to predict which strain will dominate; a miss would weaken the feedback.
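A toy sketch of that negative feedback, with loudly stated assumptions: the function `observed_cases`, the linear response to the forecast, the `sensitivity` value and the case counts are all made up for illustration and are not taken from Google Flu Trends or the CDC. It only shows how a forecast that would have been exactly right can look like an overestimate once people act on it.

```python
def observed_cases(baseline_cases, forecast, sensitivity, max_averted=0.9):
    """Cases actually seen after the public reacts to a published forecast.

    baseline_cases: cases that would occur if nobody reacted (hypothetical)
    forecast:       the published prediction
    sensitivity:    assumed fraction of cases averted per forecast case
    max_averted:    assumed cap on the fraction of cases that can be averted
    """
    averted_fraction = min(max_averted, sensitivity * forecast)
    return baseline_cases * (1.0 - averted_fraction)


baseline = 1000   # invented number of cases with no reaction at all
forecast = 1000   # a forecast that matches the no-reaction outcome exactly

seen = observed_cases(baseline, forecast, sensitivity=0.0003)
print("forecast:", forecast, "observed:", seen)
```

Under these assumptions, a larger forecast drives the observed count even lower, so the apparent overestimate grows with the forecast itself.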

Re:I don't think it takes into account (0)

Anonymous Coward | about 5 months ago | (#46484303)

No. Technical analysis doesn't work because any patterns identified (i.e. "head and shoulders") are immediately traded on, which tends to drive the trend out of the market. Hence the "efficient market" hypothesis.

Identifying trends in the flu won't change the trends themselves, at least not to the extent that buying and selling on a trend influences the stock market.

Re:I don't think it takes into account (1)

MikeBabcock (65886) | about 5 months ago | (#46482299)

It would seem to me that you underestimate how complex weather systems are.

A farmer who says "when this happened last, the weather did this next" is more likely to be right than the guy who tries to model atmospheric pressure changes in a chaotic system.

Sure, some people rely entirely on models, but good weather forecasting often involves both. How do we know what a strong north-west pressure system will do? The best answer is "what did it do last time" not "let's model it to death."

PS, this is also how medicine works, which has a lot more science in it than weather forecasting -- historical comparison is an incredibly valuable insight in chaotic complex systems.

Re:I don't think it takes into account (1)

aldousd666 (640240) | about 5 months ago | (#46479699)

While I don't think they use Monte Carlo modeling for the weather, you do have a point about it being a young thing. The gist of TFA is a little silly though... sure someone (google) is using a faulty algorithm to grok epidemiology data... and? That's about the only conclusion we can draw. Doesn't mean 'the technology' itself (ie big data platforms, the techniques applied to it -- graph traversal or map reduce, etc) are actually 'bad things.'

Big Models (-1, Flamebait)

jamesl (106902) | about 5 months ago | (#46478095)

It's a failure that highlights the danger of relying on big data technologies.

Or big models. Like climate models. The ones that have predicted a warming climate for the past 15 years while the climate has not warmed.

Re:Big Models (4, Informative)

geekoid (135745) | about 5 months ago | (#46478159)

Yes it has warmed over the last 15 years, you moron.
Your statement has been shown false many many times. Please stop.

Re: Big Models (4, Informative)

Anonymous Coward | about 5 months ago | (#46478229)

He's not a moron, he's probably a republican; they have figured out that if you constantly state lies as facts then many people will believe them. It is the second best thing in politics after money, which is why republicans are currently having so much success at ruining America.

Re: Big Models (0)

Anonymous Coward | about 5 months ago | (#46478353)

Wait, how does that differ from Democrats exactly?

Re: Big Models (0)

Anonymous Coward | about 5 months ago | (#46479765)

He's part of the Democratic Party's fan club. Naturally, his team can do no wrong, and the other team are all monsters.

Re: Big Models (0)

Anonymous Coward | about 5 months ago | (#46480095)

Wait, how does that differ from Democrats exactly?

They tell different lies.

Re:Big Models (1)

jamesl (106902) | about 5 months ago | (#46478591)

I'll respond to you and hope that your friends below manage to find it.

On all data sets below, the different times for a slope that is at least very slightly negative ranges from 8 years and 7 months to 16 years and 8 months.
1. For GISS, the slope is flat since February 2001 or 12 years, 6 months. (goes to July)
2. For Hadcrut3, the slope is flat since April 1997 or 16 years, 4 months. (goes to July)
3. For a combination of GISS, Hadcrut3, UAH and RSS, the slope is flat since December 2000 or 12 years, 8 months. (goes to July)
4. For Hadcrut4, the slope is flat since December 2000 or 12 years, 8 months. (goes to July)
5. For Hadsst2, the slope is flat since March 1997 or 16 years, 4 months. (goes to June) (The July anomaly is out, but it is not on WFT yet.)
6. For UAH, the slope is flat since January 2005 or 8 years, 7 months. (goes to July using version 5.5)
7. For RSS, the slope is flat since December 1996 or 16 years and 8 months. (goes to July)

http://wattsupwiththat.com/201... [wattsupwiththat.com]

There is an embarrassing (for you) graph at the link in case you have trouble with numbers.

I've shown you my data now you can show me yours and we'll see who is the moron, you moron.

Re:Big Models (1)

TapeCutter (624760) | about 5 months ago | (#46479749)

I see you bought a nice bunch of cherries from Anthony Watts.
Sadly the only thing it proves is your (and Judith Curry's) lack of education on the subject of statistics [skepticalscience.com] . Note I am actually being generous here by assuming that Curry doesn't understand her mistakes.

Re:Big Models (2)

bug_hunter (32923) | about 5 months ago | (#46478167)

Well, except for the warming climate https://www2.ucar.edu/climate/... [ucar.edu]

Re:Big Models (1)

jamesl (106902) | about 5 months ago | (#46478651)

You'll notice in the graph at your link that the temperature trend is flat since about 2000. No warming. Thanks for proving my point.

Big picture here ...
http://www.ncdc.noaa.gov/sotc/... [noaa.gov]

Re:Big Models (2)

Cenan (1892902) | about 5 months ago | (#46480529)

You don't really get the scientific model do you? You know, the one where you don't pick an outlier as a base, and then try to "prove" that a trend is occurring by picking another outlier point. The technical term for that kind of "research" would be nit-picking, and is generally frowned upon by real researcher. You know, the kind of people who actually knows up from down, contrary to you.

Or maybe you just can't wrap your head around this whole thing called climate. I'll help you, climate is not weather. If you take your malformed little graph and zoom out, you would have one heck of a difficult time trying to make your model fit. That's why the real researchers can pick any range of years and get the same results as any other range, while you can only pick this one set. Isn't that just disheartening? You're trying so hard, and yet failing so badly.

But yeah I get it. You've drunk the Kool-Aid and committed to the lie, there is no going back. Facts be damned, the world will just have to conform to your belief eh? And who gives a shit, the real consequences of your kind of ignorance will only surface when you're long gone.

Re:Big Models (2)

jodio (569370) | about 5 months ago | (#46482269)

"The technical term for that kind of "research" would be nit-picking, and is generally frowned upon by real researcher. You know, the kind of people who actually knows up from down, contrary to you."

Actually the term is cherry-picking. Nit-picking is focusing on trivial details.
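What cherry-picking a start point does to a fitted trend can be shown with a small sketch. The series below is synthetic and invented for illustration (a steady rise of 0.02 per step with one artificial spike at step 3); it is not real temperature data, and `ols_slope` is a hypothetical helper. The only point is that a short window starting on an outlier can show a flat or negative slope even though the underlying trend is upward.

```python
def ols_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


# Synthetic series: steady upward trend plus a single high outlier.
series = [0.02 * i for i in range(16)]
series[3] += 0.5

full_slope = ols_slope(series)           # uses all 16 points
picked_slope = ols_slope(series[3:13])   # window chosen to start on the spike

print("slope over the full series:     ", full_slope)
print("slope starting from the outlier:", picked_slope)
```

Moving the window start by one step in either direction changes the answer noticeably, which is a quick check for whether a claimed trend depends on where it begins.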

Re:Big Models (1)

Anonymous Coward | about 5 months ago | (#46478263)

http://www.ncdc.noaa.gov/sotc/... [noaa.gov]

Global Highlights

        The combined average temperature over global land and ocean surfaces for January was the warmest since 2007 and the fourth warmest on record at 12.7°C (54.8°F), or 0.65°C (1.17°F) above the 20th century average of 12.0°C (53.6°F). The margin of error associated with this temperature is ± 0.08°C (± 0.14°F).

        The global land temperature was the highest since 2007 and the fourth highest on record for January, at 1.17°C (2.11°F) above the 20th century average of 2.8°C (37.0°F). The margin of error is ± 0.18°C (± 0.32°F).

        For the ocean, the January global sea surface temperature was 0.46°C (0.83°F) above the 20th century average of 15.8°C (60.5°F), the highest since 2010 and seventh highest on record for January. The margin of error is ± 0.04°C (± 0.07°F).

If I choose not to believe it, it cannot be true!

Yesssss! (0)

Anonymous Coward | about 5 months ago | (#46478097)

Woohoo same results as the cdc!

Tweak the Algorithms (4, Funny)

kajong0007 (3558601) | about 5 months ago | (#46478149)

Learn from nature! Google needs a genetic algorithm that modifies itself every flu season.

The fittest algorithm will survive to infect thousands.

Re: Tweak the Algorithms (0)

Anonymous Coward | about 5 months ago | (#46478399)

Agreed, though conditions will never be identical.

That aptly describes Google already (0)

Anonymous Coward | about 5 months ago | (#46480581)

Sling out services that mutate quickly and die if unsuccessful. Google traditionally relies on complex algorithms to get things "right".

The problem is the question, not the answer (5, Interesting)

lucm (889690) | about 5 months ago | (#46478157)

With big data, when you actively look for patterns you always find them; this is how hedge funds have been operating for years. The purpose of the technology is not to make predictions, but rather to confirm existing trends and possibly identify new ones.

Proper way to utilize big data in this case would be:
1) to assist the CDC in confirming or refuting trends observed in the field
2) to offer additional correlations (such as: are people living closer to highways more sensitive to specific strains of flu)
3) to provide long-term indicators facilitating the assessment of medication and other flu containment factors

Big data is not a magic eight ball but it's not a piece of shit either.

Re:The problem is the question, not the answer (0)

Anonymous Coward | about 5 months ago | (#46478307)

Big data is not a magic eight ball but it's not a piece of shit either.

Big data PR plox go.

we have limits (0)

Anonymous Coward | about 5 months ago | (#46478267)

It's interesting how we always think we know everything, and how silly the mistakes earlier generations made are.

Re: we have limits (0)

Anonymous Coward | about 5 months ago | (#46478309)

Same AC
I mean that we are no better at predicting the future than anyone else in history. There is no model that can take into account what nature has proved capable of. A lot of effort can be put into things, and I like to have a better understanding of what surrounds me. 10 day weather forecast? Haha. I live in PA, USA. I don't know why they bother.

Google Flu got it wrong? (0)

argStyopa (232550) | about 5 months ago | (#46478279)

..well, so pretty much have all the FUD-spreaders in the CDC, government, and NGOs who've been all telling us that "any moment" we could get a "deadly flu" since the (ha ha ha) Sars "epidemic".

All I've ever gotten is the "Cry Wolf" heebie jeebies.

Re:Google Flu got it wrong? (1)

Attila Dimedici (1036002) | about 5 months ago | (#46478469)

They started doing it before the SARS "epidemic". I remember them talking about how the swine flu epidemic in 1976 was going to be like the 1918 flu pandemic because we were "due". It was just a matter of time til another flu pandemic like the one in 1918 happened.

Re:Google Flu got it wrong? (0)

Anonymous Coward | about 5 months ago | (#46478569)

It's a feedback cycle: all the media wolf-crying leads to more people thinking they might have the flu, which leads to google flu reporting more flu victims, which leads to more stories about how people are using google to find out if they have the flu, so next year MORE people will do the same.

Eventually everyone that gets a sniffle from allergies or has back pain from lifting something wrong will type "flu symptoms" into google, because you never know -- it might actually be the flu, and shit man just tell me if it's okay to tell my boss I can't come to work today.

Bogus data analysis? (0)

Anonymous Coward | about 5 months ago | (#46478281)

What kind of nonsense will NewScientist come up with next? The hole in the ozone layer WON'T kill everyone by 2010? Global warming WON'T burn us all to death and evaporate the oceans by 2012? When the weatherman is able to tell me what happens two weeks out accurately I might have a bit more faith in data analysts.

Re:Bogus data analysis? (0)

Anonymous Coward | about 5 months ago | (#46478335)

I'm pretty sure we're supposed to have flying cars by the year 2000, which is awesome, given we'll be out of oil by 2008.

Isn't it one of those perpetual beta things? (1)

dutchwhizzman (817898) | about 5 months ago | (#46478323)

Is it still in Beta? They should get this "right" and maybe look at other large scale models like weather modeling and add culture (how close people tend to get to each other, how much they are inside in the immediate vicinity of other humans) to the algorithms. It took google years to get gmail out of beta but it was pretty good while they were calling it "beta".. Slashdot on the other hand....

Biology varies more than expected. Unsurprised. (0)

Anonymous Coward | about 5 months ago | (#46478325)

According to the Harvard Law of Animal Behavior, "under carefully controlled experimental circumstances, an animal [or a human] will behave as it damned well pleases."

Re:Biology varies more than expected. Unsurprised. (1)

turning in circles (2882659) | about 5 months ago | (#46482747)

It is really a flawed experimental design. If I have the flu, I go to the doctor or I go to bed, I don't go to Google. If I have a bad cold, and can't decide whether it's the flu or not, I google the symptoms. The sicker you are, the less need to Google. The model might be predictive for really bad colds in cities, or really mild cases of flu.

Don't worry... (0)

Anonymous Coward | about 5 months ago | (#46478467)

It's beta, and will be discontinued momentarily.

Calm down, everyone (2)

wonkey_monkey (2592601) | about 5 months ago | (#46478475)

but one of its products has been making bogus data-driven predictions. A study of Google's much-hyped flu tracker has consistently overestimated flu cases in the US for years.

Bogus? Are you sure they weren't just... wrong?

It's a prediction.

Re:Calm down, everyone (1)

Livius (318358) | about 5 months ago | (#46478565)

Not bogus at all - I'm sure they really did make those predictions.

When a prediction changes behavior... (2)

acroyear (5882) | about 5 months ago | (#46478563)

In addition to "all of the above", the other contribution is that of the philosophical equivalent of Heisenberg: the predictions of outbreaks may have increased vaccination usage in the areas involved, which of course will have an effect of downplaying the outbreaks in those areas.

Not saying I have any evidence for that, (and I will wager it unlikely, considering the #s who vaccinate is still far lower than it should be), but a correlation study may be interesting to see.

If the point of knowledge of a possible outcome is to act to deter it, then shouldn't the actions that attempt to deter it be taken into account?

Re:When a prediction changes behavior... (1)

Bite The Pillow (3087109) | about 5 months ago | (#46482201)

It should, but only after google news picks up reporting on it. Then the modelers can say how much impact reports of the prediction had.
Next year, no one may report on it other than in mockery, and you can't predict reporting that doesn't happen, so they can't start off with reporting taken into account.

Actually correct? (0)

Anonymous Coward | about 5 months ago | (#46478571)

How have we decided that it was over-estimated? Do we have ANY data on how often the flu is unreported? Just about every single person I know has been sick with at least 1 kind of upper-respiratory illness this season, but none of them reported it to the CDC, or even a local hospital. How about nyquil sales numbers, for starters?

Influenza Vaccine (1)

dohzer (867770) | about 5 months ago | (#46478861)

In Australia recently they've been pushing people to get vaccinated against influenza for the coming winter because of the reported rise in flu cases during the recent North American winter, especially for the 18 - 60 age bracket. I hope they weren't using Google as their source.

Huh, Flu, who knew? (1)

elsuperjefe (1487639) | about 5 months ago | (#46478885)

All this time i thought the only reason "big data" mattered was to provide motivation for companies to invade my online privacy and better target advertising. i can tell you all those male enhancement product ad placements really hurt my self image.

Big data defeat God? (0)

Anonymous Coward | about 5 months ago | (#46478943)

U think big data cool? Ur algorithm is better than God's?

Think again

Google Flu detects hypochondriacs (0)

Anonymous Coward | about 5 months ago | (#46479621)

All it did was find people talking about the Flu. To actually make forecasts you need to have medical data that Google is not allowed to have due to HIPAA. There are companies doing epidemiology work with data from hospitals and doctors.

Did they? (0)

Anonymous Coward | about 5 months ago | (#46479803)

Just how many cases are reported anyway? How many remain uninsured, certain that Obamacare is going to kill their grandmothers or puppy if they dare to explore signing up?

Googs (0)

Anonymous Coward | about 5 months ago | (#46480177)

It's time to die Goog; it's time to die.

So does this mean that.... (1)

Air-conditioned cowh (552882) | about 5 months ago | (#46480307)

So does this mean that all those shiny blue racks of gleaming hardware in the Google Cloud adverts around Slashdot don't actually work??? I really feel sorry for the guys at Google who installed it all and thought they were actually on to something. Only to find that it comes out with the wrong answer every time.

To be completely fair (1)

Aleksander Wistrand (2879351) | about 5 months ago | (#46480467)

not all flu cases are discovered, and not all persons with the flu are knocked out by it, so the missing numbers are probably mild cases where the people actually continue to go to work, or rather, study.

Wrong measure (1)

Bazman (4849) | about 5 months ago | (#46480635)

The headline is that the prediction was overestimating three times in the past three years. So what?

Google's Flu Trend plots don't have uncertainties on them, so they'll never be exactly right. So they either have to be overestimates or underestimates. In any three years, you are going to get at least *two* under or over estimates. So post-hoc, saying "ZOMG! There's three overestimates in three years!! #EPICFAIL LOL!" isn't very meaningful.

Until Big Data People understand statistical uncertainty and are happy to put prediction confidence intervals on their data, this will keep happening. However, prediction confidence intervals are an admission of uncertainty, and uncertainty is weakness, right? And we won't have any of that in our corporate Big Data strategy document. Mr Statistician, you're fired, we're hiring some more Big Data Scientists.
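The arithmetic behind "three overestimates in three years" can be checked by brute force. This sketch assumes, purely for illustration, an unbiased forecast whose yearly misses are independent and equally likely to land over or under; nothing in the thread establishes that this holds for Google Flu Trends, and all variable names are hypothetical.

```python
from itertools import product

# Every possible sequence of over/under misses across three seasons.
outcomes = list(product(("over", "under"), repeat=3))

# Chance that an unbiased forecast overestimates three times running.
p_all_over = sum(o == ("over", "over", "over") for o in outcomes) / len(outcomes)

# Chance that all three misses land on the same side, whichever side it is.
p_same_direction = sum(len(set(o)) == 1 for o in outcomes) / len(outcomes)

# Smallest possible size of the more common direction in any three years.
min_majority = min(max(o.count("over"), o.count("under")) for o in outcomes)

print("P(three overestimates):      ", p_all_over)
print("P(three misses on one side): ", p_same_direction)
print("guaranteed same-side misses: ", min_majority)
```

Under those assumptions, a run of three same-side misses happens by chance one time in four, which is why a published prediction interval would say more than the streak does.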

Maybe they based it on how many claim to be sick? (1)

KreAture (105311) | about 5 months ago | (#46481315)

I think the error just shows how many take a flu-day without being actually sick.

Put Nate Silver on this (1)

Arancaytar (966377) | about 5 months ago | (#46481791)

"Flu virus predicted to take US congress in 2014 with 96.34% certainty."

Predictions are predictions (1)

jennatalia (2684459) | about 5 months ago | (#46484077)

It doesn't mean they'll come true. I predict I'll win the lottery. Yay, I won $4 instead of $40M. It's a noble cause, but it just needs some tinkering to get things right.