Mystery MLB Team Moves To Supercomputing For Their Moneyball Analysis

timothy posted about 10 months ago | from the stats-nerds-with-bats dept.

An anonymous reader writes "A mystery [Major League Baseball] team has made a sizable investment in Cray's latest effort at bringing graph analytics at extreme scale to bat. Nicole Hemsoth writes that what the team is looking for is a "hypothesis machine" that will allow them to integrate multiple, deep data wells and pose several questions against the same data. They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
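
The "adding new cuts" idea can be pictured with a much simpler, non-graph sketch: filter a table of pitch records by stacked conditions and compare one summary statistic across the resulting slices. The record layout (`balls`, `strikes`, `pitch_type`, `swung`), the helper names, and the toy rows are all hypothetical; this is not Cray's graph-analytics system or any team's actual data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Pitch:
    # Hypothetical record layout, invented for this sketch
    balls: int
    strikes: int
    pitch_type: str  # "FB" = fastball, "CB" = curveball
    swung: bool


def swing_rate(pitches: Iterable[Pitch], *cuts: Callable[[Pitch], bool]) -> float:
    """Keep only pitches that pass every cut, then return the share swung at."""
    sliced = [p for p in pitches if all(cut(p) for cut in cuts)]
    if not sliced:
        return float("nan")  # an empty slice has no meaningful rate
    return sum(p.swung for p in sliced) / len(sliced)


def two_strikes(p: Pitch) -> bool:
    return p.strikes == 2


def curveball(p: Pitch) -> bool:
    return p.pitch_type == "CB"


# Toy data, not real pitch-tracking output
pitches = [
    Pitch(0, 0, "FB", False),
    Pitch(0, 2, "CB", True),
    Pitch(1, 2, "FB", True),
    Pitch(3, 1, "FB", True),
    Pitch(2, 2, "CB", False),
    Pitch(0, 1, "FB", True),
]

print(swing_rate(pitches))                          # no cuts: whole dataset
print(swing_rate(pitches, two_strikes))             # one cut
print(swing_rate(pitches, two_strikes, curveball))  # two cuts stacked
```

Each extra predicate is one more "cut"; stacking them and watching the statistic move is the hypothesis-testing loop the summary describes, just on a toy scale.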

The Million Dollar Question (5, Funny)

mutantSushi (950662) | about 10 months ago | (#46668395)

"Supercomputer, is baseball still boring as fuck?" "YES, DAVE."

Re:The Million Dollar Question (2, Insightful)

brxndxn (461473) | about 10 months ago | (#46668435)

It's boring to those that see it but don't understand it.

Re:The Million Dollar Question (4, Funny)

willoughby (1367773) | about 10 months ago | (#46668469)

Ahhh... so it's like cricket, then?

Re:The Million Dollar Question (1)

hedleyroos (817147) | about 10 months ago | (#46668565)

Ironically, your nick is the surname of a well-known South African cricket player.

Re:The Million Dollar Question (1)

Salafrance Underhill (2947653) | about 10 months ago | (#46668651)

Cricket isn't really about the game. It's more about the small hamper of cream buns, punnets of strawberries, the chilled champagne, the lounging around in the sun with your girlfriend/boyfriend. That and being frightfully English.

Re:The Million Dollar Question (1)

alienmole (15522) | about 10 months ago | (#46668675)

In that case why not go have your picnic somewhere where you're not surrounded by a bunch of frightfully boring cricket fans?

Re:The Million Dollar Question (1)

Anonymous Coward | about 10 months ago | (#46668689)

Because he's one of them :)

Re:The Million Dollar Question (1)

bagorange (1531625) | about 10 months ago | (#46672483)

That and being frightfully English.

Cricket might not even be the second most popular sport in England. (Association) football is a clear first, then either cricket or rugby (but which kind of rugby??)

It's vastly more popular in India, Pakistan, Australia and the West Indies / Caribbean.

Re:The Million Dollar Question (1)

bullgod (93002) | about 10 months ago | (#46672239)

No. Despite the apparent similarities (ball, bat, runs, etc.) cricket and baseball are very different games.

In baseball, there will be (at least) nine innings, each of which lasts until 3 outs. It's a competition between pitcher and batter that little can interrupt. I don't think many baseball fans realise that cricket is much more about managing resources than a gladiatorial contest.

So, for example, in a 5-day match every ball you decide to face is one less opportunity for you to dismiss the opposition, and if you don't do that (twice) you can't win the game, which is why a draw is a valid result. And why, if you think cricket works like baseball, it can't make sense.

Re:The Million Dollar Question (1)

burnttoy (754394) | about 10 months ago | (#46672533)

Not quite.

Sadly the problem with baseball is it doesn't really give you enough time...

I mean, at least with cricket I get the requisite 4 or 5 days, which is _just_ about enough time to get properly drunk (and eat cheese sarnies).

Re:America's Pastime (1)

hoboroadie (1726896) | about 10 months ago | (#46668483)

I can take it or leave it, but a minor league game from wooden bleachers is a much better time for me.
What I find amusing is the obsession with statistics, considering the randomness of any particular game. But then I don't follow any particular team; it's the spectacle of seeing it done. (And the main thing I appreciate is how unimaginable it is relative to my own abilities.)

Re:The Million Dollar Question (1)

VortexCortex (1117377) | about 10 months ago | (#46668527)

It's boring to those that see it but don't understand it.

Negative. It is boring to hackers since we're likely to think a sport is something you do, not something you watch. [catb.org]

Re:Specked Taters (1)

hoboroadie (1726896) | about 10 months ago | (#46668637)

I'm a nerd, not a hacker, but yeah. I was on the fencing team.

Re:Specked Taters (1)

sconeu (64226) | about 10 months ago | (#46670435)

I was also a fencer. But I've spent the past 10 years playing catcher on a beer-league softball team.

Re:The Million Dollar Question (0)

Anonymous Coward | about 10 months ago | (#46670861)

I know the rules of baseball, it is a stupid boring "sport" with little physical activity or action. You don't understand "sport".

Re:The Million Dollar Question (1)

idioto (259918) | about 10 months ago | (#46673197)

I understand it, and that is what makes it boring. Most of the people who watch it do not understand baseball. I used to love baseball, but the more years pass, the less I like it. It's not as if most people in the stadium can see whether a pitch is a ball or a strike, or a curveball or a fastball; you can't see it unless you have perfect seats. I will beat 9/10 baseball fans in baseball trivia, I will beat 9/10 baseball fans in a baseball video game, and I will beat 9/10 baseball fans in Wiffle ball (because 9/10 baseball fans can't even track a baseball in mid-air). It's barely even a team sport, since using statistics to pick players seems to be all the rage rather than actually coaching. That's why they're called managers.

The only thing baseball has going for it is that you never run out of time, you run out of outs. A high school team could beat a college team, which could beat a pro team, because for the most part baseball could be simulated with dice. That's all baseball is: Dungeons and Dragons.

Re:The Million Dollar Question (1)

auric_dude (610172) | about 10 months ago | (#46668557)

The million dollar question is: why spend time looking at the past when they could use this to forecast the next winning Powerball ticket?

Re:The Million Dollar Question (0)

Anonymous Coward | about 10 months ago | (#46668649)

There is nothing to forecast in picking a winning lottery ticket.

Re:The Million Dollar Question (1)

wb8nbs (174741) | about 10 months ago | (#46668897)

Ha! Couldn't agree more.

Can you imagine where America would be today if all those man hours spent obsessing over Baseball were instead spent doing something productive?

Re:The Million Dollar Question (1)

rssrss (686344) | about 10 months ago | (#46670055)

Like commenting on slashdot?

Re:The Million Dollar Question (1)

jonwil (467024) | about 10 months ago | (#46669445)

Just be glad you yanks broke away from the motherland all those years ago, otherwise you would probably be doing what us Aussies are doing: playing the one team sport on this earth MORE boring to watch than baseball, cricket.

Re:The Million Dollar Question (2)

wiredlogic (135348) | about 10 months ago | (#46669895)

Canada managed to dodge the cricket scourge, and it's still in the Commonwealth.

Re:The Million Dollar Question (1)

Talderas (1212466) | about 10 months ago | (#46692901)

They have curling instead.

Call Asimov (2)

InsultsByThePound (3603437) | about 10 months ago | (#46668427)

I'm sorry, but not even robots could make this game interesting. I say we sell it to the Japanese while they're still game.

Re:Call Asimov (1)

wb8nbs (174741) | about 10 months ago | (#46668903)

You have to agree though, it's one step closer to Psychohistory.

Compounding variables (1)

hoboroadie (1726896) | about 10 months ago | (#46668519)

Sorry, I RTFA, it was unavoidable.
Looks like they might actually use the horsepower.

It's the Cubs (2)

techsoldaten (309296) | about 10 months ago | (#46668539)

My best guess is it's the Cubs.

They are looking for minority investors in the club right now, and the cost of ballpark improvements is a smoke screen for taking on the cost of big data. Theo has not been the same without Tessie, and it's not cheap to recreate the analysis that system is capable of performing.

I really wonder what the value of such a system is compared to updating / refining Nate Silver's PECOTA odds to play out hypothetical teams and transactions over a 5 year period. There is so much data available about players at this point, it's almost possible to predict regressions on a macro level.

Re:It's the Cubs (1)

Zontar_Thing_From_Ve (949321) | about 10 months ago | (#46669009)

My best guess is it's the Cubs.

It's a good guess. I'd also say that the Red Sox and A's make the short list of teams that have people in the front office who might see some value in this, although the A's are run on the cheap so it is a little hard to think they'd pay for this.

Re:It's the Cubs (2)

Rogue Haggis Landing (1230830) | about 10 months ago | (#46669439)

It's not the Cubs, Red Sox, or A's.

The original story [si.com] about this said that it was "an organization that many might not expect." None of those, nor the other teams who've shown marked interest in analytics or who have GMs known to be friendly to advanced analytics (off the top of my head that's the Yankees and Mets, Cleveland, Tampa, Baltimore, Toronto, Seattle, and Arizona to start with), would be particularly surprising. The other thing to note is that "buy a supercomputer!" is the sort of response you'd expect from a team that suddenly realizes it's way behind. The Red Sox have probably been growing a dedicated server farm to deal with all of the new data sources that have been coming along. They don't need to rush out and buy a Cray.

The speculation at the time the story came out ran to the Phillies (they have cash and seem to be way behind on analytics) and Astros, and then teams like the Tigers and Royals that have fantastically rich owners.

Re:It's the Cubs (1)

sconeu (64226) | about 10 months ago | (#46670455)

The last team you'd expect to be into analytics would be the Angels, given Scioscia's tendencies.

And Arte may be panicking after the horrendous start.

Re:It's the Cubs (1)

SpzToid (869795) | about 10 months ago | (#46670551)

Here's a thought: maybe the Angels are so loath to fire everyone's favorite Mike Scioscia after his World Series win waaay back in 2002, even if his time may have come and gone, that management is considering analytics to micromanage Mike's calls?

For example, player X at bat against Pitcher Y, 2 men on, no outs, count is 2 and 2. Mike says to bunt for the sacrifice, but what do you say DAVE?

Re:It's the Cubs (1)

93 Escort Wagon (326346) | about 10 months ago | (#46671073)

Seattle? You've got to be kidding. They have an analytics department, but pretty much every decision made by the GM shows he doesn't listen to the analytics department.

Now three or four years ago... Perhaps. But they fired their main stats guy and the GM has an entirely different group around him now.

Re:It's the Cubs (1)

techsoldaten (309296) | about 10 months ago | (#46676941)

Well, the story says it's a team you wouldn't expect, and five years ago they would not have been a team you would expect. I still say it's the Cubs, and yeah, this is just a guess. But I really can't think who else would have a reason to do it.

When I go down the list, here are the teams that have a front office with a strong, expressed interest in Big Data.

- Athletics
- Red Sox
- Cubs
- Padres (Jed Hoyer legacy)

Here are the clubs that are known to have been investing in advanced metrics previously, in some cases at a limited scale.

- Nationals
- Dodgers
- Rays
- Phillies
- Yankees
- Mets

Out of the teams listed above, the Cubs stand out as the one with the strongest support for big data from the front office, and the biggest gap in terms of what they have now. There was an article about Theo recently that talked about the fact they had someone on payroll who would print emails and web pages out for scouts to read, since they were not reading them online. Five years ago, they were one of the last teams I would expect to use metrics in a meaningful way.

I discount the other teams based on the following factors:

- If it was the Yankees, the price tag would be more like $13 million. They don't spend cheap, period.

- If it was the Nationals, Davey Johnson would not be in the front office. He has been vocal about not using advanced stats in game-time decisions.

- If it was the Phillies, the system would be less about game time decisions and more about scouting. Their issues with their scouting system are well-known.

- The Rays are all about efficiency and doing the most with what they have. They don't like to acquire free agents, they are about building from within. They are not going to have a lot of historical data about their players for a system like this to chew on. It would not make sense for them to invest in one.

- I could almost see it being the Dodgers, but the Dodgers have a lot on their hands with new television contracts. I doubt they have the bandwidth for an organizational overhaul on top of that. They are focused on marketing, and this plays a role in how they make decisions.

- The Mets continue to struggle financially, and I am not sure they are entirely solvent. I am sure a capital expenditure like this would be something people would have already heard about through the media. It's possible it's something that would need to be approved by a bankruptcy judge.

The teams I simply discount are as follows. I don't see where big data fits into what they are doing. A $500,000 investment in winning requires some kind of organizational commitment to transforming the club overall, which just doesn't jibe with the way these teams spend money. They either have systems that already work, or the markets they operate in allow them to make money without winning. I don't see where the impetus for a big, organizational change comes from with these ones.

- Orioles
- Indians
- Twins
- Mariners
- Angels
- Rangers
- Astros
- Marlins
- Pirates
- Braves

That leaves about 11 teams to think about. If there is a wild card, I would say it's the Twins, simply because Selig owns them and is aware of what Big Data can do.

Re:It's the Cubs (1)

techsoldaten (309296) | about 10 months ago | (#46676759)

Yeah, and the Red Sox have Tessie. I don't think they are in the market for a replacement.

Re:It's the Cubs (1)

Joe_Dragon (2206452) | about 10 months ago | (#46669289)

They don't need to spend all of that to get the output of "NEXT YEAR".

Re:It's the Cubs (1)

ducomputergeek (595742) | about 10 months ago | (#46669715)

Maybe it would be cheaper if they just obtained a copy of "The Cardinal Way". http://www.baseballprospectus.... [baseballprospectus.com]

Yankees or Red Sox Mystery (1)

markass530 (870112) | about 10 months ago | (#46668551)

They've been buying wins for almost 20 years now, nothing new.

Re:Yankees or Red Sox Mystery (1)

CaptainStumpy (1132145) | about 10 months ago | (#46668731)

How have the sox been buying wins since 1994?

Re:Yankees or Red Sox Mystery (1)

markass530 (870112) | about 10 months ago | (#46673161)

They = the Yankees and Red Sox; the Red Sox didn't start until the 2000s.

Forget moneyball, just extort more from cities (0)

swb (14022) | about 10 months ago | (#46668643)

Why bother with moneyball? If your stadium is more than 10 years old, just whine you need a new one to provide the revenue to be competitive. You can threaten to leave for another city, promise to get an All-Star game, or just quit spending money on decent players for a while to convince the fan base that you really aren't competitive.

The Twins did a combination of all these things, but of course the owners decided that more money in their pockets was the real goal. The new money from their shiny, taxpayer-financed stadium hasn't bought new players, they have been .407 in each of the last two seasons, and they are working on a similar outcome this season, already making themselves comfortable at 1-3 in last place.

Red Sox (1)

Anonymous Coward | about 10 months ago | (#46668657)

It is likely the Boston Red Sox. There was talk of this at the Analytics conference in Boston a month ago.

frist 5top (-1)

Anonymous Coward | about 10 months ago | (#46668773)

The mystery is... (2)

gatkinso (15975) | about 10 months ago | (#46668785)

...why haven't they been doing this from the start?

Re:The mystery is... (2)

rainwater (530678) | about 10 months ago | (#46669257)

It was probably hard to find a super computer in 1876.

Re:The mystery is... (1)

93 Escort Wagon (326346) | about 10 months ago | (#46671141)

...why haven't they been doing this from the start?

Among baseball people, there is still great resistance to "nerds" and their "spreadsheets". I've heard it stated on many occasions, by many players and managers, that you can't understand how to win baseball games if you didn't play the game yourself at the major league level. Last season, for example, Seattle's manager Eric Wedge made more than one sneering comment about spreadsheet users who hadn't played the game since Little League.

There are a handful of players who DO look at advanced metrics in an attempt to improve their performance - guys like Joey Votto and Brandon McCarthy. And of course there are orgs like the Rays and Athletics who are really known for it (and it seems to be catching on in many front offices). But you still hear lots of people in the game holding fast to ideas like a pitcher with more wins is better than a pitcher with fewer wins, regardless of their supporting cast; or, with my team, holding onto the idea that offensive ability is really all that matters - so they'll pick up atrocious fielders who may be able to hit, then wonder at the end of the season how their record didn't improve (when their hitters scored a dozen or so extra runs over the previous season but gave away more than one hundred extra on defense).
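
The arithmetic behind that last point can be sketched with Bill James's Pythagorean expectation, which estimates winning percentage from runs scored (RS) and runs allowed (RA) as RS^2 / (RS^2 + RA^2). The baseline run totals below are hypothetical round numbers, not any real team's; the exponent of 2 is the original formula (later refinements use values closer to 1.83).

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Bill James's Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162

# Hypothetical baseline season: 700 runs scored, 700 allowed (a .500 team)
before = pythagorean_win_pct(700, 700)
# Same hypothetical team after +12 runs of offense but +100 runs allowed on defense
after = pythagorean_win_pct(712, 800)

print(f"before: {before * GAMES:.1f} expected wins")
print(f"after:  {after * GAMES:.1f} expected wins")
```

With these made-up numbers the team falls from 81 to roughly 72 expected wins over 162 games: the small offensive gain is swamped by the runs given away in the field.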

No longer a game. (1)

geekmux (1040042) | about 10 months ago | (#46668815)

"They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."

Hypothesized reality? Oh, you mean if a coach wanted to give a player performance-enhancing drugs that they know they can hide, to analyze the wins? Or do you mean simulating reduced gravity because you plan to bilk the entire nation in taxes to pay for the next baseball stadium on the moon?

I don't think baseball needs a supercomputer to analyze just how bored I am watching men be paid millions of dollars to stand around 90% of the time in a grassy field, especially when that cost translates to the average American family spending hundreds at the ballpark for a single game.

The Mets, obviously (1)

OutSourcingIsTreason (734571) | about 10 months ago | (#46668901)

They need to calculate what to do when players go on paternity leave.

it's about defensive analytics (4, Interesting)

Rogue Haggis Landing (1230830) | about 10 months ago | (#46669191)

One of the great pleasures of baseball is that it generates a vast amount of data for the analytically minded to use and abuse to their heart's content.

This purchase is presumably related to MLB's recent announcement of a new system [mlb.com] that will constantly track and measure the movement of the ball and every player on the field. Supposedly this is going to generate several terabytes of information each game, and some team has decided to buy a Cray as a way of processing all that data. Whether that's a better idea than the proverbial Beowulf cluster I don't know, but that seems to be this team's thinking.

Most, maybe all, baseball teams have been doing some variant of advanced analytics for quite some time now. Most of this work is proprietary and secret, but there's been a lot of "open source" (or at least publicly available) work that's probably along the same lines. Sabermetricians (baseball stat people, from SABR, the Society for American Baseball Research) have gotten very good at measuring offense, and reasonably good at predicting hitters' future numbers. Nate Silver's PECOTA system is the most famous, but there are others that work about as well (ZiPS and Cairo being the ones I've spent time with, plus the "dumb as the monkey on Friends" system called Marcel). Pitching numbers are understood pretty well, at least as they relate to the Three True Outcomes, which are the results of a batter v. pitcher matchup that don't involve any defensive players (i.e., walks, strikeouts, and home runs).
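
The Three True Outcomes share described above is simple arithmetic: walks plus strikeouts plus home runs, divided by plate appearances. The class and function names are illustrative, and the stat line in the example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    plate_appearances: int
    walks: int
    strikeouts: int
    home_runs: int


def three_true_outcomes_rate(line: BattingLine) -> float:
    """Share of plate appearances ending in a walk, strikeout or home run,
    i.e. the outcomes that never involve a fielder."""
    if line.plate_appearances == 0:
        raise ValueError("no plate appearances")
    tto = line.walks + line.strikeouts + line.home_runs
    return tto / line.plate_appearances


# Invented season line for a hypothetical slugger
slugger = BattingLine(plate_appearances=600, walks=90, strikeouts=180, home_runs=40)
print(f"TTO rate: {three_true_outcomes_rate(slugger):.1%}")
```

Everything outside that share is a ball in play, which is exactly the part the new tracking data is meant to explain.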

The next great frontier of analytics is defense. There's been a lot of work in this field over the last decade, but the problem has always been in getting good data. If a ball is hit towards the shortstop and the shortstop doesn't get to it, why is that? Is it because the ball was hit too hard? Is it because the shortstop was badly positioned by his coaches? Is it because the shortstop isn't very good? Data that's not much more than "groundball to shortstop" can't really answer that question, but the new tracking system promises to answer that sort of question in full by precisely measuring reaction times, routes to the ball, and so forth. This in turn might lead to greater and greater changes in defensive positioning, different emphases in player acquisition, maybe even in-game changes based on small changes in wind patterns or whatever.

Some of what we're already learning about defense is very surprising. For example, there has been a lot of work done recently on catchers' ability to "frame" pitches, that is, to make a borderline pitch look good. The most current results [baseballprospectus.com] suggest that the pitch-framing difference between the best and worst catcher might be worth something on the order of 5 wins. That's roughly the difference between having a random scrub and an All-Star as your right fielder, and all from a catcher's ability (or inability) to fool the umpire. It's shocking.

As for what team this is, when the news first broke it was claimed that the purchasing team "would surprise most people". That rules out the teams that are well known to be friendly to advanced analytics, starting with the Red Sox, Yankees, Cubs, and A's. The best guess I've seen is that it's the Phillies: they have tons of cash, seem to be very behind on analytics, and seem likely to just go out and buy a supercomputer rather than have the MIT grads in their analytics department jerry-rig a bunch of Debian boxes into something cooler and weirder.

Re:it's about defensive analytics (1)

tjb (226873) | about 10 months ago | (#46670699)

Yup, defense is where it is at. The SF Giants won two WS in 3 years by (accidentally, I think) putting together a team that was focused on pitching and defense while everyone else was focused on offense.

While offense is WAY more important, it is too well understood now to gain any advantage over other teams. In 2002, Billy Beane could flip a guy with a great swing or subjectively good defense for someone with better OPS+ and generate wins because everyone else valued the scout's opinions and not the numbers. In 2014, everyone values OPS+/PECOTA/Cairo over subjective opinion so the market has become almost completely efficient in that regard.

The 2010 and 2012 SF Giants had above-average (but not spectacular, particularly in 2012) pitching and absolutely awful hitting, but managed to win 2 championships when the current state of research says that they shouldn't have. One, you could explain away as a fluke. Two? Maybe there's something there. Defense would seem to be where the current blind spot is, and if someone can get ahead of the crowd in understanding it, they can intentionally put together a championship team that way.

Re:it's about defensive analytics (1)

mea2214 (935585) | about 10 months ago | (#46671085)

The problem with baseball analytics is more in the misuse of mathematics than lack of processing power. Few of the so-called "peripheral" stats like FIP and BABIP have any mathematical proof whatsoever. Other stats try to normalize all external conditions to level the field for all players for evaluation. This introduces ambiguous concepts like "stadium effect", calculated from another set of stats and assumptions. All the assumptions along the way add up to introduce bias, which is why there is no single player evaluation system that can even determine past results with full knowledge of history. WAR tries, but there are several variations. If WAR truly had a mathematical proof for it, there would only be one answer. Supercomputers might be an interesting science project for some baseball owner, but when you feed them garbage data based on faulty math, you end up with garbage results no matter how many processor cycles it consumes.
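
For reference, the two "peripheral" stats named above are usually defined as BABIP = (H - HR) / (AB - K - HR + SF) and FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + C, where C is a league-and-season constant that puts FIP on the ERA scale. The default constant of 3.10 below is an assumption for illustration, and the pitcher's lines are invented; the hard-coded 13/3/-2 weights are empirically derived, which is part of what the comment objects to.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    innings_pitched is a true decimal (200 1/3 IP = 200.333..., not the
    box-score notation 200.1). The default constant is an assumption; in
    practice it is recomputed each season so league FIP matches league ERA.
    """
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings_pitched + constant


# Hypothetical pitcher season: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K
print(f"FIP:   {fip(20, 50, 5, 200, 200.0):.2f}")
# Hypothetical line against him: 180 H, 20 HR, 750 AB, 200 K, 5 SF
print(f"BABIP: {babip(180, 20, 750, 200, 5):.3f}")
```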

Why do people watch other people do sports? (0)

Anonymous Coward | about 10 months ago | (#46669199)

I don't understand the attraction. Unless you have a family member on the team, who gives a fuck?
I don't watch other people swim, I go to the pool and swim myself.
Same with biking, or any other kind of sport.

Psychohistory. (1)

Anonymous Coward | about 10 months ago | (#46669243)

Psychohistory called, and they want their 5% of the profits.

The ECB are looking for (1)

mjwalshe (1680392) | about 10 months ago | (#46670355)

A boffin to explain how these blinky light things work, and whether they can run Hadoop on the item of searing white-hot technology (a LEO III) they have in the basement, in the hope that it can stop the English cricket team losing to the Dutch!

I thought they already knew (1)

AndyKron (937105) | about 10 months ago | (#46670893)

I thought they already knew which teams are going to win, like wrestling?

The Impossible Modeling of Unpredictable Animals (0)

Anonymous Coward | about 10 months ago | (#46671593)

Sports Psychology and team culture is probably more important to the success of an organization, and like a bad classic Star Trek scene where Kirk asks the killer probe computer "what is love?", the Cray powered data mining system is going to struggle with this.

Players are not factory machines with predictable performance. The .300 hitter who swatted 46 home runs last year and signed the $40 million contract can get snapped by TMZ stepping out with his mistress in November, get sued for divorce in December, lose half his estate in February, and still be expected to show up for Spring Training in March ready to play ball. Let's see how the computer models his mental state as his insomnia has him sleeping 4 hours a night, the court gives his house to his ex-wife, the mistress has left him, he's started drinking and eating poorly, and now he's struggling at bat simply not to look foolish, let alone be the home run hero he was last year.

You want two words to confound the computer?
"Tiger Woods."

Every player on the PGA Tour should have sent a thank-you card to Elin Nordegren for the spurned-wife rampage she went on when Tiger's dalliances were discovered, and for the psychological nuclear disintegration it caused in Tiger's game. The man imploded and has largely never recovered.

These are wildly extreme examples. Most players are just streaky in general, and it's all due to the wetware sitting atop their shoulders. Look at how Alabama's football team handles mental conditioning with a full-time, on-staff sports psychologist:

WTF?? (0)

Anonymous Coward | about 10 months ago | (#46672249)

"They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."

WTF does that actually mean?
