Slashdot: News for Nerds

Amazon Launches Public Data Sets To Spur Research

kdawson posted more than 5 years ago | from the put-it-there dept.

Databases 82

turnkeylinux writes "Amazon just launched its Public Data Sets service (home). The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon — things like census data, genomes, etc. — and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and like all of AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."

82 comments

Finally! (5, Funny)

jornak (1377831) | more than 5 years ago | (#26002601)

Now I have somewhere I can store the index of my massive porn collection. Thanks, Amazon!

Re:Finally! (1, Funny)

dmbasso (1052166) | more than 5 years ago | (#26004089)

Score:0, Funny... why the fuck are people so humorless nowadays? To take the time to mod down a funny post... why don't you save your mod points to mod down the 'frist psots!!1!!!' that have become usual?

Btw, go ahead and mod me off-topic.

Re:Finally! (1)

CambodiaSam (1153015) | more than 5 years ago | (#26006059)

I believe the system will automatically give the first post a -1, specifically because of the people that normally put junk in there.

Re:Finally! (1)

dougisfunny (1200171) | more than 5 years ago | (#26007141)

It did have a starting score of -1

Check off privacy (1)

SchizoStatic (1413201) | more than 5 years ago | (#26002611)

One more step to a non private world CHECK

Re:Check off privacy (2, Insightful)

kellyb9 (954229) | more than 5 years ago | (#26002711)

One more step to a non private world CHECK

Depends on what you upload. Census data isn't private.

Re:Check off privacy (4, Insightful)

johnsonav (1098915) | more than 5 years ago | (#26003027)

One more step to a non private world CHECK

Privacy, as we have experienced in the last hundred years, is on its way out anyway. The sheer volume, immortality, and interconnection of, even publicly available, datasets inadvertently reveal information most of us would rather keep private. Much like how most people don't have a problem with beat cops regularly patrolling an area, but feel threatened by cameras monitoring, recording, analyzing, and storing information about the same public area.

That said, it's here to stay. The data's here as long as we use credit cards for most purchases, use I-Pass (or similar) toll paying systems, carry GPS-enabled cell phones, and expect the police to protect us from 100% of terrorist and criminal bogeymen. We might as well get some private research done, rather than leave it all to the government and big business.

Re:Check off privacy (2, Insightful)

truthsearch (249536) | more than 5 years ago | (#26003225)

Privacy, as we have experienced in the last hundred years, is on its way out anyway.

It was only recently on its way in. For most of history people lived in small communities where everyone knew each other's business. Privacy only seemed to become a major concern when technology let us share information across large distances and with many more people.

I'm not commenting on whether that's a good or bad thing.

Not the same (2, Insightful)

Nerdposeur (910128) | more than 5 years ago | (#26004025)

[Privacy] was only recently on its way in. For most of history people lived in small communities where everyone knew each other's business.

Which is very different from a large society in which some people know everybody else's business.

Even if this stuff is public, the time and money and knowledge necessary to use it will not be evenly distributed.

Re:Not the same (3, Insightful)

johnsonav (1098915) | more than 5 years ago | (#26004323)

Which is very different from a large society in which some people know everybody else's business.
Even if this stuff is public, the time and money and knowledge necessary to use it will not be evenly distributed.

Information has never been evenly distributed. In small communities it was the neighborhood gossip, the corner pharmacist, the village priest, or the county sheriff who knew everybody's business. The replacement of social capital with monetary capital is the only difference.

Those small communities had, however, a fast-acting, closely monitored feedback system. If someone abused their position of power and trust, it was caught quickly and it was easy to remove them from the loop. A similar system is needed now, only on a national, or worldwide scale. I think the only way to accomplish this, without going back to a pre-computer society, is to make sure that as much information about the watchers is as publicly accessible as possible. Hopefully, the same spirit that makes the OSS community so vibrant and quick to act will transfer to this new domain.

Re:Not the same (1)

truthsearch (249536) | more than 5 years ago | (#26004795)

Well said. The feedback system, though, does exist in a few forms today. The press, for example, when they're doing their job, keep a watch on the government and report whatever they can find. The abusers of information can be removed from office with enough public interest. As another example, corporations can lose customers if they lose their trust.

Re:Not the same (1)

johnsonav (1098915) | more than 5 years ago | (#26005403)

It's true. Abuse by institutions already has a relatively effective feedback system. What I worry about is individual criminals or criminal organizations. There, the anonymity of access to the data prevents the feedback system from finding the person responsible for the abuse. I wouldn't mind having most of my personal data publicly available as long as I could see an unforgeable record of who had looked at it.

I guess I have the same attitude as the banks do regarding fraud. It's easy to commit, but really hard to get away with. If we are going to live in a transparent society, I don't want to be caught on the wrong side of a one-way mirror. It's got to be transparency for everyone, or for no one.

Re:Check off privacy (2, Insightful)

MikeURL (890801) | more than 5 years ago | (#26003589)

Most people never experienced Usenet. They never got to see offhand comments they made 15 years ago still searchable today.

I think if everyone had a chance to really live the immortality of data in that way they'd be a LOT more scared. As it is, most 'immortal' data lives out of our sight and lurks behind the scenes. Our credit card charge history isn't available in Google so it is easy to think it is gone.

Having said that...I think you're right. There is a cloud-like structure developing out there where virtually EVERY electronic transaction will leave a permanent record. There will come a skynet-like point where we won't even have the option of simply restoring privacy. The entire system is built upon the premise that privacy isn't really all that important. As it is you can get people to surrender virtually all explicit privacy if you give them a free iPhone (or whatever gadget they were offering) and implicit privacy is mostly an illusion already.

Re:Check off privacy (1)

johnsonav (1098915) | more than 5 years ago | (#26004791)

There is a cloud-like structure developing out there where virtually EVERY electronic transaction will leave a permanent record. There will come a skynet-like point where we won't even have the option of simply restoring privacy.

I think we're well past the point of restoring privacy, as it existed in the middle part of this century, without tearing down the computational, networking, and legal infrastructure that we've created since the 1970's. As long as the capacity exists to record and search this data, and proves useful enough to those who collect it, we will live in a world where almost everything we do is monitored by machine.

The part that's hard to imagine is when these disconnected islands of, seemingly innocuous, data are joined together completely. Few balk when their name is listed in the phone book, when their child's birth is certified, when their house's picture is on Street View, when their building permits and house plans are recorded at the county courthouse, when their credit report reflects the state of their finances, when the IRS requires a detailed accounting of all their money every year, or when the time, place and manner of their death is certified. But, when all of those records are digitized, linked together, and combined with other private datasets, people would blanch at knowing just how much of their lives had been recorded, tracked, and analyzed. Only recently has technology advanced far enough that it is possible to correlate all of these datasets for almost everyone.

I don't think there is anything we can do about it. If we want to live in the modern world, we have to accept that the data will be collected and stored. I just hope the legal system moves fast enough to allow the citizenry to watch the watchers (not freaking likely). Having more of this data publicly available helps.

Re:Check off privacy (4, Interesting)

Chyeld (713439) | more than 5 years ago | (#26003153)

The less privacy we have, the less we have to worry about our privacy. That sounds flip, and along the lines of "if you have nothing to hide..." but it isn't.

We want privacy primarily due to shame.

We have shame because we wear masks almost 100% of the time.

We wear masks because we don't want people to realize who we 'really are', either mentally or physically.

We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets.

We hold ourselves to these standards because everyone else is wearing masks and while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof.

If there were no privacy, no one could wear a mask. If no one were wearing a mask, we would realize that the standards we hold ourselves to are unrealistic. If we realize the standards we hold ourselves to are unrealistic, we are freed from shame. If we are freed from shame, we no longer find privacy necessary.

Re:Check off privacy (1)

megamerican (1073936) | more than 5 years ago | (#26003243)

Barack Obama called, he wants you as Attorney General.

Re:Check off privacy (5, Insightful)

tylerni7 (944579) | more than 5 years ago | (#26003373)

We (or at least some of us) also want privacy to prevent annoyances and for protection.

I certainly don't want to have to answer to the government anytime I say the word "bomb" or "terrorist" on the telephone, in email, or in an IM.
I also don't want some company complaining anytime they see me buy a product from one of their competitors.
I also don't want to have everyone on the internet knowing my social security number, address, license plate number, or telephone number.

That isn't because of "shame"; that's because people can be assholes, and some people will abuse information. I don't care if people that I trust know these things, but I don't think shame or masks or whatever has anything to do with getting one's identity stolen, or having the government ensure you don't say anything bad about them.

That said, I don't think this public dataset business really affects individual privacy. This is more a database of already public, but hard to find, data, that doesn't contain personally identifiable anything in it.
Let's just hope they keep it that way.

Re:Check off privacy (1)

Chyeld (713439) | more than 5 years ago | (#26004781)

But think for a moment why those items are annoyances.

You don't want to be heard saying certain keywords because the government has adopted the ineffective method of attempting to track terrorists by looking for these words. If they were forced to track EVERYONE, the government would quickly realize the folly of this course of action when it became apparent that not only did most terrorists already have ways around this, but the incidence of 'innocent' usage of these keywords was high enough that tracking them would be pointless.

You don't want companies complaining because you've purchased their competitor's products, but you don't realize that if everyone is buying their competitor's products (and knows it) then the company would either quickly go under or refine their marketing/product to the point where you'd be willing to buy it.

You don't want people having your SSN because too many places erroneously treat it as a universal ID number, but if everyone's SSN was available, no one would use it as a basis for ID'ing someone. Similarly, your license plate, phone number, and address would not be useful to a 'black hatter' to any large extent if everyone's were public. It's only the fact that people protect those items that makes them valuable.

That being said, I agree with your last paragraph. My response was to the GP's apparent desire to defend privacy for privacy's sake rather than a condemnation or defense of Amazon's actions.

Re:Check off privacy (2, Insightful)

tylerni7 (944579) | more than 5 years ago | (#26004985)

If my phone number and address were available, then people could easily contact and harass me. It's true that they could do the same to anyone, but that doesn't mean they will stop harassing people altogether. Instead what would (probably) happen is that people would just choose who they want to harass. (Just think about 4chan, for instance: they don't do it because it's difficult, they do it to harass people.)

Likewise, the government wouldn't just change laws, instead they would (probably) just use the information they have to go after people they don't like.

I am just speculating of course, and you do have a lot of valid points, like with SSNs for instance. But I don't agree that if society was completely open, people would suddenly stop abusing their power and stop being assholes to other people. Instead, it would just be easier for them to do these things.

Re:Check off privacy (1)

Chyeld (713439) | more than 5 years ago | (#26005881)

4chan'ers harass anonymously, do you think they would be still as willing to do so if you knew who they were? If EVERYONE knew who they were? How many people have you met who are complete dicks when they are under public scrutiny?

The "government" is not an entity, it's a group of people. People who would be aware of the fact that they themselves are just as vulnerable 'informationwise' as you are.

Information is only power when only one side has it. Once you realize that tagging someone with your 'power' puts you in their sights as a target for revenge, and gives them an equal opportunity to strike back with information about you, you become less eager to strike first. (Think MAD: Mutually Assured Destruction.)

Granted, there will always be people willing to 'die' to take 'you' out. But are they any better off 'now'? I would say no. Before they could dig up your information and you'd have nothing on them to 'retaliate' with. Now, you both have access to all the information, at least now you can fight back.

Re:Check off privacy (1)

DragonWriter (970822) | more than 5 years ago | (#26005969)

4chan'ers harass anonymously, do you think they would be still as willing to do so if you knew who they were? If EVERYONE knew who they were?

Even with total transparency, everyone wouldn't know who they were, since no one could manage all the available information. People who cared would be able to find out, but if the harassers were part of a big enough group with enough resources that it was ineffective for their targets to retaliate against them, they could still get away with a lot. You'd have to do a lot more control with laws rather than privacy; OTOH, if you really had perfect transparency, enforcing those laws would, presumably, be easy.

Of course, without magic, you won't have perfect transparency; the government will always retain the right to keep secrets, and those with sufficient resources will find ways to avoid or spoof attempts at universal monitoring, which will become a tool for the control of those without resources by those with.

Re:Check off privacy (1)

Chyeld (713439) | more than 5 years ago | (#26007877)

To repurpose an Abe Lincoln quote:

"You can fool some of the people all of the time, and all of the people some of the time, but you can not fool all of the people all of the time."

Complete transparency only seems impossible because right now only a small group of people are watching everyone. It would be orders of magnitude harder to hide constantly something from everyone than it is to hide something from just a small group.

Re:Check off privacy (1)

DragonWriter (970822) | more than 5 years ago | (#26008467)

Complete transparency only seems impossible because right now only a small group of people are watching everyone.

No, it seems impossible because it is.

We don't have a small group of people watching everyone now, we have no one watching everyone and everyone watching what they are interested in within their limits of access. Even if you assume everyone had access to everything (which is unlikely to be practical), you still won't have everyone watching everyone.

It would be orders of magnitude harder to hide constantly something from everyone than it is to hide something from just a small group.

Only if everyone is interested in and looking for it.

Which, for most things, even things that people would be interested in if they found out about them, wouldn't be remotely close to the truth.

Even if everyone had access to everything they chose, no one would have the time or ability to integrate information to monitor more than the vanishingly small fraction they were most interested in. And, anyway, without infinite resources, everyone couldn't monitor everything. Because that would require the ability to distribute an infinite amount of information (since the information that person X is monitoring input Y is part of the information flow that must be subject to monitoring, and, on the assumption that everyone is completely monitoring everyone else, must be transmitted to everyone.)
 

Re:Check off privacy (1)

chadenright (1344231) | more than 5 years ago | (#26004101)

We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets.

You are confusing standards and ideals. There is a difference. Standards of behavior are met all the time. Sometimes you can actually find checklists of 'standard behavior'; things like etiquette guidelines and domain rules. It's also a sliding scale; your standard of behavior with your girlfriend is likely different than your standard of behavior with your boss.

Ideals of behavior are different; while people meet standards of behavior easily, no one really meets their ideals. This is a good thing and there's no shame in it. If we met our ideals the only direction we could go is downhill.

Of course our ideals are unrealistic; that's why they're ideals. If we realize that our ideals are unrealistic...we should still try to meet that ideal anyway.

I don't know about all that. (1)

kwabbles (259554) | more than 5 years ago | (#26004631)

Actually - I just want to seed my pirated movie collection, download midget porn, and read my boss's email and be the only person to know about it.

Re:Check off privacy (0)

Anonymous Coward | more than 5 years ago | (#26004997)

we...we...We...We...we...We...we...We...we...We...while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof...we...we...we...we...we

What are you, a Borg? Scary. I prefer to think that other people are different, not "just like" myself. Ever heard about first person singular?

Re:Check off privacy (0)

Anonymous Coward | more than 5 years ago | (#26005051)

ummm, who do you think will actually have the time to search the information? who will have control and have the ability to edit it and omit things and get away with it?

the only people with the time to eat up dirt will be those paid by those in power and eventually it will become a massive black-list society in which you must push the system in order to not be destroyed under its weight.

Re:Check off privacy (1)

Snaller (147050) | more than 5 years ago | (#26005539)

Is what you are smoking a legal substance?

Check off pants. (1)

Ostracus (1354233) | more than 5 years ago | (#26006917)

"We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets."

Yeah...well, I do have a bigger dick than everyone else.

Re:Check off privacy (1)

geekoid (135745) | more than 5 years ago | (#26007077)

I don't know why you have so much shame, but shame has nothing to do with my desire for privacy.

I like privacy to prevent abuse.
People knowing where my kids are, what my alarm codes are, what I research online, what medical information I have, and a million other things.

Society is a set of rules; this creates a common mask of expected behaviors. Ideally these behaviors are set by the same society that uses them. That way they can change as people do. Enforced societal rules, i.e. rules most people don't want but have to obey, are bad. They cause stagnation.

Re:Check off privacy (1)

Chyeld (713439) | more than 5 years ago | (#26007829)

The first two items you listed, I would qualify as being 'protection', reasonable things to keep under wraps because other people out there have hidden lives you don't know about. And why don't you know about them? ^_^

The second two however, are what I would qualify as 'shame'. Granted you may feel I'm stretching the definition, but I can come up with two reasons why you wouldn't want people knowing about your medical information.

1. You have 'something' and don't want others to know because it might be embarrassing or affect your insurance.
2. You don't live up to your ideal of what you should be and don't want others to know.

And the same goes for what you look up online.

1. You are looking up something 'naughty' (where naughty is defined as something you don't want others to know you are interested in because you don't think 'normal' people would be).
2. You are looking up something 'embarrassing' (where embarrassing is defined as something you don't want others to know about you because you don't think 'normal' people are the same).

My belief is, if you went down your list of hypothetical million items and evaluated why you didn't want others to know about them, they'd fall out into two categories: Items which you don't want others to know for protection because there are people out there who have their own secrets and things you don't want others to know because they may reveal that you aren't a perfect human.

The first category breaks down if privacy does. You would find it hard to be a serial rapist if the first time you struck everyone knew about it. You would find it hard to be a thief if everyone knew when you came home with stolen property.

The second category breaks down when you truly realize no one is a 'perfect' human and in truth the actual 'perfect' human is the fallible one full of foibles. That whole 'mask of society' everyone is forced to play under because no one wants to admit that it's not just the emperor that is naked is just that, another mask. We wear it to protect ourselves but only because everyone else is as well.

Re:Check off privacy (1)

Monkeyboy4 (789832) | more than 5 years ago | (#26009729)

We want privacy primarily due to shame.

We hold ourselves to these standards because everyone else is wearing masks and while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof.

If there were no privacy, no one could wear a mask. If no one were wearing a mask, we would realize that the standards we hold ourselves to are unrealistic. If we realize the standards we hold ourselves to are unrealistic, we are freed from shame. If we are freed from shame, we no longer find privacy necessary.

Your three points there are dramatically off.

We want privacy because we want to be left alone, we want to only be known by those who know us, we want to be a partner, not an object.

Other people hold us to these standards, and prosecute us both socially and (increasingly) legally.

If there were no privacy...there will always be privacy. It's just a matter of the equity of privacy. The government has a setup that hides them from scrutiny, for both legitimate and illegitimate reasons. They won't change behavior in this hypothetical 'privacy free world' of yours.

no charge? it actually costs money to access it (0)

Anonymous Coward | more than 5 years ago | (#26002661)

From the service description:

"They can then access, modify and perform computation on these volumes directly using their Amazon EC2 instances and just pay for the compute and storage resources that they use. "

It is not expensive, but it ain't free either.

Re:no charge? it actually costs money to access it (1)

morgan_greywolf (835522) | more than 5 years ago | (#26003175)

Most of the data they've listed is available for public download elsewhere, such as the U.S. Census data. The 2000 Census is available from the Census Bureau's website [census.gov].

Privacy? (1)

staryc (852301) | more than 5 years ago | (#26002757)

It is my understanding that this data was already obtainable in the first place. So technically it isn't a huge invasion of privacy; it is just becoming more readily/easily available. One of the public data sets provided is from the US Census Bureau, and those were for the public anyway.

Re:Privacy? (4, Insightful)

russotto (537200) | more than 5 years ago | (#26002797)

It is my understanding that this data was already obtainable in the first place.

This is true. But the easier it is to obtain datasets like these, the easier it is for anyone to do data mining and correlate the public (presumably non-identified) datasets with any private data they do happen to have.

Re:Privacy? (1)

getuid() (1305889) | more than 5 years ago | (#26003649)

But the easier it is to obtain datasets like these, the easier it is for anyone to do data mining and correlate the public (presumably non-identified) datasets with any private data they do happen to have.

Yes, but at least now we are all able to do data mining in large databases.

As far as privacy is concerned, we're losing it, one way or the other. The only important question here is whether we are all going to lose our privacy (thus being somehow equal), or whether there are going to be those who will keep most of their privacy (large corporations, politicians etc), while we (the mere consumers) are going to lose most of ours.

Large public databases like Amazon's at least tend to level the field a little, by providing everyone with the same (humongous) amounts of data. They tend to drive towards the "we're all losing a bit of privacy", as opposed to the more scary scenario, where only you get to lose yours and I get to keep mine.

Re:Privacy? (2, Insightful)

dubl-u (51156) | more than 5 years ago | (#26004755)

Yes, but at least now we are all able to do data mining in large databases.

This is absolutely the case.

The web has made vast amounts of information available, so you would think it would play into the "computers will bring about the age of big brother" idea that was so prominent during the 60s. But it hasn't. Instead, because everybody can afford computers and bandwidth, it has distributed power rather than concentrating it.

The rich and powerful already have access to vast datasets, and the computing and human power necessary to mine them. Things like Google and Wikipedia and blogs have given everybody a taste of that power, and I'm in favor of anything that helps level the playing field.

Re:Privacy? (4, Informative)

Frosty Piss (770223) | more than 5 years ago | (#26002909)

The US Census Bureau charges to access much of their datasets.

Re:Privacy? (1)

geekoid (135745) | more than 5 years ago | (#26007101)

I believe they charge cost, which is fine for a government agency.

Re:Privacy? (1)

HiThere (15173) | more than 5 years ago | (#26008861)

Is that still true? It was true in 1960, but it seems to me that their prices have risen and their costs have diminished.

Re:Privacy? (0)

Anonymous Coward | more than 5 years ago | (#26008285)

Which is precisely why I hold a policy of ignoring their requests for data.

Re:Privacy? (1)

JoCat (1291368) | more than 5 years ago | (#26010857)

That seems fine to me.

*badum tisk*

Uhg... (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26002861)

This discovery is not a reasoned conclusion but an inward disclosure. Nevertheless, our logical minds demand an explanation. We want to grasp our interior problem conceptually. Later we may understand that our recognition of life's meaninglessness does not arise from realizing that the human race will someday end or that everything we do will eventually be lost or that all our little goals and projects ultimately add up to nothing. Such 'explanations of meaninglessness' we offer ourselves allow our minds to accept what our spirits already know. Existential meaninglessness is a futile sense that arises from within. There is no rational way to convince ourselves that life is meaningless. Attack our life-illusions as we may, we will only be convinced if deep within ourselves we already sense our futility and emptiness.

Selling EC2 service? (4, Insightful)

bonyari (697573) | more than 5 years ago | (#26002865)

This just looks like a way to sell their cloud computing services. They provide the free data and you provide the monthly service fee.

Re:Selling EC2 service? (2, Insightful)

dubl-u (51156) | more than 5 years ago | (#26004175)

This just looks like a way to sell their cloud computing services. They provide the free data and you provide the monthly service fee.

I'd bet that's not quite how they think about it.

I once had the fortune to work on a small project for a guy who had built a pretty large software company and then sold it. He said that he always looked to do something interesting first, and then figured out how to make it not lose money, because money-losers aren't sustainable.

I don't know anybody at Amazon anymore, but from my pals who did work there, my guess is that AWS has a similar culture: they seek out the useful and interesting, and actually do the ideas they can make pay for themselves.

If they had a culture that was mainly revenue-focused, I'd expect this idea to get shot down, because some penny-pincher would argue that they'd make more money from people uploading duplicates of these giant data sets over and over.

Re:Selling EC2 service? (2, Interesting)

John Hasler (414242) | more than 5 years ago | (#26005091)

> If they had a culture that was mainly revenue-focused, I'd expect this idea to get shot
> down, because some penny-pincher would argue that they'd make more money from people
> uploading duplicates of these giant data sets over and over.

And a clever marketing man would counter that this is an opportunity to achieve lock-in by establishing exclusive access to a large number of datasets. Once people have built large, complex applications that use a number of these datasets in Amazon's environment and format it will be very difficult for them to move elsewhere. To marketing people "community"=="locked-in customers".

Re:Selling EC2 service? (1)

dubl-u (51156) | more than 5 years ago | (#26008887)

And a clever marketing man would counter that this is an opportunity to achieve lock-in by establishing exclusive access to a large number of datasets.

You did note that these are public datasets, yes? And that you access them by mounting them like any other filesystem? Other than transferring a few hundred gig of data to somewhere else, there's no lock-in barriers.

Of course, that pursuit of openness could just mean that they have an extremely clever marketing person. But I've never met one quite that smart. Well, perhaps that's the wrong word. But all the marketing people I've met, even the ones I like a lot, remind me of the seagulls in Finding Nemo. Everything belongs to them. And if it doesn't, it should.

Nope (1)

Slashdot Parent (995749) | more than 5 years ago | (#26008095)

This is just a benefit that they are giving to their customers.

They are storing huge, public, commonly-used datasets for their customers, free of charge. If you are a customer and want to use, say, census data, you don't have to waste your time uploading the data, and you don't have to pay the $0.10/GB to upload the 200GB of data. Amazon already hosts the data for you. You just run a simple command, and the data are now instantly available for you to use however you want.

If you are not an AWS customer, then this service probably will not do you any good. Just download the data from census.gov and be done with it.

Standardization possible (1)

deodiaus2 (980169) | more than 5 years ago | (#26002881)

One good thing is that it will be possible to standardize statistical tests and results against such a common database. One big problem with a lot of statistical analysis is the skewing of data due to insufficient size, vastly different population sample sets, and the presence of colored noise.
Take a simple radix sort algorithm applied to a telephone directory. Radix sort works if the pre-allocation of slots matches the data. One example where it breaks down is if you used a Boston matrix [having large concentrations of Irish names] on San Francisco's population [having clusters of Asian names].
Given the tremendous progress in genomic research, I would be interested in comparing my DNA with Craig Venter's. I guess one drawback might be what I call the white lab mouse issue. White lab mice's DNA is becoming a laboratory benchmark because they are so well studied and are bred for consistency of data by the money-making labs that furnish them. However, white mice are very rare in nature (they are too easily spotted). So, we have an entire industry focused on investigating a marginal population.
However, as my mother likes to joke, we spend more money subsidizing Viagra [probably due to the large white male population of "elected officials"] than we do on Alzheimer's medication, so progress is good!
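
A minimal sketch of the slot pre-allocation point above, in Python. Strictly speaking, what it shows is closer to a bucket sort keyed on the first letter than a textbook radix sort. The name lists and the `allocate_slots` / `fill_slots` helpers are made up for illustration, not taken from any real directory or library.

```python
from collections import Counter

def allocate_slots(reference_names, total_slots):
    """Split total_slots among first letters in proportion to a reference sample."""
    counts = Counter(name[0].upper() for name in reference_names)
    n = len(reference_names)
    return {letter: max(1, round(total_slots * c / n)) for letter, c in counts.items()}

def fill_slots(names, slots):
    """Place each name in its letter's bucket; names that do not fit overflow."""
    buckets = {letter: [] for letter in slots}
    overflow = []
    for name in names:
        letter = name[0].upper()
        if letter in buckets and len(buckets[letter]) < slots[letter]:
            buckets[letter].append(name)
        else:
            overflow.append(name)
    ordered = [n for letter in sorted(buckets) for n in sorted(buckets[letter])]
    return ordered, overflow

# Hypothetical samples, chosen only to make the skew visible.
boston = ["O'Brien", "O'Connor", "O'Neill", "Murphy", "McCarthy", "Sullivan", "Kelly", "Walsh"]
san_francisco = ["Chan", "Chen", "Chang", "Wong", "Wu", "Lee", "Li", "Murphy"]

slots = allocate_slots(boston, total_slots=8)       # sized for the Boston sample
boston_sorted, boston_overflow = fill_slots(boston, slots)
sf_sorted, sf_overflow = fill_slots(san_francisco, slots)

print("Boston overflow:", boston_overflow)
print("San Francisco overflow:", sf_overflow)
```

With slots sized from the first sample, that sample fits exactly, while most of the second sample lands in the overflow list and would need a slower fallback. A shared reference dataset would at least let everyone size their slots against the same distribution.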

headline correction (0)

Anonymous Coward | more than 5 years ago | (#26002941)

Amazon Launches Public Data Sets To drive sales of their AWS.

Catch 22 (3, Insightful)

Anonymous Coward | more than 5 years ago | (#26002943)

Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
That is, you can't just download the data off their sites, which would be the nice thing to do.
As such, this article is nothing more than a slashvertizement.

What's the license? (2, Informative)

SanityInAnarchy (655584) | more than 5 years ago | (#26004389)

You'll recall that Amazon's "cloud computers" (ugh) are by the hour, and are pretty much root access to a VM. Unless there's a specific legal reason you can't, it's always possible to just download the data -- you'd just pay a bit for the time that instance must be up, and for the data transferred.

However, for those of us who already are using EC2, it's nice to not have to download the whole set -- which can be terabytes, for some of these -- and instead be able to simply mount it from wherever it is and work with it right away. Especially when you consider the cost of downloading terabytes worth of data from Amazon's web services, at 17 cents per gigabyte -- reasonable, but still probably more than you wanted to just query the stuff.

I suspect, also, that at least some of these will be made available via a web service of some sort, maybe even free, by some of those people using that service.
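
To put rough numbers on the download cost mentioned above, here is a back-of-the-envelope sketch in Python. It assumes a flat per-gigabyte rate (the 17 cents quoted above) and 1 TB = 1024 GB; the dataset sizes are hypothetical, and actual pricing may be tiered.

```python
def transfer_cost_usd(size_gb, rate_per_gb):
    """Flat per-gigabyte pricing: cost = size * rate."""
    return size_gb * rate_per_gb

EGRESS_RATE = 0.17  # USD per GB, the figure quoted in the comment above

# Hypothetical dataset sizes: 200 GB, 1 TB, 5 TB
for size_gb in (200, 1024, 5 * 1024):
    cost = transfer_cost_usd(size_gb, EGRESS_RATE)
    print(f"{size_gb:>5} GB -> ${cost:,.2f}")
```

Under those assumptions a single terabyte comes to roughly $174, which is why mounting the data in place is attractive if you only want to query it.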

Re:What's the license? (1, Informative)

Anonymous Coward | more than 5 years ago | (#26005265)

These data must be public to begin with, or they wouldn't host them:

If you have a public domain or non-proprietary data set that you think is useful and interesting to the AWS community [...] You must have the right to make the data freely available.

(How to share a public data set on AWS [amazon.com] )

So I guess there isn't even a license. Free as in go grab 'em.

Re:Catch 22 (3, Interesting)

dubl-u (51156) | more than 5 years ago | (#26004803)

Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
That is, you can't just download the data off their sites, which would be the nice thing to do.

And you know what you can do with a cloud computer, my little rocket scientist? You can set up a frickin' web server. And then you can download anything your precious heart desires.

Patent pending... (4, Funny)

owlnation (858981) | more than 5 years ago | (#26003001)

Expect a new slew of Amazon patents...

"1-Sick" -- Health Data
"1-Mick" -- Irish Census Data
"1-Dick" -- Porn Movies Database
"1-Lick" -- Lesbian Porn Movies Database
"1-Fick" -- German Porn Movie Database
"1-Hick" -- The George W. Bush Presidential Library catalog.
"1-Kick" -- Pharmaceutical Index
"1-Nick" -- Crime Data
"1-Prick" -- Copyright Law Legal database
"1-Trick" -- List of iKea-nu Reeves Movies.
"1-Tick" -- Camping Places Data set.
"1-Brick" -- The Lego Catalog.
"1-Thick" -- Obesity Index.

Re:Patent pending... (1)

syphax (189065) | more than 5 years ago | (#26003911)

Oh, for mod points today. Well done (ethnic slurs aside).

Torrents? (1)

starseeker (141897) | more than 5 years ago | (#26003051)

Can someone make these datasets available to download as aggregate torrents, or are they available only once someone writes an application and gets it working on AWS?

With hard drives getting bigger and bigger, to me it makes sense to have lots and lots of local mirrors of this sort of data.

Re:Torrents? (1)

dubl-u (51156) | more than 5 years ago | (#26004939)

Can someone make these datasets available to download as aggregate torrents[...]?

That someone could be you! Steps to making this happen:

  1. get an AWS account
  2. start up a virtual machine running Linux
  3. install the torrent software of your choice
  4. poke a hole in the firewall
  5. Attach the dataset of your choice as a block device
  6. Mount the block device
  7. make a torrent
  8. upload the torrent to one of the public trackers
  9. download the torrent to your home computer, and get a couple others to do the same
  10. once you've got a few seeds, shut down the virtual server

If you're serious about doing this and need advice on dealing with the Amazon Web Services stuff, feel free to drop me an email at the address in my profile. If it's one of the smaller datasets, like the 200 GB of the US Census, I'll even be one of the seeders.

Re:Torrents? (1)

Slashdot Parent (995749) | more than 5 years ago | (#26007861)

Can someone make these datasets available to download as aggregate torrents, or are they available only once someone writes an application and gets it working on AWS?

Which part of the word "public" did you fail to understand? Of course you can download the datasets and do whatever you want with them. You can download them from Amazon, or you can download them from the original source.

AWS is just hosting them locally, free of charge as a courtesy to their customers. If you want to access/analyze census data as part of your application, the entire 200GB is available to you instantly with just a few command line calls. You don't have to upload or pay to store the data on AWS yourself.

If you are not an AWS customer and don't want to become one, then by all means just download the data from the Census Bureau, yourself. No one is trying to hide public data from you.

Gnomes (1)

Falkkin (97268) | more than 5 years ago | (#26003097)

Am I the only person who read "to upload large (non-confidential) data sets to Amazon -- things like census data, gnomes, etc --"?

Re:Gnomes (1)

truthsearch (249536) | more than 5 years ago | (#26005119)

Yes... yes you are.

You may want to seek help. :)

MP3 store worldwide? (1)

PARENA (413947) | more than 5 years ago | (#26003125)

[offtopic]Great, but how about they start hurrying up with making their MP3 downloads available worldwide? And update their mp3 album downloader for openSUSE (which is for 10.3)? Those 2 things would make me an Amazon customer.

History repeats (2, Funny)

sukotto (122876) | more than 5 years ago | (#26003375)

>users pay only for the compute and storage they
>consume with their own applications

Everything old is new again!
Ah the good old days... when you had to PAY for cycles.... not like the young whippersnappers today with their "desktops" and "laptops" and more cycles than they know what to do with.

Sounds good (0, Offtopic)

horza (87255) | more than 5 years ago | (#26003397)

How many developers here have had to hunt around for a list of countries to populate a select box? Or chained select boxes for country -> county/state -> town? How nice would it be to have a central repository where you can download it all (in any mixed selection of languages) in csv/xml/etc?

Phillip.

Re:Sounds good (0)

Anonymous Coward | more than 5 years ago | (#26005351)

And one DDoS attack on Amazon later and all ship-to or bill-to web forms around the world cease to function temporarily...Also, I'd rather have a minute fraction of my users be disenfranchised when a newly created zip code isn't recognized by my system rather than pay these guys.

Sounds like "Give us data so we can charge you" (4, Insightful)

Morgaine (4316) | more than 5 years ago | (#26003503)

If the uploaded data is not available for download, but is only available to AWS applications running on Amazon's (paid for) compute service, then Amazon deserves nothing but contempt and an "Up yours" for this.

It seems that working for a living is out of fashion at Amazon. They expect people to supply them with resources so that they can charge them and others for their use. It's creative business bullshit, and not even remotely funny.

Amazon, how about you PAY BACK for the privilege of having the datasets uploaded to you by hosting them freely for the Internet community, and only on the back of that you charge for local, higher-speed access by AWS applications? Or would that be too "fair" for an Amazon business practice?

Re:Sounds like "Give us data so we can charge you" (2, Interesting)

cecille (583022) | more than 5 years ago | (#26004019)

Oh, agreed...it's totally a business move for them, wrapped in the veneer of a good deed. On the other hand...if this is implemented correctly, it could be amazing. I say this as a researcher who has spent more time than necessary gathering data sets. Just as a quick (and painful) example...during my Master's degree, I was doing CI research for a hearing aid application. Without boring you with the details, the idea was to create a system to classify the audio background environment so it could be more effectively removed. For this, I needed a large set of ~1-sec clips of background noise with as much variety as possible. I didn't want to use what we normally call a "toy" data set because this was intended to be actually used. So I wanted variety, but I also wanted combo sounds - it's easy to tell a highway from a room of people, but what about a cityscape, with cars AND people AND a bah-zillion other sounds. Anyway, the result was that I spent MONTHS in a sound booth splitting audio files and listening to EACH 1-sec clip individually and recording exactly what sounds were in the clip and then parsing audio features. It SUCKED.

Anyway, now that it's done, putting something like this on Amazon would be great (if I had the rights to the original clips). Not only would it save someone else the work, but researchers would be using a real, tough data set. Plus, it might get corrections (no way I didn't make at least a few mistakes in all those clips), and it might get added to (there are so many different sounds in this world, no way is this data set complete). Alternately, if I was a researcher now and I got my hands on this, it would save months of work, months of pay to an RA, a semester's tuition, even if I did have to pay for cycles.

On the other hand, I think there are a few places that do this, possibly for free. I want to say...Wolfram maybe? Plus, there's specialty ones. I think there's a big facial recognition set etc.

Re:Sounds like "Give us data so we can charge you" (1)

Bender0x7D1 (536254) | more than 5 years ago | (#26005533)

I'm not trying to troll, and I assume I'm missing something, but why not put a microphone on and walk around collecting background noise?

If there isn't a technical reason this wouldn't work, you could keep track of what you are hearing and it would be much easier to categorize. Being a linear recording from a single source, you could tag 2:43-3:12 as "bus, traffic, swearing cab driver" without having to do it for 29 separate clips.

Re:Sounds like "Give us data so we can charge you" (1)

cecille (583022) | more than 5 years ago | (#26005919)

Mostly for transient sounds. E.g., if there's a bird chirp in one clip, it might not be in others. Ditto for stuff like clinking dinnerware, footsteps, a person laughing, etc. The base sounds are constant; it's the stuff on top that's a pain.

Re:Sounds like "Give us data so we can charge you" (4, Informative)

dubl-u (51156) | more than 5 years ago | (#26004555)

If the uploaded data is not available for download, but is only available to AWS applications running on Amazon's (paid for) compute service, then Amazon deserves nothing but contempt and an "Up yours" for this.

Seriously? Or did somebody just put sand in your pancakes this morning?

As an AWS user, I think this is great. It means I don't have to waste time and money copying over a public dataset. When I read about this I fired up a virtual Linux box, attached the census data as /dev/sdb, and spent a couple hours rummaging. Total cost: $0.70. If I had had to copy everything over first, it would have been $20 in bandwidth, plus a long time waiting for the 200 GB to transfer.

You realize that these datasets are public, right? For the census one, you can already download it for free [census.gov] . Do you want Amazon to make it extra-super-free or something?

I presume it's the same for the others. But if not, you should put your money where your very active mouth is. It would take maybe 15 minutes work to get an Amazon server up and running, attach all the public datasets, and set up a web server.

I'm so very tired of people who say "somebody should do X!" but aren't willing to be that somebody.

Re:Sounds like "Give us data so we can charge you" (1)

geekoid (135745) | more than 5 years ago | (#26007149)

A) Home bandwidth is a sunk cost. Transferring it wouldn't have cost you more than a penny more than you are paying. Assuming you pay a flat rate.

B) Transferring the data would have made it available to you for free, anytime.

Re:Sounds like "Give us data so we can charge you" (2, Insightful)

Slashdot Parent (995749) | more than 5 years ago | (#26008039)

A) Home bandwidth is a sunk cost. Transferring it wouldn't have cost you more than a penny more than you are paying. Assuming you pay a flat rate.

My time is not a sunk cost.

B) Transferring the data would have made it available to you for free, anytime.

Most of these datasets are hundreds of GB in size. That's going to take a long time to download and it's going to mean buying a new hard disk and/or deleting your pornography collection.

The whole idea here is that if you are an AWS customer, and you're crunching a bunch of numbers, and need to crunch some census/genome/whatever data, you can type 'ec2-create-volume --snapshot <snapshotId>' and now that dataset can be attached to any EC2 instance. You don't have to wait to transfer the data in, and you don't have to pay the $0.10/GB to transfer the data in. The data sets are there for you when you need them.

If you are not an AWS customer, then this isn't for you. Move along, now.
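
For a sense of how long "a long time to download" is, here is a rough estimate in Python. The link speeds are assumptions picked for illustration, sizes are treated as decimal gigabytes, and protocol overhead is ignored unless you lower the hypothetical `efficiency` parameter.

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Hours needed to move size_gb over a link of link_mbps megabits per second."""
    bits = size_gb * 8 * 1000**3                       # decimal gigabytes -> bits
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 3600

# 200 GB (the census set size quoted in this thread) over assumed link speeds.
for mbps in (10, 100):
    print(f"200 GB at {mbps} Mbit/s: about {transfer_hours(200, mbps):.1f} hours")
```

At an assumed 10 Mbit/s that works out to nearly two days of continuous transfer, versus attaching an already-hosted volume and starting work right away.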

Re:Sounds like "Give us data so we can charge you" (0)

Anonymous Coward | more than 5 years ago | (#26011941)

As an AWS user, I think this is great.

Of course you do, but just because it's great for you, this shouldn't stop you from seeing the principles involved.

This is public data, and because AWS users like yourself will benefit greatly from its nearness, this means that Amazon will profit by gaining more AWS users. Now think about it a bit: they've done nothing (storing data qualifies as nothing these days), yet are profiting nicely from it.

That's where the unfairness comes in. When benefiting from a public resource, the least they could do is to help the originators of the data and the potential public users of the data by offering a public mirror.

It's all take and no give, which was the point made in the parent.

Anyone asking about the data quality / validity? (1)

ITShaman (120297) | more than 5 years ago | (#26003633)

So Amazon says: We'll host the raw data for your study! I say: who vouches for the validity of the data set itself? I understand that some of the sets are already publicly available, but that doesn't mean all. Will Amazon provide information on who/where the datasets came from? If I can't trust my data set, then I can't trust my results...

Re:Anyone asking about the data quality / validity (1)

dubl-u (51156) | more than 5 years ago | (#26004605)

I say: who vouches for the validity of the data set itself? I understand that some of the sets are already publicly available, but that doesn't mean all.

Did you even look at their public data sets page [amazon.com] ? Every one of the public data sets they've listed has source information. They are already, at this very moment, providing information on who/where they came from.

The one I looked at also had extensive sets of README files explaining the source and format of the data, and I'd imagine it's true for the others as well.

"No charge"? (1)

John Hasler (414242) | more than 5 years ago | (#26004999)

How is there "no charge to the community" when the data is accessible only to paying Amazon customers? I have no objection to them doing this, but the hype is a bit much. I guess the only "community" that matters to Amazon is the one consisting of Amazon customers.

AOL (1, Funny)

Anonymous Coward | more than 5 years ago | (#26005173)

I wonder if AOL will be submitting any "non-confidential" data...

All the genomic stuff is already free to download (1, Informative)

Anonymous Coward | more than 5 years ago | (#26005467)

plus much, much more at:
http://genome.ucsc.edu/

http://www.ensembl.org/index.html

this is just a way to access it from amazon compute cloud.

How to analyze and display survey data (1)

totierne (56891) | more than 5 years ago | (#26013495)

I have been promised some simple survey data (bipolar survey) - what is the best way to analyze and display - I am into java/database/oracle and would like to use business intelligence techniques to test growing my career that way.

I realize there are lots of ways to do this, most of which would increase my skillset.
         

Best way to analyze these public data sets? (1)

totierne (56891) | more than 5 years ago | (#26013507)

OK, so you have the public data from Amazon - how do you analyze it? (I realize the problem here is the number of options available rather than being constrained to one path.)
