US Census Bureau Offers Public API For Data Apps

samzenpus posted more than 2 years ago | from the people-in-your-neighborhood dept.

United States 47

Nerval's Lobster writes "For any software developers with an urge to play around with demographic or socio-economic data: the U.S. Census Bureau has launched an API for Web and mobile apps that can slice that statistical information in all sorts of nifty ways. The API draws data from two sets: the 2010 Census (statistics include population, age, sex, and race) and the 2006-2010 American Community Survey (offers information on education, income, occupation, commuting, and more). In theory, developers could use those datasets to analyze housing prices for a particular neighborhood, or gain insights into a city's employment cycles. The APIs include no information that could identify an individual."
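For anyone wanting to try it, here is a minimal sketch in Python of what a request to such an API might look like. The base URL, the variable code `P001001` (assumed here to be total population), and the `YOUR_KEY` placeholder are assumptions for illustration and should be checked against the Census Bureau's developer documentation. The response is assumed to be a JSON array of rows whose first row holds the column names.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for the 2010 Census Summary File 1 dataset; verify before use.
BASE_URL = "https://api.census.gov/data/2010/dec/sf1"

def build_query(variables, geography, key=None):
    """Return a request URL asking for `variables` at the given geography."""
    params = {"get": ",".join(variables), "for": geography}
    if key:
        params["key"] = key
    # Keep commas, colons and asterisks readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=",:*")

def rows_to_dicts(payload):
    """Turn the assumed [header, row, row, ...] JSON layout into dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]

url = build_query(["P001001", "NAME"], "state:*", key="YOUR_KEY")

# Hand-written stand-in for a response body; the figures are made up.
sample = '[["P001001","NAME","state"],["100","Example State","99"]]'
records = rows_to_dicts(sample)
```

The sketch only builds the URL and parses a stand-in response, so it runs without network access; fetching the URL with an HTTP client of your choice would be the remaining step.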

Finally proof! (0)

Anonymous Coward | more than 2 years ago | (#40820437)

That the government is watching and monitoring us. Now we shall watch them watching us!

Re:Finally proof! (1, Interesting)

krovisser (1056294) | more than 2 years ago | (#40820457)

But they promise to never misuse [wikipedia.org] that data, remember!

political power advantage (2)

fluffythedestroyer (2586259) | more than 2 years ago | (#40820477)

the gov can use this data for themselves in the campaign. With demographic info you can finally manage your campaign more effectively.

Re:political power advantage (-1)

Anonymous Coward | more than 2 years ago | (#40820541)

yes but terrorists can use this type of information to locate population centers and high value targets

Re:political power advantage (1)

boneglorious (718907) | more than 2 years ago | (#40820685)

Yes, or they could use one of the many widely available population density maps. This is going to make lots of cool things easier, and I really doubt it's going to make terrorist plotting much easier than it already is.

Re:political power advantage (5, Insightful)

fiannaFailMan (702447) | more than 2 years ago | (#40820687)

yes but terrorists can use this type of information to locate population centers and high value targets

You know what? Let's camouflage our cities in case the Jihadists find out where people live.

FFS.

Either there's a joke that I'm not getting or some people are sad sad individuals who see terrorists under every bed. It's like when that tall viaduct was built in France people here were posting "b ... b ... but won't terrorists want to blow it up?" It was the same every time something big or tall was built. You know what I say? Get. The. Fuck. Over. Yourselves.

You'd think terrorism was invented in 2001 to listen to some people. What do you think the rest of the world has been putting up with for decades? The Brits were having their town and city centres blown to pieces long before most USAians even heard of terrorism. But you know what? They didn't let it govern their lives. They carried on shopping in their town and city centres. They carried on building tall buildings out of glass. They carried on riding on trains at 110MPH. They carried on cramming into crowded buses and underground train systems. They didn't become a crowd of pathetic little scaredycats who lock themselves in the room and didn't move for fear that the terrorists would get them. Sure they took a few precautions (like stopping the left luggage service in train stations and removing trash cans from airports) but they didn't become paranoid neurotic wrecks.

Re:political power advantage (1)

sed quid in infernos (1167989) | more than 2 years ago | (#40820841)

They carried on shopping in their town and city centres. They carried on building tall buildings out of glass. They carried on riding on trains at 110MPH. They carried on cramming into crowded buses and underground train systems.

Let's see, do Americans shop in town and city centers? Yes. Do they build tall buildings out of glass? Yes. Do they ride on trains at 110 MPH? Yes, although only between Boston and D.C., because trains here generally suck. Do they cram into crowded buses and underground train systems? Yes.

Re:political power advantage (2)

vlm (69642) | more than 2 years ago | (#40821111)

OK how about commentary on this accurate part:

become paranoid neurotic wrecks.

Re:political power advantage (0)

Anonymous Coward | more than 2 years ago | (#40821377)

Outside of a few racists, which I understand Europe has its share of as well (which country is it that banned minarets again?), how is that accurate?

Re:political power advantage (0)

Anonymous Coward | more than 2 years ago | (#40821469)

which country is it that banned minarets again?

The one banning yelling from a tower with a half moon also banned ringing bells on towers with a cross on it.
It's just public nuisance.
If their members are too dumb to read a fucking clock, send them SMSes but let us sleep.

Re:political power advantage (1)

sed quid in infernos (1167989) | about 2 years ago | (#40827773)

Since the parts I called out as erroneous were the support for this phony conclusion, I did.

Re:political power advantage (1)

fiannaFailMan (702447) | more than 2 years ago | (#40821663)

Yeah, armed to the teeth. Where else in the world are people so paranoid that they feel the need to carry guns everywhere?

Re:political power advantage (1)

Anonymous Coward | more than 2 years ago | (#40822765)

Yeah, armed to the teeth. Where else in the world are people so paranoid that they feel the need to carry guns everywhere?

Switzerland? Ok, maybe they don't carry them around everywhere. But most Americans would consider giving every young man a machine gun and a pack of ammo on their 18th birthday to be a little bit crazy. Because that is what Switzerland does, just in case France decides to invade.

Re:political power advantage (1)

fiannaFailMan (702447) | more than 2 years ago | (#40823479)

BS. Only in America can you walk into a Wal Mart and buy a machine gun.

Re:political power advantage (1)

sed quid in infernos (1167989) | about 2 years ago | (#40827739)

Nowhere in America can you walk into Wal-Mart and buy a machine gun.

Re:political power advantage (2)

gknoy (899301) | more than 2 years ago | (#40822193)

Any terrorist that has to use the US Census Bureau to find a population center, as opposed to buying a map from AAA or using Wikipedia, is an idiot.

Re:political power advantage (1)

fiannaFailMan (702447) | more than 2 years ago | (#40820563)

the gov can use this data for themselves in the campaign.

What campaign?

Re:political power advantage (1)

fluffythedestroyer (2586259) | more than 2 years ago | (#40820613)

unless you live under a rock, the next one that's coming. What other campaign...

Re:political power advantage (1)

fiannaFailMan (702447) | more than 2 years ago | (#40820697)

Oh you mean the administration. When I hear "the government" in the USA it generally refers to institutions.

Re:political power advantage (-1)

Anonymous Coward | more than 2 years ago | (#40820843)

I think he is referring to the inevitable, Leftist campaign where the police and soldiers will go door-to-door collecting guns from all the Christians so that our Muslim, Kenyan, NAZI, President can introduce the Sharia Law that the Atheists have been begging for.

I know it's true, Glenn Beck says so.

Re:political power advantage (1)

fluffythedestroyer (2586259) | more than 2 years ago | (#40821087)

The RFA will stop that for sure lol

Re:political power advantage (1)

fluffythedestroyer (2586259) | more than 2 years ago | (#40821081)

that too. any group of more than 2 people that needs data to get more power or any type of advantage. I assure you this api will get raped by the gov, institution, org...any group in fact :)

Re:political power advantage (1)

Anonymous Coward | more than 2 years ago | (#40820651)

You do know that 'the gov' does not run the political campaigns, right? And that this information has been available for over 200 years (but not with a fancy API, so that makes all the difference)?

Riiiiight (-1)

killmenow (184444) | more than 2 years ago | (#40820731)

The APIs include no information that could identify an individual.

They keep using that word. I do not think it means what they think it means.

Re:Riiiiight (0)

Anonymous Coward | more than 2 years ago | (#40820787)

That's just what I was thinking. Whether or not the data contains any personal information is the real question because if there is any then someone will figure out how to get at it.

Re:Riiiiight (0)

Anonymous Coward | more than 2 years ago | (#40820985)

OK, if you say so. BTW - what is the exploit for getting all the personal information out of the IRS database? Or the SSA? Or banks? Or insurance companies? Or credit card companies?

Believe it or not, many databases containing sensitive information are designed and run by professionals, using professional software and good security practices. Not all databases are run on MySQL administered by some high-school kid.

Re:Riiiiight (0)

Anonymous Coward | more than 2 years ago | (#40821603)

BTW - what is the exploit for getting all the personal information out of the IRS database? Or the SSA? Or banks? Or insurance companies? Or credit card companies?

None of those have public-facing APIs, smartass.

Re:Riiiiight (0)

Anonymous Coward | more than 2 years ago | (#40821705)

Really? I could have sworn I could file my taxes and see my tax refund status online. And I am pretty sure I can see all my banking activity online (and download it to financial software). Same with credit cards. And those services are every bit as much of an API as the http requests that the census API uses are.

Even when it doesn't it does. (1)

Ungrounded Lightning (62228) | more than 2 years ago | (#40825201)

Whether or not the data contains any personal information is the real question because if there is any then someone will figure out how to get at it.

Even when personal information "has been stripped" it can be rediscovered in various ways.

For instance: At the start of WWII, US authorities used census data to round up people of Japanese descent. They didn't have the individuals' names. But they had the number on each block. So they just raided until they had accumulated that number.

Re:Riiiiight (2)

vlm (69642) | more than 2 years ago | (#40821079)

For those who don't get it,

(statistics include population, age, sex, and race)

(offers information on education, income, occupation, commuting, and more).

Treat it as a multidimensional data source. So you figure out who someone is using perhaps 6 factors, then you've got the unknown data for the other 1315 data points.

I almost got in quite a bit of trouble at a previous employer by pointing out that a publicly distributed, incredibly detailed analysis of an "anonymous" corporate employee attitude survey meant it was completely 100% non-anonymous. So... 100% of 25 year old engineers who are white single males who drive a red car and have an Irish girlfriend and live in an apartment and commute to work between 4 and 8 miles and have a five digit /. UID responded that their boss was a 5/10 at leadership, or whatever. Sure... that's perfectly anonymous.

It wasn't quite that ridiculous but pretty darn close. As I recall they "de-anonymized" it by providing 5 year age brackets and 1 year (yikes) hiring date brackets, and job titles. It was quite sufficient to identify the exact responses of each person. The funny part was once the word got out employees would read the responses of other people... oh so Rachel in purchasing said that her boss was a complete... You get the idea.

Frankly I was more insulted that they thought we were stupid enough not to understand they were lying despite giving us complete evidence, than I was insulted that they lied to us by calling it anonymous. They had no shortage of suckiness.

They were even stupid enough to pretend it was anonymous and run it year after year, at least until I left. Needless to say everyone lied like a carpet after the first debacle.
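The re-identification risk described above can be sketched in a few lines of Python: count how many survey rows become unique once several attributes are combined. The rows and column choices below are invented for illustration and are not taken from any real survey.

```python
from collections import Counter

# Made-up survey rows: (department, title, hire_year, age_bracket).
responses = [
    ("engineering", "engineer", 2008, "25-29"),
    ("engineering", "engineer", 2008, "25-29"),
    ("engineering", "manager", 2005, "40-44"),
    ("purchasing", "buyer", 2010, "30-34"),
    ("purchasing", "buyer", 2007, "30-34"),
]

def unique_fraction(rows, columns):
    """Fraction of rows whose values in `columns` match no other row."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)

# Department alone identifies nobody in this toy data set...
by_department = unique_fraction(responses, [0])
# ...but all four attributes together single out most respondents.
by_all_fields = unique_fraction(responses, [0, 1, 2, 3])
```

In this toy data nobody is unique by department alone, while three of the five rows are unique once all four fields are combined.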

Re:Riiiiight (2)

FranTaylor (164577) | more than 2 years ago | (#40821117)

all of this data has always been available to those who ask for it

they have just made it easier for people to get at it

what is your complaint again?

Re:Riiiiight (1)

vlm (69642) | more than 2 years ago | (#40821373)

That would be the "they have just made it easier for people to get at it" part

Making it easier to mush databases together to gather inappropriate levels of personal data.

Re:Riiiiight (1)

PRMan (959735) | more than 2 years ago | (#40821151)

I always lied on company-given anonymous surveys. What are they going to do? Call you on it?

That's not how it works (4, Informative)

langelgjm (860756) | more than 2 years ago | (#40821491)

That's not how Census information is either collected or stored. First off, there are two different data sources at issue - the decennial census, which gathers a very limited set of information on (theoretically) every person in the country, and the American Community Survey, which uses sampling to get estimates on a much wider range of information. You cannot link those two datasets, since the only public factors they share are far too broad - e.g., age, race, sex, etc., and the time periods during which they are conducted are totally different.

Besides, the information is not released at person-level. The lowest level you can get sampled information at (e.g., the detailed ACS stuff) is the "block group", which on average contains 39 blocks. You can get decennial census information at the block level, and a "block" may correspond to a city block, or a much larger area for lesser-populated areas.

So, you can find some interesting information about your city street (I've looked up my own, and found the number of people living alone, owning/renting, age, sex, etc. for the 24 houses on my block), but these data are not per person, they are per block - in other words, if there is only one Native American living on my street, I cannot then find out whether they are owning/renting. I can only find out the number of renters on the entire block.
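A rough Python sketch of what "totals per geographic unit" means in practice: block counts summed into block groups. It assumes the common 15-digit block identifier layout (state, county, tract, block), in which the first 12 digits name the block group; the IDs and counts are invented.

```python
from collections import defaultdict

# Hypothetical block-level counts keyed by 15-digit block ID:
# state (2) + county (3) + tract (6) + block (4). Values are invented.
block_population = {
    "990010001001000": 24,
    "990010001001001": 31,
    "990010001002000": 12,
}

def roll_up_to_block_group(blocks):
    """Sum block counts into block groups (first 12 digits of the block ID)."""
    totals = defaultdict(int)
    for block_id, count in blocks.items():
        totals[block_id[:12]] += count
    return dict(totals)

block_groups = roll_up_to_block_group(block_population)
```

Nothing in either table is a row about a person; each value is already a total for an area.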

Re:That's not how it works (2)

killmenow (184444) | more than 2 years ago | (#40821905)

So? (1)

langelgjm (860756) | more than 2 years ago | (#40822291)

That paper points out that three factors - DOB, place, and gender - are often enough to uniquely identify a person. How is that relevant to Census Summary File information, or ACS information?

I think the point that many people misunderstand is that Census/ACS public information is not a database where each row represents one response, and some data items have been withheld. It's not at all like that - it's aggregate totals for geographic areas of varying sizes. That row-by-row information is not made public until 72 years after the census (remember the news about info from the 1940 census being made public?).

The real issue that paper highlights is the state-level legislative mandates regarding information collection. I'm not denying you could link a voter registration list to state-collected health data, like the author does in that paper, but that fact has nothing to do with the data made accessible by this API (not to mention these data were already online, just in a more complicated format).

Re:That's not how it works (1)

iiii (541004) | more than 2 years ago | (#40822527)

That's right, and even at the block level data may be swapped around between blocks or obfuscated in other ways that protect individuals while still keeping the data accurate at an aggregate level. I know it is easy to be concerned about this when looking at it for the first time, but Census has been seriously working for years on how to protect confidentiality while releasing quality data at as low a level as possible.

The Census site has a little info about this: http://www.census.gov/privacy/data_protection/statistical_safeguards.html [census.gov]

But more relevant is this link to the American Statistical Association, which goes into significant depth on the techniques used to protect confidentiality: http://www.amstat.org/committees/pc/index.html [amstat.org]

On this page http://www.fcsm.gov/working-papers/spwp22.html [fcsm.gov] we find a working paper from the Federal Committee on Statistical Methodology, which has deeper details on actual operations.

From that page, the "Statistical Disclosure Limitation: A Primer" document has an interesting section defining inferential disclosure - "occurs when individual information can be inferred with high confidence from statistical properties of the released data."

And the "Current Federal Statistical Agency Practices" describes the multi-dimensional linear programming used to prevent that, along with other techniques including geographic thresholds, population thresholds and coarsening.

So the summary is: Yes, it is a serious issue to be concerned about, but Census is taking it seriously, applying some real science and math to it, and it looks like they are doing a good job.
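Two of the simpler techniques named above, population thresholds and coarsening, can be sketched in Python as follows. This is an illustration only: the threshold of 5 and the 5-year bracket width are arbitrary assumptions, not the Census Bureau's actual rules, and the counts are invented.

```python
def suppress_small_cells(table, threshold=5):
    """Replace counts below `threshold` with None so they are not published."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}

def coarsen_age(age, width=5):
    """Map an exact age to a bracket label such as '25-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

counts = {"renters": 17, "owners": 6, "vacant": 2}
published = suppress_small_cells(counts)   # the 'vacant' cell is withheld
bracket = coarsen_age(27)                  # exact age becomes a range
```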

Re:That's not how it works (1)

vlm (69642) | about 2 years ago | (#40827039)

OK so am I correct in stating the TLDR version is Census dept doesn't release data containing logical AND statements? Or in SQL we'll let you have exactly one "GROUP BY" clause, sorta?

So in an example above, the census will release the number of native americans on a block and a separate table of number of renters, but would never respond to a query of "NAs AND renters" because that might get a response table of single digit rows etc. That seems reasonable.

It's a PR campaign disaster to describe that as "a" data source and list dozens of columns when it turns out to actually be dozens of data sources each with "a" column.

It doesn't help the PR thing in that few people use these datasets while probably 100 times as many people do genealogy and are VERY familiar with the full data dump after 70 years or whatever (most recently, the complete 1940 dataset has just been released for my state... many hours recently burned on ancestry.com).

I feel the pain of those census dept guys because journalists and PR people also completely F up everything I'm into that requires more than a 3rd grade education... I should have guessed it was something like this...

My employer broke the "no ANDs" rule because every dataset they released from our morale survey was "department" AND "title" AND "hire year" AND "approx age" which makes it trivial to identify individuals. But if these census guys don't do "ANDs" then it's not an issue.
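The "one GROUP BY" reading above can be made concrete with Python's built-in sqlite3 module. The table, column names, and rows are hypothetical; the point is only the difference between one-dimensional tables and a cross-tabulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (block TEXT, race TEXT, tenure TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("B1", "white", "owner"),
        ("B1", "white", "renter"),
        ("B1", "native american", "renter"),
        ("B1", "white", "owner"),
    ],
)

# One dimension at a time: the kind of table the comment describes.
by_race = conn.execute(
    "SELECT race, COUNT(*) FROM person GROUP BY race ORDER BY race"
).fetchall()
by_tenure = conn.execute(
    "SELECT tenure, COUNT(*) FROM person GROUP BY tenure ORDER BY tenure"
).fetchall()

# The cross-tabulation ("race AND tenure") exposes cells of size one.
cross = conn.execute(
    "SELECT race, tenure, COUNT(*) FROM person "
    "GROUP BY race, tenure HAVING COUNT(*) = 1 ORDER BY race, tenure"
).fetchall()
conn.close()
```

From `by_race` and `by_tenure` alone you learn that one Native American and two renters live on the block, but not whether they are the same person; only the `cross` query reveals that.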

Re:That's not how it works (1)

iiii (541004) | about 2 years ago | (#40834103)

Actually you *can* do that kind of multi-dimensional filtering, equivalent to multiple AND statements followed by a GROUP BY. There are different data sets here, with different usage models. Perhaps most interesting is the Public Use Microdata Sample (PUMS). Docs here: http://www.census.gov/acs/www/data_documentation/public_use_microdata_sample/ [census.gov]

PUMS contains records representing individual responses to the American Community Survey (ACS). These individual responses include detailed data including housing data (# rooms, heating fuel, property value, mortgage, age of house, etc) and personal data (family income, vehicles, employment status, # children, language spoken, etc). Now, ACS is a sample, not a full enumeration like the decennial census, but the sampling is done carefully in an attempt to be representative. Full record definition here: http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMS_Data_Dictionary_2006-2010.pdf [census.gov]

Back to the confidentiality question: this detailed data is carefully altered to protect individual privacy while still being correct at an aggregate level. Here's what the site says about this protection:

"As required by federal law, the confidentiality of ACS respondents is protected through a variety of steps to disguise or suppress original data while making sure the results are still useful. The first means of protecting is the suppression of all personal identification, such as name and address, from each record. In addition, a small number of records are switched with similar records from a neighboring area or receive another collection of characteristics developed by using a modeling technique. Age perturbation is one example of procedures that disguise original data by randomly adjusting the reported ages for a subset of individuals. The answers to open-ended questions, where an extreme value might identify an individual, are top-coded. Top coded questions include age, income, and housing unit value. In addition to modifying the individual records, respondents' confidentiality is protected because only large geographic areas are identified in the PUMS."

Re:Riiiiight (0)

Anonymous Coward | more than 2 years ago | (#40824217)

The census bureau has requirements for the sizes and regularity of groups to avoid inference. With special tabulations, they do pre- and post-processing that manipulates the input and output to scrub inference vectors. They know what they are doing.
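Top-coding and age perturbation, both named in the PUMS description quoted earlier in the thread, are examples of this kind of processing. Here is a toy Python sketch; the cap, the share of records altered, and the size of the shift are made-up parameters, not the Bureau's actual procedure.

```python
import random

def top_code(value, cap):
    """Report any value above `cap` as the cap itself."""
    return min(value, cap)

def perturb_ages(ages, share=0.1, max_shift=2, seed=0):
    """Randomly shift the ages of roughly `share` of the records."""
    rng = random.Random(seed)
    shifts = [d for d in range(-max_shift, max_shift + 1) if d != 0]
    result = []
    for age in ages:
        if rng.random() < share:
            age = max(0, age + rng.choice(shifts))
        result.append(age)
    return result

incomes = [42_000, 85_000, 1_250_000]
capped = [top_code(x, 500_000) for x in incomes]   # extreme value is hidden
ages = perturb_ages([34, 51, 8, 77, 29])           # a few ages may move slightly
```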

Re:Riiiiight (2)

FranTaylor (164577) | more than 2 years ago | (#40821103)

census data has been public all along

before now you had to go to washington and look it up yourself

now it's easier to get at

Re:Riiiiight (2)

uniquename72 (1169497) | more than 2 years ago | (#40821255)

census data has been public all along

before now you had to go to washington and look it up yourself

now it's easier to get at

No, it's been online for years. There just hasn't been a good, uniform way to query it and write apps against it.

Re:Riiiiight (1)

story645 (1278106) | more than 2 years ago | (#40821799)

Or load up the shapefiles posted at census.gov, seeing as census data has been available online for at least a few years now. As the summary said, this made it easier, but in the past couple of years there has also been an explosion in free and open source GIS tools that translate the raw data into something more readable.

Stripped-Down Data (1)

N8F8 (4562) | more than 2 years ago | (#40822039)

I recently pulled the census data and it's pretty much useless since any information you could use to look at results by city or region has been stripped out in the version available to the general public. Sucks.

but, but ... job creators! (0)

Anonymous Coward | more than 2 years ago | (#40822227)

I recently pulled the census data and it's pretty much useless since any information you could use to look at results by city or region has been stripped out in the version available to the general public. Sucks.

What else did you expect when you privatize data collected using public funds? http://corporate.ancestry.com/press/press-releases/2006/06/ancestry.com-digitizes-entire-u.s.-federal-census-collection-from-1790-1930/ [ancestry.com] .

See also http://www.archives.gov/digitization/digitized-by-partners.html [archives.gov]

Not really... (2)

langelgjm (860756) | more than 2 years ago | (#40822373)

I don't know what specifically you tried to do, but there is a lot of data available down to the block group and block level, which are relatively small geographic units. There's even more data available by "place", which would include any major city and many smaller cities and towns. Some of the tax data is redacted for confidentiality (e.g., when there is only one employer of a certain type in a geographic area, they won't release payroll information for it), but that's pretty unusual in larger areas.

You may have been using one of the user-friendly tools, which can be limited in their reach. American FactFinder has more depth than most, but it's also kind of a PITA. If you're serious about digging into the data, you can download zipped text files [census.gov] that represent the full extent of the public information available, which you can then load into your favorite processing program.
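As a starting point for the "load it into your favorite processing program" step, here is a minimal Python sketch that reads a delimited text file out of a zip archive. The member name, the comma delimiter, the latin-1 encoding, and the sample row are all assumptions; the real files have their own documented layouts.

```python
import csv
import io
import zipfile

def read_zipped_table(zip_bytes, member, delimiter=","):
    """Yield rows from one delimited text file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
            yield from csv.reader(text, delimiter=delimiter)

# Build a tiny stand-in archive in memory; real files would come from disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "SF1,XX,000,01,0000001,24\n")

rows = list(read_zipped_table(buffer.getvalue(), "example.txt"))
```

Reading row by row keeps memory use flat, which matters once the archives run to hundreds of megabytes.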

A check on redistricting? (1)

learn1teach1 (2695297) | more than 2 years ago | (#40822267)

Granted that this information has already been available on request, but maybe these new tools will make it easier for watchdog groups to crunch the census numbers themselves and act as a check on gerrymandering politicians, exposing redistricting plans that don't seem to square with the census data.
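One very simple check a watchdog group could run is how far each proposed district's population strays from the ideal equal share. The Python sketch below uses invented district totals and covers only this one measure; real redistricting analysis looks at much more than population balance.

```python
def population_deviation(district_populations):
    """Return each district's percent deviation from the ideal (mean) size."""
    ideal = sum(district_populations.values()) / len(district_populations)
    return {
        name: round(100 * (pop - ideal) / ideal, 2)
        for name, pop in district_populations.items()
    }

# Invented totals, as if summed from census blocks assigned to each district.
districts = {"D1": 102_000, "D2": 98_000, "D3": 100_000}
deviation = population_deviation(districts)
overall_range = max(deviation.values()) - min(deviation.values())
```

Here the largest district is 2% over the ideal and the smallest 2% under, for an overall range of 4 percentage points.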