The Problem of Search Engines and "Sekrit" Data

Hemos posted more than 12 years ago | from the how-to-choose-data dept.

Security

Nos. writes: "CNet is reporting that not only Google but other search engines as well are finding passwords and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.



teehee (0, Offtopic)

Clay Mitchell (43630) | more than 12 years ago | (#2613785)

/me goes to search for "credit card"

/me buys an x-box with stuff he found by reading slashdot

the gods of irony salute!

YES! YES! (0, Offtopic)

A_Non_Moose (413034) | more than 12 years ago | (#2613789)

Just in time for Christmas Shopping!!!!

All the toys and none of the debt!

Just gotta remember to buy a P.O. box first, not give my home address like the last ti....uhhhh...never mind.

A symptom of poor programming... (4, Insightful)

Bonker (243350) | more than 12 years ago | (#2613791)

I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.

Re:A symptom of poor programming... (1)

hiroko (110942) | more than 12 years ago | (#2613860)

don't keep data of any kind on the web server
Erm... what about my html data? ;)

Hell, No. (1, Funny)

tomblackwell (6196) | more than 12 years ago | (#2613887)

You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

Re:A symptom of poor programming... (5, Interesting)

ChazeFroy (51595) | more than 12 years ago | (#2613875)

Try the following searches on google (include the quotes) and you'll be amazed at what's out there:

"Index of /admin"
"Index of /password"
"Index of /mail"
"Index of /" +passwd
"Index of /" password.txt

Re:A symptom of poor programming... (1)

Bonker (243350) | more than 12 years ago | (#2613941)

From a site indexed by google:

PASSWORD PROTECTION is one way to guard your stack against unauthorized access.

Unlike locking your stack which prevents others from making changes, this surprisingly simple script won't allow anyone to view your stack without the password. For you Ursula K. Le Guin fans, the password for this stack is "Antwerp".

Re:A symptom of poor programming... (0)

Anonymous Coward | more than 12 years ago | (#2613963)

http://congress.nw.dc.us/lwv/custom/password.txt

Hmmm. Interesting.

Re:A symptom of poor programming... (4, Funny)

Brainless (18015) | more than 12 years ago | (#2613929)

I manage a Cold Fusion web server that we allow clients to post their own websites to. Recently, one client's programmer accidentally made a link to the admin section. Google found that link, proceeded into the admin section, and indexed all the "delete item" links as well. I found it quite amusing when they asked to see a copy of the logs, complaining the website was hacked, and I discovered GoogleBot had deleted every single database entry for them.

Re:A symptom of poor programming... (1, Insightful)

Anonymous Coward | more than 12 years ago | (#2613953)

What ignorance of security. Security is a problem that cannot be solved with technology alone. If you think encryption and/or firewalls will prevent this sort of issue, you totally misunderstand the purpose/capabilities of these tools. In this case, privacy is better protected through people (education) and process (security policy). If I write bad code that exposes credit card numbers (regardless of whether I store data on the web server, use encryption, and use firewalls), the numbers will still be disclosed.

Re:A symptom of poor programming... (2, Informative)

ChazeFroy (51595) | more than 12 years ago | (#2613993)

Something I forgot to mention in my other post:

The October 2001 issue of IEEE Computer has some articles on security, and the first article in the issue is titled "Search Engines as Security Threat" by Hernandez, Sierra, Ribagorda, Ramos.

Here's a link [computer.org] to it.

But ... (0)

Anonymous Coward | more than 12 years ago | (#2613795)

... information wants to be free. Right?

How can this happen? (4, Redundant)

Nonesuch (90847) | more than 12 years ago | (#2613798)

To the best of my knowledge, search engines all work by indexing the web, starting from a base set of web sites or submitted URLs and following the links on each page.

Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.

That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.

How this happens (5, Informative)

Tom7 (102298) | more than 12 years ago | (#2613885)

People often wonder how their "secret" sites get into web indices. Here's a scenario that's not too obvious but is quite common:

Suppose I have a secret page, like:
http://mysite.com/cgi-bin/secret?password=administrator

Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).

Now suppose elsewhere.com runs analog on their web logs, and posts them in a publicly accessible location. Suppose elsewhere.com's analog setup also reports the contents of the "referer" header.

Now suppose the web logs are indexed (because of this same problem, or because the logs are just linked to from their web page somewhere). Google has the link to your secret information, even though you never explicitly linked to it anywhere.

One solution is to use proper HTTP access control (as crappy as it is), or to use POST instead of GET to supply credentials (POST doesn't transfer into a URL that might be passed as a referrer). You could also use robots.txt to deny indexing of your secret stuff, though others could still find it through web logs.

Of course, I don't think credit card info should *ever* be accessible via HTTP, even if it is password protected!
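To make the GET-versus-POST point above concrete, here is a minimal sketch in Python using the third-party requests library; the endpoint URLs are made up for illustration.

import requests

# Bad: the password becomes part of the URL, so it ends up in the server's
# access log, the browser history, and the Referer header sent to any site
# linked from the resulting page.
requests.get("http://mysite.example/cgi-bin/secret",
             params={"password": "administrator"})
# Request line on the wire: GET /cgi-bin/secret?password=administrator HTTP/1.1

# Better: the password travels in the request body, which is never copied
# into a URL or passed along as a referrer.
requests.post("http://mysite.example/cgi-bin/login",
              data={"password": "administrator"})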

Re:How this happens (2, Informative)

Garfunkel (3569) | more than 12 years ago | (#2613940)

ah yes, analog's reports (and other web stat programs) are a big culprit as well. Even on local sites. If I have a /sekrit/ site that isn't linked from anywhere on my site but that I visit often via a bookmark, it still shows up in the web logs, and usually gets indexed by a web log analyzer, which can "handily" create links to all those pages when it generates the report.

Re:How can this happen? (1)

Garfunkel (3569) | more than 12 years ago | (#2613891)

Index pages. Index pages often have the ../ parent link, and that can get you to some places people tend not to think of as being accessible. IMHO, it's their own fault for putting that stuff somewhere even remotely close to being accessible. My guess is that many of them are run off of Microsoft Personal Web Servers or something they may not even know they are running.

Re:How can this happen? (0)

Anonymous Coward | more than 12 years ago | (#2613924)

Pretty easy. Some websites (and especially hit counters) post referrers lists. These lists contain the pages visitors viewed before they came to the tracking site. They obviously might contain urls like login:password@some.url and if a search engine follows the links in the referrer list it will find secret information.

hdmx

Oh Yeah? (4, Funny)

Knunov (158076) | more than 12 years ago | (#2613801)

"...search engines are finding password and credit card numbers while doing its indexing."

This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?

Knunov

Re:Oh Yeah? (5, Funny)

Karma 50 (538274) | more than 12 years ago | (#2613820)

Just search for your credit card number.

By the way, does google have that realtime display of what people are searching for?

Re:Oh Yeah? (1)

NTSwerver (92128) | more than 12 years ago | (#2613926)

Could searching for your own credit card number also be risky - ie: could it be intercepted?

Re:Oh Yeah? (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#2613982)

Thanks brainiac. I wonder why he asked about the realtime display.

Re:Oh Yeah? (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#2613956)

Flamebait? You mods have your heads in your asses again. That's funny.

Re:Oh Yeah? (0)

Anonymous Coward | more than 12 years ago | (#2613883)

Could you please post the exact search engines and query strings so I can make sure my information isn't there?

hehe, yeah, we believe you ;-)

Re:Oh Yeah? (2, Funny)

4of12 (97621) | more than 12 years ago | (#2613906)

Yeah!

I just typed in my credit card number and found 15 hits on web sites involving videos of hot young goats.

Tangential Google Question (5, Interesting)

banuaba (308937) | more than 12 years ago | (#2613803)

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them in there.. Why is what Google is doing 'okay' when what the original site did was not? Or do they just leave Google alone?

Re:Tangential Google Question (1)

SamBeckett (96685) | more than 12 years ago | (#2613816)

Intent

Re:Tangential Google Question (3, Interesting)

CaseyB (1105) | more than 12 years ago | (#2613881)

Good question.

Given that they do have (for now) some sort of immunity, it opens a loophole for publishing illegal data. Simply set up your site with all of Metallica's lyrics / guitar scores (all 5 of them, heh). Submit it for indexing to Google, but don't otherwise attract attention to the site. When you see the spider hit, take it offline. Now the data is available to anyone who searches for it on Google, but you're not liable for anything. The process could be repeated to update the cache.

Re:Tangential Google Question (2)

passion (84900) | more than 12 years ago | (#2613882)

I doubt most prosecuting teams are savvy enough to think about google's cache.

how the FUCK is this possible? (2, Insightful)

posmon (516207) | more than 12 years ago | (#2613805)

just because google is only picking them up now doesn't mean that they haven't been there for years!

how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? did they redirect My Documents to c:\inetpub\wwwroot\%username%\...???

Re:how the FUCK is this possible? (2, Insightful)

Karma 50 (538274) | more than 12 years ago | (#2613856)

Google has just added the ability to index PDFs, word docs etc. So, yes, the information was there before, but now it is much easier to find.

Re:how the FUCK is this possible? (2, Insightful)

Neon Spiral Injector (21234) | more than 12 years ago | (#2613863)

In published folders? How about on machines that are on the Internet at all.

In an ideal setup, the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a front-end secure webserver. That machine would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data would only be a one-way transfer, with no way of retrieving it back off of that machine.

Nothing to do (1)

jeriqo (530691) | more than 12 years ago | (#2613807)

Google does nothing more than a regular Web user. It simply follows links, and indexes the content in its database.

What's wrong with this?

Nothing. Human stupidity.

Re:Nothing to do (2)

Jburkholder (28127) | more than 12 years ago | (#2613951)

As far as I can tell from checking out the article and then trying this myself on Google, you can now target your search to specific filetypes [google.com]. If you are dumb enough to store passwords or credit card numbers in an xls file on your website, Google makes it easy to find.

I'm at a loss to explain how someone puts sensitive information on the web in an unprotected location and then points the finger at Google because they made it easier to find.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Stopping Google won't stop the problem... (5, Insightful)

Kr3m3Puff (413047) | more than 12 years ago | (#2613813)

The big complaint of the article is that Google is searching for new types of files, instead of HTML. If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...

Well Behaved Crawlers (4, Insightful)

tomblackwell (6196) | more than 12 years ago | (#2613815)

...obey the Robot Exclusion Standard [robotstxt.org]. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of doing business on the internet.

Many crawlers ignore robots.txt (3, Interesting)

Ars-Fartsica (166957) | more than 12 years ago | (#2613915)

I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.

Re:Well Behaved Crawlers (1)

melvin22 (523080) | more than 12 years ago | (#2613935)

Actually they should probably get the hell out as fast as they can. But that's just my opinion...

Re:Well Behaved Crawlers (2)

Nos. (179609) | more than 12 years ago | (#2613944)

This is not the way to do it, as the article mentions. This may stop Google, but suppose I'm running my own search engine that doesn't follow "robots.txt" rules?

True. (2)

tomblackwell (6196) | more than 12 years ago | (#2613958)

It will stop the casual perusal of your data.

The way to stop the determined snooper is to not keep your data in a directory that can be accessed by your web server.

Re:Well Behaved Crawlers (5, Insightful)

ryanvm (247662) | more than 12 years ago | (#2613976)

The Robot Exclusion Standard (e.g. robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.

Re:Well Behaved Crawlers (1)

sunking2 (521698) | more than 12 years ago | (#2613990)

But I thought OPT OUT was bad and everything should be OPT IN!

Seriously tho, there are a lot of ways that this sort of information can make it onto the web other than through the companies everyone blames. For example, how many times have people bought things online and then saved the html document that was returned as the receipt? It's very easy to imagine people saving this to a directory that is inadvertently crawled.

Google shouldn't lift a finger (2, Interesting)

sketerpot (454020) | more than 12 years ago | (#2613818)

Why should Google or any other search engine do anything to save fools from their stupidity? Putting credit card numbers online where anyone can get them is just plain idiotic. Hopefully this will get a lot of publicity along with the names of companies who do stupid things like this and most people will shape up their act.

Simple but burdensome solution (4, Informative)

camusflage (65105) | more than 12 years ago | (#2613823)

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
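A minimal sketch of that mod10 (Luhn) test in Python; the scan-and-skip policy around it is this comment's suggestion, not anything the search engines have said they do.

def luhn_valid(digits: str) -> bool:
    """Mod 10 (Luhn) check: True for well-formed card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test number)
print(luhn_valid("1234567812345678"))  # False (random digits almost never pass)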

Re:Simple but burdensome solution (5, Insightful)

Xerithane (13482) | more than 12 years ago | (#2613857)

It is a burden, and the responsibility does not lie with a crawling engine. You could run a Luhn check on any 16-digit number you find (and sanity-check an expiration date if one appears alongside it), but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc.) the algorithm could get ugly to maintain.

I don't see why Google or any other search engine has to even acknowledge this problem; it's simply Someone Else's Problem. If I were paying a web team/master/monkey any money at all and found out about this, heads would roll. Even thinking of pointing a finger at Google is the same tactic Microsoft uses against those "irresponsible" individuals pointing out security flaws.

If anything Google is providing them a service by telling them about the problem.

Re:Simple but burdensome solution (1)

new-black-hand (197043) | more than 12 years ago | (#2613967)

It might not be that bad, since to do the calculation the number has to be in XXXXXXXXXXXXXXXX format anyway, so it has to be normalized regardless.

Google exploit patch for Apache (4, Funny)

Anarchofascist (4820) | more than 12 years ago | (#2613825)

% cd /var/www
% cat > robots.txt
User-agent: *
Disallow: /
^D
%

Google exploit patch 0.2 for Apache (2, Funny)

Anarchofascist (4820) | more than 12 years ago | (#2613864)

Oops! Version 0.2 already:

% cat > /var/www/html/robots.txt
User-agent: *
Disallow: /
^D
%

Insert foot in mouth.... (2, Interesting)

Crewd (199804) | more than 12 years ago | (#2613829)

From the article :

"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems--search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."

And chief executives of a hosting company should know how Basic Authentication works before hosting web sites...

Crewd

Basic Authentication (3, Insightful)

KyleCordes (10679) | more than 12 years ago | (#2613900)

[know how Basic Authentication works before hosting web sites]

... and know that it's a wholly inadequate way of "protecting" credit card numbers!
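For anyone wondering why Basic Authentication is so inadequate here: the credentials cross the wire base64-encoded, which is an encoding, not encryption. A minimal demonstration in Python (the username and password are made up):

import base64

# What the browser actually puts in the Authorization header:
header = "Basic " + base64.b64encode(b"admin:s3kr1t").decode()
print(header)                               # Basic YWRtaW46czNrcjF0

# Anyone who can see the traffic reverses it trivially:
print(base64.b64decode(header.split()[1]))  # b'admin:s3kr1t'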

Re:Insert foot in mouth.... (2, Funny)

simong (32944) | more than 12 years ago | (#2613904)

Not necessarily, they are chief executives after all.

What did they expect? (1)

Walter Bell (535520) | more than 12 years ago | (#2613833)

Lowering the barrier to entry to web publishing has had a few benefits. Families can share photographs and news in a cheap, efficient manner. Novices can publish information for the benefit of their employees or others easily. However, problems like this do arise quite often, and at their source one can see that the widespread ability of people to publish documents to the web does not coexist well with existing security systems and models.

At any other time in the past few years, this would not ordinarily be a societal problem. Sure, a few peoples' passwords and credit card numbers will leak out. Hopefully they would have to pay for the charges to punish them for their own stupidity. (After all, as a customer of several banks, I don't want my rates to go up because somebody posted his account numbers for the entire world to see.) But now, this is a national security problem, because we are being attacked by a foreign force who might abuse leaked passwords to access critical systems and cause chaos in this country. President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

I'm not sure what the solution is, but a good first step is for companies to raise the barrier to entry to publishing web pages. Geocities and Angelfire should force users to demonstrate their competence before uploading their first page. Perhaps requiring an A+ certification number would help? And Microsoft should take away the parts of FrontPage that allow users to generate documents without writing in HTML. That would help ease the problem, I reckon.

In conclusion - if everybody does their part to help solve this problem and stop information leakage, we will be a safer, more secure society without giving up any more civil liberties.

~wally

This Is A Known Troll (-1, Offtopic)

Faulty Dreamer (259659) | more than 12 years ago | (#2613886)

and the idiot moderators who keep modding him up are creating a bigger monster every time.

Requiring competence before publishing a page? What the Fuck?

Please think before modding him up because this post sounds good. Read it carefully, and you'll see it is only shit dressed up nicely.

Where is YOUR speech license? (2)

Unknown Poltroon (31628) | more than 12 years ago | (#2613889)

WHere have you put your license to speak yoour mind on slashdot? Surely, people cant go around putting anything they want to say into a public forum. They might say anything. A a matter of fact, we must revoke peoples phone privliges until lthey can proove theyre smart enough not to give out credit card numbers to telemarketers. As a matter of fact, lets just legislate intelegence. We can tack it on as a rider for that bill to make Pi = 3.
Youre a nitwit. Im revoking your speech licnese on slashdot.

Dooh. Spellcheck, then paste (-1, Offtopic)

Unknown Poltroon (31628) | more than 12 years ago | (#2613913)

Stupid submit button. Grrrrrrr.

OKC (1)

TheMidget (512188) | more than 12 years ago | (#2613980)

President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

Well, terrorism can easily be waged without having Arabs in the States, even without resorting to cyberwar. As Oklahoma City has shown, it's enough to have Rednecks in the States. Kudos though for disguising your racist drivel well enough to get modded up to 2.


Microsoft .Net.... (1)

MrWinkey (454317) | more than 12 years ago | (#2613842)

Hmmm....could Microsoft's .Net be adding to the problem?
Microsoft said it was safe.....

The Problem of Search Engines and "Sekrit" Data (4, Funny)

NTSwerver (92128) | more than 12 years ago | (#2613848)

Please change the title of this article to:

The Problem of Incompetent System Administrators

If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.

Re:The Problem of Search Engines and "Sekrit" Data (1)

Garfunkel (3569) | more than 12 years ago | (#2613959)

It's not always "System Administrators". How many DSL/Cable subscribers do you think run Microsoft's Personal Webserver. Technically, yes, they are administering a system, but I don't think anybody would really call them a SysAdmin.

This is what happens when you use frontpage... (5, Informative)

Grip3n (470031) | more than 12 years ago | (#2613850)

I'm a web developer, and I don't know how many times I've heard people who are just getting into the scene talking about making 'hidden' pages. I'm referring to those that are only accessible to those who click on a very tiny area of an image map, or perhaps find that 'secret' link at the bottom of the page. Visually, these elements seem 'hidden' to a user who doesn't really understand web pages and source code. However, these 'hidden' pages look like giant 'Click Here' buttons to search engines, which is what I'm presuming some of this indexing is finding.

The search engines cannot feasibly stop this from happening; each occurrence is unique unto itself. The only prevention tool is knowledge and education, and bringing to the masses a general understanding of search engine spidering theory.

Just my 2 cents.

Re:This is what happens when you use frontpage... (2)

onion2k (203094) | more than 12 years ago | (#2613916)

Often worse than that.. the dreaded visibility:hidden CSS/DHTML that the likes of Dreamweaver is so keen on.. what the eye can't see, the robot certainly can..

Heh (0)

Anonymous Coward | more than 12 years ago | (#2613853)

These people who store credit card numbers on the web server are the same people who don't patch for IIS worms until all hell breaks loose.
Good for them.

Example (5, Informative)

squaretorus (459130) | more than 12 years ago | (#2613854)

I recently joined an angel organisation to publicise my business in an attempt to raise funds. The information provided to the organisation is supposed to be secret, and only available to members of the organisation via a paper newsletter, which was reproduced in the secure area of the organisation's website.
A couple of months down the line, a couple of search engines, when asked about 'mycompanyname', were giving the newsletter entry in the top 5.

Alongside my details were those of several other companies. Essentially laying out the essence of the respective business plans.

How did this happen? The site was put together with FP2000, and the 'secure' area was simply those files in the /secure directory.

I had no cause to view the website prior to this. The site has been fixed on my advice. How did this come about? No one in the organisation knew what security meant. They were told that /secure WAS!

It didn't do any damage to myself, but a few of the other companies could have suffered if their plans were found. It's not Google's job to do anything about this, it's the webmaster's. But a word of warning - before you agree for your info to appear on a website, ask about the security measures. They may well be crap!

I've got a solution! (5, Funny)

CraigoFL (201165) | more than 12 years ago | (#2613855)

Every web server should have a file in their root directory called "secret.xml" or somesuch. This file could list all the publicly-accessible URLs that have all the "secret" data such as credit card numbers, root passwords, and private keys. Search engines could parse this file and then NOT include those URLs in their search results!

Brilliant, huh? ;-)

On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.

SSL only takes you so far (1)

imrdkl (302224) | more than 12 years ago | (#2613858)

And then you are at the mercy of ridiculous temp-file and text-database schemes. I've never deployed a credit-card web app, but I get enough spam from people trying to sell me their own implementation for my server that this is not surprising at all.

Maybe we need to demand "approved" server-side implementation of credit-card webservers, besides SSL. How could this be verified? I don't have a clue.

No easy solution in sight?!?! (0)

Anonymous Coward | more than 12 years ago | (#2613861)

Here's a really easy solution - the bank has these crazy things called "bills." Go to the bank and get some. Then go to the store and use aforementioned "bills." Voila - hax0rs go bye-bye.

Bad manager ideas (1)

Mr Krinkle (112489) | more than 12 years ago | (#2613862)

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
The fact that this guy claims the responsibility lies with Google for not disallowing this type of search is just plain crazy. If you are publishing critical information on a site that is not at least secured, and preferably encrypted, you are just asking for trouble. It should not be Google's responsibility in any way, shape or form to not find this information. If the content providers wish, they can put a robots.txt file out, but that is not fixing anything, merely sidestepping one super easy hack. They still need to have a decent or at least SOME security design.
Oh well.

What a bunch of idiots, was Re:Bad manager ideas (1)

pdqlamb (10952) | more than 12 years ago | (#2613996)

Absolutely. And this is supposed to be a network security outfit. (disgusted grimace) Trouble is, these idiots fail to make clear where the responsibility for "solid software design" lies -- right on the shoulders of the people putting the information out in the open. It's like taking Martha Stewart's idea to the extreme -- collect all the credit cards in town and line the public swimming pool with them, and then put signs up saying, "Please do not copy down credit card numbers!"

Maybe we do need some kind of accreditation. Any idiot can claim to be a security expert in the computer field. Can any convicted burglar claim to be a locksmith?

Umm (0)

Anonymous Coward | more than 12 years ago | (#2613866)

"As the article outlines, this has been a problem for a long time -- and with no easy solution in sight."

How about using basic mySQL passwords?
Sounds pretty simple to me

To test your credit-card ordering site... (1)

5n3ak3rp1mp (305814) | more than 12 years ago | (#2613868)

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer. If you get hits, your security is less than ideal.

Unfortunately, website security is not as simple as locking a door... but keeping your customer data out of the webserver's document root would be a good start.

Re:To test your credit-card ordering site... (2)

Legion303 (97901) | more than 12 years ago | (#2613914)

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer.

Then watch the fraudulent charges fly when the person who was sniffing cleartext HTTP traffic gets it in his logs.

-Legion

Bad.. but (2)

boaworm (180781) | more than 12 years ago | (#2613871)

Such problems have existed for quite a while. Hacking, cracking, internet sniffing, etc.

The real issue is not whether you can use the information, but whether you actually do. Regardless of whether it is available or not, it IS ILLEGAL. (Carding does carry rather long prison times as well.)

People have had the chance to steal from other people for as long as mankind has existed. This is just another form... perhaps a bit simpler though ...


Easy solution (1, Redundant)

Arethan (223197) | more than 12 years ago | (#2613895)

Your crawler is caching credit card numbers you say? Simple, check the content you cache for 16 digit numbers. Any that you find, you check with a simple LUHN (mod 10) algorithm. If it passes, you replace the number with "################" or a similar masking.

There, all credit card numbers will now be filtered from your cache.

I understand the severity of the issue, and it's good to know this is happening, but the solution is simple.
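A sketch of that masking pass in Python, handling both the XXXX-XXXX-XXXX-XXXX and XXXXXXXXXXXXXXXX formats mentioned earlier in the thread; the regex and the all-#s replacement are illustrative choices, not anyone's actual crawler code.

import re

def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

# 16 digits, optionally in groups of four separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def mask_cards(text: str) -> str:
    def repl(m):
        digits = re.sub(r"[ -]", "", m.group())
        # Only mask strings that pass the mod 10 test; other 16-digit
        # numbers (order IDs, phone-like strings) are left alone.
        return "#" * 16 if luhn_valid(digits) else m.group()
    return CARD_RE.sub(repl, text)

print(mask_cards("card 4111-1111-1111-1111, ref 1234-5678-1234-5678"))
# card ################, ref 1234-5678-1234-5678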

Not just credit cards (1)

Jodrell (191685) | more than 12 years ago | (#2613899)

I run a website [uk.com] that pulls a lot of content from other servers. We used to have a newsfeed via ITN [itn.co.uk]'s RDF feed - until I got a call from their Director of New Media asking me to take it off. Seems they charge a hefty fee for such a feed - around £30,000 - but hadn't made any attempt to protect it with a .htaccess file or anything. How did I find it? By searching Google [google.com]!

spam (1, Informative)

flollywebfrog (462849) | more than 12 years ago | (#2613901)

The other day I was using Google to explore the files of an annoying spammer's [referralware.com] site [referralware.com]. Simply searching for a few numbers with the query site:.referralware.com brought up search results in their unprotected source.referralware.com directory that included all the credit card logs for the past week. And I am just an average joe computer user ... this is a problem if I can be a "hacker" with less knowledge than a script kiddie!


robots.txt (2, Interesting)

mukund (163654) | more than 12 years ago | (#2613910)

From my web logs, I see that a lot of HTTP bots don't care crap about /robots.txt. Another thing which happens is that they read robots.txt only once and cache it forever in the lifetime of accessing that site, and do not use a newer robots.txt when it's available. It'd be useful to update what a bot knows of a site's /robots.txt from time to time.

HTTP bot writers should adhere to using information in /robots.txt and restricting their access accordingly. In a lot of occasions, webmasters may setup /robots.txt to actually help stop bots from feeding on junk information which they don't require.. or things which change regularly and need not be recorded.
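For what it's worth, a well-behaved bot's robots.txt handling is only a few lines. A sketch using Python's standard urllib.robotparser, with a periodic re-read to address the stale-cache complaint above (the 24-hour refresh interval and the "MyBot" agent name are arbitrary choices):

import time
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()       # fetch and parse the current rules
rp.modified()   # record when we last fetched them

def allowed(url, agent="MyBot"):
    # Re-fetch robots.txt if the cached copy is more than a day old.
    if time.time() - rp.mtime() > 86400:
        rp.read()
        rp.modified()
    return rp.can_fetch(agent, url)

print(allowed("http://example.com/sekrit/passwords.txt"))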

Oh, for regular expression searching in Google (5, Funny)

EnglishTim (9662) | more than 12 years ago | (#2613918)

I could be a rich man...

(Not, of course that I'd ever do anything like that...)

Searching with regular expressions would be cool, though...

Directory listings (2, Informative)

NineNine (235196) | more than 12 years ago | (#2613919)

Most of this is coming from leaving directory listing turned on. Generally, this should only be used on HTTP front-ends to FTP boxes, and for development machines. IIS has "directory browsing" turned off by default. Maybe Apache has it turned on by default? You'd be surprised to see how many public webservers have this on, making it exceedingly likely that search engines will find files they weren't meant to find. The situation arises when there's no "default" page (usually index.html or default.html, default.asp, etc.) in a directory, only a file like content.html. If a search engine tries http://domain.com/directory/, it'll get the directory listing, which it can, in turn, continue to spider.

Must... blame... someone.... (3, Funny)

JMZero (449047) | more than 12 years ago | (#2613925)

INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".

Why are stupid people not to blame for anything anymore?

Evil Robot? (1)

StevenHallman76 (455545) | more than 12 years ago | (#2613933)

So, where might one find an 'evil' robot that looks specifically in places robots.txt tells it not to? hypothetically speaking, of course...

Business Model (5, Funny)

Alomex (148003) | more than 12 years ago | (#2613934)


A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer - think about all the revenue that Google could generate with this data!

Anybody know when Google is going public?

well golly gosh, it works! (2, Informative)

Anonymous Coward | more than 12 years ago | (#2613936)

search for: password admin filetype:doc

My first hit is:

www.nomi.navy.mil/TriMEP/TriMEPUserGuide/WordDocs/Setup_Procedures_Release_1.0e.doc

at the bottom of the html:

UserName: TURBO and PassWord: turbo, will give you unlimited user access (passwords are case sensitive).

Username: ADMIN and PassWord: admin, will give you password and system access (passwords are case sensitive).

It is recommend that the user go to Tools, System Defaults first and change the Facility UIC to Your facility UIC.

oh dear, am I now a terrorist?

SHit. (-1, Offtopic)

Unknown Poltroon (31628) | more than 12 years ago | (#2613964)

Now youre an 3l33t h^x0r D00d3!!!!

Bring out the legal eagles (4, Insightful)

Milican (58140) | more than 12 years ago | (#2613942)

"Webmasters should know how to protect their files before they even start writing a Web site"

That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact, I thank Google for exposing this problem. It is nothing short of gross negligence on the part of any web merchant to have credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.

To have a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through word documents or excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light :)

John

Did you know (0, Troll)

hyyx (447405) | more than 12 years ago | (#2613943)

that you can use "file://[address]" to find pages and directories that are NOT linked to on a server (if the server allows it)?


Robots search by links right? (1, Troll)

linuxrunner (225041) | more than 12 years ago | (#2613946)

The search engines use robots, and the robots read your site through links... So unless the file is in the root directory or has a direct link to it, it should not show up.

So create a folder called "mystuff" and keep everything in it... and don't create a link to it, just remember it and type in the url.
http://www.my-site.com/mystuff
You'll then be sent to your secret folder that no one knows about, even the robots.
So I'm not sure what all the yelling is about. Just do that, or set up the robots.txt correctly, but most people don't realize they can do that....

No easy solution in sight? (2)

vrmlguy (120854) | more than 12 years ago | (#2613955)

From the article: "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security." -- Gary McGraw

Allow me to disagree. This fellow apparently agrees with Microsoft that people shouldn't publish code exploits and weaknesses. Sorry, but anyone who has secret information available on the external web is in the exact same boat as someone who has an unpatched IIS server or is running SQL Server without a password.

Let's assume that Google had (a) somehow figured out from day one that people would search for passwords, credit card numbers, etc., and (b) figured out some way to recognize such data to help keep it secret. Should they have publicized this fact or kept it a secret? Publicity would just mean that every script kiddie would be out writing their own search engines, looking for the things that Google et al were avoiding. Secrecy would mean that a very few black hats would write their own search engines, and the victims of such searches would have no idea how their secrets were being compromised.

But this assumes that there's some way of accomplishing item (b), which I claim is very difficult indeed. In fact, it would be harder to accomplish than natural language recognition. Think about it... Secrets are frequently obscure, to the point that to a computer they look like noise. Most credit cards, for example, use 16-digit numbers. Should Google not index any page containing a string of 16 consecutive digits? How about pages that contain SQL code? How would one not index those, but still index the on-line tutorials at MySQL, Oracle, etc?

The only "solution" is to recognize that this problem belongs in the lap of the web site's owner, and the search engine(s) have no fundamental responsibilty.

And please close the door on the way out.... (2)

pwagland (472537) | more than 12 years ago | (#2613957)

But other critics said Google bears its share of the blame.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Am I the only one scared by this? The problem is Google's, simply because they follow links? I find it hard to believe this stuff sometimes!

<rant>When will people learn that criminals don't behave? That is what makes them criminals!</rant>

As our second-year uni project we were required to write a web index bot. Guess what? It didn't "behave". It would search right through a robots.txt roadblock. It would find whatever there was to find. This stuff is so far from being rocket science it is ridiculous!

Sure, using Google might ease a tiny fraction of the bad guys' work, but if Google weren't there, the bad guys' tools would be. In fact, they still are there.

Saying that you have to write your client software to work around server/administrator flaws is like putting a "do not enter" sign on a tent. Sure, it will stop some people, but the others will just come in anyway, probably even more so just to find out what you are hiding.


Sure enough. (3, Interesting)

Joe Decker (3806) | more than 12 years ago | (#2613979)

Looked up the first 8 digits of one of my own CC numbers, and, while I didn't find my own CC # on the net, I did immediately find a large file full of them with names, expiration dates, etc. (Sent a message to the site manager, but this case is pretty clearly an accidental leak.)

At any rate--scary it is.

Don't know that this is Google's problem.. (2)

sid_vicious (157798) | more than 12 years ago | (#2613983)

From the article:
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Search and replace "Google" with "Microsoft". The lack of security is in the operating system and the applications which launch the malicious files without warning the user. Google just tell you where to get 'em, not what to do with 'em.

Web Sites are public by definition (4, Insightful)

hattig (47930) | more than 12 years ago | (#2613986)

It is a simple rule of the web - any directory or subdirectory thereof that is configured to be accessible via the internet (whether html root directories, ftp root directories, gnutella shared directories, etc.) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP-encrypted files outside the publicly accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.

But I shouldn't be giving a basic course on how to secure website payments to you lot - you know it, or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).
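A sketch of that scheme in Python, shelling out to gpg. The paths, the recipient ID, and the trust setting are illustrative assumptions; gpg must be installed with the merchant's public key imported, and the matching secret key kept on a separate machine.

import subprocess
from pathlib import Path

PRIVATE_DIR = Path.home() / "payments"   # outside the web server's document root

def store_order(order_id: str, cleartext: bytes) -> Path:
    out = PRIVATE_DIR / (order_id + ".asc")
    # Encrypt to the merchant's public key. The secret key never touches
    # this machine, so a break-in yields only ciphertext.
    subprocess.run(
        ["gpg", "--batch", "--yes", "--trust-model", "always",
         "--armor", "--encrypt", "--recipient", "orders@example.com",
         "--output", str(out)],
        input=cleartext, check=True)
    return out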
