The Problem of Search Engines and "Sekrit" Data

Hemos posted more than 12 years ago | from the how-to-choose-data dept.

Security

Nos. writes: "CNet is reporting that not only Google but other search engines as well are finding passwords and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.



teehee (0, Offtopic)

Clay Mitchell (43630) | more than 12 years ago | (#2613785)

/me goes to search for "credit card"

/me buys an x-box with stuff he found by reading slashdot

the gods of irony salute!

YES! YES! (0, Offtopic)

A_Non_Moose (413034) | more than 12 years ago | (#2613789)

Just in time for Christmas Shopping!!!!

All the toys and none of the debt!

Just gotta remember to buy a P.O. box first, not give my home address like the last ti....uhhhh...never mind.

A symptom of poor programming... (4, Insightful)

Bonker (243350) | more than 12 years ago | (#2613791)

I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.

Re:A symptom of poor programming... (1)

hiroko (110942) | more than 12 years ago | (#2613860)

don't keep data of any kind on the web server
Erm... what about my html data? ;)

Hell, No. (1, Funny)

tomblackwell (6196) | more than 12 years ago | (#2613887)

You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

Re:A symptom of poor programming... (5, Interesting)

ChazeFroy (51595) | more than 12 years ago | (#2613875)

Try the following searches on google (include the quotes) and you'll be amazed at what's out there:

"Index of /admin"
"Index of /password"
"Index of /mail"
"Index of /" +passwd
"Index of /" password.txt

Re:A symptom of poor programming... (1)

Bonker (243350) | more than 12 years ago | (#2613941)

From a site indexed by google:

PASSWORD PROTECTION is one way to guard your stack against unauthorized access.

Unlike locking your stack which prevents others from making changes, this surprisingly simple script won't allow anyone to view your stack without the password. For you Ursula K. Le Guin fans, the password for this stack is "Antwerp".

Re:A symptom of poor programming... (0)

Anonymous Coward | more than 12 years ago | (#2613963)

http://congress.nw.dc.us/lwv/custom/password.txt

Hmmm. Interesting.

Re:A symptom of poor programming... (4, Funny)

Brainless (18015) | more than 12 years ago | (#2613929)

I manage a Cold Fusion web server that we allow clients to post their own websites to. Recently, one client's programmer accidentally made a link to the admin section. Google found that link, proceeded into the admin section, and indexed all the "delete item" links as well. I found it quite amusing when they asked to see a copy of the logs, complaining the website was hacked, and I discovered GoogleBot had deleted every single database entry for them.

Re:A symptom of poor programming... (1, Insightful)

Anonymous Coward | more than 12 years ago | (#2613953)

What ignorance of security. Security is a problem that cannot be solved with technology alone. If you think encryption and/or firewalls will prevent this sort of issue, you totally misunderstand the purpose/capabilities of these tools. In this case, privacy is better protected through people (education) and process (security policy). If I write bad code that exposes credit card numbers (regardless of whether I store data on the web server, use encryption, and use firewalls), the numbers will still be disclosed.

Re:A symptom of poor programming... (2, Informative)

ChazeFroy (51595) | more than 12 years ago | (#2613993)

Something I forgot to mention in my other post:

The October 2001 issue of IEEE Computer has some articles on security, and the first article in the issue is titled "Search Engines as Security Threat" by Hernandez, Sierra, Ribagorda, Ramos.

Here's a link [computer.org] to it.

But ... (0)

Anonymous Coward | more than 12 years ago | (#2613795)

... information wants to be free. Right?

How can this happen? (4, Redundant)

Nonesuch (90847) | more than 12 years ago | (#2613798)

To the best of my knowledge, search engines all work by indexing the web, starting from a base set of web sites or submitted URLs and following the links on each page.

Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.

That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.

How this happens (5, Informative)

Tom7 (102298) | more than 12 years ago | (#2613885)

People often wonder how their "secret" sites get into web indices. Here's a scenario that's not too obvious but is quite common:

Suppose I have a secret page, like:
http://mysite.com/cgi-bin/secret?password=administrator

Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).

Now suppose elsewhere.com runs analog on their web logs, and posts them in a publicly accessible location. Suppose elsewhere.com's analog setup also reports the contents of the "referer" header.

Now suppose the web logs are indexed (because of this same problem, or because the logs are just linked to from their web page somewhere). Google has the link to your secret information, even though you never explicitly linked to it anywhere.

One solution is to use proper HTTP access control (as crappy as it is), or to use POST instead of GET to supply credentials (POST doesn't transfer into a URL that might be passed as a referrer). You could also use robots.txt to deny indexing of your secret stuff, though others could still find it through web logs.

Of course, I don't think credit card info should *ever* be accessible via HTTP, even if it is password protected!
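To make the GET-versus-POST point above concrete, here is a minimal sketch in Python using the third-party requests library; the endpoint URLs are made up for illustration.

import requests

# Bad: the password becomes part of the URL, so it ends up in the server's
# access log, the browser history, and the Referer header sent to any site
# linked from the resulting page.
requests.get("http://mysite.example/cgi-bin/secret",
             params={"password": "administrator"})
# Request line on the wire: GET /cgi-bin/secret?password=administrator HTTP/1.1

# Better: the password travels in the request body, which is never copied
# into a URL or passed along as a referrer.
requests.post("http://mysite.example/cgi-bin/login",
              data={"password": "administrator"})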

Re:How this happens (2, Informative)

Garfunkel (3569) | more than 12 years ago | (#2613940)

ah yes, analog's reports (and other web stat programs) are a big culprit as well. Even on local sites. If I have a /sekrit/ site that isn't linked from anywhere on my site but that I visit often via a bookmark, it still shows up in the web logs, and usually gets indexed by a web log analyzer, which can "handily" create links to all those pages when it generates the report.

Re:How can this happen? (1)

Garfunkel (3569) | more than 12 years ago | (#2613891)

Index pages. Index pages often have the ../ parent link, and that can get you to some places people tend not to think of as being accessible. IMHO, it's their own fault for putting that stuff somewhere even remotely close to being accessible. My guess is that many of them are run off of Microsoft Personal Web Servers or something they may not even know they are running.

Re:How can this happen? (0)

Anonymous Coward | more than 12 years ago | (#2613924)

Pretty easy. Some websites (and especially hit counters) post referrers lists. These lists contain the pages visitors viewed before they came to the tracking site. They obviously might contain urls like login:password@some.url and if a search engine follows the links in the referrer list it will find secret information.

hdmx

Oh Yeah? (4, Funny)

Knunov (158076) | more than 12 years ago | (#2613801)

"...search engines are finding password and credit card numbers while doing its indexing."

This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?

Knunov

Re:Oh Yeah? (5, Funny)

Karma 50 (538274) | more than 12 years ago | (#2613820)

Just search for your credit card number.

By the way, does google have that realtime display of what people are searching for?

Re:Oh Yeah? (1)

NTSwerver (92128) | more than 12 years ago | (#2613926)

Could searching for your own credit card number also be risky - ie: could it be intercepted?

Re:Oh Yeah? (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#2613982)

Thanks brainiac. I wonder why he asked about the realtime display.

Re:Oh Yeah? (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#2613956)

Flamebait? You mods have your heads in your asses again. That's funny.

Re:Oh Yeah? (0)

Anonymous Coward | more than 12 years ago | (#2613883)

Could you please post the exact search engines and query strings so I can make sure my information isn't there?

hehe, yeah, we believe you ;-)

Re:Oh Yeah? (2, Funny)

4of12 (97621) | more than 12 years ago | (#2613906)

Yeah!

I just typed in my credit card number and found 15 hits on web sites involving videos of hot young goats.

Tangential Google Question (5, Interesting)

banuaba (308937) | more than 12 years ago | (#2613803)

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them in there.. Why is what Google is doing 'okay' when what the original site did was not? Or do they just leave Google alone?

Re:Tangential Google Question (1)

SamBeckett (96685) | more than 12 years ago | (#2613816)

Intent

Re:Tangential Google Question (3, Interesting)

CaseyB (1105) | more than 12 years ago | (#2613881)

Good question.

Given that they do have (for now) some sort of immunity, it opens a loophole for publishing illegal data. Simply set up your site with all of Metallica's lyrics / guitar scores (all 5 of them, heh). Submit it for indexing to Google, but don't otherwise attract attention to the site. When you see the spider hit, take it offline. Now the data is available to anyone who searches for it on Google, but you're not liable for anything. The process could be repeated to update the cache.

Re:Tangential Google Question (2)

passion (84900) | more than 12 years ago | (#2613882)

I doubt most prosecuting teams are savvy enough to think about google's cache.

how the FUCK is this possible? (2, Insightful)

posmon (516207) | more than 12 years ago | (#2613805)

just because google is only picking them up now doesn't mean that they haven't been there for years!

how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? did they redirect My Documents to c:\inetpub\wwwroot\%username%\...???

Re:how the FUCK is this possible? (2, Insightful)

Karma 50 (538274) | more than 12 years ago | (#2613856)

Google has just added the ability to index PDFs, word docs etc. So, yes, the information was there before, but now it is much easier to find.

Re:how the FUCK is this possible? (2, Insightful)

Neon Spiral Injector (21234) | more than 12 years ago | (#2613863)

In published folders? How about on machines that are on the Internet at all.

In an ideal setup, the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a front-end secure webserver. That machine would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data would only be a one-way transfer, with no way of retrieving it back off of that machine.

Nothing to do (1)

jeriqo (530691) | more than 12 years ago | (#2613807)

Google does nothing more than a regular Web user. It simply follows links, and indexes the content in its database.

What's wrong with this?

Nothing. Human stupidity.

Re:Nothing to do (2)

Jburkholder (28127) | more than 12 years ago | (#2613951)

As far as I can tell from checking out the article and then trying this myself on Google, you can now target your search to specific filetypes [google.com]. If you are dumb enough to store passwords or credit card numbers in an xls file on your website, Google makes it easy to find.

I'm at a loss to explain how someone puts sensitive information on the web in an unprotected location and then points the finger at Google because they made it easier to find.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Stopping Google won't stop the problem... (5, Insightful)

Kr3m3Puff (413047) | more than 12 years ago | (#2613813)

The big complaint of the article is that Google is searching for new types of files, instead of HTML. If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...

Well Behaved Crawlers (4, Insightful)

tomblackwell (6196) | more than 12 years ago | (#2613815)

...obey the Robot Exclusion Standard [robotstxt.org]. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of doing business on the internet.

Many crawlers ignore robots.txt (3, Interesting)

Ars-Fartsica (166957) | more than 12 years ago | (#2613915)

I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.

Re:Well Behaved Crawlers (1)

melvin22 (523080) | more than 12 years ago | (#2613935)

Actually they should probably get the hell out as fast as they can. But that's just my opinion...

Re:Well Behaved Crawlers (2)

Nos. (179609) | more than 12 years ago | (#2613944)

This is not the way to do it, as the article mentions. This may stop Google, but suppose I'm running my own search engine that doesn't follow "robots.txt" rules?

True. (2)

tomblackwell (6196) | more than 12 years ago | (#2613958)

It will stop the casual perusal of your data.

The way to stop the determined snooper is to not keep your data in a directory that can be accessed by your web server.

Re:Well Behaved Crawlers (5, Insightful)

ryanvm (247662) | more than 12 years ago | (#2613976)

The Robot Exclusion Standard (e.g. robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.

Re:Well Behaved Crawlers (1)

sunking2 (521698) | more than 12 years ago | (#2613990)

But I thought OPT OUT was bad and everything should be OPT IN!

Seriously tho, there are a lot of ways that this sort of information can make it onto the web other than through the companies everyone blames. For example, how many times have people bought things online and then saved the html document that was returned as the receipt? It's very easy to imagine people saving this to a directory that is inadvertently crawled.

Google shouldn't lift a finger (2, Interesting)

sketerpot (454020) | more than 12 years ago | (#2613818)

Why should Google or any other search engine do anything to save fools from their stupidity? Putting credit card numbers online where anyone can get them is just plain idiotic. Hopefully this will get a lot of publicity along with the names of companies who do stupid things like this and most people will shape up their act.

Simple but burdensome solution (4, Informative)

camusflage (65105) | more than 12 years ago | (#2613823)

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
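A minimal sketch of that mod10 (Luhn) test in Python; the scan-and-skip policy around it is this comment's suggestion, not anything the search engines have said they do.

def luhn_valid(digits: str) -> bool:
    """Mod 10 (Luhn) check: True for well-formed card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test number)
print(luhn_valid("1234567812345678"))  # False (random digits almost never pass)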

Re:Simple but burdensome solution (5, Insightful)

Xerithane (13482) | more than 12 years ago | (#2613857)

It is a burden, and the responsibility does not lie with a crawling engine. You could run a Luhn check on any 16-digit number you find (and sanity-check an expiration date if one appears alongside it), but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc.) the algorithm could get ugly to maintain.

I don't see why Google or any other search engine has to even acknowledge this problem; it's simply Someone Else's Problem. If I were paying a web team/master/monkey any money at all and found out about this, heads would roll. Even thinking of pointing a finger at Google is the same tactic Microsoft uses against those "irresponsible" individuals pointing out security flaws.

If anything Google is providing them a service by telling them about the problem.

Re:Simple but burdensome solution (1)

new-black-hand (197043) | more than 12 years ago | (#2613967)

It might not be that bad, since to do the calculation the number has to be in XXXXXXXXXXXXXXXX format anyway, so it has to be normalized regardless.

Google exploit patch for Apache (4, Funny)

Anarchofascist (4820) | more than 12 years ago | (#2613825)

% cd /var/www
% cat > robots.txt
User-agent: *
Disallow: /
^D
%

Google exploit patch 0.2 for Apache (2, Funny)

Anarchofascist (4820) | more than 12 years ago | (#2613864)

Oops! Version 0.2 already:

% cat > /var/www/html/robots.txt
User-agent: *
Disallow: /
^D
%

Insert foot in mouth.... (2, Interesting)

Crewd (199804) | more than 12 years ago | (#2613829)

From the article :

"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems--search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."

And chief executives of a hosting company should know how Basic Authentication works before hosting web sites...

Crewd

Basic Authentication (3, Insightful)

KyleCordes (10679) | more than 12 years ago | (#2613900)

[know how Basic Authentication works before hosting web sites]

... and know that it's a wholly inadequate way of "protecting" credit card numbers!
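For anyone wondering why Basic Authentication is so inadequate here: the credentials cross the wire base64-encoded, which is an encoding, not encryption. A minimal demonstration in Python (the username and password are made up):

import base64

# What the browser actually puts in the Authorization header:
header = "Basic " + base64.b64encode(b"admin:s3kr1t").decode()
print(header)                               # Basic YWRtaW46czNrcjF0

# Anyone who can see the traffic reverses it trivially:
print(base64.b64decode(header.split()[1]))  # b'admin:s3kr1t'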

Re:Insert foot in mouth.... (2, Funny)

simong (32944) | more than 12 years ago | (#2613904)

Not necessarily, they are chief executives after all.

What did they expect? (1)

Walter Bell (535520) | more than 12 years ago | (#2613833)

Lowering the barrier to entry to web publishing has had a few benefits. Families can share photographs and news in a cheap, efficient manner. Novices can publish information for the benefit of their employees or others easily. However, problems like this do arise quite often, and at their source one can see that the widespread ability of people to publish documents to the web does not coexist well with existing security systems and models.

At any other time in the past few years, this would not ordinarily be a societal problem. Sure, a few peoples' passwords and credit card numbers will leak out. Hopefully they would have to pay for the charges to punish them for their own stupidity. (After all, as a customer of several banks, I don't want my rates to go up because somebody posted his account numbers for the entire world to see.) But now, this is a national security problem, because we are being attacked by a foreign force who might abuse leaked passwords to access critical systems and cause chaos in this country. President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

I'm not sure what the solution is, but a good first step is for companies to raise the barrier to entry to publishing web pages. Geocities and Angelfire should force users to demonstrate their competence before uploading their first page. Perhaps requiring an A+ certification number would help? And Microsoft should take away the parts of FrontPage that allow users to generate documents without writing in HTML. That would help ease the problem, I reckon.

In conclusion - if everybody does their part to help solve this problem and stop information leakage, we will be a safer, more secure society without giving up any more civil liberties.

~wally

This Is A Known Troll (-1, Offtopic)

Faulty Dreamer (259659) | more than 12 years ago | (#2613886)

and the idiot moderators who keep modding him up are creating a bigger monster every time.

Requiring competence before publishing a page? What the Fuck?

Please think before modding him up because this post sounds good. Read it carefully, and you'll see it is only shit dressed up nicely.

Where is YOUR speech license? (2)

Unknown Poltroon (31628) | more than 12 years ago | (#2613889)

WHere have you put your license to speak yoour mind on slashdot? Surely, people cant go around putting anything they want to say into a public forum. They might say anything. A a matter of fact, we must revoke peoples phone privliges until lthey can proove theyre smart enough not to give out credit card numbers to telemarketers. As a matter of fact, lets just legislate intelegence. We can tack it on as a rider for that bill to make Pi = 3.
Youre a nitwit. Im revoking your speech licnese on slashdot.

Dooh. Spellcheck, then paste (-1, Offtopic)

Unknown Poltroon (31628) | more than 12 years ago | (#2613913)

Stupid submit button. Grrrrrrr.

OKC (1)

TheMidget (512188) | more than 12 years ago | (#2613980)

President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

Well, terrorism can easily be waged without having Arabs in the States, even without resorting to cyberwar. As Oklahoma City has shown, it's enough to have Rednecks in the States. Kudos though for disguising your racist drivel well enough to get modded up to 2.


Microsoft .Net.... (1)

MrWinkey (454317) | more than 12 years ago | (#2613842)

Hmmm....could Microsoft's .Net be adding to the problem?
Microsoft said it was safe.....

The Problem of Search Engines and "Sekrit" Data (4, Funny)

NTSwerver (92128) | more than 12 years ago | (#2613848)

Please change the title of this article to:

The Problem of Incompetent System Administrators

If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.

Re:The Problem of Search Engines and "Sekrit" Data (1)

Garfunkel (3569) | more than 12 years ago | (#2613959)

It's not always "System Administrators". How many DSL/Cable subscribers do you think run Microsoft's Personal Webserver. Technically, yes, they are administering a system, but I don't think anybody would really call them a SysAdmin.

This is what happens when you use frontpage... (5, Informative)

Grip3n (470031) | more than 12 years ago | (#2613850)

I'm a web developer, and I don't know how many times I've heard people who are just getting into the scene talking about making 'hidden' pages. I'm referring to those that are only accessible to those who click on a very tiny area of an image map, or perhaps find that 'secret' link at the bottom of the page. Visually, these elements seem 'hidden' to a user who doesn't really understand web pages and source code. However, these 'hidden' pages look like giant 'Click Here' buttons to search engines, which is what I'm presuming some of this indexing is finding.

The search engines cannot feasibly stop this from happening; each occurrence is unique unto itself. The only prevention tool is knowledge and education, and bringing to the masses a general understanding of search engine spidering theory.

Just my 2 cents.

Re:This is what happens when you use frontpage... (2)

onion2k (203094) | more than 12 years ago | (#2613916)

Often worse than that.. the dreaded visibility:hidden CSS/DHTML that the likes of Dreamweaver is so keen on.. what the eye can't see, the robot certainly can..

Heh (0)

Anonymous Coward | more than 12 years ago | (#2613853)

These people who store credit card numbers on the web server are the same people who don't patch for IIS worms until all hell breaks loose.
Good for them.

Example (5, Informative)

squaretorus (459130) | more than 12 years ago | (#2613854)

I recently joined an angel organisation to publicise my business in an attempt to raise funds. The information provided to the organisation is supposed to be secret, and only available to members of the organisation via a paper newsletter, which was reproduced in the secure area of the organisation's website.
A couple of months down the line, a couple of search engines, when asked about 'mycompanyname', were giving the newsletter entry in the top 5.

Alongside my details were those of several other companies. Essentially laying out the essence of the respective business plans.

How did this happen? The site was put together with FP2000, and the 'secure' area was simply those files in the /secure directory.

I had no cause to view the website prior to this. The site has been fixed on my advice. How did this come about? No one in the organisation knew what security meant. They were told that /secure WAS!

It didn't do any damage to myself, but a few of the other companies could have suffered if their plans were found. It's not Google's job to do anything about this, it's the webmaster's. But a word of warning - before you agree for your info to appear on a website, ask about the security measures. They may well be crap!

I've got a solution! (5, Funny)

CraigoFL (201165) | more than 12 years ago | (#2613855)

Every web server should have a file in their root directory called "secret.xml" or somesuch. This file could list all the publicly-accessible URLs that have all the "secret" data such as credit card numbers, root passwords, and private keys. Search engines could parse this file and then NOT include those URLs in their search results!

Brilliant, huh? ;-)

On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.

SSL only takes you so far (1)

imrdkl (302224) | more than 12 years ago | (#2613858)

And then you are at the mercy of ridiculous temp-file and text-database schemes. I've never deployed a credit-card web app, but I get enough spam from people trying to sell me their own implementation for my server that this is not surprising at all.

Maybe we need to demand "approved" server-side implementation of credit-card webservers, besides SSL. How could this be verified? I don't have a clue.

No easy solution in sight?!?! (0)

Anonymous Coward | more than 12 years ago | (#2613861)

Here's a really easy solution - the bank has these crazy things called "bills." Go to the bank and get some. Then go to the store and use aforementioned "bills." Voila - hax0rs go bye-bye.

Bad manager ideas (1)

Mr Krinkle (112489) | more than 12 years ago | (#2613862)

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
The fact that this guy claims the responsibility lies with Google for not disallowing this type of search is just plain crazy. If you are publishing critical information on a site that is not at least secured, and preferably encrypted, you are just asking for trouble. It should not be Google's responsibility in any way, shape or form to not find this information. If the content providers wish, they can put a robots.txt file out, but that is not fixing anything, merely sidestepping one super easy hack. They still need to have a decent or at least SOME security design.
Oh well.

What a bunch of idiots, was Re:Bad manager ideas (1)

pdqlamb (10952) | more than 12 years ago | (#2613996)

Absolutely. And this is supposed to be a network security outfit. (disgusted grimace) Trouble is, these idiots fail to make clear where the responsibility for "solid software design" lies -- right on the shoulders of the people putting the information out in the open. It's like taking Martha Stewart's idea to the extreme -- collect all the credit cards in town and line the public swimming pool with them, and then put signs up saying, "Please do not copy down credit card numbers!"

Maybe we do need some kind of accreditation. Any idiot can claim to be a security expert in the computer field. Can any convicted burglar claim to be a locksmith?

Umm (0)

Anonymous Coward | more than 12 years ago | (#2613866)

"As the article outlines, this has been a problem for a long time -- and with no easy solution in sight."

How about using basic mySQL passwords?
Sounds pretty simple to me

To test your credit-card ordering site... (1)

5n3ak3rp1mp (305814) | more than 12 years ago | (#2613868)

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer. If you get hits, your security is less than ideal.

Unfortunately, website security is not as simple as locking a door... but keeping your customer data out of the webserver's document root would be a good start.

Re:To test your credit-card ordering site... (2)

Legion303 (97901) | more than 12 years ago | (#2613914)

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer.

Then watch the fraudulent charges fly when the person who was sniffing cleartext HTTP traffic gets it in his logs.

-Legion

Bad.. but (2)

boaworm (180781) | more than 12 years ago | (#2613871)

Such problems have existed for quite a while. Hacking, cracking, internet sniffing, etc.

The real issue is not whether you can use the information, but whether you actually do. Regardless of whether it is available or not, it IS ILLEGAL. (Carding does carry rather long prison times as well.)

People have had the chance to steal from other people for as long as mankind has existed. This is just another form... perhaps a bit simpler though ...


Easy solution (1, Redundant)

Arethan (223197) | more than 12 years ago | (#2613895)

Your crawler is caching credit card numbers you say? Simple, check the content you cache for 16 digit numbers. Any that you find, you check with a simple LUHN (mod 10) algorithm. If it passes, you replace the number with "################" or a similar masking.

There, all credit card numbers will now be filtered from your cache.

I understand the severity of the issue, and it's good to know this is happening, but the solution is simple.
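A sketch of that masking pass in Python, handling both the XXXX-XXXX-XXXX-XXXX and XXXXXXXXXXXXXXXX formats mentioned earlier in the thread; the regex and the all-#s replacement are illustrative choices, not anyone's actual crawler code.

import re

def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

# 16 digits, optionally in groups of four separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def mask_cards(text: str) -> str:
    def repl(m):
        digits = re.sub(r"[ -]", "", m.group())
        # Only mask strings that pass the mod 10 test; other 16-digit
        # numbers (order IDs, phone-like strings) are left alone.
        return "#" * 16 if luhn_valid(digits) else m.group()
    return CARD_RE.sub(repl, text)

print(mask_cards("card 4111-1111-1111-1111, ref 1234-5678-1234-5678"))
# card ################, ref 1234-5678-1234-5678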

Not just credit cards (1)

Jodrell (191685) | more than 12 years ago | (#2613899)

I run a website [uk.com] that pulls a lot of content from other servers. We used to have a newsfeed via ITN [itn.co.uk]'s RDF feed - until I got a call from their Director of New Media asking me to take it off. Seems they charge a hefty fee for such a feed - around £30,000 - but hadn't made any attempt to protect it with a .htaccess file or anything. How did I find it? By searching Google [google.com]!

spam (1, Informative)

flollywebfrog (462849) | more than 12 years ago | (#2613901)

The other day I was using Google to explore the files of an annoying spammer's [referralware.com] site [referralware.com]. Simply searching for a few numbers with the query site:.referralware.com brought up search results in their unprotected source.referralware.com directory that included all the credit card logs for the past week. And I am just an average joe computer user ... this is a problem if I can be a "hacker" with less knowledge than a script kiddie!


robots.txt (2, Interesting)

mukund (163654) | more than 12 years ago | (#2613910)

From my web logs, I see that a lot of HTTP bots don't care crap about /robots.txt. Another thing which happens is that they read robots.txt only once and cache it forever in the lifetime of accessing that site, and do not use a newer robots.txt when it's available. It'd be useful to update what a bot knows of a site's /robots.txt from time to time.

HTTP bot writers should adhere to using information in /robots.txt and restricting their access accordingly. In a lot of occasions, webmasters may setup /robots.txt to actually help stop bots from feeding on junk information which they don't require.. or things which change regularly and need not be recorded.
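For what it's worth, a well-behaved bot's robots.txt handling is only a few lines. A sketch using Python's standard urllib.robotparser, with a periodic re-read to address the stale-cache complaint above (the 24-hour refresh interval and the "MyBot" agent name are arbitrary choices):

import time
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()       # fetch and parse the current rules
rp.modified()   # record when we last fetched them

def allowed(url, agent="MyBot"):
    # Re-fetch robots.txt if the cached copy is more than a day old.
    if time.time() - rp.mtime() > 86400:
        rp.read()
        rp.modified()
    return rp.can_fetch(agent, url)

print(allowed("http://example.com/sekrit/passwords.txt"))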

Oh, for regular expression searching in Google (5, Funny)

EnglishTim (9662) | more than 12 years ago | (#2613918)

I could be a rich man...

(Not, of course that I'd ever do anything like that...)

Searching with regular expressions would be cool, though...

Directory listings (2, Informative)

NineNine (235196) | more than 12 years ago | (#2613919)

Most of this is coming from leaving directory listing turned on. Generally, this should only be used on HTTP front-ends to FTP boxes, and for development machines. IIS has "directory browsing" turned off by default. Maybe Apache has it turned on by default? You'd be surprised to see how many public webservers have this on, making it exceedingly likely that search engines will find files they weren't meant to find. The situation arises when there's no "default" page (usually index.html or default.html, default.asp, etc.) in a directory, only a file like content.html. If a search engine tries http://domain.com/directory/, it'll get the directory listing, which it can, in turn, continue to spider.

Must... blame... someone.... (3, Funny)

JMZero (449047) | more than 12 years ago | (#2613925)

INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".

Why are stupid people not to blame for anything anymore?

Evil Robot? (1)

StevenHallman76 (455545) | more than 12 years ago | (#2613933)

So, where might one find an 'evil' robot that looks specifically in places robots.txt tells it not to? hypothetically speaking, of course...

Business Model (5, Funny)

Alomex (148003) | more than 12 years ago | (#2613934)


A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer - think about all the revenue that Google could generate with this data!

Anybody know when Google is going public?

well golly gosh, it works! (2, Informative)

Anonymous Coward | more than 12 years ago | (#2613936)

search for: password admin filetype:doc

My first hit is:

www.nomi.navy.mil/TriMEP/TriMEPUserGuide/WordDocs/Setup_Procedures_Release_1.0e.doc

at the bottom of the html:

UserName: TURBO and PassWord: turbo, will give you unlimited user access (passwords are case sensitive).

Username: ADMIN and PassWord: admin, will give you password and system access (passwords are case sensitive).

It is recommend that the user go to Tools, System Defaults first and change the Facility UIC to Your facility UIC.

oh dear, am I now a terrorist?

SHit. (-1, Offtopic)

Unknown Poltroon (31628) | more than 12 years ago | (#2613964)

Now youre an 3l33t h^x0r D00d3!!!!

Bring out the legal eagles (4, Insightful)

Milican (58140) | more than 12 years ago | (#2613942)

"Webmasters should know how to protect their files before they even start writing a Web site"

That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact, I thank Google for exposing this problem. It is nothing short of gross negligence on the part of any web merchant to have credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.

To have a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through word documents or excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light :)

John

Did you know (0, Troll)

hyyx (447405) | more than 12 years ago | (#2613943)

that you can use "file://[address]" to find pages and directories that are NOT linked to on a server (if the server allows it)?


Robots search by links right? (1, Troll)

linuxrunner (225041) | more than 12 years ago | (#2613946)

The search engines use robots, and the robots read your site through links... So unless the file is in the root directory or has a direct link to it, it should not show up.

So create a folder called "mystuff" and keep everything in it... and don't create a link to it, just remember it and type in the url.
http://www.my-site.com/mystuff
You'll then be sent to your secret folder that no one knows about, even the robots.
So I'm not sure what all the yelling is about. Just do that, or set up the robots.txt correctly, but most people don't realize they can do that....

No easy solution in sight? (2)

vrmlguy (120854) | more than 12 years ago | (#2613955)

From the article: "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security." -- Gary McGraw

Allow me to disagree. This fellow apparently agrees with Microsoft that people shouldn't publish code exploits and weaknesses. Sorry, but anyone who has secret information available on the external web is in the exact same boat as someone who has an unpatched IIS server or is running SQL Server without a password.

Let's assume that Google had (a) somehow figured out from day one that people would search for passwords, credit card numbers, etc., and (b) figured out some way to recognize such data to help keep it secret. Should they have publicized this fact or kept it a secret? Publicity would just mean that every script kiddie would be out writing their own search engines, looking for the things that Google et al were avoiding. Secrecy would mean that a very few black hats would write their own search engines, and the victims of such searches would have no idea how their secrets were being compromised.

But this assumes that there's some way of accomplishing item (b), which I claim is very difficult indeed. In fact, it would be harder to accomplish than natural language recognition. Think about it... Secrets are frequently obscure, to the point that to a computer they look like noise. Most credit cards, for example, use 16-digit numbers. Should Google not index any page containing a string of 16 consecutive digits? How about pages that contain SQL code? How would one not index those, but still index the on-line tutorials at MySQL, Oracle, etc?

The only "solution" is to recognize that this problem belongs in the lap of the web site's owner, and the search engine(s) have no fundamental responsibilty.

And please close the door on the way out.... (2)

pwagland (472537) | more than 12 years ago | (#2613957)

But other critics said Google bears its share of the blame.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Am I the only one scared by this? The problem is Google's, simply because they follow links? I find it hard to believe this stuff sometimes!

<rant>When will people learn that criminals don't behave? That is what makes them criminals!</rant>

As our second-year uni project we were required to write a web index bot. Guess what? It didn't "behave". It would search right through a robots.txt roadblock. It would find whatever there was to find. This stuff is so far from being rocket science it is ridiculous!

Sure, using Google might ease a tiny fraction of the bad guys' work, but if Google weren't there, the bad guys' tools would be. In fact, they still are there.

Saying that you have to write your client software to work around server/administrator flaws is like putting a "do not enter" sign on a tent. Sure, it will stop some people, but the others will just come in anyway, probably even more so just to find out what you are hiding.


Sure enough. (3, Interesting)

Joe Decker (3806) | more than 12 years ago | (#2613979)

Looked up the first 8 digits of one of my own CC numbers, and, while I didn't find my own CC # on the net, I did immediately find a large file full of them with names, expiration dates, etc. (Sent a message to the site manager, but this case is pretty clearly an accidental leak.)

At any rate--scary it is.

Don't know that this is Google's problem.. (2)

sid_vicious (157798) | more than 12 years ago | (#2613983)

From the article:
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Search and replace "Google" with "Microsoft". The lack of security is in the operating system and the applications which launch the malicious files without warning the user. Google just tell you where to get 'em, not what to do with 'em.

Web Sites are public by definition (4, Insightful)

hattig (47930) | more than 12 years ago | (#2613986)

It is a simple rule of the web - any directory or subdirectory thereof that is configured to be accessible via the internet (whether html root directories, ftp root directories, gnutella shared directories, etc.) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP-encrypted files outside the publicly accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.

But I shouldn't be giving a basic course on how to secure website payments to you lot - you know it, or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).
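A sketch of that scheme in Python, shelling out to gpg. The paths, the recipient ID, and the trust setting are illustrative assumptions; gpg must be installed with the merchant's public key imported, and the matching secret key kept on a separate machine.

import subprocess
from pathlib import Path

PRIVATE_DIR = Path.home() / "payments"   # outside the web server's document root

def store_order(order_id: str, cleartext: bytes) -> Path:
    out = PRIVATE_DIR / (order_id + ".asc")
    # Encrypt to the merchant's public key. The secret key never touches
    # this machine, so a break-in yields only ciphertext.
    subprocess.run(
        ["gpg", "--batch", "--yes", "--trust-model", "always",
         "--armor", "--encrypt", "--recipient", "orders@example.com",
         "--output", str(out)],
        input=cleartext, check=True)
    return out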
