×

Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

Common Crawl - now everyone can be Google

mikejuk (1801200) writes | more than 2 years ago

Open Source 0

mikejuk writes "If you have ever thought that you could do a better job than Google but were intimidated by the hardware needed to build a web index, then the Common Crawl Foundation has a solution for you. It has indexed 5 billion web pages, placed the results on Amazon EC2/S3 and invites you to make use of it for free. All you have to do is setup your own Amazon EC2 Hadoop cluster and pay for the time you use it — accessing the data is free. This idea is to open up the whole area of web search to experiment and innovation. So if you want to challenge Google now you can't use the excuse that you can't afford it."
Link to Original Source

0 comment

Check for New Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Sign up for Slashdot Newsletters
Create a Slashdot Account

Loading...