Slashdot: News for Nerds

New Search Engine Takes "Dyve" Into the Dark Web

timothy posted more than 5 years ago | from the looking-for-porn dept.

The Internet

CWmike writes "DeepDyve has launched its free search engine that can be used to access databases, scholarly journals, unstructured information and other data sources in the so-called 'Deep Web' or 'Dark Web,' where traditional search technologies don't work. The company partnered with owners of private technical publications, databases, scholarly publications and unstructured data to gain access to content overlooked by other engines. Google said earlier this month that it was adding the ability to search PDF documents. In April, Google said it was investigating how to index HTML forms such as drop-down boxes and select menus, another part of the Dark Web."

55 comments

Dark... dyve.. dyke... (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25738473)

Just got confused there for a while.

Re:Dark... dyve.. dyke... (0)

Anonymous Coward | more than 5 years ago | (#25738503)

Rubber tubing, gas, saw, gloves, cuffs, razor wire, hatchet, Gladys, and my mitts.

so... (5, Funny)

u4ya (1248548) | more than 5 years ago | (#25738527)

this will help me get more porn, how?

Re:so... (0)

Anonymous Coward | more than 5 years ago | (#25738605)

They're also looking into indexing images based on whether they contain boobies.

More tits and boobies (5, Funny)

tepples (727027) | more than 5 years ago | (#25738711)

They're also looking into indexing images based on whether they contain boobies.

You mean like these boobies [wikimedia.org] ? What about these great tits [wikimedia.org] ? And would you tap that ass [wikimedia.org] ?

Re:More tits and boobies (4, Funny)

exp(pi*sqrt(163)) (613870) | more than 5 years ago | (#25739287)

So that's what my parents have been trying to stop me accessing all these years! But I don't see what the big deal is.

Re:More tits and boobies (0)

Anonymous Coward | more than 5 years ago | (#25739601)

you made someone very horny. especially with that ass pic. no sleeping tonight, baby!

Re:so... (1)

mfh (56) | more than 5 years ago | (#25738659)

Well, if you consider that moot only gives you 10 pages of it at a time, a service like DeepDyve will aggregate all that hard-to-reach stuff. Not that you'd want hard-to-reach porn... whatever floats your boat!

Re:so... (1, Funny)

Anonymous Coward | more than 5 years ago | (#25739357)

But you don't need to sign up on 4chan...

Re:so... (1, Informative)

Anonymous Coward | more than 5 years ago | (#25739709)

Created an account... as long as they return search results a bit faster than they get your user name and password to you:

Thank you for your registration.

Due to the wonderful interest that we have received, we will be sending out your username and password next week.
We hope you enjoy using DeepDyve, the research engine for the Deep Web!

Pay walls (4, Informative)

tepples (727027) | more than 5 years ago | (#25738593)

The company partnered with owners of private technical publications, databases, scholarly publications and unstructured data to gain access to content overlooked by other engines.

I know why the other engines don't index these documents: they're behind pay walls. As the second link points out, Google already indexes (some) PDFs, but that doesn't help if the site doesn't want me to see the PDF. There are a lot of topics, such as disability rehabilitation and linguistics, that I can't search for without Google returning a bunch of results from sites that require a subscription to which my county library [acpl.info] doesn't subscribe. (A tip-off for these results is that "Cached" doesn't show up.)

Re:Pay walls (5, Informative)

philspear (1142299) | more than 5 years ago | (#25738637)

It appears this website ITSELF requires a subscription: the "beta" is free, the "pro" is not. Signing up for the beta will get you a registration page, followed by this helpful message:

"Due to the wonderful interest that we have received, we will be sending out your username and password next week.
We hope you enjoy using DeepDyve, the research engine for the Deep Web!"

Not impressed so far; they won't let me use the search for a week unless I pay them money. Don't fall for this scam.

Ignore. (2)

Erris (531066) | more than 5 years ago | (#25738861)

They want money for their service instead of following the mega-successful, advert-supported Google model? Good, they will be ignored just like the content they offer. This stuff needs to be liberated instead.

Re:Ignore. (4, Interesting)

MMC Monster (602931) | more than 5 years ago | (#25739071)

And I need to log in even if I just want to search Wikipedia???

Nice way to shoot yourself in the foot, guys.

At the very least, offer a checkbox on the search page so that registered users get the paid content and anonymous users get what's out there for "free".

Re:Ignore. (1)

thedonger (1317951) | more than 5 years ago | (#25739147)

I think they consider themselves a premium service to a niche market. Consider that at least 75% of the browsing public never gets much deeper than a Google image search for "cute puppies," or some such nonsense.

Re:Ignore. (3, Funny)

philspear (1142299) | more than 5 years ago | (#25739429)

Consider that at least 75% of the browsing public never gets much deeper than a Google image search for "cute puppies," or some such nonsense.

Yes, but you have to realize that some of that 75% is going to want to see MORE cute puppies than they could find with just Google image search.

Re:Ignore. (1)

thedonger (1317951) | more than 5 years ago | (#25748517)

Cute puppies from the Dark Web? I don't want any part of that.

Re:Pay walls (1)

bob_herrick (784633) | more than 5 years ago | (#25739019)

That must be new. I got a username and password yesterday. Unfortunately, neither my own test search nor the one search I tried from those posted here returned any results. I suspect it was a victim of /.ing before it was even posted to /.

Re:Pay walls (5, Insightful)

z0idberg (888892) | more than 5 years ago | (#25739777)

If they can't set up a registration system that can get someone registered in under a week then how good is the rest of it?

And what do they need my street address for?

Pass.

Re:Pay walls (1)

clang_jangle (975789) | more than 5 years ago | (#25738643)

That's a very valid concern (and an astute observation) you raise. Still, I look forward to seeing how much of this dark web can be freely illuminated. I wish them the best of luck in their "DeepDyve" project. But at the moment, I confess my mind is slightly more focused on getting some "DeepFryde". Mmmm, grease.

Google guidelines (1)

TheLink (130905) | more than 5 years ago | (#25744677)

"There are lot of topics, such as disability rehabilitation and linguistics, that I can't search for without Google returning a bunch of results from sites that require a subscription "

To me that's a breach of Google's own guidelines.

Here are Google's guidelines:

'Make pages primarily for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."'

In 2006 they blacklisted BMW for breaching them:

http://news.bbc.co.uk/1/hi/technology/4685750.stm [bbc.co.uk]

I've actually reported some of those "subscriber only" sites to Google, but I'm not surprised that nothing much happened, since I suspect Google gets $$$ from them, and the unwritten guideline is "don't deceive users unless you pay us $$$" :).

As Google's user, I very rarely want to get search results for content that I can't access. If they want that feature, I should at least be allowed to opt in or out, much like their "SafeSearch".

So much for Google's don't be evil eh?

You should try search.yahoo.com and search.live.com once in a while to see if they are better. So far they are about as good as Google. If Google becomes worse I have no qualms about switching.

Re:Google guidelines (1)

tepples (727027) | more than 5 years ago | (#25745311)

As Google's user, I very rarely want to get search results for content that I can't access.

If you are still in university, you can access it: e-mail the URL to yourself and access it from a desktop computer inside the university library. I guess Google assumes that anybody who searches for what it thinks are scholarly topics is still in school and therefore has access to the school's subscription to JSTOR, Wiley, Elsevier, SpringerLink, and other closed-access scholarly journal sites that I see spamming the listings. But for those who have left school after finishing a degree, tough droppings. I've already asked Google a couple times for a flag that I could add to a query to block search results with the nocache flag, but nobody has replied.

Re:Google guidelines (1)

TheLink (130905) | more than 5 years ago | (#25746313)

1) I'm not in university.
2) Which universities are allowed access? I'd find it interesting and amazing if somehow all universities worldwide have access.

As it is, such results are useless and a big waste of time to me.

Re:Google guidelines (1)

tepples (727027) | more than 5 years ago | (#25746645)

Which universities are allowed access?

Any library that subscribes to journals is allowed access to articles from those journals. In general, underfunded county libraries choose not to spend tax money on them, but libraries affiliated with universities that offer courses in those areas need the subscriptions for their students.

This means more spam (1)

SoundGuyNoise (864550) | more than 5 years ago | (#25738611)

This will certainly defeat the practice of obfuscating links with e-mail addresses in them, by using a picture link or "click here."

If the search engine can read source code, it can certainly parse out an email address.
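To make that concrete, here is a minimal sketch of the kind of pattern matching a harvester could run over fetched markup; the URL and the regex are illustrative assumptions, not anything DeepDyve or any particular engine actually does. An address sitting in a mailto: link behind a "click here" anchor or a picture is just as visible to it as plain text.

import re
import urllib.request

# Matches most plain e-mail addresses anywhere in the raw HTML, including
# inside mailto: hrefs hidden behind "click here" links or image links.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest_addresses(url):
    """Fetch a page and return every e-mail-looking string in its source."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return sorted(set(EMAIL_RE.findall(html)))

if __name__ == "__main__":
    # Hypothetical page, used only for illustration.
    print(harvest_addresses("http://example.com/contact.html"))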

Re:This means more spam (2, Informative)

tepples (727027) | more than 5 years ago | (#25738639)

This will certainly defeat the practice of obfuscating links with e-mail addresses in them, by using a picture link or "click here."

"Click here" still works: use a web form to send e-mail instead of disclosing an e-mail address that doesn't use a whitelist. And AFB has reported that a picture of an address doesn't work even for legitimate users of speech or braille browsers.

Most Prime (1)

AnalogDog (756238) | more than 5 years ago | (#25738613)

I have pondered how or if this information could be made available. Looking good for open access!

You have to pay?! (5, Insightful)

LingNoi (1066278) | more than 5 years ago | (#25738679)

I don't know about you guys but I prefer not to have to sign up or use the "pro" version for my web searching needs.

In fact why do I have to sign up to web search anything?

Besides, this thing looks like it just gets in your way [deepdyve.com].

Thanks, but it's not a Google killer.

Re:You have to pay?! (1)

jabithew (1340853) | more than 5 years ago | (#25739949)

Because it searches stuff Google can't access because of pay barriers. I have to be on the college network to get at this stuff.

BTW, this is not especially revolutionary, as Google Scholar from my university can search this stuff. I don't know quite how it works, but it seems to tie in with SFX and/or Metalib. Only it works much better than Metalib.

Re:You have to pay?! (1)

jabithew (1340853) | more than 5 years ago | (#25740025)

Sorry, familiarity makes me forget to clarify: that's the SFX OpenURL software [wikipedia.org].

Re:You have to pay?! (1)

LingNoi (1066278) | more than 5 years ago | (#25740045)

Interesting. Since I am about to go to my university to search for research papers, I shall have to give this a go.

Re:You have to pay?! (0)

Anonymous Coward | more than 5 years ago | (#25740245)

"Thank you for your registration.

Due to the wonderful interest that we have received, we will be sending out your username and password next week."

Also, I don't see any way it links to Metalib or your university.

Either you pay via credit card for the pro account or you go with the gimped free account.

Woo... (2, Insightful)

ZekoMal (1404259) | more than 5 years ago | (#25738799)

Just what I needed: 40 million NEW search results to sift through. I already have to deal with the first 5 pages being useful, followed by 60 pages of, say, 'pokemon glitch' results that are really someone's blog with 500 keywords slapped on the bottom (nothing quite as useful as finding out a website that came up in the search says 'boobs anal pokemon glitch asians etc' at the very bottom of the blog).

Good news is I can finally PAY to be annoyed.

Re:Woo... (1)

digitig (1056110) | more than 5 years ago | (#25739249)

But at least it makes googlewhacking more challenging.

Re:Woo... (1)

captaindirtnap (1231494) | more than 5 years ago | (#25739317)

'boobs anal pokemon glitch asians etc'

What the hell are you searching for again?

Riiiight. (4, Funny)

He Who Waits (1102491) | more than 5 years ago | (#25738809)

It's apparently not working right now. But give it all your personal information now, and they will get back to you.

Re:Riiiight. (0)

Anonymous Coward | more than 5 years ago | (#25739143)

...but I already gave all my personal information to google!

In layman's terms (2, Funny)

z-j-y (1056250) | more than 5 years ago | (#25738839)

Basically, it's like a cavity search for the internet.

Dear DeapDyve: +1, PatRIOTic (2, Insightful)

Anonymous Coward | more than 5 years ago | (#25738897)

Login? to search a "dark net".

You are fucking kidding?

I was right about Tesla crashing. I'll make another prediction.

Deap Dyve out of business in 1 year.

Cheers,
Kilgore Trout

P.S. : get the Cyrillic fonts enabled. Russia is invading the U.S.S.A. Finally !!!

Re:Dear DeapDyve: +1, PatRIOTic (1)

RobBebop (947356) | more than 5 years ago | (#25739623)

I know you! For years, Anonymous Coward has been making all sorts of predictions. Being right about one specific event doesn't improve your credibility enough to make me believe you! And signing the e-mail with the name of a fictional Vonnegut character doesn't help either.

Try it (or not) (1, Insightful)

Anonymous Coward | more than 5 years ago | (#25738901)

I shall certainly try it out.
BUT, if it is anything like how badly Cuil went in its first week, it will fail.

Instantly, just from seeing the front page, I don't have high hopes.
You have to sign up?
Yes, I will try it, when I can be bothered to sign up, WHICH will probably be never, as I will probably forget about it until the article is posted here in a month saying how awful it is doing.

Google starts indexing scanned (!) PDFs (4, Informative)

harmonica (29841) | more than 5 years ago | (#25739495)

The summary is a bit misleading. Google has been indexing the textual parts of PDFs for a long time. According to the article, they have now started indexing scans inside PDF files, which requires OCR.

Google has been doing that for catalogs [google.com] for a while now, but OCRing large numbers of scans obviously requires a lot more resources.
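For a sense of the mechanics, here is a minimal sketch of OCRing a scanned PDF using the pdf2image and pytesseract libraries; the tooling and the file name are assumptions for illustration, not a description of Google's actual pipeline. The resource cost the parent mentions comes from running something like this over millions of files.

from pdf2image import convert_from_path  # renders PDF pages to images (wraps poppler)
import pytesseract                       # wraps the tesseract OCR binary

def ocr_pdf(path, dpi=300):
    """Rasterize each scanned page, OCR it, and join the recognized text."""
    pages = convert_from_path(path, dpi=dpi)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(ocr_pdf("scanned_catalog.pdf")[:500])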

Re:Google starts indexing scanned (!) PDFs (1)

404 Clue Not Found (763556) | more than 5 years ago | (#25742285)

The summary is a bit misleading. Google has been indexing the textual parts of PDFs for a long time. According to the article they have now started indexing scans inside of PDF files, which requires OCR.

Google has been doing that for catalogs for a while now, but OCRing large numbers of scans obviously requires a lot more resources.

Like... the amount of resources behind Google Book Search [wikipedia.org] ?

Re:Google starts indexing scanned (!) PDFs (1)

harmonica (29841) | more than 5 years ago | (#25751511)

Like... the amount of resources behind Google Book Search [wikipedia.org] ?

No, a lot more than that. 3,000 books a day is great, but there are a lot more PDF files to be processed. And as usual, if you make a service that only works some of the time, people will complain, so Google probably took their Books and Catalogs experience and put it to work on a larger scale.

already Convert Scanned PDF Docs to TXT w/Google (1)

mungurk (982766) | more than 5 years ago | (#25746793)

Actually, you can already convert scanned PDF documents to text with Google OCR, though it is not immediate unless you have control of the indexing frequency of your site. http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/ [labnol.org]

Re:Google starts indexing scanned (!) PDFs (0)

Anonymous Coward | more than 5 years ago | (#25748283)

380,000+ servers ought to be a start with that, then.

http://www.strassmann.com/pubs/gmu/LectureV4.pdf

In the lecture, Strassmann describes 1 cluster = 31,654 machines, and there are at least 12 data centers...

Lost me (0)

Anonymous Coward | more than 5 years ago | (#25740127)

You lost me at "Sign Up Now"

Re:Lost me/Login (1)

phrostie (121428) | more than 5 years ago | (#25741141)

That was my first impression too.

The last few new search engines that have been advertised here on /. all required a login/account just to search.

How F'ed up is that?

best thing ever or smoking crack? (1)

bigbigbison (104532) | more than 5 years ago | (#25741127)

Either this DeepDyve thing is the best search engine ever or they are smoking crack. They have a pro version for $45 a month: http://www.deepdyve.com/why_deepdyve/deepdyve_pro [deepdyve.com]. Those have got to be some pretty good Venn diagrams to be worth $45 a month...

Re:best thing ever or smoking crack? (1)

bigbigbison (104532) | more than 5 years ago | (#25812085)

Not that it is likely that many people will read this but I got my login to the beta and used it a bit.

I'm in the humanities so perhaps the experience of people in the sciences will vary. However, I'm not terribly impressed. It has potential but as it is there doesn't seem to be much that it offers that Google Scholar doesn't and it has things that don't seem to be of much use at all.

Most of my search results came from Sage Publishing, which typically shows up in Google Scholar results anyway. If they get other publishers on board then they might have something worthwhile.

However, it isn't entirely clear who they are attempting to serve, because part of what is so infuriating is that by default they also include Wikipedia search results. Wikipedia is great, but having the entry for my search term show up in the results just clutters them, because it doesn't offer anything more than just using Google to search Wikipedia. It includes patents as well; for 99% of searches those won't return anything, so I'm not sure why they are in there either.

The power of the search is also questionable, because they don't tell you whether they support boolean operators, whether putting quotes around phrases does anything, or any other sort of advanced search, so there's no way of knowing whether your fancy search techniques are doing anything or not.

Multiplication Problem..... (1)

IHC Navistar (967161) | more than 5 years ago | (#25742753)

"In April, Google said it was investigating how to index HTML forms such as drop-down boxes and select menus, another part of the Dark Web."

-Great, now I can have 10,000 times more irrelevant search results to dig through!

Deep, NOT Dark!!! (1)

Jane Q. Public (1010737) | more than 5 years ago | (#25742815)

There is a difference! A "Dark" web (or, more properly, a Dark Net) is designed to be private. The "Deep Web" is simply information that has always been public, just hard to find.

There is a VERY big difference!

Google (2, Insightful)

Aggrajag (716041) | more than 5 years ago | (#25743141)

I'll wait for Google to assimilate DeepDyve before I'll check it out.

Blocking search engines (0)

Anonymous Coward | more than 5 years ago | (#25763919)

I'll be sure to add DeepDyve to my list of blocked search engine spiders.

The dark web is dark for a reason. Some of us on the dark web don't want our content indexed at all. Others don't want our content indexed if the search engine companies can't be bothered to adhere to the HTTP specifications and recommendations.

Search engines like Google eat up gobs of bandwidth every day by indexing my websites, usually multiple times per day, even when nothing has changed on my websites.

The queer thing is, any updates I make to my websites never make it into the search engine results. Instead, all you see are old listings from the first time Google and the other search engines hit my websites, which forces search engine users to spend the time and bandwidth to access my websites directly to look for updates.

For this reason, and others, I have blocked Google and other search engines from indexing my websites. I'm also checking the Referer header and blocking any requests that come from those brain-dead search engines. You might want to consider doing the same, at least until the search engine companies do the following:

1) Stop doing daily, or multiple-times-per-day, full indexing of the exact same content. Currently, all of the search engines ignore the If-Modified-Since HTTP header. They also always use GET instead of the less bandwidth-intensive HEAD method for the follow-on indexing.

The correct way to do follow-on indexing after the initial index would be to use HEAD (or a conditional GET with If-Modified-Since), issuing a full GET only if the Last-Modified timestamp has changed; a sketch of what this could look like follows the list.

2) Start reflecting updates to the indexed websites in the search results. If the search engines have already indexed your website multiple times, there is no legitimate reason for them not to update the search results.

If the search engine companies refuse to do 1) and 2), then:

3) Start paying website operators for the bandwidth their brain-dead search engines are wasting.

Google and others are making billions of dollars each year from advertisements that show up next to search results that include your content, much of it completely unrelated to your content. If the search engine companies are going to waste my bandwidth for no good reason at all, I want compensation.
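As referenced in 1), here is a minimal sketch of a conditional re-crawl using Python's requests library; the URL and the way the stored Last-Modified value is kept are illustrative assumptions, not any real crawler's implementation.

import requests

def refetch_if_changed(url, last_modified=None):
    """Do a cheap HEAD first; only transfer the body if the page has changed."""
    head = requests.head(url, timeout=10, allow_redirects=True)
    current = head.headers.get("Last-Modified")
    if last_modified and current == last_modified:
        return None, last_modified  # unchanged: no body transferred

    # Conditional GET: a compliant server answers 304 Not Modified
    # instead of resending the full page.
    headers = {"If-Modified-Since": last_modified} if last_modified else {}
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304:
        return None, last_modified
    return resp.content, resp.headers.get("Last-Modified")

if __name__ == "__main__":
    # Hypothetical URL, used only for illustration.
    body, stamp = refetch_if_changed("http://example.com/")
    print(len(body or b""), stamp)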

This is one of the real reasons why Comcast and AT&T want to charge per-gigabyte metered rates. The smokescreen lie that it is all because of P2P is transparent to those of us who have a clue.
