Super-Fast RDF Search Engine Developed

ScuttleMonkey posted more than 7 years ago | from the google-to-buy-ireland dept.

The Internet 144

The Register is reporting that Irish researchers have developed a new high-speed RDF search engine capable of answering search queries over more than seven billion RDF statements in mere fractions of a second. "'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI. 'These results enable us to create web search engines that really deliver answers instead of links. The technology also allows us to combine information from the web, for example the engine can list all partnerships of a company even if there is no single web page that lists all of them.'"

First Search (-1)

Anonymous Coward | more than 7 years ago | (#18988171)

Surely the first search will be useless on such a system if the dynamics of the collected data do not intricately model the meaning of the question you are trying to ask.

42 (1)

Prysorra (1040518) | more than 7 years ago | (#18988237)

NO.

The first answer will be 42.

Re:42 (0)

Anonymous Coward | more than 7 years ago | (#18988379)

I was trying to be serious and all you can do is fool around!

Re:42 (1)

sryx (34524) | more than 7 years ago | (#18989775)

NO.

The first answer will be 42.


That, it turns out, wasn't the hard part; it's figuring out the query!
-Jason :P

Re:42 (1)

VWJedi (972839) | more than 7 years ago | (#18990707)

The first answer will be 42.

If that is the answer to "Life, the Universe, and Everything", then ALL the answers should be 42.

Give me a couple minutes and I'll write the code for your search engine.

Official DERI Website (3, Informative)

achillean (1031500) | more than 7 years ago | (#18988207)

Here's the link to the official NUIG: DERI (omgwtfbbq) website in Ireland:

DERI [www.deri.ie]

TMA: Too Many Acronyms (2, Insightful)

EccentricAnomaly (451326) | more than 7 years ago | (#18988717)

Why assume everyone knows your acronyms? To me RDF means "Reality Distortion Field". Zeesh, 7 billion triples or whatever.

Re:TMA: Too Many Acronyms (4, Funny)

QuickFox (311231) | more than 7 years ago | (#18988915)

Why assume everyone knows your acronyms?

OMG: Oh my God!
WTF: What the fuck?
BBQ: Barbecue.

HTH

Re:Official DERI Website (4, Funny)

PDHoss (141657) | more than 7 years ago | (#18989031)

I tried to access that site, and I got a good look at their DERI Error.

Re:Official DERI Website (0)

Anonymous Coward | more than 7 years ago | (#18989345)

Go to your room.

Re:Official DERI Website (1)

YourMotherCalled (888364) | more than 7 years ago | (#18989801)

You're welcome. I'm beautiful. You'll be here all week. I should tip my waitress.

Re:Official DERI Website (1)

fche (36607) | more than 7 years ago | (#18989573)

Some of their software is at http://www.deri.ie/publications/tools/ [www.deri.ie]
but not the "yars2" world-record-busting supergadget.

This could be huge (4, Interesting)

$RANDOMLUSER (804576) | more than 7 years ago | (#18988211)

Except for the minor little problem of getting everyone to agree on the ontologies. Being able to search quickly is important, but until somebody comes up with the Dewey Decimal System for all knowledge, it won't mean much.

Next up: Ontology spam (5, Insightful)

G4from128k (686170) | more than 7 years ago | (#18988361)

Yes, creating a consistent ontology is a challenge. But the bigger challenge is the lack of incentive for ontology truthfulness. If this type of search becomes popular, ontology spam and OSEO (Ontology Search Engine Optimization) will become a booming industry.

Re:Next up: Ontology spam (1)

$RANDOMLUSER (804576) | more than 7 years ago | (#18988497)

Of course you're correct. It had never occurred to me that there would be ontology spam, but of course there will be. Still, for the pure knowledge aspects (think Wikipedia on RDF) it would be a wonderful thing.

Re:Next up: Ontology spam (1)

inviolet (797804) | more than 7 years ago | (#18988707)

Of course you're correct. It had never occurred to me that there would be ontology spam, but of course there will be. Still, for the pure knowledge aspects (think Wikipedia on RDF) it would be a wonderful thing.

For a while, yes. But as long as there is a cash-per-page-view market, the onslaught of adverspam will reach every corner of the web. It can't be stopped as long as there is money to be made there.

Certainly the big "pure knowledge" sites will defend themselves, as Wikipedia does, but that is an arms race that will eventually exhaust the resources of any single organization.

Re:Next up: Ontology spam (1)

maxume (22995) | more than 7 years ago | (#18988835)

Semantic information still adds value to a given page in isolation. Hence microformats.

Re:Next up: Ontology spam (1)

G4from128k (686170) | more than 7 years ago | (#18988803)

I agree with you 100% and did not mean to imply that the goal is not worthy. Being able to search semantically or to pull out just the relevant information would be hugely valuable.

And I'm sure that next generation search engines will create clever ways of detecting and punishing ontology spam (e.g., noting the dissonance between the text and the tags)

Re:Next up: Ontology spam (1)

cyphercell (843398) | more than 7 years ago | (#18989035)

I was thinking the article kinda indicates a resolution to the ontology problem. Most definitions are a product of synonymous/antonymous context. For instance, a person cannot understand the concept of clear without simultaneously understanding opaque. This level of search would suggest that if you throw enough generic definitions at a term, then some logic could be used to say "if we find so many synonyms then we have an accurate definition"; this is how AIML works at a basic level. RDF would be like AIML on crack and steroids.

Re:Next up: Ontology spam (1)

$RANDOMLUSER (804576) | more than 7 years ago | (#18989479)

Ah, but that's the rub: most things are not binary, neither this nor that. Things live on a continuum, and it's all too often a judgement call where they should lie.
transparency ==> translucency ==> opacity

Or, to put it in website design terms: "It's not blue enough."

Next up: Murky spam (0)

Anonymous Coward | more than 7 years ago | (#18991107)

Fuzzy Sets [wikipedia.org]

Re:Next up: Ontology spam (1)

Rakshasa Taisab (244699) | more than 7 years ago | (#18989185)

Why do I suddenly get this mental image of spam on increasing the plumage size of birds...?

Re:Next up: Ontology spam (1)

DragonWriter (970822) | more than 7 years ago | (#18991261)

Yes, creating a consistent ontology is a challenge. But the bigger challenge is the lack of incentive for ontology truthfulness.


I'd say consistent ontology is a bigger challenge (though also one that doesn't need to be anywhere near completely solved for all kinds of useful applications to exist.) Trust mechanisms built on RDF aren't really all that big of a challenge: trust relationships are fairly basic, straightforward relationships of exactly the type RDF was designed to express from the outset, after all.

Re:Next up: Ontology spam (1)

CarpetShark (865376) | more than 7 years ago | (#18991599)

Agreed, spam could really screw things up. Different ontologies aren't such a big problem, as there are already tools to translate between them -- that's part of the show, rather than being a showstopper. However, when a search engine like this is open source, available to install on a LAN (or any specific project server), and can be kickstarted with an ontology that says "we trust x, y, and z, and n hops of trust from them, with each trust hop reduced by m", things may start to look up.
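
The trust bootstrap described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: trust is taken to be a score between 0 and 1, the reduction per hop is assumed to be a multiplication by a decay factor, and propagation stops after a fixed number of hops. The names `propagate_trust`, `max_hops`, and `decay`, and the example graph, are hypothetical and do not come from any RDF tool.

```python
from collections import deque

def propagate_trust(edges, seeds, max_hops, decay):
    """Breadth-first trust propagation from explicitly trusted seeds.

    edges: dict mapping a source to the sources it vouches for.
    seeds: sources trusted outright (score 1.0).
    max_hops: the "n hops" cutoff from the comment.
    decay: the per-hop reduction "m", assumed here to be multiplicative.
    """
    trust = {s: 1.0 for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust beyond the hop limit
        for neighbor in edges.get(node, ()):
            score = trust[node] * decay
            # Keep the best score seen so far for each source.
            if score > trust.get(neighbor, 0.0):
                trust[neighbor] = score
                queue.append((neighbor, hops + 1))
    return trust

# Hypothetical graph: x vouches for a, a for b, b for c, y for b.
edges = {"x": ["a"], "a": ["b"], "b": ["c"], "y": ["b"]}
scores = propagate_trust(edges, seeds=["x", "y"], max_hops=2, decay=0.5)
```

A search engine could then discard or down-rank statements from any source missing from `scores`, or below some threshold; where that threshold should sit is a policy question the sketch does not answer.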

Re:This could be huge (0)

Anonymous Coward | more than 7 years ago | (#18988621)

How about People?
Who cares if a search is so much faster??
  Does this make people process the info any faster ?

Re:This could be huge (1, Insightful)

Anonymous Coward | more than 7 years ago | (#18988625)

Except for the minor little problem of getting everyone to agree on the ontologies. Being able to search quickly is important, but until somebody comes up with the Dewey Decimal System for all knowledge, it won't mean much.
How about the Dewey Decimal System?

Re:This could be huge (1)

angst_ridden_hipster (23104) | more than 7 years ago | (#18990981)

It was an admirable attempt in its time, but it's pretty clunky. It's also very biased towards the world-view of one man in the 1870s. While it does get updated, you'll find that there are structural issues. The classic example is religion: everything involving Buddhism, Sikhism, or Jainism is lumped together in a number space which is the same size as the number space reserved for Christian "Parish Government And Administration." Christianity itself gets 88 percent of all the top-level numbers set aside for religion.

Anyone who has ever tried to make a meaningful taxonomy knows that it's really, really hard.

Still, it could be a starting point for the inevitable endless committees.

Re:This could be huge (3, Insightful)

complete loony (663508) | more than 7 years ago | (#18988645)

Ah, but the Dewey Decimal system only works because responsible people are involved in categorizing everything. They let just anyone publish information on the internet these days.

Re:This could be huge (1)

stevemulligan (1097663) | more than 7 years ago | (#18988893)

First time poster, long time lurker. Just wanted to say grats to the Irish. Way to go. What kind of hardware spit out 7 billion records in a second? I guess I'll have to read the article... :(

AND is it public? Can I hook up to it and send some queries just to see for myself how fast it is?

Re:This could be huge (1)

StefanDecker (1097693) | more than 7 years ago | (#18989569)

Fully agreed. But it worked for RSS - and it also seems to work for SIOC (see http://sioc-project.org/ [sioc-project.org] ). Other XML structured formats are also catching on - e.g., XBRL. All of them can be (quite easily) translated into a graph and integrated. So there is hope. However, Andreas's and Aidan's work reported on in the press release enables us to build scalable engines - scalability was a major headache before.

Re:This could be huge (2, Interesting)

spemen (1075451) | more than 7 years ago | (#18989697)

Actually there is a lot of research being done to get around the need for a 'Dewey Decimal System'. The idea is to analyze relations between terms (names, datatypes, etc.) in an ontology. One could also compare relationships between terms: A child of B, C child of D, and A=B does B==A ?? Please note that these are examples of how terms and ontologies *could* be matched and not necessarily how someone would match terms. http://www.ontologymatching.org/ [ontologymatching.org] Also, http://wordnet.princeton.edu/ [princeton.edu] is a project I think is in the direction of a 'Dewey Decimal System' for knowledge.

The only RDF I know.... (0)

Anonymous Coward | more than 7 years ago | (#18988225)

Is "Reality Distortion Field."

Sounds about right, in this case.

Why would I want to search... (1)

msauve (701917) | more than 7 years ago | (#18988233)

for a Radio Direction Finder?

Re:Why would I want to search... (0)

Anonymous Coward | more than 7 years ago | (#18988597)

Don't they mean Reality Distortion Field?

Re:Why would I want to search... (1)

Mattintosh (758112) | more than 7 years ago | (#18988685)

Yes. And it's always easy to find the largest RDF on the planet... It's wherever Steve Jobs is. I hear it extends into space.

Re:Why would I want to search... (1)

SleepyHappyDoc (813919) | more than 7 years ago | (#18988965)

I thought they meant the Robotech Defense Force.

Links! (3, Insightful)

SolitaryMan (538416) | more than 7 years ago | (#18988235)

These results enable us to create web search engines that really deliver answers instead of links.

I need both: answers *and* links! Many times when I search the web, I don't know for sure what I am searching for, let alone being able to ask a specific question...

Re:Links! (1)

CastrTroy (595695) | more than 7 years ago | (#18988547)

This is probably the biggest problem with searching. Google can return really good results if you know what to search for. Most people I know just type in the first word that pops into their head, and make the search way too generalized, and don't get good results. Knowing what words to type in can save you a lot of time in searching.

Re:Links! (1)

Andy_R (114137) | more than 7 years ago | (#18988761)

The sad thing is that it's so easy to learn how to get good results using current search engines, but people are never taught how to do it.

RDF could do very useful things, like throwing up a disambiguation question at the top of the results page when you've not made it clear what you want, or filtering out the plague of typosquatter/content free price comparison/'be the first to write a review of this item' sites, but so could a bit more intelligence built into Google.

Re:Links! (1)

CastrTroy (595695) | more than 7 years ago | (#18988927)

People just expect computers to do everything for them, and turn off their brains most of the time. This is why people have so many problems operating computers. Most people when searching for information about Cats (the musical) will probably just type in "cats", and look through all the results. Whereas, a person who understands the concept of feeding the right information to the search engine, will probably type in "cats musical", or if you're looking for something more specific, you may type in "cats musical song list".

Re:Links! (2, Funny)

Red Flayer (890720) | more than 7 years ago | (#18989281)

RDF could do very useful things, like throwing up a disambiguation question at the top of the results page when you've not made it clear what you want
It looks like you're trying to search for tentacle porn. Would you like help?

No thanks, I don't need Clippy in my search engine.

Re:Links! (1)

Andy_R (114137) | more than 7 years ago | (#18989691)

I'm not suggesting Clippy, I'm just suggesting disambiguation. Google already does this for typos (You searched for "kats musical song list" did you mean "cats musical song list"?). If Google noticed that the cats pages fell into 3 major categories (musical/animal/character who says 'all your base') and offered me those options in the typo line, I'd find that useful in narrowing down which of the 86,500,000 pages it found is the right one.

In your example, I'm guessing you might find the option to filter down by gay/straight or censored/uncensored useful?

Re:Links! (1)

Red Flayer (890720) | more than 7 years ago | (#18989805)

I know, I was just making a joke.

However, I think contextual disambiguation questions like what you're suggesting are already served by "search within results" queries. Proposing likely criteria for narrowing down the results would be, I think, a disservice. It pigeonholes sites, but worse than that, pigeonholes searches. This leads to easy gaming of the search system -- SEO would cause pretty much every site to make sure it's associated with the typical disambiguation terms, thus removing the utility of those terms.

Re:Links! (1)

Andy_R (114137) | more than 7 years ago | (#18990371)

I'm not so sure. I'm not suggesting linking to disambiguation pages (which could be gamed by SEO), I'm suggesting Google analyses the text and notices that pages tend to either use the words "Andrew Lloyd Webber" "Kitty-litter" or "set us up the bomb" and that these phrases tend to be mutually exclusive, so they would be good ones to offer as means of disambiguation.

The terms wouldn't be 'typical disambiguation terms', as they would be generated freshly from the content of the pages that appear in the search results. Probably too computationally intensive to do yet, but ontology isn't needed to give useful results when it becomes possible.

"Kitty-litter" isn't a great example, but it shows why I think this would be useful - I spent a while trying to think of a phrase that would occur in pages about the animal but not the musical and I couldn't come up with a good one. Doing a statistical analysis on the text content would probably find a much better term, and that term would be offered to me in the disambiguation line.

Normally I'd use "-musical" as a search term if I was after feline results, but that relies on me knowing about the existence of the musical and it does a sort of reverse pigeon-holing by throwing out pages that are 99% about felines but happen to mention the musical in passing.

I'm not seeing any way for SEO to game this.
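
A rough sketch of the idea in this comment: derive disambiguation terms from the result pages themselves by looking for words that are reasonably common yet never appear on the same page. This is a deliberately crude stand-in for the "statistical analysis" mentioned above, using plain page counts and set disjointness rather than any real statistical test. The function name, its parameters, and the sample pages are all hypothetical.

```python
import re
from collections import defaultdict

def disambiguation_terms(pages, query, min_pages=2, max_share=0.5, max_terms=3):
    """Pick terms that occur on several result pages but never together."""
    pages_with = defaultdict(set)
    for i, text in enumerate(pages):
        for word in set(re.findall(r"[a-z]+", text.lower())):
            if word != query.lower():
                pages_with[word].add(i)
    # Ignore rare words and words so common they cannot split the results.
    limit = max_share * len(pages)
    candidates = sorted(
        (w for w, hits in pages_with.items() if min_pages <= len(hits) <= limit),
        key=lambda w: (-len(pages_with[w]), w),  # widest coverage first, then A-Z
    )
    chosen, covered = [], set()
    for word in candidates:
        # Accept a term only if none of its pages is already covered.
        if pages_with[word].isdisjoint(covered):
            chosen.append(word)
            covered |= pages_with[word]
            if len(chosen) == max_terms:
                break
    return chosen

# Made-up result snippets for the query "cats".
pages = [
    "cats musical webber tickets",
    "cats musical webber songs",
    "cats kitten litter food",
    "cats kitten vet food",
    "cats base bomb meme",
    "cats base bomb zero",
]
terms = disambiguation_terms(pages, "cats")
```

Because the terms are recomputed from whatever pages come back, there is no fixed vocabulary to target in advance, which is the property the comment relies on. Whether that actually resists SEO in practice is not something this sketch can show.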

Re:Links! (1)

Red Flayer (890720) | more than 7 years ago | (#18990607)

Hmm. I do manually what you're suggesting when searching. Enter my search terms, and if most of the results are for something different than what I'm looking for, I'll add terms to remove the extraneous results. This is based on the couple lines of content info returned by Google (of course, those lines aren't 100% fresh with Google).

So, I think what you're suggesting is that the search engine prompt those terms to help people narrow their search? Didn't Ask Jeeves try this and miserably fail -- and if so, is that because of execution or concept?

Great!! (0, Insightful)

Anonymous Coward | more than 7 years ago | (#18988239)

Now all we need to do is get everyone to start using RDF.... wait.. you don't even know what that is??

Re:Great!! (4, Informative)

$RANDOMLUSER (804576) | more than 7 years ago | (#18988343)

Now all we need to do is get everyone to start using RDF.... wait.. you don't even know what that is??
It's the Resource Description Framework [w3.org] , which RSS is a subset of.

Re:Great!! (2, Informative)

jrumney (197329) | more than 7 years ago | (#18988945)

Actually, only RSS 1.0 is based on RDF. The only similarity between RDF and the more popular RSS 2.0 and RSS 0.92 is that they are all based on XML.

Re:Great!! (1)

martinmarv (920771) | more than 7 years ago | (#18989671)

The article is a bit light on details, but if this can be used to filter my RSS feeds in real-time, I'll be a happy bunny.

Search solved. World hunger next. (3, Funny)

140Mandak262Jamuna (970587) | more than 7 years ago | (#18988265)

Having solved the problem of search, and providing a breakthrough product that has consciousness to what was previously mere series of tubes, now the National University of Ireland announced that it is going to solve world hunger next, maybe in three months. Other projects in the pipeline include a cure for cancer and solving the full Navier-Stokes equations.

Re:Search solved. World hunger next. (1)

skidv (656766) | more than 7 years ago | (#18988439)

'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI.

Having solved the problem of search, and providing a breakthrough product that has consciousness to what was previously mere series of tubes

This breakthrough makes it possible to use the Interweb as a tube of artificial intelligences capable of answering such questions as "Who is Neuromancer?" and "Why is the number 42 so important, anyway?" as well as organize a successful revolution by moon colonists.

Re:Search solved. World hunger next. (0)

Anonymous Coward | more than 7 years ago | (#18988687)

It's obvious that the solution to hunger is whirled peas.

While we are on the topic, what's all the hoopla about endangered feces?

too slow (0, Redundant)

BeoCluster (995566) | more than 7 years ago | (#18988299)

Looks great but it would work even faster using a Beowulf Cluster !

Hype (4, Insightful)

gvc (167165) | more than 7 years ago | (#18988317)

users should get more relevant results


Yet another /. article parroting an uncritical popular press account of a press release.

Re:Hype (1)

Dhalka226 (559740) | more than 7 years ago | (#18988647)

Yet another /. post bitching about /. articles, yet adding absolutely no value of their own.

Seriously. Do you have anything to add to the discussion or were you simply karma whoring?

Re:Hype (2, Insightful)

StefanDecker (1097693) | more than 7 years ago | (#18989645)

We have a Technical Report available at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf [www.deri.ie] that should answer most of the technical questions. From the abstract: "We present the architecture of an end-to-end search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers. We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements."

"'...this breakthrough cannot be overestimated" (0)

Anonymous Coward | more than 7 years ago | (#18988357)

Cool. It'll end war and bring universal freedom to all people.

Re:"'...this breakthrough cannot be overestimated" (1)

Miseph (979059) | more than 7 years ago | (#18988465)

I asked the RDF search engine, and that's what it told me. Maybe if we ask it the right question it could come up with an answer to do that. Now if only we could devise a machine powerful enough to tell us what that question would be...

Re:"'...this breakthrough cannot be overestimated" (1)

FormOfActionBanana (966779) | more than 7 years ago | (#18988609)

...and overclock the sucker until she smokes...

Re:"'...this breakthrough cannot be overestimated" (1)

tygerstripes (832644) | more than 7 years ago | (#18988935)

Ah, so Global Warming is evidence that the mice are looking to boost their clockspeed...

RDF? (4, Funny)

lancelotlink (958750) | more than 7 years ago | (#18988393)

I didn't realize Steve Jobs' Reality Distortion Field was able to be harnessed and bottled in a search engine, or any software for that matter. His abilities are boundless!

RDF (1)

ZooSpeed (233670) | more than 7 years ago | (#18988555)

Why would anyone want to search Steve Jobs Reality Distortion Field?

Here's the first query result... (-1, Troll)

TheChromaticOrb (931032) | more than 7 years ago | (#18988571)

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re:Here's the first query result... (1)

roshanpv (1090393) | more than 7 years ago | (#18989283)

Don't put slashdot to trouble with such messages

I'll prove him wrong (3, Interesting)

Big Nothing (229456) | more than 7 years ago | (#18988577)

"'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI."

This is without a doubt the greatest invention in the history of time!

There, I just proved the professor wrong. Muahaha.

contradictory (1)

DohnJoe (900898) | more than 7 years ago | (#18988607)

The importance of this breakthrough cannot be overestimated
I think he just did...

Re:contradictory (0)

Anonymous Coward | more than 7 years ago | (#18990035)

'The importance of this breakthrough cannot be overestimated,'
It'll cure us of AIDS, malaria, smallpox, close the hole in the ozone layer, get our climate back under control while allowing for exponential economic growth and will put a man on Mars in nine months' time, all by itself without any human intervention.

There. This is all true according to TFA.

Cannot be overestimated (4, Insightful)

stevenp (610846) | more than 7 years ago | (#18988617)

- "The importance of this breakthrough cannot be overestimated"

The importance of any event can be overestimated and quite often is overestimated. It is called hype.
When speaking of XML, XHTML and semantic WEB then the word "overestimated" fits just nice.
If this was not the case then HTML should long have been dead and the whole WEB should have been based on pure XML with meaningful tags.

-- Do not read me, I am a stupid tag

Re:Cannot be overestimated (1)

gaspar ilom (859751) | more than 7 years ago | (#18991379)

How could this possibly be modded insightful? This is actually an uninformed and naive view (however prevalent on Slashdot) of how and why XML is used.

I've seen this sentiment regarding "HTML" vs "XML" on Slashdot so often; let me set the record straight:

Many sites use XML on the back-end, either as an interchange format with a DB, or to store and to generate HTML. I would dare say that *most* web-based applications of XML generate HTML, rather than XML, as the final output format. Outputting HTML is simply a matter of supporting the lowest common denominator, and is no indication that "XML" lacks utility. With minimal effort, these XML data sources could be made directly accessible by search engines. (Or, XHTML can be outputted instead of, or in addition to, HTML)

I would opine that if you are a developer, and you are storing vast amounts of marked-up data as "HTML" -- rather than X/HTML, then you are doing something wrong. (I'll back this up with one brief point: It is *trivial* to transform XML into HTML using a technology like XSL -- but going from HTML to XML most certainly is not.) Having your data in a machine-readable, micro-addressable format like X/HTML is so much more advantageous than HTML tag soup, I shouldn't have to enumerate all the reasons here. (This remedial info can be sought via a quick google search.)

Just because YOU only see "HTML" in the browser (and don't know how XML is used?), it does not mean that XML is not useful -- or even ideal -- for representing marked-up text and documents in an information system.

Re:Cannot be overestimated (1)

Ant P. (974313) | more than 7 years ago | (#18991561)

Yes, I agree the W3C's vision of everyone surfing the WEB on their Blue-ray equipped MACS is just wishful thinking

Mere fractions of a second? (0)

Anonymous Coward | more than 7 years ago | (#18988723)

Since everyone's being pedantic... I notice it takes more than one fraction of a second then.

Web or Database search engine? (0)

Anonymous Coward | more than 7 years ago | (#18988747)

Ok, I might say something very stupid here but even the best search engine still isn't a WEB search engine but a DATABASE search engine (searching in copies/excerpts from websites previously (i.e. recently) acquired).

My question: has someone ever proposed (i.e. written down) ideas/plans/designs for a live-searchable web (although something like that would seem impossible to me)? It might be a very interesting read however.

Could be interesting, but missing details (5, Interesting)

Anonymous Coward | more than 7 years ago | (#18988821)

What kind of data set did they use? The structure and contents of the graph that is the data in an RDF database has a huge impact on the performance of query execution, and different applications have different structures.

What kind of queries are they running? There are several different RDF query languages (think of SeRQL, RDQL, N3, SPARQL, etcetera) and some of them support quite complex queries. Quickly finding the answers to a simple query like

SELECT ?name WHERE { ?name <http://xmlns.com/foaf/0.1/name> "John Smith" }
is just a matter of an indexed lookup and not very special. But, like in SQL, much more complex expressions can be generated that require complex index operations on the query execution level. Having implemented an RDF database that supports SPARQL queries an order of magnitude faster than the software the W3C uses for their experiments (which, admittedly, doesn't have performance as a prime requirement), I know that it's possible to do simple things fast, but the interesting part is handling RDF queries that don't easily map to efficient database operations.
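
To make the "indexed lookup" point concrete, here is a minimal sketch of why the simple query above is cheap: with an index keyed on (predicate, object), finding every matching subject is a single hash lookup. The `TripleIndex` class and the example.org URIs are hypothetical illustrations, not the design of any real RDF store.

```python
from collections import defaultdict

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

class TripleIndex:
    """Toy in-memory triple store with a (predicate, object) -> subjects index."""

    def __init__(self):
        self._by_pred_obj = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_pred_obj[(predicate, obj)].add(subject)

    def subjects(self, predicate, obj):
        # One dictionary lookup, however many triples are stored.
        return self._by_pred_obj.get((predicate, obj), set())

store = TripleIndex()
store.add("http://example.org/people/1", FOAF_NAME, "John Smith")
store.add("http://example.org/people/2", FOAF_NAME, "Jane Doe")
store.add("http://example.org/people/3", FOAF_NAME, "John Smith")

# Equivalent of the single-pattern query quoted above.
matches = store.subjects(FOAF_NAME, "John Smith")
```

The hard cases the comment alludes to are queries that join several such patterns, where the order of lookups and the size of intermediate results dominate the cost; nothing in this sketch addresses that.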

Which brings me to the most important point: where is their detailed report? Can I get the software somewhere and perform my own tests? The article is too vague to draw any conclusions about what their RDF database does, and how good it is. I'd love to read up on it, but I can't seem to find the information.

Here's the Tech Report (5, Informative)

aharth (412459) | more than 7 years ago | (#18988963)

Hello, I am one of the main developers of SWSE. True, the press release is vague, but there is only so much you can say in a press release aimed for the general public.

We have a Technical Report available at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf [www.deri.ie] that should answer most of the technical questions.

From the abstract:

"We present the architecture of an end-to-end search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web.

In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers.

We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements."

Re:Here's the Tech Report (3, Insightful)

$RANDOMLUSER (804576) | more than 7 years ago | (#18989253)

You are too modest. You're the lead author. Congratulations on a first-rate contribution to mankind. And such a young pup [harth.org] , too.

Re:Here's the Tech Report (0)

Anonymous Coward | more than 7 years ago | (#18989549)

I skimmed the pdf real quick and noticed the index section didn't consider bitmap indexes. Given that RDF data is relatively static and queries are n-dimensional, it seems odd that bitmap indexes weren't considered. I've pointed this out to other RDF storage engines and they also made the same mistake of not considering bitmap indexes. Using an efficient bitmap index means queries would be near constant even for complex queries.
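
For readers unfamiliar with the technique this comment refers to, a bitmap index keeps one bit vector per distinct value, with one bit per stored statement, so that a multi-constraint query becomes a bitwise AND. The sketch below uses Python integers as uncompressed bit vectors; the `BitmapIndex` class and sample data are hypothetical. It does not test the "near constant" claim: uncompressed bitmaps grow with the number of statements, and production systems typically rely on compressed bitmap encodings.

```python
class BitmapIndex:
    """Toy bitmap index: one Python int per (column, value), one bit per statement."""

    def __init__(self):
        self._bitmaps = {}
        self._count = 0

    def add(self, subject, predicate, obj):
        row = self._count
        self._count += 1
        # Set this row's bit in the bitmap of each of its three values.
        for key in (("s", subject), ("p", predicate), ("o", obj)):
            self._bitmaps[key] = self._bitmaps.get(key, 0) | (1 << row)
        return row

    def match(self, subject=None, predicate=None, obj=None):
        """Return row ids matching every constraint given (None is a wildcard)."""
        result = (1 << self._count) - 1  # start with every row selected
        for column, value in (("s", subject), ("p", predicate), ("o", obj)):
            if value is not None:
                result &= self._bitmaps.get((column, value), 0)
        return [row for row in range(self._count) if (result >> row) & 1]

index = BitmapIndex()
index.add("alice", "knows", "bob")
index.add("alice", "worksFor", "deri")
index.add("carol", "knows", "bob")
index.add("carol", "worksFor", "acme")

who_knows_bob = index.match(predicate="knows", obj="bob")
```

Each additional constraint costs one more AND over the bit vectors, which is why the approach is attractive for queries that fix several positions at once.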

Mod parent up! (1)

smartdreamer (666870) | more than 7 years ago | (#18989943)

... for obvious reasons.

SUPER Speed (2, Funny)

phoric (833867) | more than 7 years ago | (#18988849)

Colonel Sandurz: Prepare ship for light speed.
Dark Helmet: No, no, no. Light speed is too slow.
Colonel Sandurz: Light speed is too slow?
Dark Helmet: Yes. We're gonna have to go right to... SUPER speed.
[everybody gasps]
Colonel Sandurz: SUPER speed? Sir, we've never gone that fast before. I don't know if this ship can take it.
Dark Helmet: What's the matter, Colonel Sandurz? Chicken?
Colonel Sandurz: [Whimpering] Prepare ship! [Calms down]
Colonel Sandurz: Prepare ship, for Ludicrous speed. Fasten all seat belts.
[everybody fastens their seat belts and locks all of the doors]
Colonel Sandurz: Seal all entrances and exits. Lock all stores in the mall. Cancel the 3-ring circus. Secure all animals in the zoo...
Dark Helmet: [Takes the intercom from Sandurz] Gimme that, you petty excuse for an officer!
[speaks into the intercom as Sandurz puts on his seat belt]
Dark Helmet: Now hear this, Ludicrous speed...
Colonel Sandurz: [Interrupts] Sir, you better buckle up.
Dark Helmet: [to Sandurz] Ah, buckle this. [Into the intercom]
Dark Helmet: SUPER speed, go!

Re:SUPER Speed (2, Informative)

VWJedi (972839) | more than 7 years ago | (#18990613)


[Whimpering] Prepare ship! [Calms down] Colonel Sandurz: Prepare ship, for Ludicrous speed. Fasten all seat belts.

If you're going to steal a joke, you need to make sure to replace all references to the original. Find / Replace works great for this.

all about context (0)

Anonymous Coward | more than 7 years ago | (#18988931)

If those RDF statements are tiny and basically pointless, then searching 1 billion entries isn't all that hard, especially if they are properly indexed. Most RDF engines suck ass at the moment. If they implement an efficient bitmap index for the RDF statements, the query times for complex n-dimensional queries should be basically constant. W3C's specs for semantic web suck ass and their approach is totally impractical.

RDF needs a killer app... (-1, Troll)

Bazman (4849) | more than 7 years ago | (#18989009)

...perhaps an ontology for pr0n?

Two things... (1)

PornMaster (749461) | more than 7 years ago | (#18989167)

First, giving the amount of time and the number of items searched means nothing. Are they doing it on a BlueGene or an Apple II?

Second, the problem with "the semantic web" if you're relying on people providing the metadata themselves, is the reliability (trustworthiness?) of the person creating the metadata. There's a reason the meta name="keywords" tags aren't a significant factor, if a factor at all, in any of the major search engines' ranking systems.

Re:Two things... (1)

StefanDecker (1097693) | more than 7 years ago | (#18989729)

First: The experiments were done on an 18-node cluster of cheap servers.
Second: There are other ways to get metadata, e.g., via SIOC (see http://sioc-project.org/). But true, trust is an issue. And some people in DERI Galway are working on ranking algorithms on top of the search engine.

Web of Data (not just metadata) (1)

CaptSolo (899152) | more than 7 years ago | (#18990079)

Second, the problem with "the semantic web" if you're relying on people providing the metadata themselves, is the reliability (trustworthiness?) of the person creating the metadata.

One of the misconceptions about the Semantic Web is that it's only about metadata, when in fact it's about a Web of Data, e.g., data currently locked in databases, blog engines or social software sites. (related: SemWeb FAQ entry [w3.org] on "Does the Semantic Web require me to manually markup all the existing web-pages ... ?")

A very, very simple example - if you enable RDF data creation in a WordPress weblog (via a WordPress SIOC plugin [sioc-project.org] ), all this information is generated automatically, from the data already inside a database. What you get is every blog post, etc. in a machine-readable form (RDF), ready for query and reuse.
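To make that concrete, here is a rough sketch of what "generated automatically from the database" can look like. It is not the SIOC plugin's actual code or output: the row layout (`id`, `title`, `body`), the `?p=` URL scheme, and the function names are assumptions for illustration. Only the SIOC, Dublin Core Terms, and RDF namespace URIs are the real vocabularies.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Escape a string so it is safe as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def post_to_ntriples(base_url, row):
    """Turn one (hypothetical) blog-post row into N-Triples lines."""
    # Assumed URL scheme: each post is identified by base_url/?p=<id>.
    subject = f"<{base_url}/?p={row['id']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SIOC}Post> .",
        f"{subject} <{DCTERMS}title> {literal(row['title'])} .",
        f"{subject} <{SIOC}content> {literal(row['body'])} .",
    ]
```

Nobody types any of this by hand; the statements fall out of fields the blog engine already stores.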

Of course, that is very "light" semantics - expressing what the blog engine knows. As for data / structured content created by people directly - there's always a risk of someone writing lies. Then there's a need for the concept of trust (can we trust the source?) and some ranking mechanism.

sounds fishy (2, Interesting)

vga_init (589198) | more than 7 years ago | (#18989453)

Of course a search based on meta data is going to be faster and more accurate, but only when the meta data is correct. We've had this since the beginning of the interweb; people would load up their pages with bogus meta data just to generate search traffic. Because of this dishonesty, search engines have had to resort to other methods of evaluating and indexing pages (for example, based on actual content).

I don't see any difference between this new RDF and that old stuff.

Re:sounds fishy (1)

CaptSolo (899152) | more than 7 years ago | (#18989657)

RDF is just a way to express knowledge. In answer to "any difference between this new RDF and ..." you may take a look at the W3C Semantic Web FAQ [w3.org] (published very recently).

Now, like you said, what we find depends on what we feed into search engines and on the engines themselves. In this regard, it's a matter of working toward better search engines and ranking algorithms, and the work described here is an important step on that path. There's a link to a technical report and more details posted (by a developer) in another Slashdot comment [slashdot.org] .

sounds foxy. (0)

Anonymous Coward | more than 7 years ago | (#18990881)

"RDF is just a way to express knowledge. In answer to "any difference between this new RDF and ..." you may take a look at the W3C Semantic Web FAQ (published very recently)."

Mozilla uses RDF under the hood.

Save the hype (1)

broothal (186066) | more than 7 years ago | (#18989535)

So now we have a search engine capable of making a godzillion searches in a data domain that does not exist yet. That's all great and dandy, and we do indeed need new models and architectures for search engines once (if) the web goes all semantic. However, when (if) the semantic web ever becomes a reality, this search engine will long be retired. So, this result is great from a research point of view, but don't expect it to leave the lab.

Developer on this project (3, Informative)

aidhog (1097699) | more than 7 years ago | (#18989863)

As one of the developers on the project (along with user aharth), feel free to ask any specific questions you may have here. The article is quite vague and so I refer you to a technical report at http://www.deri.ie/fileadmin/documents/DERI-TR-200 7-04-20.pdf/ [www.deri.ie] .

Re:Developer on this project (1)

kellererik (307956) | more than 7 years ago | (#18991083)

could be my browser (Safari), but clicking on your link leads to a 404, going through the main page and clicking my way through works, though. Just in case it happens to other users. Thanks for the link, now my friday evening is officially ruined, I got to read this right away. ;-)

Re:Developer on this project (1)

wintermute42 (710554) | more than 7 years ago | (#18991177)

I'm using Firefox under Windoz and I could not access the article either. It's a bad URL.

Re:Developer on this project (1)

kellererik (307956) | more than 7 years ago | (#18991495)

This is how I got the PDF:

go to www.deri.ie

click on "World Record 7 Billion Triples"

scroll down on the resulting page and click on the word "here" in the last sentence.

get busy reading ;-)

I suspect that this is not a browser-related problem but a server problem. The link in the OP and the one mentioned here are the same.

This is great and all (1)

rsilvergun (571051) | more than 7 years ago | (#18989919)

but why would I want to search several million statements from the Robotech Defense Force? I mean, sure I'm an Anime nerd, but there are limits...

Beer fund (0, Troll)

billcopc (196330) | more than 7 years ago | (#18989939)

Top o'the mornin' laddy! We've got this crackpot idea that doesn't work in real-world scenarios, but you see, we're out of Guinness and me welfare checks be runnin' dry. How's about a nice research grant to refill me beer fridge ?

Tally-ho!

Boo @ Slashdotters (0)

Anonymous Coward | more than 7 years ago | (#18990717)

Give me a break! Scientists produce a technology that has the potential to change the face of the web and all they get is the usual, automated Slashdot moaning.

Listen, we're not just talking about a technology for searching cookie recipes. We're talking about a technology that will match your favorite ingredients with a recipe, find pastry chefs in your area who know the recipe, weed out the ones that don't meet your quality standards, and, on your command, order the ingredients at the lowest available price and have them sent to the chef of your choice along with the payment for your order.

So, you just take a look around on Web 2.0 while eating your dry, factory-produced, hard-as-a-rock cookies. I'll be enjoying my customized inexpensive home-baked quality pastries and getting real answers in a fraction of a second on the Semantic Web in the meantime.

Copyright infringement (0)

Anonymous Coward | more than 7 years ago | (#18990855)

Hey, I created RDF (Reality Distortion Field)
Cease or Desist!

Signed,

Steve Jobs

Web 3.0 (0)

Anonymous Coward | more than 7 years ago | (#18991235)

Web 3.0 [wikipedia.org] is near... Seriously - what does this development give us?