
On Finding Semantic Web Documents

michael posted more than 9 years ago | from the location-location-location dept.

The Internet

Anonymous Coward writes "A research group at the University of Maryland has published a blog post describing their latest approach to finding and indexing Semantic Web documents. They published it in reaction to the view of Peter Norvig (director of search quality at Google) on the Semantic Web (Semantic Web Ontologies: What Works and What Doesn't): 'A friend of mine [from UMBC] just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.'"


67 comments


Plasma TV Lesson (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11368548)

This has to be one of the funniest things I've ever read.

I used to work with this guy until he found a better job making about three times what he used to. With his first paycheck he got the latest Plasma TV and started to rub it in about how cool it is, etc. I received the following email from him today ....

-----



----- Original Message -----
From: xxx
To: xxx ; xxx
Sent: Friday, January 14, 2005 1:32 PM
Subject: My PLASMA TV

You fools need to listen to dis....

Your peeps this high roller right...... so he gets himself a
plasma screen. So hes watching regular cable on it and then
he gets an idea how cool would it be to hook up the computer
and stream porn on it. So i get the gear that will allow the
computer to display the shiiiiiiiiiat on the tv.

Dawgs listen to this shit... Im watching milf on it and boy was
i enjoying it when my dad picks up the right time to call me
and touch base with me abt the wedding. So put the damn thing
on pause and start talking to him. 10 mins... 20 mins.. 45 mins im
still talking to my homie on the phone. So I hang up on him after
talking to him for an hour. Now when I come back and see the tv
is still on pause... so i start seeing it. Just when i thought i had
enough of hunter getting on some hillbilly... im switching between
inputs i realized that the image where i paused is burnt on the tv.

Dude its so embarrasing... Ive called in for a replacement but
im scared the guys would turn this shit on find an imprint of
hunter in a doggy position.

THIS SHIT AINT FAIR....

xxxx

Re:Plasma TV Lesson (0, Offtopic)

LMAOff (849905) | more than 9 years ago | (#11368689)

Dude! It's got to be the funniest thing I've heard so far. Here is an idea for your friend to save him some embarrassment. Ask him to turn on the TV, hook up the laptop, and pause some video (non-porn); that way he can cover up the burn-in... actually burn in the remaining stuff so they can't make it out. Man! My buddies are laughing their A** off too.. Good one, will remember it for a long time to come.

Re:Plasma TV Lesson (1)

funkify (749441) | more than 9 years ago | (#11368995)

No, you should offer to attempt to "repair" it yourself then do what LMA suggests YOURSELF. If this works, charge him $$$. Of course, I don't know enough about plasma screens to know if it would really work or not. You might try pausing on an all-white screen.

Re:Plasma TV Lesson (1)

jahmike (741923) | more than 9 years ago | (#11368713)

Hahahahahah .... this shit is funny ahhahahahaha I wonder if he will have nightmares?

It's not about the filename (3, Insightful)

Simon Brooke (45012) | more than 9 years ago | (#11368557)

It's not about the filename extension (if any), silly. It's about the data. Valid RDF data may be stored in files with a wide range of extensions, or even (how radical is this?) generated on the fly.

What matters is, first, the MIME type (which is most likely application/xml or preferably text/xml), and then the data in it.

Oh, and, First Post, BTW.

Re:It's not about the filename (1)

crschmidt (659859) | more than 9 years ago | (#11369046)

Preferably application/rdf+xml. Anything else is not appropriate for RDF-serialized triples. text/xml and application/xml are both wrong for this kind of data.

This will become more important as resources are represented in multiple ways for tools to consume: they ask for a specific type, and content negotiation may fall back the wrong way if people start telling their webservers that RDF is something it's not.
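
As an aside, it's easy to check what type a server actually claims for a given document. Here is a minimal sketch in Python; the URL is hypothetical, and the set of "acceptable" types is just the one discussed above.

    # Minimal sketch: ask a server what MIME type it claims for a document.
    # The URL below is a placeholder, not a real RDF resource.
    import urllib.request

    RDF_TYPES = {"application/rdf+xml"}

    def claimed_type(url):
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            # Content-Type may carry a charset parameter; keep only the MIME type.
            return resp.headers.get("Content-Type", "").split(";")[0].strip()

    ctype = claimed_type("http://example.org/foaf.rdf")
    print(ctype, "- declared as RDF" if ctype in RDF_TYPES else "- not declared as RDF")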

Re:It's not about the filename (0)

Anonymous Coward | more than 9 years ago | (#11369670)

Outstanding first post!

Re:It's not about the filename (3, Interesting)

old_guys_can_code (144406) | more than 9 years ago | (#11370869)

I work at one of the few places that crawls billions of URLs each month, and I observed exactly the same thing as Peter. There just isn't that much xml/rdf/daml/owl on the web. At the point when we had crawled 6 billion URLs, I found only 180,000 URLs that had a mime type or extension to indicate that they were machine-readable metadata.
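
As a rough illustration of that kind of filter (not the parent's actual pipeline; the extension list, MIME list, and log format are all assumptions), a sketch in Python:

    # Sketch: scan a crawl log for likely semantic web documents, judging
    # by URL extension or reported MIME type. Assumes one "url<TAB>mime"
    # record per line; real crawl data would need more careful handling.
    SEMWEB_EXTS = (".rdf", ".owl", ".daml", ".n3", ".nt")
    SEMWEB_TYPES = {"application/rdf+xml", "text/rdf+n3"}

    def looks_semantic(url, mime_type):
        path = url.split("?", 1)[0].lower()
        return path.endswith(SEMWEB_EXTS) or mime_type.strip() in SEMWEB_TYPES

    hits = []
    with open("crawl_log.tsv") as log:
        for line in log:
            url, mime = line.rstrip("\n").split("\t", 1)
            if looks_semantic(url, mime):
                hits.append(url)
    print(len(hits), "candidate semantic web documents")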

The reason is something that people in the semantic web community are loath to talk about - that there isn't enough incentive for people to create metadata that they put out for others to read. When we write web pages or blogs, we are able to express ourselves to other humans, but when we put out data there is no clear incentive (economic or otherwise) to justify the effort. This is probably why there is so little metadata being published.

If you wish to dispute the small amount of data, feel free to put up a web server showing a million URLs of metadata created by others.

Semantic? (0)

Anonymous Coward | more than 9 years ago | (#11368565)

I used to love their Norton Utilities.

What about... (4, Insightful)

Apreche (239272) | more than 9 years ago | (#11368567)

What about all the pages that are .rss but are actually RSS 1.0? Those are RDF-based. And what about all the RDF that's in the comments of .html files and elsewhere? My Creative Commons license is RDF, but it's inside a .html file. Sure, we do have a long way to go, but the semantic web is bigger than a few file extensions findable by Google.
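
That embedded-in-comments case is worth spelling out, since no extension- or MIME-based survey will ever see it. A rough sketch of digging license RDF out of HTML comments in Python (the regular expressions are a heuristic, not a real parser, and page.html is a placeholder):

    # Sketch: find RDF/XML blocks hidden inside HTML comments, the way
    # Creative Commons license metadata is often embedded in pages.
    # Regex-based and deliberately naive; a real tool would parse properly.
    import re

    COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
    RDF_RE = re.compile(r"<rdf:RDF.*?</rdf:RDF>", re.DOTALL)

    def embedded_rdf(html):
        blocks = []
        for comment in COMMENT_RE.findall(html):
            blocks.extend(RDF_RE.findall(comment))
        return blocks

    with open("page.html", encoding="utf-8") as f:
        for block in embedded_rdf(f.read()):
            print(block[:80], "...")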

What's all this... (0)

Anonymous Coward | more than 9 years ago | (#11368593)

What's all this about finding Semetic web documents... Oh... Never mind.

Yahe (0)

Anonymous Coward | more than 9 years ago | (#11368601)

HTML is semantic. It's all in the <body>.

unexpected? (3, Insightful)

AnonymousCactus (810364) | more than 9 years ago | (#11368635)

Without a large number of widely used tools out there that make use of semantic information, there won't be much content designed for them... and without content designed for them, the tools won't exist and certainly won't be widely used. Currently it's more of an academic exercise - if we somehow knew what all this information on the web actually was, what could we do with it? More interesting, it seems, are approaches that bypass hand markup and do something equivalent automatically.

Solution without a problem? (4, Interesting)

faust2097 (137829) | more than 9 years ago | (#11368636)

Semantic web stuff is cool and all, but I honestly don't believe that it will ever really take off in any meaningful way. For one, it takes a paradigm that people know and understand and adds a lot of complexity to it, both on the user end and the engineering end.

Plus, a lot of the rah-rah booster club that's grown up around it sounds a whole lot like the Royal Society folks in Quicksilver who keep trying to catalog everything in the world into a 'natural' organization.

What it basically comes down to for me is that it seems like a great framework for single-topic information organization, but at some point we need to keep our focus on the actual content of what we're producing more than the packaging. For this to be ready for prime time, the value proposition needs to move from a 30-minute explanation involving diagrams and made-up words ending in '-sphere' to something even shorter than an "elevator pitch": two sentences or so.

Re:Solution without a problem? (0)

Anonymous Coward | more than 9 years ago | (#11368729)

" For one, it takes a paradigm that people know and understand and adds a lot of complexity to it,"

but what if i leverage this paradigm with action orientated out of the box solutions?

surely this sea change couppled with changing the onion, will serve to generate open door mindshare in this post 9/11 society...

no?

Re:Solution without a problem? (1)

amembleton (411990) | more than 9 years ago | (#11368979)

No

Re:Solution without a problem? (0)

Anonymous Coward | more than 9 years ago | (#11369119)

reification baby, that's all you need to know.

say it with me

"reification"

ooh yeah. it means "to make into a thing".

"reification"

mmm, gotta love them RDF buzzwords.

Two sentences, eh? (2, Interesting)

misuba (139520) | more than 9 years ago | (#11369886)

You're on.

1) A simple human- and machine-readable schema is defined for marking up descriptions of items for sale or wanted.
2) Google learns how to read them, thereby putting eBay, Craigslist, and other sundry companies out of business and putting your data back in your hands.

Okay, so the second sentence is a bit of a run-on, and this use case has a whole lot of hairy details I'm leaving out. But the possibilities are pretty exciting nonetheless.
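
To make the first sentence concrete, a listing in such a schema could be a handful of RDF statements. Everything below (the "sale" namespace, the property names, and the URLs) is invented purely for illustration; it is not an existing vocabulary:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:sale="http://example.org/forsale#">
      <sale:Listing rdf:about="http://example.org/listings/42">
        <sale:item>Used 19-inch LCD monitor</sale:item>
        <sale:price rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">80.00</sale:price>
        <sale:currency>USD</sale:currency>
        <sale:contact rdf:resource="mailto:seller@example.org"/>
      </sale:Listing>
    </rdf:RDF>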

Re:Two sentences, eh? (1)

cmowire (254489) | more than 9 years ago | (#11370441)

The problem is, that doesn't require the semantic web or any sort of semantic technologies.

A simple well-formed XML document will suffice and be simpler to write. And if you *really* want to make it fit into the semantic web, you can provide an XSLT file that translates it to RDF. The problem is that RDF is wounded by having an incredibly ugly syntax.

Furthermore, the simple model of posting XML or RDF documents and Google not only magically finding them but "putting the data back in your hands" is flawed. Part of the reason why eBay, Craigslist, etc. all exist is that there are just enough checks and balances in place to make them not useless. Craigslist has distributed flagging and requires email addresses; eBay has accounts and a reputation system. Sure, you can create reputation systems, but in order to make that work, you need to be able to pick one easily and preferably automatically. So you need.. ehrm.. a reputation system for reputation systems...

The problem is, not only do you need to be able to do an "elevator pitch" 2 sentences of a useful app, you also need to make it not fall over due to abuse, scaling laws, etc.

I think the biggest problem is that you can't trust metadata blindly. And most of the big "semantic web" stuff assumes that you can, or that figuring out trust can be "solved".

And there is metadata that's trusted. EXIF tags are trusted, simply because there's no benefit in lying. RSS is provisionally trusted simply because the user picks if they want to syndicate or not -- again, you are doing it to draw people back and lying doesn't help.

The problem is the entire semantic web movement is doing damage to its possibilities for the future. Because RDF and OWL and RDF Schema and such are all such thick and hard-to-grasp items and also because it's a buzzworded hyped technology, stuff that could be written to enable semantic technologies isn't.

See, I have my doubts that you can really write applications that, through the magic of ontologies and other stuff, can make intelligent relations between data in such a way that the total time spent making an ontology, a schema, and whatever else is necessary to make useful things happen is less than it would have taken to write a simple application that does what you want it to.

But, really, if it can be made to work, it needs to scale. There needs to be data. Sure FOAF and RSS 1 use RDF. But OPML, RSS 2, and Atom don't.

The problem is, the Semantic web folk need to quit complaining about there not being any semantic-accessible data or applications and start figuring out how to get around that.

My current thesis is that if you really want to turn the Internet into a tuple space, the fact that there isn't much current data in RDF isn't insurmountable; it's just that nobody's willing to accept that you can't force people to do everything around RDF at this point in the game.

Longwinded rant, but if you go back up to the top of the post, there's the rest of my argument. If you don't have RDF-format data, can't send out jack-booted thugs to force people to make RDF-format data, and need some of it to make things work properly, you need to figure out how to generate it.

And there's plenty of stuff to draw from. Convert Atom, RSS 2, OPML, and other data to RDF representations and you've increased your space. You can generate more from webpages by parsing HTML for meta tags, links, and stuff. You can query Google for "What's Related" and such.

The problem is, nobody's bothered to work on a tool to make tuple-spidering code to generate tuples for RDF. They aren't even trying to come up with halfway-point guidelines about making existing and new documents more able to eventually be converted to RDF.
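
For what it's worth, a first cut at that kind of tuple spider is small. A sketch in Python, using only meta tags as the source and a plain-string predicate scheme as a placeholder rather than any real ontology:

    # Sketch of a minimal "tuple spider": fetch a page, read its <meta>
    # tags, and emit (subject, predicate, object) tuples about the page.
    # The "meta:" predicate prefix is a placeholder, not a real vocabulary.
    import urllib.request
    from html.parser import HTMLParser

    class MetaCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.meta = []

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                if "name" in attrs and "content" in attrs:
                    self.meta.append((attrs["name"], attrs["content"]))

    def spider(url):
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        parser = MetaCollector()
        parser.feed(html)
        return [(url, "meta:" + name, content) for name, content in parser.meta]

    for triple in spider("http://example.org/"):
        print(triple)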

Instead, semantic web folks are just complaining about how there's nothing to work with.

Re:Two sentences, eh? (2, Insightful)

mike_sucks (55259) | more than 9 years ago | (#11371291)

"The problem is, that doesn't require the semantic web or any sort of semantic technologies."

You're right of course, but for any such initiative to be successful, it needs to use a standard (or at least widely-known and stable) format/grammar/etc so that third-party systems can understand your data.

This is where RDF, OWL and the other semantic web technologies come into it. Why invent another system when there is already one there?

"The problem is that RDF is wounded by having an incredibly ugly syntax."

No, you're confusing the XML syntax with the model. RDF isn't the XML format; that's just one way to serialise an RDF graph. You also have the N-Triples format and others. Ideally, XML-serialised RDF would never be hand-written; it _is_ a pig to do so. But it is a convenient way to dump an RDF graph in such a way that it can be reliably machine-read (which is the whole point of the semantic web).
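
A quick way to see the model/serialisation split is to dump one small graph both ways. A sketch using the rdflib library in Python (assuming a reasonably recent rdflib is installed; the subject URL and title are made up):

    # Sketch: one RDF graph, two serialisations of the same single triple.
    # The statement (a Dublin Core title for a made-up page) is illustrative.
    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    g = Graph()
    g.add((URIRef("http://example.org/page"), DC.title, Literal("An example page")))

    print(g.serialize(format="nt"))   # N-Triples: one statement per line
    print(g.serialize(format="xml"))  # RDF/XML: the same statement, verbosely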

Notice also how OPML, RSS 2 and Atom are never generated by hand? Given the format is best generated by a computer and best consumed by a computer, what's the problem with the format?

It would be nice if there were one canonical way to serialise the graph, thus making processing with tools that aren't RDF-aware easier (eg, an XSLT processor), but I don't think that is a show-stopper.

"I think the biggest problem is that you can't trust metadata blindly. And most of the big "semantic web" stuff assumes that you can, or that figuring out trust can be "solved"."

So why are places like eBay, Amazon and so on trusted? How is buying something directly via the eBay web interface any different from buying it via Google, which picked up the same auction from eBay's RDF feed?

There's a lot of places where trust comes into it. Do you trust Google? Do you trust EBay? Do you trust the seller? The semantic web doesn't solve this problem, but it can make it much easier for you to locate the thing in the first place.

"And there is metadata that's trusted. EXIF tags are trusted, simply because there's no benefit in lying."

Right, so why not make the EXIF data available via RDF anyway? Even though it can't be trusted, at least I would be able to search for images that purport to be of a sunset taken between 17:00 and 18:00, using a Canon IXUS II. That's more than I can do now.
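
As a sketch of what that could look like, here are a few N-Triples using an invented EXIF namespace; the photo URL and property URIs are placeholders, not a standard vocabulary:

    <http://example.org/photos/sunset.jpg> <http://example.org/exif#dateTimeOriginal> "2005-01-14T17:32:00" .
    <http://example.org/photos/sunset.jpg> <http://example.org/exif#model> "Canon DIGITAL IXUS II" .
    <http://example.org/photos/sunset.jpg> <http://example.org/exif#exposureTime> "1/125" .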

"If you don't have RDF-format data, can't send out jack-booted thugs to force people to make RDF-format data, and need some of it to make things work properly, you need to figure out how to generate it."

Well, that's the main problem the semantic web faces today: lack of tool support. Why doesn't Dreamweaver allow people to embed Dublin Core RDF into every document it produces? Why don't the endless numbers of slide-show gallery generators do the same for EXIF data?

Note however that this isn't a problem caused by the XML RDF serialisation format.

"The problem is, nobody's bothered to work on a tool to make tuple-spidering code to generate tuples for RDF."

Honestly, what's the point? It would be much more productive to refit authoring and content management tools so they produce RDF, and search engines and the like to consume RDF. We'd be much better off.

Re:Two sentences, eh? (1)

cmowire (254489) | more than 9 years ago | (#11401504)

The problem with RDF is that they are putting really technical terms on really simple ideas and nobody's done an especially good job of distilling it down to the most basic level in such a way that anybody can program it. It's not just that the format's ugly, it's that, as far as I can tell, the vast majority of folks who actually are in a position to output semantic information suffer eye-glaze-over when they try to understand RDF.

Furthermore, the problem is not just that you need to have the tools output the semantics, but you have to get people to put them in the document. When's the last time you've seen somebody fill in the various meta-information fields on a word document? So, sure Dreamweaver could put Dublin Core metadata there, but people won't use it. In fact, they'll probably complain if it's there because it'll either bloat the size of their documents or include information that they'd rather not have included -- like EXIF thumbnails and revision notes and things like that. Dublin Core won't get you anywhere on any of the search engines because the meta tags got too much search-engine-spam crap stuffed into them. Likewise, if there was an image gallery that gave an RDF representation of the EXIF tags, it would slow down load times, so unless there was a good advantage to using it that way, people would complain that crap they don't want is in the document. Remember, people are *surprised* that folks who use flickr and del.icio.us are actually adding metainformation of any sort to their personal stuff.

I don't implicitly trust eBay or Amazon.com. However, I do trust that there's at least some modicum of crap-removal at play. Remember, the problem with the web isn't just finding stuff. It's finding stuff without finding crap.

The problem is that there's a lot of different tools for building sites. There's quite a few weblogging systems (Slashcode, Drupal, Radio UserLand, Movable Type, etc) out there. There's tons of shopping carts. There's a variety of commercial CMS systems. There's FrontPage and Dreamweaver. There's stuff that nobody in their right mind would use anymore, except that they are too lazy to use something else. There's hand-coded stuff. Simply put, there's a lot of stuff generating pages. Most of the time, they can't even be bothered to make their pages work on Mozilla or be compliant to any sort of standard. About the only thing that can be said about the task of changing how most of the authoring and content management software in use outputs stuff is that it's easier than moving to IPv6.

No, folks have been moaning for several years now that we need to RDF enable stuff. It's not going to happen. The only road forward is to accept that and figure out how to have a semantic web without requiring huge buy-in before you've got useful apps.

And if there start to be some actual applications of semantic web technology, other than tools for generating the data and viewers that claim to be revolutionary but always seem to just display things as a directed graph, then people will start thinking about outputting RDF.

Re:Two sentences, eh? (1)

mike_sucks (55259) | more than 9 years ago | (#11403440)

"as far as I can tell, the vast majority of folks who actually are in a position to output semantic information suffer eye-glaze-over when they try to understand RDF."

That's interesting, do you have some figures to back that up? I'm a generic web developer from a small city in a backwards country, and I get it. Are you saying that RDF is harder to learn than, say, writing a POSIX or Windows application? An FPS? A Java web-app? A Linux kernel module? Because plenty of people do those things, every day.

"Furthermore, the problem is not just that you need to have the tools output the semantics, but you have to get people to put them in the document."

If you don't want to provide metadata, then fine, don't do it. But if you don't provide metadata, it doesn't matter whether the format you're not providing it in is RDF or something else. Aren't you arguing against RDF, not metadata in general?

"So, sure Dreamweaver could put Dublin Core metadata there, but people won't use it. In fact, they'll probably complain if it's there because it'll either bloat the size of their documents"

If size is a concern, then you can provide the RDF in a separate file and use an HTML link element to point to it. In fact IIRC, for HTML documents and for XHTML served as text/html, this is the only correct way of embedding serialized RDF in a web page.
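
For example, the usual pattern (used by FOAF autodiscovery, among others) is a link element in the document head pointing at the external RDF file; the href below is of course hypothetical:

    <head>
      <title>Some page</title>
      <link rel="meta" type="application/rdf+xml" href="http://example.org/metadata.rdf" />
    </head>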

"I don't implictly trust EBay or Amazon.com. However, I do trust that there's at least some modicum of crap-removal at play. Remember, the problem with the web isn't just finding stuff. It's finding stuff without finding crap."

Right, so you have to trust an RDF-aggregation-based auction site or search engine as well. Maybe you'll trust the auction site because they have similar policies and procedures as eBay and Amazon, maybe you won't. Search engines have to deal with metadata spam in the same way they deal with keyword spam in the HTML document's body or with link farms. How is any of that RDF's (or metadata in general's) problem?

"The problem is that there's a log of different tools for building sites."

There certainly are a lot of tools for building sites, but how is that a problem? If people want to publish metadata, they can choose a tool that allows them to do so, or modify their existing tools and/or practices.

"No, folks have been moaning for several years now that we need to RDF enable stuff. It's not going to happen."

Err, it has already happened. Blogging has made RDF-based syndication a must-have feature for web based applications and has shown how the semantic web can make life a lot easier.

The question is, where is it going to spread to next?

Re:Two sentences, eh? (1)

cmowire (254489) | more than 9 years ago | (#11403731)

*one* RDF-based technology (And only for part of the market) in the past several years means that we'll start having real semantic applications in 2105 or so.

Remember, there's nothing intelligent about syndication. It fits just as well into the "well formed web" as it does the "semantic web". All RDF does is make things much more verbose than they would be otherwise. The whole point of the semantic web was so that I could view an RDF Site Summary file and have my web browser automatically figure it out and link to other places with it.

No, I don't have any facts to back up that your average web designer's eyes will glaze over. But then, your average web designer isn't thinking about RDF, so the message is either getting lost or ignored somewhere.

My theses are:
1) RDF, without a dramatic reduction of complexity, coupled with real applications that can only be done with RDF, not merely well-formed XML, is doomed to failure.
2) Trustworthy metainformation is valuable. However, it's also extremely rare. Many of the "cool things" that the semantic web was supposed to enable require trusted metainformation.
3) Until there are real applications for Semantic Web technologies, people won't Semantic Web enable their software in any substantial way. Therefore, if you want the Semantic Web to happen you either need to find a way to make existing metadata RDF-accessible or you need to make one with existing technology.

Re:Two sentences, eh? (1)

mike_sucks (55259) | more than 9 years ago | (#11403997)

"Remember, there's nothing intelligent about syndication. It fits just as well into the "well formed web" as it does the "semantic web"."

Sure, syndication is a specialization of the semantic web in general, but when you come down to it, that's all the semantic web is: machine-readable information about a web-accessible resource. There's nothing special or crazy or obtuse going on, it's that simple.

"But then, your average web designer isn't thinking about RDF, so the message is either getting lost or ignored somewhere."

Maybe, but only because we are driven by what our customers want. But the demand is there, and growing. Many companies and government organisations want compulsory metadata for their {Inter|intra|extra}net web sites. It is interesting to note that all Australian Government web sites are required to provide DC metadata. That trend is going to continue.

"1) RDF, without a dramatic reduction of complexity, coupled with real applications that can only be done with RDF, not merely well-formed XML, is doomed for failure."

Heh. Well, I assert it isn't doomed to failure. I think we need to provide some evidence here.

"2) Trustworth metainformation is valuable. However, it's also extremely rare. Many of the "cool things" that the semantic web was supposed to enable require trusted metainformation."

As I said, it isn't about trusting the metadata, it is about trusting the source. If you trust the source, you trust the metadata. This situation isn't any different to what we have today with human-readable web pages.

Out of curiosity, what are these cool things that require technology based/provided trust?

"3) Until there are real applications for Semantic Web technologies, people won't Semantic Web enable their software in any substantial way."

There already is and they already do, as I said before.

"Therefore, if you want the Semantic Web to happen you either need to find a way to make existing metadata RDF-accessible"

Sure, HTML meta tags can be assumed to be statements about the HTML resource that contains them, and Atom data may be able to be coerced to RDF as well, but you also have to have, say, a metadata-smart search engine that lets you search over this metadata. I.e., we still need better tool support.

" or you need to make one with existing technology"

Sorry, I couldn't grok that, make one what?

Re:Two sentences, eh? (1)

faust2097 (137829) | more than 9 years ago | (#11371197)

But I like craigslist better than I like Google. And the quality of that data in my hands is dependent upon the internet community at large AKA those people who write gay erotic fan fiction about Star Trek characters.

Besides, Google's pagerank has been owned by 'optimizers' for years. I don't trust them any more than any other commercial enterprise.

And FOAF is the closest the 'blogosphere' has come yet to physically jerking each other off.

Re:Two sentences, eh? (1)

SunFan (845761) | more than 9 years ago | (#11371339)

1) A simple human- and machine-readable schema is defined for marking up descriptions of items for sale or wanted.

How does a typical shop keeper learn to do this and apply it to their wizard-made web page?

The second point about Google is fairly ripe for abuse. Meta tags in HTML were mostly rendered useless by porn sites, for example. Also, sites like eBay tend to concentrate useful information in useful ways, while feeding keywords to Google can often be frustrating for anything remotely generic.

Slashdot lies, opinions, and half-truths (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11368651)

If you agree with any of this, feel free to repost it in the future.

* If you expect companies to follow the copyright of the GPL, you should support the RIAA going after infringers of its copyright. If not, you're a hypocrite.

* There is absolutely nothing wrong with a company being upset that its product is being pirated freely over online networks. A recent Slashdot poll showed that the majority of Slashdotters are unemployed or are students ("academics"), which explains a lot. Try getting a real job sometime and see what it feels like when your work is everywhere, and you start worrying that your days are numbered. Does John Carmack want you to "sample" his new game via the "free advertising" happening on eMule?

* Artists "deserve their money" only in cases in which the RIAA is the bad guy. When it's a P2P article, suddenly ripping artists off and not paying for their music via piracy is magically different from some record companies not paying royalties. This mindset is supposed to make sense.

* At the 2004 WinHEC, Allchin demonstrated an alpha version of Longhorn that played six high-resolution videos at the same time while playing Quake III in the background. An equivalent XP machine couldn't play more than four videos. Meanwhile, I can't even get xmms to play without skipping, and windows to drag without visual tearing! That's because KDE and GNOME are hacks to emulate a desktop on top of the crufty XFree86 architecture that people won't let die (the majority of Linux users absolutely fear change... there are rational ones, but they are outnumbered by zealots).

* OSTG-owned Slashdot thinks its niche opinion represents the majority of the world. This is a result of people visiting every day and buying into the groupthink. Nobody outside of Slashdot knows or cares about "Linux," "RIAA", "M$," or anything else Slashdotters think is such a huge issue in today's society. Go to a mall or coffee shop sometime and see what people actually talk about.

* Speaking of OSTG--it's a Linux company...that owns a "tech news" site...that posts news stories negative toward competitors like Microsoft. If a Windows company or even Microsoft itself owned a "tech news" site and posted anti-Linux articles all the time, everyone would be up in arms. But with OSTG, it's okay.

* Slashbots think people don't like the music coming out these days, which is the cause of the piracy. Never mind that if people didn't like the music they wouldn't be pirating it, most Slashbots--again, this goes back to the niche opinion thing--don't realize that most people these days love the music coming out and want to hear all of it. Probing around, you discover that Slashdot is made up of nerds and fogies who listen to things like The Who and Blind Guardian and techno--not what mainstream society enjoys.

* Any company ending in "AA" is evil. Especially if it doesn't want you distributing its works without paying for it. Somehow, this mindset is supposed to make sense.

* The inevitable result of all this is a world in which nothing can be profitable because people simply pirate free copies. Is that really what Slashbots want? OSS and free-ness in general reminds me of the hippie era of the 60s--idealistic socialism that only exists because of the surrounding capitalism around it that provides the environment for it to exist. We all know what happened to that idea.

* Linux rules the desktop, when in reality [google.com] : Windows = 91%; Mac = 4%; Linux = 1%

* Slashdot editors are abusive. We all remember The Post. It's amusing the editors never mention the issue. The worst editor is michael, who will mod you down, insult you for your post count, and post unprofessional color commentary along with the article. This is the same bizarre person who cybersquatted Censorware for years--even as Slashdot posted articles negative toward cybersquatting! Michael played it off as though he was a stalking victim, which made it all the more bizarre.

* The moderation system is broken. If you mod someone as "Overrated," you can't be metamodded. People abuse this all the time to gang up and knock you down into oblivion.

* If "Linux" just refers to the kernel and not the operating system, how can "FreeBSD" refer to the operating system (userland tools, standard libraries, etc.) and not just the kernel? Face it, "GNU/Linux" looks and sounds ridiculous.

* Slashdot is all about spinning truth for its agenda and posting outright falsehoods. In this article [slashdot.org] , for instance, Roblimo claims that Baystar spokesman Bob McGraith "admitted" that their "only viable asset is the potential proceeds of lawsuits against Linux users and vendors." And yet, in the very next sentence, his real words are given: "We're looking for the best return we can, and we think the focus should be on IP licensing (and enforcement)." Ignoring the outright lie RobLimo posted about what was said, Bob McGraith describes what every standard IP company does--run their business on the licensing of their valuable IP. If that isn't enough, Slashdot's own VA Linux stated in their recent 10Q filing [yahoo.com] the exact same thing: "We rely on a combination of copyright, trademark and trade-secret laws, employee and third-party nondisclosure agreements, and other arrangements to protect our proprietary rights." But hypocrisy and double-standards don't matter to an agenda-driven group like Slashdot. It's all about "whatever it takes" to discredit those on your geek blacklist.

* SCO and other companies are evil scum, manipulating stock prices and going to the extreme to be greedy. Meanwhile, the SEC investigated VA Linux's IPO [com.com] for "questionable IPO practices." VA Linux owns Slashdot.

* Slashdot breathlessly reported that AMD beat Intel in CPU sales [slashdot.org] by 2% for one week (what a victory). Meanwhile, it was omitted from the article that AMD only beat Intel in RETAIL desktop sales. Dell hasn't been selling AMDs [forbes.com], and Dell, among others, does not count as retail. According to the article Intel still outsold AMD in the PC market with a 61% share. Of course this is helped by their 81% share in notebook sales, a market in which AMD has been unable to succeed. This is crucial because according to the article this market is the fastest growing segment of the PC market. The anti-Intel spin is amazing. But not unpredictable, because Windows and Intel go hand in hand, and therefore "Wintel" is evil. Even though laptop sales account for over 50% of PC sales, and AMD has ignored that market...

* Somehow, user-ran executables are always a "New Microsoft Hole" (actual article headline). Meanwhile, LinuxSecurity [linuxsecurity.com] posts weekly security advisories for all the Linux distributions. You never, ever, EVER see any of these mentioned on Slashdot--bizarre things like arbitrary code execution via MPlayer.

* OSS advocates complain about the lack of innovation coming from Microsoft. Often, these posts are written from KDE using an integrated filesystem/HTML browser, a taskbar, a start menu, and more. Apparently, nobody wants to admit that the only reason those are implemented is because much-criticized Windows 98 did it first. Clone, clone, clone. This is the life of an open source wannabe. One of these days, they'll actually come up with an original idea that ordinary people can use to create interest in their offerings. Until then, it's going to be, "Yeah, we'll be able to do that soon, too." Slashdotters--ripping people off then criticizing those who came up with the ideas in the first place. This is not the way to gain integrity.

* This opinion poll [opinion.com.au] shows that 56% of respondents hadn't even heard of Linux.

* Linux is "ready for the desktop." This is the yearly uttering since 1998. Never mind that there is STILL no binary installation/uninstallation API for desktops, you can't come home with a printer and a CD and stick it in to get an Autoplay menu that lets you set up the driver. Somehow, Linux is just magically supposed to be ready--that is, if someone else sets it up for you and you never change or add your hardware or software and doing nothing else but check e-mail and browse the web. Conveniently, this includes grandmas, so people can post their grandma-using-Linux stories as "proof." As a recent article on Slashdot pointed out, Linux can't even run a generic soundcard that 10-year-old Windows 95 has no problem with.

* Hypocrisy is accusing Windows XP of being "riddled with spyware" without actually citing a single example, and if you run Windows Media Player, the very first thing it gives you is the privacy page allowing you to disable automatic grabbing of song titles. Meanwhile, almost every single standard Linux media player automatically grabs titles from places like freedb.com without asking you first. One OS grabs song titles and it's spyware, the other grabs song titles and nobody mentions a single thing. Hypocrisy.

* Slashdot professes to be some sort of golden defender of consumer copyright law. Few people remember that in an IRC chat, Hemos said that what DailySlash is doing was "illegal" and that they should stop.

* Corporate-owned, subscription fees, banner ads, reposts, and complete falsehoods. Remember when Slashdot was a great tech news site for nerds? Before the point of the site was to have an anti-RIAA, anti-"M$" agenda? When it was just about posting cool technology stories regardless, before VA Linux took it over?

Slashdot is dead.

Manual for the modern Slashdotter (-1, Troll)

Anonymous Coward | more than 9 years ago | (#11368667)

Manual for the Modern Slashdotter

Golden Rule: You must base your worldview entirely on Slashdot headlines. You must ignore the inaccuracy and editorial shortcomings of the Slashdot staff. You must buy into the groupthink of the comment threads. This is of UTMOST IMPORTANCE.

- Post the lamest, most obvious, and most unfunny jokes imaginable. They will be modded up "+5 Funny." Even Malda couldn't stand it any longer and made Funny mods not count toward karma.
- Everything involving Linux is flawless and perfect.
- Anything involving Mozilla is flawless and perfect. Ignore that Mozilla marks security flaws as "confidential" and keeps them secret. Ignore that this is something Microsoft is endlessly bashed for. Ignore that Firefox has had several severe security flaws, especially for a browser used by so little of the market (1% according to Google Zeitgeist).
- Whenever someone has a criticism of the current moderation system, refer to Taco's "future moderation system."
- You must lean left. You must obsess over George W. Bush and make Bush jokes whenever possible, no matter how irrelevant to the topic. In political articles, you must upmod anti-Bush comments and downmod independent or pro-Bush comments. Use the "Overrated" moderator whenever possible. Remember, Taco is going to fix this in "the future moderation system."
- Use the term "FUD" religiously in everyday conversation. When someone puts out something that disagrees with your worldview, call it FUD matter-of-factly as a way to dismiss the points it raises. Demonization is far easier than debating the issues.
- Whenever Linus Torvalds says anything, it is newsworthy and infallible. Linus is perfect, just so practical, and is the "Alpha-Geek." Linus does not make mistakes. Basically, you must behave as though you are in love with Linus Torvalds. When he says he doesn't bother looking at the source code of competitors like Solaris [slashdot.org] because he's not interested, herald it as the "wonderful attitude of Linus" even though such a comment coming from a Microsoft employee would get flamed as an example of their arrogance and closed-minded attitude. When giant kernel holes go unpatched, ignore it and continue to suck the teat of the Linus Torvalds hype machine like a good sheep should.
- Believe articles like "Microsoft Violates Human Rights In China," based entirely on the idea that Microsoft is evil because Windows is used by the government there. Ignore the fact that China has its own custom Linux distribution called Red Flag Linux. Slashdot is unbiased and holy.
- Ignore that Slashdot is corporate-owned, by a company called OSTG that employs Rob Malda and makes money off selling OSS products. Ignore the conflict of interests in running a "tech news" site that coincidentally posts articles critical of competitors. Ignore that if Microsoft owned a tech news site that did the same, it would be criticized for it.
- Pretend that Linux is ready for the desktop, even though it took you two hours to set up your soundcard, mouse scroll wheel, and 3D card. Ignore that the real reason you refuse to acknowledge that Linux sucks on the desktop is because you don't want to diminish your sense of accomplishment in getting it up and running. Make sure to confuse this sense of accomplishment with the feeling that you have "more control" in a Linux system compared to a Windows system.
- Pretend there's nothing wrong with endless submissions accepted from Roland Piquepaille, who makes several thousands a month thanks to Slashdot's linking to his blog which links to the original article--rather than Slashdot just linking to the original article and cutting out the pointless middle-man. It's okay for Malda to shrug it off as though Slashdot should never consider ethics or morals.
- Pretend there's nothing wrong with Michael cybersquatting Censorware.org, even though Slashdot champions itself as the voice of online rights, anti-spam, anti-squatting, and anti-copyright enforcement. Believe it's okay for Michael to treat the article as his editorializing first post rather than putting his opinion in a comment like the rest of us have to.

Please redistribute this at will.

The Linux revolution is dying (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11368698)

The Linux Revolution Is Dying

In light of the disastrous 2.6 development model that has given sysadmins everywhere a headache by introducing development code into a production line, Linux has sounded its own death knell. With more and more people looking to alternatives like FreeBSD 5.x, OS X, and DragonflyBSD, Linux is slowly shovelling the dirt beneath its feet to dig its own grave.

Linux And Windows

Quite simply, the revolution against Windows has run out of steam. While Linux was a viable alternative in the days of Windows 98, when the rallying cry of geeks everywhere was "Down with M$, Linux never crashes," we now have the majority of the Windows userbase running NT-based operating systems. Except in cases of hardware or driver issues, reliability is no longer an issue in the comparison between Linux and Windows.

Eventually, the movement became one of security. In the years after its release, Windows XP was discovered to have several high-profile security flaws. Microsoft underwent a major code audit and released SP2. The rallying cry for OSS was now about security.

However, the community has discovered major flaws in the Mozilla software suite, including bugs marked "confidential" for years at a time. Additionally, major security holes have been appearing in the 2.6 line of Linux kernels, some having existed for years and affecting the 2.4 line. Declaring Linux to be the secure alternative is no longer as true.

Worst of all, the Linux kernel developers have no clear process, nor any clear contact person, when it comes to security issues.

Evidence: http://lwn.net/Articles/118251/

Evidence: Long-time shell-provider SDF used Linux until they got hacked into. Now, it's a 64-bit version of NetBSD.

Evidence: PaX discovered the mlockall hole. It was fixed in PaX for two years. Linux just now (2005) caught up.

Evidence: "Using 'advanced static analysis': 'cd drivers; grep copy_from_user -r ./* | grep -v sizeof', I discovered 4 exploitable vulnerabilities in a matter of 15 minutes. More vulnerabilities were found in 2.6 than in 2.4. It's a pretty sad state of affairs for Linux security when someone can find 4 exploitable vulnerabilities in a matter of minutes." - Brad Spengler

The New Linux Development Model

With the 2.6 line of kernels, a new model has been adopted that is considered easier for the kernel developers. Instead of branching a 2.7 line, following the model of odd-numbered version numbers denoting development code, everything is now being thrown into 2.6.

"Not all 2.6.x kernels will be good; but if we do releases every 1 or 2 weeks, some of them *will* be good. The problem with the -rc releases is that we try to predict in advance which releases in advance will be stable, and we don't seem to be able to do a good job of that. If we do a release every week, my guess is that at least 1 in 3 releases will turn out to be stable enough for most purposes. But we won't know until after 2 or 3 days which releases will be the good ones." -- Ted T'So

In other words, this Linux kernel developer believes it is perfectly fine for one in three kernels of the stable line to actually be stable. The new development process is anti-user. "Release early, release often" has outlived its reliability and applicability to the real world.

The excuse given is that Linus is only one man, and there are only 24 hours in a day. If that is true, then Linus needs to address this shortcoming of the process; otherwise, the process is poorly managed.

The Community Has Regurgitated Itself

In a frenzy of newbies, the Linux community has grown, with Slashdot as its rallying center. The cycle of self-feeding groupthink has created a userbase unable to see outside its own perceptions. This leads to unrealistic attitudes about the safety and stability of Linux and its applicability to various solutions.

Contrast this to the BSD community, which employs a more academic approach. Instead of a cabal of kernel elite who pick and choose patches while the rest of the community watches on, BSD accepts volunteers from all over the world and maintains a calm, rational approach to development and advocacy. BSD users remain quiet and non-vocal for the most part, content to simply make their OS the best it can be. The Linux world, on the other hand, is entirely focused on Windows and Bill Gates and leveraging itself with ten different filesystems. Witness the endless mentions of Microsoft Bob, a product that was on the market for less than a year, over half a decade ago, intended for children and neophytes on a single-user machine.

A Decade Later And Nothing's Changed

The Linux community is in the unenviable position of being forced to look back on itself after ten years of hype and attempt to justify what it's accomplished. So far, Linux has made inroads in replacing old UNIX servers, just as BSD has. In the desktop market, it has barely made a dent. Before Google Zeitgeist removed its OS numbers, Linux was at a mere 1%. OS X was at 5%. The community, in more self-regurgitation, tells itself that Linux will succeed and even surpass OS X (witness last year's Slashdot article about how Linux usage will pass OS X's within the year...it never happened).

We're still using XFree86, which just recently gained the ability to change its own screen resolution without requiring a configuration file edit and restart. Desktop environments like KDE and GNOME are more interested in adding more buttons and sidebars than in implementing a universal API library for development, including binary installation/uninstallation, a universal graphics/sound library for games, and clear interface design that doesn't borrow from Windows while complaining about it. KDE currently implements an integrated file browser/net browser, start menu, taskbar, and more. All popular Windows features. Mono, currently the most promising prospect for a true future desktop Linux, is an implementation of Microsoft technologies.

Linux is heralded by fans as supporting the most devices, but in reality it simply supports more older, fringe hardware while other operating systems support today's modern hardware. As of this writing, Linux still has trouble with basic wireless networking, and the data corruption of the S-ATA driver is being ignored as developers continue to blame the hardware for its issues. Meanwhile, in NetBSD for instance, there are no reported issues with S-ATA.

Netcraft Confirms--Linux Is Dying

Unless major changes are made in both the community and the development process, Linux will remain a niche. It's over ten years later, and Linux is still just a marginal server OS beneath BSD. Fans blind themselves to any flaws because of an obsession over competing with Windows, which leads to groupthink and stagnation. The developers no longer care about the users and sysadmins, creating a development process that has spawned several outright kernel vulnerabilities while ignoring important patches. Meanwhile, technological innovation is made in other areas like OS X (true UNIX and GUI integration) and DragonflyBSD (revamp of core FreeBSD subsystems for performance).

The excuse of being a volunteer project is no longer valid--the real world is about results, not excuses. Making more "M$" jokes might get a +5 Funny on Slashdot. But with the recent trend of vulnerabilities, instability, and difficulties in the development process of Linux 2.6, it's one more deflation of the Linux movement. Linus, get it together!

Re:Slashdot lies, opinions, and half-truths (1)

hostyle (773991) | more than 9 years ago | (#11368720)

When you can honestly show me how any sane person can support the RIAA's stance on anything - while it remains a money-gouging, price-fixing cartel that pays its representatives (the artists) a pittance - I may actually listen to you.

Re:Slashdot lies, opinions, and half-truths (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11368798)

What a fantastic, well-researched reply. "If you disagree with me, you're insane, so I'll just cover my ears."

Nobody has ever actually offered a valid legal or moral justification for pirating artists' music. Notice you left out "artists" and replaced it with "RIAA," illustrating the point of the post.

Artists willingly sign their contracts, so the part about "pittance" is completely bogus. I know on Slashdot, everybody is somehow a victim, but it's not true. You have free will.

Re:Slashdot lies, opinions, and half-truths (0, Flamebait)

hostyle (773991) | more than 9 years ago | (#11369044)

So you don't deny that they are "money-gouging, price-fixing cartel" then? I guess we're in agreement then ...

Re:Slashdot lies, opinions, and half-truths (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11370370)

Nice dodge of all the points raised. Clearly, I am right. Next.

OT Re:Slashdot lies, opinions, and half-truths (1)

ezzzD55J (697465) | more than 9 years ago | (#11368916)

I know it's offtopic (and trollish), but..

* Slashdot editors are abusive. We all remember The Post.

Anyone know what he's talking about here?

Re:OT Re:Slashdot lies, opinions, and half-truths (0)

britneys 9th husband (741556) | more than 9 years ago | (#11370643)

He is talking about this comment [slashdot.org] .

There is additional background information and historical perspective available at the following sites:

Sllort's journal [slashdot.org]
Kuro5hin article [kuro5hin.org]

Re:Slashdot lies, opinions, and half-truths (1)

Ghostgate (800445) | more than 9 years ago | (#11368917)

Any company ending in AA is evil. Especially if it doesn't want you distributing its works without paying for it. Somehow, this mindset is supposed to make sense.

That's harsh, man! I have nothing against the AAA [aaa.com] . Why, just last week they came and changed my tire when I had a flat and was without a spare.

Anyway... did you say anything else, or was that pretty much it? Oh, yeah! Almost forgot:

Slashdot is dead.

Wow! That's pretty big news. But has Netcraft confirmed this???

Don't worry, I'm laughing with you, not at you. No... no, really.

the spam problem (0, Offtopic)

hostyle (773991) | more than 9 years ago | (#11368661)

Who cares about the semantic web or any new web technology if it's going to be deluged by spam within 5 days of people deciding to use it, and thus become unusable / untrustworthy as a resource? Deal with the spam problem, then come back to me about these great new technologies that are vulnerable to it.

The person who mods this down (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11368668)

Is a bastard on wheels and he spreads chicken shit over himself!

I think it's disgusting (-1, Offtopic)

stratjakt (596332) | more than 9 years ago | (#11368703)

That the 'net has to be segregated into semantic and non-semantic.

Haven't the Jewish people been through enough without this digital persecution?

More proof that michael's a Nazi.

Disgusting.

Wow (0)

KillerDeathRobot (818062) | more than 9 years ago | (#11368709)

I thought I knew what these articles were supposed to be talking about, but it turns out I had no clue.

Re:Wow (0)

Anonymous Coward | more than 9 years ago | (#11368808)

I thought I knew what these articles were supposed to be talking about, but it turns out I had no clue.

Summary:
The semantic web is all about putting useful, very well defined meta-data all over the place so that doing useful things will be much easier. The google dude says it isn't working because:
1. People can't agree on what the meta-data should be
2. Meta-data will be abused anyway.

It relies too much on people getting together, making rules, and sticking to the rules.

Wait, what's Semantec... (1)

Spy Handler (822350) | more than 9 years ago | (#11368719)

Norton Antivirus got to do with this web technology?

Have some Compassion . . . (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11368721)

I think Google should not spend time finding anything anti-Semantic.

LiveJournal and other weblogging services (3, Informative)

crschmidt (659859) | more than 9 years ago | (#11368769)

Every user of a LiveJournal-based website running recent code has a FOAF file. Let's look at how many users that is:

* LiveJournal.com: 5751567
* GreatestJournal.com: 717406
* DeadJournal.com: 474435
* Weedweb.net: 22650
* InsaneJournal.com: 12970
* JournalFen.net: 7629
* Plogs.net: 7086
* journal.bad.lv: 4530

(This list is most likely incomplete.)

In addition to this, every Typepad user has an account: according to the 6A merger stories, that's another million users. Add in the RDF from all the Typepad RSS files, and that's another 1 million.

All WordPress blogs have an RDF feed, located at /feed/rdf or /wp-rdf.php. Movable Type comes preinstalled with an RSS 1.0 feed. Each of these has at least a couple thousand users.
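
A quick sketch of counting those in Python; the blog URLs are hypothetical, and the endpoint list is just the two paths mentioned above:

    # Sketch: probe the two RDF feed locations mentioned above on a list
    # of blog URLs and count how many answer with something feed-like.
    import urllib.request

    FEED_PATHS = ["/feed/rdf", "/wp-rdf.php"]
    blogs = ["http://blog.example.org", "http://another.example.net"]  # placeholders

    def has_rdf_feed(base):
        for path in FEED_PATHS:
            try:
                with urllib.request.urlopen(base.rstrip("/") + path, timeout=10) as resp:
                    ctype = resp.headers.get("Content-Type", "")
                    if resp.status == 200 and ("rdf" in ctype or "xml" in ctype):
                        return True
            except OSError:
                continue
        return False

    print(sum(has_rdf_feed(b) for b in blogs), "of", len(blogs), "blogs expose an RDF feed")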

So, we've got, just as a guess, about 9 million RDF files out there in the blogging world alone. Throw in a hell of a lot of scientific data, and everything on RDFdata.org [rdfdata.org] , and you start to get an idea that the world is a lot more Semantic Web enabled than you seem to think it is.

Re:LiveJournal and other weblogging services (2, Informative)

Da_Weasel (458921) | more than 9 years ago | (#11369073)

About 75% of those that signed up for those various blogging services have never actually posted a single entry in their blog. So the actual number is more like 2.2 million or so. Even with a devastating hit like that, it's still 10 times more than the number stated in the article though....lol... and it's still just the bloggers alone.

Re:LiveJournal and other weblogging services (2, Informative)

zangdesign (462534) | more than 9 years ago | (#11371366)

So, we've got, just as a guess, about 9 million RDF files out there in the blogging world alone.

Care to venture a guess as to how many of those actually contain useful information? Really, who cares if Melanie in Oshkosh really, really loves Justin Timberlake, or Winthorpe in Des Moines really, really wants people to sign up so he can get an iPod?

Furthermore, once you start tying all this information together, doesn't that just make the work of corporate data miners that much easier?

Of course, you could salt in a bunch of useless, random data, which, of course, means that the whole shooting match is useless.

Wow, a lot is missing from this survey! (1)

Chris Croome (24340) | more than 9 years ago | (#11368819)

A few sites I have worked on that are run by MKDoc [mkdoc.org] are listed in their top 500 [umbc.edu], since MKDoc generates an RDF metadata file for every HTML document, but the biggest and most interesting are missing. I expect that there are perhaps several hundred times more RDF documents out there than they have found...

Hey Michael.... (0)

Anonymous Coward | more than 9 years ago | (#11368892)

How's censorware.org doing?

I guess they forgot to read the TOS (1, Insightful)

dracocat (554744) | more than 9 years ago | (#11368925)

From the article: For each new site S we encounter, we give Google the narrower query 'filetype:owl site:S', which will often end up getting some additional results not included in the earlier query.

From the Google TOS: [google.com] You may not send automated queries of any sort to Google's system without express permission in advance from Google.

I am serious. These researchers just used a lot of resources from Google that they had no permission to use. Researchers especially should try to be good citizens on the net and not run tons of automated queries against websites without permission--especially when it is specifically prohibited.

Google has spent a lot of time and money gathering the information these researchers wanted; when they asked Google for copies of it, Google didn't hand it over--so instead they just took it without permission.

I would call that stealing, except I won't, because that will start a whole other thread telling me that information cannot be stolen.

My point is, if you want to do research, at least play by the rules that you are given. It may take longer and require more work, but that seems better than using information that you don't have permission to use.
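For what it's worth, the technique the article describes is easy to reproduce in a few lines, whatever you conclude about the terms-of-service question. A minimal sketch of the query-generation step only; the seed URLs below are placeholders, and actually submitting the queries (and under what terms) is between you and your search provider:

<ecode>
from urllib.parse import urlparse

# Placeholder seed URLs -- in a real pipeline these would come from an earlier,
# broader filetype: query or from a crawler's own discoveries.
seed_urls = [
    "http://example.edu/ontologies/time.owl",
    "http://example.org/people/alice/foaf.rdf",
    "http://another.example.com/schema/wine.owl",
]

# File extensions commonly used for semantic web documents.
EXTENSIONS = ["owl", "rdf", "rdfs", "daml"]

def narrower_queries(urls, extensions=EXTENSIONS):
    """For each newly seen site S, build the 'filetype:EXT site:S' queries."""
    sites = sorted({urlparse(u).netloc for u in urls})
    return [f"filetype:{ext} site:{site}" for site in sites for ext in extensions]

for q in narrower_queries(seed_urls):
    print(q)
</ecode>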

Re:I guess they forgot to read the TOS (0)

Anonymous Coward | more than 9 years ago | (#11368993)

Well, this has probably resulted in fewer than 10 thousand Google queries -- several for each unique site on which we've seen a semantic web document -- over several months. We do this to generate seed URLs for our own crawlers. That doesn't sound like a problem to me. I'm sure it's noise, and a very still, quiet noise, in Google's transaction stream.

Re:I guess they forgot to read the TOS (0)

Anonymous Coward | more than 9 years ago | (#11370226)

That doesn't sound like a problem to me.

The problem is that you've clearly broken their terms of service. You are taking advantage of their resources without their permission. In case you don't get it, that's WRONG.

How hard is it to get permission from Google? Are you really that lazy??

Re:I guess they forgot to read the TOS (0)

Anonymous Coward | more than 9 years ago | (#11371726)

The Google API by definition allows automated querying of Google; I would suppose the terms of service apply to automated querying of the browser-accessible service.

Re:I guess they forgot to read the TOS (1)

Anonymous Coward | more than 9 years ago | (#11371407)

I happen to know the members of the ebiquity lab personally; I was a grad student in another lab at UMBC. Just to clarify what the person who wrote this post completely made up: the ebiquity people actually had an account with Google to use the web APIs for Swoogle. I know this because, at the time they were working on it, Google was having some sort of technical problem and they couldn't register for a license key, so they asked around to see if anyone had a valid key they weren't using that ebiquity could borrow until they were able to register for their own.

They weren't sending automated queries without Google's express written consent; they were using the Google API to do the queries, and they were not abusing the system, because they took the initial 1000 query results they got, did some processing on their own, and then issued narrower queries to gain additional information. The terms don't specify that you can only run one query; they just say you can't get back more than 1000 results for a single query.

I'm not sure how you can call this "stealing", or imply that it might be, given that Google provides the information free of charge to anyone who types in a query on their site. So while you raise a good point that researchers should not circumvent rules in order to accomplish their research goals, in this particular case you haven't a clue what you're talking about, and you shouldn't try to tarnish the reputation of well-known researchers in the AI community.

Re:I guess they forgot to read the TOS (2)

dracocat (554744) | more than 9 years ago | (#11371518)

I must have been misled by statements such as:
We never did get any help from Google. -- Tim Finin [umbc.edu]
and statements like: Please do not write to Google to request permission to "meta-search" Google for a research project, as such requests will not be granted. -- Google.

You may be right, they may have had permission, but all I see are complaints against Google for a lack of cooperation--and I would count special permission to use their database as cooperation.

At any rate, if I misunderstood Tim Finin, then I do apologize--but I have to say that the way they talk about working around Google's restrictions and Google's lack of help sure makes it sound like they never received any such permission.

Re:I guess they forgot to read the TOS (1)

Finin (97295) | more than 9 years ago | (#11371567)

Perhaps I should have clarified how Google didn't help. I had asked Peter Norvig if we might be able to get all of the .owl, .rdf, .rdfs, etc. files. He said he'd check, but we never heard back. This was during the early stage of the run-up to the IPO, so I was neither upset nor surprised. If Google started helping all of the web hackers in the world, they'd never get anything done. Besides, it gave us a new problem to solve and left us with a nice warm feeling after we solved it. I'm pretty sure our use of the Google API was in line with their TOS and didn't stress their systems any more than a random user with a bad caffeine habit.

I guess You read the wrong TOS (2)

pavan0918 (850023) | more than 9 years ago | (#11372075)

The Google TOS [google.com] you are talking about is for the Google website. We used the Google web service API; please read the Google API TOS [google.com]. The Google API was built to allow automated queries, so we were not "violating" the TOS.

So I think it is wrong on your part to comment on someone without having the full information. Of course it may take longer and require more work, but that seems better than using wrong information.

Re:I guess they forgot to read the TOS (1)

claes (25551) | more than 9 years ago | (#11372215)

I would call that stealing, except I won't because that will start a whole other thread telling me that information cannot be stolen.

Too late! In that case Google is the bigger thief with all the cached content it has stored and provides through its own servers.

Re:I guess they forgot to read the TOS (1)

/dev/trash (182850) | more than 9 years ago | (#11376400)

Nah. This would be a valid example of the fair use that everyone here likes to bandy about.

blah blah blah semantic blah blah blah (0)

Anonymous Coward | more than 9 years ago | (#11369010)

That's about 0.005% of the web. We've got a ways to go.

I dunno about you, but I'm not going to do this to any of my data unless I'm forced to (e.g., my editor saves it that way, or Firefox 5.0 won't read it otherwise).

So don't hold yer breath.

First post (-1, Redundant)

Anonymous Coward | more than 9 years ago | (#11369759)

to deliver what, host what the house of programming

But how do I use this semantic data? (1)

sffubs (561863) | more than 9 years ago | (#11370363)

Apart from RSS feeds, how can I use this data? I mean, I have RDF metadata available for pretty much every page on my website, but I haven't yet noticed anyone who actually reads it.

The semantic web seems like a good idea in principle, but I would really like to know just how I could use it in real life! Seriously, can anyone name a useful tool that relies on RDF feeds (again, aside from RSS-style stuff) or propose one that could? Perhaps if I saw a real application of the semantic web I would understand what RDF is actually all about.

Re:But how do I use this semantic data? (1)

crschmidt (659859) | more than 9 years ago | (#11370437)

Check out http://crschmidt.net/semweb/ for info on some of the projects I've worked on which use the semantic web.

The most interesting one, in my opinion, is lorebot [crschmidt.net]. Lorebot sits in a channel and associates identified users with their FOAF files. Once it does this, it links them to a human-readable HTML description of themselves and, if possible, displays an image for them. Example output: online users [crschmidt.net], personal output [crschmidt.net].

There are also things like the FOAF-a-matic and DOAP-a-matic, both of which take the details you type into a form and spit out a machine-readable RDF description. The Firefox plugins on my semweb page let you see in the corner of your browser when you have that information available.

There are more tools out there, but they don't tend to be as down-to-earth, because a lot of RDF data is high-level stuff. The demonstrations are becoming a lot more usable though, and I expect that to continue over the next year.
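To make that a bit more concrete, here is roughly what consuming a FOAF file looks like with a library such as rdflib. This is a generic sketch, not lorebot's actual code, and the FOAF URL is a placeholder:

<ecode>
from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

def describe_people(foaf_url):
    """Pull human-readable names and picture URLs out of a FOAF document."""
    g = Graph()
    g.parse(foaf_url, format="xml")  # FOAF files are usually RDF/XML
    people = []
    for person in g.subjects(predicate=FOAF.name):
        name = g.value(person, FOAF.name)
        pic = g.value(person, FOAF.depiction) or g.value(person, FOAF.img)
        people.append((str(name), str(pic) if pic else None))
    return people

# Placeholder URL -- point this at any FOAF file you have handy.
print(describe_people("http://example.org/people/alice/foaf.rdf"))
</ecode>

From there, rendering the HTML description or fetching the image is ordinary web programming.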

Re:But how do I use this semantic data? (0)

Anonymous Coward | more than 9 years ago | (#11371327)

The semantic web isn't for human consumption. It's meant to be written in a markup language, such as RDF or DAML+OIL, which can produce an ontology of knowledge out of all the existing information on the web, so that "intelligent" software agents can make use of the vast resource that the web is.

Just my opinion, but... (3, Insightful)

crazyphilman (609923) | more than 9 years ago | (#11371671)

I think the "Semantic Web" sounds great on paper, and is the next big thing in university research departments and etc, etc, BUT I don't think it's going to end up seeing wide use. Here are my reasons, basically a list of things that I as a web developer would hesitate on.

1. The Semantic Web seems to require a lot of extra complexity without much "bang for my buck". If I build a page normally, all my needs are already met. I can submit the main web page to search engines, prevent the rest from being indexed, and figure out how to advertise my page's existence... I'm pretty much set. The extra stuff doesn't buy me anything. In fact, I definitely would NOT want people being able to find information on my site without going through my standard user interface. I WANT them to come in through the front door and ask for it.

2. Let's say people start using this tech, which I imagine would involve all sorts of extra tagging in pages, extra metadata, etc. Now you have to trust people to A) actually know what they're doing and set things up properly, which is a long shot at best, and B) not try to game the system somehow. On top of that, you have to trust the tool vendors to write bug-free code, which isn't going to happen. What I'm saying is that all these extra layers of complexity are places for bugs, screw-ups, and booby traps to hide.

3. And, the real beneficiary of these sorts of systems seems to be the tool vendors themselves. Because what this REALLY seems to be about is software vendors figuring out a new thing they can charge money for. Don't write those web pages using HTML, XML, and such! No, code them up with our special sauce, and use our special toolset to bake them into buttery goodness! Suddenly, you're not just writing HTML, you're going through a whole development process for the simplest of web pages.

Maybe I'm getting crusty in my old age, but it seems that every single year, some guy comes up with some new layer of complexity that we all "must have". It's never enough for a technology to simply work with no muss and no fuss. Nothing must ever be left alone! We must change everything every year or two! Because otherwise, what would college kids do with their excess energy, eh?

Sigh... Anyway, no matter what you try and do to prevent the Semantic Web from turning out just like meta tags, the inevitable will happen. You watch.

Re:Just my opinion, but... (3, Insightful)

l0b0 (803611) | more than 9 years ago | (#11372093)

The Semantic web seems to require a lot of extra complexity without much "bang for my buck". If I build a page normally, all my needs are already met.

How about the needs of the people actually using the page? If you don't care about the viewers, why bother putting it on the web?

I definitely would NOT want people being able to find information on my site without going through my standard user interface. I WANT them to come in through the front door and ask for it.

That sounds just like the kind of site I get pissed off at: one that redirects me to the main page after I've found the page I really want via Google. Forcing visitors to jump through hoops has never been popular.

Now you have to trust people to A) actually know what they're doing and set things up properly, which is a long shot at best, and B) not try to game the system somehow.

As a web developer, you probably already know what kinds of ugly designs are out there. And yet, by some kind of magic, there are companies which create searchable indexes of these pages, and it just works [google.com]. One of the benefits of this technology I expect to see in search engines shortly is the possibility of semantic searches. How would you go about, today, looking for a bike magazine called "Encyclopedia" (I've tried)? Or research results relevant to your latest blog entry? Or the cheapest direct or indirect first-class return ticket from London to New Delhi, departing between one hour from now and 9 a.m. Thursday, with return between three and five days later, no smoking all the way?
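Once data like that is published in RDF, the last kind of query stops being a string-matching problem and becomes a structured one. A toy sketch with rdflib and SPARQL for the "bike magazine called Encyclopedia" example; the ex: vocabulary and the sample data are invented purely for illustration:

<ecode>
from rdflib import Graph

# Toy data: the ex: vocabulary is made up for this example.
DATA = """
@prefix ex: <http://example.org/vocab#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://example.org/mags/1> a ex:Magazine ;
    dc:title "Encyclopedia" ;
    ex:topic ex:Cycling .

<http://example.org/mags/2> a ex:Magazine ;
    dc:title "Encyclopedia" ;
    ex:topic ex:Gardening .
"""

QUERY = """
PREFIX ex: <http://example.org/vocab#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?mag WHERE {
    ?mag a ex:Magazine ;
         dc:title "Encyclopedia" ;
         ex:topic ex:Cycling .
}
"""

g = Graph()
g.parse(data=DATA, format="turtle")
for row in g.query(QUERY):
    print(row.mag)   # only the cycling magazine matches
</ecode>

The point isn't the syntax; it's that "a magazine whose title is Encyclopedia and whose topic is cycling" is something a machine can evaluate exactly, instead of guessing from keywords.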

Re:Just my opinion, but... (1)

crazyphilman (609923) | more than 9 years ago | (#11403299)

"How about the needs of the people actually using the page? If you don't care about the viewers, why bother putting it on the web?"

Anything more complex than flat HTML is actually going to require the developer to retain some control over how the user views the pages. For example, take a page that allows you to submit an application online. The only appropriate place for a user to start is the start page of the application. NATURALLY I'm going to bounce you back to the beginning.

Anytime you try to do ANYTHING complicated, you're going to have to take control of the user's experience. This is as much for his benefit as yours, and it's a fact of life for web developers.

"That sounds just like the kind of site I get pissed off at, when being redirected to the main page after finding the page I really want via Google. Forcing visitors to jump through hoops has never been popular."

Tough nuts. The customer is not always right.

As for the rest, it's all possible with what we already have. Nothing you've described is aided or solved by the "Semantic Web".

Remember my comment about "gaming the system"? What prevents a porn site from creating a semantic web setup which says it's a bicycle site? In no time at all, you're wading through the same dreck you had with regular search engines.

The solution is NOT to create a "semantic web" but to improve YOUR SEARCH SKILLS. It's not my duty as a developer to hold your little hand and absolve you of the need to develop your skills. It's my duty to create web pages that do what they're supposed to do, every time, never breaking, never crashing, with consistent, repeatable, predictable results.

If someone like you gets bent out of shape because I'm enforcing the proper working of my page, well, tough. I don't need guys like you on my site anyway; you spend too much time trying to tinker around with it and are thus a royal pain in the ass.

Finding Nemo (0)

Anonymous Coward | more than 9 years ago | (#11373045)

Any news on that down there?