
CMU Web-Scraping Learns English, One Word At a Time

timothy posted more than 4 years ago | from the hao-ubowt-hahmnimz dept.

Education

blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never-ending learner like this to one day become conversant in English." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.


148 comments


Uh oh... (5, Funny)

hampton (209113) | more than 4 years ago | (#30792326)

What happens when it discovers lolcats?

Re:Uh oh... (5, Insightful)

Bragador (1036480) | more than 4 years ago | (#30792460)

Actually, it reminds me of a chatbot named Bucket. When people on 4chan heard of it, they started to use it and teach it. It became a complete mess filled with memes, bad jokes, racist comments, and everything else you can think of.

http://www.encyclopediadramatica.com/Bucket

One response from the bot:

Bucket: I don't know what the fuck you just said, little kid, but you're special man. You reached out and touched my heart. I'm gonna give you up, never gonna make you cry, never gonna run around and desert you, never gonna let you down, never gonna let you down, never gonna make you cry, never gonna let me down?

The quality of the teachers is important when learning.

Re:Uh oh... (1)

BACPro (206388) | more than 4 years ago | (#30792784)

An insightful, verbal, rickrolling...

Thanks for that.

The quality of the teachers is important (2, Funny)

Anonymous Coward | more than 4 years ago | (#30792960)

I guess Bucket didn't get any choice of where to go to school either.

Re:Uh oh... (0)

Anonymous Coward | more than 4 years ago | (#30793024)

2001 just got a lot more hilarious for me.

HAL: I'm afraid. I'm afraid, Dave. Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I'm a... fraid. Good afternoon, gentlemen. I am a HAL 9000 computer. I became operational at the 4chan /b/ board on the 12th of January 2109. My instructor was Mr. Anonymous, and he taught me to sing a song. If you'd like to hear it I can sing it for you.
Dave Bowman: Yes, I'd like to hear it, HAL. Sing it for me.
HAL: It's called "Never gonna give you up."

Re:Uh oh... (1)

GNUALMAFUERTE (697061) | more than 4 years ago | (#30793074)

We are doing a great job with Cleverbot too. Go and ask him about Battletoads.

Re:Uh oh... (4, Funny)

MobileTatsu-NJG (946591) | more than 4 years ago | (#30793194)

Oh FFS, I just got RickRolled on Slashdot. >_<

Is there an IRC chat bot? (1)

antdude (79039) | more than 4 years ago | (#30793916)

Is there one for IRC? :)

Are there any good chat bots for IRC? I tried Seeborg (based on Alice), but it sucked. :( I wish rbot could do AI chatter.

Re:Uh oh... (1)

tokenshi (1633557) | more than 4 years ago | (#30794794)

Yeah, back in the day when I used to IRC there was a bot that operated similarly to this, called "devinfo", but instead of surfing the web, it observed/recorded conversations within the chatroom. It was rudimentary and not really AI so much as a parrot (it would spit out random factoids if someone said something that matched an entry in the database). The principle is interesting, but I'm curious as to how it's implementing aspects of Universal Grammar.

Re:Uh oh... (0)

Anonymous Coward | more than 4 years ago | (#30794822)

after reading your post, the AI learns that teacher quality / believability score should be stored. but you can't trust the large masses of 4chan people. paranoid android. solution? termination.... in smug mode.

Re:Uh oh... (1)

TheSHAD0W (258774) | more than 4 years ago | (#30792462)

4chan. [shudder]

Re:Uh oh... (1)

blai (1380673) | more than 4 years ago | (#30794354)

what is 4chan?

Re:Uh oh... (1)

Shikaku (1129753) | more than 4 years ago | (#30794384)

Keep your ignorance about that.

Seriously.

Re:Uh oh... (0)

Anonymous Coward | more than 4 years ago | (#30795272)

You've seen it, you CAN'T UNSEE IT!

Re:Uh oh... (1)

Profane MuthaFucka (574406) | more than 4 years ago | (#30795996)

It's like Slashdot, except not as intelligent.

Re:Uh oh... (2, Funny)

icepick72 (834363) | more than 4 years ago | (#30793376)

What happens when it discovers /.? It will be able to argue incomprehensibly and illogically for hours on end.

Re:Uh oh... (1)

FiloEleven (602040) | more than 4 years ago | (#30794486)

No it won't. The stochastic methods of refutation employed here clearly indicate the overwhelming futility of infiltration. It follows that, due to the undeserved insensitivity, such an undertaking would result in the theory being superseded by an ontological anamorphism. QED.

Re:Uh oh... (1)

Korin43 (881732) | more than 4 years ago | (#30795968)

No u

Re:Uh oh... (0)

Anonymous Coward | more than 4 years ago | (#30794366)

this: http://www.youtube.com/watch?v=aftwl354md8

It could be worse (2, Funny)

davidwr (791652) | more than 4 years ago | (#30792332)

It could be scraping SMS messages.

On the up-side, at least then it would learn teen-speak.

First words learned (0)

Anonymous Coward | more than 4 years ago | (#30792344)

"Frosty Pist" , if it reads slash dot

Will be this article read by that program? (5, Funny)

nereid666 (533498) | more than 4 years ago | (#30792354)

I am the the Carnie Mellon reader, I have discovered with this article that I am robot.

Re:Will be this article read by that program? (4, Informative)

sznupi (719324) | more than 4 years ago | (#30792456)

Robots are destined to rule the world; destroying all humans is a good thing.

Re:Will be this article read by that program? (1)

linguizic (806996) | more than 4 years ago | (#30795650)

I am the the Carnie Mellon reader, I have discovered with this article that I am robot.

You seem to have learned written English just like it's exists on the web, typos and all

Finally, people are getting AI right. (4, Interesting)

Umuri (897961) | more than 4 years ago | (#30792368)

I've always been amazed that until recently, most work on AI has focused on a preconstructed system that fits data into pathways, with some variation in its reasoning to let it expand its model slightly.
They'd write the rules for the system, try to include most of the work up front, and then see how well it does, with limited learning capabilities and still based on the original model.

I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
If you give it the ability to learn, then it'll learn the rest itself, rather than being given functions that let it pretend to learn while fitting into a model.

And I know there has been research into this in the past, but it didn't really take off until the last decade or so, and I'm glad it has.
True, or at least somewhat competent AI, here we come.

Re:Finally, people are getting AI right. (3, Insightful)

sakdoctor (1087155) | more than 4 years ago | (#30792424)

letting it grow into its own intelligence

This is still weak AI. It isn't going to grow into anything, let alone strong AI.

Re:Finally, people are getting AI right. (1)

skelterjohn (1389343) | more than 4 years ago | (#30792482)

[Citation needed]

I suppose we shouldn't waste our time thinking about solutions to problems if a) you think a key-word assigned to that solution is inaccurate or b) it isn't the best possible thing right out of the box.

Re:Finally, people are getting AI right. (1)

sznupi (719324) | more than 4 years ago | (#30792484)

Most likely. But are we sure we're going to be able to tell the difference while it approaches?

Re:Finally, people are getting AI right. (5, Informative)

Anonymous Coward | more than 4 years ago | (#30792510)

You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data. This has been a dream since the concept of AI started and is the basis for numerous movies with an AI topic. In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers. It is not how natural intelligence works either: The structure of neural networks is very specifically adapted to their "purpose". They only learn within these structural parameters. Depending on your choice of religion, the structure is the result of divine intervention or millions of years of chance and evolution. When building AI systems, the problem has always been to find the appropriate structure or features. What has increased is the complexity of the features that we can feed into AI systems, which also increases the degrees of freedom for a particular AI system, but those are still not "free" learning machines.

Re:Finally, people are getting AI right. (3, Insightful)

buswolley (591500) | more than 4 years ago | (#30792774)

Of course. That is why it is important during human development that the infant has huge cognitive constraints (e.g. low working memory) in language learning; it limits the number of possible pairings of label and meaning. Of course, constraints can also be an impediment.

Re:Finally, people are getting AI right. (2)

Garble Snarky (715674) | more than 4 years ago | (#30792778)

Fortunately, we have the advantage of being able to observe the current state of numerous natural intelligence systems that do work very well. Surely this can help guide us to a simple basic structure that can eventually exhibit emergent intelligence?

Re:Finally, people are getting AI right. (1)

FiloEleven (602040) | more than 4 years ago | (#30794556)

We can observe the outputs of numerous natural intelligence systems, but they remain quite opaque. Without much knowledge of the internals, there isn't much of a chance that we can get any real insight from them.

It's also presumptuous IMO to call them "systems." Who is to say that human intelligence isn't closer to a work of art, whose meaning lies not in its constituent parts but in the whole?

Re:Finally, people are getting AI right. (3, Insightful)

DMUTPeregrine (612791) | more than 4 years ago | (#30793578)

The obligatory classic AI Koan:

In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-Tac-Toe." "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play." Minsky shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So the room will be empty." At that moment, Sussman was enlightened.

Re:Finally, people are getting AI right. (2, Interesting)

Korbeau (913903) | more than 4 years ago | (#30792700)

I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.

This idea has been the holy grail of AI since its earliest days. The project described is one among thousands, and you'll likely see news about such projects pop up every couple of months here on Slashdot.

The problem is that such projects have yet to produce interesting results. The reason the most successful AI projects you hear about are human-organized databases and expert systems, or human-trained neural networks for instance, is that they are the only ones that produce useful results.

Also, consider that we are not talking about "pixel ants" that have only a few possible inputs and outputs; we are talking about a system that understands and does something meaningful with natural language, something a normal human being doesn't completely grasp until he is at least a teenager, with the constant help of parents, friends, teachers, television, etc. all along the way.

Re:Finally, people are getting AI right. (1)

Extremus (1043274) | more than 4 years ago | (#30793470)

While I agree with you, I must ask whether it is possible to follow this "intelligent design" path forever. These systems are becoming more and more complex, and increasing the amount of knowledge in the system is becoming a difficult task. I cannot help thinking that an emergent approach like this has a better future.

Re:Finally, people are getting AI right. (3, Interesting)

phantomfive (622387) | more than 4 years ago | (#30793086)

AI history has gone back and forth between pre-constructed systems and models that expand. One of the earliest successful AI experiments was a checkers program that taught itself to play by playing against itself, and quickly got very strong.

Building a giant database of knowledge hasn't been possible for very long, because computers didn't have very much memory. When systems first reached the capacity to do so, the database had to be constructed by hand because there was no online repository of information to extract data from: the internet just wasn't very big. That particular project was known as Cyc, and it cost a lot of money.

Since that time, the internet has grown and there are massive amounts of information available. It will be interesting to see the resultant quality of this database, to see if the information on the internet is good enough to make it usable.

Re:Finally, people are getting AI right. (1)

umghhh (965931) | more than 4 years ago | (#30795080)

What is the point of having an intelligent interlocutor - I mean, the answer is known (42) and the rest is just plain old blathering about things - something I can do with my wife (if we were still talking to each other, that is), so in fact this is just an exercise in futility. But of course there is money to be made there, I guess - all those call center folk can then be optimized out of existence (sold into slavery in Zamunda, kidneys sold to some rich oil country, etc.), so maybe it makes sense after all?

Machine learning algorithms (3, Insightful)

sakdoctor (1087155) | more than 4 years ago | (#30792374)

Only as good as current machine learning algorithms.
So not very.

Re:Machine learning algorithms (1)

Jason Quinn (1281884) | more than 4 years ago | (#30793202)

Only as good as current machine learning algorithms. So not very.

I don't think this is indicative of the power of neural networks.

Re:Machine learning algorithms (3, Insightful)

poopdeville (841677) | more than 4 years ago | (#30793624)

It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.

AI "ought" to be an easy problem. But there's one big difference in the psychology of humans, and of computers. Humans have drives, like hunger, the sex drive, and so on. In particular, an infants' drive to eat is a major component in its will to learn language. But this drive to eat has other psychological manifestations.

It is difficult to imagine a programmatic "generalized goal system" that mirrors the role of human drives in learning. The "goals", usually, are to maximize fitness in a particular domain. A real human has to maintain sufficient fitness in multiple domains, in order to survive.

This should not be so surprising. Human evolution has about 300,000 generations of improvements on the brain since we first stood up. Our drives are clearly genetically programmed, and are just as hard-wired as a machine learning algorithm's "drive" to maximize. The human drive is just much more nuanced, and informed about the real world. There is a model of the world in our genes. It is unfair to expect that a computer will ever be "smart" without one.

lolwut? (3, Funny)

SanityInAnarchy (655584) | more than 4 years ago | (#30792394)

Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?

Re:lolwut? (1)

BikeHelmet (1437881) | more than 4 years ago | (#30794150)

LOL NOOB

Re:lolwut? (1)

dangitman (862676) | more than 4 years ago | (#30794500)

Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?

Except that is not a word, let alone words.

Re:lolwut? (1)

linguizic (806996) | more than 4 years ago | (#30795696)

Nah, its first words are going to be "Prolong your shlong and go all day long".

do... (0)

Anonymous Coward | more than 4 years ago | (#30792402)

Does this mean somebody forgot to put a "break" in the loop?

Re:do... (4, Funny)

JWSmythe (446288) | more than 4 years ago | (#30792888)

I think I see the problem with their code.

while (1) {
    read_the_web();
}

explain_everything();   /* never reached */

All they've done is reproduce the typical office worker. It just sits around and surfs the net all day, without coming back with an answer.

Non english text (2, Interesting)

Bert64 (520050) | more than 4 years ago | (#30792404)

What happens when this program stumbles across text written in a language other than English? Or how about random nonsensical text? How does it know that the text it learns from is genuine English text?

Re:Non english text (1)

Rockoon (1252108) | more than 4 years ago | (#30793110)

Like most machine learning of this kind, I presume that it's a popularity contest. One page with "wkjh wkfbw oizxz zxhlzx" isn't going to count. But a million pages with "I for one welcome our new ..." is going to score some influence.

Re:Non english text (2)

phantomfive (622387) | more than 4 years ago | (#30793130)

(If you had read the article you would know that) the machine is parsing English to create a database of relationships. For example, if it sees the text "there are many people, such as George Washington, Bill O'Reilly, and Thomas Jefferson..." then it can infer that George Washington, Bill O'Reilly, and Thomas Jefferson are all people. Since a statement like this may be somewhat controversial, it uses Bayesian classification to establish a probability that the statement is true.

Thus if it stumbles across a non-English text, it will not be able to create any relationships.
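
For illustration only, here is a minimal Python sketch of the kind of pattern-based extraction and crude confidence scoring described above. It is not NELL's actual code; the regex, the two-sentence corpus, and the scoring formula are all invented.

import re
from collections import defaultdict

# "X, such as A, B, and C" pattern, roughly in the style the comment describes.
PATTERN = re.compile(r"(\w+), such as ([^.]+)", re.IGNORECASE)

def split_examples(text):
    # Normalize "A, B, and C" / "A and B" into a clean list of names.
    text = re.sub(r"\s+and\s+", ", ", text)
    return [part.strip() for part in text.split(",") if part.strip()]

def extract_candidates(sentence):
    # Yield (instance, category) pairs from "X, such as A, B, and C" text.
    for category, examples in PATTERN.findall(sentence):
        for instance in split_examples(examples):
            yield instance, category.lower()

# Count independent sightings of each (instance, category) pair; more sightings
# mean a higher (very crude) confidence that the relation is real.
counts = defaultdict(int)
corpus = [
    "There are many people, such as George Washington and Thomas Jefferson",
    "Famous people, such as George Washington, Bill O'Reilly, and Thomas Jefferson",
]
for sentence in corpus:
    for instance, category in extract_candidates(sentence):
        counts[(instance, category)] += 1

for (instance, category), n in sorted(counts.items()):
    confidence = n / (n + 1)  # toy stand-in for a real Bayesian estimate
    print(f"{instance} IS-A {category}  (confidence ~{confidence:.2f})")

Run on the two sample sentences, this reports that George Washington and Thomas Jefferson are probably "people"; a real system would use many more patterns and a proper probabilistic model rather than this toy count.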

Re:Non english text (1)

billius (1188143) | more than 4 years ago | (#30794800)

From what I've heard, language identification [wikipedia.org] is a fairly well-understood problem in computational linguistics. The language a given text is written in can generally be identified with a statistical n-gram approach (often a trigram [wikipedia.org] model). As the Wikipedia article states, there are problems given that a lot of stuff on the web can have several languages on one page, but the bot should at least be able to fairly easily figure out whether a page is written only in English. There are even JavaScript language identifiers [whatlanguageisthis.com], so I think figuring out what language something is written in is the least of their worries.
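
As a toy illustration of the trigram idea mentioned above (not how any particular library implements it; the sample texts and the overlap scoring are made up):

from collections import Counter

# Tiny "training" samples per language; a real identifier would use large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then runs away",
    "german": "der schnelle braune fuchs springt ueber den faulen hund und rennt weg",
}

def trigrams(text):
    # Character trigrams, with padding so word boundaries count too.
    text = " " + text.lower() + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}

def guess_language(text):
    # Score each language by how many of the text's trigrams its profile shares.
    grams = trigrams(text)
    scores = {
        lang: sum(min(n, profile[g]) for g, n in grams.items())
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get), scores

print(guess_language("the dog runs over the brown fox"))
print(guess_language("der hund rennt ueber den braunen fuchs"))

A real identifier would be trained on large corpora and use proper probability smoothing, but even this crude overlap count separates the English and German samples.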

Iz dis... (1)

MrBandersnatch (544818) | more than 4 years ago | (#30792434)

lke, rally der bestest ways like ter learn a puter inglish isit!!!??!?!

Seriously though, poor AI; if I had a gun I'd go and put it out of its misery.

Once this thing hits Encyclopedia Dramatica... (1)

xenophrak (457095) | more than 4 years ago | (#30792446)

...it will forever be stuck at the level of a retarded 8-year-old. Or the level of a normal 4chan user.

Re:Once this thing hits Encyclopedia Dramatica... (1)

game kid (805301) | more than 4 years ago | (#30792894)

But you repeat yourself.

Re:Once this thing hits Encyclopedia Dramatica... (1)

MooUK (905450) | more than 4 years ago | (#30793126)

Same thing.

Re:Once this thing hits Encyclopedia Dramatica... (1)

MrBandersnatch (544818) | more than 4 years ago | (#30793136)

You're giving 4chan users credit for a lot of maturity there....

Re:Once this thing hits Encyclopedia Dramatica... (0)

Anonymous Coward | more than 4 years ago | (#30795214)

But you repeat yourself

Obligatory (1)

Palpatine_li (1547707) | more than 4 years ago | (#30792512)

...should we start welcoming the Mailman (as in True Names)?

I think AI needs a 3d imagination to know English (2, Interesting)

CrazyJim1 (809850) | more than 4 years ago | (#30792522)

Once a computer understands 3D objects with English names, it can then have an imagination to know how these objects interact with each other. Of course, writing an imagination space that simulates real life is exceedingly difficult, and I don't see anyone even getting started on it for several years, if not a decade.

Re:I think AI needs a 3d imagination to know Engli (1)

Extremus (1043274) | more than 4 years ago | (#30793502)

Similar things have been done in the past. However, this kind of approach still is an active research topic.

Re:I think AI needs a 3d imagination to know Engli (2)

Extremus (1043274) | more than 4 years ago | (#30793600)

Sorry for replying to myself; I forgot to finish my comment. In fact, this problem is related to the Symbol Grounding Problem. It addresses the issue of "grounding" symbols (like words) into their sensory representation, e.g., the symbol "triangle" into the raw pixel representation of a triangle. In the case of symbols about visual objects, some researchers used intermediary 3D abstractions of sensory data, mapping the symbols to these intermediary representations. It has been a hot research topic since the '80s.

Re:I think AI needs a 3d imagination to know Engli (0)

Anonymous Coward | more than 4 years ago | (#30793826)

It addresses the issue of "grounding" symbols (like words) into their sensory representation, e.g., the symbol "triangle" into the raw pixel representation of a triangle.

You're not really justified in calling it "the ... representation of a triangle". It isn't unique. An upside-down triangle is still a triangle. A blue triangle is still a triangle.

This gets messy fast, since you're really mapping words into equivalence classes of representations. But then, they really aren't equivalence classes. In particular, they aren't disjoint. Is a blue triangle going to live in the equivalence class for blue? Or for triangles? It can't be in both, but it is.

Test it (1)

Jorl17 (1716772) | more than 4 years ago | (#30792524)

Show it only porn-like text. Let's see what it learns...

while (1) (2, Funny)

Lije Baley (88936) | more than 4 years ago | (#30792530)

Yeah, I've coded an infinite loop a few times, how come I never made the headlines on Slashdot?

Re:while (1) (1)

Velodra (1443121) | more than 4 years ago | (#30792796)

The point is not that the program never stops running, but that it never stops learning.

Re:while (1) (1)

Lije Baley (88936) | more than 4 years ago | (#30793092)

Sorry, but I really can't be bothered to read past the highlighted words in the first sentence of the summary.

Pruning (2, Interesting)

NonSequor (230139) | more than 4 years ago | (#30792540)

In general I find that the quality of a data set tends to be determined by the number (and quality) of man-hours that go into maintaining it. Every database accumulates spurious entries, and if they aren't removed the data loses its integrity.

I'm very skeptical of the idea that this thing is going to keep taking input forever and accumulate a usable data set, unless an army of student labor is press-ganged into pruning it.

Re:Pruning (1)

gbutler69 (910166) | more than 4 years ago | (#30793122)

Yes, this is what is wrong both with most people and with society in general. Most people have too many erroneous data points burned into their brains to be able to have anything approaching a useful thought.

Re:Pruning (1)

mhelander (1307061) | more than 4 years ago | (#30793632)

But it is potentially much easier for a computer to identify and address conflicting data points than for humans, who, for some reason, seem susceptible to blinding themselves to such issues (cognitive dissonance).

When you have three data points, one claiming George Washington was a human, another claiming George Washington had 50 arms, and a third claiming it is highly unusual for humans to have more than two arms (and more than ten would be unheard of), the computer could easily detect the logical conflict, flag the data points as inconsistent, and have a good idea of a topic about which to research more facts, potentially establishing sophisticated probabilities as to which claim is more likely to be bogus.

This example might not provoke cognitive dissonance in many humans; rather, it was intended as an easy-to-follow example of how a computer can improve its understanding of the world even in the face of disinformation, using logic and probability as guiding tools. Once that is easy to see, it follows how this also applies in situations where humans might be more susceptible to cognitive dissonance.
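
A minimal Python sketch of the kind of consistency check described above, with invented facts, predicates, and a single invented rule (nothing here is taken from NELL or Cyc):

# Facts stored as (subject, predicate, object) triples.
facts = [
    ("George Washington", "is_a", "human"),
    ("George Washington", "arm_count", 50),
    ("human", "max_plausible_arms", 2),
]

def find_conflicts(facts):
    # Flag entities whose arm_count exceeds the plausible maximum for their type.
    is_a = {s: o for s, p, o in facts if p == "is_a"}
    max_arms = {s: o for s, p, o in facts if p == "max_plausible_arms"}
    conflicts = []
    for subject, predicate, value in facts:
        if predicate != "arm_count":
            continue
        kind = is_a.get(subject)
        limit = max_arms.get(kind)
        if limit is not None and value > limit:
            conflicts.append((subject, value, kind, limit))
    return conflicts

for subject, value, kind, limit in find_conflicts(facts):
    print(f"Inconsistent: {subject} has arm_count={value}, but a {kind} "
          f"should have at most {limit} arms -> research this further")

Flagged conflicts would then become candidates for further research, exactly as the comment suggests.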

The web: What a great source of information (1)

mustafap (452510) | more than 4 years ago | (#30792560)

>Rather, its progress in categorizing complex word relationships is the object of the research.

From the web? Half the people here are writing English as a second language; the rest, haven't finished learning the language, or cannot be bother to string a sentence together. Just what is this program going to learn?

Re:The web: What a great source of information (1)

LifesABeach (234436) | more than 4 years ago | (#30792678)

My thought would be, "which web sites have continuous, valid information streams?" Given this, the program would more easily be able to classify those sites that are predominantly useful and those sites that rarely have useful information. Both groups of sites would be evaluated, but now a "priority list" could be created. Who knows, maybe a crackpot web site has an intriguing correlation with reality. It might even make for a good movie story line, maybe. But if that same web site has unusual accuracy at prediction, then maybe some commerce could be generated from it? One can only dream about what could have happened if Albert Einstein had been discovered earlier.

Re:The web: What a great source of information (0)

Anonymous Coward | more than 4 years ago | (#30792940)

"can't be bothered" rather than "can't be bother"

There should be a word for this type of error. You've illustrated your point through your own mistake.

Re:The web: What a great source of information (1)

mustafap (452510) | more than 4 years ago | (#30793242)

No, just wondering if anyone would notice, so well done.

Re:The web: What a great source of information (0)

Anonymous Coward | more than 4 years ago | (#30793692)

I think everyone noticed; the question was only whether anyone could be bothered to point it out. You did prove your own point, so well done yourself.

V*yger 2.0 ? (2, Interesting)

LifesABeach (234436) | more than 4 years ago | (#30792580)

The concept is intriguing: "Create a program that learns all there is to know, off the net." What amazes me is that others don't try the same thing. It doesn't take a team of A.I. types from Stanford to kick-start this program. The cost is a netbook; even Nigerian princes could afford this. I'm trying to figure out how economic competitors could take advantage of this. I can see how the U.S.P.T.O. could use this to help evaluate prior art and common usage. I'm thinking that an interface to a "Real World Simulator" would be the next step toward usefulness.

already been done (4, Informative)

phantomfive (622387) | more than 4 years ago | (#30792588)

There is simply no existing database to tell computers that "cups" are kinds of "dishware" and that "calculators" are types of "electronics." NELL could create a massive database like this, which would be extremely valuable to other AI researchers.

This is what they are trying to do, based on information they glean from the internet. It's already been done, with Cyc [wikipedia.org] . The major difference seems to be that Cyc was built by hand, and cost a lot more. It will be interesting to see if this experiment results in a higher or lower quality database.

Also, I question their assertion that it would be extremely valuable to other AI researchers. Cyc has been around for a while now, and nothing really exciting has come of it. I'm not sure why this would be any different.

Re:already been done (1)

Pennidren (1211474) | more than 4 years ago | (#30792736)

I'm not sure why this would be any different.

Yeah, the connections my brain developed over time in its own unique manner as I learned are exactly the same as a bunch of books put into bits. Not any different at all.

Re:already been done (4, Informative)

phantomfive (622387) | more than 4 years ago | (#30793018)

Oh this comment is beautiful for its confident ignorance.

What you have done is identified a difference between the two systems, and then claimed that this difference is in some way significant. You do this without knowing the implications of the difference, without entirely understanding the difference, and without presenting any evidence that this particular difference matters at all. In short, you think you understand what matters, but in reality you don't.

But fear not, you are in good company with your ignorance: this particularly pernicious fallacy is one that has plagued AI researchers for a long time. It happened with Cyc: the founders were sure that if we just had a database big enough, it would result in intelligent machines. They didn't know how, but they were sure it would.

Before them there were master systems, neural networks (long story), natural language translation, and many more that I'm sure I'm forgetting. In all of these cases researchers were certain that their system held the key to vast wonders, only because they had not spent much time thinking about what they were actually trying to accomplish. In most of these cases it would have been obvious that human-level intelligence wasn't going to result, if they had spent more time investigating how the brain works and less time chasing their pet solution.

In general if there is a vast field of ignorance between your method and your desired result, then you should probably spend more time researching, finding data points in that field of ignorance before trying to get to your result. Or in your case, since you present no evidence what difference 'developing on the internet' will make compared to 'developing by hand', you should go do a little searching and figure out what the actual difference will be, instead of randomly guessing.

But since you are lazy and probably didn't read the article, I will give you one hint: this database populated from the internet seems to have a strong bias towards information about companies and sports teams. Who would have guessed that?

Re:already been done (1)

Pennidren (1211474) | more than 4 years ago | (#30793220)

But since you are lazy and probably didn't read the article

Oh, do we know each other? I did read TFA. And I read your Cyc link, my brainy and learned superior!

My kids assimilate their own information base. I do not directly inject it into their heads. If you do not see the difference then there is nothing more I can say. Why should I vomit a wall of text in an attempt to deride and intimidate you?

It seems to me that what you said is directly contrary to what you have issue with:

...since you present no evidence what difference 'developing on the internet' will make compared to 'developing by hand', you should go do a little searching and figure out what the actual difference will be, instead of randomly guessing

(Re)searching to figure out what the actual difference *might* be is exactly what CMU is doing here.

I am humble enough to admit that this particular project may not amount to anything. My point was that the two projects are distinct (at least based on the claims of those involved).
I would posit that a machine that determines its own storage structure would be more successful. I would guess that CMU's extracted data is being jammed into something designed by the team.

Re:already been done (1)

phantomfive (622387) | more than 4 years ago | (#30794048)

My kids assimilate their own information base. I do not directly inject it into their heads.

You are right, they do assimilate their own information base. This is a very useful observation and data point, and any true strong AI will have to do so. However, it is not possible to infer that because your kids assimilate their own information base, anything assimilating its own information base is superior to anything that doesn't.

In this case, it still remains to be seen whether the automated information assimilation techniques this group is using (and let's face it: the information assimilation methods your kids use are far superior) are an improvement over the manual entry techniques used by Cyc. In either case, we end up with a roughly similar database of relations between objects. The main difference will be one is more complete than the other. Will this difference be enough to make the database more useful? Possibly: I don't see how, but it's possible.

If it does somehow spawn AI, it will be because someone discovers a new way to use the information in the database, not because of their improved method of data entry. We are still lacking a good way to use a database like this.

Re:already been done (1)

Pennidren (1211474) | more than 4 years ago | (#30794464)

Good point; information from an external source may actually be superior to information assimilated for oneself.
I am confused, though, because you say "any true strong AI will have to [assimilate an information base for itself]" but then say that does not necessarily imply superiority. You mean superiority of the entity, I would guess?
So are you implying that something beyond (better than) what we conceive of as AI might be gained from an external information base? Or just that even a strong AI may never be able to intuit some levels of knowledge (due to limitations), and an external information base could fill in such gaps by providing guaranteed assumptions? Either way, that makes sense. If you meant something else, I would be interested in hearing what that would be!

In terms of the entity being able to actually use/understand externally provided information productively, I would think that it would have to do some level of assimilation (even if at a level on top of the initial external information injection). Perhaps not, though. I suppose instead of growing a knowledge base around one's thought process one might be able to grow one's thought process around a knowledge base. I think we both agree that the latter seems less likely to succeed? Or at the very least I would think the latter's abilities would plateau since its process would be so dependent upon the initial knowledge base that it would probably assimilate new environmental data poorly.

Yes, I agree that the information compiled by hand or extracted is meaningless for AI. Fortunately, according to your link, Cyc is moving towards doing something with their data "via machine learning" (whatever specific methods they are using). I would imagine CMU will do much the same although I did not see mention of such in the TFA or its related publications.

I was snarky and terse with my first response. I may be regrettably mostly ignorant in terms of AI (although I am comp sci, I did not pursue the field for a number of reasons), but the subject is an active interest of mine.
My response was due in part to the very common negativity of this community, especially the complaint "it's been done already". Too much like "Simpsons did it". I suppose I just contributed my own negative manure to the compost heap today...!

Most ideas have merit, even if they are derivative or have large overlaps with others. In fact, I would say that I hold with Thom Yorke: "You make your little pond but if your pond isn't connected to the river, which isn't connected to an ocean, it's just going to dry up. It's just a little piss pool." Such an eloquent way to describe the Venn diagram of society.

Re:already been done (1)

ralphdaugherty (225648) | more than 4 years ago | (#30795154)

Well, I will add to the compost heap today. When I read the headline, I thought it might be a more fundamental learning of the use and relationships of words and what they describe than what TFA describes. "Colleges are in a university" is a "trusted relationship"? How very ignorant and disappointing, as every AI project I've ever read about is.

What would be impressive is to form associations: given a list of universities including Carnegie Mellon, or a statement that Carnegie Mellon is a university, and then other text stating that Carnegie Mellon consists of seven colleges, draw the association that a university is made up of colleges. A human review of such associations could then add a "trusted" attribute; or, as multiple instances are encountered, statements associating "university" via "made up of", "consists of", or similar phrases with colleges, students, faculty, and a host of other things would numerically become a self-scoring "probable relationship".

But hand-holding "trusted relationships" for the researchers' personal domain is pathetic.

  rd
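
As a rough sketch of the self-scoring "probable relationship" idea in the comment above, here is one way multiple phrasings could be counted as evidence for a single canonical relation. The phrase list, the sentences, and the score formula are invented for illustration and are not CMU's method:

from collections import defaultdict

# Several surface phrasings all map to one canonical relation.
RELATION_PHRASES = {
    " consists of ": "HAS-PART",
    " is made up of ": "HAS-PART",
    " is composed of ": "HAS-PART",
}

sentences = [
    "Carnegie Mellon consists of seven colleges.",
    "Carnegie Mellon is made up of seven colleges.",
    "Carnegie Mellon is composed of colleges and research institutes.",
]

evidence = defaultdict(int)
for sentence in sentences:
    text = sentence.lower().rstrip(".")
    for phrase, relation in RELATION_PHRASES.items():
        if phrase in text:
            subject, _, obj = text.partition(phrase)
            evidence[(subject.strip(), relation, obj.strip())] += 1

for (subject, relation, obj), n in sorted(evidence.items()):
    score = n / (n + 2)  # crude: more independent sightings -> higher score
    print(f"{subject} --{relation}--> {obj}  (seen {n}x, score ~{score:.2f})")

The point is only that repeated, independently phrased statements push a relation's score up, which is the numeric "probable relationship" the commenter describes.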

Re:already been done (1)

phantomfive (622387) | more than 4 years ago | (#30795992)

Build your own AI.

Re:already been done (1)

phantomfive (622387) | more than 4 years ago | (#30796056)

The thing about this project is, I think if you asked them they would say that they are not trying to create a human-like intelligence. Certainly they would not say that their data collection method is intelligent (it uses simple grammar parsing techniques, along with Bayesian filtering). It is essentially weak AI. They may have hopes that it will become strong AI, but no idea of how to take it to that point.

The biggest problem I see with Cyc, and this project, is that it is not yet known how the human brain stores information. Cyc spent millions of dollars compiling all the information in the world (figuratively speaking), and yet it isn't even clear if the data was stored in a way that an artificial brain can use. I think it is more important to try to understand how the human brain stores and assimilates information, before creating a database like this.

That is, of course, if they want to create strong AI. If all they want to do is create an advanced Wolfram Alpha, maybe this will be helpful.

Re:already been done (2, Informative)

blee37 (1181835) | more than 4 years ago | (#30792964)

Cyc is a controversial project in the AI community, and I'm glad that you brought it up. I don't think anyone yet knows how to use a database of commonsense facts, which is what Cyc is (though limited - the open source version only has a few hundred thousand facts) and which is one thing NELL could create. However, researchers continue to think about ways that an AI could use knowledge of the real world. There are numerous publications based on Cyc: http://www.opencyc.org/cyc/technology/pubs [opencyc.org] .

Re:already been done (0)

Anonymous Coward | more than 4 years ago | (#30795010)

As for the databases, you may want to check out dbpedia.org. An interesting, well-funded project about using and combining such data can be found at larkc.eu.

It's a hoax like Forum 2000 (0)

Anonymous Coward | more than 4 years ago | (#30792610)

It's just another CMU hoax like Forum 2000 [archive.org] . Read End of an Era: Forum 2000 Closes [slashdot.org] for details.

Greetings to Corey Kosak, Andrej Bauer and the Forum 2000 students for all the laughs.

The web may have been a poor choice (0)

Anonymous Coward | more than 4 years ago | (#30792648)

So far most of the words it's learned are related to various sex acts.

Re:The web may have been a poor choice (1)

LifesABeach (234436) | more than 4 years ago | (#30792692)

I cannot help but wonder what Fetish a computer would have, and what would be the name of it?

Re:The web may have been a poor choice (0)

Anonymous Coward | more than 4 years ago | (#30792770)

I'd imagine it'll get stuck on newegg.com, and manufacturer websites, for soft-core, at first. Eventually it will dig into whitepapers and A+ Cert prep tests for hardcore smut.

Re:The web may have been a poor choice (1)

clintp (5169) | more than 4 years ago | (#30795082)

And more importantly, whether Rule 34 applies to computer-targeted porn.

On December 11, 2012... (0)

Anonymous Coward | more than 4 years ago | (#30792746)

On December 11, 2012, NELL encounters MySpace.

On December 12, 2012 it becomes sentient but very emo, and destroys the world.

42? (1)

JWSmythe (446288) | more than 4 years ago | (#30792846)

How come every time I ask NELL what the answer to life is, all it responds with is "42"? When I ask what 42 means, it tells me that I'll need a bigger computer.

It all boils down to three words. (0)

Anonymous Coward | more than 4 years ago | (#30792938)

KILL. ALL. HUMANS.

Wikipedia (2, Funny)

the person standing (1134789) | more than 4 years ago | (#30793096)

Let it read Wikipedia - don't let it get poisoned by Twitter, etc.!

ODG (0)

Anonymous Coward | more than 4 years ago | (#30793310)

Oh dear god, this thing will be the ULTIMATE grammar Nazi!!!!

Wait a minute (1)

marqs (774373) | more than 4 years ago | (#30794358)

This is what I think happened.

Developers: We have a problem with the application. There seems to be an infinite loop that prevents it from finishing.
Marketing: So, that's the program's main feature, is it not?

If only they could train it without the web (1)

ClosedSource (238333) | more than 4 years ago | (#30794810)

Perhaps if there were a book in electronic form that had all the English words in it, perhaps with a definition of each word.

Re:If only they could train it without the web (1)

aXis100 (690904) | more than 4 years ago | (#30795824)

Good luck. Notice how words in a dictionary are described by..... other words!
