Using The Web For Linguistic Research

timothy posted more than 9 years ago | from the that's-rediculous dept.

The Internet 205

prostoalex writes "The Economist says linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research. Google is used, among other resources, to research how the written language evolves and how some non-standard examples of usage become more or less acceptable (The Economist quotes the phrase 'He far from succeeded,' where 'far from' is used as an adverb). LanguageLog is a resource linked in the article, where linguists discuss current peculiarities of the English language."

They should probably avoid Slashdot (4, Funny)

Peter Cooper (660482) | more than 9 years ago | (#11446691)

It's probably a good thing that they steer away from Slashdot as a corpus of English usage. Or, should I say, in SOVIET RUSSIA it's best Slashdot stays away from THEM! Or is it that only old people use the Internet as a corpus of the English language while pouring hot grits down a naked and petrified Natalie Portman's pants?

Re:They should probably avoid Slashdot (1)

Frogbert (589961) | more than 9 years ago | (#11446811)

That's true, but only in Japan.

Re:They should probably avoid Slashdot (2, Interesting)

mizhi (186984) | more than 9 years ago | (#11446831)

Hopefully, they'll harvest well written webpages for data and not those of 13-year old girls drooling over Orlando Bloom, AOL users, or porn sites.

Actually, I take that back.

It could actually be very interesting from a lexical or morphological point of view. Consider the phenomenon of abbreviating words, such as "u" for "you" or "ur" for "you're" or "ru" for "are you." Language teachers in classrooms have been seeing it crop up in actual homework assignments. While reading such language may be like having glass wiped across the eyes of people educated before computers came into widespread use, it's interesting how it's affecting younger people.

There's a collision between the high-tech world children grow up with today and the way language is taught in schools, similar to the gap between how students speak on the street and how they are expected to speak in the classroom or the professional world. Remember when it was proposed that Ebonics be considered a valid dialect for use in the classroom?

What would be even more interesting to study is how keyboards affect the structure of languages. It seems that people are under the assumption that languages are static and don't change, but this is incorrect.

Because the keyboard is still the main way of inputting information into the computer, people take shortcuts, and I would be surprised if that didn't start to affect their use of language in other contexts.

I'm just rambling, but such studies would be akin to socialogical studies that look at the influence of technology on social organization.

Re:They should probably avoid Slashdot (2, Interesting)

Joe Tie. (567096) | more than 9 years ago | (#11446888)

Because the keyboard is still the main way of inputting information into the computer, people take shortcuts

One thing that's always been at the front of my mind: why aren't these kids learning how to type? Or at least to type with any reasonable amount of skill. The only computer I had as a child was a Commodore 64, and I was still faster than most of today's youth even with their abbreviations. I was somewhat lucky in that our schools somehow foresaw the advent of the home computer and made sure we knew how to type, but I'd certainly hope that held even more true in today's schools!

Re:They should probably avoid Slashdot (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11446924)

One thing that's always been at the front of my mind, why aren't these kids learning how to type?

Because, unlike the parent's assumption, the phenomenon isn't related to computers. It's related to text messaging. It might be just as fast to type "you" instead of "u" with a keyboard, but it's noticeably slower on mobile phones, especially before predictive text became popular.

Furthermore, there is a limit on how many characters you can send in a single message. Most service providers automatically split long messages into multiple parts, but in the case where you are just scraping the limit, it might actually cost twice as much to send a text message that says "you are" instead of "u r".
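To put rough numbers on that, here is a minimal Python sketch. The limits are assumptions on my part, not something stated above: 160 characters for a single message and 153 per part once a message is split, which are the usual figures for the GSM 7-bit encoding. The function name `sms_parts` is made up for illustration.

```python
import math

# Assumed limits (common GSM 7-bit figures, not taken from this thread):
SINGLE_LIMIT = 160  # characters that fit in one unsplit message
MULTI_LIMIT = 153   # characters per part once a message is split

def sms_parts(text: str) -> int:
    """Return how many billable parts a message of this length would need."""
    length = len(text)
    if length <= SINGLE_LIMIT:
        return 1
    return math.ceil(length / MULTI_LIMIT)

# A message that is just scraping the limit: 156 characters of filler.
filler = "x" * 156
print(sms_parts(filler + " u r"))      # 160 characters -> 1 part
print(sms_parts(filler + " you are"))  # 164 characters -> 2 parts
```

Under those assumed limits, the four extra characters in "you are" are enough to double the cost of a message that was already at the edge.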

I'm not excusing it, I hate reading it myself, it makes people look illiterate and, sadly, in many cases people really aren't able to express themselves in normal English. I know people in their mid twenties who type "his" when they mean "he is", and, to use an example I received recently through email, "gess how i sore." when they meant "guess who I saw?". No, I didn't make it up, and no the person wasn't joking.

Re:They should probably avoid Slashdot (1)

mizhi (186984) | more than 9 years ago | (#11447162)

Because, unlike the parent's assumption, the phenomenon isn't related to computers. It's related to text messaging. It might be just as fast to type "you" instead of "u" with a keyboard, but it's noticeably slower on mobile phones, especially before predictive text became popular.


I'd still advance the argument that extensive use of computers by a larger portion of the population has contributed to the phenomenon. I remember seeing those abbreviations before cell-phone use became almost ubiquitous.

It's also not just little things like modifying the spelling of words, but also syntax and morphological changes.

Re:They should probably avoid Slashdot (1)

Moderatbastard (808662) | more than 9 years ago | (#11446927)

It seems that people are under the assumption that languages are static and don't change, but this is incorrect.

I for one am not under that misconception. However I disagree with many /.ers who argue that the fluidity of language means that making it up as you go along is acceptable. "It's" is not a possessive, and you can't loose your shoe if the laces are lose.

would be akin to socialogical studies

"sociological".

Re:They should probably avoid Slashdot (1)

mizhi (186984) | more than 9 years ago | (#11447143)

I for one am not under that misconception. However I disagree with many /.ers who argue that the fluidity of language means that making it up as you go along is acceptable. "It's" is not a possessive, and you can't loose your shoe if the laces are lose.


Right now, yes. But in a generation or two, perhaps they'll lose the distinction.

"sociological"


Whoops. :-)

Re:They should probably avoid Slashdot (1)

jez9999 (618189) | more than 9 years ago | (#11446945)

What do you think Natalie Portman would do if she actually viewed Slashdot some time?

Re:They should probably avoid Slashdot (1)

ggvaidya (747058) | more than 9 years ago | (#11447009)

Do you mean as an actress seeing a (rather weird) subset of fan, or as a psychiatrist [wikipedia.org] ?

Re:They should probably avoid Slashdot (0)

Anonymous Coward | more than 9 years ago | (#11446963)

She's naked with pants on?

Hey Timothy! (-1, Troll)

Anonymous Coward | more than 9 years ago | (#11446692)

You didn't answer my question. Do you think Michael is a wanker?

Btw, I hope the GNAA are all killed. Lets lynch them!

Indeed (4, Funny)

Pan T. Hose (707794) | more than 9 years ago | (#11446693)

Indeed what their sayin is true. U can learn English very well, especially grammer readin /. frist psots. Teh intarweb seems to certainly kick arse for that sorta research. Very 1337 articel. Thx d00dz.

Lameness (1)

Pan T. Hose (707794) | more than 9 years ago | (#11447123)

Indeed what their sayin is true. U can learn English very well, especially grammer readin /. frist psots. Teh intarweb seems to certainly kick arse for that sorta research. Very 1337 articel. Thx d00dz.

I have just read the above and I must admit it: I am teh lame, amn't I?

I rue the day... (3, Funny)

sandstorming (850026) | more than 9 years ago | (#11446694)

When we might actually say words like 'lol' out aloud. Imagine a deal going down between two mining companies and the CEO of one company with a straight face, and deadly serious demeanour saying to the cameras: "Despite many thinking we pwned them in the deal, we believe it came out leet for every1"

Re:I rue the day... (3, Interesting)

Peter Cooper (660482) | more than 9 years ago | (#11446701)

When we might actually say words like 'lol' out aloud.

I've heard it done. I've also heard 'roffle' (an attempt at pronouncing ROTFL I guess). Bizarre, really, since those terms are attempts to turn physical real-life actions into a verbal-only form.

Re:I rue the day... (1)

Rie Beam (632299) | more than 9 years ago | (#11446750)

I guess I'm sorta immune to that stuff, then - maybe I'm the only one, but when I see something like "lol" or "rofl", it translates in my head to more of an idea than an actual sound - in essence, a loud "heh" rather than its own word.

Re:I rue the day... (1)

Moderatbastard (808662) | more than 9 years ago | (#11446936)

I've heard it done. I've also heard 'roffle' (an attempt at pronouncing ROTFL I guess). Bizarre, really, since those terms are attempts to turn physical real-life actions into a verbal-only form.

Huh? It's an acronym - Rolling On The Floor Laughing. First letters of the words. Nothing to do with representing physical actions, just words.

Re:I rue the day... (0)

Anonymous Coward | more than 9 years ago | (#11446997)

Huh? It's an acronym - Rolling On The Floor Laughing. First letters of the words. Nothing to do with representing physical actions, just words.

"Rolling on the floor laughing" is a physical action. As is "laughing out loud". If you're talking to me face-to-face you shouldn't be saying "LOL", you should just be laughing.

Re:I rue the day... (1)

Moderatbastard (808662) | more than 9 years ago | (#11447082)

"Rolling on the floor laughing" is a physical action.
Yes it is, Mr State-the-Obvious, but ROTFL relates to the action itself how? Oh that's right, it doesn't. It relates to the arbitrary linguistic units commonly used to describe said actions - specifically, in modern English. If it related directly to the action it would be the same in other languages e.g. German, not "KADBL" if memory serves well.

If you're talking to me face-to-face you shouldn't be saying "LOL", you should just be laughing.
And I probably would be.

In contrast, this :-) directly represents a physical action. No word is used as an intermediary. A Chinese baby sees the same meaning as a Harvard professor. Understand the difference now?

Re:I rue the day... (1)

Peter Cooper (660482) | more than 9 years ago | (#11447214)

but ROTFL relates to the action itself how? Oh that's right, it doesn't. It relates to the arbitrary linguistic units commonly used to describe said actions - specifically, in modern English.

Good point. I'd disagree with your comment that it 'describes said actions', though. I'd say that Rolling On The Floor Laughing is more an idiom, since it's very rare anyone ever actually rolls on the floor, and isn't really describing that process ever happening. Anyway, I see the difference you're trying to pick out.

Re:I rue the day... (1)

dapyx (665882) | more than 9 years ago | (#11446740)

I heard 'lol' actually being said a few times. :-)

Re:I rue the day... (1)

UserGoogol (623581) | more than 9 years ago | (#11446967)

Ell-oh-ell or lawl?

Re:I rue the day... (1)

initsix (86050) | more than 9 years ago | (#11446764)

Yes that day is here, here is one (lame) example.
http://home.planet.nl/~cruij087/vin3.mp3

Re:I rue the day... (2, Informative)

JustKidding (591117) | more than 9 years ago | (#11446966)

You may be unaware that "lol" actually is a correct word in the Dutch language, meaning (having) fun.

lol (de ~) 1 [inf.] plezier
(taken from www.vandale.nl, an authoritative Dutch dictionary)

inner city teens (1)

Anonymous Coward | more than 9 years ago | (#11446699)

more than just web users are adjusting to this shift in language. i countinously question my co-workers (social workers) in telling the youth what is propper and not. if a launguage does not evolve then it dies. using words, moslty slang and rap song lyrics, is becoming more than just the normal, it is becoming the standard.

Re:inner city teens (3, Insightful)

Kafir (215091) | more than 9 years ago | (#11446857)

i countinously question my co-workers (social workers) in telling the youth what is propper and not.

I'm glad they're telling the youth what is proper; you're clearly incompetent to do so.

using words... is becoming more than just the normal, it is becoming the standard.

Is that right? Using words is "becoming more than just the normal"? I've been using words for years now; I'm glad to hear that's becoming the standard. Your post is a perfect example of why people should learn to write in something approaching standard English. Your meaning is barely intelligible, and you sound like an idiot.

Re:inner city teens (0)

Anonymous Coward | more than 9 years ago | (#11446974)

if a launguage does not evolve then it dies.

That makes no sense. If a language stops being used then it dies. It doesn't have to "evolve".

Oh, and whether you like it or not, there are loads of people who noticed your atrocious English (it sticks out like a sore thumb) and automatically decided you were an idiot before they'd even considered your point. I think it's irresponsible to let kids suffer the same fate just because you don't value the ability to express yourself clearly.

I couldn't even decipher what you meant by "using words" because your English skills are that bad!

Epiphany (2, Funny)

phaln (579585) | more than 9 years ago | (#11446705)

It came to me that the English language was in deep trouble when people started saying "rotfl" and "lol" in person. There seems to be kind of a backlash brewing though, with improved email composition styles dictated by employers, and such.

Google does it again (3, Interesting)

vladd_rom (809133) | more than 9 years ago | (#11446708)

This is not the first time that Google (and search engines in general) has changed how we do things.

Nowadays copyrighters use Google to search for potential violations of their intelectual property. Plagiarism is easy to detect nowadays thanks to Google as well. Instead of using rather expensive [turnitin.com] systems in order to search for duplicate work, teachers are now one search away in distinguishing original work from the rest.

Re:Google does it again (1)

BReflection (736785) | more than 9 years ago | (#11446781)

While Google made it easy to search for copyright violations, Google also made it a helluva' lot easier to violate copyright.

Re:Google does it again (0)

Anonymous Coward | more than 9 years ago | (#11446932)

Nowadays copyrighters use Google to search for potential violations of their intelectual property.

Forgive me if you were being ironic, but did you actually mean copyrighter? Or did you mean copy writer? I have never seen "copyrighter" used as a synonym for "copyright holder", and your sentence makes sense using "copy writer", but I get the feeling you actually meant "copyright holder".

Re:Google does it again (0)

Anonymous Coward | more than 9 years ago | (#11447144)

Do you mean copyrights, trademarks or patents? If you do then say so, don't hide behind the "IP".

Don't trademark that! (1)

perhj (68103) | more than 9 years ago | (#11447002)

Please refrain from trademarking your 'unique' spelling of intellectual. Thank you.

Hey (0)

Anonymous Coward | more than 9 years ago | (#11446717)

Does He far from succeeded, sound totally fuckin retarded to anybody else?? Like something an idiot would say to try and seem intelligent?

Re:Hey (1)

muzzmac (554127) | more than 9 years ago | (#11446729)

Something like pluralising virus to Virii?

Re:Hey (1)

l3v1 (787564) | more than 9 years ago | (#11446762)

"It does ? ... It does." [Partridge@Equilibrium]

So, yes, it does seem a fracking uselessly mistyped/misknown/miswrote/misthought way to express oneself. But that is changed now, because some people with too much time on their hands think it is a new form of expression and this is the way the English language is changing. So now we are supposed to treat these insentient ideas as the new ways ? Bahh, get lost.

Re:Hey (0)

Anonymous Coward | more than 9 years ago | (#11446792)

Yup! "He was far from success.", or possibly "He was far from succeeding", sounds much better to me.

Interesting (0)

Anonymous Coward | more than 9 years ago | (#11446719)

This begs the question of how much "incorrect" use of a phrase is necessary for it to become the "correct" use of a phrase.

NB: I'm being ironic.

Exactly (1)

sakahna (597647) | more than 9 years ago | (#11446959)

English isn't my first language, so I often use Google to verify the use of an expression by comparing the number of hits I get for various forms, or as a "spell-checker" by using Google's "Did you mean" suggestions to correct my spelling mistakes.
Lately, I find that some mistakes have become so "popular" that I can't do this anymore, because Google now recognizes the mistake as a "valid" search word.
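
For what it's worth, the hit-count comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: the counts are typed in by hand (no search API is called), the numbers below are made-up placeholders rather than real results, and the names `rank_variants` and `is_clear_winner` and the 0.9 threshold are arbitrary choices.

```python
def rank_variants(hit_counts):
    """Turn raw hit counts into (variant, share) pairs, most common first."""
    total = sum(hit_counts.values())
    if total == 0:
        return [(variant, 0.0) for variant in hit_counts]
    shares = [(variant, count / total) for variant, count in hit_counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)

def is_clear_winner(ranked, threshold=0.9):
    """True only if the top variant takes at least `threshold` of all hits."""
    return bool(ranked) and ranked[0][1] >= threshold

# Placeholder counts, entered by hand purely for illustration.
counts = {"spelling A": 550, "spelling B": 450}
ranked = rank_variants(counts)
print(ranked)                   # [('spelling A', 0.55), ('spelling B', 0.45)]
print(is_clear_winner(ranked))  # False: too close to call
```

The threshold is the point: once a mistake is popular enough, neither form dominates and the comparison stops telling you anything.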

Re:Exactly (0)

Anonymous Coward | more than 9 years ago | (#11447080)

I can't do this anymore

I'm not entirely certain *, but I believe the word "anymore" is particular to American English and is considered incorrect in other dialects. The two words "any more" are considered correct everywhere as far as I know.

* Well, I immediately recognised it as an error, but I checked and it appeared in an American English dictionary, which was surprising. I am English, by the way.

*BSD be dyin' (2, Funny)

Anonymous Coward | more than 9 years ago | (#11446728)

It be now official. Netcraft gots confirmed, dig dis: *BSD be dyin'

One mo'e cripplin' bombshell hit da damn already beleaguered *BSD community when IDC confirmed dat *BSD market share gots dropped yet again, now waaay down t'less dan some fracshun uh 1 puh'cent uh all servers. Comin' on de heels uh a recent Netcraft survey which plainly states dat *BSD gots lost mo'e market share, dis news serves t'reinfo'ce whut we've knode all along. What it is, Mama! *BSD is collapsin' in complete disarray, as fittin'ly 'esemplified by failin' wasted last [samag.com] in de recent Sys Admin comprehensive netwo'kin' test. Man!

You's duzn't need t'be de Amazing Kreskin [amazingkreskin.com] t'predict *BSD's future. De hand writin' be on de wall, dig dis: *BSD faces a bleak future. In fact dere won't be any future at all fo' *BSD a'cuz *BSD be dyin'. Doodads is lookin' real baaaad fo' *BSD. As many of us is already aware, *BSD continues t'lose market share. Red ink flows likes some riva' of blood.

FreeBSD be de most endangered uh dem all, havin' lost 93% uh its co'e developuh's. De sudden and unpleasant departures uh long time FreeBSD developuh's Jo'dan Hubbard and Mike Smid only serve t'undersco'e da damn point mo'e clearly. Slap mah fro! Dere kin no longa' be any doubt, dig dis: FreeBSD be dyin'.

Let's keep t'de facts and look at da damn numbers.

OpenBSD leada' Deo states dat dere are 7000 users uh OpenBSD. How many users uh NetBSD is dere? Let's see. De numba' of OpenBSD versus NetBSD posts on Usenet be roughly in ratio uh 5 to 1. Derefo'e dere is about 7000/5 = 1400 NetBSD users. BSD/OS posts on Usenet is about half uh de volume uh NetBSD posts. Derefo'e dere are about 700 users uh BSD/OS. A recent article put FreeBSD at about 80 puh'cent uh de *BSD market. Man! Derefo'e dere is (7000+1400+700)*4 = 36400 FreeBSD users. Dis be consistent wid de numba' of FreeBSD Usenet posts.

Due t'de troubles uh Walnut Creek, abysmal sales and so's on, FreeBSD went out uh business and wuz snatchn upside by BSDI who sell anoda' troubled OS. Now BSDI be also wasted, its co'pse turned ova' to yet anoda' charnel crib.

All majo' surveys show dat *BSD gots steadily declined in market share. *BSD be very sick and its long term survival prospects is very dim. WORD! If *BSD be to survive at all it gots'ta be among OS dilettante dabblers. *BSD continues t'decay. Slap mah fro! Nodin' sho't uh a miracle could save it at dis point in time. Fo' all practical purposes, *BSD be wasted.

Fact, dig dis: *BSD be dyin'

HAMMER REVOLUTION --; (1)

clubhouse (840238) | more than 9 years ago | (#11446736)

hammerrevolution.com [hammerrevolution.com] --;

Re:HAMMER REVOLUTION --; (1)

courseB (837633) | more than 9 years ago | (#11446848)

person 1: like person 2: like person 1: --; person 2: yea

Re:HAMMER REVOLUTION --; (1)

c0dedude (587568) | more than 9 years ago | (#11446948)

Oh christ, not you whackjobs again. You've infested our forum [dailyjolt.com] . Sure, --; is a neat emoticon, but when could one ever use it? On a separate note, anyone cringe when reading "He far from succeeded."? On a completely separate note, anyone notice how programmers write with slightly different grammar? Extra punctuation always goes outside the ", never inside, as above.

Re:HAMMER REVOLUTION --; (1)

Hognoxious (631665) | more than 9 years ago | (#11447111)

On a completely separate note, anyone notice how programmers write with slightly different grammar? Extra punctuation always goes outside the ", never inside, as above.

Are you referring to "logical quoting"? [retrologic.com] .

Re:HAMMER REVOLUTION --; (1)

Daengbo (523424) | more than 9 years ago | (#11447245)

This is a major difference between American English and British English. American English tends to suggest usage always inside quotes, but Brits will put the quote inside if it's related, and outside if it's not. The subject is always hotly debated, though.
I never look to "The Programmers' English Corpus" for style points...

Be carefull thought... (3, Interesting)

Anonymous Coward | more than 9 years ago | (#11446742)

There are more non native speakers on the web then
native speakers.
In the European community the native English
speaking persons are by far a minority. That way
French expressions are poring into the language
in an unstoppable way. Those expressions are then
used by native speaking politicians and are
broadcasted by television. That way they enter the
mainstream of the English language.

Regards

Re:Be carefull thought... (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11446770)

There are more non native speakers on the web then
native speakers.

Of course, non-native speakers have generally less trouble distinguishing "then" from "than" than the so-called "native" speakers do. You might speak it natively, but remember, you don't write it natively.

Re:Be carefull thought... (1, Funny)

Anonymous Coward | more than 9 years ago | (#11446885)

Ahhh run for the hills the French are coming!!!

Re:Be carefull thought... (1, Funny)

Anonymous Coward | more than 9 years ago | (#11446943)

Nobody runs away from the French. Not even the Eyeties.

Re:Be carefull thought... (1)

Spy Hunter (317220) | more than 9 years ago | (#11447012)

Who needs to be careful? Hopefully the Internet *will* cause languages to merge. It could be like the Tower of Babel in reverse. Wouldn't it be great if there was a unified global language?

Now I know some people would be quite upset at the horrible "loss" of cultural diversity implied by a single global language. But we can be just as diverse in many other ways that don't cause us to be unable to communicate with each other on a basic level. And IMHO, being able to communicate is much more important than some academic's ideal of "cultural identity".

Re:Be carefull thought... (1)

Haeleth (414428) | more than 9 years ago | (#11447070)

Wouldn't it be great if there was a unified global language?
Now I know some people would be quite upset at the horrible "loss" of cultural diversity implied by a single global language. But we can be just as diverse in many other ways that don't cause us to be unable to communicate with each other on a basic level. And IMHO, being able to communicate is much more important than some academic's ideal of "cultural identity".


Okay... how about the complete loss of the ability to read any of the world's literature without special training? It's bad enough at the moment, when most people can read only the literature in their native language. If the current languages were no longer spoken natively by anyone, the vast majority of people would no longer know any great literature except through the lossy process of translation. We're not talking about losing cultural diversity. We're talking about losing culture itself!

Not to mention that there is nothing academic about the link between cultural identity and language. Bloody wars have been fought over it. The recognition of a minority's language is often one of their deepest desires, and the suppression of a minority language is a common tool of oppression - see Welsh and Gaelic in English-occupied Wales and Ireland, Catalan and Basque in Spain, Chinese and Korean under the Japanese occupations, Kurdish in Turkey... the list goes on. If linguistic diversity is something that only academics care about, why do ordinary people all over the world get so upset about it?

Finally, what's so great about a world language, anyway? I don't suffer at all in my daily life from the inability to chat with Chinese or Spaniards; I did feel the need to be able to communicate with the French and the Japanese, so I did them the simple courtesy of learning their languages. Those who need to communicate in more languages than they can learn are generally politicians or businessmen who can afford interpreters.

Re:Be carefull thought... (2, Insightful)

Spy Hunter (317220) | more than 9 years ago | (#11447208)

You're overdramatizing. This is a process that will take hundreds if not thousands of years, even with technology helping to accelerate it. It's not like we'll wake up 10 years from now with a unified language and forget how to read today's literature!

By the time we have a unified language, we'll have a whole new set of literature to go along with it. Today's literature will be like ancient Greek literature, and yes, it will only be readable by people with special training. It will need to be translated, just like ancient Greek is today. What's the big deal? The biggest difference is that only one translation would be needed, and therefore all the translation work could be focused on that.

Furthermore, nobody will be forced to adopt a unified language. It will simply evolve. Words will travel from one language to another. Phrases will creep in from other languages. Languages will become closer, and eventually merge. You can see it happening today; at least the beginnings. It will only continue even faster, as the Internet is here to stay and the growth of the global marketplace shows no signs of slowing.

Academics care about linguistic diversity in an abstract sense, but normal people really don't. People care about it, but in a much more practical sense of everyday communication. People will accept gradual, evolutionary changes to their language, as long as they can express themselves in a way they like. Academics often fight against change, because their theories were all developed to explain the old ways of doing things. They will fight against language unification; luckily I believe they will not be able to prevent it, or even slow it very much. [Note: this is a gross generalization about "academics", please remember that all generalizations are false.]

You ask what's so great about a global language? The removal of all language barriers from everything! Duh!

Maybe you don't personally notice any language barriers right now, but that doesn't mean you couldn't benefit from their removal. Maybe there are some really cool people in China right now doing brilliant work in your field that you just don't know about because it's all in Chinese. Maybe you would benefit from the increased efficiency of a global economy without language barriers. I think it's an indisputable fact that removing language barriers is a great thing.

Re:Be carefull thought... (1)

violajack (749427) | more than 9 years ago | (#11447233)

It's a nice idea and all, but I just don't think it would really happen. We don't learn our language from the internet, it just influences slang among the young-uns. Language is learned in infancy by listening to adults, so the only real way to get a global language is to change the way all of the adults talk to their babies and then wait for the babies to grow up.

Even if we could get that to happen, it wouldn't be long before dialects cropped up and veered away from one another. I'll never forget the time I was in Italy (as a scared little American trying to live there for a month as part of a festival) and I needed some help figuring out the bus system in Pisa. Imagine my non-Italian-speaking relief when I heard a group chatting away in English. I was going to ask them for help, when suddenly, it all turned into unintelligible babble. I did an aural double-take. I listened carefully to see if my ears had been playing tricks on me, and I started picking familiar words out of a VERY British accent. It took a lot of thought to understand what they were saying, and we were supposedly speaking the same language. All it takes is an ocean and a few generations, and all that same-language-speaking-goodness goes out the window.

Done: nous sommes desolés que notre president (3, Insightful)

new500 (128819) | more than 9 years ago | (#11447059)

. . .

Those expressions are then
used by native speaking politicians and are
broadcasted by television.


Dude, it's worse, the French have already infiltrated as far as the advertising business and are using covert channels to spread some dangerous crack i heard was called La Liberte :

http://french.about.com/b/a/081281.htm

Slightly more seriously :

Apart from pointing out that your use of the word native is rather presumptive of geographic origin in this big wide internet thing, I wonder if this linguistic adoption is more one way towards English since the internet. OK the French got Le Weekend, and tons of anglicised nouns, tried to ban them all and didn't manage. But I read on Friday that a British pilot training firm lost a contract to a French one. The reason cited by the Asian airline was that, whilst the training had to be in English, the French trainers spoke better, clearer, more intelligible English than did the English. I can't argue with that. Sadly.

I've used the web for corpus linguistics research (2, Informative)

Anonymous Coward | more than 9 years ago | (#11446744)

I've used the web for corpus linguistics research. My last big project was to look at a lot of web pages with Mexican and Chilean slang Spanish, and see if there was a difference in vocabulary usage. There was a significant difference; I could, 70% of the time, tell if a given passage was Chilean or Mexican Spanish.

I could have gotten a higher accuracy rate, but this was just a simple undergraduate project.
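
The comment above doesn't say how the classification was done, so the following is only a guess at the simplest possible approach: count dialect-specific slang words and pick whichever side has more. The word lists are tiny hand-picked examples and the function name `guess_dialect` is hypothetical; a real project would learn the vocabulary from labelled pages.

```python
import re

# Tiny illustrative marker lists (assumptions, not the commenter's data).
CHILEAN_MARKERS = {"cachai", "fome", "pololo"}
MEXICAN_MARKERS = {"güey", "chido", "órale"}

def guess_dialect(passage: str) -> str:
    """Guess 'chilean', 'mexican' or 'unknown' from slang marker counts."""
    words = re.findall(r"\w+", passage.lower())
    chilean_hits = sum(word in CHILEAN_MARKERS for word in words)
    mexican_hits = sum(word in MEXICAN_MARKERS for word in words)
    if chilean_hits == mexican_hits:
        return "unknown"  # no markers found, or an even split
    return "chilean" if chilean_hits > mexican_hits else "mexican"

print(guess_dialect("¿Cachai? La película estuvo fome."))  # chilean
print(guess_dialect("Órale, qué chido está tu carro."))    # mexican
print(guess_dialect("Hola, ¿cómo estás?"))                 # unknown
```

A passage with no slang at all comes back as "unknown", which is one plausible reason a vocabulary-only approach would top out well below 100%.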

Same here (1)

Estanislao Martínez (203477) | more than 9 years ago | (#11446787)

Though I've done it at a higher level of the educational system (while doing a Ph.D. in Linguistics). The big, big advantage of using search engines is the sheer size and variety of the content available on the web. For a number of things, there is simply no other way to get enough examples, because the phenomenon you're interested in is just too rare. The downsides are repetitiveness (it's often the case that you get the same document a lot of times at many different URLs; for example, song lyrics), typos, unreliable language-detection algorithms in search engines (search for weird stuff in Spanish in Google, and you'll often get back some Portuguese results), unreliable numbers, etc.
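
The repetitiveness problem, at least, can be partly handled after the fact. Below is a minimal sketch, assuming you have already downloaded pages as (URL, text) pairs: it keeps the first copy of each text and drops exact repeats, ignoring case and whitespace differences. The function name `dedupe_pages` and the example.com URLs are placeholders, and near-duplicates (the same lyrics with a different typo) would still slip through.

```python
import hashlib

def dedupe_pages(pages):
    """Keep the first (url, text) pair for each distinct normalized text."""
    seen = set()
    unique = []
    for url, text in pages:
        # Normalize case and whitespace so trivial differences don't matter.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique

pages = [
    ("http://example.com/lyrics", "La  la LA"),
    ("http://example.org/copy", "la la la"),
    ("http://example.net/other", "something else"),
]
print(len(dedupe_pages(pages)))  # 2
```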

Re:Same here (0)

Anonymous Coward | more than 9 years ago | (#11446853)

The downsides are...

Quite. And it's a hell of a lot worse for English because of the wider adoption as a second language.

What about bad translations into English of corporate copy originally written in another language, Babelfish caches, common or garden typos, etc, etc?

Linguists usually, and quite rightly, worry about prescriptivism vs descriptivism - becoming the story rather than just reporting it - but in this case they're potentially exercising a disproportionate influence on the development of the language by drawing attention to phenomena derived from a skewed set of sources.

Timeo empiricos, et data ferentes...

Re:Same here (0)

Anonymous Coward | more than 9 years ago | (#11446977)

The thing about English is that, since it is such a widespread second language for international communication (Esperanto should have won, but because of the laziness of people 100 years ago, it didn't), there is more standardization of English than there is of Spanish.

Even the cuss words are mostly the same across dialects of English, and it's the cuss words that change the most quickly when dialects change.

If studying linguistic variation, Spanish is far richer than English. English is mainly interesting when looking at errors that L2 learners of English make.

Re:Same here (1)

Hognoxious (631665) | more than 9 years ago | (#11447036)

Esperanto should have won,
It's a joke. Latin wi' t' grammar took out.
but because of the laziness of people 100 years ago, it didn't
How I envy them, working a 12 hour day down the mine and getting rickets or diphtheria. Aye, they had it easy in them days. Luxury! [monologues.co.uk].

print "$badgram{vocab}" (2, Funny)

pinball667 (848816) | more than 9 years ago | (#11446749)

Without RTFA my fist instint is to say why post anything related to natural language on slashdot? But the truth is, as a sysadming/webmaster/anything that plugs into an outlet for a small credit union I am appalled at the way people want to write on the web. It's hard to describe, but see (for the moment) this [usalliancecu.org] for a crippled example (yeah, a work site published externally, FSCK'ing horrible - more where that came from). Anyhow, it seems the second people publish shit one the web they give up on grammer/puncuation etc - in the included link originally draft had every link capitolized. No bold, color or anything - fuck it, aparently it's OK to throw proper grammer to the wind if it's on the web, even if the purpose is to manage peoples retirement. ARGH.

side note - my bad grammer/spelling is OK only because I'm a FUCKING CODER. I don't want to hear from the grammer/spelling Nazis on the text of this post.

anyhow - slight possibility of feedback on a complelty offsubject page I'm working on, here [usalliancecu.org] . Break it, fuck with it whatever. Jon.

Compression Prize (1)

Baldrson (78598) | more than 9 years ago | (#11446757)

There needs to be an annual prize for the highest compression ratio using random pages from the web as the corpus. This would probably do more for real advancement of artificial intelligence than the Turing competitions.
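For anyone wondering what would actually be scored: a minimal sketch of measuring a compression ratio, assuming the metric is simply original size divided by compressed size. zlib stands in here for whatever compressor a contestant would enter, and the sample text is a made-up placeholder rather than a real web page.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Original size in bytes divided by compressed size in bytes."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)  # 9 = maximum compression level
    return len(raw) / len(packed)

# Placeholder corpus: highly repetitive, so it compresses very well.
sample = "the cat sat on the mat " * 200
print(round(compression_ratio(sample), 1))
```

The better a compressor models the language in the corpus, the higher this number gets, which is the link to AI the prize idea relies on.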

citing web articles (-1)

Anonymous Coward | more than 9 years ago | (#11446759)

It seems like PDF/PS/LaTeX/.doc files should be the only documents that get cited. With HTML, how would you give the page number in a citation? Whatever the printer says the page was? www.wired.com/blahblahyadayada (go there and read TFA). I cited stuff from the web when I was in college, but it was always half-assed (i grajeated thouw). mla.org has examples of web citations but they still suck.

Non-official English (2, Informative)

Anonymous Coward | more than 9 years ago | (#11446776)

Unlike French and Italian, there is no official institution that defines 'correct' English. Essentially, the English-speaking world just 'makes it up' as it goes. Thus when I see the adverb 'really' butchered into 'real' I must try not to get annoyed, e.g. "It's real hard to use your mother tongue." vs. "It's really hard to use your mother tongue." Please help me here - is the misuse/non-use of 'really' something that's taught in school?

Re:Non-official English (2, Insightful)

Kafir (215091) | more than 9 years ago | (#11446920)

From Merriam-Webster Online [m-w.com] :
real (3, adverb): VERY (he was real cool -- H. M. McLuhan)
usage Most handbooks consider the adverb real to be informal and more suitable to speech than writing. Our evidence shows these observations to be true in the main, but real is becoming more common in writing of an informal, conversational style. It is used as an intensifier only and is not interchangeable with really except in that use.

I'd say you're fighting a losing battle on this one. I'm not too bothered by it, either; the English language has other words that function both as adjectives and as adverbs, despite the existence of a distinct adverb form - near dead and nearly dead are both standard, for instance.

Re:Non-official English (0)

Anonymous Coward | more than 9 years ago | (#11447084)

the English language has other words that function both as adjectives and as adverbs, despite the existence of a distinct adverb form - near dead and nearly dead are both standard, for instance.

That's a special case: "near" was originally the comparative form of "nigh", i.e. "nigh-er", with "nearly" being a back-formation; you'd expect a word like that to have a few idiosyncrasies.

Re:Non-official English (0)

Anonymous Coward | more than 9 years ago | (#11447264)

real-really-reallest, didn't you learn that in school?
/funny

Language Lives! (1)

theguywhosaid (751709) | more than 9 years ago | (#11446777)

Good! Natural language is a moving target. The web is an excellent communication medium and ignoring it would be quite a silly move. The example reminds me of "To boldly go", which was not proper, but its elegance is hard to argue against.

Re:Language Lives! (1)

wildBoar (181352) | more than 9 years ago | (#11446859)

In fact there are some arguments about the To Boldly go etc.

Apparently written English grammar varies so much from how it is often spoken because the rules were written down by a Scholar in Latin who firmly believed that English should conform to the same rules - even though it doesn't.

A careful poke at this 17th century book ( thereabouts - which sets the standard for modern grammar ) means that even Shakespeare wrote bad grammar, and he isn't the only one.

So in fact correct grammar isn't so correct at all and should be taken with a pinch of salt.

Re:Language Lives! (1)

Hognoxious (631665) | more than 9 years ago | (#11447015)

the rules were written down by a Scholar in latin who firmly believed that English should conform to the same rules
My understanding is that the banning of split infinitives was never a hard and fast rule, even among good writers; Orwell certainly dissented. Infinitives in Latin (and French, German, Italian and Spanish[1]) can't be split anyway, as they are one word.
So in fact correct grammar isn't so correct at all and should be taken with a pinch of salt.
There's a middle ground. How many moles of NaCl do you need for s to be always preceded by an apostrophe?

[1] Well, German has separable verbs and Italian has reflexive pronouns that detach[2] and go all over the place, but nothing like the bizarre "to" that English has. I've always been intrigued about its origin.

[2] Ignore that - they're attached in the infinitive: anrufen, lavarsi.

roflcopter (1)

Kentsusai (837912) | more than 9 years ago | (#11446803)

roflcopter....

'Language' == spoken || written? (2, Insightful)

adam31 (817930) | more than 9 years ago | (#11446804)

How do you even pronounce 'pwn3d' ? Google is not a tool to study speech patterns, and there's nothing to say that speech even resembles written text.

The article addresses this in a weird way, where it first draws attention to the distinction, but once it reaches its crux, where google is used as a tool, the distinction is ignored entirely; instead it opts to focus on stranger things.

Re:'Language' == spoken || written? (1)

Twisted64 (837490) | more than 9 years ago | (#11446866)

I'm going to go with "pawned," a la trading something in for money. "owned" seems better, but it doesn't get across the spelling of the word as well as "pawned" :)

Re:'Language' == spoken || written? (1)

jez9999 (618189) | more than 9 years ago | (#11446964)

'Pooned' is another variant.

Re:'Language' == spoken || written? (0)

Anonymous Coward | more than 9 years ago | (#11447148)

I've heard people try to pronounce it as powned... That is, 'owned' with a 'p' in front of it. Sounds odd :P

Re:'Language' == spoken || written? (1)

woah (781250) | more than 9 years ago | (#11447184)

No, it's more like "pnud".

OMG! (1)

Frogbert (589961) | more than 9 years ago | (#11446805)

I woulda thght such a thng was unpossible.

Re:OMG! (0)

Anonymous Coward | more than 9 years ago | (#11446836)

You misunderestimate the power of teh intar-web.

internet messaging data (0)

Anonymous Coward | more than 9 years ago | (#11446840)

Scouring the net for written material, prose or otherwise, and studying, analyzing, tabulating it is a cool and grand idea. Lots to be learned I'm sure.

However ... What about researching and analyzing vernacular data that is not publicly available on google, news sites, public message boards, usenet, etc? What similarities and differences can be found in what is considered to be personal or private communication?

I'm almost sure someone has thought of this before but the obvious problem is: how is one able to collect ample data categorized as private or personal communications? After all, it isn't possible to just google or grep ICQ or AIM logs from thousands of people...or is it?

Popular usage != wanted usage (2, Informative)

KiloByte (825081) | more than 9 years ago | (#11446892)

Yes, we can record the errors made by the uneducated public (and even those done by, uhm, me). The question is: should we do that or not?

I was pretty taken aback when a council of linguists in Poland suddenly declared some widely-chastised and not even very popular errors to be valid usage. I've been brought up in the circles of people who not only put a lot of stress on the language you use, but also cruelly point out every incorrect word or phrase you use -- and this made me quite intolerant of bad speech.

Being but a dirty foreigner, I know that my English can sound bad in the ears of native English speakers -- that's why I sometimes ask people to correct me if they spot errors.

In other words: some people find careless speech repulsive. Thus, we should do whatever we can to promote correct usage as opposed to legalising incorrect uses.

Re:Popular usage != wanted usage (0)

Anonymous Coward | more than 9 years ago | (#11446956)

some people find careless speech repulsive. Thus, we should do whatever we can to promote correct usage as opposed to legalising incorrect uses.

I don't think that the fact some people find it repulsive matters much. That makes it sound like people who can't express themselves well are just annoying some hard-liners.

The fact is, when you can't use your native language properly, people (including many people you wouldn't consider "hard-liners") judge you for it. Consciously or subconsciously, you appear less educated and less intelligent to them.

If we change the rules, all this means is that some people are going to think that it's okay to be lax with language. It's not going to stop people thinking "what a moron" when somebody emails them with something like "ru up4 the meeting tomorrow?!?!?"

using google as a spell checker (1)

tinkerton (199273) | more than 9 years ago | (#11446893)

when you doubt between two spellings of a word, check the search results count in google. I've used that trick.

Then again, my idea of fun is to use the Google count to find the words that get misspelt most often (a Google ratio of 5% for "misspelled").

I thought compatable was common, but I only get a 1% ratio there. Maybe there should be a category 'non native'.

Is conneXion considered an error? I like it much better than connection.

Just now I find out that there are lists, e.g. at most commonly misspelled words [world-english.org].

Re:using google as a spell checker (0)

Anonymous Coward | more than 9 years ago | (#11446965)

when you doubt between two spellings of a word, check the search results count in google. I've used that trick.

Why? If it's an incorrect spelling, it tells you and suggests the correct spelling. If it's a correct spelling, it's usually got a link to the dictionary definition in the top right.

Re:using google as a spell checker (1)

Kafir (215091) | more than 9 years ago | (#11446975)

Is conneXion considered an error? I like it much better than connection.

It's correct, but British. Just like colour/color, or theatre/theater. Or foetus/fetus, though that doesn't seem to come up so often.

connexion
Pronunciation: k&-'nek-sh&n
chiefly British variant of CONNECTION

Did it never occur to you to check an actual online dictionary [dictionary.com] ? I use google to see if my usage of a word or phrase is acceptable (or at least common), but a dictionary is probably a better bet for spelling.

Re:using google as a spell checker (1)

tinkerton (199273) | more than 9 years ago | (#11447064)

Did it never occur to you to check an actual online dictionary?

To be perfectly honest, yes. But I don't want people to think I'm a sissy.

Three types of language (3, Interesting)

Dracos (107777) | more than 9 years ago | (#11446894)

I think that for most of the 20th century, English, and most languages in the industrialized world, was largely static, dominated by the written word which was dominated by proper grammar. Since WWII, popular culture and faster communications have increasingly exposed us to local vernaculars, mostly through radio and television. The written word lagged behind in its cultural evolution.

Thanks to the internet (initially email, BBS's and IRC, but more widely known on the Web), we now have a hybrid of the spoken and written word: the "typed word". This form of language evolves at the same rate as the spoken word, and injects its own vernacular as a side effect of the medium: acronym and abbreviation "words" (rofl, how r u), along with common misspellings (pwned), and mixing letters with numbers or punctuation (133t, n00b). All of these serve at least one purpose, whether as a form of super shorthand, insult, the appearance of being "cool", or are merely the result of laziness on the part of the author. Most typed-word terms don't transfer well when spoken.

One of my hobbies is studying (European) languages and how they are related. Sometimes I worry about the damage the typed word is causing to the spoken and written word (and any proper linguist should at least be interested in the phenomenon). Luckily, most typed word expressions aren't pronounceable, and the ones that are sound absurd, because they are removed from their original context when spoken, and everyone recognizes gibberish when they hear it. How the typed word affects the written word remains to be seen. Yes both are typed now, but only the written word has a chance of going through an editorial process. I think it will take a very long time for the formal lexicon and rules of grammar to embrace, however reluctantly if ever, the typed vernacular.

Re:Three types of language (1)

grahamlee (522375) | more than 9 years ago | (#11447031)

I think that for most of the 20th century, English, and most languages in the industrialized world, was largely static, dominated by the written word which was dominated by proper grammar. Since WWII, popular culture and faster communications have increasingly exposed us to local vernaculars, mostly through radio and television. The written word lagged behind in its cultural evolution.

You do realise that most of the 20th century happened after the second world war, don't you? A condition that became false after the events of 1945 cannot be considered true for most of the period 1901-2000.

Google as a grammar checker (2, Interesting)

Hal XP (807364) | more than 9 years ago | (#11446919)

I've had the chance to use Google as a grammar or style checker in my day job as a glorified copy editor. I type two nearly identical expressions X and Y in the search box. If expression X gets 10,100 hits and expression Y only 500 hits, I use expression X.

For example, as a non-native speaker, I found myself waffling between the expression (A) "run for mayor of" and the expression (B) "run as mayor of." Letting Google arbitrate, I found 14,900 hits for (A) and only 200 hits for (B). I chose (A).

I discovered there's practically a dead heat between the expressions "a new lease on life" (which, if I'm not mistaken, is the expression favored by American usage) and "a new lease of life," with the latter nosing out the former 144,000 hits to 140,000. In this instance I let my own usage arbitrate. Since I'm more exposed to American than to English, I chose on.
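The decision rule described above can be sketched in a few lines. This is only an illustration: the function name and the 10% "dead heat" threshold are assumptions, and the hit counts are typed in by hand from the figures quoted in this post rather than fetched from any search API.

```python
def pick_by_hits(hits_a, hits_b, tie_ratio=1.1):
    """Return 'a' or 'b' for the more common phrase, or 'tie' when the
    counts are within tie_ratio of each other (assumed threshold)."""
    hi, lo = max(hits_a, hits_b), min(hits_a, hits_b)
    if hi == lo or (lo > 0 and hi / lo < tie_ratio):
        return "tie"
    return "a" if hits_a > hits_b else "b"

# Counts quoted in the post above.
print(pick_by_hits(14_900, 200))       # "run for mayor of" vs "run as mayor of"
print(pick_by_hits(140_000, 144_000))  # "lease on life" vs "lease of life"
```

A "tie" is the signal to fall back on your own usage, as with "lease on life". Raw hit counts are noisy, so a wide tie band is probably the safer choice.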

Tongue Gymnastics (1)

Indy Media Watch (823624) | more than 9 years ago | (#11446926)

Linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research.

I love a bit of cunning linguistics.

Reminds me of "Meme Tree"... (3, Informative)

Slur (61510) | more than 9 years ago | (#11446935)

...which was this little program I wrote around the nascence of the internet. It took any sentence as input and kept a record of which words preceded each word, and which words followed each unique word. The idea was to build up a simple map of which words could precede or follow others completely without context. From this you could follow paths that made sentences or paths that looped forever, or paths that made no sense, and some interesting paths that made unintended sense.
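A rough reconstruction of that first version, for the curious. This is a sketch under assumptions, not the original program: the names are invented, words are split on whitespace, and the path walker always takes the alphabetically first follower so the result is repeatable.

```python
from collections import defaultdict

def build_word_map(sentences):
    """For each unique word, record which words precede it and which follow it."""
    preceders = defaultdict(set)
    followers = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev].add(nxt)
            preceders[nxt].add(prev)
    return preceders, followers

def follow_path(followers, start, max_steps=10):
    """Walk the map from `start`, taking the alphabetically first follower
    at each step; stops at a dead end or after max_steps."""
    path = [start]
    word = start
    for _ in range(max_steps):
        options = sorted(followers.get(word, ()))
        if not options:
            break
        word = options[0]
        path.append(word)
    return path

sentences = ["the cat sat on the mat", "the dog sat on the cat"]
preceders, followers = build_word_map(sentences)
print(follow_path(followers, "the", max_steps=5))
```

Starting from "the", this walk cycles through "the cat sat on the cat ...", one of the paths that would loop forever without the step limit. Picking a random follower instead of the first one gives the more surprising sentences.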

Why a tree? Language and genealogy seem to have a common thread. Meaning is like genetics. Language is expressive. Information is a kind of tree whose branches grow as reality elaborates and past events accumulate. New terms need to be invented for the dynamics we perceive in reality, just as new names are given to individuals as they emerge into the world. Patterns, continuity, periodicity. Such things lie at the heart of material existence and provide the hooks for consciousness itself. Information theory is the next great frontier, along with particle physics. Already they have converged and diverged and converged again. And playing with artificial trees turns out to be a lot of fun.

As for the "Meme Tree" program ... The next iteration built up a more discreet map by scoring proximity of unique words in sentences and inclusion in sentences together. Again, the idea was to build a simple statistical map free of any context, simply to get a sense of pure lexical association.

The theory is that the internal consistency of these various lexical maps should roughly reflect many aspects of associative meaning. You could think of the statistical map as a Godelian bubble whose "truth" - if you will - is imposed by the laws governing the statistical associations. We don't derive the laws of language and meaning from these exercises, but we create an internally-complete map that reflects something about the nature of meaning.

There is a practical aim as well. If you can derive the strength of equivalence and the various levels and colors of associative meaning you could in theory build a "Truth Machine" capable of answering any question with a high degree of accuracy. The result of any question could be computed as any other information retrieval problem would be.

I never got around to having my little Meme Tree programs scrape the internet for random sentences. However, this should be a very simple thing to do. Google has had programming contests in the past - programs that use the Google database in interesting ways. Statistical analysis of language is basically what they do. Research projects on their data could provide stunning insights into the nature of information itself, its relation to language and to reality, and likely into our very nature as linguistic beings.

BBC voices (2, Informative)

matt me (850665) | more than 9 years ago | (#11446937)

Link on front page of bbc.co.uk - bbc.co.uk/voices/ [bbc.co.uk] - their attempt at tracking accents and dialects across the UK.

Another use of Google in Linguistics (1, Informative)

Anonymous Coward | more than 9 years ago | (#11446952)

Just a month ago I finished a paper exploring using Google counts in great detail for language analysis and other forms of meaning extraction.
"Automatic Meaning Discovery Using Google": http://arxiv.org/abs/cs.CL/0412098/ [arxiv.org]

Comments welcome, -Rudi.

lol (0)

Anonymous Coward | more than 9 years ago | (#11447069)

lol @ anonymous cowards

lol lol lol

how close are we to self forming dictionaries? (1)

Vnimam (852525) | more than 9 years ago | (#11447072)

Using Google Groups, it is pretty close to using a thesaurus. Personally, it is one of the most fruitful advances I've ever seen from the net. Being of AI-mind...

My question to all you master linguists and computer scientists: how far are we from having self-forming dictionaries based strictly on cached Google data?

two years?

-o- Geoff Peters [geoffpeters.com]

Writing in Japanese (3, Insightful)

minairia (608427) | more than 9 years ago | (#11447234)

I am American but have to write in Japanese for work. No matter how much one learns in school, when one writes in a foreign language, you'll hit a point of wondering if what you wrote is how native speakers say something or is even understandable. Whenever I hit a point like that, I put the sentence in question (or key fragments thereof) into a Google search. If nothing comes up, I know I have to rewrite. If only a few links come up, I know what I wrote might be a little weird, but is at least understandable. If I get pages and pages of links, I'm golden.