Falsehoods Programmers Believe About Names

timothy posted more than 4 years ago | from the can't-we-stick-to-slashdot-user-ids? dept.

Databases 773

Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.

773 comments

As the author of RFC 2100... (4, Interesting)

jra (5600) | more than 4 years ago | (#32609054)

I found the piece very interesting.

Though my inability to post this comment appears to have outlived the slashdotting of the site.

Re:As the author of RFC 2100... (3, Funny)

OneAhead (1495535) | more than 4 years ago | (#32609160)

That doesn't make sense. I can read your comment, therefore your inability to post it has gone away. The site is still slashdotted. Ergo, the slashdotting of the site has outlived your inability to post.

Oh wait... RFC2100...

Re:As the author of RFC 2100... (2, Interesting)

TheLink (130905) | more than 4 years ago | (#32609180)

I dunno, the guy just lists out reasons why you can't uniquely identify people by names. e.g. "some people don't have names".

Well, that's why governments start handing out national ID numbers to people[1]. Then even if you aren't who you claim you are, at least the poor data entry person has something to key in and can actually type it in on his/her keyboard ;).

[1] As for foreigners without a passport number or national ID, please wait here for those friendly guys in uniforms...

I don't know what the complaint is about? (1, Interesting)

jackb_guppy (204733) | more than 4 years ago | (#32609258)

Most developers do not get that the world is made up of multiple standards, and refuse to consider locale vs. database relationships like:

Boolean: Yes/Si/... No/No/..

Amounts: ,. ., 0,2,3 places from right for display, 0,2,3 places from right for value (USD: ,.22, JPY: .,20)
Dates:: ./ 0 suppressed ISO, JPN, USA

How do you think they do working with phone #s, addresses and names?

Database engines fail with these simple-complex constructs because sorting and matching tests are still left hand and character set driven. A database MUST treat all of these names the same: McClean, MacClean, MCLean, Mc Clean, Mac Clean. McCleen, ...

Re:I don't know what the complaint is about? (4, Interesting)

scdeimos (632778) | more than 4 years ago | (#32609392)

A database MUST treat all of these names the same: McClean, MacClean, MCLean, Mc Clean, Mac Clean. McCleen, ...

Are you sure? What if "Mac Clean" is actually somebody's first and last names?

I know plenty of people whose legal name is a single word, such as "Alex", "Max" or "Virgil." Would your system put that in the first_name, middle_name or surname column? Storing names and using them sensibly is hard, as TFA acknowledges.

You'd think that e-mail addresses by comparison would be simpler, but I have a hard time trying to register my e-mail address with sites that won't allow even simple things like "+", "-" or "." characters in the local part.
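A minimal sketch of a more permissive check, for illustration only. The helper name `looks_like_email` and both patterns are assumptions made up for this example; it is not any particular site's validator and not a full RFC 5322 parser (quoted local parts, for instance, are not handled). It only shows that "+", "-" and "." in the local part are cheap to allow:

```python
import re

# Hypothetical, deliberately loose check: one "@", a non-empty local part made of
# common unquoted characters (including "+", "-" and "."), and a dotted domain.
# This is an illustrative assumption, not a complete RFC 5322 implementation.
_LOCAL = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
_DOMAIN = (
    r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
    r"(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)+"
)
_EMAIL_RE = re.compile(rf"{_LOCAL}@{_DOMAIN}")


def looks_like_email(address: str) -> bool:
    """Return True if the address passes the loose syntactic check above."""
    return _EMAIL_RE.fullmatch(address) is not None
```

A syntactic check like this can only rule out obvious garbage; whether the mailbox exists is something only a confirmation mail can establish.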

Re:I don't know what the complaint is about? (4, Interesting)

Dragonslicer (991472) | more than 4 years ago | (#32609410)

A database MUST treat all of these names the same: McClean, MacClean, MCLean, Mc Clean, Mac Clean. McCleen, ...

I assume you left out a "not" in that sentence? I think there are quite a few people that will kindly (or maybe not-so-kindly) explain why "Mc" and "Mac" are not the same.

Re:As the author of RFC 2100... (2, Funny)

patio11 (857072) | more than 4 years ago | (#32609260)

After Reddit got done with the site yesterday, I decided "Sure, why not upgrade to Wordpress 3.0. I'll just turn off caching for a little while and..."

Re:As the author of RFC 2100... (5, Funny)

pushf popf (741049) | more than 4 years ago | (#32609290)

I found the article to be contrived and pointless.

Yes, there are people and entities that do not fit into a normal name slot in a database, and no, I don't care at all because it hasn't been a problem for anything I've written in the last thirty years. When someone pops up and says "My name is this thing I drew on the sidewalk using chipmunk poop, and it doesn't fit in your database", I'll say "Yes, you're right, it doesn't", then go have a beer.

You can't handle every edge case in the universe because you'll never actually release anything.

Sounds like people need to fix thier names (3, Funny)

h4rr4r (612664) | more than 4 years ago | (#32609056)

Who the hell has numbers in there name?

Re:Sounds like people need to fix thier names (5, Funny)

Anonymous Coward | more than 4 years ago | (#32609086)

3Jane Tessier-Ashpool, for one.

Re:Sounds like people need to fix thier names (2, Informative)

ChipMonk (711367) | more than 4 years ago | (#32609112)

Chad 8 5, for another.

Re:Sounds like people need to fix thier names (2, Informative)

DarrenBaker (322210) | more than 4 years ago | (#32609224)

OCHOCINCO!!!!

Re:Sounds like people need to fix thier names (1)

Trepidity (597) | more than 4 years ago | (#32609120)

Fortunately for programmers, she doesn't exist.

Re:Sounds like people need to fix thier names (1)

chooks (71012) | more than 4 years ago | (#32609264)

Simple.

Logan 5 if you're a guy.
Jessica 6 if you're not.

Re:Sounds like people need to fix thier names (2, Interesting)

Speare (84249) | more than 4 years ago | (#32609452)

Love the literary reference. In a much earlier sci-fi story, This Perfect Day [wikipedia.org] , every citizen has a nameber, an identifier that is part name, part number. There are only four male names, four female names, and these are combined with a multi-digit code to make the ID unique. Ever since online forums started suggesting logins like "MaryBeth131" I can't help but think of namebers.

Re:Sounds like people need to fix thier names (3, Informative)

dogdick (1290032) | more than 4 years ago | (#32609098)

Andre3000

Re:Sounds like people need to fix thier names (0)

Anonymous Coward | more than 4 years ago | (#32609108)

While we're at it, can we get a spell checker that handles homonyms?

Re:Sounds like people need to fix thier names (5, Funny)

Khakionion (544166) | more than 4 years ago | (#32609118)

homonyms?

Hey, learn a little tolerance, bud.

Re:Sounds like people need to fix thier names (4, Informative)

0100010001010011 (652467) | more than 4 years ago | (#32609134)

Mr. Ochocinco [wikipedia.org]

For those that aren't privy to American football: apparently some guy with the number 85 renamed himself 85.

Re:Sounds like people need to fix thier names (1, Informative)

Anonymous Coward | more than 4 years ago | (#32609228)

No, he didn't, he renamed himself Chad Ochocinco, which any standard name field would handle just fine. Incidentally, despite legally changing his name, he claims to still primarily use Chad Johnson.

Re:Sounds like people need to fix thier names (3, Informative)

Miseph (979059) | more than 4 years ago | (#32609380)

He legally changed his name because fans refer to him as "Ochocinco" and he wanted to put it on his jersey, but because the NFL hates both fans and lulz, they only allow a person's legal surname to appear there. Rather than lay down and take it, he gave them a massive middle finger by changing his name.

The NFL actually has a surprising number of players that behave like btards, it's rather amusing.

Re:Sounds like people need to fix thier names (2)

kenj0418 (230916) | more than 4 years ago | (#32609536)

The NFL actually has a surprising number of players that behave like btards, it's rather amusing.

I'd be a bit more concerned with the Michael Vicks and Leonard Littles of the NFL than some guy who changes his name. (dogfighting and drunk-driving-with-fatal-accident for those not in the US or otherwise not aware)

Re:Sounds like people need to fix thier names (0)

Anonymous Coward | more than 4 years ago | (#32609154)

Some people I know - Native Americans, East Indians

Re:Sounds like people need to fix thier names (5, Informative)

spitzig (73300) | more than 4 years ago | (#32609156)

Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.

Re:Sounds like people need to fix thier names (4, Informative)

Fnordulicious (85996) | more than 4 years ago | (#32609378)

You are a little confused. Please reread the Wikipedia article on Hanyu Pinyin. It normally uses diacritics - namely macron, acute, hacek ("caron"), and grave - to represent the Mandarin tones other than neutral tone. Numbers have been used by people who lack diacritics on their typewriter or input system, but using numbers is not standard in Hanyu Pinyin, instead it's a kludge.

That said, if your input form doesn't allow some guy to type in his name with tone number suffixes on a US Windows keyboard layout where he lacks access to diacritics, then you're not a very thoughtful programmer.
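A rough sketch of what accepting both spellings could look like, using only Python's standard `unicodedata` module. The helper names `normalize_name` and `is_acceptable_name` and the "no control characters" rule are assumptions for illustration, not a recommended policy. The point is that tone-number suffixes pass untouched, and that the same accented name typed as precomposed or combining characters compares equal once normalized:

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Normalize to NFC and trim outer whitespace; nothing else is altered."""
    return unicodedata.normalize("NFC", raw).strip()


def is_acceptable_name(raw: str) -> bool:
    """Hypothetical loose rule: non-empty and free of control characters."""
    name = normalize_name(raw)
    return bool(name) and not any(unicodedata.category(ch) == "Cc" for ch in name)


composed = "W\u00e1ng"      # a-acute as one precomposed code point
decomposed = "Wa\u0301ng"   # plain "a" followed by a combining acute accent

# The two strings differ as code points but agree after normalization.
same_after_nfc = normalize_name(composed) == normalize_name(decomposed)
```

Digits, diacritics and non-Latin scripts all pass this check; the only things it turns away are empty input and control characters.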

Also, people who make software with input fields that accept Unicode but specify a particular font that has a tiny character repertoire suck.

Oh, and Slashdot sucks even more for only supporting ASCII and stripping everything else.

Re:Sounds like people need to fix thier names (1)

LearnToSpell (694184) | more than 4 years ago | (#32609158)

You do!

Re:Sounds like people need to fix thier names (2, Funny)

Kitkoan (1719118) | more than 4 years ago | (#32609164)

HAL 9000 [wikipedia.org]

Re:Sounds like people need to fix thier names (1, Funny)

Anonymous Coward | more than 4 years ago | (#32609232)

I met that guy once. He's a fucking douchebag. Wouldn't do anything I told him.

Re:Sounds like people need to fix thier names (0)

Anonymous Coward | more than 4 years ago | (#32609422)

Given the forum I seriously doubt the wikipedia link is needed. Either that or those damn kids need to get off my lawn.

Re:Sounds like people need to fix thier names (2, Interesting)

notthepainter (759494) | more than 4 years ago | (#32609218)

Bo3b Johnson

http://www.linkedin.com/pub/bo3b-johnson/13/846/a52 [linkedin.com]

The 3 is silent. And no, I don't know him but I know someone who does.

Re:Sounds like people need to fix thier names (0)

Anonymous Coward | more than 4 years ago | (#32609226)

Does that refer to BOOB

Re:Sounds like people need to fix thier names (4, Informative)

BluBrick (1924) | more than 4 years ago | (#32609368)

Bo3b? Presumably, the 3 is silent because he wants to point out how individual he is (ironically, by rehashing a joke made over 50 years ago.)

From Tom Lehrer's introduction to "We will all go together when we go":

I am reminded at this point of a fellow I used to know whose name was Henry, only to give you an idea of what an individualist he was he spelt it H-E-N-3-R-Y. The 3 was silent, you see.

Re:Sounds like people need to fix thier names (5, Funny)

PopeRatzo (965947) | more than 4 years ago | (#32609250)

Who the hell has numbers in there name?

Well, for starters, Thurston B. Howell, III. Malcolm X, and Jimmy Two Times.

Re:Sounds like people need to fix thier names (1)

jedidiah (1196) | more than 4 years ago | (#32609364)

The X in Malcolm X is not a number.

The 3rd in Thurston Howell is not part of the name. It's a generational suffix.

Re:Sounds like people need to fix thier names (3, Funny)

aiht (1017790) | more than 4 years ago | (#32609538)

What about Arthur "Two Sheds" Jackson?
Nah, I guess that doesn't count 'cause it's written as a word.

Re:Sounds like people need to fix thier names (1)

RedWizzard (192002) | more than 4 years ago | (#32609340)

According to this BBC article [bbc.co.uk] there is a New Zealander legally called "Number 16 Bus Shelter".

American journalist (0)

Anonymous Coward | more than 4 years ago | (#32609356)

Jennifer 8. Lee [wikipedia.org] .

Re:Sounds like people need to fix thier names (1)

xenn (148389) | more than 4 years ago | (#32609374)

Johnny 2 Hats

Re:Sounds like people need to fix thier names (1)

westcoast philly (991705) | more than 4 years ago | (#32609476)

half a million Norwegians, probably...

Re:Sounds like people need to fix thier names (1)

sonamchauhan (587356) | more than 4 years ago | (#32609506)

Variables

Re:Sounds like people need to fix thier names (1)

tux0r (604835) | more than 4 years ago | (#32609512)

Who the hell has numbers in there name?

Oh, the irony [slashdot.org] .

Re:Sounds like people need to fix thier names (3, Funny)

sonamchauhan (587356) | more than 4 years ago | (#32609518)

And King James III

Re:Sounds like people need to fix thier names (3, Informative)

fishexe (168879) | more than 4 years ago | (#32609522)

Who the hell has numbers in there name?

Former New York Times writer Jennifer 8 Lee [wikipedia.org] does.

Rip out the vowels (2, Funny)

jimmydevice (699057) | more than 4 years ago | (#32609078)

and let god sort them out...

Re:Rip out the vowels (3, Funny)

bkpark (1253468) | more than 4 years ago | (#32609172)

and let god sort them out...

If written Hebrew is any indication, God doesn't bother with vowels either, apparently.

Re:Rip out the vowels (0)

Anonymous Coward | more than 4 years ago | (#32609302)

If written Hebrew is any indication, God doesn't bother with vowels either, apparently.

You dare befoul the name of YHWH with that ovoid abomination?!

I've been dealing with this for years. (4, Interesting)

Wonko the Sane (25252) | more than 4 years ago | (#32609136)

I am fortunate enough to be the child of a professional smart-ass who intentionally gave all his children two middle names so that we would not fit into the computer systems of the era.

When I grew up my parents used my first middle name as a "given nickname" (it's actually in quotation marks on my birth certificate). So most of the time when I give my name for something I use my "given nickname" as my first name. Unless I feel like using my legal first name as my first name in which case I use that. There are probably four or five different versions of my name attached to my SSN in various different databases.

I've also got a suffix: III. I don't have two ancestors with the exact same name as me, but since the various parts come from two different relatives my parents settled on III.

Re:I've been dealing with this for years. (5, Funny)

Graff (532189) | more than 4 years ago | (#32609168)

I prefer the story of this mom [xkcd.com] .

Re:I've been dealing with this for years. (1, Informative)

Anonymous Coward | more than 4 years ago | (#32609440)

It's also fun when a parent has the same first name, yet a different middle name, but the problem being that the middle name has the same first letter. So all the damn computer databases that insist on reducing the middle name to an initial are a pain in the ass. And no, I'm not interested in all this senior citizen stuff I'm not qualified for. (Give another 30 years maybe.) I also wonder if the ol' fart is getting junk mail relating to video games and electronics that he likely has no interest in. The real problem comes up in billing and city stickers and things like that.

The only solution so far is that I put both my first and middle name in the "first name" field in cases where a space is allowed as a valid character. It's something I'll have to keep doing until enough people get a clue and change their database conventions.

Slashdotted already? (5, Informative)

RenQuanta (3274) | more than 4 years ago | (#32609142)

After just 15 minutes of the story being posted?

Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)

Here's the Google cache in the meanwhile: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ [googleusercontent.com]

Re:Slashdotted already? (1)

CAIMLAS (41445) | more than 4 years ago | (#32609240)

Not so... back in the day, such a slashdotting was quite regular. Surely you remember that.

Re:Slashdotted already? (2, Funny)

RenQuanta (3274) | more than 4 years ago | (#32609288)

Not so... back in the day, such a slashdotting was quite regular. Surely you remember that

Yeah, I might, if my memory weren't failing with age. ;-)

Text only cache (2, Informative)

SuperKendall (25149) | more than 4 years ago | (#32609166)

Even the cache needs tweaking to load.

Text only version. [googleusercontent.com]

Re:Text only cache (1)

patio11 (857072) | more than 4 years ago | (#32609454)

While I'm busy replacing the pile of molten slag that is my VPS, you can find the whole thing here, served as all static (including assets):

http://www.bingocardcreator.com/kalzumeus-cache/names.html [bingocardcreator.com]

Article text (5, Informative)

Anonymous Coward | more than 4 years ago | (#32609174)

John Graham-Cumming wrote an article [jgc.org] today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.

So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People's names fit within a certain defined amount of space.
  7. People's names do not change.
  8. People's names change, but only at a certain enumerated set of events.
  9. People's names are written in ASCII.
  10. People's names are written in any single character set.
  11. People's names are all mapped in Unicode code points.
  12. People's names are case sensitive.
  13. People's names are case insensitive.
  14. People's names sometimes have prefixes or suffixes, but you can safely ignore those.
  15. People's names do not contain numbers.
  16. People's names are not written in ALL CAPS.
  17. People's names are not written in all lower case letters.
  18. People's names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
  19. People's first names and last names are, by necessity, different.
  20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  21. People's names are globally unique.
  22. People's names are almost globally unique.
  23. Alright alright but surely people's names are diverse enough such that no million people share the same name.
  24. My system will never have to deal with names from China.
  25. Or Japan.
  26. Or Korea.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have "weird" naming schemes in common use.
  28. That Klingon Empire thing was a joke, right?
  29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
  30. There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
  31. I can safely assume that this dictionary of bad words contains no people's names in it.
  32. People's names are assigned at birth.
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You're kidding me, right?
  37. Two different systems containing data about the same person will use the same name for that person.
  38. Two different data entry operators, given a person's name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
  39. People whose names break my system are weird outliers. They should have had solid, acceptable names, like . [note from AC: Chinese characters...Slashdot will probably delete them]
  40. People have names.

This list is by no means exhaustive. If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several. Feel free to add other misconceptions in the comments, and refer people to this post the next time they suggest a genius idea like a database table with a first_name and last_name column.
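One way to read the list is as an argument for storing names as opaque strings instead of parsed parts. The following is a minimal sketch under that assumption; the `PersonNames` class, its fields and the fallback text are hypothetical and do not come from the article:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonNames:
    """Hypothetical record: names are opaque strings, and there may be zero or many."""

    # Every full name the person accepts, stored exactly as entered.
    full_names: List[str] = field(default_factory=list)
    # What the person asked to be called, if they said anything.
    preferred: Optional[str] = None

    def display(self, fallback: str = "(no name given)") -> str:
        """Pick a string to show without parsing or reordering any name."""
        if self.preferred:
            return self.preferred
        if self.full_names:
            return self.full_names[0]
        return fallback
```

This gives up sorting by surname and similar features; whether that trade is acceptable depends on the system, which is what much of the discussion below is about.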

Re:Article text (1)

Merls the Sneaky (1031058) | more than 4 years ago | (#32609418)

Names are random data of random length.

Re:Article text (0)

Anonymous Coward | more than 4 years ago | (#32609470)

A name by any other rose would give me a migraine.

Re:Article text (0)

Anonymous Coward | more than 4 years ago | (#32609558)

Of course, in the interest of actually getting your system off the ground, you need to not accommodate all of these... a fraction of your users, depending on how many you omit workarounds for, might need to spend a little bit of time adjusting to your system. No, sir, you cannot have an irrational number for a name, fuck off and find a pseudonym.

who needs vowels? (3, Interesting)

theNAM666 (179776) | more than 4 years ago | (#32609192)

Dumbfuck summary (5, Insightful)

oldhack (1037484) | more than 4 years ago | (#32609220)

Names of what?!

Re:Dumbfuck summary (4, Informative)

bigstrat2003 (1058574) | more than 4 years ago | (#32609270)

Yeah, TFS is very ambiguous about that. Turns out that TFA is talking about names of people, and the pitfalls you can run into when allowing someone to enter their name into a system.

Re:Dumbfuck summary (2)

thePowerOfGrayskull (905905) | more than 4 years ago | (#32609278)

Indeed. Reading the summary, I thought it was some kind of article on how programmers can't remember names (I know I can't...)

But basically, it's some dude whining about how - because there is no single set of rules that can be universally applied to all names - no systems handle them correctly. That seems kind of self-evident to me; computers are rules-based creations. If you can't define the rules, it sure is hard to code for them. Blaming the programmers is stupid - as his own article shows (e.g. "[don't assume] Names are case-sensitive. [don't assume] Names are case-insensitive").

Not sure why this made it to slashdot -- it's just a rant.

Article makes wrong assumption about software. (5, Insightful)

Vellmont (569020) | more than 4 years ago | (#32609282)

Software is NOT designed to be perfect and cover every case. Have a numeral in your name? Too bad. Need some names to be case sensitive, and others case insensitive? Sucks to be you. Have a 200 character name that doesn't fit in the 100 characters the designers thought no crazy person would ever have? Tough.

I started reading through the list, and it's just ridiculous. There are a few good points, like the assumptions that names don't change or that names are unique. But they're so obvious that the vast majority of the time it's not a big problem. More often it's just a matter of training the data edit/entry folks how to change someone's name, or how to not assume a name is a sole identifier.

But assuming the worst and trying to design a system that'll allow people's names to be Chinese characters when you don't do business in China, have a presence in China, or ever plan to? That's ridiculous. Software doesn't have to be perfect out of the chute. It should be adaptable though if some unforeseen shortcoming becomes a larger problem. Gee, I guess if you ever chose to do business in China and need Chinese character names you might have to re-write part of the damn software. Oh well, that's what software developers are FOR!

If you don't even HAVE a name, then I submit you're crazier than the artist formerly known as the artist formerly known as Prince. At least HE had a name, though it was an unpronounceable symbol. The world can't accommodate every possibility, and software is no exception.

Yeah, article is kind of asinine (5, Insightful)

Trepidity (597) | more than 4 years ago | (#32609312)

He's essentially arguing that, because names vary a lot and are complex, your software should never do anything useful with them. Sorry, but that's a stupid answer. In a lot of systems, being able to sort by surname may well be more important than being able to handle people who claim they have no surname.

Of course, you shouldn't gratuitously do stupid things, and interfaces should aim to be relatively clear. But most people can figure out how to enter their names into relatively standardized forms, and those that don't should probably figure out how.

Re:Yeah, article is kind of asinine (2, Informative)

snowgirl (978879) | more than 4 years ago | (#32609446)

I'm going to throw in my agreement here. Yes, there are people who put numerals in their names, or non-unicode point characters, or various other things, but there just isn't a reason to foist that on other people.

There is frustration about things like, "people have N number of names", and "names don't change" which are good and valid points... but some of the things are just like "dude... seriously..."

Re:Article makes wrong assumption about software. (0, Flamebait)

lennier (44736) | more than 4 years ago | (#32609346)

Software is NOT designed to be perfect and cover every case.

Then it's designed to fail, and probably also has security holes, and should not be deployed. At the very least it will be a block to compatibility, at worst it will become a botnet.

Software in the Internet age NEEDS to be 100% correct because it has a potential lifetime of infinity years and 6 billion and counting potential users. If you're building a system 'to throw away' as the XP people suggest, you're also building one to be rooted unless you DO throw it away before the. And if you ever put it or any references to it on the Web, you're going to leave permanent identifiers to your system scattered all over the planet even after you throw it away - so if you're designing it to be thrown away, you're designing to leave a trail of broken data and angry users in your wake.

Is that really what you want to do?

Re:Article makes wrong assumption about software. (1)

lennier (44736) | more than 4 years ago | (#32609376)

Edit: 'unless you do throw it away before the bad guys find it.'

Which is our current approach to patching, and is failing, hard. mmm, zero-days.

Re:Article makes wrong assumption about software. (2, Insightful)

Vellmont (569020) | more than 4 years ago | (#32609388)


Then it's designed to fail

Anything ever designed is designed to fail. This applies to bridges, the pyramids, and all software. This belief you have that software doesn't have to be maintained is as ridiculous as the idea that a bridge or any physical structure doesn't have to be maintained. Software lives and dies like anything else. Nothing lives forever.

Re:Article makes wrong assumption about software. (3, Insightful)

PrecambrianRabbit (1834412) | more than 4 years ago | (#32609444)

You're overreacting (I know, I know, "welcome to the Internet"). Software should behave in some sane, safe manner given any input. Sometimes, the sane thing to do is to throw an error, or say "Sorry, Dave, I can't do that."

In particular, systems don't necessarily have to shoehorn insane data into their processing. To use a relevant example, simply because Prince wants to upload a PNG in the "Name" field doesn't mean that the software has to let him. Rejecting this case does not doom said software system to "become a botnet" or "leave a trail of broken data."

Re:Article makes wrong assumption about software. (1)

yyxx (1812612) | more than 4 years ago | (#32609574)

Then it's designed to fail

You bet it is. All software can and will fail.

and probably also has security holes

It probably does, but not because it rejects non-standard names. In fact, having a well-defined model of what a name is and rejecting input that doesn't conform to it is necessary for security.

At the very least it will be a block to compatibility

No, a "block to compatibility" would be to accept arbitrary inputs as names. Compatibility requires some standardization, and that requires rejecting things that the software doesn't know about.

Software in the Internet age NEEDS to be 100% correct

Software can never be 100% correct; it's not even well-defined what that means. But accepting arbitrary inputs as names is probably not correct anyway.

Re:Article makes wrong assumption about software. (1)

rainmaestro (996549) | more than 4 years ago | (#32609582)

Software doesn't need to be 100% correct. It just needs to be able to handle the incorrect things gracefully. You don't need to support every possible name type (letters, pictographs, audio file containing a sequence of letters in morse code, blah blah blah), you just need it to reject non-standard formats without shitting all over the server.
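A small sketch of "reject gracefully", assuming a hypothetical `check_name` helper and an arbitrary length limit (both invented for this example). Bad input produces a message for the user instead of an exception or a corrupted row:

```python
from typing import Optional, Tuple

MAX_NAME_LENGTH = 200  # assumed limit for this sketch, not a recommendation


def check_name(raw: object) -> Tuple[bool, Optional[str]]:
    """Return (ok, error_message); never raise on bad input."""
    if not isinstance(raw, str):
        return False, "Name must be text."
    name = raw.strip()
    if not name:
        return False, "Name must not be empty."
    if len(name) > MAX_NAME_LENGTH:
        return False, f"Name must be at most {MAX_NAME_LENGTH} characters."
    if any(not ch.isprintable() for ch in name):
        return False, "Name contains unprintable characters."
    return True, None
```

Where to draw the line (length, allowed characters, whether empty is allowed) is exactly what the thread disagrees about; the sketch only shows that whatever the rule is, failing it can be an ordinary, reportable outcome.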

As far as lifespan, yeah, just yesterday I was using a bunch of spiffy Win 3.1 apps. And I hear those dirt farmers in Ghana are really tech savvy (though they spend all their time preaching on Usenet about anarcho-syndicalist communes). All software has a finite lifespan, Internet be damned, and it is typically very short. How many people do you see running Netscape Navigator 4.X nowadays? How about Microsoft Bob? Hell, I've got door controller software at work written in the XP era that can't even run properly in a SP2 environment, let alone Windows 7. When XP dies, that software is buried with it.

Re:Article makes wrong assumption about software. (3, Funny)

canajin56 (660655) | more than 4 years ago | (#32609588)

So, you shouldn't deploy software that doesn't, as the retarded article says, properly handle people with names that are over 65,000 characters long, where some portions are case sensitive, so if that part is lowercase instead of upper, that's a different name. But other parts are case insensitive, so it's still the same name even in all caps. Oh, did I also mention that some of the letters in the name aren't part of any character set, so they can't even be typed in the first place? Because the article says that assuming names can even be text at all is wrong and your software is broken if you made that stupid assumption. (See Prince) PS, that person with the 65 thousand letter long name? He has 8,000 aliases and needs to enter all of them, better hope you allow that many aliases. Also, there is a huge subset of his name in common with a friend of his, but they are not related, it is sheer coincidence, you better not assume relation just because only two people in the world have the same last few hundred words of their name in common! Also, his brother has no name. Not, like his name is "No name" or he goes by "The artist formerly known as Prince", as in, his name is just the empty string, so your software better fucking not have name as a required field!

Re:Article makes wrong assumption about software. (0)

Anonymous Coward | more than 4 years ago | (#32609348)

I 100% agree.

Generally when building a form that asks for a name, I create a first name field and a surname field. I allow A-Z, a-z, space, - and '. Considering the company I work for is local (i.e. Australian), I don't need to waste time trying to think out all the insane ways names could be formatted in other languages. My rules might be a bit restrictive but most names would fit. Sure you might get the few odd names that contain strange characters etc, but this is not extremely common.

Some of the rules on that page are really stupid as well, like for example saying it's wrong to have separate fields in a database for first name, surname etc. If you were to create one large field for a name that makes it even more difficult. What if, for example, you wanted to display on your webpage a message like "Welcome {first name}". With all the crazy ways the article mentions that people come up with names it would be impossible for the system to split the first name from the rest of the string.

Re:Article makes wrong assumption about software. (2, Insightful)

jrumney (197329) | more than 4 years ago | (#32609502)

Generally when building a form that asks for a name, I create a first name field and a surname field.

And you fail right there. For some people, their first name is their surname. Others don't have a surname. Some of those without a surname may use a patronym or matronym as part of their full name, but you never use it to address them without their personal name. Some people have a first name and second name that always go together, so parsing a first name out of the full name, or disallowing whitespace in the first name field is another common fail.

Names are complex. Don't assume it doesn't matter because your database is only intended for local use, because unless you live somewhere as closed as North Korea, there are immigrants in your town that break your assumptions.
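
A minimal sketch of the disagreement above, assuming the whitelist the parent post describes (A-Z, a-z, space, hyphen, apostrophe). The function name `passes_ascii_rule` and the sample names are made up for illustration; the point is only to show which kinds of names such a rule lets through and which it turns away.

```python
import re

# Hypothetical whitelist mirroring the rule quoted in the parent post:
# A-Z, a-z, space, hyphen and apostrophe only.
ASCII_NAME = re.compile(r"[A-Za-z '\-]+")

def passes_ascii_rule(name: str) -> bool:
    """Return True if every character of the name is on the whitelist."""
    return ASCII_NAME.fullmatch(name) is not None

# Illustrative samples only, not taken from any real dataset.
samples = ["O'Leary", "Graham-Cumming", "Mary Anne", "N\u00fa\u00f1ez", "\u738b\u5c0f\u660e"]
for name in samples:
    print(name, passes_ascii_rule(name))
```

Note that the apostrophe and hyphen survive this rule, but anything with an accent or a non-Latin script does not, which is roughly the case the reply is arguing about.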

Re:Article makes wrong assumption about software. (1)

YoshiDan (1834392) | more than 4 years ago | (#32609540)

Well, I'm pretty sure that a surname is a legal requirement in Australia, so even if there are immigrants that have a single name, they would have had to create a surname to immigrate here anyway. (I'm the AC btw, that was my first post here and I didn't notice the post anonymously checkbox)

Re:Article makes wrong assumption about software. (4, Insightful)

lennier (44736) | more than 4 years ago | (#32609372)

But assuming the worst and trying to design a system that'll allow people's names to be Chinese characters when you don't do business in China, have presence in China, or ever ever plan to? That's ridiculous.

Or sell in New Zealand, or Australia, or anywhere else in the Pacific, or deal with immigrants, or be used by anyone who has a Chinese name?

This is the Internet now. Welcome to it.

Re:Article makes wrong assumption about software. (5, Insightful)

Trepidity (597) | more than 4 years ago | (#32609400)

Most Chinese emigrants to countries that use a Roman alphabet are perfectly capable of writing their name in Roman characters if they need to. If they weren't, they wouldn't have been able to get visas and get into the country in the first place.

Re:Article makes wrong assumption about software. (1)

atamido (1020905) | more than 4 years ago | (#32609514)

Just use UTF-8, all modern databases accept it as a field type. This isn't exactly a complex issue.

If you can't write your name in UTF-8, it doesn't deserve to be written.

Thanks, Prince (4, Informative)

BlueBoxSW.com (745855) | more than 4 years ago | (#32609314)

Thanks, Prince

Only names you need (1)

Alcoholist (160427) | more than 4 years ago | (#32609322)

Foo and Bar, only names you need. Besides, Foo is a pretty name for a girl!

Re:Only names you need (0)

Anonymous Coward | more than 4 years ago | (#32609396)

Foo Kyou
Foo Kme

Irish need not log in? (5, Insightful)

thepainguy (1436453) | more than 4 years ago | (#32609420)

My last name is O'Leary and over the past 5 years web sites have not gotten any better, and arguably have gotten worse, at handling the apostrophe in my last name

Help me Slashdot, you're my only hope.

Eat some oats (0)

Anonymous Coward | more than 4 years ago | (#32609542)

Eat some sheep too

Re:Irish need not log in? (5, Funny)

kenj0418 (230916) | more than 4 years ago | (#32609550)

You've probably compiled a lengthy list of sites vulnerable to SQL-injection. I'm sure you could sell that to someone somewhere to compensate you for your pain and suffering.

not surprising (2, Interesting)

Phoenix Dreamscape (205064) | more than 4 years ago | (#32609432)

Considering how many entry forms still don't allow '+' in an e-mail address (or, worse, allow it in the sign-up box but not in the unsubscribe box), and considering how many banks still restrict you to an 8-character password, does it come as any surprise that they have difficulty with something that isn't defined in an RFC [ietf.org] ?

You are number 6 (1)

Punto (100573) | more than 4 years ago | (#32609436)

You are number 6

Coral Cache FTW... (1)

Qubit (100461) | more than 4 years ago | (#32609478)

Okay, I finally got the page to load:

Falsehoods Programmers Believe About Names [nyud.net]

If only Slashcode would do a single hit on <domain> + .nyud.net + /rest/of/url/ for every link posted in an article, then it would be trivial to switch over to cached copies of the content.
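
The rewrite described above (<domain> + .nyud.net + /rest/of/url/) could be sketched roughly like this. The function name `coralize` is hypothetical, and the sketch assumes plain http(s) links; it drops any port or user info from the original link and says nothing about whether the cache itself will answer.

```python
from urllib.parse import urlsplit, urlunsplit

def coralize(url: str, suffix: str = ".nyud.net") -> str:
    """Append the cache suffix to the host part of a URL, leaving the rest alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Leave relative links and already-rewritten links untouched.
    if not host or host.endswith(suffix):
        return url
    # Assumption: port and credentials are discarded for simplicity.
    return urlunsplit((parts.scheme, host + suffix, parts.path, parts.query, parts.fragment))

print(coralize("http://example.com/rest/of/url/?page=2"))
```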

Yea f-ing right. What about little Bobby Tables? (1, Redundant)

Narcocide (102829) | more than 4 years ago | (#32609490)

'nuff said [xkcd.com]

Well Duh (4, Insightful)

Saint Stephen (19450) | more than 4 years ago | (#32609492)

First thing I learned back in 1993 when I got started.

1) George Foreman has five boys named George Foreman. Your database better be able to handle that.
2) Your database better be able to handle Cher (no last name).
3) People are not required to have Social Security numbers. (it's an optional program - you don't have to participate).
4) Not everyone's last name starts with a capital letter.
5) Mexican people's names break ASCII (the tilde n).
6) People named O'Grady have a hard time getting their name in a database sometimes and have a hard time getting their name passed via a URL sometimes and generally mess stuff up.
7) People from Sri Lanka will break your name length limits.
8) Some people's name is only a single letter.
9) Some people go by their middle name god damn it! :-)

Programmers hate my real name (4, Funny)

SexyKellyOsbourne (606860) | more than 4 years ago | (#32609496)

My first name: "where 1=1 "
My last name: "'; drop table users; --"

Looking at it from the wrong direction (1)

quietwalker (969769) | more than 4 years ago | (#32609500)

The article writer started out well, and then immediately ignored his own line of thought.

"John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them."

Yes, in your scope, your name is going to be accurate by whatever cultural, political, legal, optional, character-set or other restrictions or freedoms apply.

However, in the scope of any given app, your name may very well be an 8 character, [a-zA-Z] string. That's what the app means by the name field - by definition.

Naturally, no programmer would use any user-provided input for any sort of unique key ( err... right?) ... so really what this is about is that someone did not properly set user expectations, that they may not be allowed to arbitrarily spool data in any format or fashion into any given input field - 'name' in this case.

Or someone already knows that fact, and is ranting because they chose today to ignore it.

Why do programmers get the blame? (5, Insightful)

justfred (63412) | more than 4 years ago | (#32609504)

I code to spec. The product and marketing departments write the spec (what little there is); the QA department amends the spec with overly specific test cases. I suggest that the spec is incomplete and won't handle...but I'm told, just code it to spec. I recommend changes, but we don't have time for edge cases. I point out potential problems, but we're unlikely to get any of those. I warn of potential compatibility problems but we don't care. Are you just trying to be difficult? If there's a problem QA will catch it. The project is overdue already, and by the way here are some new requirements that need to make it in, and we can't change the release date because we already promised the stockholders. Why is your code so complicated, my twelve-year-old kid could write this.

It's not my fault. I code to spec.

Missing in list: Single names & Initials (1)

aneroid (856995) | more than 4 years ago | (#32609534)

Single names: I've done data conversions on a project which covered multiple countries in Asia, Africa and Latin America. A "new" thing I came across about African names was: Some people have just a single name. It's not their first name or their last name, it's just their name. Though, they're generally okay with it being considered their first name. However, most legal docs require a last name (well, maybe not in some African countries) so we had to use dummy text in the first name.
(Almost covered in assumptions 19 & 20 but not quite.)

Initials: Globally, there are ppl who have a single letter, or two, as their first or last name. These could be initials of their parents' names or based on something else. Point is, min. length requirement of 3 or even 2 does not work.

Implications of the article in general:
1. You can't do any validation on people's names. No programmer would consider this proper.
2. You can't use someone's nationality to create validation rules, since nationality/citizenship can change - and using 'country of birth' doesn't cover re-located/migrated families, assuming your application uses that field.
3. Most Importantly...and Obviously: Either ask the client/customer what allowances they want in the application. Or, clearly state your assumptions and have them review & refine it.

I can has cheeseburger? (0)

Anonymous Coward | more than 4 years ago | (#32609548)

I can has cheeseburger

it's not software, it's people (2, Insightful)

yyxx (1812612) | more than 4 years ago | (#32609554)

Software shouldn't have to satisfy every whim and eccentricity. If you don't have a well-defined first name and last name that consists of extended alphanumeric characters in Unicode and starts with a letter, well, then get one, OK? And while you're at it, come up with decent Romanized and ASCII (traditional Latin) versions of your name, conformant with one of the common Romanization systems of your language; you will need that too if you want to travel internationally. Single letter names are also a potential problem because they are confusable with abbreviations, so consider using a variant spelling ("O" -> "Oh").

This isn't because programmers have some sort of hangups about names, it's because people themselves need to be able to refer to individuals in some reasonable and standardized way, they need to be able to write your name, alphabetize it, and correct errors.

...so what? (4, Insightful)

SanityInAnarchy (655584) | more than 4 years ago | (#32609580)

It seems to me that most misconceptions about names can be fixed by the following:

Allow a single, Unicode-enabled field of "unlimited" length (let's say 4 kilobytes) which represents "name". Several would be defined by different roles -- "Real name", "Nickname", "login", where only login (sometimes simply an email address) is required to be globally unique.

Now let's look at what that breaks:

First, #1, 2, 4, and 5. How am I supposed to avoid assuming these? People should be allowed to enter an arbitrary number of names for themselves? I suppose that's possible, but it immediately kills most of the potential uses of this data. If I want to set a nickname that goes with my forum posts, say, what good is it for me to have five nicknames? Seems like the only potential use would be making people easy to find by real name -- so, a social network.

#6 -- surely 4k is enough, but this is also not a terribly difficult assumption to change later. Annoying, but not devastating, not even as hard as changing from the first name / last name combination into one "real name" field.

#7, 8 -- most systems would make it trivial for people to change their names.

#9, 10 -- UTF8 is easy.

#11 -- very, very curious to see an example. And wouldn't that be a bug in Unicode? And this is again one where I have to ask -- how do you change this? Allow arbitrary images?

#12, 13 -- obvious solution is to make the name system case-preserving, thus allowing both case-sensitive and case-insensitive searches.

#14 -- again, avoid by simply allowing the name to be a single opaque field.

#15, 16, 17 -- if your name supports random unicode, no idea why these would be a problem.

#18 -- not sure why it matters.

#19, 20 -- again, if it's just arbitrary text, it just works.

#21, 22, 23 -- not sure how I'd make that assumption.

#24, 25, 26, 27 -- again, the name is just an opaque bunch of characters.

#28 -- what?

#29 -- opaque characters.

#30 -- keep the original text as-is. If you want to try to split people out by naming scheme, do it later, but keep the original. This should be a "duh" concept -- always preserve the original user input. Cache transformations for speed, if you like, but they're a cache -- keep the original. Your algorithm might change.

#31 -- bad idea to assume bad words won't cause problems in general. I currently play an MMO in which I physically can't talk about Emily Dickinson, and have occasion to more frequently than you might suspect.

#32-36 -- why would it matter? Unless...

#37 -- Fine, but how would I otherwise connect the same person?

#38 -- How about unicode-equivalent? And of course, they might not -- one might make a mistake, or the name might be represented differently. But you'd have to deal with typos anyway, so this isn't exactly shocking.

#39 -- I'm going to have to agree with the assumption, though. If I develop a system which works well for people who only follow the US standard, and I suddenly have a ton of people from China wanting to use my service -- enough that this is actually a problem for me -- that's a nice problem to have.

#40 -- People can make up names. I guess this explains #32-36, though.
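
A rough sketch of the "single opaque field, case-preserving, always keep the original" idea from the points above (#12, 13, #30 and #38 in particular). The names `NameRecord` and `search_key` are hypothetical, and NFC plus casefold is just one reasonable choice of derived key, not the only one; the derived key is treated as a cache that can be rebuilt if the algorithm changes.

```python
import unicodedata
from dataclasses import dataclass, field

def search_key(name: str) -> str:
    """Derived key for matching only: NFC-normalize, then casefold."""
    return unicodedata.normalize("NFC", name).casefold()

@dataclass
class NameRecord:
    original: str                  # exactly what the user typed, never modified
    key: str = field(init=False)   # cached transformation, safe to recompute

    def __post_init__(self):
        self.key = search_key(self.original)

# Same name entered two ways: decomposed accent vs. precomposed, different case.
a = NameRecord("Jose\u0301")
b = NameRecord("JOS\u00c9")

print(a.original == b.original)  # the stored text differs
print(a.key == b.key)            # the derived keys match
```

Display and round-tripping use `original`; searching and duplicate detection use `key`.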

The sense I get is that half the list is stuff you'd almost have to be stupid to run into (seriously, who doesn't use Unicode?), and the other half involves some seriously weird names and cultures that are going to have to meet me halfway, if they expect me to do anything interesting with their name. As I understand it, the only way to get this right would be to allow people to have zero or more names, each of which is either an unlimited amount of text in any encoding, or an image (raster or vector) of unlimited size. To query such a system requires insane amounts of logic just to deal with the text, and throw in some OCR for good measure.

I think this is a case where I would much rather see people evolve to match the technology, rather than the other way around. When you grow up and enter the world of the Internet, pick one short, catchy, globally-unique identifier for yourself. If you speak English, you might even consider forming it out of purely ASCII characters. I'm being pragmatic -- it would be very nice if computer systems could be more internationalized, and I plan to do it to the extent I reasonably can, but... meet me halfway. I'm not going to store your name as a bitmap just because it's only valid when you draw a smiley at the end -- especially when the Unicode smiley exists anyway.
