
Mr. Pike, Tear Down This ASCII Wall!

samzenpus posted more than 3 years ago | from the fresh-design dept.

Programming 728

theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"


The thing with ASCII (5, Insightful)

enec (1922548) | more than 3 years ago | (#34083854)

The thing with ASCII is that it's easy to write on standard keyboards, and does not require a specialized layout. Once someone can cram the necessary unicode symbols into a keyboard so that I don't have to remember arcane meta-codes or fiddle with pressing five different dead keys to get one symbol, I'm all for it.

Re:The thing with ASCII (1)

mail2345 (1201389) | more than 3 years ago | (#34083954)

Different keyboard manufacturers have a selected range of keys that their keyboards can type, which will probably get confusing:
"Only hp unicode keyboards have the print symbol. For other unicode keyboards or ASCII keyboards, type alt-8713 to get the print symbol."

Re:The thing with ASCII (-1, Troll)

user10 (1932300) | more than 3 years ago | (#34084282)

I once saw a keyboard [bit.ly] that has a dedicated keypad for Unicode numbers

Re:The thing with ASCII (0)

Anonymous Coward | more than 3 years ago | (#34084364)

I don't see any keys.
Just a huge, gaping hole.

Re:The thing with ASCII (4, Informative)

angus77 (1520151) | more than 3 years ago | (#34083960)

Japanese is typed using a more-or-less standard QWERTY keyboard.

Re:The thing with ASCII (5, Informative)

MichaelSmith (789609) | more than 3 years ago | (#34084050)

Japanese is typed using a more-or-less standard QWERTY keyboard.

Tediously.

Would it be less tedious to have 10,000+ keys? (5, Insightful)

Sycraft-fu (314770) | more than 3 years ago | (#34084312)

Because that's what you find in JIS X 0213:2000. Even if you simplify it to just what is needed for basic literacy, you are talking 2,000 characters. If you have that many characters your choices are either a lot of keys, a lot of modifier keys, or some kind of transliteration, which is what is done now. There is just no way around this. You cannot have a language that is composed of a ton of glyphs yet also have an extremely simple, small entry system.

You can have a simple system with few characters, like we do now, but you have to enter multiple ones to specify the glyph you want. You could have a direct entry system where one keypress is one glyph, but you'd need a massive number of keys. You could have a system with a small number of keys and a ton of modifier keys, but then you have to remember what modifier, or modifier combination, gives what. There is no easy, small, direct system; there cannot be.

Also, is it any more tedious than any Latin/Germanic language that only uses a small character set? While you may enter more characters than final glyphs, do you enter more characters than you would to express the same idea in French or English?

Re:The thing with ASCII (1)

Firethorn (177587) | more than 3 years ago | (#34084068)

True, but I remember reading that it was complex enough that many reporters preferred to dictate to a voice recognition system rather than try to type their stories in.

It seemed to work a lot like predictive keystrokes on a cellphone.

I have no real problems with allowing Unicode in programming, but I'd see it mostly being used in defining strings and naming variables, and even then you'd probably want to restrict the character set simply because so many of the symbols look so similar, yet are so different code wise.

Sure, with Unicode you could probably make every function a single character, but human minds aren't really wired to recognize that. Sure, Chinese and Japanese do the 'one word, one character' thing, but they also end up with something like 3 character sets and a substantial amount of additional work learning said characters.

Re:The thing with ASCII (1)

jonbryce (703250) | more than 3 years ago | (#34084228)

Japanese characters are mostly sound-based rather than meaning-based, though a single Japanese character will generally map to two latin characters.

Re:The thing with ASCII (0)

Anonymous Coward | more than 3 years ago | (#34084286)

Japanese characters are mostly sound-based rather than meaning-based

Only for some values of 'character'.

Re:The thing with ASCII (5, Informative)

Kagetsuki (1620613) | more than 3 years ago | (#34084300)

I'm Japanese, so let me clarify how entering Japanese works here: Japanese is composed of two sets of kana (characters with no meaning, only a sound) and kanji (characters with meaning). To enter a word in Japanese, let's say the word "me/I", you would hit a key to activate your IME [input method editor] - usually the key on the top left of the keyboard - then type "watashi", just like that, and you would get わたし in kana (hiragana). Next hit the space key, which converts it to kanji. Now hit enter to finish input or just start typing your next word. You can also enter multiple words, hit space, and then break up and convert the sentence all at once. It is not difficult, you don't actually need a special keyboard, and I've never heard of anybody capable of using a keyboard resorting to voice recognition because they found entering words laborious.
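For readers unfamiliar with that flow, here is a toy-sized sketch of it in Python; real IMEs use large dictionaries and context-sensitive ranking, and the two tiny tables below are hypothetical stand-ins for illustration only:

# Toy illustration of the IME flow described above: romaji -> kana -> kanji.
ROMAJI_TO_KANA = {"watashi": "わたし", "neko": "ねこ"}
KANA_TO_KANJI = {"わたし": "私", "ねこ": "猫"}

def ime_convert(romaji):
    """Simulate typing romaji, then pressing space to convert the kana to kanji."""
    kana = ROMAJI_TO_KANA.get(romaji, romaji)   # the first keystrokes produce kana
    return KANA_TO_KANJI.get(kana, kana)        # the space key converts kana to kanji

print(ime_convert("watashi"))  # 私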

Re:The thing with ASCII (0)

Anonymous Coward | more than 3 years ago | (#34084120)

And it takes 12 of them to do it. One to dictate and 11 to type.

Re:The thing with ASCII (-1, Troll)

user10 (1932300) | more than 3 years ago | (#34083994)

If keyboards like the Optimus keyboards [wikipedia.org] weren't so expensive, it might be possible to code with many more symbols (not necessarily in languages other than English). This is an impressive video of Optimus keyboard usage [dyndns.info] for quick typing.

Re:The thing with ASCII (0)

thenextstevejobs (1586847) | more than 3 years ago | (#34084024)

The thing with ASCII is that it's easy to write on standard keyboards.

Why should the notations which we use to express our programs be limited to 'standard keyboards'?

I'm sure there could be decent schemes for writing alternate symbols with meta-keys and such. Learn a new keyboard layout, it won't kill you. Reminds me of folks refusing to learn a language other than C++/Java/whatever because they are afraid it'll cause them some irreparable mental damage.

For example, I'd love to use standard logic symbols to express statements in my day-to-day coding, why not? Well, because I'm writing C/Ruby. But hey, I'd like to see them available as an alternative perhaps, not required?

Shooting this down because the keyboard we're all using in 2010 doesn't accommodate it well doesn't seem like the best way forward to me. Seems like the whole Ford 'faster horse' sort of thing. Take a longer view. Think about the possibilities. Maybe there's some cool things this would open up.

I don't think lines of code are taking up storage such that we'd have any trouble moving to UTF-8, 16, or any other longer format than ASCII.

Re:The thing with ASCII (1)

fwarren (579763) | more than 3 years ago | (#34084202)

Perhaps you like the idea of ColorForth http://www.colorforth.com/ [colorforth.com]

Color is used to denote different states. The equivalent in C would be where includes are in RED, and functions are in blue while their parameters are in green and some {} are no longer needed because of the color coding.

Mind you, it totally sucks if you are color blind. But you are able to create significantly terser code because of the amount of syntax that is represented by color.

Re:The thing with ASCII (5, Insightful)

arth1 (260657) | more than 3 years ago | (#34084052)

Once you've had to do an ad-hoc codefix through a serial console or telnet, you appreciate that you can write the code in 7-bit ASCII.

It's not about being conservative. It's about being compatible. Compatibility is not a bad thing, even if it means you have to run your unicode text through a filter to embed it, or store it in external files or databases.

It'd also be hell to do code review on Unicode programs. You can't tell many of the symbols apart. Is that a hyphen or a soft hyphen at the end of that line? Or perhaps a minus? And is that a diameter sign, a zero, or the Danish/Norwegian letter "Ø" over there? Why doesn't that multiplication work? Oh, someone used an asterisk instead of the multiplication symbol, which looks the same in this font.

No, thanks, keep it compatible, and parseable by humans, please.
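As a rough illustration of the review problem described above, a small Python sketch (not any standard tool) that flags non-ASCII characters in a line of source and names them, so a soft hyphen or a real minus sign can't hide behind an innocent-looking hyphen:

import unicodedata

def flag_suspicious(line):
    """Report every non-ASCII character with its codepoint and Unicode name."""
    for col, ch in enumerate(line):
        if ord(ch) > 127:
            print("col %d: U+%04X %s" % (col, ord(ch), unicodedata.name(ch, "<unnamed>")))

flag_suspicious("total = price \u2212 discount\u00ad")
# col 14: U+2212 MINUS SIGN
# col 24: U+00AD SOFT HYPHEN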

Re:The thing with ASCII (0)

offsides (1297547) | more than 3 years ago | (#34084352)

The original article talks about "write-only languages." I see the proposal to allow Unicode source as creating "read-only languages" - hard to write, impossible to debug, but fairly easy for someone to read, even if they're not a programmer. This proposal isn't about giving programmers more power to code, it's about making it easier for non-english speakers who aren't coders to read the code that their programmers write.

Real programmers understand the fundamental limitations of a parser/compiler, as well as the need for a consistent set of reserved words and symbols...

Re:The thing with ASCII (0)

Anonymous Coward | more than 3 years ago | (#34084226)

Non-ASCII characters won't be possible to type easily, no, but IDEs can assist: e.g. converting <= to the math symbol for LE, or != to the math symbol for NE. The previous example would increase readability, but it would also kill the "write anywhere" principle.

I know this post will incite a lot of negative responses saying things like "That idea blows!" or "I'll sacrifice my girlfriend before programming in that language!" Let me say that I'm only mentioning a possibility, not recommending the idea as a good or clever one.

Re:The thing with ASCII (0)

Anonymous Coward | more than 3 years ago | (#34084348)

The thing with ASCII is that it's easy to write on standard keyboards, and does not require a specialized layout.

Define "standard keyboard".

You do realize that the keyboards in the US (and English Canada) are different than those in France, are different than those in Germany, are different than those in Japan, are different than those in China, are different than those in....

Project Gutenberg (5, Insightful)

symbolset (646467) | more than 3 years ago | (#34083864)

Michael decided to use this huge amount of computer time to search the public domain books that were stored in our libraries, and to digitize these books. He also decided to store the electronic texts (eTexts) in the simplest way, using the plain text format called Plain Vanilla ASCII, so they can be read easily by any machine, operating system or software.

- Marie Lebert [etudes-francaises.net]

Since its humble beginnings in 1971 Project Gutenberg has reproduced and distributed thousands of works [gutenbergnews.org] to millions of people in - ultimately - billions of copies. They support ePub now and simple HTML, as well as robo-read audio files, but the one format that has been stable this whole time has been ASCII. It's also the format that is likely to survive the longest without change. Project Gutenberg texts can now be read on every e-reader, smartphone, tablet and PC.

If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

Re:Project Gutenberg (5, Insightful)

shutdown -p now (807394) | more than 3 years ago | (#34083916)

If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

This is a false dichotomy. Plain text can be non-ASCII, and ASCII doesn't necessarily imply plain text. All the formats you've listed allow you to add either visual or semantic markup to text, whereas ASCII is simply a way to encode individual characters from a certain specific set. The proposal is not to move to rich text for coding, but to move away from ASCII.

There are still many reasonable arguments against it, but this isn't one of them.

Re:Project Gutenberg (1)

snowgirl (978879) | more than 3 years ago | (#34084026)

They do not propose to move to rich text for coding, but to move away from ASCII.

This is a bit of a false dichotomy as well. An ASCII-7 text is identical to the UTF-8 encoding of the same text.
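A quick Python check makes the point concrete: for pure-ASCII text, the ASCII and UTF-8 encodings are byte-for-byte identical, so "moving away from ASCII" costs existing ASCII sources nothing:

text = "int main(void) { return 0; }"
# Pure-ASCII text encodes to the same bytes either way.
assert text.encode("ascii") == text.encode("utf-8")

# Only characters outside ASCII get multi-byte UTF-8 sequences.
print("Ω₀".encode("utf-8"))  # b'\xce\xa9\xe2\x82\x80'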

There are a few issues with Unicode, in that CJK characters are lumped together by semantics, while LGC (Latin/Greek/Cyrillic) characters are not. Thus, while simplified Chinese, traditional Chinese, and Japanese may all write the same "character" differently, they are all represented by the same codepoint, while a letter like "o", despite looking identical and being pronounced much the same across most Latin-based written languages and Cyrillic, is written with different codepoints.

Either way, Perl, for instance, supports code written in UTF-8, which is awesome, and it's fairly Unicode-agnostic about everything. So being able to code using variable names written in your own language, rather than transliterating them into Latin characters, is a huge benefit... but ultimately only a minor factor in programming.

The matter still remains that programming languages are heavily dependent upon English for keywords and such, and as a result, are heavily dependent upon some representation thereof.

But all of this ignores the matter that ASCII is a subset of Unicode anyways... so why be so dorky about "zomg, get rid of ASCII!!!!" it's retarded...

Re:Project Gutenberg (5, Informative)

Netbrian (568185) | more than 3 years ago | (#34084284)

This is untrue.

First off, Simplified and Traditional characters are separated in Unicode.

Second off, Cyrillic characters and Latin characters have always been considered two different scripts, while Chinese logographs are considered to be the same script, used in different contexts.

See http://unicode.org/notes/tn26/ [unicode.org].

In any event, it would make good sense for programming environments to be able to handle Unicode source.

huh (3, Insightful)

stoolpigeon (454276) | more than 3 years ago | (#34083866)

so we should start coding in Chinese?

Seems easier to spell words with a small set of symbols than to learn a new symbol for every item in a huge set of terms.

Re:huh (4, Insightful)

MightyYar (622222) | more than 3 years ago | (#34084010)

so we should start coding in Chinese?

Exactly! Keep the "alphabet" small, but the possible combination of "words" infinite.

You don't need a glyph for "=>" for instance. Anyone who knows what = and > mean individually can discern the meaning.

And further (I know, why RTFA?):

But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

This is easily done with a split screen, and sounds like an editor feature to me. Not sure why you'd want a programming language that was tied to monitor size and aspect ratio.

Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

Again, if you want this, do it in the editor. Doesn't he know anyone who is colorblind? And even a normally sighted user can only differentiate so many color choices, which would limit the language. And forget looking up things on Google: "Meaning of green highlighted code"... no wait "Meaning of hunter-green highlighted code" hmmmm... "Meaning of light-green highlighted code"... you get the idea.

Re:huh (2, Interesting)

jonbryce (703250) | more than 3 years ago | (#34084238)

No, but I think the idea of being able to draw flowcharts on the screen and attach code to each of the boxes could be an idea that has mileage.

Re:huh (4, Interesting)

CensorshipDonkey (1108755) | more than 3 years ago | (#34084366)

Have you ever used a visual diagrammatic code language before, such as LabView? Every scientist I've ever met that had any experience writing code vastly prefers the C based LabWindows to the diagrammatic LabView - diagrammatic is simply a fucking pain in the ass. Reading someone else's program is an exercise in pain, and they are impossible to debug. Black and white, unambiguous plain text coding may not be pretty to look at but it is damn functional. Coding requires expressing yourself in an explicitly clear fashion, and that's what the current languages offer.

Re:huh (1)

Kjella (173770) | more than 3 years ago | (#34084322)

Let me give an example: in Norwegian, year = år. That means that for a billing system it might be entirely reasonable to have classes related to financial years (finansår), close of year (årsavslutning), the tax report (årsoppgave) and so on. In practice everybody sticks to A-Z, but it's a system limitation, not a natural one.
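Some languages already permit this today. For instance, Python 3 accepts non-ASCII identifiers (PEP 3131), so the Norwegian terms above can be spelled directly; this is a minimal sketch, not code from any real billing system:

# Domain terms keep their Norwegian spelling instead of being transliterated to A-Z.
class Finansår:
    def __init__(self, år):
        self.år = år

    def årsavslutning(self):
        return "Close of year %d" % self.år

print(Finansår(2010).årsavslutning())  # Close of year 2010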

Learn2code (4, Insightful)

santax (1541065) | more than 3 years ago | (#34083870)

I can express my intentions just fine with ASCII. They have cunningly invented a system for that. It's called language and it comes in very handy. The only thing I would consider missing is a pile-of-shit character. I could use that one right now.

Re:Learn2code (0)

Anonymous Coward | more than 3 years ago | (#34083980)

A pile of shit character is available in Unicode 6.0.

Re:Learn2code (2, Informative)

santax (1541065) | more than 3 years ago | (#34084048)

Oh crap... I guess you can forget about my earlier comment. I'm adopting unicode as we speak! U+1F4A9 ftw.

Re:Learn2code (0)

Anonymous Coward | more than 3 years ago | (#34084320)

Surely you mean U+2668, #9823;

Re:Learn2code (5, Funny)

Noughmad (1044096) | more than 3 years ago | (#34084336)

I don't know about you, but I have a pile-of-shit key on my keyboard, right between the left Ctrl and Alt.

Yes, Unicode is "the new black" (2, Informative)

Antique Geekmeister (740220) | more than 3 years ago | (#34083882)

Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else, and given the ability of major vendors like Microsoft to get Unicode _wrong_, it's not stable for mission-critical applications. If you want your code to remain parseable and cross-platform compatible and stable in both large and small tools, write it in flat, 7-bit ASCII. You also get a significant performance benefit from avoiding the testing and decoding and localization and most especially the _testing_ costs for multiple regions.

Look up "microsoft unicode error" on Google for hundreds if not thousands of examples. ASCII for code is like flat text for email. It assures that you're not simply publishing coding spam, and actually wrote what you meant.

Re:Yes, Unicode is "the new black" (2, Insightful)

shutdown -p now (807394) | more than 3 years ago | (#34083958)

Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else

Unicode has been around for, what, over 15 years now? It's part of countless specifications from W3C and ISO. All modern OSes and DEs (Windows, OS X, KDE, Gnome) use one or another encoding of Unicode as the default representation for strings. No, it's not going away anytime soon.

If you want your code to remain parseable and cross-platform compatible and stable in both large and small tools, write it in flat, 7-bit ASCII.

This may be a piece of good advice. Even for languages where Unicode in the source is officially allowed by the spec (e.g. Java or C#), many third-party tools are broken in that regard.

You also get a significant performance benefit from avoiding the testing and decoding and localization and most especially the _testing_ costs for multiple regions.

I don't see how this has any relevance to your previous point (writing the source code in ASCII). If your app source is in Unicode, it will still compile (or not compile) the same in any locale. And what would you be testing? The compiler?

I've no idea what "decoding and localization" means in this context, either.

Well, unless you're also advocating for the use of ASCII as the default runtime string encoding in apps, and completely forgoing localization. Which is fine if you only intend your app to be used in the USA, I guess (and even then, considering take-up of Spanish, it may not be such a wise idea).

Re:Yes, Unicode is "the new black" (3, Insightful)

scdeimos (632778) | more than 3 years ago | (#34084382)

Unicode has been around for, what, over 15 years now? It's part of countless specifications from W3C and ISO. All modern OSes and DEs (Windows, OS X, KDE, Gnome) use one or another encoding of Unicode as the default representation for strings. No, it's not going away anytime soon.

And yet major vendors like Microsoft still get Unicode wrong. A couple of examples:

  • Windows Find/Search cannot find matches in Unicode text files, surely one of the simplest file formats of all, even though the command line FIND tool can (unless you install/enable Windows Indexing Service which then cripples the system with its stupid default indexing policies). This has been broken since Windows NT 4.0.
  • Microsoft Excel cannot open Unicode CSV and tab-delimited files automatically (i.e.: by drag-and-drop or double-click from Explorer) - you have to go through Excel's File/Open menu and go through the stupid import wizard.
  • Abuse of Unicode code points by various Office apps, causing interoperability issues even amongst themselves.

Re:Yes, Unicode is "the new black" (2, Insightful)

Anonymous Coward | more than 3 years ago | (#34084104)

"Yes, it's the next fad that just _everyone_ has to wear. this season."

Like the Metric System.

Re:Yes, Unicode is "the new black" (1)

petermgreen (876956) | more than 3 years ago | (#34084146)

and most especially the _testing_ costs for multiple regions.
Heh, you still need to test on multiple language versions of your OS even if all your text is 7-bit ASCII. For example, you need to figure out where you will be using the local conventions for decimal separators and where you will be using the dot, and make sure you use the right conversion routines in the right place. Failure to do this will lead to software that works fine on English systems but may break on continental European ones.
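A small Python sketch of the region-dependent behaviour being described; the locale name "de_DE.UTF-8" is an assumption (spellings vary by platform) and the locale must be installed for setlocale() to succeed:

import locale

# Under a German locale the comma is the decimal separator.
locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
print(locale.atof("1.234,56"))             # 1234.56
print(locale.format_string("%.2f", 3.5))   # 3,50

# Under the default C locale the same input is not a valid number at all.
locale.setlocale(locale.LC_NUMERIC, "C")
try:
    locale.atof("1.234,56")
except ValueError:
    print("works on English systems, breaks on continental European ones")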

We've tried this before (4, Informative)

FeatherBoa (469218) | more than 3 years ago | (#34083894)

Everyone who tried to do something useful in APL, put up your hand.

Re:We've tried this before (1)

MichaelSmith (789609) | more than 3 years ago | (#34084082)

Everyone who tried to do something useful in APL, put up your hand.

I never had access to the right keyboard.

Re:We've tried this before (0)

Anonymous Coward | more than 3 years ago | (#34084328)

I did, but it was inevitably a waste of time. I get the feeling this guy is clueless about the reality of Unicode. ASCII was invented out of necessity based on the lessons that Unicode would have taught us in the long run, basically solving the problems before they were actual problems. This [wikimedia.org] is simply amazing. Now stop wasting my time and get off my lawn.

Re:We've tried this before (4, Interesting)

SimonInOz (579741) | more than 3 years ago | (#34084368)

Incredibly, I worked for a major investment company that had, indeed, done something useful in APL. In fact they had written their entire set of analysis routines in it, and deeply interwoven it with SQL. I had to untangle it all. (Would you believe they had 6-page SQL stored procedures? No, nor did I - but they did.)
APL is great sometimes - especially if you happen to be a maths whizz and good at weird scripts. Not exactly easy to debug, though. Sort of a write-only language.

For the last ten-plus years, we have been steadily moving in the direction of more human-readable data - the move to XML was supposed to be a huge improvement. It meant you could - sort of - read what was going on at every level. It also meant we had a common interchange between multiple platforms.

So you want to chuck all that away to get better symbols for programming? No, I don't think so.
I must point out that the entire canon of English literature is written in - surprise - English, and that's definitely ASCII text. I don't think it has suffered due to lack of expressive capability.

What does surprise me, though, is how fundamentally weak our editors are. Programs, to me, are a collection of parts - objects, methods, etc., all with internal structure. We seem very poor at further abstracting that - why, oh tell me why, when I write a simple - trivial - bit of Java code, do I need to write functions for getters and setters all over the place - dammit, just declare them as gettable and settable - or (to keep full source code compatibility) the editor could do it. Simply, easily, transparently. And why can't the editor hide everything except what I am concerned with?
Microsoft does a better job of this in C#, but we could go much, much further. We seem stuck in the third-generation language paradigm.

If you can't express yourself in ASCII... (4, Funny)

MaggieL (10193) | more than 3 years ago | (#34083914)

...the character set isn't the problem.

And I say this as an old APL coder.

(There aren't many new APL coders.)

Already proposed... on C++ (1)

kikito (971480) | more than 3 years ago | (#34083920)

And more than 10 years ago, in Bjarne Stroustrup's "Generalizing Overloading for C++2000". The PDF can be downloaded here:

www2.research.att.com/~bs/whitespace98.pdf

Pages 4-5 delve into this.

It was also a joke paper. Like I hope this article is.

Examples? (1)

Waccoon (1186667) | more than 3 years ago | (#34083928)

So, what are his ideas?

Re:Examples? (2, Interesting)

izomiac (815208) | more than 3 years ago | (#34084292)

From TFA, apparently he wants to be able to use Ω (Omega) to name a variable, and ÷ (Division Sign) as an operator. My interpretation of his opinion is that a descriptive name for a variable is inferior to using Greek letters, and that using mathematical operators that take an extra five or so keystrokes is superior to the standard +-*/^ set that people have become accustomed to.

IMHO, if you use more than 26 single letter variables something is seriously wrong, and trying to make mathematical formulas pretty in code isn't practical without a whole lot of unneeded complexity. Sure, having an eight line formula with fractions within fractions and tiny exponent numbers might be (slightly) better than five layers of parenthesis, but you aren't going to get that with just unicode (AFAIK), and the pain of dealing with a slightly misplaced term confounding the unicode to math converter isn't one I'd like to experience. Unicode or even LaTeX code for comments might be useful though.
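To make the keystroke objection concrete, a small Python illustration of what "use Ω₀ instead of OmegaZero" actually involves - the glyphs have to be produced by name lookup, codepoint, or a character picker, none of which beats typing a descriptive ASCII name:

import unicodedata

omega = unicodedata.lookup("GREEK CAPITAL LETTER OMEGA")   # 'Ω'
zero_sub = "\N{SUBSCRIPT ZERO}"                            # '₀'
print(omega + zero_sub)          # Ω₀
print("U+%04X" % ord(omega))     # U+03A9

# Whereas the ASCII spelling is just... typed.
OmegaZero = 0.0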

It all winds up as binary anyway. (4, Funny)

foodnugget (663749) | more than 3 years ago | (#34083940)

How silly of us to be compiling to binary all this time!
We've been relegating ourselves to only two different options for decades!

I reckon that a memory cell and single bit of a processor opcode should have --at least-- 7000 different possibilities. Think of everything a computer could accomplish *then*!

Seriously, someone tell this guy you're allowed to use more than one character to represent a concept or action, and that these groups of characters represent things rather well.

Re:It all winds up as binary anyway. (1)

hedwards (940851) | more than 3 years ago | (#34084170)

It does, however the comments aren't. I'm not sure how useful this is since you still need to use ASCII characters for programming.

It ain't broke! (5, Insightful)

webbiedave (1631473) | more than 3 years ago | (#34083944)

Let's take our precious time on this planet to fix what's broken, not break what has clearly worked.

vim-cute-python syntax config (0)

Anonymous Coward | more than 3 years ago | (#34083950)

On serious note, this article reminded me of this project I saw the other day: http://github.com/ehamberg/vim-cute-python [github.com]. It makes vim show various Unicode characters for Python keywords, such as "alpha" and "not".

Kinda neat :)

Not only no, (4, Funny)

Anonymous Coward | more than 3 years ago | (#34083962)

but fuck no.
I eagerly await comments saying how anglo-centric, racist, bigoted, culturally-imperialist the insistence of using ASCII is.
The nuanced indignation is salve for my frantic masturbation.
(If my post is the only one that mentions this, all the better)

limiting? (2, Insightful)

Tei (520358) | more than 3 years ago | (#34083968)

The Chinese have problems learning their own language because it has all those signs; it makes it unnecessarily complex.

26 letters let you write anything; you don't need more letters, really. Ask any novelist.

Also, programming languages are something international, and not all keyboards have all keys. Even keys like { or } are not on all keyboards, so trying to use funny characters like ñ would make programming really hard for some people.

All in all, this is not a very smart idea, IMHO.

Re:limiting? (3, Interesting)

Sycraft-fu (314770) | more than 3 years ago | (#34084116)

For that matter, we could probably even get away with fewer letters. Some of them are redundant when you get down to it. What you need are enough letters that you can easily denote all the different sounds that are valid in a language. You don't have to have a dedicated letter for all of them either; it can be through combination (for example the oo in soothe) or through context sensitivity (such as the o in some in context with the e on the end). We could probably knock off a few characters if we tried. Whether that is worth it or not I don't know, but we sure as hell shouldn't be looking at adding MORE.

Also, in terms of programming, a big problem is that of ambiguity. Compilers can't handle it; their syntax and grammar are rigidly defined, as they must be. That's the reason we have programming languages rather than simply programming in a natural language: natural language is too imprecise, and a computer cannot parse it. We need a more rigidly defined language.

Well as applied to unicode programming that means that languages are going to get way more complex if you want to provide an "English" version of C and then a "Chinese" version and a "French" version and so on where the commands, and possibly the grammar, differ slightly. It would get complex probably to the point of impossibility if you then want them to be able to be blended, where you could use different ones in the same function, or maybe on the same line.

Re:limiting? (1)

hedwards (940851) | more than 3 years ago | (#34084192)

That's what I'm wondering about: are there any languages which use Unicode for any of the actual language constructs? Because without languages needing the extra Unicode characters for actual programming, this stuff doesn't appear to really make a difference beyond comments.

Re:limiting? (1)

Jeff DeMaagd (2015) | more than 3 years ago | (#34084360)

In my opinion, Chinese isn't really so bad, though it understandably looks intimidating to the uninitiated. There are problems, in my opinion the biggest is using a keyboard paradigm designed around Latin languages, but the rest of it is about trade-offs. There are a lot of problems learning English too. Witness how many people take a dozen years of English classes and can't articulate themselves halfway decently. English is an amalgam of three or four languages, plus a ridiculous number of loan words, and then there are all the idioms.

Anyways, the numerous characters may seem daunting, but there's a method to the madness, it's often possible to derive the meaning and pronunciation of a character based on its sub-glyphs. I don't pretend to have that term right, it's been a while since I covered it. I don't know how they handle the character input into computers though.

Re:limiting? (1)

nameer (706715) | more than 3 years ago | (#34084392)

Twenty six letters, sure. But twenty six glyphs? Far from it. Along with all of the punctuation (the obvious addition) there are ligatures, italics, bold, caps, small caps, etc. Authors use all of these tools to express complex ideas clearly when twenty six letters isn't enough.

Fisher-Price programming? (0)

Anonymous Coward | more than 3 years ago | (#34083972)

Yes, I want all the keys on my expensive LCD screen keyboard to look like it came straight from Fisher-Price just so I can do some programming.

Now, where's the :rolleyes: ascii character on this traditional keyboard...

lame (0, Offtopic)

Anonymous Coward | more than 3 years ago | (#34083978)

This is lame. If you can't program using just the keyboard in front of you, GTFO

This is nonsense (4, Insightful)

Kohath (38547) | more than 3 years ago | (#34083988)

Programming languages usually have too much syntax and too much expressiveness, not too little. We don't need them to be even more cryptic and even more laden with hidden pitfalls for someone who is new, or imperfectly vigilant, or just makes a mistake.

If anything, programming needs to be less specific. Tell the system what you're trying to do and let the tools write the code and optimize it for your architecture.

We don't need longer character sets. We don't need more programming languages or more language features. We need more productive tools, software that adapts to multithreaded operation and GPU-like processors, tools that prevent mistakes and security bugs, and ways to express software behavior that are straightforward enough to actually be self-documenting or easily explained fully with short comments.

Focusing on improving programming languages is rearranging the deck chairs.

Re:This is nonsense (2, Interesting)

Twinbee (767046) | more than 3 years ago | (#34084096)

One day, I think we'll have a universal language that everyone uses (yeah English would suit me, but I don't care as long as whatever language it is, everyone uses it). Efficiency would rocket through the roof, and hence we'll save billions or trillions of pounds.

In the same way, we'll all be using a single programming language too (even if that language combines more than one paradigm). Yes competition is good in the mean time, but I mean ultimately. It'll be as fast as C or machine code, but as readable as a much higher level language. It won't have baggage such as headers or be unnecessarily verbose either.

Until that point, we need to do a lot more to improve languages, and it won't just be deckchair arranging.

Program vs. Literature (1)

oldhack (1037484) | more than 3 years ago | (#34083998)

We like economy and precision in programming languages. You may have many complaints about English, but it's pretty damn good common language due to its slutty tendency - it soaks in whatever useful from other languages.

In general, I don't want poetry in coding. I definitely don't want Egyptian glyphs or Chinese ideograms.

Two words: Perl 6 (1)

Etcetera (14711) | more than 3 years ago | (#34084000)

I like how he mentions Perl, but completely neglects to mention Perl 6 [perl6.org].

One of the most derided or most lauded features (depending on your POV) in Perl 6 is the copious use of additional syntax operators in the interests of further Huffman coding. There are certain operators (for example, the "hyper" operators [perl6.nl]) that are defined in terms of Unicode symbols ("»") and use ASCII digraphs as an alternate form (">>").

So, it's there now in a mostly stable form... you can program in unicode-laced form all you like at this point.

Re:Two words: Perl 5 (1)

Krishnoid (984597) | more than 3 years ago | (#34084136)

Perl 5.8 and above have native Unicode string and I/O support, per the first chapter [safaribooksonline.com] of the most current rev of the Perl Cookbook, and you can use utf8 [perl.org] as well to write your scripts in Unicode.

No we don't (4, Informative)

Sycraft-fu (314770) | more than 3 years ago | (#34084022)

Because I don't want to have to own a 2000 key keyboard, or alternatively learn a shitload of special key combos to produce all sorts of symbols. The usefulness of ASCII, and just of the English/Germanic/Latin character set and Arabic numerals in general is that it is fairly small. You don't need many individual glyphs to represent what you are talking about. A normal 101 key keyboard is enough to type it out and have enough extra keys for controls that we need.

To see the real absurdity of it, apply the same logic to the numerals of the character set. Let's stop using Arabic numerals, let's use something more. Let's have special symbols to denote commonly used values (like 20, 25, 100, 1000). Let's have different number sets for different bases so that a 3 can be told what base it's in just by the way it looks! ...

Or maybe not. Maybe we should stick with the Arabic numerals. There's a reason they are so widely used: The Indians/Arabs got it right. It is simple, direct, and we can represent any number we need easily. Combining them with simple character indicators like H to indicate hex works just fine for base as well.

You might notice that even languages that don't use the English/ASCII character set tend to use keyboards that use it. Japanese and Chinese enter transliterated expressions that the computer then interprets as glyphs. It doesn't have to be that way; they could use different keyboards, some of them rather large depending on the character set being used, but they don't. It is easy and convenient to just use the smaller, widely used character set.

Now none of this means that you can't use Unicode in code, that strings can't be stored using it, that programs can't display it. Indeed most programs these days can handle it, just fine. However to start coding in it? To try and design languages to interpret it? To make things more complex for their own sake? Why?

I am just trying to figure out what he thinks would be gained here. Also remember that the programming languages, the compilers, would need to be changed at the low level. Compilers do not tolerate ambiguity; if a command is going to change from a string of ASCII characters to a single Unicode one, that has to be changed in the compiler, made clear in the language specs, and so on.

Unicode symbols in Code??? (1)

zAPPzAPP (1207370) | more than 3 years ago | (#34084044)

I don't get it.
When coding, I am already annoyed by the placement of on my keyboard, on a key that I don't reach easily (good thing I don't do HTML, hehe). Using lots of symbols that require me to do a two-key combination slows me down.
Now I'm supposed to use Unicode? Is that guy insane?
How am I supposed to type out unicode expressions on my keyboard, without typing in the whole 4 digit number?
And if I want to address a unicode-named variable, but I forgot the magical number to make it appear.. then what? Copy paste?

Must be a joke then, right.

What about Sun's Fortress language (4, Informative)

philgross (23409) | more than 3 years ago | (#34084046)

Sun's Fortress language allowed you to use real, LaTeX-formatted math as source code. They reasoned, correctly I think, that for the mathematically literate, this would make the programs far clearer. Google for Fortress Programming Language Tutorial.

The article's author's concerns are misdirected (0)

Anonymous Coward | more than 3 years ago | (#34084064)

While I agree that compatibility with ASR-33 should be tossed to the side, replacing ASCII alone isn't going to solve this problem. The article argues that language developers have had to squeeze reliable syntax out of a small character set, but this is a result of the problem, not the cause of it. Extensibility is the key. Where we are trapped is in syntax definition. As mentioned with C/C++ being unable to define custom operators. If a problem need be solved here (which, IMHO this isn't really a problem), then its solution is making every keyword and type user extensible. However, doing this sort of thing can and would have major repercussions across the business world. When types and basic math become a matter of contention things can get ugly really quick. We'd spend the first 5 years hoping that the market would pan out an interface library of common custom types.

But I digress. Sane localization of syntax via Unicode isn't too horrible an idea. A 1-to-1 translation of words should be fairly easy to implement without loss of meaning when re-localized. However, development is about logic, not necessarily math. While mathematics does define a whole slew of operators that we don't have the option of typing in one character, typing/reading their names works well for logic.


Full Disclosure: While I develop in many languages my day to day development is done in Visual Studio, and I'm therefore one of those bastards that's at least a bit spoiled by Intellisense.

Fortress allows Unicode, but has ASCII equivalent (3, Interesting)

thisisauniqueid (825395) | more than 3 years ago | (#34084074)

Fortress [wikipedia.org] allows you to code in UTF-8. However it has a multi-char ASCII equivalent for every Unicode mathematical symbol that you can use, so there is a bijective map between the Unicode and ASCII versions of the source, and you can view/edit in either. That is the only acceptable way to advocate using Unicode anywhere in programming source other than string constants. Programming languages that use ASCII have done well over those that don't, for the same reason that Unicode has done well over binary formats.
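A minimal Python sketch of the principle (not Fortress's actual table, just an illustration): as long as the ASCII/Unicode mapping is one-to-one, a tool can round-trip a source file between the two spellings losslessly. A real tool would tokenize first so string literals are left untouched:

# Illustrative subset of an ASCII<->Unicode operator table.
ASCII_TO_UNICODE = {"<=": "≤", ">=": "≥", "!=": "≠", "->": "→"}
UNICODE_TO_ASCII = {u: a for a, u in ASCII_TO_UNICODE.items()}

def to_unicode(src):
    for a, u in ASCII_TO_UNICODE.items():
        src = src.replace(a, u)
    return src

def to_ascii(src):
    for u, a in UNICODE_TO_ASCII.items():
        src = src.replace(u, a)
    return src

code = "if a <= b and b != c: go(a -> b)"
assert to_ascii(to_unicode(code)) == code   # round-trips because the map is bijective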

Re:Fortress allows Unicode, but has ASCII equivale (1)

thisisauniqueid (825395) | more than 3 years ago | (#34084098)

Sorry, meant to say "for the same reason XML [not Unicode] has done well over binary formats".

I don't see the problem and the problem is solved (1)

davidwr (791652) | more than 3 years ago | (#34084086)

Sure, strings and other items that can be seen on the screen would benefit from an expanded character set, but otherwise, why bother?

The only advantage I can think of is so that variable names, function names, and other user-defined non-display values can be in languages other than English or other Latin-letter languages. However, as English is currently the lingua franca of the technology world, encouraging fragmentation in this area is not a good idea.

Besides, nothing stops you from writing your code in Chinese or whatever other Unicode character set you want and using a preprocessor to convert it into ASCII before it hits the compiler or interpreter. The only "gotcha" is that there isn't a standardized way of doing the conversion, which can make it hard to link to binary libraries unless you use the same pre-processor.
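A toy Python sketch of the kind of preprocessor being described: it rewrites non-ASCII identifiers to deterministic ASCII names before the code reaches an ASCII-only toolchain. The u_XXXX mangling scheme is invented for this sketch, which is exactly the "no standardized way" gotcha mentioned above:

import re

def mangle(name):
    """Rewrite a non-ASCII identifier as a deterministic ASCII one."""
    if name.isascii():
        return name
    return "".join(c if c.isascii() else "u_%04x_" % ord(c) for c in name)

def preprocess(source):
    # \w matches Unicode word characters by default in Python 3.
    return re.sub(r"[^\W\d]\w*", lambda m: mangle(m.group(0)), source)

print(preprocess("finansår = år + 1"))
# finansu_00e5_r = u_00e5_r + 1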

Haskell (3, Interesting)

kshade (914666) | more than 3 years ago | (#34084114)

Haskell supports various Unicode characters as operators and it makes me want to puke. http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource [haskell.org] IMO one of the great things about programming nowadays is that you can use descriptive names without feeling bad. Single-character identifiers from different alphabets already rub me the wrong way in mathematics. Keep 'em out of my programming languages!

Bullshit from the article:

Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

OmegaZero is at least something everybody will recognize. And why would you name a variable like that anyway? It's programming, not math, use descriptive names.

But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

Because we're not using the same IDE?

And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

... what?

For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.

... WHAT? If you don't express your intentions clearly in a program it won't work!

And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.

vim does Unicode just fine. And from the Wikipedia entry on the author (http://en.wikipedia.org/wiki/Poul-Henning_Kamp):

A post by Poul-Henning is responsible for the widespread use of the term bikeshed colour to describe contentious but otherwise meaningless technical debates over trivialities in open source projects.

Irony? Why does this guy come off as an idiot who got annoyed by VB in this article when he clearly should know better?

It's Halloween ... (0)

Anonymous Coward | more than 3 years ago | (#34084122)

you're just trying to scare us ... right? ... right?

I really can't think of a lot of coding that would usefully be done with a more 'expressive' character set. The output of the code often has to be expressive but that isn't the same.

The most popular programming languages are Java, C, C++ http://langpop.com/ [langpop.com] They aren't popular because they are easy to use. They are used because they are effective. The innovative languages are well down the list.

You can read many reasons why the more innovative languages are better; in theory. C is either the most popular or second most popular language. There's a reason for that. Theory be damned.

Perl 6 (1)

jepaton (662235) | more than 3 years ago | (#34084132)

Perl 6 has guillemets in its standard syntax (equivalent to "<<" and ">>"). These are non-ASCII symbols. It will also be possible to declare new operators using whatever character you want (e.g. a snowman operator, see: http://perl6advent.wordpress.com/2009/12/17/day-17-making-snowmen/ [wordpress.com]).

Re:Perl 6 (2, Interesting)

russotto (537200) | more than 3 years ago | (#34084168)

Sure, but Perl is often derided as a "write only language", and Perl 6 is simply continuing the tradition.

On a related note... (0)

Anonymous Coward | more than 3 years ago | (#34084154)

We should all use trinary systems instead of binary!

PROGRAMMERS ARE CONSERVATIVE? (0, Troll)

WheelDweller (108946) | more than 3 years ago | (#34084178)

Most are pro-abortion, happy about socialist overlords, and think we're about to find another Earth just any day; which is great, since they all believe in the junk-science of ManMadeGlobalWarming(TM).

No, they're not Conservative, they're LAZY. Unicode isn't easy. ASCII is. It's actually that simple.

Author seems to be high or something (5, Insightful)

Tridus (79566) | more than 3 years ago | (#34084210)

He comes up with a bunch of ideas at the end that are out to lunch. Let's take a look:

Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

Well, let's think. Possibly because nobody knows what 0x03a9+0x2080 does without looking it up, and nobody seeing the character it produces would know how to type said character again without looking it up? I know consulting a wall-sized "how to type X" chart is the first thing I want to do every 3 lines of code.
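For the record, the codepoints being mocked do render to something; one line of Python shows what the article expects readers to recognise on sight:

# 0x03A9 is GREEK CAPITAL LETTER OMEGA, 0x2080 is SUBSCRIPT ZERO.
print(chr(0x03A9) + chr(0x2080))   # Ω₀ - the article's proposed spelling of "OmegaZero"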

While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space? But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

If you actually look at word processing programs, the document is also highly vertical. The horizontal stuff is stuff like notes, comments, revisions, and so on. Putting source code comments on the side might be a useful idea, but putting the code over there won't be unless the goal is to make it harder to read. (That said, widescreen monitors suck for programming.)

And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

So anybody who has some color-blindness (which is not a small number of people) can't understand your program? Or maybe we should make a red + do something different than a blue +? That's great once you do it six times, then it's just a mess. (Now if you want to have the code editor put protected regions on a framed light gray background, sure. But there's nothing wrong with sticking "protected" in front of it to define what it is.) It seems like he's trying to solve a problem that doesn't really exist by doing something that's a whole lot worse.

Artifacts (1)

H3xx (662833) | more than 3 years ago | (#34084220)

I'm also wondering why we insist that:

  • our source code be able to be wrapped to 78 characters
  • a tab (0x09) character is equal to 8 spaces (unless you specify otherwise)
  • most major programming languages have function names that are in US English, even Ruby, which was developed by the Japanese, and Scilab's [scilab.org] programming language which was developed by French scientists
  • POSIX regular expressions' [:alnum:] character class is most often written as [A-z0-9]

The truth is that we programmers prefer to be able to type things quickly without having to memorize character codes for a variety of Unicode characters; we want to be able to type simple variable and function names using a standard set of glyphs and not have to worry about remembering which variation of a Chinese pictograph was used.
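The [A-z0-9] habit from the list above is a nice reminder that even the small character set gets misused; a quick Python check (the sample string is made up) shows that the range A-z also spans six punctuation characters sitting between 'Z' and 'a' in codepoint order:

import re

# 'Z' (0x5A) and 'a' (0x61) are separated by [ \ ] ^ _ ` in ASCII,
# so the common [A-z] shorthand quietly matches those too.
print(re.findall(r"[A-z0-9]", "foo_bar^2 [ok]"))
# ['f', 'o', 'o', '_', 'b', 'a', 'r', '^', '2', '[', 'o', 'k', ']']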

If it comes down to it, we could all just use Ook [dangermouse.net] and not worry about language barriers (or getting much of anything done for that matter).

Ok, let's go Unicode then! (1)

Exitar (809068) | more than 3 years ago | (#34084242)

Now, what kind of marvelous and innovative language will the author of the article propose?

how about a character solely for escaping (1)

lulalala (1359891) | more than 3 years ago | (#34084290)

Though this is just a programmer's dream, I have always wished that we had a character solely for the purpose of escaping other characters. This would have a few benefits:
1. You wouldn't need to escape this escape character.
2. It would make it easier for different languages to escape things the same way. I wouldn't need to worry about a string that gets escaped in SQL, then ASP, then JavaScript.
3. Having a new escaping character shouldn't impact old code. It just gives the user another option.
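To illustrate point 2, a small Python sketch of how today's per-language escaping rules stack up when one value passes through several layers; the SQL and JSON/JavaScript conventions shown are the usual ones, but the pipeline itself is hypothetical:

import json

value = "O'Brien says \"hi\""

# Layer 1: SQL string literal - single quotes are doubled.
sql_literal = "'" + value.replace("'", "''") + "'"

# Layer 2: the SQL statement is embedded in JavaScript as a JSON string,
# which adds backslash escaping on top.
js_literal = json.dumps(sql_literal)

print(sql_literal)  # 'O''Brien says "hi"'
print(js_literal)   # "'O''Brien says \"hi\"'"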

Microsoft Visual Studio allows Unicode identifiers (1)

Myria (562655) | more than 3 years ago | (#34084306)

Microsoft Visual C++ and C# allow Unicode identifiers; that is, variable and function names. Visual C++ allows this:

int meow()
{
    int áéíóú = 1;
    return áéíóú;
}

Ok, stop. (1)

Kagetsuki (1620613) | more than 3 years ago | (#34084342)

Unicode/UTF(8) compatibility as a base feature of the language - very good. I fight constantly with languages and code conversion because some dipshit didn't realize some people want to use multibyte strings. What's worse is people like Microsoft who assume they can just add crap to files to specify they contain multibyte strings (like their "BOM" for UTF-8 - add that and you'll never read the file properly again in anything but Visual Studio).
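As a concrete illustration of the BOM complaint, a small Python sketch that tolerates and strips the UTF-8 byte-order mark (EF BB BF) when reading a file, using the standard codecs constant:

import codecs

def read_text_without_bom(path):
    """Read a UTF-8 file, stripping an optional leading BOM."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

# Equivalently, the "utf-8-sig" codec ignores a BOM if present:
#     open(path, encoding="utf-8-sig").read()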

Unicode/UTF(8) compatibility within the language (function names, variable names) - questionable, but it would be nice sometimes. Some languages already do this (I think I've seen it in Ruby, even?). You would make your code unreadable to someone who didn't read your language, but sometimes that could be a good thing, and hey, worst case scenario, run the code through a translator.

Unicode/UTF(8) is required to enter the language - NO. WHY WOULD YOU DO THIS?

You cannot attach a royalty payment to ASCII... (1)

GumphMaster (772693) | more than 3 years ago | (#34084380)

You cannot attach a royalty payment to ASCII so clearly, in this enlightened age when even implementing public APIs risks copyright litigation, we need to move away from this dangerously socialist encoding. We need a new encoding so that the relevant "owners" of the "intellectual property" embodied in the computer language can bill appropriately for your level of use of their "artistic endeavours". Each ideogram should contain encoded information on the rights owner so that corporate publisher birth-rights can be honoured in perpetuity on a per-instance basis. Of course, such encoding should be preserved through compilation to enable the collection of royalty payments for each end-use of the system also. After all, it's only fair.

Tried coding in Japanese (1)

ook_boo (1373633) | more than 3 years ago | (#34084388)

About 15 years ago I worked in a Japanese office where the database had its own scripting language. The company that created the database had translated all the keywords into Japanese and made it so that it would display correctly, so IF --> , etc. Further, you could flip back and forth between English and Japanese versions easily and not have problems with the compiler. But not one of the Japanese programmers used the Japanese version. They thought it was just weird, and they'd already learned how to use IF in English anyway. I suspect using non-ASCII symbols is a solution without a problem.