
Unicode 6.1 Released

Unknown Lamer posted more than 2 years ago | from the tent-emoji-style dept.

Upgrades 170

An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. The latest version includes 732 new characters, including seven brand new scripts. It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates to the line-breaking algorithm to more accurately reflect Japanese and Hebrew texts, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."
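For anyone wondering what "distinguishing emoji-style and text-style symbols with variation selectors" looks like in practice, here is a minimal sketch. It assumes U+263A as the base character purely for illustration; whether the two sequences actually render differently depends on the font and renderer.

```python
import unicodedata

# Assumption: U+263A WHITE SMILING FACE is picked as the base character
# for illustration only.
BASE = "\u263A"
TEXT_STYLE = BASE + "\uFE0E"   # base + VARIATION SELECTOR-15 asks for text presentation
EMOJI_STYLE = BASE + "\uFE0F"  # base + VARIATION SELECTOR-16 asks for emoji presentation


def describe(s):
    """Return the Unicode name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


print(describe(TEXT_STYLE))
print(describe(EMOJI_STYLE))
```

The base character is the same in both strings; only the trailing selector differs, so software that ignores the selectors still sees the same symbol.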


170 comments

Stick to ASCII (-1, Troll)

Anonymous Coward | more than 2 years ago | (#38892135)

Unicode seems to break everything and is completely unnecessary.

Re:Stick to ASCII (5, Funny)

cc1984_ (1096355) | more than 2 years ago | (#38892227)

Yeah but can you write a pile of poo in ASCII?

http://www.fileformat.info/info/unicode/char/1f4a9/index.htm [fileformat.info]

Re:Stick to ASCII (0)

Anonymous Coward | more than 2 years ago | (#38892427)

Yes. +2D3cqQ

Re:Stick to ASCII (3, Funny)

metamatic (202216) | more than 2 years ago | (#38894727)

This is Slashdot, I'm sure you can find any number of examples of people who've written a pile of poo in ASCII.

Re:Stick to ASCII (2)

Xtifr (1323) | more than 2 years ago | (#38894943)

Yeah but can you write a pile of poo in ASCII?

As far as I know, Windows was originally written in ASCII... :)

Re:Stick to ASCII (1)

countertrolling (1585477) | more than 2 years ago | (#38892235)

Slashdot seems to believe so, seeing that we can't type accents and whatnot without jumping through a few hoops

No hoops... (0)

Anonymous Coward | more than 2 years ago | (#38894015)

Àçcênts aré easy (if you have Windows). See http://vulpeculox.net/ax.
Works for 'any' application. Free. No stupid picking or codes.

Re:Stick to ASCII (1)

unixisc (2429386) | more than 2 years ago | (#38893685)

Yeah, it's fantastic that Cyrillic or Katakana or Devanagari scripts can be so beautifully supported in ASCII. Speaking of which, does HTML5 have a complete character list for Unicode, or is it still restricted to ASCII?

Re:Stick to ASCII (-1)

Anonymous Coward | more than 2 years ago | (#38894209)

Yeah, it's fantastic that Cyrillic or Katakana or Devanagari scripts can be so beautifully supported in ASCII.

On the day that anything is written in them that's worth reading will you come back and tell us?

I blame Star Trek & LotR. (1)

Hognoxious (631665) | more than 2 years ago | (#38893909)

Well said, that man. If you feel the desire to "write" with stick figures and squiggles use a bastarding graphic, for fuck's sake.

Eklinóringëon my arse.

Zomg (0)

Anonymous Coward | more than 2 years ago | (#38892163)

13 new emoticons1!1! http://www.unicode.org/charts/PDF/Unicode-6.1/U61-1F600.pdf

Re:Zomg (1)

aepurniet (995777) | more than 2 years ago | (#38892245)

9 cat faces emoticons? is this really necessary in a character standard?

Re:Zomg (1)

BSAtHome (455370) | more than 2 years ago | (#38892455)

The correct sequence for business, politics and everything is as of now:
#1F648 #1F649 #1F64A

Gotta love the effort that went into providing the proper symbols.

27cb appearing in HTML in 5.4.3.2.1... (2)

vlm (69642) | more than 2 years ago | (#38892187)

Take a good look at glyph 27cb aka \diagup part of the Misc Math Symbols. People are gonna try embedding that in html now. Can't wait.

Re:27cb appearing in HTML in 5.4.3.2.1... (1)

GrangerX (1959200) | more than 2 years ago | (#38892753)

If I read the character list correctly, that's the division / slash symbol. That does sound somewhat ominous from a malformed-URL perspective. (They also added something that looks like backslash as 27cd).

Re:27cb appearing in HTML in 5.4.3.2.1... (0)

Anonymous Coward | more than 2 years ago | (#38893949)

These aren't the only slash-like characters. U+2044 and U+2215 look even more slash-like. And why shouldn't they be in Unicode? If your concern is hostname spoofing, I can assure you that the set of non-Latin characters allowed in the hostname part of URLs is very restricted. In the path part it doesn't matter.

Re:27cb appearing in HTML in 5.4.3.2.1... (1)

vlm (69642) | more than 2 years ago | (#38894871)

That's a good one, but I'm also worried about HTML parsers needing to understand half a dozen variants of the "closing slash".
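On the parser worry: HTML syntax only recognizes U+002F as the slash, so the look-alikes are ordinary text as far as a conforming parser is concerned. A rough sketch of how one might flag them anyway; the set below just collects the code points named in this thread and is not exhaustive, and `find_lookalikes` is a made-up helper name.

```python
ASCII_SLASH = "\u002F"  # SOLIDUS, the slash that HTML syntax actually uses

# Look-alikes mentioned in this thread; illustrative, not a complete list.
SLASH_LOOKALIKES = {"\u2044", "\u2215", "\u27CB"}


def find_lookalikes(text):
    """Return (index, 'U+XXXX') pairs for each slash look-alike found in text."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if ch in SLASH_LOOKALIKES
    ]


sample = "<br \u2215>"            # uses DIVISION SLASH, not SOLIDUS
print(find_lookalikes(sample))    # where the look-alike sits
print(ASCII_SLASH in sample)      # whether a real slash is present at all
```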

Favourite unicode character (3, Interesting)

Cocodude (693069) | more than 2 years ago | (#38892189)

has got to be the Love Hotel [fileformat.info] .

Does anyone know why this is even there?

Re:Favourite unicode character (2)

vlm (69642) | more than 2 years ago | (#38892253)

As if http://www.fileformat.info/info/unicode/char/1f4be/index.htm [fileformat.info] makes sense to anyone under age 30. I demand the addition of a punchcard glyph...

Re:Favourite unicode character (1)

tepples (727027) | more than 2 years ago | (#38892277)

What better icon is there for the action of committing an edited document to storage?

Re:Favourite unicode character (2)

am 2k (217885) | more than 2 years ago | (#38892317)

The "don't bother me with those implementation details"-icon?

Re:Favourite unicode character (1)

tepples (727027) | more than 2 years ago | (#38892381)

What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

Re:Favourite unicode character (1)

am 2k (217885) | more than 2 years ago | (#38892547)

What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

The location where the data is stored (RAM vs. hard drive). There are some effects that play against each other here:

  • For editing, the data has to be in RAM (at least the part that's edited at the moment).
  • When the data is in RAM, but not on the disk, the state is lost after a crash or sudden power loss. This is undesirable.
  • Copying from RAM to harddrive (aka "saving") takes time.

As computers get better, the latter effect becomes negligible. This means that when this is done automatically in the background (which is certainly possible for most data these days), the user doesn't have to manage this technical detail. Less management means that the user has more time thinking about the things that he/she really wants to do using the application, and it reduces the number of errors (losing hours of work, because the user forgot to save).

Re:Favourite unicode character (1)

tepples (727027) | more than 2 years ago | (#38892735)

...Copying from RAM to harddrive (aka "saving") takes time. As computers get better, the latter effect becomes negligible.

Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.

Re:Favourite unicode character (1)

am 2k (217885) | more than 2 years ago | (#38892917)

Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.

Yes, you shouldn't save after every single keypress, but a timer for saving every minute or so (if there are any changes) should suffice. Committing for others to see is a different thing, that's something a user can be expected to understand.
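A minimal sketch of that "save every minute or so, but only if something changed" idea. `AutoSaver` and its method names are hypothetical, and the timestamp is passed in by the caller so the logic stays independent of any particular timer or event loop.

```python
class AutoSaver:
    """Hypothetical helper: saves at most once per interval, and only when dirty."""

    def __init__(self, save_fn, interval_seconds=60.0):
        self.save_fn = save_fn            # callback that writes the document out
        self.interval = interval_seconds
        self.dirty = False                # True once there are unsaved edits
        self.last_save = None             # timestamp of the most recent save

    def mark_changed(self):
        """Call on every edit."""
        self.dirty = True

    def tick(self, now):
        """Call periodically with a timestamp in seconds; True if a save happened."""
        if not self.dirty:
            return False                  # nothing to write, leave the disk alone
        if self.last_save is not None and now - self.last_save < self.interval:
            return False                  # saved too recently, wait
        self.save_fn()
        self.dirty = False
        self.last_save = now
        return True


saves = []
saver = AutoSaver(lambda: saves.append("saved"))
saver.mark_changed()
saver.tick(0.0)      # first edit is written immediately
saver.mark_changed()
saver.tick(30.0)     # too soon, skipped
saver.tick(60.0)     # interval elapsed, written
print(len(saves))
```

With no pending edits `tick` never touches storage, which is the part that matters for the spin-up concern raised earlier.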

Ultimately, for revert/versions there should be a timeline slider like there was in Google Wave, where you can go back to your document's state of any point in the past.

btw, affordable SSDs are already large enough for everyday use. My notebook has a 256GB SSD in it, and I didn't have to sell my car for it.

Re:Favourite unicode character (1)

tepples (727027) | more than 2 years ago | (#38893281)

Committing for others to see is a different thing, that's something a user can be expected to understand.

Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

btw, affordable SSDs are already large enough for everyday use.

Not when "everyday use" includes storing a large collection of purchased music and purchased movies.

I didn't have to sell my car for [a 256 GB SSD].

But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop. Google Product Search shows 256 GB SSD in the $300-$400 range. Until the ultrabook market matures, autosave will still waste the computer's hardware resources.

Re:Favourite unicode character (2)

DragonWriter (970822) | more than 2 years ago | (#38893783)

Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

The generic flowchart datastore symbol with an inbound arrow (retrieving something previously committed would use the same symbol with an outbound arrow.)

For products with less technical audiences, a stone tablet with an etching instrument, since committing results in the data being "carved in stone".

Re:Favourite unicode character (1)

tepples (727027) | more than 2 years ago | (#38894585)

The generic flowchart datastore symbol with an inbound arrow

Thank you. I had forgotten about the flowchart symbols because nowadays none of them see popular use except an oval for module entry and exit, a box for a step, and a diamond for a decision.

Re:Favourite unicode character (1)

jbengt (874751) | more than 2 years ago | (#38894509)

If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.

Re:Favourite unicode character (1)

piripiri (1476949) | more than 2 years ago | (#38892553)

Like "I want to store my document on the hard disk but the available feature is saving to a floppy", or for younglings these days: "WTF is that?"

Re:Favourite unicode character (1)

tlhIngan (30335) | more than 2 years ago | (#38893231)

What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

Why should the user be bothered with it? There aren't many real-life instances where a user creates and it isn't "autosaved".

It's one of the things that OS X Lion is doing - it's asking "why do we still do this?". Lion-aware apps automatically autosave in the background, and have a Time Machine-like feature that lets users view their document as it existed in the past. If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

Heck. Lion is trying to get away from the whole "You need to manage your application's state" as well - the OS can manage its resources.

Right now, most apps implement some form of autorecovery. Word keeps crashing on me so I'm thankful when it seems to only lose a few minutes work. Ditto vim. And that's because people forget to save - why not have the OS do it for them? (And with Lion's autosave, it won't commit unrecoverable changes so you can always go back to an earlier revision).

Disclosure, drive space, and spinning up (2)

tepples (727027) | more than 2 years ago | (#38893557)

If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?

And that's because people forget to save - why not have the OS do it for them?

I agree, but how often should the OS spin up the hard drive to do so?

Re:Favourite unicode character (0)

Anonymous Coward | more than 2 years ago | (#38892539)

For the "under 30" group: cloud [fileformat.info]
It even has an error character too: thundercloud [fileformat.info]

Re:Favourite unicode character (1)

Hognoxious (631665) | more than 2 years ago | (#38894267)

What better icon is there for the action of committing an edited document to storage?

One with the word "Save" on it.

"Save" with no icon in a toolbar full of icons (1)

tepples (727027) | more than 2 years ago | (#38894549)

In a toolbar full of icons, the word "Save" or its localization without an icon will probably look out of place. Is this out-of-placeness somehow superior to the use of a floppy disk icon?

Re:"Save" with no icon in a toolbar full of icons (1)

Hognoxious (631665) | more than 2 years ago | (#38895101)

Yes, because none of the [working] machines here has a floppy drive and nobody under the age of twenty has ever even seen one except in a museum, you smug wanker.

Re:Favourite unicode character (0)

Anonymous Coward | more than 2 years ago | (#38892371)

It amuses me how, in our zeal to infuse ourselves with whatever the newest whatever is, we're accelerating towards "lawl anything made last month is so ancient and wut is taht lawl". Amusing in that I've already lost my faith in "internet culture" years ago, and it'll be hilarious to watch the entire thing inevitably implode due to everyone repeating the same mistakes a few years earlier solely because it's not trendy and non-trendy things are boring.

But, don't mind me, I'll just play along and laugh from the sidelines. lawl wut iz a floppeedsik is taht liek a ipod lawl?

Re:Favourite unicode character (1)

JDG1980 (2438906) | more than 2 years ago | (#38892727)

Oh, come on. Everyone who uses computers even casually knows that the floppy-disk icon means "Save." That it no longer reflects the underlying hardware is irrelevant.

Re:Favourite unicode character (1)

GreatBunzinni (642500) | more than 2 years ago | (#38893713)

Here, a punch card glyph. Not quite what I expected but still...
http://www.fileformat.info/info/unicode/char/5361/index.htm [fileformat.info]

There is also a card index glyph:
http://www.fileformat.info/info/unicode/char/1f4c7/index.htm [fileformat.info]

There might not be a punchcard glyph, but there is a minidisk one:
http://www.fileformat.info/info/unicode/char/1f4bd/index.htm [fileformat.info]

and an optical disk one:
http://www.fileformat.info/info/unicode/char/1f4bf/index.htm [fileformat.info]

and a DVD one:
http://www.fileformat.info/info/unicode/char/1f4c0/index.htm [fileformat.info]

I cannot imagine how this can ever be used in a useful manner, instead of being simply an irrelevant gimmick. Does anyone know why this stuff found its way into the standard?

Re:Favourite unicode character (2)

snowgirl (978879) | more than 2 years ago | (#38893977)

They have 14 planes of ~65,536 characters... even after including massive syllabaries, and the unified CJK ideographs, they still had really only used the first plane. Now they're presented with only using about 7% of the space available, and so they started chucking just about every pictograph that they could possibly come up with into it...

I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs, before we have something like tengwar, and cirth. Sure, tengwar and cirth are made up fantasy scripts, but they're more widely used than Linear B...

Re:Favourite unicode character (1)

X0563511 (793323) | more than 2 years ago | (#38894187)

The first one you link is a Chinese symbol. Looks totally valid to me.

Remember, Chinese has symbols for entire words or ideas, it is not "alphabetical" like most other popular languages.

Re:Favourite unicode character (1)

GreatBunzinni (642500) | more than 2 years ago | (#38895001)

Yes, it is. I don't question that character. The others, on the other hand, are a bit silly though.

Re:Favourite unicode character (1)

X0563511 (793323) | more than 2 years ago | (#38895147)

Agreed. Myself, I think it would be better to just reserve the space for future use, giving us plenty of expansion room without having to increase the word size (utf8 to utf16 to utf32) - instead of just filling the section up with nonsense.

And where's Tengwar? (1)

Xtifr (1323) | more than 2 years ago | (#38894927)

They've got symbols for a love hotel, a horse [fileformat.info] , and a steaming pile of poo [fileformat.info] , along with emoticons, and they still haven't accepted the Tengwar [evertype.com] draft that's been around since '93? Where are these people's priorities!?

Why Slashdot won't adopt it (5, Informative)

tepples (727027) | more than 2 years ago | (#38892197)

Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

Re:Why Slashdot won't adopt it (1)

countertrolling (1585477) | more than 2 years ago | (#38892285)

Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode...

Oops [slashdot.org] .. But I kinda wish the <i> tag still worked

Re:Why Slashdot won't adopt it (4, Insightful)

BetterThanCaesar (625636) | more than 2 years ago | (#38892387)

Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

I'd love to be able to write IPA when discussing pronunciation, or actually write out words in other languages, ohm character for discussing electronics, pound and yen signs for currency ... Hey, even a bigger whitelist than what we have now would be great!
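A rough sketch of the "popping bidi characters" part, assuming only the five embedding and override controls (LRE, RLE, LRO, RLO, PDF) need handling. `close_bidi` is a made-up name, and a real filter would have to track whatever directional controls the current version of the standard defines.

```python
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
POP = "\u202C"                                      # PDF, POP DIRECTIONAL FORMATTING


def close_bidi(text):
    """Drop stray pops and append a pop for every embedding/override left open."""
    out = []
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == POP:
            if depth == 0:
                continue          # pop with nothing open: discard it
            depth -= 1
        out.append(ch)
    out.append(POP * depth)       # close whatever is still open at the end
    return "".join(out)


comment = "\u202Edesrever"        # RLO with no matching PDF
fixed = close_bidi(comment)
print(fixed.endswith(POP))
```

Run over each user-supplied string before it is embedded in the page, this keeps an unclosed override from leaking into the surrounding layout.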

Checking for the release of a new version (1)

tepples (727027) | more than 2 years ago | (#38892485)

Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

&#x1F64B; If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.

I'd love to be able to write IPA when discussing pronunciation

It'd be nice but not necessary: X-SAMPA.

or actually write out words in other languages

I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.

pound and yen signs for currency

£ is Alt+0163 on a Windows machine, and ¥ is Alt+0165. They're probably Ctrl+Shift+U A 3 Enter and Ctrl+Shift+U A 5 Enter on a Linux machine, but I don't have one in front of me right this minute with which to test.

Re:Checking for the release of a new version (5, Funny)

Canazza (1428553) | more than 2 years ago | (#38892833)

£ is Shift+3, what are you on about?

Re:Checking for the release of a new version (0)

Anonymous Coward | more than 2 years ago | (#38892927)

£ is AltGr+Shift+4. What are you on about?

Re:Checking for the release of a new version (0)

Anonymous Coward | more than 2 years ago | (#38892951)

-1, does not have a Linux machine within arm's reach. And admits to it.

Re:Checking for the release of a new version (0)

Anonymous Coward | more than 2 years ago | (#38894627)

&#x1F64B; If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.

Bidi ranges are already set by the Unicode roadmaps. [unicode.org] It's just a range check.

Re:Why Slashdot won't adopt it (1)

Kjella (173770) | more than 2 years ago | (#38892391)

Just admit that it's because it's old and random, there's a few HTML entities working but there's no reason why &aelig; = æ should work and &mu; = shouldn't - like in micrograms, or uTorrent. It's a geeky site, but it's made for writing English prose with some half-hearted Latin1 support, no math or science.

Re:Why Slashdot won't adopt it (1)

X0563511 (793323) | more than 2 years ago | (#38894401)

Here's the reason: æ = 0xE6 (or 0xC6 for capital) in extended ASCII, where Mu is not present in extended ASCII. It appears slashdot dumps anything outside of that range.

Let's try an experiment:
0xAB and 0xBB:
0xA7 and 0xB6:
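The same check can be done offline. A small sketch, assuming "extended ASCII" here means ISO 8859-1 (Latin-1); `in_latin1` is a made-up helper.

```python
def in_latin1(ch):
    """True if the character can be encoded as a single Latin-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print("\u00E6".encode("latin-1"))   # ae ligature, lowercase
print("\u00C6".encode("latin-1"))   # AE ligature, capital
print(in_latin1("\u03BC"))          # GREEK SMALL LETTER MU
print(in_latin1("\u00B5"))          # MICRO SIGN, a separate code point from Greek mu
```

Note that the micro sign and the Greek letter mu are distinct code points, so a "mu" can fall inside or outside the range depending on which one is meant.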

Re:Why Slashdot won't adopt it (1)

X0563511 (793323) | more than 2 years ago | (#38894423)

False! Only a subset is allowed, but anything outside of it most definitely seems to fail.

Re:Why Slashdot won't adopt it (1)

Anonymous Coward | more than 2 years ago | (#38892459)

Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.

Trolls gonna troll; that's what moderation is for.

For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

So filter those character ranges.

The next version of the standard (1)

tepples (727027) | more than 2 years ago | (#38892545)

Trolls gonna troll; that's what moderation is for.

At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.

So filter those character ranges.

Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

Re:The next version of the standard (4, Funny)

StuartHankins (1020819) | more than 2 years ago | (#38892987)

...filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis...

Yeah, the way they are going they might actually *have* these characters in the set now...

Re:The next version of the standard (1)

afabbro (33948) | more than 2 years ago | (#38893853)

Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

That would lead to the Slashdot "editors" having to maintain their code, and we can't have that.

Re:The next version of the standard (1)

Dahan (130247) | more than 2 years ago | (#38893927)

Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

It's not difficult to update a simple file/DB entry/whatever to add more characters to the blacklist. Include a little util to parse the UnicodeData file and automatically blacklist all control characters. But even if you wanted to go with a whitelist instead of a blacklist, there's no reason for the whitelist to be as small as it currently is. And then there's what I assume is a Slashcode bug where non-ASCII characters that are in the whitelist don't come through properly. I've seen numerous posts where a stray character gets included. I don't feel like looking for examples right now, but I don't think people are all making the same consistent typos.
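Something like the following would do for the "little util". It is only a sketch: `SAMPLE` is three lines written in the UnicodeData.txt layout rather than the real file, `build_blacklist` is a made-up name, and the range entries the real file uses for large blocks are not handled.

```python
# Three lines in the UnicodeData.txt layout (semicolon-separated fields).
# Field 0 is the code point in hex and field 2 is the general category.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
202C;POP DIRECTIONAL FORMATTING;Cf;0;PDF;;;;;N;;;;;
202E;RIGHT-TO-LEFT OVERRIDE;Cf;0;RLO;;;;;N;;;;;
"""


def build_blacklist(lines, blocked_categories=("Cc", "Cf")):
    """Collect code points whose general category is a control or format class."""
    blocked = set()
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue              # skip blank or malformed lines
        if fields[2] in blocked_categories:
            blocked.add(int(fields[0], 16))
    return blocked


blacklist = build_blacklist(SAMPLE.splitlines())
print(sorted(hex(cp) for cp in blacklist))
```

Rerunning it against the data file of each new release regenerates the list without anyone editing it by hand.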

Re:The next version of the standard (1)

Hognoxious (631665) | more than 2 years ago | (#38894041)

At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.

They can do that with or without unicode, so how does blocking unicode help?

Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

How often do new versions come out? We aren't talking about Firefox here.

Re:Why Slashdot won't adopt it (1)

Fastolfe (1470) | more than 2 years ago | (#38892519)

There are technical solutions to these problems, such as tracking language/BIDI overrides when embedding strings provided by users (and reversing the effect afterward). You could also do it the "easy" way and just filter out characters based on their Unicode property (e.g. disallow all 'other' characters, which would include these formatting characters).
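The "easy" way fits in a few lines if the language ships Unicode property data. A sketch using Python's `unicodedata` module; `strip_other` is a made-up name, and keeping newline and tab is an assumption about what a comment box wants.

```python
import unicodedata


def strip_other(text, keep="\n\t"):
    """Remove characters in any 'Other' (C*) general category, except those in keep."""
    return "".join(
        ch
        for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )


print(strip_other("ab\u202Ecd"))   # the RLO override is category Cf and is dropped
```

Which code points count as unassigned depends on the Unicode version bundled with the interpreter, so results for very new characters can differ between installations.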

Re:Why Slashdot won't adopt it (0)

Anonymous Coward | more than 2 years ago | (#38892905)

The old bullshit excuses...

Unicode has different *pages*. You can filter by page. This *guarantees* that nobody will do any tricks with e.g. direction reversal etc. So that "argument" is out.
And about the ASCII art: Hell, other blogs have, *gasp* IMAGE links!
How about that?

You know what? What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set, by putting every message in its own e.g. IFRAME so it can't mess with the stuff around it. (Or alternatively, just disallow style parameters, allow only certain CSS classes, and force a maximum size on the comment content.)

Come on, it's not that hard! You're just either too lazy, too stupid, or both.

Hundreds of iframes (1)

tepples (727027) | more than 2 years ago | (#38893423)

Unicode has different *pages*. You can filter by page.

New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.

What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set

Slashdot's lameness filter would probably confuse those square brackets with ASCII art, and even if not, the comment would likely draw negative moderations from moderators who haven't installed the Greasemonkey script.

by putting every message in its own e.g. IFRAME

There was a time when hundreds of <iframe> elements on a page would cause the browser to become unusably slow or even crash. I reported this to bugzilla.mozilla.org as Bug 103649, and a decade later it's still not RESOLVED FIXED. And are you going to put the subject of a comment in its own iframe too?

and force a maximum size on the comment content.

Until April 2014, when IE 6 passes out of extended support, one can't assume that all supported browsers support CSS max-width.

Re:Hundreds of iframes (1)

Jesus_666 (702802) | more than 2 years ago | (#38894141)

Why not use a reasonable whitelist? It's unlikely that a new version of Unicode would turn a printable character into a bidi control character and printable JIS characters are not automatically evil, especially not if the lameness filter treats them as non-letters.

As for "people could spam ASCII art": People could also flood Slashdot with bizarre textual porn copypasta. The key part of "posting ASCII art faster than the mods can cope" is "faster than the mods can cope", not "ASCII art".

It is fairly weird that a geek-centric website like Slashdot doesn't support Unicode but instead relies on an undocumented subset of Latin-1. Especially in 2012.

Re:Hundreds of iframes (1)

JDG1980 (2438906) | more than 2 years ago | (#38894623)

New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.

So use a whitelist instead of a blacklist for pages.
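A sketch of what a page whitelist could look like. The ranges chosen are only an example of a "bigger whitelist" covering the IPA and ohm requests made upthread; `ALLOWED_RANGES`, `allowed` and `filter_comment` are made-up names.

```python
# Inclusive code point ranges that are allowed through; everything else is dropped.
ALLOWED_RANGES = [
    (0x0020, 0x007E),  # Basic Latin, printable part
    (0x00A0, 0x00FF),  # Latin-1 Supplement, printable part
    (0x0250, 0x02AF),  # IPA Extensions
    (0x0370, 0x03FF),  # Greek and Coptic
]


def allowed(ch):
    """True if the character's code point falls inside a whitelisted range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def filter_comment(text):
    """Keep only whitelisted characters."""
    return "".join(ch for ch in text if allowed(ch))


print(filter_comment("5 \u03A9\u202E ok"))
```

A page added by a future version of the standard stays blocked until someone decides to list it, which is the point of whitelisting.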

emoticons? (3, Insightful)

pz (113803) | more than 2 years ago | (#38892481)

Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

Re:emoticons? (3, Informative)

snowgirl (978879) | more than 2 years ago | (#38892701)

And little horseys, too?

U+1F40E ... no, seriously...

Re:emoticons? (1)

GreatBunzinni (642500) | more than 2 years ago | (#38893629)

The U+1F4AF character is a bit harder to explain than little horses, because it relies on a 4-octet character code to express something which can be easily expressed by using three 1-octet characters.
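The octet counts are easy to check, assuming UTF-8 is the encoding in question:

```python
hundred_points = "\U0001F4AF"   # HUNDRED POINTS SYMBOL

print(len(hundred_points.encode("utf-8")))   # 4
print(len("100".encode("utf-8")))            # 3
```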

Re:emoticons? (0)

Anonymous Coward | more than 2 years ago | (#38892803)

Well then, we need to include:
  - all the sprites and playfield object tiles from the NES Super Mario Bros. 1 CHR ROM.
  - all the Tetris pieces
  - glyphs of game pieces of all well known games
  - heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards
  - throw the Major Arcana tarot cards in there too
  - gang symbols

Tetris, Chess, Baseball, and gang symbols (4, Informative)

tepples (727027) | more than 2 years ago | (#38893073)

all the Tetris pieces

The polyominoes up to five squares can be composed from U+2580 (upper half block), U+2584 (lower half block), and U+2588 (full block) characters. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
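A quick sketch of that composition: each text row covers two rows of the piece, and the half-block characters pick which of the two is filled. The `render` function and the `#`/`.` grid notation are made up for this example.

```python
UPPER, LOWER, FULL = "\u2580", "\u2584", "\u2588"


def render(rows):
    """rows: equal-length strings where '#' is a filled cell and '.' is empty."""
    if len(rows) % 2:
        rows = rows + ["." * len(rows[0])]   # pad to an even number of rows
    lines = []
    for top, bottom in zip(rows[0::2], rows[1::2]):
        line = ""
        for t, b in zip(top, bottom):
            if t == "#" and b == "#":
                line += FULL
            elif t == "#":
                line += UPPER
            elif b == "#":
                line += LOWER
            else:
                line += " "
        lines.append(line)
    return "\n".join(lines)


T_PIECE = ["###", ".#."]
print(render(T_PIECE))
```

In most monospace fonts a half block is roughly twice as wide as it is tall, so the cells come out as flat rectangles rather than squares.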

glyphs of game pieces of all well known games

A lot of well-known pre-1923 tabletop games' game pieces already exist in Unicode. Chess is U+2654 through U+265F, and Checkers is U+26C0 through U+26C3. A lot of game pieces are simple enough in form that the Geometric Shapes (U+25A0 through U+25FF) represent them just fine. For example, Othello is U+25CB and U+25CF, as is Connect Four. Even the enemy in Fast Eddie for Atari 2600 is in Miscellaneous Technical (U+237E) as is home plate in Baseball (U+2302).

heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards

Those can already be composed from a Basic Latin letter or number and a suit symbol. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
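A rough Python sketch of that composition, assuming the four black suit symbols U+2660, U+2665, U+2666 and U+2663 and writing the ten as the two digits "10". The names `RANKS`, `SUITS` and `deck` are made up for illustration.

```python
# Ranks are Basic Latin letters and digits; the ten takes two characters.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# BLACK SPADE SUIT, BLACK HEART SUIT, BLACK DIAMOND SUIT, BLACK CLUB SUIT.
SUITS = ["\u2660", "\u2665", "\u2666", "\u2663"]

# Each card is just a rank followed by a suit symbol.
deck = [rank + suit for suit in SUITS for rank in RANKS]

print(len(deck))
print(" ".join(deck[:13]))
```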

throw the Major Arcana tarot cards in there too

I don't know about Tarot, but all twelve signs of the zodiac are in Miscellaneous Symbols, even the "69" looking sign of Cancer (U+264B).

gang symbols

The symbol of "Folk Nation" gangs is similar to that of Judaism: a Star of David (U+2721). The symbol of "People Nation" gangs is similar to that of Islam: a 5-point star and crescent (U+262A).

Re:emoticons? (0)

Anonymous Coward | more than 2 years ago | (#38894473)

Well then, we need to include:

  - heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards

Done and done. [unicode.org]

- throw the Major Arcana tarot cards in there too

Working on it. [unicode.org]

Smile emoticon at CP437 code 0x01 (1)

tepples (727027) | more than 2 years ago | (#38892825)

Seriously, emoticons? Who ever thought it a good idea to include those in a standard?

Unicode had to be able to round-trip (losslessly encode and decode) all old popular encodings. This includes the encoding now called "code page 437", introduced with the first IBM PC, which includes a smile emoticon at code value 0x01. It also includes the encodings associated with the widely distributed system fonts Zapf Dingbats and Wingdings.
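Round-tripping can be checked with Python's built-in `cp437` codec; the sketch below uses made-up variable names. One caveat: Python's codec treats byte 0x01 as the control character U+0001 rather than the smiley glyph (U+263A) that the IBM PC displayed, so the sketch uses a box-drawing byte to show a graphic character.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decode CP437 to Unicode, then encode back; nothing should be lost.
as_unicode = all_bytes.decode("cp437")
round_tripped = as_unicode.encode("cp437")
print(round_tripped == all_bytes)

# Byte 0xC9 is the double-line top-left box corner in CP437.
corner = b"\xc9".decode("cp437")
print(f"U+{ord(corner):04X}")
```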

Re:Smile emoticon at CP437 code 0x01 (1)

Anonymous Coward | more than 2 years ago | (#38894311)

This has nothing to do with emoticons. Emoticons are by definition composed of several characters, or reinterpretations of existing characters (such as the tilde with dieresis).

GP probably means emoji. There is an emoji encoding widely used on Japanese mobile devices, so it makes perfect sense. Either Unicode includes emojis, or Japanese mobile devices are never going to switch to Unicode. Unicode emojis were originally requested by Google in 2007 and released with Unicode 6.0 in October 2010, after a long-winded open discussion and many changes.

Re:emoticons? (0)

Anonymous Coward | more than 2 years ago | (#38893331)

Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

Look, unless you want to put the Unicode committee members out of jobs, we have GOT to keep looking for new characters!

Re:emoticons? (1)

gutnor (872759) | more than 2 years ago | (#38893505)

Unicode encodes old characters from dead languages that only a few professors will ever use. That makes a lot less sense than emoticons, characters that are actually used daily by lots of people.

Re:emoticons? (0)

Anonymous Coward | more than 2 years ago | (#38893757)

Seriously, emoticons? Who ever thought it a good idea to include those in a standard?

The 100+ million Japanese who use them to communicate on a daily basis? Or the phone manufacturers, who only have to worry about one encoding system going forward to display these characters?

Re:emoticons? (0)

Anonymous Coward | more than 2 years ago | (#38894353)

Who ever thought it a good idea to include those in a standard?

Umm, Google, for one. Yahoo, for another. Turns out there were something like 15 proprietary versions of Shift JIS that the different phone carriers were using to transmit and store emoji. Instead of having to tag every message with an encoding - welcome to the bad old days of mojibake! - or having to settle for only one of the JIS versions that wouldn't necessarily reflect every carrier's emoji set, the Unicode repertoire allows all those SMS messages to get stored and indexed.
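For those who have not met mojibake, a small Python sketch of what happens when bytes are decoded with the wrong encoding. It uses Python's standard `shift_jis` codec, not any carrier-specific variant (those are not in the standard library), and the sample text and variable names are made up.

```python
# "Konnichiwa" in hiragana, stored as Shift JIS bytes.
message = "こんにちは"
raw = message.encode("shift_jis")

# A reader that guesses the wrong encoding gets garbage, not an error.
wrong = raw.decode("latin-1")

# A reader that knows the encoding recovers the original text.
right = raw.decode("shift_jis")

print(repr(wrong))
print(right)
```

With a single Unicode repertoire the "which encoding is this?" guess goes away, which is the point made above.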

Alternative proposal: (-1, Troll)

SuricouRaven (1897204) | more than 2 years ago | (#38892745)

Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.

Re:Alternative proposal: (0)

Anonymous Coward | more than 2 years ago | (#38892955)

English also has the second-worst spelling system on the planet (only outdone by Japanese). I may have to use it on /. but I'm happy I don't have to resort to it for daily usage. And even if your idiotic proposal were to be universally accepted (which it won't, it's like asking everyone to use DOS) we'd still be in need of a way to encode historical documents and such.

Re:Alternative proposal: (2)

snowgirl (978879) | more than 2 years ago | (#38894021)

English also has the second-worst spelling system on the planet (only outdone by Japanese).

??? WTF are _YOU_ on about? English does not have the worst spelling system on the planet, and Japanese certainly doesn't qualify as the worst. "But they have three different scripts: two syllabaries, and an ideographic set" but...

Look, perhaps I better just demonstrate to you what a real bad spelling system looks like; go look at Irish [wikipedia.org].

Re:Alternative proposal: (1)

shutdown -p now (807394) | more than 2 years ago | (#38894409)

??? WTF are _YOU_ on about?

Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?

Re:Alternative proposal: (0)

Anonymous Coward | more than 2 years ago | (#38895021)

Can you explain why any word in French is pronounced the way it is?
It seems like they have different rules for what letters to pronounce for every word.

Anyway the reason you pronounce psyche like that is because it sounds better than psitsh.

Re:Alternative proposal: (2)

DragonWriter (970822) | more than 2 years ago | (#38893861)

Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.

ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)

Re:Alternative proposal: (2)

snowgirl (978879) | more than 2 years ago | (#38894047)

ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)

Some that aren't foreign as well. "Coöperate" is an archaic spelling. Basically, any prefix that ends in "o" that is attached to a word that starts with an "o" can archaically be spelled with a diaeresis, in the French/Dutch method of "this vowel should be pronounced separately, and not as part of a diphthong".
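A quick Python sketch of why ASCII cannot hold that spelling, and how Unicode represents the diaeresis either precomposed or as a combining mark. The helper name `is_ascii_encodable` is made up for illustration.

```python
import unicodedata


def is_ascii_encodable(text: str) -> bool:
    """Return True if every character of the text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


archaic = "co\u00f6perate"  # precomposed U+00F6, o with diaeresis
modern = "cooperate"

print(is_ascii_encodable(archaic))
print(is_ascii_encodable(modern))

# NFD splits the precomposed letter into "o" plus U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", archaic)
print([f"U+{ord(ch):04X}" for ch in decomposed])
```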

I can't see the new characters (0)

Anonymous Coward | more than 2 years ago | (#38893031)

Because my browser doesn't support Unicode 6.1 yet...

frist s7op!! (-1)

Anonymous Coward | more than 2 years ago | (#38893371)

continues toChew I type this. Insisted that they're gone KMac the mundane 3hores bombshell hit so that their personal rivalries

Great, yet another "unified language" (-1, Troll)

tobiasly (524456) | more than 2 years ago | (#38894609)

First Google Dart, then Mozilla Rust, and now this "Unicode"? Yet another attempt for a universal "one language for all uses" that is destined to fail.

Re:Great, yet another "unified language" (0)

Anonymous Coward | more than 2 years ago | (#38894733)

What the hell are you talking about? 'Scuse me, are you from the past?!

It needed to be flexible, so it's a VM now. (1, Offtopic)

VortexCortex (1117377) | more than 2 years ago | (#38895199)

"It needed to be flexible, so it's a VM now."

I fear this is the next step. The right to left and line wrapping BS is complicated enough that I'd welcome a specialized VM with loadable bytecode & glyph data. Yes, from a security standpoint this could create a wider attack surface. However, I'd argue it would be less attack surface considering that the VM for my unlimited precision scientific & programming calculator is smaller than my UTF-8 text display implementation.

I'd also argue that it would be faster to adopt new glyphs and behaviors if all I needed was to drop in a new batch of bytecode.

I'd also argue just to argue... because, well this IS Unicode we're talking about.
