Beta

Slashdot: News for Nerds

×

Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

cancel ×

44 comments

AHA! (-1, Redundant)

Anonymous Coward | more than 4 years ago | (#32042318)

FRIST PAST

The winner is impressive. (0, Redundant)

iamhassi (659463) | more than 4 years ago | (#32042328)

FTA:

"Todd Lehman says:
March 25, 2010 at 10:14 pm
The maximum number of bits you can squeeze into a tweet using 32-bit non-Unicode characters is 4339.

Below is an algorithm which uses large-integer arithmetic to carry out operations which encode and decode a 4339-bit value. Perhaps someone can develop this into a working program.

Constants:

Let V = 2 = 2147483648 be the number of non-negative integers representable in 31 bits.
Let U = 10FFFF + 1 = 110000 = 1114112 be the number of valid Unicode code points.
Let N = V – U = 2147483648 – U = 2146369536 be the base for numeric conversions below.

The encoding and decoding algorithms below operate on the principle that 4339 / logV 139.97 and that ceil(4339 / logV) = 140.

Encoding algorithm (K M[140])

(1) Let K be any 4339-bit integer
(2) a := K // Initialize accumulator
(3) for each i in {0..139}:
(3a) d := a % N // Extract lowest-order digit from accumulator
(3b) c := U + d // Convert base-N digit to 32-bit character
(3c) M[139-i] := c // Store 32-bit character in message
(3d) a := (a – d) / N // Subtract lowest-order digit and shift accumulator right
(4) Output M[140]

Decoding algorithm (M[140] K)

(1) Let M[140] be an array of characters representing a message to decode
(2) a := 0 // Initialize accumulator
(3) for each i in {0..139}:
(3a) c := M[i] // Extract 32-bit character
(3b) d := c – U // Convert 32-bit character to base-N digit
(3c) a := (a * N) + d // Shift accumulator left and add digit
(4) K = a"

Anyone care to translate?

Re:The winner is impressive. (0)

Anonymous Coward | more than 4 years ago | (#32042404)

It's kind of in bad web etiquette to ninja that entire post from Ksplice.

If you take your time and read it slowly, it actually makes sense (given that you have some programming knowledge)

Arithmetic_shift [wikipedia.org] is about the hardest thing there.

Re:The winner is impressive. (1)

iamhassi (659463) | more than 4 years ago | (#32047886)

"It's kind of in bad web etiquette to ninja that entire post from Ksplice."

Actually AC it's very common on /. to post an article in case a site get's slashdotted

Re:The winner is impressive. (1)

SpazmodeusG (1334705) | more than 4 years ago | (#32042432)

Well his algorithm is dead set simple. It encodes a 4339bit block of data into a series of valid tweetable characters and back again. It does nothing more and it could have been written in a much simpler way.

The maths at the top refer to the number of unique tweet-able messages. If there are 2^4339 total unique tweet-able messages then that means there are 4339 bits available.

Re:The winner is impressive. (2, Informative)

SpazmodeusG (1334705) | more than 4 years ago | (#32042528)

For those wondering of a better way.

for( i = 4339; i > 0; i-=31) {
    output((wxchar)(bigInt & 0xef)); //output lowest 31bits of our 4339bit block of data
    bigInt = bigInt >> 31; // Shift down
}

Reverse

while( curInput = input() ) {
    bigInt += curInput; // add the 31 bits to the current bitInt
    bigInt = bigInt 31; // Shift up
}

Re:The winner is impressive. (1)

Rockoon (1252108) | more than 4 years ago | (#32042666)

You've got it almost right.

Your encoder is encoding the original 31-bit "words" from right to left, but your decode is decoding the original 31-bit "words" from left to right

Re:The winner is impressive. (1)

flargleblarg (685368) | more than 4 years ago | (#32046542)

It's not that simple. You can't use the full 31 bits per character, and therefore you can't use base-2 shifting. You can use *almost* 31 bits, but since the first 1114112 characters (the valid Unicode characters) are unavailable for use, you really only get ~30.99925 bits per character.

Re:The winner is impressive. (1)

selven (1556643) | more than 4 years ago | (#32042510)

1) Every "character" in Twitter, in this algorithm, can store one of 2146369536 values, since the other 1114112 run into problems.
2) 2^4339 2146369596^140
3) Thus, any 4339-bit integer in base 2 can be converted into a 140-bit integer in base 2146369596. To encode, do this conversion. To decode, do it backwards.

Re:The winner is impressive. (1)

selven (1556643) | more than 4 years ago | (#32042524)

Sorry, Slashdot removed my mathematical symbols. Number 2 should read "2^4339 is less than 2146369596^140"

Re:The winner is impressive. (1)

sys.stdout.write (1551563) | more than 4 years ago | (#32042730)

Sorry, Slashdot removed my mathematical symbols.

We should have a 1st International Longest Slashdot Post competition. Same rules as the Twitter competition, except you have to deal with Slashdot's draconian input stripping

Re:The winner is impressive. (1)

jnaujok (804613) | more than 4 years ago | (#32044142)

Congratulations, he discovered the Chinese Remainder Theorem [wikipedia.org] . While it's a reasonable encoding method, it offers little to no actual compression of the data.

It would make far more sense to first compress the data (LZW [wikipedia.org] for example)and then encode with CRT. That would give you about a 4600 4.5K ZIP file you could send. With typical 85% compression on English language files, that means the resulting output could be about 30K in length.

Re:The winner is impressive. (1)

Hatta (162192) | more than 4 years ago | (#32044384)

The contest required a scheme that would work for arbitrary data. Compressing random data with LZW can result in a file that's larger than the input.

Looking at that entry (2, Insightful)

SpazmodeusG (1334705) | more than 4 years ago | (#32042374)

I don't get it.
If they ask what can be arbitrarily stored in the 4339bits available then there you can store 4339 arbitrary bits. It's a rule of compression. If they are asking for an English language compression program there are plenty better out there. Also if the goal is compression of English text and they aren't including the program size in the tweet then the competition can easily be cheated using a dictionary in the program that can be looked up.

At the winner it's not a particularly good compression algorithm. It doesn't even seem to take bayesian probability of characters into account. I can't see any arithmetic coding (mathematically the perfect entropy encoder) either.

Re:Looking at that entry (0)

Anonymous Coward | more than 4 years ago | (#32042392)

At the winner it's not a particularly good compression algorithm.

It was good enough to win the competition. As such it is superior to any more complicated solution for the problem at hand.

Re:Looking at that entry (2, Interesting)

Thiez (1281866) | more than 4 years ago | (#32042426)

It is superior to the other algorithms that entered the competition. I find it interesting that the TFA doesn't mention how many submissions were made.

Re:Looking at that entry (4, Insightful)

Volguus Zildrohar (1618657) | more than 4 years ago | (#32042460)

You don't get it?

A weak, inexplicable imitation of earlier, better tech?

That's Twitter in a nutshell.

Re:Looking at that entry (4, Informative)

Zocalo (252965) | more than 4 years ago | (#32042576)

It does what is required of the competition. There are 2^4339 available bits in a valid tweet so the first algorithm takes any 2^4339 bit sequence and converts it into a valid tweet, the second converts it back again. What is missing is the means for generating that 2^4339 bit value and converting it back into the original content.

4339 bits is 542 bytes plus three spare bits, so if you wanted to actually use this for something you could use those three bits to define your data format from one of eight types, then "attach" your data payload to the header to generate the sequence of 4339 bits. Some ideas for the payload would be:
  • A sequence of 542 8bit characters
  • A sequence of 619 7bit characters + 3 padding bits
  • A sequence of 722 6bit characters + 4 padding bits
  • A Zip file equal to, or smaller than, 542 bytes
  • A GZip file equal to, or smaller than, 542 bytes
  • etc.
  • If using compressed files, you'd also need some way of dealing with spare bytes in the payload; either a decompressor that can ignore extra characters at the end of the file or a compressor that can manipulate the file to equal 542 bytes - using the comments field of the archive, perhaps.

Re:Looking at that entry (1)

binkzz (779594) | more than 4 years ago | (#32042748)

There are 2^4339 available bits in a valid tweet so the first algorithm takes any 2^4339 bit sequence and converts it into a valid tweet, the second converts it back again.

That's one heck of a compression algorithm! You could fit the entire internet in a single tweet! I think you're on to something, where can I invest my money?

Re:Looking at that entry (2, Informative)

Zocalo (252965) | more than 4 years ago | (#32042932)

Doh! That should be just "4339 bits" and not "2^4339 bits" which is a somewhat larger value, to put it mildly... I think you could theoretically describe a snapshot of the state of the entire universes in 2^4339 bits, and probably do so several times over as well, let alone the entire Internet. :)

Re:Looking at that entry (0)

Anonymous Coward | more than 4 years ago | (#32043180)

WTF is up with Slashdot's mini-port scan of my IP whenever I post?

They check if your IP is an open proxy and block your post if it is.

Re:Looking at that entry (1)

sabt-pestnu (967671) | more than 4 years ago | (#32046568)

Is 2^4339 bits actually 1337 code granules? c00lz!

Re:Looking at that entry (1)

tlhIngan (30335) | more than 4 years ago | (#32044528)

Or, 512 bytes plus pointers leading to next/previous "sectors" of data as metadata.

Now you're able to store an arbitrary file, and all you really need to know is the ID of the beginning. Or one of the pieces and you can then recover the file.

Sounds like a great way to store and spread files - TwitterShare! Like Rapidshare, but without the suck. And let the MPAA/RIAA battle it out with all the users.

Re:Looking at that entry (1)

egcagrac0 (1410377) | more than 4 years ago | (#32045348)

Or, you could display your 4339 bit number in base 36 to encode non-beautiful alphanumeric messages. (about 813 characters, but no capitalization, punctuation, nor whitespace)

ITA2 [wikipedia.org] (5 bit) does even better for some messages. 867 characters, but you lose some when you shift between modes (letters vs numbers/punctuation).

Re:Looking at that entry (1)

egcagrac0 (1410377) | more than 4 years ago | (#32045456)

I should note that yes, I know that doesn't leverage compression, and compression algorithms will do better.

Re:Looking at that entry (1)

AEton (654737) | more than 4 years ago | (#32045256)

The contest is to figure out a way to make more bits available.

It is not obvious that Twitter messages are always guaranteed to carry 4339 bits of information (which is why the original post announcing the contest offers only 4200 bits).

Any attempt to use "compression" as we usually understand it would be pointless because you can't always fit x bits of arbitrary data in an x-1 bit channel.

If it makes you feel any better, a lot of commenters didn't get it, either.

Re:Looking at that entry (1)

nacturation (646836) | more than 4 years ago | (#32046292)

No, they're asking how many bits you can get out of a 140 character tweet regardless of what you can encode into those bits. If they just type ASCII, that's 140 * 7 bits = 980 bits of info (I'm ignoring nonprintables for the sake of argument). Yeah, you could build a compression scheme on top of those 980 bits, but that's now what the competition is about. Through the use of Unicode characters and other tricks, someone managed to get over 4339 bits, averaging 31 bits per "character".

Erm ... (4, Insightful)

daveime (1253762) | more than 4 years ago | (#32042472)

Except for the fact the algorithms he has submitted have NOTHING to do with compression, and are just a method of mapping the 4339 bits into the allowable Unicode character set over 140 x 32 bit character "slots", i.e. encoding / decoding only.

With 4339 bits, hell in theory the longest actual tweet you could make is 2^4339 of any single character you choose, using the 4339 bits just to represent a (very large) counter of how many times to repeat the character.

Considering that 2^4339 is approximately 10^1305, and there are probably only 10^82 atoms in the whole universe, that's one bloody long tweet.

Re:Erm ... (0, Troll)

zarzu (1581721) | more than 4 years ago | (#32042504)

you're absolutely right. the only reason this contest exists is to make twitter headlines, so let's bury it.

Re:Erm ... (1)

pjt33 (739471) | more than 4 years ago | (#32042734)

Nah, you can do far better than that. "The character 'a' repeated Graham's number of times" is just a start...

Re:Erm ... (1)

dkf (304284) | more than 4 years ago | (#32043102)

Nah, you can do far better than that. "The character 'a' repeated Graham's number of times" is just a start...

But the Kolmogorov Complexity [wikipedia.org] of that is rather smaller. It's that which is limited by Twitter, not the eventual expanded size of the message.

Re:Erm ... (2, Insightful)

pjt33 (739471) | more than 4 years ago | (#32043584)

Yes, but that's not what GPP was talking about. Why on Earth would you assume that comments on /. would be on-topic, when that would require reading TFS? ;)

Re:Erm ... (0)

Anonymous Coward | more than 4 years ago | (#32044990)

-->Insert The whole of human knowledge here<--

Technically the most, totally useless though

Re:Erm ... (0)

Anonymous Coward | more than 4 years ago | (#32042896)

With 4339 bits, hell in theory the longest actual tweet you could make is 2^4339 of any single character you choose, using the 4339 bits just to represent a (very large) counter of how many times to repeat the character.

but you're not able to transmit arbitrary messages that way, so it's worthless. Read the linked blog post, the first comment makes the same mistake (and is promptly corrected).

The thing is, you can send at most (!) 4339 arbitrary bits on twitter in one go. You can sometimes pack longer messages and get them to less than 4339 bits, but that's an entirely different thing (especially since it assumes the existence of an unpacker on the recipient's computer, conveyed through other, less limited channels).

Re:Erm ... (2, Insightful)

zarzu (1581721) | more than 4 years ago | (#32043174)

yes, but there is no trick to mapping to 4339 characters, it's simply the thing you will be doing if you want the most arbitrary bits transmitted. it's like a contest to paint a car: you have a fixed size car with a known surface area and the target of the contest is to paint the biggest surface. now you can either simply paint the whole surface and you will have won or you're gonna do some crazy pattern painting that totally misses the point. now tell me why did we just have this painting contest?

Re:Erm ... (1)

Ambiguous Coward (205751) | more than 4 years ago | (#32045540)

Yes, but what if you put down two coats of paint? Hmm? What do you think of that, Mr. Smartypants?

Re:Erm ... (1)

daveime (1253762) | more than 4 years ago | (#32048568)

Depends on your definition of "arbitrary".

I was merely describing a very simple compression system, which does in fact define up to 2^4339 distinct messages. The fact that all the messages are composed of a single character does not detract from the theoretical maximum number of possibilities.

Of course, as your dictionary (character set) increases, then the number of your "arbitrary" messages becomes a lot less than this ...

If your character set is limited to the uppercase A-Z and a space (enough to transmit a simplistic English message, albeit without punctuation), and with zero redundancy, you can still encode about 904 characters on the basis 2^4339 / 2^4.8

As all text will have some redundancy, I'd put an upper bound at maybe 3000 characters using that 27 character subset. 1.44 bits per character is perfectly "doable" with todays compression algorithms.

Since when.... (0)

Anonymous Coward | more than 4 years ago | (#32042700)

Is impressive another word for fucking stupid?

I missed that memo i guess...

Hey boss! you're pretty impressive today!

Limits (2, Funny)

kiehlster (844523) | more than 4 years ago | (#32042832)

Ah, so someone has finally determined the absolute breadth of the Twittersphere. If the world ran on tweets, maybe we wouldn't ever need more than 64kB of memory.

16000bits (4, Funny)

M8e (1008767) | more than 4 years ago | (#32042890)

Solution - Just tweet the following picture of a swimming fish:

".`.`..`.>"

Given that 1 word is 16 bits, and a picture is equal to 1,000 words,
that makes my above tweet 16,000 bits of information (fitting
several pictures in a tweet may extend this further) :-)

(.)(.)

(.Y.)

d^_^b

48000bits!

A report from 4chan (1)

PlasmaEye (1128377) | more than 4 years ago | (#32046136)

Long tweet is looooooooooong.

The FIRST International Longest Tweet Contest? (1)

Protoslo (752870) | more than 4 years ago | (#32047198)

I eventually gave in and read TFA; they actually describe the winning algorithm...in the contest description. The contest was just to implement it (sort of). And (apparently) no one attempted to use the valid unicode characters as well. They just avoided them (like the contest bloggers) because they weren't sure that there wasn't some arbitrary string of characters that would mess up the message.

I suppose that the contest could continue on that basis alone: how many more bits can you encode by using the printable characters, without choking on arbitrary data? It is more likely that it will vanish in a poof of apathy instead, since no one bothered to do that this time.

Conen O brian joke (0)

Anonymous Coward | more than 4 years ago | (#32053110)

Twitter is down? I don't know what Ashton Kutcher is having for lunch!

I DON'T KNOW WHAT ASHTON KUTCHER IS HAVING FOR LUNCH!!!!!!

ALL THAT HAS BEEN AND ALL THAT WILL BE IS GONE FOREVER! REPENT SINNERS! REPENT!!!!

these characters here to bypass bullshit filter adfasdfasdf dfasdfasdf asdfasdfasdf asdfasd fasdfasdf asdf
Check for New Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Create a Slashdot Account

Loading...