
Old Protocol Could Save Massive Bandwidth

Hemos posted more than 13 years ago | from the reduce-reuse-recycle dept.


GFD writes: "The EETimes has a story about how a relatively old protocol for structured information, called ASN.1, could be used to compress a 200-byte XML document down to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."


More bandwidth is good (-1, Flamebait)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167414)

Give me more of this [goatse.cx] .

Re:More bandwidth is good (-1)

evil_spork (444038) | more than 13 years ago | (#2167435)

Congrats on the first post; I don't believe I can dispute this one. :)

Re:More bandwidth is good (0, Funny)

Big Brass Balls (257794) | more than 13 years ago | (#2167447)

Better not let Microsoft [goatse.cx] get a hold of it [goatse.cx] , otherwise, they might just screw things up [microsoft.com] .

Re:More bandwidth is good (-1)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167524)

Microsoft is cool.

Re:More bandwidth is good (-1)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167455)

No you can't. It's mine, mine alone and nobody else's. I'm feeling so special now.

Re:More bandwidth is good (0, Funny)

Big Brass Balls (257794) | more than 13 years ago | (#2167474)

I'm feeling so special now.

You sure are, considering you're this kind of special [specialolympics.org] .

Re:More bandwidth is good (-1)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167490)

Nobody asked for your family album.

Re:More bandwidth is good (-1)

Big Brass Balls (257794) | more than 13 years ago | (#2167492)

Indeed, nobody did. It's your family album that's on display there.

Re:More bandwidth is good (-1)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167504)

Watch this:

Fuck you.

What was it used for? (0)

Mike5558 (206879) | more than 13 years ago | (#2167423)

What was this protocol originally used for?

Re:What was it used for? (2, Informative)

andri (23774) | more than 13 years ago | (#2167433)

It is still used to encode SNMP packets, for example.

X.509 digital certs, among other things (2)

coyote-san (38515) | more than 13 years ago | (#2167500)

It's still used for many, many things. One of the major current uses is X.509 digital certificate encoding.

TCP/IP (0)

djhertz (322457) | more than 13 years ago | (#2167424)

How would this compare to TCP/IP? Just wondering.

Re:TCP/IP (1)

andri (23774) | more than 13 years ago | (#2167460)

It cannot be compared to TCP/IP, as ASN.1 is a syntax notation with various encoding rules (BER/CER/DER/XER), while TCP and IP are networking protocols.
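
As a rough illustration of what those encoding rules produce, here is a minimal Python sketch of BER-style tag-length-value output for a small INTEGER; the helper is hypothetical and nothing like a full ASN.1 toolkit:

def ber_encode_small_int(n):
    # BER writes tag, length, then contents; 0x02 is the universal INTEGER tag.
    assert 0 <= n < 128, "short form only; real BER handles arbitrary sizes"
    value = bytes([n])
    return bytes([0x02, len(value)]) + value

print(ber_encode_small_int(42).hex())          # '02012a' -- three octets on the wire
print(len("<temperature>42</temperature>"))    # 29 bytes for the same value as XML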

Re:TCP/IP (0)

Anonymous Coward | more than 13 years ago | (#2167559)

I think they refer to BER, which is the most common. I have rarely seen any other used in real life.

Re:TCP/IP (0)

Anonymous Coward | more than 13 years ago | (#2167628)

The difference is pretty moot in this case. TCP/IP packet formats represent a syntax notation with various encoding rules.

Re:TCP/IP (1)

cREW oNE (445594) | more than 13 years ago | (#2167503)

Not.

Compressed TCP/IP over IP wouldn't be such a bad idea though, except that compression usually belongs in the application layer. Stuff like HTML compresses like crazy. Good thing mod_gzip usage is on the rise.
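
For a sense of how much typical markup shrinks, here is a quick Python sketch using zlib (the same deflate algorithm mod_gzip applies); the sample HTML is made up:

import zlib

html = ("<tr><td class='comment'>hello world</td></tr>\n" * 200).encode()
packed = zlib.compress(html, 9)
print(len(html), "->", len(packed), "bytes",
      "({:.1%} of original)".format(len(packed) / len(html)))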

What? No way. (1)

GoogolPlexPlex (412555) | more than 13 years ago | (#2167425)

An XML document has 99% of its information content redundant?

Re:What? No way. (0)

Anonymous Coward | more than 13 years ago | (#2167568)

Sure. Read some XML and you'll see this. < and > are minor by comparison with some of the ridiculously_stupid_long_names_for_tags used in some XML schemas.

Postum primus? (3, Funny)

hivolt (468311) | more than 13 years ago | (#2167439)

Sounds like a lossy compression program I heard about early April....it could compress to 0 bytes, if I remember correctly.

Re:Postum primus? (1)

re-Verse (121709) | more than 13 years ago | (#2167465)

Why the hell would i want a lousy compression format?

Plus if its so lousy it compresses to 0, it means its not there anymore, right?

Re:Postum primus? (1, Funny)

Anonymous Coward | more than 13 years ago | (#2167533)

Why the hell would i want a lousy compression format?

You're not the sharpest tool in the shed, are you?

Re:Postum primus? (0)

JeromeyKesyer (463790) | more than 13 years ago | (#2167556)

You're not the sharpest tool in the shed, are you

Well, the "tool" part of that sentence was correct.

Re:Postum primus? (0)

Anonymous Coward | more than 13 years ago | (#2167592)

Um.. That was an April Fools Day joke story.

Re:Postum primus? (2)

gallir (171727) | more than 13 years ago | (#2167645)

If it can compress to 0 bits, not only can we save a lot of bandwidth transferring those 0 bytes, but the transfer is also a lot faster. Light speed is only a limit if the transferred "thing" conveys information, so we don't have such a limit.

Errr... just realised that most /. posts can also be transferred at higher speeds.

PS: did that information appear in early April? I missed it.

Re:Postum primus? (2)

NonSequor (230139) | more than 13 years ago | (#2167673)

Your Latin is incorrect. "Primus" should agree with "postum." It should be "postum primum."

Check this out! (0)

Anonymous Coward | more than 13 years ago | (#2167444)

You can compress your entire 40 Gig hard drive into only a few bytes!

format /q c:

100:1 text compression ? (1, Insightful)

mcspock (252093) | more than 13 years ago | (#2167445)

Somehow I find it hard to believe that a method for compressing text at a 100:1 ratio has been buried away forever. Standard compression programs get about 10:1 on text; you'd think a better model would have been incorporated if one existed.

Re:100:1 text compression ? (3, Informative)

cREW oNE (445594) | more than 13 years ago | (#2167481)

First....

200 BYTE (!) XML documents are pretty rare. They probably standardized a few headers, and instead of sending them they just send a code.

Don't believe for a second we're talking about a compression scheme here. The usual slashdot lack of information applies.

Re:100:1 text compression ? (1, Insightful)

Anonymous Coward | more than 13 years ago | (#2167654)

Of course we're talking about a compression scheme. It's just one for structured data and not for plain text files. Looking at some example XML files, they can clearly be compressed by some large amount - 2 orders of magnitude doesn't seem unreasonable when you have 40-character tag names.

Re:100:1 text compression ? (1, Informative)

Anonymous Coward | more than 13 years ago | (#2167658)

It is not compression, but data representation.

In BER encoding, an integer that fits into a single byte takes only a couple of bytes, whereas in XML it can take an almost unlimited number of bytes, depending on the tag names and how they are nested.

Imagine.... (-1, Offtopic)

Anonymous Coward | more than 13 years ago | (#2167448)

a baowolf cluster of these....

Re:Imagine.... (-1)

evil_spork (444038) | more than 13 years ago | (#2167578)

Learn to spell "beowulf" you fuckwit AC.

Yes, I agree (0)

Anonymous Coward | more than 13 years ago | (#2167456)

the Code Red III worm should be coded in ASN.1. It's the least we can do to spare Microsoft further humiliation.

What about Code Red? (-1)

LoRider (16327) | more than 13 years ago | (#2167457)

Aren't we supposed to be talking about Code Red?

Come on people this is serious. Oh wait no it's not, it's just another M$ blunder.

They don't build 'em like they used to. (3, Interesting)

pjbass (144318) | more than 13 years ago | (#2167458)

When you look at it, it's pretty cool to see that protocols that go back many years (Ethernet, for example) just keep coming back with positive results, and scale way beyond what they were ever intended for in their respective RFCs. What happened to most protocols developed recently? Exchange is one that comes to mind...

Re:They don't build 'em like they used to. (0)

Anonymous Coward | more than 13 years ago | (#2167576)

Well, it is space-efficient, but very expensive to encode and decode. With current processor speeds, though, that is not so important anymore.

It (BER encoding, to be exact) has quite a few political flaws too, which make it stupid sometimes.

200 bytes to 2 +/- (0, Redundant)

Drizzten (459420) | more than 13 years ago | (#2167461)

From the article: "Raw XML is very verbose -- it's not a good technology for the telecommunication of data unless you combine it with ASN.1," said Scott. "Together they can solve the problem without wasting bandwidth. An XML data set encoded into ASN.1 will be orders of magnitude less verbose than the raw XML." How much depends on the application, he said. "In one benchmark I have read about, a 200-byte message was reduced to 20 bytes with normal compression methods, but ASN.1 encoded it into just 2 bytes and a few bits," said [Bancroft] Scott.

Pretty impressive compression. Think anyone will reconsider it now?

Yeah... (0)

Anonymous Coward | more than 13 years ago | (#2167463)

...and I could compress the entire Library of Congress to one byte. The decompression algorithm would be a pain, though...

Hmmmm... (1)

disneyfan1313 (138976) | more than 13 years ago | (#2167464)

What could it compress CowboyNeal's Dinner into?

Re:Hmmmm... (-1)

evil_spork (444038) | more than 13 years ago | (#2167597)

I think CowboyNeal could use this compression for himself. Only problem is he'd still be fat as hell.

Re:Hmmmm... (0)

Anonymous Coward | more than 13 years ago | (#2167613)

The only thing that can compress Cowboy Neal's dinner is his tight young asshole as he takes a big stinky shit and clogs the toilet.

ASN.1 is evil (1)

Alfred (16073) | more than 13 years ago | (#2167466)

Its tha devils spawn I tell ya.
Its extremely complex and hard to debug.

The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.

Its a case of a trade off between bandwidth and computing power. ASN.1 requires CPU (and lots of debugging) while HTML,etc require bandwidth :)
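
The "just telnet in" point can be made concrete with a few lines of Python over a raw socket; the hostname is a placeholder for a mail server you actually run:

import socket

with socket.create_connection(("mail.example.com", 25), timeout=10) as s:
    print(s.recv(1024).decode(errors="replace"))   # the 220 greeting banner
    s.sendall(b"HELO test.example.com\r\n")
    print(s.recv(1024).decode(errors="replace"))   # a human-readable 250 reply
    s.sendall(b"QUIT\r\n")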

Re:ASN.1 is evil (1)

cREW oNE (445594) | more than 13 years ago | (#2167532)

That's why the protocol and compressing the payload should be separate. Like the different HTTP encoding schemes.

I'm all for compression, especially since it saves bandwidth (money) AND is generally way faster than transferring the uncompressed payload. But that compression should not get in the way of the maintainability and extensibility of a protocol. That has proven to be a bad idea(tm) in the past.

Re:ASN.1 is evil (1)

Mr. Barky (152560) | more than 13 years ago | (#2167546)

Its a case of a trade off between bandwidth and computing power. ASN.1 requires CPU (and lots of debugging) while HTML,etc require bandwidth :)

Yes, but which is the most limited in most situations? I'd say bandwidth is the limiting factor in most cases.

Re:ASN.1 is evil (1)

Alfred (16073) | more than 13 years ago | (#2167574)

That is why they are pushing ASN.1 for wireless apps, but for the internet at large I contend that the complexity introduced by ASN.1 (and the debugging problems) greatly outweighs the bandwidth benefits, especially if you consider using a separate compression layer.

Re:ASN.1 is evil (3, Informative)

eigenhead (245821) | more than 13 years ago | (#2167619)

Its tha devils spawn I tell ya. Its extremely complex and hard to debug.

Having worked with ASN.1 and CMIP I can certainly state that most examples for ASN.1 data types I've seen (M3100 and that lot) are far too complex (too many CHOICE, ANY values). But I still think ASN.1 and BER/PER are a decent way to efficiently encode data in a platform-independent manner. ASN.1 data types can be really simple or really complex, so blame the designers defining complex types in ASN.1 not the notation itself.

The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.

I think it is only fair to state that a lack of good (I mean open and free, of course) ASN.1 decoders/encoders contributes to the lack of widespread adoption of technologies like ASN.1. Not that tools like SNACC are all that bad, but were good tools around in the early days of ASN.1? Certainly CMIP never had good free toolkits.

The standards bodies play a role here. Making sure you advocate for your standard early on and doing your best to promote good open reference implementations goes a long way towards helping a standard gain widespread adoption.

I think SNMP is a good example of how ASN.1 can be used effectively. Just because ASN.1 allows for complex types doesn't mean people have to build complex types into their standards/protocols.

I'm growing tired of the "I've got the world on a String" school of data typing ;->

Sometimes efficient, compact encoding/decoding is just what the solution calls for, whether it is ASN.1 BER/PER or the OMG IDL using CDR.

Not likely... (-1)

Brad Andrews (18226) | more than 13 years ago | (#2167470)

The compression seems to be due to a pre-existing tokenized form of data rather than some universal ".zip for your web page" type thing. If I figure out that I can query your server faster by throwing a rock at you and asking how it is than by remote administration tools, does that mean I've created some miracle compression and should be able to deliver full motion video if I just use a bigger rock?

Typo. (1)

mborland (209597) | more than 13 years ago | (#2167471)

For God's Sake, please fix the typo. -20- bytes, not 2. Jeezis.

not quite (1)

OO7david (159677) | more than 13 years ago | (#2167494)

"In one benchmark I have read about, a 200-byte message was reduced to 20 bytes with normal compression methods, but ASN.1 encoded it into just 2 bytes and a few bits,"

Re:not quite (2, Funny)

thejake316 (308289) | more than 13 years ago | (#2167536)

Well, in one benchmark my friend's sister told me about, a friend of a 200-byte message was compressed to 2 bytes and a few bits when he crashed his car into a tree, but they never found his eyes, so they think he always had two glass eyes but never told anybody. True story, ask anyone.

Re:not quite (1)

Waffle Iron (339739) | more than 13 years ago | (#2167624)

"In one benchmark I have read about, a 200-byte message was reduced to 20 bytes with normal compression methods, but ASN.1 encoded it into just 2 bytes and a few bits,"

I can do better than that. How about my algorithm that can compress the whole Bible into 1 bit:

if ($msg[0] & 0x80) {
    print $king_james_version;   # high bit set: the decoder emits the entire King James text
    $msg[0] &= 0x7f;             # then clears the flag bit
}
print @msg;

Re:Typo. (0)

Anonymous Coward | more than 13 years ago | (#2167496)

Listen up, sunshine:

In one benchmark I have read about, a 200-byte message was reduced to 20 bytes with normal compression methods, but ASN.1 encoded it into just 2 bytes

2 bytes.

Re:Typo. (1)

anotherbadassmf (159050) | more than 13 years ago | (#2167514)

It was regular compression that made it 20 bytes. With ASN.1 it was ~2 bytes.
Anyway, these numbers don't mean anything when they're mentioned so flippantly without the actual original XML.

What a bunch of marks. (1)

thejake316 (308289) | more than 13 years ago | (#2167475)

Hey, /. staff, you better check your backs for chalk. Yeah, they can compress a 200-byte XML doc to 2.5 bytes, if it's something like two tags and a bunch of spaces. Get real.

1st post (0)

Anonymous Coward | more than 13 years ago | (#2167478)

my 1st post!

mod_gzip ? (4, Informative)

AdamInParadise (257888) | more than 13 years ago | (#2167480)

Ever heard of mod_gzip? It compresses anything that goes through your Apache webserver and it is supported by most browsers. With everything running over HTTP these days, this is the way to go...
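
The browser side of this is plain HTTP content negotiation; a hedged Python sketch (the host is a placeholder) of what "supported by most browsers" means in practice:

import gzip, http.client

conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()
body = resp.read()
if resp.getheader("Content-Encoding") == "gzip":
    body = gzip.decompress(body)    # the page itself is untouched; only the transfer shrinks
print(resp.status, len(body), "bytes after decoding")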

Hmm, I knew it would come back.... (0, Offtopic)

digitalamish (449285) | more than 13 years ago | (#2167482)

Just like bellbottoms or skinny ties. I knew all those token ring cards would come back in style.

---
No 1's were harmed in the typing of this post.

Re:Hmm, I knew it would come back.... (0)

Anonymous Coward | more than 13 years ago | (#2167637)

I'm interested in upgrading my 28.8 kilobaud internet connection to a 1.5 megabit fiberoptic T1 line, will you be able to provide an IP router that's compatible with my token ring/ethernet LAN configuration?

Hello, haven't we read Comer's book? (4, Interesting)

Karpe (1147) | more than 13 years ago | (#2167487)

I believe it was Internetworking with TCP/IP, or perhaps Tanenbaum's Computer Networks, and the "conclusion" of the chapter on SNMP (which uses ASN.1) was that today it is much more important to make protocols that are simple to handle than ones that conserve bandwidth at the price of performance, since the "Moore's law for bandwidth" is stronger than the "Moore's law for CPU power". You can use (and we already use) compressed communication links anyway.

This is the same philosophy as IP, ATM, or any modern network technology. Simple, but fast.

Re:Hello, haven't we read Comer's book? (0)

Anonymous Coward | more than 13 years ago | (#2167604)

Is the bandwidth increase supposed to be *faster* than double every 18 months? Because by my reckoning it's about exactly that [14.4k back in 1993, 1M DSL today].

The reason textual protocols are supposedly better than binary protocols is not CPU time, it's engineer time.

bandwidth is cheap (2, Insightful)

Proud Geek (260376) | more than 13 years ago | (#2167491)

So who cares about compression. Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

Re:bandwidth is cheap (0)

Anonymous Coward | more than 13 years ago | (#2167539)

So just use a packet sniffer with a built-in decoder.

Re:bandwidth is cheap (0)

Anonymous Coward | more than 13 years ago | (#2167552)

So why don't you just try and make me.

bandwidth is cheap? On what planet? (2)

Carnage4Life (106069) | more than 13 years ago | (#2167601)

So who cares about compression. Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

You're kidding, right? Most CS people I know cringe at the fact that XML can more than double the size of a document with largely redundant tags. The only things to be thankful for are that such documents typically compress very well, due to the large number of redundant tags, and that HTTP 1.1 supports compression, especially now that XML over HTTP (i.e. web services) is being beaten to death by a lot of people in the software industry. Numerous [xml.com] articles [irt.org] about [att.com] XML compression [xml.com] also suggest that it is an issue.

PS: If bandwidth is so cheap how come DSL companies are going out of business and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.

Re:bandwidth is cheap (1)

nitromuriatic (254088) | more than 13 years ago | (#2167614)

Is bandwidth plentiful to the point that using more is cheaper than making things more efficient? Can more useful programs be delivered with current bandwidth capabilities and budgets by spending less time on making communications efficient and more time on front-end functionality? In many situations we have already witnessed efficiency being sacrificed somewhat for ease of development (Python, VB, OpenMP, etc.). Is the web not at the point where bandwidth will be sacrificed for ease of development? How does the need for standards affect this in a way it does not affect other realms? Sitting at the end of a low-quality rural dialup, I'm not yet convinced it's time to give up on efficiency on the internet.

Re:bandwidth is cheap (4, Informative)

Jeffrey Baker (6191) | more than 13 years ago | (#2167641)

at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

Translated:

My debugging tools are inadequate, and my brain is inadequate for improving them.

You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal [ethereal.com] ! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 [ethereal.com] because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone [unigone.com] .

Use your CPU, not your eyeballs!

Re: Leave compression to the hardware (2)

willy_me (212994) | more than 13 years ago | (#2167644)

I agree, leave XML uncompressed. Let modems compress the data - it might not be as efficient but it keeps things simple.

Willy

More data, please! (1)

MotownAvi (204916) | more than 13 years ago | (#2167495)

Compressing a 200-byte XML file down to two bytes may be impressive, but with all the overhead of XML (doctype tags, etc), that's pretty much an empty file. I'd love to see how this performs on a larger data file of, say, megabytes in size.

Avi

ASN.1 "compression" vs XML (3, Insightful)

Bruce Perens (3872) | more than 13 years ago | (#2167497)

What we're really saying here is that XML is a very verbose protocol, and that ASN.1 isn't. But verbosity, or lack thereof, is hardly unique. Also, there is no compression claim here - only the difference in verbosity.

ASN.1 uses integers as its symbols. Remember the protocol used for SNMP? Did you really like it? It's not too human-readable or writable.

Also, the idea of promoting it through a consortium is rather old-fashioned.

Bruce

Re:ASN.1 "compression" vs XML (-1)

Cmdr (Fuck You) Taco (469621) | more than 13 years ago | (#2167543)

Dear Mr Perens,

Fuck You.

Re:ASN.1 "compression" vs XML (2)

Jeffrey Baker (6191) | more than 13 years ago | (#2167668)

Bruce, I had to flame the guy a few posts up from you, but he has a 6-digit slashdot userid. Nobody cares how obtuse the wire encoding is because here in the Cenozoic era, we have learned to walk upright and also to use labor-saving software to analyze our protocols. My favorite is ethereal [ethereal.com] but you might like to browse some others [appwatch.com] .

Yes - html and xhtml are ok too (1)

matek (101962) | more than 13 years ago | (#2167508)

According to the linked story, ASN.1 can easily be used to encapsulate HTML and other text formats.

That's what makes it beautiful!

Yuck!!! (0)

Anonymous Coward | more than 13 years ago | (#2167513)

Same stuff used in SNMP. Have you seen or tried to describe an object with it??? Fine, it's small... It's like saying yeah, there was a version of MS Word for the 128K Mac. Does anyone want to use it just because it's small???? Get that last-mile bandwidth up and who cares.

Multimedia? (3, Interesting)

starseeker (141897) | more than 13 years ago | (#2167517)

Isn't most of the bandwidth on the internet consumed by multimedia - images, music files, and the odd video? I have seldom encountered an HTML file larger than a meg, and even those are in my experience very rare.

Yes, it would be nice to make the internet move faster with current technology, and I would support this for people on very slow connections. It might also be a boon for servers that get hit hard and often (though I doubt it would stop the Slashdot effect ;-) For the majority of single-use internet concerns, however, I just don't see this doing a whole lot.

Of course, I hope I'm wrong. More effective bandwidth is a Good Thing.

Re:Multimedia? (0)

Anonymous Coward | more than 13 years ago | (#2167611)

How many XML services do you use per day?

One or two maybe?

Assuming XML-based services start becoming popular, XML size is going to become a bigger issue. Until then, of course, it's a non-issue.

ASN.1 not suitable (5, Informative)

cartman (18204) | more than 13 years ago | (#2167523)

ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.

Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?

It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; the ITU just pretends to be. The ITU does whatever it can to restrict the distribution of its specifications.

ASN.1 was designed to be efficient (4, Informative)

Anonymous Coward | more than 13 years ago | (#2167526)

If I remember the history right, ASN.1 was designed during the era of X.25 and charging for every 64-byte packet. I used to use ASN.1 for remote communications in a commercial product, but later changed it to a hybrid of CORBA and XML, mostly due to more modern technologies, and since the actual bandwidth did not cost that much anymore, it did not make sense to keep an old protocol alive. ASN.1 has its drawbacks too--8 different ways to encode a floating point number. It was a political decision, because everyone involved wanted their own floating point format included, and as a net result everyone has to be able to decode 8 different formats. An encoding designed by a committee (a stone-age telecom committee, as a matter of fact).

Ouch (0)

Anonymous Coward | more than 13 years ago | (#2167527)

ASN.1 (and BER encoding) is a bitch to develop with. In my experience, at least.

Cheers,

--fred

Missing the point? (2, Insightful)

MikeyNg (88437) | more than 13 years ago | (#2167547)

Bandwidth is cheap now, but it may not be forever. Yes, we'll most likely continue to see order of magnitude increases for years and decades to come, but it'll slow down sometime.

Also, consider wireless devices. Their bandwidth isn't there right now, and maybe with 3G we'll see a nice increase, but I can see that as a practical application for this type of compression.

Let's also not forget that even though it's compressed, you can always uncompress it into regular old XML to actually read it and understand it, for you human folks that actually need like LETTERS and stuff! That's it. I'm just going to start writing everything in integers soon. Time to change my .sig!

Decoding (1)

Sawbones (176430) | more than 13 years ago | (#2167553)

I don't quite see how this could be decoded perfectly on the other end. I mean, suppose I have a single-node XML document:

<docroot>
<Node_of_type_crap> hi </Node_of_type_crap>
</docroot>

But if I NEED that node to be named "Node_of_type_crap" on the other end of whatever I'm transmitting it to (rather than some arbitrary bit value), then that information is going to have to be transmitted eventually, and that will take up space. Not saying this won't be a huge bandwidth saver, but the 200 bytes -> 2 bytes compression can't be that common.
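
The usual answer is that the tag table travels out of band (that is effectively what an ASN.1 schema is), so the name never has to cross the wire. A toy Python sketch with a made-up table:

TAGS = {1: "Node_of_type_crap"}                      # agreed in advance by both ends
CODES = {name: code for code, name in TAGS.items()}

def encode(tag_name, text):
    return bytes([CODES[tag_name]]) + text.encode()

def decode(packet):
    return TAGS[packet[0]], packet[1:].decode()

wire = encode("Node_of_type_crap", "hi")
print(len(wire), decode(wire))    # 3 bytes, yet the receiver still gets the full tag name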

HTML could be compressed (2, Flamebait)

Restil (31903) | more than 13 years ago | (#2167557)

What you would lose is the readability. Any symbol in an HTML file could be reduced to a byte or less, depending on the total number of symbols used. Consider an 80-character line of text with each character a different color. For each character you'd need data approximately equal to:

<font color="#rrggbb">a</font>

This entire sequence could be compressed into 4 bytes or less, but you would require an HTML compiler instead of coding it by hand (unless you're one of those crazy people who prefer coding opcodes straight over using C).

The issue with HTML, and the reason we don't worry much about the inefficiency, is that you could have a rather extensive HTML file with one link to a single picture, and that picture would easily take up the space of the entire HTML file.

-Restil

Yuck... (0)

Anonymous Coward | more than 13 years ago | (#2167562)

ASN.1 is horrible. It's horrible to understand, horrible to implement, and horrible to try to decipher in a packet dump.

Bandwidth Versus Computational Effort (2, Insightful)

DougM (175616) | more than 13 years ago | (#2167573)

When the web was lots of static pages and images, and bandwidth was scarce, compression made sense.

With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?

Most popular websites don't suffer from poor connectivity -- they suffer from too little back-end grunt.

Re:Bandwidth Versus Computational Effort (1)

cREW oNE (445594) | more than 13 years ago | (#2167602)

With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?

Absolutely. Positively. 100% YES!

Say you have a fairly large HTML page. (Take, um... slashdot.) You can compress it from, say, 100KB to about 11KB in a fraction of a second (an 800MHz P3 does it in 0.009 seconds). That saves an enormous amount of bandwidth and speeds up your browsing too. Definitely worth it.

Re:Bandwidth Versus Computational Effort (2)

donglekey (124433) | more than 13 years ago | (#2167653)

Imagine starting your own website. When you are paying for bandwidth on a site that has a >100KB front page (like slashdot on my configuration), then it is definitely worth it. Not everyone is on broadband, and many people won't be for a long, long time. Saving bandwidth is always good, whatever the situation. And besides, many, many page serves (10,000 a day) can be had off a very inexpensive computer (K6-2 400MHz), even on a complex website (Scoop-driven).

ASN.1 resources on the web. (3, Informative)

gd23ka (324741) | more than 13 years ago | (#2167577)

Actually, ASN.1 is a formal way of specifying how to encode data into binary representations like BER, CER, DER and PER, which do save bandwidth compared to XML.

Those of you who want to find out more about ASN.1 can pick up free e-books on ASN.1 here [oss.com] . There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc [gnu.org] , a GPL'd open source ASN.1 compiler. Snacc, however, only generates code for encoding to BER, so you might also want to check out a hacked version [qut.edu.au] of snacc from Queensland University of Technology.

ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.

bah (0)

Anonymous Coward | more than 13 years ago | (#2167581)

Considering that most XML documents are 99% empty space and .99% repeated tokens, I'm not particularly impressed.

And the two bytes are... (0, Troll)

hayz (160976) | more than 13 years ago | (#2167583)

$34 $32

(Actually, it's possible to compress *much* more information into these two bytes.)

Oh crap. (1, Insightful)

G-funk (22712) | more than 13 years ago | (#2167584)

This is just crap. Let's say it's two bytes and a few bits -- call it 20 bits total. That means it can only describe 2^20 different outputs. With 200 bytes to play with, you can have around 80^200 different XML files (80 was pulled from my ass: uppercase + lowercase + symbols).

Let's put it this way: 200 bytes down to 2.5 bytes is an 80:1 ratio, and only a vanishingly small fraction of all possible 200-byte files can be compressed down to 2.5 bytes, even assuming perfect compression.

I'm sure that with the right sample file LZH will compress it down to just a few bytes too.
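
For the record, the counting argument spelled out in a couple of lines of Python:

outputs = 2 ** 20        # distinct messages 20 bits can name
inputs = 256 ** 200      # distinct 200-byte inputs
print(outputs)           # 1048576
print(len(str(inputs)))  # 482 -- the number of possible inputs has 482 digits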

Better than that... (1, Offtopic)

jmv (93421) | more than 13 years ago | (#2167593)

I can do much better with "delete" and "undelete". DOS rules! It had filesystem compression (format/unformat) long before the others.

Reverse Engineer hax0r3d! (4, Funny)

TroyFoley (238708) | more than 13 years ago | (#2167596)

I figured it out. They do it by removing the data pertaining to popup/popunder banners! 100 to 1 ratio seems about right.

Totally misses the point (5, Insightful)

coyote-san (38515) | more than 13 years ago | (#2167599)

This idea totally misses the point.

ASN.1 achieves good compression because the designer must specify every single field and parameter for all time. The ASN.1 compiler, among other things, then figures out that the "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes, because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.

In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone; in fact, a well-behaved application should read the DTD for each session and extract only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.

But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).

This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
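
A hedged, PER-flavored sketch (not a conforming encoder) of the "knows exactly how few bits are sufficient" point: once both sides have compiled the same schema, a four-value layout enumeration needs only two bits on the wire. The layout names are invented for illustration:

LAYOUTS = ["Letter, portrait", "Letter, landscape", "A4, portrait", "A4, landscape"]

def pack(choice):
    return LAYOUTS.index(choice)    # an index 0..3 fits in two bits

def unpack(bits):
    return LAYOUTS[bits]

print(pack("A4, landscape"), unpack(3))    # 3 'A4, landscape'

It only works because the list is fixed ahead of time, which is exactly the cast-in-stone trade-off described above.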

Re:Totally misses the point (1)

p3d0 (42270) | more than 13 years ago | (#2167649)

I can't believe I just used up my last mod point before reading this. This is the most informative article I have seen here in a long time.

Missing the point as to why XML is good (4, Insightful)

Eryq (313869) | more than 13 years ago | (#2167615)

XML, by virtue of being text-based, may be easily inspected and understood. Sure, it's a little bulky, but if you're transmitting something like an XML-encoded vCard versus an ASN.1 encoding of the same info, the bulk is negligible.

Yes, for mp3-sized data streams, or real-time systems, there would be a difference. But many interesting applications don't require that much bandwidth.

ASN.1 achieves its compactness by sacrificing transparency. Sure, it's probably straightforward enough if you have the document which says how the tags are encoded, but good documentation of anything is rare as hen's teeth, and not all software companies are willing to play nice with the developer community at large and share their standards documents. And some of them get downright nassssssty if you reverse engineer...

Transparency is one of the reasons for the rapid growth of the Web: both HTML and HTTP were easy enough to understand that it took very little tech savvy to throw up a website or code an HTTPD or a CGI program.

Transparency and extensibility also make XML an excellent archival format; so if your protocol messages contain data you want to keep around for a while, you can snip out portions of the stream and save them, knowing that 10 or 15 years from now, even if all the relevant apps (and their documentation) disappear, you'll still be able to grok the data.

It is not the bandwidth that is so important.. (0)

Anonymous Coward | more than 13 years ago | (#2167616)

ASN.1 (BER encoding, I assume) is very space-efficient, dating from the X.25 days, at least when compared to XML. But if XML is the reference point, so are the CORBA or Sun RPC encodings--even if they take more space than BER, they are much more CPU-friendly encodings.

XML's primary strength lies in the fact that it is a very friendly format for connecting "loosely-connected" systems together. With all the other formats (ASN.1, CORBA, RPC) the systems have to agree on exactly what the data format is, whilst with XML the format can vary over time and the systems can still understand each other.

ISO not free (as in beer) (1)

Coot (87864) | more than 13 years ago | (#2167622)

In order to fully analyze the ASN.1 standard, you have to have a copy of the standards documents [asn-1.com] and read them. Unfortunately, to do that, you have to pay ANSI/ISO [ansi.org] several hundred US dollars. OK, the C and C++ standards are available from ANSI for only 18 USD, but the other standards are much more ... in fact, go to the search page [ansi.org] and search for ASN.1 ... see for yourself.

W3C and Internet STDs and RFCs are freely (as in beer and as in speech) available. This is partly why many of them are so widely adopted.

If the ASN.1 folks want their standards widely adopted, they first have to make it easy and cheap to get copies of the standards.

Will we have 'hardware accelerated' modems? (1)

beowulf_26 (512332) | more than 13 years ago | (#2167635)

From the looks of today's news, it seems that we need to tackle bandwidth issues from both ends (please don't flame me for being too obvious). After reading the NasaWatch article about streaming HDTV, Covad filing for bankruptcy, and finally the rather negative comments to this networking protocol, it seems that we've got a long way to go.

Obviously, we need more accessible fat pipe and larger bandwidth, which means these things need to be cheaper. Thankfully, with the advent of high-power round lasers (featured in last month's Wired, if I'm not mistaken), the equipment for routing optical lines will become much less complicated and FAR cheaper. Which means greater accessibility to broadband and probably a better environment for high-speed providers.

The second end seems to be developing a new STANDARD protocol. Current protocols, being fairly open and needing little debugging, are nice, but seem rather inefficient. If everyone can agree on a compression scheme for the internet, what is the possibility of seeing hardware-accelerated modems? Will we have something akin to hardware DVD decoders, or GeForce 3's for our net access?

If any of you know of current movements for such technology, I know I'd be interested to hear about them, and I'm sure your fellow /. readers would as well.

ASN.1? (0)

Anonymous Coward | more than 13 years ago | (#2167667)

You've got [alvestrand.no] to be kidding [slashdot.org] .

Compare SNMP, LDAP, and Kerberos (if you've ever worked with implementations of them) with SMTP and NNTP.

ASN.1 -- excellent choice (4, Informative)

ciurana (2603) | more than 13 years ago | (#2167672)

Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.

ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.

ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. ID octets are used for specifying primitive or constructor data types. Length octets define the size of the actual content. A boolean is thus represented by a single bit, and digits 0-9 could be BCD-encoded. Each encoding structure carries with it its interpretation.

An XML document could thus be encoded by converting the tags into a lookup table and a single-octet code. If the tags are too many, or too long (e.g. FIRST-NAME), then there are significant savings in replacing the whole tag with an ASN.1-encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: one for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).
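
A toy Python version of that ID/length/contents layout, with made-up tag numbers rather than anything from a real schema:

TAG_IDS = {"FIRST-NAME": 0x01, "LAST-NAME": 0x02}

def encode_field(tag, value):
    contents = value.encode()
    return bytes([TAG_IDS[tag], len(contents)]) + contents   # ID, length, contents

wire = encode_field("FIRST-NAME", "Ada")
print(wire.hex(), len(wire))    # '0103416461', 5 bytes vs. ~28 characters of raw XML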

I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.

http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.

Cheers!

E