Does the World Need Binary XML?

michael posted more than 9 years ago | from the using-gzip-would-be-too-easy dept.

Programming 481

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."

A GNAA FP (-1, Troll)

Anonymous Coward | more than 9 years ago | (#11364101)

the world needs binary gay niggers

For Starters (2, Insightful)

Nom du Keyboard (633989) | more than 9 years ago | (#11364112)

what can be done to make XML better, faster and stronger.

For starters, keep Microsoft out of it.

Re:For Starters (1)

jrm228 (677242) | more than 9 years ago | (#11364196)

Good point, because we're all better off if the world's biggest and most influential software vendor makes their own standards without any external input. Not too bright.

Re:For Starters (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11364227)

You must be new here...perhaps you just aren't aware of the fact that Microsoft has repeatedly warped standards to their own uses, ruining the "standard" part of it and the benefits therein.

Re:For Starters (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11364236)

And that's different from Microsoft rewriting the standard to suit them and forcing the industry along with MS-XML because "it's a standard"?

Re:For Starters (3, Insightful)

Soko (17987) | more than 9 years ago | (#11364265)

Agreed.

However, let me re-phrase the grandparent:

"For starters, make sure Microsoft can't extend it to lock out competitors in some way."

Better?

Soko

Re:For Starters (1)

jrm228 (677242) | more than 9 years ago | (#11364455)

Figured that's what you meant, but I'd rather not propel Slashdot's typical propensity for anti-company/anti-MS bias, especially in a technology thread.

Re:For Starters (0)

Anonymous Coward | more than 9 years ago | (#11364222)

You know that Jean Paoli, a co-founder of XML, works for Microsoft, right?

Re:For Starters (4, Interesting)

Omega1045 (584264) | more than 9 years ago | (#11364249)

Why? Microsoft has done a fairly good job promoting XML and SOAP XML Web Services. As long as they stick to the standards (yes, I know) I see no reason to keep them out.

IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks, etc., that should not be in a pure node-based system like XML. Why aren't you picking on them in your comment?

As far as SOAP and XML Web Services (standardized protocols for XML RPC transactions) Microsoft was way ahead of the pack. And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interop.

Re:For Starters (2, Insightful)

leerpm (570963) | more than 9 years ago | (#11364267)

Good idea. Without Microsoft's support from their tools division, this idea will be dead on arrival.

Re:For Starters (1)

Comatose51 (687974) | more than 9 years ago | (#11364393)

Their .Net XML components are pretty damn nice. It makes parsing XML really easy. The ability to save Office documents as XML is really nice as well. So far, Microsoft has only helped spread the usage of XML.

Then what (2, Funny)

chris_mahan (256577) | more than 9 years ago | (#11364124)

Then what happens, do you base64 the binary xml and wrap it in an ascii xml document?

Two words. (1)

Dasein (6110) | more than 9 years ago | (#11364287)

DIME attachments.

Then we wrap it again, that's what! (4, Funny)

Tackhead (54550) | more than 9 years ago | (#11364407)

> Then what happens, do you base64 the binary xml and wrap it in an ascii xml document?

Of course not! That's not XML!

<file=xmlbinary> <baseencoding=64> <byte bits=8> <bit1>0 </bit><bit2>1 </bit><bit3>1 </bit><bit4>0 </bit><bit5>1 </bit><bit6>0 </bit><bit7>0 </bit><bit8>1 </bit> </byte>
<boredcomment>(Umm, I'm gonna skip a bit if y'all don't mind)</boredcomment>
</baseencoding> </file>

Now it's XML!

Binary = Proprietary (0)

Anonymous Coward | more than 9 years ago | (#11364127)

This will kill XML

Re:Binary = Proprietary (1)

taybin (622573) | more than 9 years ago | (#11364187)

Just like binary killed jpeg? Or ELF? Please. Binary != proprietary.

Re:Binary = Proprietary (3, Insightful)

Adhemar (679794) | more than 9 years ago | (#11364282)

Of course binary doesn't equal proprietary. Those are two completely different concepts.

PNG is a binary format. It isn't proprietary, though. And although I can't immediately find a text-based proprietary format, such formats are not impossible (although arguably easier to reverse-engineer than binary proprietary formats).

But if the XML is really such a problem, I suggest the simple solution. Compressing XML with a simple and open algorithm like gzip or bzip2, is the way to go. XML usually compresses very easily.

Re:Binary = Proprietary (1)

Adhemar (679794) | more than 9 years ago | (#11364353)

"But if the XML is really such a problem" was meant to be "But if the size of XML is really such a problem".

And "XML usually compresses very easily." should rather be "XML usually compresses very well."

Yes, I did preview. I just did it blindly.

Re:Binary = Proprietary (2, Insightful)

Austerity Empowers (669817) | more than 9 years ago | (#11364447)

That's the dumbest statement I've ever heard.

As long as it's standardized, the standard is freely available to anyone who wants it, it does not depend on an external library, and it is unencumbered by any sort of patent, it isn't proprietary.

I hate XML right now because of all the string processing and parsing. Text is a sloppy way of defining something, and it begets lots of big processing libraries. It's OK for big PC memory hog apps, but I can't build a small enough one that is still robust enough to want to integrate it into the work I do (small, compact stuff). I find myself doing other, backwards things, or worse, fracturing XML into useable subsets. It somewhat defeats its utility.

Binary XML sounds like a great idea to me, as long as we're clear on a few things. One, it has to be totally documented in a standard (see above for my definition). Two, the standard must define a tool that can read an XML file and say "Yes this is XML" or "No, this is some [microsoft] non-compliant crap". Three, keep it simple: no compression, no outside library dependencies, no cruft.

If those things cannot be achieved then it will not reach maximum utility and something proprietary will swoop down and take over (*cough* microsoft *cough*).

The solution is clear... (3, Funny)

LordOfYourPants (145342) | more than 9 years ago | (#11364128)

Use the Z-modem protocol between Information Superhighway routers to compress the plaintext.

No, and this is the EXACT reason why... (0, Troll)

(TK9)Dessimat0r (672412) | more than 9 years ago | (#11364130)

_ _ _ _ _ _ __ _..._ ALL YOU FUCKING SLASHDOT USERS
_ _ _ _ _ _ .-' . . '-. THIS FUCKING PENISBIRD SHITS
_ _ _ _ _ _/. . ._ . ._\ DOWN YOUR NECK INTO YOUR STOMACH
_ _ _ _ _ /. . .(o) ./__) WHERE THE SHIT BURNS FOR THE REST OF
_ _ _ __ /. . .,_ . .| '| YOUR SHORT AND PATHETIC LIFE
_ _ _ _ |. . ./ .\ . /_/
_ _ _ _ /. . .`"`" . .} IT THEN GRIPS ONTO YOUR COCK WITH ALL ITS MIGHT
_ _ __ /. . . . . . . { AND INJECTS VARIOUS MUTAGENS INTO YOUR BLOODSTREAM THROUGH
_ _ _ /. . . . . . . .} ITS RAZOR-SHARP CLAWS WHERE IT REACTS WITH YOUR
_ __ /. . . . .\/\ /\ { VAST RESERVES OF FAT AND BLUBBER
_ _ |. . . . . .;``"``\
__ /. . . . . . / ; ; ;| NOBODY IS SAFE FROM THE PENISBIRD, AND IT
_ |. . . . . . / ; ; ; | FUCKING HATES ALL SLASHDOT USERS
_ \ . . . ._.-`|; ; ; ;|
_ /`-..--`` a a| ; ; ; | YOU ARE NEXT, YOU FUCKING FAT, FILTHY PIG
_|a a a a a a a|; ; ; ;|
_| a a a a a a | ; ; ; /_ _ _ _ ,--........,, FUCKING POST, YOU FUCKING
_|a a a a a a / ; ; ; ; _ _ _ .' . . . . . -='. BASTARD ASCII.. I CAN'T BELIEVE
_| a a a a a / ; ; ; / _ _ _ _\ . . . . . . . : THIS FUCKING STUPID LAMENESS
_|a a a a a/` ; ; ; \ _ _,==" .\ . . . . . . .' FILTER, WHAT AN ARSEFUCKING COCKLORD
_\ a a a .'. _ ,._'\.\~" o //` .\. . . . . .'
_|a a a.___~' \ \-~| | o ./,\.` .\. . . _.' WHAT KIND OF SHIT NAME
p|; a a/ _|.-~'| |o| |. . . . ,-''\..--' IS LAMENESS FILTER ANYWAY
p| _..-'"'. . .| | | |. . _="`
pp~ . . . \\ . | | / /_="` WHAT THE FUCK? MORE LIKE TROLL FILTER
ppp. . . ./,\ / /_,)") FUCKING CMDRTACO, YOU FUCKING FAT BASTARD
pppp . . ._,.-)")
pppp__,=~"| ===============
ppppp|; .;| Penisbird/. 1.3
pppp | y .| ===============
pppp |;|\ |
ppp_ |/' \| LETS GET IT ON, MOTHERFUCKERS.

Trollkore
"I hate you, I hate your country, and I hate your face!"


Important Stuff # Please try to keep posts on topic. # Try to reply to other people's comments instead of starting new threads. # Read other people's messages before posting your own to avoid simply duplicating what has already been said. # Use a clear subject that describes what your message is about. # Offtopic, Inflammatory, Inappropriate, Illegal, or Offensive comments might be moderated. (You can read everything, even moderated posts, by adjusting your threshold on the User Preferences Page)

Step 1 to getting binary XML (2, Insightful)

Anonymous Coward | more than 9 years ago | (#11364139)

Binary XML = zip file.xml > file.xml.zip
That's all you need. XML compresses great.

Re:Step 1 to getting binary XML (0)

Anonymous Coward | more than 9 years ago | (#11364318)

zip file Why??? it sucks and is closed.

bzip or Gzip. bzip is smaller and faster and 100% open and free. I can make a product with bzip compression and sell it for 20 trillion dollars a copy.

fools use closed or patent encumbered compression.

and Zip is exactly that

Re:Step 1 to getting binary XML (3, Insightful)

Dasein (6110) | more than 9 years ago | (#11364483)

The problem is that many systems that produce XML have a more compact internal storage (rows from a DB or whatever), then they go through an "expansion" to produce XML.

So, to propose simply compressing it means that there's an expansion (which is expensive) followed by a compression (which is really expensive). That seems pretty silly. However, given an upfront knowledge of which tags are going to be generated, it's pretty easy to implement a binary XML format that's fast and easy to decode.

This is what I did for a company that I worked for. We did it because performance was a problem. Now, if we don't get something like this through the standards bodies, more companies are going to do what mine did and invent their own format. That's a problem -- back to the bad old days before we had XML for interoperability.

Now, if we get something good through the standards body then, even though it won't be human readable, it should be simple to provide converters. To have something fast that is convertible to human readable and back seems like a really good idea.
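
As a rough illustration of the idea above (a binary form built on a tag table known up front), here is a minimal Python sketch. It is not the format the poster's company used and not any standard; the tag table `TAGS`, the byte layout, and the function names are all assumptions made up for this example. Attributes and tail text are ignored.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical tag table that sender and receiver agree on in advance.
TAGS = ["row", "id", "name"]
TAG_ID = {tag: i for i, tag in enumerate(TAGS)}

def encode(elem):
    """Layout per element: tag id (1 byte), text length (4 bytes),
    UTF-8 text, child count (2 bytes), then the encoded children."""
    text = (elem.text or "").encode("utf-8")
    out = struct.pack(">BI", TAG_ID[elem.tag], len(text)) + text
    out += struct.pack(">H", len(elem))
    for child in elem:
        out += encode(child)
    return out

def decode(buf, pos=0):
    """Inverse of encode(); returns (element, position after it)."""
    tag_id, text_len = struct.unpack_from(">BI", buf, pos)
    pos += 5  # 1 byte tag id + 4 bytes length, no padding with ">"
    elem = ET.Element(TAGS[tag_id])
    elem.text = buf[pos:pos + text_len].decode("utf-8") or None
    pos += text_len
    (n_children,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    for _ in range(n_children):
        child, pos = decode(buf, pos)
        elem.append(child)
    return elem, pos

xml_text = "<row><id>42</id><name>Ada</name></row>"
doc = ET.fromstring(xml_text)
blob = encode(doc)
back, _ = decode(blob)
print(len(xml_text), len(blob))
```

Decoding needs no string scanning for angle brackets, which is the point being made about parse cost; converting back to readable XML is just `ET.tostring(back)`.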

KISS (5, Interesting)

stratjakt (596332) | more than 9 years ago | (#11364141)

On the face of it, compressing XML documents by using a different file format may seem like a reasonable way to address sluggish performance. But the very idea has many people -- including an XML pioneer within Sun -- worried that incompatible versions of XML will result.

I agree with his point.

What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?
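
For anyone who wants to try the gzip suggestion, a small Python sketch follows. The sample document is invented for illustration, and how much it shrinks depends entirely on the data, so no particular ratio is claimed here.

```python
import gzip

# Hypothetical, deliberately repetitive XML document (made up for this example).
rows = "".join(
    f"<customer><firstName>Name{i}</firstName>"
    f"<company>Company{i}</company></customer>"
    for i in range(1000)
)
xml_bytes = f"<customers>{rows}</customers>".encode("utf-8")

compressed = gzip.compress(xml_bytes)    # what would go over the wire
restored = gzip.decompress(compressed)   # what the receiver still has to parse

print(len(xml_bytes), len(compressed))
```

Note that the receiver ends up with the same text it would have had anyway, so this addresses size only, not parsing cost.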

Re:KISS (1)

barryman_5000 (805270) | more than 9 years ago | (#11364226)

I am more concerned with the idea that "when you send something in xml is in text form and therefore easier to read." -- Well how about you make a secure connection before you go off throwing your text around? Isn't this how 99% of the world buys stuff off the internet with their credit cards? Problems with bzip/gzip -- not on windows.

Re:KISS (2, Informative)

Ewan (5533) | more than 9 years ago | (#11364404)

gzip uncompression is built into internet explorer, it's used all the time for speeding up the transfer of html to clients.

There's no reason why it couldn't be used for xml just as it is for html.

Ewan

Re:KISS (1)

barryman_5000 (805270) | more than 9 years ago | (#11364442)

I know it is on IE but is it built into windows servers?

Re:KISS (1)

Derek Pomery (2028) | more than 9 years ago | (#11364421)

Huh? There are plenty of archive tools for windows that read gzip/bzip - and anyway, most users don't need to open XML archives. Programs for windows can load a lib just fine.

Re:KISS (0)

Anonymous Coward | more than 9 years ago | (#11364272)

Because that doesn't fix the processing overhead in parsing text. String processing is the slowest operation to do on a CPU.

Re:KISS (1)

man_of_mr_e (217855) | more than 9 years ago | (#11364361)

The problem is transmission speed over the network. Even the slowest modern processor is orders of magnitude faster at parsing than the fastest network.

Re:KISS (0)

Anonymous Coward | more than 9 years ago | (#11364390)

yes but what about when that processor is driven by a battery and you want to maximize the battery life?

Re:KISS (0)

Anonymous Coward | more than 9 years ago | (#11364365)

what's wrong with that?
Microsoft or another company can't make an intentionally incompatible version that will not work with other platforms.

duh, what other reason is there to use non-open formats?

parsing. (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11364381)

When the XML is in text you still need to parse it. Sounds like an easy job if you're just doing it on your home computer. But a server handling thousands of simultaneous transactions can get bogged down parsing text down to binary when it can just get sent in binary to begin with.

MUCH faster. And you don't have the overhead of compression. Sure, gzip/bzip2 will cut down on network overhead, but what about processor overhead?

Re:KISS (1)

phasm42 (588479) | more than 9 years ago | (#11364396)

This is often done (large feeds to Amazon.com are compressed). However, you still have to decompress and parse the resulting stream, which is where a big penalty is incurred. I'm hoping that whatever compression they are considering, it will reduce the uncompressed size, as well as making parsing/searching faster.

Re:KISS (1)

rootmonkey (457887) | more than 9 years ago | (#11364408)

My previous company used XML as a realtime protocol (I know, very lame) and it's not the size of the docs, it's the overhead in parsing, especially when you have several Mb a second and only one Intel CPU. ASCII --> binary --> ASCII really kills an app.

Re:KISS (0)

Anonymous Coward | more than 9 years ago | (#11364431)

Exactly right. Something like this [schroepl.net] is the solution.

10 types of people (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11364154)

There are 10 types of people. Those what support binary XML and those what oppose it.

Re:10 types of people (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11364350)

mod parent up
FUNNY!!!
(what is 10 in binary?)

Re:10 types of people (1)

Eric604 (798298) | more than 9 years ago | (#11364497)

(what is 10 in binary?)

1010

Make a XML compiler... (1)

Yaa 101 (664725) | more than 9 years ago | (#11364161)

But make it a open source one...

I guess this is another itch to scratch by the community...

Re:Make a XML compiler... (1)

leerpm (570963) | more than 9 years ago | (#11364293)

Compilers are for code, not data. Xml is data.

Re:Make a XML compiler... (0)

Yaa 101 (664725) | more than 9 years ago | (#11364369)

Really?

Re:Make a XML compiler... (1)

Yaa 101 (664725) | more than 9 years ago | (#11364405)

<rect class="red str" x="15" y="15" width="100" height="50" rx="12" ry="18" />

Re:Make a XML compiler... (0)

Anonymous Coward | more than 9 years ago | (#11364370)

"Make a XML compiler...
But make it a open source one...

  • Got something against the word "an"?

Re:Make a XML compiler... (0, Offtopic)

Yaa 101 (664725) | more than 9 years ago | (#11364443)

Dunno...
Maybe I am not a english native speaking person and maybe you are arrogant elitist?...

Oooh, limelight! (1)

csbruce (39509) | more than 9 years ago | (#11364170)

Check out CWXML/BXML [cubewerx.com] . Especially significant though perhaps unintuitive is the savings in compression time from the source data being more compact.

Re:Oooh, limelight! (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11364351)

Any programmer worth his salt can put together a really good/efficient binary representation of XML in a few days. That's not the issue. The issue here is standardization.

a kabosh? (1)

krisp (59093) | more than 9 years ago | (#11364175)

looks like the developer in question is a little too close to his prize development. speeding up xml by removing all the bloat, however that would be accomplished, be it compiling xml into some sort of byte code or whatnot, seems like a much better idea from the client and server point of view. why transfer 100kb of text data when you can send 10kb of binary data for the same message?

Re:a kabosh? (2, Funny)

SnapShot (171582) | more than 9 years ago | (#11364473)

Considering that for most purposes XML contains a lot of redundant formatting it seems like you could get nearly 10:1 compression simply by using (as has already been mentioned) zip or some other compression algorithm.

However, if you wanted to go to a binary encoding you could try for something relatively straightforward like:

original:
<tag name="value"/>
patented XML encoding algorithm (hexadecimal):
3c746167 206e616d 653d2276 616c7565 222f3e00
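
The hex dump above is simply the ASCII bytes of the original tag plus a trailing NUL, which is the joke. A short Python sketch that reproduces it (the grouping into 4-byte words is an assumption about how the dump was laid out):

```python
text = '<tag name="value"/>'

# ASCII bytes of the tag, plus one NUL byte to pad to 20 bytes.
raw = text.encode("ascii") + b"\x00"
hex_str = raw.hex()

# Group into 8 hex digits (4 bytes) per word, as in the post.
dump = " ".join(hex_str[i:i + 8] for i in range(0, len(hex_str), 8))
print(dump)
```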

Awesome! (0)

Anonymous Coward | more than 9 years ago | (#11364177)

Now we can have competing formats of Binary XML. Fuck that human readability bullshit, what we need is to make it so that Apple's Binary XML implementation differs from SUN's Implementation and nothing works with Microsoft's, not even their own files!

Binary XML has been around a while... (4, Informative)

PipianJ (574459) | more than 9 years ago | (#11364182)

Binary XML is nothing new, as I wager that many people here are already using it, albeit unknowingly.

One of the earliest projects that has tried to make a binary XML (as far as I'm aware) was the EBML (Extensible Binary Meta-Language) [sourceforge.net] which is used in the Matroska media container [matroska.org] .

Re:Binary XML has been around a while... (0)

Anonymous Coward | more than 9 years ago | (#11364301)

That's not really binary XML, that's a Binary Meta Language similar to XML.

Re:Binary XML has been around a while... (1)

leerpm (570963) | more than 9 years ago | (#11364342)

Of course, there are a zillion ways to binary encode XML, but none are a W3C standard.

I admit I'm just a starting developer... (1)

temporalillusion (688393) | more than 9 years ago | (#11364183)

...but web servers and browsers can use gzip to reduce the size of the HTML going back and forth, why not have something similar where a web service gzips the XML and the consumer decompresses it?

Goals (1)

realdpk (116490) | more than 9 years ago | (#11364190)

FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format."

I'm not sure why they think that one has to come before the other.

Frankly, make it a standard so I can write proper code to handle it, and you'll have me (joe random developer) interested.

Re:Goals (2, Insightful)

WarPresident (754535) | more than 9 years ago | (#11364416)

FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format." I'm not sure why they think that one has to come before the other.

Because standards written in a vacuum tend to suck. Why wouldn't you want input from developers with different backgrounds and needs, then cherry pick the best ideas (many of which you didn't think of), toss out universally reviled ones, and implement a broad, useable standard?

Makes no sense (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11364194)

Binary XML would destroy what makes XML powerful: being able to use vi or emacs to understand its content, no fuss, no Adobe Reader-like software, no nothing.

gzip ? (2, Interesting)

JonyEpsilon (662675) | more than 9 years ago | (#11364195)

Am I missing something, or would just gzip'ing xml when it goes over the network not solve the problem ? And isn't this sort of solution already widely implemented for web content ?

Somebody fill me in ...

Re:gzip ? (0)

Anonymous Coward | more than 9 years ago | (#11364260)

cpu cost of gzipping...

there are already standards for this... (2, Interesting)

ophix (680455) | more than 9 years ago | (#11364201)

... its called zipping, most webservers have it as an option to zip the data up as it streams to the client browser

i fail to see the need to have a "binary xml" file format when there are already facilities in place to compress text streams

Re:there are already standards for this... (5, Insightful)

rootmonkey (457887) | more than 9 years ago | (#11364441)

I'll say it again: it's not the size of the document, it's the overhead in parsing.

What would Homer Simpson do? (1, Funny)

Anonymous Coward | more than 9 years ago | (#11364218)

What would Homer Simpson do if he found out about this news in Springfield? Be creative! Best answer gets 2+ mod points. Good Luck!!!!

XML is not S-Expressions (0)

Anonymous Coward | more than 9 years ago | (#11364219)

For all those going to say this? Read this. [prescod.net]

ASN.1 (1)

XorA (147020) | more than 9 years ago | (#11364229)

Isn't binary XML just a stupid idea that clashes with ASN.1?

ASN.1 is already a standard, used heavily in the smartcard/GSM SIM industry.

ASN.1 + BER? (0)

Anonymous Coward | more than 9 years ago | (#11364232)

BER encoded ASN.1 data is just this - a tree structure of values w/ external definitions of data types and structures...

http://www.insidiae.org/~mike/code/asn1dec1-00.00.01.zip
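
To make the "tree structure of values" point concrete, here is a minimal Python sketch of BER/DER-style tag-length-value encoding. It is unrelated to the decoder linked above. It covers only two universal types (UTF8String, tag 0x0C, and SEQUENCE, tag 0x30) with definite lengths; the helper names are invented for this example.

```python
UTF8_STRING = 0x0C  # universal, primitive
SEQUENCE = 0x30     # universal, constructed

def ber_length(n):
    """Definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag, value):
    """One tag-length-value triple."""
    return bytes([tag]) + ber_length(len(value)) + value

def utf8(s):
    return tlv(UTF8_STRING, s.encode("utf-8"))

def sequence(*items):
    return tlv(SEQUENCE, b"".join(items))

# A two-field record; field meaning comes from an external schema,
# not from names embedded in the data.
name = sequence(utf8("John"), utf8("Doe"))
print(name.hex())
```

The encoded record is 13 bytes; the field names live in the ASN.1 module rather than in every message, which is where the size saving over tagged text comes from.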

Maybe this is like comparing assembly to C (5, Insightful)

Stevyn (691306) | more than 9 years ago | (#11364237)

Programs written in assembly can run faster than programs written in C, but it's easier for someone to open a .c file and figure out what's going on.

I'm sure when C came out, the argument was similar: that the readability and cross-compatibility don't make up for the performance hit. But as computers and network connections became faster, C became a more viable alternative.

Modern C compilers write better assembly (1)

chopper749 (574759) | more than 9 years ago | (#11364493)

than most assembly programmers can write.

You don't need to change XML itself (2, Insightful)

Nom du Keyboard (633989) | more than 9 years ago | (#11364238)

XML's verbosity and lack of inherent compression...XML standard calls for information to be stored as text.

Text compresses quite well, especially redundant text like the tags. So why not just leave XML alone and compress it at the transportation level with protocols like sending it as a zip, let v.92 modems do it automatically, or whatever. No need to touch XML itself at all.

Binary is needed for more than just file size (0)

Anonymous Coward | more than 9 years ago | (#11364247)

Would you want to store a .bmp as a series of words like pixel(253,8764) = Black? Some things are better left in binary form, and if XML is going to be used for data transportation between programs then it needs to support binary data.

I for one (0)

Anonymous Coward | more than 9 years ago | (#11364252)

welcome our binary XML overlords.

Binary XML is called ASN.1 (2, Insightful)

Saint Stephen (19450) | more than 9 years ago | (#11364264)

For starters, we already have binary XML, it's called ASN.1. Don't argue, I know it's not exactly the same.

But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.

One of the easiest ways to shrink your XML by about 90% is use tags like:
<a><b><c>
instead of
<FirstName><CompanyName><Address>
You can use a transformation to use the short names or long names on the wire.
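
A minimal Python sketch of such a transformation, using the standard library's ElementTree. The mapping table and the `Customer` wrapper element are made up for illustration, and the saving you actually get depends on how much of the document is tag names versus data.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping; both ends of the wire would need the same table.
LONG_TO_SHORT = {"Customer": "x", "FirstName": "a",
                 "CompanyName": "b", "Address": "c"}
SHORT_TO_LONG = {short: long_ for long_, short in LONG_TO_SHORT.items()}

def rename_tags(xml_text, mapping):
    """Return the document with every element tag passed through mapping."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

long_form = ("<Customer><FirstName>Ann</FirstName>"
             "<CompanyName>Acme</CompanyName>"
             "<Address>1 Main St</Address></Customer>")
short_form = rename_tags(long_form, LONG_TO_SHORT)
print(short_form)
print(len(long_form), len(short_form))
```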

Re:Binary XML is called ASN.1 (1)

Keeper (56691) | more than 9 years ago | (#11364338)

But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.

And it becomes even slower to parse as a result. Binary XML's advantage isn't its size, it is its parsing performance.

Re:Binary XML is called ASN.1 (1)

Saint Stephen (19450) | more than 9 years ago | (#11364417)

Most server processes are not CPU bound. That's not the low-hanging fruit.

Re:Binary XML is called ASN.1 (0)

Anonymous Coward | more than 9 years ago | (#11364490)

Uh, yes they are. What else do you think accounts for the ever-increasing power and numbers of server CPUs?

Re:Binary XML is called ASN.1 (0)

Anonymous Coward | more than 9 years ago | (#11364377)

With that advice it's no wonder you are onboard. The whole point of having a text-based, humanly parsable file format is that you can make sense of it with a text editor. Your suggestion breaks things just as badly.

Re:Binary XML is called ASN.1 (1)

ine8181 (618938) | more than 9 years ago | (#11364410)

I think you're mostly correct. ASN.1 is very cool and efficient, but the problem is standardisation. Same goes for Gzip. If everybody decides to use Gzip every time they send an XML document, we will have a solution.

I have to object to the shorter tag names though -- this method does not get rid of the inherent redundancy in the open/close brackets, whitespaces and the ASCII data inside, which can be further compressed.

I develop on Java and .NET, been using XML full time for last 3 years, and yes, we had the XML bloat problem and tried various things including shortening the tag names.. Which didn't help much.

At the moment we're getting around the problem by the gzip method, which is non standard.

XML is nothing but verbose s-expressions (0)

Anonymous Coward | more than 9 years ago | (#11364439)

Another improvement the lisp guys noticed decades ago is instead of redundantly putting the name of the tag in the closing tag, you don't need it.

<Name><FirstName>John</FirstName><LastName>Doe</LastName></Name>

vs

<Name><FirstName>John</><LastName>Doe</></>

or better

(Name (FirstName John) (LastName Doe))
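
For what it's worth, the conversion to the last form is only a few lines. A minimal Python sketch, assuming elements with plain text and no attributes (attributes, mixed content, and quoting of text containing spaces or parentheses are deliberately left out):

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as (tag text child child ...)."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append(text)
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

doc = ET.fromstring(
    "<Name><FirstName>John</FirstName><LastName>Doe</LastName></Name>"
)
sexpr = to_sexpr(doc)
print(sexpr)
```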

CPU vs. Bandwidth (1)

ancalagon (35314) | more than 9 years ago | (#11364448)

While
<a><b><c>
is indeed much smaller than
<FirstName><CompanyName><Address>
, it takes the same amount of CPU cycles (more or less) to PARSE that string. If you have a really fast data stream (say 1 Gbit/s or more), you will have a problem on the receiver's end.

If you gzip the stream, you save bandwidth, but gunzip on the receiver makes the problem worse. However, bandwidth is usually not a concern within clusters. You want to do something with the data you received, right? This takes CPU cycles as well.

What we need is a combination of XML and binary, fixed data streams.

Re:Binary XML is called ASN.1 (0)

Anonymous Coward | more than 9 years ago | (#11364449)

A better example would be the HDF standard (currently HDF5, go google).

Gzip is relatively slow, and you still have to store the tags, even with something like huffman encoding, where the most commonly used characters get represented with 8 bits.

Also, reading a large chunk of binary data, such as a 400 MB hyperspectral image, can be reasonably fast if you don't need to uncompress ASCII, convert it to binary, and then toss it into the final array. Instead, you can use system calls to toss it straight from filestream to array.
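
A toy Python sketch of the difference being described, using the standard `array` module. This is not HDF5 and the four sample values are invented; it only contrasts loading raw bytes into an array with tokenizing and converting a text representation.

```python
import array

values = array.array("d", [0.5, 1.25, 2.0, 3.75])

# Binary form: the in-memory bytes, written out as-is.
binary_blob = values.tobytes()
# Text form: what an ASCII/XML payload would carry.
ascii_blob = " ".join(repr(v) for v in values)

# Binary path: one bulk copy into the array, no per-value conversion.
from_binary = array.array("d")
from_binary.frombytes(binary_blob)

# Text path: split into tokens and convert every token to a float.
from_ascii = array.array("d", (float(tok) for tok in ascii_blob.split()))

print(len(binary_blob), len(ascii_blob))
```

With a file instead of a bytes object, `array.fromfile` plays the same role as `frombytes`. The raw bytes are in the machine's native byte order, so a real format would also have to pin that down.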

If I were world dictator (0)

Anonymous Coward | more than 9 years ago | (#11364269)

ok, time for the obligatory what I would do if i was world dictator posts:

....aaaaaaand GO!

Amen To That (5, Insightful)

American AC in Paris (230456) | more than 9 years ago | (#11364275)

XML, as originally designed, is deliciously straightforward. Data is encoded into discrete, easy-to-process chunks that any given XML parser can make sense of.

XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.

Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way. Thus, a perfectly good data format has been bastardized by legions of lazy/overprotective coders.

The slew of publications that exist for the sole purpose of "clarifying" XML serves as testament to the abuse of XML.

Re:Amen To That (1)

hdc (665183) | more than 9 years ago | (#11364419)

Uh, yeah. If I'm not mistaken, one of the original goals of XML was to make data simply interchangeable. Doesn't making it binary totally demolish that purpose? Silly me, there I go talking sense again....

Re:Amen To That (1)

jandrese (485) | more than 9 years ago | (#11364423)

The problem with trying to solve the connector conspiracy (in this case obtuse undocumented binary files) is that not everybody wants to solve the connector conspiracy. Some people would rather have their file format die off than have a competitor gain any advantage whatsoever over their product. They also don't want people buying cheap knockoffs of their products and think they can stop this by not giving away any details on how to interface with their product. If we find a way to change this perception, then the connector conspiracy will mostly go away on its own (save for those lazy guys who just implement it however they want and never document anything, regardless of whatever standards are available).

ZIP ?! (0, Redundant)

Bazouel (105242) | more than 9 years ago | (#11364281)

Why not simply zip it ?

As far as I know, there are programs/library for that format on every platform ...

Re:ZIP ?! (1)

gstoddart (321705) | more than 9 years ago | (#11364482)

Why not simply zip it ?


As far as I know, there are programs/library for that format on every platform ...


Because smaller file sizes is only one of the reasons for Binary XML.

Simply compressing it makes it smaller, but does nothing to simplify handling. Parsing XML is the big hairy deal in this case. Things like XML include a lot of ambiguities and complex things, parsing/representing the trees can be a challenge. Think processing of name-spaces and all of the myriad things in XML.

I suspect the purpose of a Binary XML is to have the data already parsed into a traversable structure that applications can use easier. This would improve load-times, as well as make it less necessary to have parsers fully implemented as part of every program.

The problems with Binary XML mean that you no longer have a human-readable form of the data, so editing/reading becomes difficult. At this point, you've got yet another obscure binary file which is less easy to work with and fairly opaque to a user.

In this case, this is what Tim Bray is complaining about.

Cheers

ASN.1? (0)

Anonymous Coward | more than 9 years ago | (#11364299)

Don't we already have ASN.1 [elibel.tm.fr] ?

Compression and huffing around (2, Insightful)

tod_miller (792541) | more than 9 years ago | (#11364307)

Huffman coding will get you to within one bit per symbol of the entropy. It's not well suited to larger data sets (dictionary-based compression is better for those). 7z compression (or is it z7?) will give you a neat storage format.
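To put a number on the "entropy + 1" remark: a Huffman code's average length L per symbol satisfies H <= L < H + 1, where H is the Shannon entropy of the symbol frequencies. Here is a rough Python sketch that checks this on a made-up snippet of tag-heavy text. The function names and the sample string are my own for illustration, not from any particular library.

```python
import heapq
import math
from collections import Counter


def entropy_bits(text):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_code_length(text):
    """Average Huffman code length in bits per symbol of text."""
    counts = Counter(text)
    lengths = huffman_code_lengths(text)
    return sum(counts[s] * lengths[s] for s in counts) / len(text)


if __name__ == "__main__":
    sample = "<vegetable>tomato</vegetable>" * 3  # hypothetical sample data
    print("entropy:", entropy_bits(sample))
    print("huffman:", average_code_length(sample))
```

This only models per-character frequencies. It says nothing about repeated substrings such as tag names, which is where dictionary-based schemes pick up most of their gains on XML.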

Let's talk about where this verbose talk of verbosity is stemming from:


apple
orange
pineapple


This is a data set. No one knows what it is.
Here it is again with some pseudo-XML-style tags
I am listing vegetables here

this is a list of vegetables
vegetables are listed on their own without any children or parent tags, there can be one or more of them, this is version 1 of the document
here now follows a vegetable
tomato
that was a vegetable
here now follows a vegetable
leek
that was a vegetable
here now follows a vegetable
potato
that was a vegetable
here now follows a vegetable
haddock
that was a vegetable

As you can see, this is an (albeit slightly weird-looking) list of items called 'vegetables'.

The beauty of XML is twofold: the description of the document format (DTDs and schemas), and the ability to verify that a document is valid for any specified format.

XML is a human readable file specification language, and file format, all in one, written in itself!

A binary format of XML would be nice, you can make it yourself though.

veg:http://slashdot.org/veg.xml
v:tomato
v:fruitcake
v:lemongrass
v:cat

This is a minimal way to represent the same XML-like structure in a less verbose way.

This is undeniable complexity; a binary format is really just a way of saying "introduce a standard lossless compression format for XML" without changing what XML is.

I say anything that gets the W3C stamp of 'this is official' gets my vote. After all, 1 bad standard is better than 11 good proprietary solutions in a world of millions of interconnected systems.

Sure... (1)

Further82 (720625) | more than 9 years ago | (#11364309)

Given XML's predictable syntax and well-formed requirement it should be relatively easy to create a compression scheme taking advantage of XML then combining that with something like gz or bz2, rather than just compressing XML with gz or bz2. It would be like the difference between compressing a wav file with ZIP and with FLAC. Though with XML the difference would likely only be significant with very large files.

Of course anything like this should be endorsed by W3C before being put into wide use.
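For a baseline to compare any XML-aware scheme against, here is a rough Python sketch that runs plain gzip and bz2 over a deliberately repetitive XML document. It does not implement the XML-specific scheme suggested above. The element names and document shape are made up for illustration, and the ratios you see will depend heavily on the data.

```python
import bz2
import gzip


def build_sample_xml(count=2000):
    """Build a repetitive XML document (hypothetical element names)."""
    rows = "".join(
        f'<item id="{i}"><name>item-{i}</name></item>' for i in range(count)
    )
    return f"<items>{rows}</items>".encode("utf-8")


def compressed_sizes(data):
    """Return byte counts for the raw data and for gzip and bz2 output."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
    }


if __name__ == "__main__":
    sizes = compressed_sizes(build_sample_xml())
    for name, size in sizes.items():
        print(f"{name:>4}: {size} bytes")
```

An XML-aware compressor would have to beat these numbers by enough to justify a new format, which is the FLAC-versus-ZIP comparison in a nutshell.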

Several points. (1)

NoMoreNicksLeft (516230) | more than 9 years ago | (#11364324)

1) Isn't the greatest benefit of XML that it can be opened in a text editor, and made sense of?

2) Can't webservers and browsers (well, maybe not IE, but then it's not a browser... it's an OS component, haha) transparently compress XML with gzip or some other?

3) Making it binary won't compress it all that much, using a proper compression algo will.

4) Doesn't something like XML, that makes use of latin characters and a few punctuation marks, compress with insane ratios even in lame compression algos?

5) In a world moving ever closer to ubiquitous broadband, is a difference between a 10kb html file and a 17kb XML file all that fatal? Surely bittorrent and spam do more to suck up all available bandwidth than XML does (what little is out there).

Oh please god no (1)

seldolivaw (179178) | more than 9 years ago | (#11364326)

I've had to work with binary XML for formatting WAP push messages and it is the ghastliest thing ever. Yes, I can see that it has low-bandwidth applications but my opinion is that I'd much rather have less bandwidth than have to deal with binary XML :-)

SMPTE KLV (1)

TheSync (5291) | more than 9 years ago | (#11364349)

I would suggest that people seeking fast, standard ways to deliver binary data look at SMPTE KLV (key, length, value) coding. It is SMPTE 336M, and is the standard for metadata coding in television, video, and digital cinema.
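For anyone who hasn't seen key-length-value coding, here is a simplified Python sketch of the general idea. It assumes a fixed 16-byte key and a BER-style length field, which is how I understand the common KLV layout; it is not a full or verified implementation of SMPTE 336M, and the key used in the demo is a made-up placeholder, not a registered label.

```python
KEY_SIZE = 16  # assumption: fixed 16-byte keys


def encode_ber_length(n):
    """Encode a length: one byte if under 128, else 0x80|count plus big-endian bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_klv(key, value):
    """Pack one key-length-value triplet into bytes."""
    if len(key) != KEY_SIZE:
        raise ValueError("key must be exactly 16 bytes")
    return key + encode_ber_length(len(value)) + value


def decode_klv(buf, offset=0):
    """Decode one triplet at offset; return (key, value, offset of next triplet)."""
    key = buf[offset:offset + KEY_SIZE]
    pos = offset + KEY_SIZE
    first = buf[pos]
    pos += 1
    if first < 0x80:
        length = first  # short form: the byte is the length itself
    else:
        count = first & 0x7F  # long form: low bits give the number of length bytes
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    value = buf[pos:pos + length]
    return key, value, pos + length


if __name__ == "__main__":
    demo_key = bytes(range(16))  # hypothetical placeholder key
    packet = encode_klv(demo_key, b"tomato")
    print(decode_klv(packet))
```

The appeal is that a reader can skip any triplet whose key it doesn't recognise just by jumping ahead by the length, with no text parsing at all.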

it's needed today, not tomorrow (1)

alan_dershowitz (586542) | more than 9 years ago | (#11364362)

I totally drank the XML kool-aid, so don't interpret this as saying that I hate XML or anything. I really love it. However, you don't really get an appreciation of just how slow and bloaty XML is until you see it used in real life a few times. I sometimes wonder if these guys have ever built a system on something that wasn't a top-notch research bed.

I'm not seeing where in the article he offers a solution to the problem; he just says that as computers and networks get faster, the bloat won't be slow anymore. There's a very good chance I'll be using the same infrastructure in 3 years, so that is a non-solution for me, and I suspect for many other people too.

It's pretty clear to me he's out of touch. Everyone is clamoring for solutions to problems they have right now, and he wants everyone to wait for universal gigabit ethernet and 10GHz CPUs.

Sounds like CORBA or any other RPC. (2, Insightful)

Anonymous Coward | more than 9 years ago | (#11364371)

The XML guys are funny. First make a text version of binary protocols to make it easy to sell them to the mass of "31137 HTML Programmers" who feel comfortable "programming" in Dreamweaver; and then make a binary version to make it work.

Fielding on binary Waka (HTTP replacement) (1)

suso (153703) | more than 9 years ago | (#11364402)

Roy Fielding, who is developing the Waka protocol, which is binary, argued at ApacheCon 2000 that as long as the protocol is still understood, binary utilities could be made to decode things for debugging. But the other 99.9% of requests would be more important and would benefit more from being in binary.

xtp:// (1)

krygny (473134) | more than 9 years ago | (#11364403)

XML transfer protocol.

Ok, we got a name. Now all we need is one fart smella to design it.

Doesn't work at all (1)

revery (456516) | more than 9 years ago | (#11364412)

What the world needs now, is binary XML?

Nope, sorry, those lyrics suck. We're gonna stick with Mr. Bacharach's version.

Images in XML? (1)

jergh (230325) | more than 9 years ago | (#11364459)

...mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones

So instead of JPEGs they use something like this?
<image width="800" height="600">
<pixel type="rgb">#FF0011</pixel>
<pixel type="rgb">#444444</pixel>
<pixel type="rgb">#838300</pixel>
<pixel type="rgb">#303030</pixel>
...
</image>
WTF!?

Binary XML? (1)

telstar (236404) | more than 9 years ago | (#11364466)

That's what you get when somebody forgets to choose "BIN" in their FTP client and dumps a bunch of XML to a directory, right?

But ASCII is binary after all... (2, Interesting)

MarkWPiper (604760) | more than 9 years ago | (#11364470)

The fact is, ASCII is a binary format. It just happens to be a format that has become universally accepted. As the article says, there are certainly benefits to having ASCII-based XML: "The fact that XML is ordinary plain text that you can pull into Notepad... has turned out to be a boon, in practice," he said. "Any time you depart from that straight-and-narrow path, you risk loss of interoperability."

However, if anything, XML has shown us the power of well-structured information. XML has given the possibility of universal interoperability. Developments in XML-based technologies have led us to the point where we know enough now to create a standard for structured information that will last for several decades.

It's time that we had a new ASCII. That standard should be binary XML.

When I think of the time that has been wasted by every developer in the history of Computer Science, writing and rewriting basic parsing code, I shudder. Binary XML would produce a standard such that an efficient, universal data structure language would allow significant advances in what is technically possible with our data. For example: why is what we put on disk any different from what's in memory? Binary XML could erase this distinction.

A binary XML standard needs to become ubiquitous, so that just as Notepad can open any ASCII file today, SuperNotepad could open any file in existence, or look at any portion of your computer's memory, in an informative, structured manner. What's more, we have the technology to do this now.
