Google Open Sources Its Data Interchange Format

Posted by kdawson on Tuesday July 08, 2008 @04:07PM from the it's-fast-that's-why dept.

A number of readers have noted Google's open sourcing of their internal data interchange format, called Protocol Buffers (here's the code and the doc). Google's elevator statement for Protocol Buffers is "a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more." It's the way data is formatted to move around inside Google. Betanews spotlights some of the contrasts between Protocol Buffers and XML and IDL, the technologies to which it is most comparable. Google's blogger claims, "And, yes, it is very fast -- at least an order of magnitude faster than XML."
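"Serializing structured data" is easier to picture with actual bytes. Below is a minimal hand-rolled Python sketch that assumes the encoding rules from Google's published Protocol Buffers documentation. Each field is written as a key followed by its value. The key is the field number shifted left three bits, OR'd with a wire type. Integers are stored as base-128 varints. `encode_varint` and `encode_int_field` are hypothetical helper names for illustration, not the protobuf library's API.

```python
WIRE_TYPE_VARINT = 0


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7-bit groups first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(byte)         # last group: continuation bit clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Write one integer field as a key (field number + wire type) and a varint value."""
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)


if __name__ == "__main__":
    # Field 1 holding the value 150 takes three bytes on the wire.
    print(encode_int_field(1, 150).hex())
```

For comparison, even a bare XML element such as `<a>150</a>` is ten bytes before any declaration or whitespace. That gives a rough feel for where the size difference comes from.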

An order of magnitude over XML? (Score:5, Funny)

by Anonymous Coward writes: on Tuesday July 08, 2008 @04:10PM (#24105175)

So is, well, just about anything.

- Re:An order of magnitude over XML? (Score:5, Interesting)
  
  by dedazo ( 737510 ) writes: on Tuesday July 08, 2008 @04:34PM (#24105539) Journal
  
  Looks like Google just invented the IIOP [wikipedia.org] wire protocol, which is also platform agnostic and an open standard.
  I guess the main difference here is that their "compiler" can generate the actual language-domain classes off of the descriptor files, which is a definite advantage over "classic" IDL.
  "Google Protocol Buffers" is cooler than the OMG terminology, but this kind of thing has been around for 20 years.
  
  - Re:An order of magnitude over XML? (Score:4, Funny)
    
    by kriston ( 7886 ) writes: on Tuesday July 08, 2008 @05:22PM (#24106289) Homepage Journal
    
    Oh, I'm a little ashamed that I recognize this message as CORBA flamebait.
    
  - Re:An order of magnitude over XML? (Score:5, Informative)
    
    by jd ( 1658 ) writes: <imipak@yahoGINSBERGo.com minus poet> on Tuesday July 08, 2008 @06:12PM (#24107089) Homepage Journal
    
    Technically, you are correct - platform-agnostic data transfer has been possible since Sun's earliest RPC implementations. However, this seems to be considerably lighter-weight (although so is Mount Everest) and because order is specified, it's going to be much simpler to pluck specific data out of a data stream. You don't need to have an order-agnostic structure and then an ordering layer in each language-specific library.
    There have been all kinds of attempts to produce this sort of stuff. RPC, DCE, CORBA, DCOM, etc. are programmatic interfaces and handle function calls, synchronization, etc. OPeNDAP is probably the closest to Google's architecture in that it is ONLY data. It's more sophisticated, as it handles much more complex data types than mere structures, but it has its own overhead issues. It isn't designed to scale to terabyte databases, although it DOES scale extremely well and is definitely the preferred method of delivering high-volume structured scientific data, at least when compared to the RPC family of methods, or indeed the XML family. I wouldn't use it for the kind of volume of data Google handles, though; you'd kill the servers.
    
    - Re:An order of magnitude over XML? (Score:5, Insightful)
      
      by vrmlguy ( 120854 ) writes: <samwyse&gmail,com> on Tuesday July 08, 2008 @07:23PM (#24108123) Homepage Journal
      
      Technically, you are correct - platform-agnostic data transfer has been possible since Sun's earliest RPC implementations. However, this seems to be considerably lighter-weight (although so is Mount Everest) and because order is specified, it's going to be much simpler to pluck specific data out of a data stream. You don't need to have an order-agnostic structure and then an ordering layer in each language-specific library.
      Actually, XDR (used for Sun's RPC) is very lightweight, arguably lighter than PB. (Yes, I foresee a Java implementation called PB&J.) XDR is potentially more compact, since it doesn't encode field identifiers, but it's also big-endian, which made it less attractive as little-endian computer architectures took over the world. Also, while XDR demands a fixed ordering of fields, field order in PB *isn't* specified; the field identifiers allow you to order the fields any way you like.
      Overall, I like it. It's obvious that the developers were familiar with the flaws of older protocols, and found ways to fix most of them. The only obvious thing I see missing is a canonical way to encode the .proto file as a Protocol Buffer, to make a stream self-describing.
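The sketch below puts the XDR and PB approaches side by side. It uses only the Python standard library and is hand-rolled rather than built on a real XDR or protobuf library. First, two 32-bit integers are encoded XDR-style: big-endian, with position alone identifying each field. Next to that is a tagged encoding in which each value carries its field number, so a reader accepts the fields in either order. All helper names are hypothetical, and the tagged part handles only varint fields.

```python
import struct


def xdr_pack_point(x: int, y: int) -> bytes:
    # XDR-style: fixed field order, big-endian 4-byte signed ints, no field identifiers.
    return struct.pack(">ii", x, y)


def xdr_unpack_point(data: bytes) -> tuple:
    # The reader has to know in advance that x comes first and y second.
    return struct.unpack(">ii", data)


def encode_varint(value: int) -> bytes:
    # Base-128 varint, low 7-bit groups first (non-negative values only).
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple:
    # Returns (value, position just past the varint).
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def tagged_encode(fields: list) -> bytes:
    # fields: (field_number, value) pairs, written in whatever order they are given.
    return b"".join(encode_varint(num << 3) + encode_varint(val) for num, val in fields)


def tagged_decode(data: bytes) -> dict:
    # Each value carries its own field number, so arrival order does not matter.
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)  # sketch: varint (wire type 0) fields only
        fields[key >> 3] = value
    return fields


if __name__ == "__main__":
    print(xdr_unpack_point(xdr_pack_point(3, 4)))
    print(tagged_decode(tagged_encode([(1, 3), (2, 4)])))
    print(tagged_decode(tagged_encode([(2, 4), (1, 3)])))
```

For small values like these, the tagged form is four bytes against XDR's eight, because varints shrink small numbers. For large values that fill a 32-bit slot, the extra key bytes let XDR come out ahead. That fits the point that XDR is potentially more compact.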
      
      - Re:An order of magnitude over XML? (Score:4, Informative)
        
        by vrmlguy ( 120854 ) writes: <samwyse&gmail,com> on Tuesday July 08, 2008 @11:25PM (#24111095) Homepage Journal
        
        The only obvious thing I see missing is a canonical way to encode the .proto file as a Protocol Buffer, to make a stream self-describing.
        A-ha! I found it! [google.com] "Thus, the classes in this file allow protocol type definitions to be communicated efficiently between processes."
        Why do you need this? Well, you may not. "Most users will not care about descriptors, because they will write code specific to certain protocol types and will simply use the classes generated by the protocol compiler directly. Advanced users who want to operate on arbitrary types (not known at compile time) may want to read descriptors in order to learn about the contents of a message."
        
        
        Re: (Score:3, Informative)
        
        by Nynaeve ( 163450 ) writes:
        
        I got a 404 on your link. try this one [google.com].
- Re:An order of magnitude over XML? (Score:5, Funny)
  
  by alexgieg ( 948359 ) writes: <alexgieg@gmail.com> on Tuesday July 08, 2008 @04:45PM (#24105683) Homepage
  
  An order of magnitude over XML? So is, well, just about anything.
  Well, let's also not forget that the meaning of the expression "an order of magnitude" depends strongly on the numeric base you're using.
  
  - Re: (Score:3)
    
    by jd ( 1658 ) writes:
    
    Given how evil Google can be at times, we can assume they are working in base 13.
    - Re:An order of magnitude over XML? (Score:5, Funny)
      
      by sconeu ( 64226 ) writes: on Tuesday July 08, 2008 @05:59PM (#24106889) Homepage Journal
      
      Nobody makes jokes in Base 13 [wikipedia.org]!
      
- Re: (Score:2)
  
  by jellomizer ( 103300 ) writes:
  
  But the Slashdot ad above the message says XML combined with Java is fast, and that the slow part is the database server. Could I be mistaken?
- - - Re:Between a rock and hard place (Score:4, Insightful)
      
      by metamatic ( 202216 ) writes: on Tuesday July 08, 2008 @06:16PM (#24107157) Homepage Journal
      
      Funny, I'm tired of seeing YAML in places where XML would work fine.
      Like serializing my Ruby objects, for example. When I don't care about performance, XML is best, because almost everything else will read and write it, including my text editor, and I know the syntax. When I *do* care about performance, I'm not going to use YAML either.
      I don't see the niche YAML fits, frankly.
      
      Parent Share
      twitter facebook
Likely story! (Score:5, Funny)

by TheRealMindChild ( 743925 ) writes: on Tuesday July 08, 2008 @04:13PM (#24105217) Homepage Journal

Google's blogger claims, "And, yes, it is very fast -- at least an order of magnitude faster than XML."

That is just because they aren't using enough XML!

- More XML? EXI, Efficient Xml Interchange! (Score:4, Informative)
  
  by refactored ( 260886 ) writes: <{zn.oc.tenx} {ta} {tneyc}> on Tuesday July 08, 2008 @08:12PM (#24108645) Homepage Journal
  
  http://www.w3.org/XML/EXI/ [w3.org]
  The development of the Efficient XML Interchange (EXI) format was guided by five design principles, namely, the format had to be general, minimal, efficient, flexible, and interoperable. The format satisfies these prerequisites, achieving generality, flexibility, and performance while at the same time keeping complexity in check.
  Many of the concepts employed by the EXI format are applicable to the encoding of arbitrary languages that can be described by a grammar. Even though EXI utilizes schema information to improve compactness and processing efficiency, it does not depend on accurate, complete or current schemas to work.
  
- - Re:Likely story! (Score:4, Informative)
    
    by caerwyn ( 38056 ) writes: on Tuesday July 08, 2008 @04:26PM (#24105389)
    
    Are you serious? XML is great for certain applications, but the one thing it *isn't* is fast. It's very believable that something like this could be an order of magnitude faster.
    
    Parent Share
    twitter facebook
    - - Re:Likely story! (Score:5, Informative)
        
        by cnettel ( 836611 ) writes: on Tuesday July 08, 2008 @08:13PM (#24108675)
        
        The problem is that, in my experience, it is easy to write a 99% XML-compliant parser that is 10 times faster. That last percent, though...
        
  - Re:Likely story! (Score:5, Funny)
    
    by jandrese ( 485 ) writes: <kensama@vt.edu> on Tuesday July 08, 2008 @04:27PM (#24105409) Homepage Journal
    
    Yeah, I mean XML didn't earn its reputation for being lightning fast and byte efficient for nothing...
    
    Parent Share
    twitter facebook
  - Re:Likely story! (Score:5, Insightful)
    
    by cduffy ( 652 ) writes: <charles+slashdot@dyfis.net> on Tuesday July 08, 2008 @04:32PM (#24105497)
    
    Being 10x faster than XML to work with is entirely believable: If you're serializing directly to binary structures, those structures can be directly manipulated without any parsing at all... and if you need to do some byte-swapping and alignment adjustments to get them into and out of native form for your current processor, those are still operations which can be performed in a matter of a few CPU instructions, rather than through a few hundred KB of libraries.
    I drink the XML kool-aid plenty -- but there are things it's good for, and things it's not. Serializing and parsing truly massive amounts of data is part of the latter set.
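cduffy's point about fixed binary structures can be sketched with Python's standard `struct` module. The record layout below (a 32-bit id, 16-bit flags, and a double) is invented for illustration. The `<` prefix fixes little-endian byte order with no padding. Reading a record back is then a fixed-offset unpack, plus at most a byte swap on a big-endian host, with no text to tokenize.

```python
import struct

# Hypothetical record layout. '<' = little-endian, standard sizes, no alignment padding:
# uint32 id, uint16 flags, float64 score -> 4 + 2 + 8 = 14 bytes.
RECORD = struct.Struct("<IHd")


def pack_record(record_id: int, flags: int, score: float) -> bytes:
    return RECORD.pack(record_id, flags, score)


def unpack_record(data: bytes) -> tuple:
    # Fixed offsets: no tokenizing, escaping, or decimal-to-binary number parsing.
    return RECORD.unpack(data)


if __name__ == "__main__":
    blob = pack_record(42, 0x0003, 0.5)
    print(RECORD.size, blob.hex())
    print(unpack_record(blob))
```

Switching the prefix to `>` would give the big-endian "network order" that XDR uses.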
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by Reality Master 101 ( 179095 ) writes:
    
    To anyone who seriously believes Google's protocol is an order of magnitude faster than XML, I have two words for you: No.
    You're right -- if it's less than two orders of magnitude faster, I would be very surprised.
  - Re: (Score:2, Insightful)
    
    by dedazo ( 737510 ) writes:
    
    The 10x does not refer to the transmission speed (you're not getting that for a 100KB XML string vs. an 80KB binary blob), but to the speed at which the [de]serialization occurs.
    In fact this approach is even faster than runtime-specific stream serialization like cPickle in Python or the built-in binary formatter in the .NET CLR, because those use reflection.
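Readers who want to check the deserialization-speed claim themselves can start from a rough harness like this one. It parses the same record from XML, using the standard library's `xml.etree.ElementTree`, and unpacks it from a fixed binary layout. It is a sketch with an invented three-field record. It does not test Protocol Buffers or libxml2 themselves. Absolute numbers depend heavily on the parser, interpreter, and data shape, so no result is claimed here.

```python
import struct
import timeit
import xml.etree.ElementTree as ET

RECORD = struct.Struct("<IId")  # hypothetical record: id, count, value

XML_DOC = b"<rec><id>7</id><count>12</count><value>2.5</value></rec>"
BINARY_DOC = RECORD.pack(7, 12, 2.5)


def from_xml(doc: bytes) -> tuple:
    # Build a tree, then convert each text node back into a number.
    root = ET.fromstring(doc)
    return (int(root.findtext("id")), int(root.findtext("count")),
            float(root.findtext("value")))


def from_binary(doc: bytes) -> tuple:
    # Fixed layout: a single unpack, no text conversion.
    return RECORD.unpack(doc)


if __name__ == "__main__":
    assert from_xml(XML_DOC) == from_binary(BINARY_DOC)
    for name, fn, doc in (("xml", from_xml, XML_DOC), ("binary", from_binary, BINARY_DOC)):
        seconds = timeit.timeit(lambda: fn(doc), number=20_000)
        print(f"{name}: {seconds:.4f}s for 20k parses")
```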
  - - Re:This is a good thing (Score:5, Insightful)
      
      by Temporal ( 96070 ) writes: on Tuesday July 08, 2008 @06:13PM (#24107101) Journal
      
      The example they give is for a small set of data, and percentages vary more dramatically as sample sizes decrease.
      We wanted to give an idea of the speed without trying to boast too much or look like we were directly challenging anyone. Of course every news outlet has chosen to highlight the speed comment -- including the numbers which were intended to be ballpark figures -- more than was intended, but I guess that isn't surprising.
      I agree that the tiny "person" example is not a good benchmark case. It was intended as a usage example, not a speed example, but I stuck the speed numbers in there just meaning to give people a vague idea of the difference. The "20-100 times faster" comment is based on testing a variety of formats -- both unrealistic ones and real-life formats used in our search pipeline -- against programmatically generated XML equivalents (which may or may not themselves be realistic, though they contain the same data with the same structure). libxml2 was used for parsing XML. I don't really know how libxml2's speed compares to other XML parsers, but I didn't have a lot of time to investigate. The 20x faster number comes from the largest data set (~100k-ish) while the 100x number comes from a very small message. The most realistic case was about 50x. Sorry that I cannot provide exact details of the benchmark setup since many of the test cases were proprietary internal formats.
      In any case, I'm hoping that some independent source conducts some tests because I think anything we produced would probably have unintentional biases in it. Of course, I'll update the numbers in the docs if they turn out to be wildly off-base.
      
      Parent Share
      twitter facebook
      - Re: (Score:3, Informative)
        
        by Temporal ( 96070 ) writes:
        
        Note that protocol buffers give you the equivalent of a DOM -- an object representing the parsed message. This is usually much more convenient to use than SAX parsing (depending on your use case, of course). So, I'm not sure if comparing against SAX is necessarily fair. Though I think protocol buffers would still win just because there is less to parse and parsing length-delimited chunks is faster than character-delimited.
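Temporal's last point is that length-delimited chunks parse faster than character-delimited ones. With a length prefix, the reader jumps straight to the end of each record. A delimiter-based reader must inspect every byte, and it needs an escaping rule for payloads that contain the delimiter. The sketch below frames records with a varint length, a common convention for streaming protobuf messages. The helpers themselves are hypothetical and hand-rolled.

```python
def encode_varint(value: int) -> bytes:
    # Base-128 varint, low 7-bit groups first (non-negative values only).
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple:
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame(records: list) -> bytes:
    # Length-delimited: each payload is preceded by its size.
    return b"".join(encode_varint(len(r)) + r for r in records)


def unframe(stream: bytes) -> list:
    records, pos = [], 0
    while pos < len(stream):
        length, pos = decode_varint(stream, pos)
        end = pos + length
        if end > len(stream):
            raise ValueError("truncated record")
        records.append(stream[pos:end])  # slice directly; payload bytes are never scanned
        pos = end
    return records


if __name__ == "__main__":
    payloads = [b"hello", b"", b"contains\nnewline and <tags>"]
    assert unframe(frame(payloads)) == payloads
    # A newline-delimited format would split the third payload unless it were escaped.
    print(b"\n".join(payloads).split(b"\n"))
```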
I bet ... (Score:5, Funny)

by Anonymous Coward writes: on Tuesday July 08, 2008 @04:15PM (#24105239)

... it requires piping data through google's servers for data mining and ad injection purposes.

- Re: (Score:2, Funny)
  
  by eddy ( 18759 ) writes:
  
  Hey, that's a pretty cool concept.
  $ cat spanish.txt | http://google.com/language_tools/tr?ESEN [google.com] | grep "terrorist"
  I'm sure I'm years late to the party. <sigh>
What? (Score:2)

by Yvan256 ( 722131 ) writes:

Is that like PHP's serialize?
- Re: (Score:2)
  
  by psergiu ( 67614 ) writes:
  
  More like the Oracle SQLLoader ...
  Or the VMS Fixed Record Length/Indexed or VFC files ...
  I think Google might just receive a visit from the patent fairy ...
- Re: (Score:2)
  
  by Foofoobar ( 318279 ) writes:
  
  No. This is more along the lines of a hashmap or a multidimensional array. With serialize in PHP, you still have to unserialize which takes time to parse. With a multidimensional array, it's already in a usable state; no additional parsing is required. And you can add on or remove variables whenever you want without having to reparse.
- Re: (Score:3, Informative)
  
  by merreborn ( 853723 ) writes:
  
  1) It has a binary format, far more compact (and faster to unserialize) than PHP's text-based serialized format.
  2) It handles multiple versions of the same objects (e.g., your server can interact with both PhoneNumber 2.0 and PhoneNumber 3.0 objects relatively trivially)
  3) It generates code for converting each format into objects in their 3 supported languages.
  So, no, not really.
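merreborn's second point depends on each field on the wire carrying both its number and a wire type. Because of that, an older reader can step over fields it does not recognize. The hand-rolled sketch below assumes only the varint (0) and length-delimited (2) wire types from the published encoding. The field numbers and the "PhoneNumber" layouts are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple:
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def int_field(number: int, value: int) -> bytes:
    # Wire type 0: key, then a varint value.
    return encode_varint(number << 3) + encode_varint(value)


def bytes_field(number: int, payload: bytes) -> bytes:
    # Wire type 2: key, then a varint length, then the raw bytes.
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


KNOWN_V2_FIELDS = {1: "number", 2: "type"}  # hypothetical PhoneNumber 2.0 schema


def read_v2(data: bytes) -> dict:
    result, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in KNOWN_V2_FIELDS:
            result[KNOWN_V2_FIELDS[field_number]] = value
        # Unknown field numbers (e.g. added in 3.0) are skipped, not treated as errors.
    return result


if __name__ == "__main__":
    # A hypothetical "3.0" writer adds field 3, which the 2.0 reader has never seen.
    v3_message = bytes_field(1, b"555-1234") + int_field(2, 1) + bytes_field(3, b"ext 9")
    print(read_v2(v3_message))
```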
No PERL API ??!!?? (Score:4, Insightful)

by Proudrooster ( 580120 ) writes: on Tuesday July 08, 2008 @04:18PM (#24105275) Homepage

C++
Python
Java
what about PERL ? :]

- Re:No PERL API ??!!?? (Score:4, Insightful)
  
  by Anonymous Coward writes: on Tuesday July 08, 2008 @04:21PM (#24105317)
  
  Go out and write one, sonny!
  That's the beauty of open source.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by jandrese ( 485 ) writes:
  
  I'm sure it won't take long for the module to show up on CPAN.
- Re:No PERL API ??!!?? (Score:5, Informative)
  
  by yknott ( 463514 ) writes: on Tuesday July 08, 2008 @04:40PM (#24105601) Homepage Journal
  
  According to Brad Fitzpatrick's (of LiveJournal fame) blog [livejournal.com], he's working on Perl support.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by A beautiful mind ( 821714 ) writes:
  
  It's called "Perl".
- Re: (Score:3, Informative)
  
  by Onan ( 25162 ) writes:
  
  Uh, no. Google officially deems perl unmaintainable, and its internal use is completely verboten.
  You're quite welcome to write your own if you want it, but it's not something we'd ever use ourselves.
- - Re: (Score:2)
    
    by fbjon ( 692006 ) writes:
    
    You... You have GOT to be new here.
  - Re: (Score:2)
    
    by Goaway ( 82658 ) writes:
    
    You're not really going to see the benefits of Perl in one month. It's not a very straightforward language like that.
  - Re:No PERL API ??!!?? (Score:5, Insightful)
    
    by mpeg4codec ( 581587 ) writes: on Tuesday July 08, 2008 @05:32PM (#24106431) Homepage
    
    Perl is to programming languages what English is to natural languages: easy to fool around with, hard to learn well, but when you do, the expressive power is incredible. And when you mess it up, nobody understands what you're trying to say.
    
    Parent Share
    twitter facebook
We love the sight of power (Score:2)

by heroine ( 1220 ) writes:

Just think of the kind of power it took to make millions of employees standardize on the same format for their data interchange. Humans just gravitate to power wielding forces. Wonder what format they require for their surprise blog posts.
How about C? (Score:2)

by microbee ( 682094 ) writes:

SunRPC is old and awkward. Always want something better.
- Re: (Score:3, Insightful)
  
  by AuMatar ( 183847 ) writes:
  
  They gave you C++. If you can't translate C++ to C, please turn in your keyboard and leave.
  - Re:How about C? (Score:4, Funny)
    
    by vigmeister ( 1112659 ) writes: on Tuesday July 08, 2008 @05:11PM (#24106075)
    
    Well, I can't translate C++ to C until after it is DECLASSIFIED...
    *rimshot*
    Cheers!
    
    Parent Share
    twitter facebook
Now just release Goobuntu... (Score:2)

by mdm-adph ( 1030332 ) writes:

...and we'll be happy.
- Re: (Score:2, Funny)
  
  by fph il quozientatore ( 971015 ) writes:
  
  Here, fixed the typo for you:
  
  Now just release Boobuntu...
  ... and we'll be happy
As a former user of CORBA (Score:5, Interesting)

by Anonymous Coward writes: on Tuesday July 08, 2008 @04:28PM (#24105431)

It looks like Google has taken some of the good elements of CORBA and IIOP into its own interchange format.
While CORBA certainly is bloated in a lot of ways, the IIOP wire protocol it uses is vastly faster and more efficient than any XML out there, and yes, it is just as "open" (publicly documented and freely available for use in any open source application) as any XML schema out there. J2EE uses IIOP as well, and it is technically possible to interoperate (although the problem with CORBA is that different implementations never really interoperated as they were supposed to).
As a side note, I'd rather write IDL code than an XML schema any day of the week too, but that's another rant.

Share
twitter facebook
compare to thrift ( from facebook) (Score:5, Informative)

by Anonymous Coward writes: on Tuesday July 08, 2008 @04:29PM (#24105439)

Both really come from the same design sheet, but Thrift has been open-sourced for over a year and has many more language bindings. It's been in use in several open source projects (Thrudb comes to mind), and there are many more existing articles and documentation for it.
http://developers.facebook.com/thrift/

Share
twitter facebook
Fast (Score:5, Interesting)

by JamesP ( 688957 ) writes: on Tuesday July 08, 2008 @04:30PM (#24105457)

"And, yes, it is very fast -- at least an order of magnitude faster than XML."
Just wait for the XML zealots to come crashing in, refusing to believe that XML is anything but the fastest, best solution to all the world's problems (including cancer), insisting that of course people at Google are amateurs and id10ts, and asking WHY DO YOU HATE XML, that kind of stuff.
Or, as Joel Spolsky once said: http://www.joelonsoftware.com/articles/fog0000000296.html [joelonsoftware.com]
No, there is nothing wrong with XML per se, except for the fans...

Share
twitter facebook
- Ok, I'll bite... (Score:5, Interesting)
  
  by Dutch Gun ( 899105 ) writes: on Tuesday July 08, 2008 @05:03PM (#24105961)
  
  Obviously, those at Google felt XML didn't work well for them. They have the resources to invent a protocol and libraries to support it. And, they are big enough to be their own ecosystem, which means as long as everyone at Google is using their formats, interop is no biggie. Good for them, I don't begrudge that decision.
  I'm actually a game developer, not a web developer, so I'll speak to XML's use as a file format in general. Here are a few points regarding our use of XML:
  * We only use it as a source format for our tools. XML is far too inefficient and verbose to use in the final game - all our XML data is packed into our own proprietary binary data format.
  * We also only use it as a meta-data format, not a primary container type. For instance, we store gameplay scripts, audio script, and cinematic meta-data in XML format. We're not foolish enough to store images, sounds, or maps in a highly-verbose, text-based format. XML's value to us is in how well it can glue large pieces of our game together.
  * All our latest tools are written in C# and using the .NET platform (Windows is our development platform, of course). It's astoundingly easy to serialize data structures to XML using .NET libraries - just a few lines of code.
  * Because it's a text-based format and human readable, if a file breaks in any way, we can just do a diff in source control to see what changed, and why it's breaking.
  I'll concede that I've heard of some pretty awful uses of XML. But those who dismiss XML as a valuable tool in the toolchest are just as foolish as those who believe it's the be-all and end-all of programming (I'm not saying that's true of you, just pointing out foolishness on both sides). Like any tool, it's most valuable when used in its optimal role, not when shoehorned into projects as a solution to everything.
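The tools-side workflow described above might look something like this in outline: XML is the editable source, and the shipped game reads a packed binary format. The `<audioScript>`/`<cue>` elements, their attributes, and the record layout are all hypothetical. The commenter's actual tools are C#/.NET. This is a standard-library Python sketch of the same idea, not their pipeline.

```python
import struct
import xml.etree.ElementTree as ET

SOURCE = """
<audioScript>
  <cue id="3" volume="0.8" name="door_open"/>
  <cue id="4" volume="1.0" name="door_close"/>
</audioScript>
"""

HEADER = struct.Struct("<I")     # record count
CUE = struct.Struct("<If32s")    # id, float32 volume, fixed-size NUL-padded name


def pack_cues(xml_text: str) -> bytes:
    """Tool step: read the human-editable XML and emit a compact binary blob."""
    root = ET.fromstring(xml_text.strip())
    cues = root.findall("cue")
    parts = [HEADER.pack(len(cues))]
    for cue in cues:
        name = cue.get("name").encode("utf-8")
        if len(name) > 32:
            raise ValueError(f"cue name too long: {name!r}")
        parts.append(CUE.pack(int(cue.get("id")), float(cue.get("volume")), name))
    return b"".join(parts)


def unpack_cues(blob: bytes) -> list:
    """Game step: fixed-size records read at computed offsets, no XML parser needed."""
    (count,) = HEADER.unpack_from(blob, 0)
    cues = []
    for i in range(count):
        cue_id, volume, raw_name = CUE.unpack_from(blob, HEADER.size + i * CUE.size)
        # float32 cannot store 0.8 exactly, so round for display/comparison.
        cues.append((cue_id, round(volume, 3), raw_name.rstrip(b"\0").decode("utf-8")))
    return cues


if __name__ == "__main__":
    blob = pack_cues(SOURCE)
    print(len(blob), unpack_cues(blob))
```

The XML stays the diff-friendly source in version control, and the blob is regenerated at build time.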
  
  Parent Share
  twitter facebook
Smart move (Score:5, Insightful)

by ruin20 ( 1242396 ) writes: on Tuesday July 08, 2008 @04:32PM (#24105491)

Since they're Google, people will clamor over this (as we're doing here), and the result will be that at least a handful of folks will learn and use it. Google's key to success has always been finding fresh talent and removing barriers to their contributing and advancing, so from what I've seen, they've done two things: A) helped train potential employees on how their tech and thought process work, and B) given themselves a filter by which to gauge a potential employee's ability to understand their system.
And as a bonus, they help undermine opponents who use competing technologies by helping train the workforce away from their practices. Overall, I think it's a very intelligent, well-done strategic move.

Share
twitter facebook
The killer feature is simplicity (Score:5, Insightful)

by jandrese ( 485 ) writes: <kensama@vt.edu> on Tuesday July 08, 2008 @04:37PM (#24105571) Homepage Journal

The point of this isn't so much that it's faster than XML (so is everything else); it's that Google took everything that a real person needs in an IDL and cut out everything else. Most IDLs have a serious case of second-system effect, where features are added that nobody uses but that seriously complicate the API. Even XML suffers from that (have you ever seen the kind of data structure you need to store a DOM, or what that does to library APIs for manipulating XML?).

I'd use it because 95% of the time all I need is something simple like this, and the other 5% of the time I should go back and rethink my design anyway.

That said, there is still a case for XML, especially the self documenting and human readable nature of the document, but there are a lot of cases where it is used today where it only adds unnecessary complexity and actually makes your code more difficult to maintain instead of simpler.

Share
twitter facebook
XML is a crappy format (Score:5, Insightful)

by Alex Belits ( 437 ) * writes: on Tuesday July 08, 2008 @04:42PM (#24105649) Homepage

I always told people that -- it's optimized for:
1. Easy parsing by parsers written by people who slept through their compiler classes.
2. Verification in situations when it's impossible to devise a meaningful reaction to a failure (other than either "everything failed, turn off the computers and go home" and "assume the data to be valid anyway because ALL of it will have the same formatting error because the same program generates it")
3. Dealing with data that arrives in neatly packaged "documents" and "requests", as opposed to being constantly produced and consumed.
4. Either communicating between programs that have the same knowledge of message semantics, or preparation of pretty human-readable documents.
None of the above even remotely applies to anything practical except UI/display formats -- this is why XHTML and ODF (and, because of that, to some extent XSL) are usable, SOAP is a load of crap, and for all other purposes XML is used as a glorified CSV with angle brackets. XML is widespread because a monumentally stupid standard is still better than no standard.
So here is your example of how superior ANY format that is not based on this stupid idea can be.

Share
twitter facebook
- XML is not a 'format'! (Score:3, Insightful)
  
  by r3g3x ( 1147243 ) writes:
  
  XML is crappy format
  That statement underlines most people's myopic view of the XML family of technologies. XML is not a format; it is a family of technologies based around a common grammar.
  
  XML is not a bucket.
  It is not a passive container for data.
  It is a transformable semantic graph.
  
  The heart and soul of XML is XSLT [w3.org]; it serves as a common 'glue' that allows transformation between the various standardized 'languages': XML [w3.org], XHTML [w3.org], XSLT [w3.org], XSL-FO [w3.org], SVG [w3.org], RDF [w3.org], RSS [harvard.edu], etc.
  
  Example: the same XML document (let's say it r
  - - Re: (Score:3, Insightful)
      
      by r3g3x ( 1147243 ) writes:
      
      XML is absolutely definitely a format -- eXtensible Markup Language.
      XML is a system of grammar that is used to create defined formats.
      
      You can't use XML to mark up data. You have to use a defined grammar to create a format. You might say that this is an issue of semantics, but that is the point. If your only use/understanding of XML is as a static data format, then you're doing it [XML/XSLT/..] wrong.
      
      XML is a crappy tool for static storage. If the data is being read/written by the same program, there are faster/simpler ways to encode that data. But that isn't what XML is meant
      - Re: (Score:3, Interesting)
        
        by Alex Belits ( 437 ) * writes:
        
        XML is a system of grammar that is used to create defined formats.
        ...made for people who slept through compiler courses.
        
        You can't use XML to mark up data. You have to use a defined grammar to create a format. You might say that this is an issue of semantics, but that is the point. If your only use/understanding of XML is as a static data format, then you're doing it [XML/XSLT/..] wrong.
        No, you can't "create" a format with XML. To "create" anything but the most trivial formats you have to provide a definition of both syntax and semantics. XML provides ridiculously complex, stupidly designed means to define a syntax, and absolutely nothing to define semantics, so you still have to either document it or, more likely, provide an implementation.
        Guess what? The syntax is such a microscopic part of your task, the amount of work you have just
- Re:XML is a crappy format (Score:4, Insightful)
  
  by mmurphy000 ( 556983 ) writes: on Tuesday July 08, 2008 @07:31PM (#24108223)
  
  Y'know, I usually give low-UID Slashdotters a modicum of respect, but this diatribe is off-the-charts nonsense.
  1. Easy parsing by parsers written by people who slept through their compiler classes.
  And your evidence of this assertion is...what exactly? Not to mention the minor detail that XML and compilers are orthogonal: you can use XML (or many other data interchange formats) with non-compiled languages, and most compilers know nothing about XML (or many other data interchange formats).
  2. Verification in situations when it's impossible to devise a meaningful reaction to a failure (other than either "everything failed, turn off the computers and go home" and "assume the data to be valid anyway because ALL of it will have the same formatting error because the same program generates it")
  And your evidence of this assertion is...what exactly? XML-consuming programs that are aware of the data structure can have as detailed a "reaction to a failure" as a JSON-consuming program, or a YAML-consuming program, or a Protocol Buffer-consuming program. XML-consuming programs that are not aware of the data structure can, if the XML supplies it, validate against a DTD or schema, things which are not possible in some other data interchange formats (e.g., JSON, YAML).
  3. Dealing with data that arrives in neatly packaged "documents" and "requests", as opposed to being constantly produced and consumed.
  All data comes in neatly packaged buckets of varying types. We call them "bytes" and "packets" and "structures" and "records" and "frames" and "rows" and the like. The only way I can interpret your claim in a way that makes sense is to translate it as "XML sucks for streaming audio and video", which is undoubtedly true, and I don't think anyone uses it in that arena.
  4. Either communicating between programs that have the same knowledge of message semantics, or preparation of pretty human-readable documents.
  On the contrary, this is one of XML's primary strengths — handling cases where programs lack the "same knowledge of message semantics".
  With most data interchange formats, from CSV to JSON to Protocol Buffers, either you know everything about the data structure you're receiving, or you're screwed. In other words, there is no discoverability and no standardized means of being able to only deal with a portion of the data. This is particularly true for binary formats, like Protocol Buffers — either you know exactly what structure you received so you can parse it, or you're SOL, since it's just a bunch of bytes.
  With XML namespaces, it is entirely possible for Program X to publish data that Program Y has no intrinsic knowledge of in its entirety, but might know in part. If Program Y knows how to handle documents containing Dublin Core elements, for example, it can work with just those elements and ignore the rest of the document.
  You're welcome to have any opinion of XML you like. Heck, I even agree that XML tends to be used in places where it's overkill or too verbose. But if you want to convince others that your opinion is the correct one, you'll need to do a better job than this.
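mmurphy000's Dublin Core example can be sketched with the standard library's ElementTree. The function below understands only elements in the Dublin Core namespace (`http://purl.org/dc/elements/1.1/`) and ignores everything else in the document. The surrounding `record` vocabulary, its `example.com` namespace URI, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

DOCUMENT = """<record xmlns="http://example.com/hypothetical-catalog"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Google Open Sources Its Data Interchange Format</dc:title>
  <dc:creator>kdawson</dc:creator>
  <shelfLocation aisle="7"/>
  <internalRating>5</internalRating>
</record>"""


def dublin_core_fields(xml_text: str) -> dict:
    """Collect Dublin Core elements anywhere in the tree; skip all other namespaces."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC + "}"  # ElementTree reports namespaced tags as {uri}localname
    fields = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            fields[element.tag[len(prefix):]] = (element.text or "").strip()
    return fields


if __name__ == "__main__":
    print(dublin_core_fields(DOCUMENT))
```

The reader needs no knowledge of `shelfLocation` or `internalRating`. It relies only on the namespace URI it was written for.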
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Informative)
    
    by pikine ( 771084 ) writes:
    
    Not to mention the minor detail that XML and compilers are orthogonal: you can use XML (or many other data interchange formats) with non-compiled languages, and most compilers know nothing about XML (or many other data interchange formats).
    If you have taken a compiler class, you'd learn about "compiler compilers" which are parser generators. He's just talking about the concept of parsing in general, and that XML is for people who don't understand how to write parsers.
    I don't agree with everything he says, b
  - - Re:XML is a crappy format (Score:4, Interesting)
      
      by mmurphy000 ( 556983 ) writes: on Tuesday July 08, 2008 @09:12PM (#24109197)
      
      And all of them "check" the format, wasting CPU time, memory and cache, then can do nothing but crash (oh, sorry, throw exception for which there is no valid logic to handle) in the impossible case of format being invalid, and doing nothing if the actual data is semantically invalid (because semantic processing is done by a program written by a programmer who knows that it can't verify the data). Validation solves the problem that does not exist, it makes as much sense as accompanying data structures in memory with a CRC -- if it ever does not match, what are you going to do, send a message "Stand by for imminent crash" into the log? It's a completely wrong place for verification unless your application development model is "perma-debugging".
      In the world I live in, data is frequently valid, but not always:
      
      Data corruption in a communications link (e.g., this series of tubes we're using)
      Data corruption in a storage medium (e.g., hardware hiccup, bit flip due to cosmic ray)
      Version differences between sender and receiver conception of the data format
      Malware that pretends to be a legitimate sender but, instead, sends invalid data
      Many of those can be caught by the general-purpose validators that you decry, and that limits the number of validation routines programmers have to deal with. And your complaints re: CPU, memory, and cache place a value on them that may or may not be proper in every context. Or, as my former business partner put it, "in six months' time, computers will be faster and cheaper, but programmers will be neither".
      Most of the data in anything that is actually used for some practical purpose is of a "streaming" kind; the request-response cycle is more often an exception than a rule. It only became popular because it's easy to implement with crappy tools.
      You obviously have a very different definition of "streaming" than I do, as I'd argue virtually nothing uses streaming, from the days of FORTRAN and COBOL to the present day.
      By definition, if you don't know semantics, data is meaningless (get it -- semantics, meaning).
      Precisely. Decomposable formats, like XML, allow programs to have semantics for part, but not all, of a data structure. Non-decomposable formats, like C structs, require semantics for all of a data structure. In situations where you know 100% of all use cases for a data structure, non-decomposable formats are fine. If, however, you want to allow for what Jonathan Zittrain refers to as "generativity" (i.e., unanticipated uses for existing technology as a means of advancing said technology), decomposable formats can be a benefit.
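For contrast, a sketch of the non-decomposable case, assuming a hypothetical C-struct-style record read with Python's `struct` module. The reader can't get at one field without hard-coding the layout of all of them:

```python
import struct

# Hypothetical C-struct-style layout: int32 id, char name[16], char email[32].
RECORD_FORMAT = "<i16s32s"


def pack_record(rec_id, name, email):
    # struct pads byte strings shorter than the declared width with NUL bytes.
    return struct.pack(RECORD_FORMAT, rec_id, name.encode(), email.encode())


def read_email(blob):
    # Reaching the email requires knowing the exact size of every field before it;
    # nothing in the bytes says where a field starts or what it is.
    _, _, email = struct.unpack(RECORD_FORMAT, blob)
    return email.rstrip(b"\0").decode()


blob = pack_record(7, "John Doe", "jdoe@example.com")
print(len(blob), read_email(blob))  # 52 jdoe@example.com
```

Change the width of `name` on the sending side and every reader that hard-coded `RECORD_FORMAT` breaks.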
      Take, for example, ODT vs. classic binary Word documents, which are pretty much just a serialization of a big-ass binary structure as I understand it. I've written programs that parse and generate ODT, or, more precisely, the portions of ODT that I need. Frankly, I don't care what the rest of it is, so long as my generated documents work properly. And I didn't need to refer to the ODT documentation on OASIS or anything to write them, as the XML was sufficiently human-readable that, accompanied with experimentation, I was able to determine how to generate valid ODT. With Word, even if there were OOXML-sized documentation for it, I'd have to hand-roll my own parser for the whole damn format, just to pick out the pieces I need to work with. Now, if I worked for Microsoft on the Word team, I wouldn't have that problem, because I'd already have the parser. However, I, like most people, don't work for Microsoft, and even if Microsoft's parsers were available, they might not fit my environment (e.g., won't run on Linux).
      Don't get me wrong, XML definitely gets overused. That's a problem with the uses of XML, not XML itself.
      
    - - Re: (Score:3, Interesting)
        
        by Alex Belits ( 437 ) * writes:
        
        I think you are missing the point. XML is good where you want to receive data from other systems over which you have no control. So it doesn't matter how good you are as a programmer, and how well you write YOUR program, the issue is that you've got cabbages (or programmers who resemble cabbages) upstream sending you data.
        So XML is good for talking to systems that use XML, and not for actually developing efficient or usable software!
        That's my whole point -- its only value is that it's some standard that replaced the situation when no common standard existed. Actual quality of its design is still crap: it's written by the wrong people, derived from the wrong theoretical base, and is implemented using the wrong tools and techniques. I am not claiming that it's completely unusable, or that people shouldn't use it for user-oriented applicat
- - Re: (Score:3, Informative)
    
    by Alex Belits ( 437 ) * writes:
    
    Actually it handles languages EXTREMELY POORLY, because one of the design goals was to make Unicode mandatory. If XML were truly designed for handling multilingual data, every tag would be able to have attributes for language, charset and encoding, and those attributes would default to "undefined, treat as opaque", to ensure safe round trips of untagged data from/to other formats.
    Now it's impossible to use non-Unicode charsets when using multiple languages in the same program because THE WHOLE FREAKING DOCUMENT (why
    - - Re: (Score:3, Informative)
        
        by Alex Belits ( 437 ) * writes:
        
        You still have all conversion routines built into language support, so all non-Unicode charsets still carry their support code into software. And it would be very easy to switch between charsets -- this happens anyway when you deal with character ranges that are not present in the fonts you use for your output. It all happens behind the scenes anyway.
        The problem is, XML developers' Unicode fanaticism threw all this flexibility out of the window on the level of document data and metadata processing, keeping
        
        Re: (Score:3, Informative)
        
        by shutdown -p now ( 807394 ) writes:
        
        The problem is, XML developers' Unicode fanaticism threw all this flexibility out of the window on the level of document data and metadata processing, keeping all complexity and sabotaging functionality, just to leave implementors and users no choice but to convert everything to Unicode.
        You might have not noticed, but it's not just XML. Almost everyone has moved to Unicode now, and those who haven't yet (Ruby, PHP) are being mocked for just that, and have the move on the top of their TODO. Learn to live wi
JSON (Score:5, Interesting)

by hey ( 83763 ) writes: on Tuesday July 08, 2008 @04:49PM (#24105729) Journal

Looks kinda like JSON to me.

- Re: (Score:2)
  
  by SuperKendall ( 25149 ) writes:
  
  I was kind of wondering the same thing, JSON was created to fill the same need. JSON is more like XML in that it's meant to be human parsable though, which counts for a lot in web use I think.
- Re:JSON (Score:5, Informative)
  
  by Temporal ( 96070 ) writes: on Tuesday July 08, 2008 @05:20PM (#24106247) Journal
  
  Structurally Protocol Buffers are similar to JSON, yes. In fact, you could use the classes generated by the Protocol Buffer compiler together with some code that encodes and decodes them in JSON. This is something some Google projects do internally since it's useful for communicating with AJAX apps. Writing a custom encoding that operates on arbitrary protocol buffer classes is actually pretty easy since all protocol message objects have a reflection interface (even in C++).
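A rough sketch of that idea, with Python dataclasses standing in for the protocol buffer reflection interface. The `Person`/`PhoneNumber` classes and the `to_jsonable` helper are hypothetical, not the generated protobuf API. One generic function handles every message type because it only inspects field names at runtime:

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class PhoneNumber:
    number: str = ""
    type: int = 0


@dataclass
class Person:
    id: int = 0
    name: str = ""
    email: str = ""
    phone: list = field(default_factory=list)


def to_jsonable(msg):
    """Walk any message object via reflection; no per-class encoding code needed."""
    if is_dataclass(msg):
        return {f.name: to_jsonable(getattr(msg, f.name)) for f in fields(msg)}
    if isinstance(msg, list):
        return [to_jsonable(item) for item in msg]
    return msg  # scalars (int, str, ...) map directly to JSON


p = Person(id=1, name="John Doe", phone=[PhoneNumber("555-4321", 1)])
print(json.dumps(to_jsonable(p)))
```

The standard library's `dataclasses.asdict` performs the same walk. It is spelled out here to show the reflection step.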
  The advantage of using the protocol buffer format instead of JSON is that it's smaller and faster, but you sacrifice human-readability.
  
- Re:JSON (Score:4, Informative)
  
  by pavon ( 30274 ) writes: on Tuesday July 08, 2008 @05:57PM (#24106865)
  
  The major difference between this and something like JSON or YAML or even XML is that those formats all include the format information (variable names, nesting, etc) along with the data. This does not.
  message Person {
    required int32 id = 1;
    required string name = 2;
    optional string email = 3;
  }
  What you are looking at above is the Protocol Format (.proto file) for a single message, which is analogous to an XML schema. No data is stored in that file: the numbers you see are unique ids for the different fields, and they are used in the low-level representation of the data (not all fields have to be included in every instance of a message).
  The actual data is serialized using a compact binary format, not ASCII like JSON/YAML/XML which makes it much more efficient both to transfer over a network as well as to parse.
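To make "compact binary format" concrete, here is a hand-rolled sketch of how the `Person` message above might be laid out on the wire. It assumes the publicly documented protobuf encoding: each field is a varint key of `(field_number << 3) | wire_type`, followed by either a varint or a length-prefixed byte string. It only handles non-negative `id` values (real int32 encoding of negatives uses 10-byte varints), and the function names are illustrative, not the library's:

```python
VARINT, LENGTH_DELIMITED = 0, 2  # wire types used by int32 and string fields


def encode_varint(value):
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number, wire_type):
    # Each field is prefixed by (field_number << 3) | wire_type.
    return encode_varint((field_number << 3) | wire_type)


def encode_person(person_id, name, email=None):
    out = encode_key(1, VARINT) + encode_varint(person_id)
    data = name.encode("utf-8")
    out += encode_key(2, LENGTH_DELIMITED) + encode_varint(len(data)) + data
    if email is not None:  # optional field: simply omitted when absent
        data = email.encode("utf-8")
        out += encode_key(3, LENGTH_DELIMITED) + encode_varint(len(data)) + data
    return out


print(encode_person(123, "John Doe").hex())  # 087b12084a6f686e20446f65
```

That is 12 bytes: 2 for the id, and 2 bytes of key and length plus the 8 bytes of "John Doe". No field names travel with the data.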
  
- Re:JSON (Score:5, Interesting)
  
  by 0xABADC0DA ( 867955 ) writes: on Tuesday July 08, 2008 @06:35PM (#24107449)
  
  Modify JSON so unquoted attributes are 'type labels' and define the type of an attribute by giving a label or a default value. For instance:
  phoneType: { MOBILE: 0, HOME: 1, WORK: 2 }
  
  phoneNumber: { "number": "", "type": phoneType }
  
  person: { "name": "", "id": 0, "email": "", "phone": [ phoneNumber ], }
  
  ... now you have pretty much exactly the same message definition as protocol buffers, but in pure JSON. It could also use some convention like "@WORK" for labels/classes so that a normal JSON parser can parse the message definitions. You can write a code generator to make access classes for messages just by walking the json and looking at the types. I don't see that 'required' and 'optional' keywords help much... imo defaults are generally better (even if they are nil). But this could easily be expressed in a json message definition.
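A sketch of the code-generator half of that proposal, assuming field types are inferred from the default values (empty string, zero, empty list). It uses Python's `dataclasses.make_dataclass`. `message_class` is a hypothetical helper, and it ignores the element type inside a list such as `[ phoneNumber ]`:

```python
from dataclasses import field, make_dataclass


def message_class(name, definition):
    """Build a class whose field types and defaults come from example defaults."""
    spec = []
    for key, default in definition.items():
        if isinstance(default, list):
            # Mutable defaults need a factory; a list stands for a "repeated" field.
            spec.append((key, list, field(default_factory=list)))
        else:
            spec.append((key, type(default), field(default=default)))
    return make_dataclass(name, spec)


Person = message_class("Person", {"name": "", "id": 0, "email": "", "phone": []})
p = Person(name="John Doe", id=123)
print(p)  # Person(name='John Doe', id=123, email='', phone=[])
```

Because every field has a default, "optional" falls out for free, which is the commenter's point about defaults versus `required`/`optional`.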
  It's easy to make a binary JSON format that is fast and also small, so there is little advantage to protocol buffers there. It's also easy and ridiculously fast to compress JSON text using say character-based lzo (Oberhumer).
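LZO isn't in Python's standard library, so this sketch swaps in `zlib` at its fastest level to illustrate the same point: repetitive JSON text compresses well. The record list is invented, and the actual sizes vary with the data and the compressor, so no numbers are claimed here:

```python
import json
import zlib

# Hypothetical list of {name, email} records, repetitive like a real contact list.
people = [{"name": f"Developer {i}", "email": f"dev{i}@example.org"} for i in range(50)]

text = json.dumps(people).encode("utf-8")
packed = zlib.compress(text, 1)  # level 1 = fastest compression

assert zlib.decompress(packed) == text  # lossless round trip
print(len(text), len(packed))
```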
  Maybe somebody can explain, but it doesn't seem like protocol buffers really have much advantages over JSON. It sounds like it is effectively just a binary format for JSON-like data (name-value pairs they say) along with a code generator to access it. The code generator is nice, but this is like a day's work max. Maybe I'm not understanding google's problems, but I'll stick with JSON since it actually is a cross-platform, language neutral data format... and you can always optimize it if actually needed.
  
Have they ever heard of BER/DER? (Score:3, Insightful)

by ugen ( 93902 ) writes: on Tuesday July 08, 2008 @04:51PM (#24105755)

How is this either implementationally or conceptually different from BER/DER encoding (commonly used and available all over the place)?
Looks to me like it is exactly the same thing, reimplemented. I am sure bearing a mark of Google is nice and all, but they are definitely reinventing the wheel here.
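For comparison, here is a hand-rolled sketch of DER's tag-length-value encoding. It uses a hypothetical `Person ::= SEQUENCE { id INTEGER, name UTF8String }` and the universal tags for SEQUENCE (0x30), INTEGER (0x02) and UTF8String (0x0C). Only non-negative integers are handled:

```python
def der_length(n):
    if n < 0x80:
        return bytes([n])  # short form: a single length byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form: 0x80 | count, then length


def tlv(tag, content):
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(n):
    # Non-negative only; the extra bit of room adds a leading 0x00 when the
    # top bit is set, so the value is not read as negative two's complement.
    return tlv(0x02, n.to_bytes((n.bit_length() + 8) // 8, "big"))


def der_utf8(s):
    return tlv(0x0C, s.encode("utf-8"))


def der_sequence(*items):
    return tlv(0x30, b"".join(items))


print(der_sequence(der_integer(123), der_utf8("John Doe")).hex())
# 300d02017b0c084a6f686e20446f65
```

The structure is the same tag/length/value idea: every element carries a type tag and a length, and the reader can skip anything it doesn't care about.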

- Re: (Score:2)
  
  by Dan Berlin ( 682091 ) writes:
  
  Have you ever met anyone who worked with ASN.1 and didn't run screaming for the hills?
  - Re: (Score:2)
    
    by forsetti ( 158019 ) writes:
    
    Yeah - those guys at MIT (Kerberos), UMich (LDAP), and the SSL guys ... not that anyone uses any of those protocols/implementations ...
    ASN.1 is the solution ... the problem just hasn't been properly specified yet.
    - Re:Have they ever heard of BER/DER? (Score:4, Funny)
      
      by Dan Berlin ( 682091 ) writes: on Tuesday July 08, 2008 @05:09PM (#24106053)
      
      Uh, having one of the OpenSSL guys working down the hall, he certainly said he would shoot himself if he had to work with ASN.1 again.
      
- ASN.1 encoded with BER/DER just needs tools (Score:4, Informative)
  
  by Animats ( 122034 ) writes: on Tuesday July 08, 2008 @05:51PM (#24106741) Homepage
  
  ASN.1, from 1985, really is very similar. Here's a message defined in ASN.1 form:
  Order ::= SEQUENCE {
    header Order-header,
    items  SEQUENCE OF Order-line }
  Order-header ::= SEQUENCE {
    number  Order-number,
    date    Date,
    client  Client,
    payment Payment-method }
  Order-number ::= NumericString (SIZE (12))
  Date ::= NumericString (SIZE (8)) -- MMDDYYYY
  Client ::= SEQUENCE {
    name     PrintableString (SIZE (1..20)),
    street   PrintableString (SIZE (1..50)) OPTIONAL,
    postcode NumericString (SIZE (5)),
    town     PrintableString (SIZE (1..30)),
    country  PrintableString (SIZE (1..20)) DEFAULT default-country }
  default-country PrintableString ::= "France"
  Payment-method ::= CHOICE {
    check       NumericString (SIZE (15)),
    credit-card Credit-card,
    cash        NULL }
  Credit-card ::= SEQUENCE {
    type        Card-type,
    number      NumericString (SIZE (20)),
    expiry-date NumericString (SIZE (6)) -- MMYYYY -- }
  Card-type ::= ENUMERATED { cb(0), visa(1), eurocard(2), diners(3), american-express(4) }
  
  Note that this has almost exactly the same feature set as Google's representation. There are named, typed fields which can be optional or repeated. It just looks more like Pascal, while Google's syntax looks more like C.
  
XDR? (Score:2)

by Rene S. Hollan ( 1943 ) writes:

I guess that XDR wasn't good enough, then, or ASN.1 (which supports multiple abstract encodings to boot).
XML, as an interchange format?
I suppose one could load source code into memory, and compile it every time, too. Even Java compiles to bytecode.
Bloated formats are fine for human interpretation (I rather like one kind of structure for my config files), or occasional parsing (which is why most of the stuff in /etc is human-readable, for small data sets (I do remember when "the internet" was one big /etc/ho
How is this different.. (Score:2)

by Ztream ( 584474 ) writes:

.. from things like YAML and JSON?
- Re: (Score:2)
  
  by Temporal ( 96070 ) writes:
  
  YAML and JSON are text-based formats intended for human readability. Protocol Buffers are binary, and therefore smaller and faster, but not human-readable.
  Also, the protocol buffer compiler provides friendly data access objects. You could actually use these with JSON or YAML, by just writing a new encoder and decoder (which is easy to do).
I have an XML alternative format too. (Score:4, Funny)

by IGnatius T Foobar ( 4328 ) writes: on Tuesday July 08, 2008 @05:15PM (#24106169) Homepage Journal

I have my own data format that is an alternative to XML as well. It works by normalizing the data into records which all contain the same number of fields, and placing an agreed-upon delimiter between each field. The end of the record is indicated by a newline.

I think this "delimited" format has a lot of potential.

Binary message formats are good (Score:3, Insightful)

by kriston ( 7886 ) writes: on Tuesday July 08, 2008 @05:20PM (#24106253) Homepage Journal

Thankfully an alternative to XML.
If you didn't think XML was among the least efficient transport formats, then you weren't really paying attention. Battery-conscious mobile devices do not really enjoy parsing an XML DTD and then the XML file itself.
It reminds me a little bit of AOL's SNAC message types.
We get something good for the industry from Google after a rash of bad press, and it's actually NOT a beta.

Old-School Property lists? (Score:4, Insightful)

by menace3society ( 768451 ) writes: on Tuesday July 08, 2008 @05:35PM (#24106481)

The similarity between these things and NeXT's Property Lists (now called "Old-School Property Lists" since Apple/NeXT standardized on XML) is incredible. Some things are changed, like having a specification instead of just assuming that the recipient will parse it and figure it out, but the likeness is there. I wonder if any of the proto people at Google had experience with plists, or if it's just a case of convergent design.
Everything old-school is new-school again, I guess.

Elevator Statement (Score:3, Funny)

by somethingwicked ( 260651 ) writes: on Wednesday July 09, 2008 @08:24AM (#24115019)

Google elevator statement for Protocol Buffers is "a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more."
Christ, I hope I'm never in an elevator with someone who would consider THAT an elevator statement.

- Re:Back to the 70's night? (Score:4, Insightful)
  
  by Temporal ( 96070 ) writes: on Tuesday July 08, 2008 @04:32PM (#24105487) Journal
  
  Wow! They've invented fixed position data files. What will they invent next, a cool new programming language called RPG?
  The article is actually completely wrong there. The protocol buffer binary format uses tag/value pairs, not fixed positions. Parsers simply ignore any tag they don't recognize and move on to the next.
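Here is a minimal sketch of that skipping behaviour. It assumes the documented protobuf wire types: 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit. The deprecated group types are not handled. The wire type alone tells the parser how many bytes to skip, so a reader built from an older `.proto` can step over fields it has never heard of. The helper names are hypothetical:

```python
def decode_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(buf, known):
    """Return {field_number: value} for field numbers in `known`; skip the rest."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            out[field_number] = value
    return out


# Fields 1 (id=123) and 2 (name), plus a field 9 added by a newer sender.
msg = b"\x08\x7b" + b"\x12\x08John Doe" + b"\x48\x01"
print(decode_known_fields(msg, {1, 2}))  # {1: 123, 2: b'John Doe'}
```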
  
- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  Microsoft has open-sourced some things upon abandonment. That's better than some companies, even. Companies can be good in some areas, and evil in others, however.
- Re:WTF am I missing (Score:5, Informative)
  
  by jandrese ( 485 ) writes: <kensama@vt.edu> on Tuesday July 08, 2008 @04:47PM (#24105701) Homepage Journal
  
  They open sourced the compiler (for C++, Java, and Python) that lets you actually use the data interchange format. If you follow the link you can download the code and start using it today. The code is open source.
  
- Re:WTF am I missing (Score:5, Insightful)
  
  by Chyeld ( 713439 ) writes: <chyeld@gma i l . c om> on Tuesday July 08, 2008 @04:51PM (#24105753)
  
  Seems like you are missing the code they released that allows you to implement this in a number of languages from the 'get-go'.
  You've also missed that they've just told the world how the majority of their systems talk, something most people would find interesting given how much Google does and the fact that one of Google's strong points is mangling huge amounts of data in a relatively quick manner.
  PS. Your format stinks and is horribly slow and unscalable when it comes to adding to the library. Genres are so unbelievably vaguely defined that you might as well just sort them by the dominant color of the cover. Google would have done better.
  
- Re: (Score:2)
  
  by shis-ka-bob ( 595298 ) writes:
  
  You open access to the source code of the C++, Java and Python libraries that you use in your internal work.
- Re:Why another encoding scheme? (Score:4, Insightful)
  
  by QuoteMstr ( 55051 ) writes: <dan.colascione@gmail.com> on Tuesday July 08, 2008 @04:50PM (#24105749)
  
  This is just yet another way in which Google demonstrates that it is suffering from NIH syndrome [wikipedia.org]. Instead of improving existing tools, they have to go off and re-invent all the bad mistakes of past, including non-relational databases [wikipedia.org], clunky [google.com] binary encodings, and a bizarre non-POSIX filesystem [wikipedia.org].
  Just imagine how far ahead we would be today if Google had put the same effort into creating tools the rest of the SQL-writing, open(2)-using world could use.
  
  - Re: (Score:2)
    
    by QuoteMstr ( 55051 ) writes:
    
    I'm not trolling. I genuinely believe what I've written above.
  - Re:Why another encoding scheme? (Score:5, Informative)
    
    by Abcd1234 ( 188840 ) writes: on Tuesday July 08, 2008 @05:33PM (#24106447) Homepage
    
    You think? Take BigTable. Wikipedia describes it as: '"a sparse, distributed multi-dimensional sorted map", sharing characteristics of both row-oriented and column-oriented databases'. Sounds, to me, like a specialized solution to a very specialized problem, a problem that, I presume, didn't fit with any existing solution. Same goes with GFS. After all, do you really think they didn't evaluate existing solutions before embarking on building an entirely new distributed filesystem? Do you really think they're that stupid?
    As for Protocol Buffers, given the existing solutions out there (such as ASN.1 and CORBA) are generally ugly and/or over-engineered, it sounds to me like they're simply addressing a gap in the industry... after all, XML and SOAP aren't the end-all and be-all of generic object-passing protocols.
    
    - - Re: (Score:3, Insightful)
        
        by SnowZero ( 92219 ) writes:
        
        ~2000-2001 (I think the reference is in this video [youtube.com]). Even if something newer is a bit better, we're not going to go back and port everything. Some future Google APIs will probably have an optional PB interface, because that's what they were being converted to internally anyway, so everyone might as well benefit from the compact over-the-wire encoding.
  - Re:Why another encoding scheme? (Score:4, Insightful)
    
    by miffo.swe ( 547642 ) writes: <daniel@hedblom.gmail@com> on Tuesday July 08, 2008 @05:41PM (#24106571) Homepage Journal
    
    I don't think it's NIH syndrome. They no doubt tested other solutions before doing their own thing.
    Don't forget this code is in widespread use and works very well. Google's server farm ain't exactly small, and the load they see is probably second to none.
    A couple of percent better efficiency for Google probably means millions in saved costs. Spending a couple of months of development on something like this is money well spent.
    I guess if all you have is SQL everything is a SQL SELECT no matter what you want to achieve.
    
  - need something different (Score:5, Insightful)
    
    by speedtux ( 1307149 ) writes: on Tuesday July 08, 2008 @07:11PM (#24107957)
    
    If Google had tried to build their system on relational databases, XDR, and NFS, they would have spent huge amounts of money and spent lots of time trying to shoehorn their software into those constraints. And it's not just Google that did this: Amazon did the same thing, with their SimpleDB, S3, and SQS.
    The actual mistakes were relational databases, XML, and distributed POSIX file systems; all of those were systems designed by people with too much time on their hands and no real-world, large-scale problems to solve. Finally, those mistakes are getting corrected, at least when it comes to high-end computing. At the low end, I suppose people will continue to tinker around with those toys.
    
  - Re:Why another encoding scheme? (Score:5, Insightful)
    
    by CoughDropAddict ( 40792 ) * writes: on Tuesday July 08, 2008 @07:46PM (#24108367) Homepage
    
    You think it's a "mistake of the past" that Google wrote things like GFS and BigTable that run on commodity hardware, scale basically horizontally (eg. you can just throw machines at the problem) and survive machine failures without human intervention?
    You don't "improve" on an existing tool like a relational database by adding a "feature" like fault tolerance. You have to redesign from the base up with those assumptions.
    
  - Re:Why another encoding scheme? (Score:5, Interesting)
    
    by joelwyland ( 984685 ) writes: on Tuesday July 08, 2008 @08:24PM (#24108765)
    
    Just imagine how far ahead we would be today if Google had put the same effort into creating tools the rest of the SQL-writing, open(2)-using world could use.
    We wouldn't be ahead at all. We use different tools than they do because they are dealing with different volumes of traffic, data and demands. Let's take a moment and look at your specific complaints.
    You say Google suffers from NIH syndrome. Having previously worked at Google, I think you are half right. The difference is that Google both benefits _and_ suffers from NIH syndrome. Sometimes the company spends too much time reinventing the wheel, but sometimes the tools out there aren't (and shouldn't be) useful to Google. Apache shouldn't be changed to support the kind of traffic that Google handles, because then it wouldn't be nearly as good for the rest of the world. General software is great because it solves so many problems. However, general software isn't the right solution for all problems, especially extreme ones, and just about all of Google's needs are extreme ones due to the volume of traffic.
    You dislike the idea of BigTable. Why not use the right tool for the right job? BigTable is a ridiculously fast database system that works beautifully with petabyte-sized databases. SQL isn't the right answer to every problem. They DO use SQL... but when it is the appropriate solution. They have some really sexy internal tools for dealing with SQL and such, and I'm hoping those are coming down the open source pipeline soon. :)
    You claim the Protocol Buffers are clunky. I've used them and developed with them extensively. They aren't clunky at all; they are actually quite elegant and easy to use. They streamline development, are incredibly reliable, and are incredibly fast.
    You obviously are confused by GFS as well. The system is transparent to the application by using standard i/o stream classes. It is inherently redundant to ensure data security. It is so fast in its response time that Google search is the fastest of any major player. The list goes on and on.
    I don't really see how you can be upset at Google for making awesome software and then giving us access to it.
    
- - Re: (Score:3, Informative)
    
    by MightyMartian ( 840721 ) writes:
    
    It's not hard because XML has to be the most bloated (and yet still, ironically, nowhere near human-readable) format ever invented. That it has not only not been discarded, but is now being used to store binary blobs by guys like Microsoft and OO.org is testimony to the sheer overwhelming stupidity of a lot of developers.
- Re: (Score:2)
  
  by Dan Berlin ( 682091 ) writes:
  
  I don't think you "get" it. Google open sourced this because they thought it would be cool, not because they think it is an amazingly new idea that nobody has ever done before. It's not like Google hasn't been using this internally for 5 years (Which of course, makes all the JSON comments humorous).
- Re: (Score:2)
  
  by neokushan ( 932374 ) writes:
  
  I think it makes me look an order of magnitude smarter, yes.
- Re: (Score:2)
  
  by natoochtoniket ( 763630 ) writes:
  
  Of course it's not new. It not only looks like ASN.1, it actually is very much like ASN.1. But to me it looks more like an extension of rpcgen, because ASN.1 came with a lot of other baggage. Of course, both rpcgen and asn.1 are just the best known implementations of ideas that were developed far earlier. Shannon's book on information theory explains just this sort of prefix code. These kinds of prefix codes have been in use since the 1960s, and code-generators have been around since the 1970s.
  I think
- Re: (Score:3, Informative)
  
  by Temporal ( 96070 ) writes:
  
  It's worth noting that writing alternative encoders and decoders for protocol buffers is really easy (since protocol message objects have a reflection interface, even in C++), so you can use the friendly generated code without being tied to the format.
- - Re: (Score:3, Informative)
    
    by Temporal ( 96070 ) writes:
    
    XML and this protocol differ in only one way: one is plain text, the other is binary.
    They also differ in that XML has a *lot* more features. For example, protocol buffers have no concept of entities, or even interleaved text. Those can be useful when your data is a text document with markup -- e.g. HTML -- but they tend to get in the way when you just want to pass around something like a struct.
- All the world is not a VAX^W^WWindows. (Score:3, Informative)
  
  by argent ( 18001 ) writes:
  
  Anyway, can someone shed some light on how this is different than binary serialization I've been using to pass C# objects around for quite some time now?
  It's portable and language-independent?
- - Re: (Score:3, Informative)
    
    by jrumney ( 197329 ) writes:
    
    Can you still read serialized objects created by older versions of your software?
    As long as all you have done is added new fields, then you can tag the new fields as OptionalField or NonSerialized to maintain backwards compatibility. The advantage of using Google's library is that it works across languages and runtimes. Java, .NET, PHP and Python all have serialization built in, but they are all incompatible, so you can't use it to pass an object from your Java backend to a C# client then on to Python for s
- Re: (Score:3, Informative)
  
  by Temporal ( 96070 ) writes:
  
  This is 49 bytes: <person name="John Doe" email="jdoe@example.com">
  The equivalent Protocol Buffer is 28 bytes. In addition to the 24 bytes of text, each field has a 1-byte tag and a 1-byte length. The example you quoted is protocol buffer *text* format, which is used mostly for debugging, not for actual interchange.
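The arithmetic can be checked with a short sketch. It assumes field numbers 2 (name) and 3 (email), as in the `Person` example upthread. It also assumes the values are small enough that both the key and the length fit in one byte each. `length_delimited` is a hypothetical helper:

```python
xml_tag = '<person name="John Doe" email="jdoe@example.com">'


def length_delimited(field_number, text):
    data = text.encode("utf-8")
    # 1-byte key ((field_number << 3) | 2) and 1-byte length: only valid
    # while field_number < 16 and len(data) < 128.
    assert field_number < 16 and len(data) < 128
    return bytes([(field_number << 3) | 2, len(data)]) + data


pb = length_delimited(2, "John Doe") + length_delimited(3, "jdoe@example.com")
print(len(xml_tag), len(pb))  # 49 28
```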
  - Lets actually compare (Score:3, Informative)
    
    by cryptoluddite ( 658517 ) writes:
    
    The big difference is that a protocol buffer cannot be understood without the message format (.proto file). Now let's actually take a look at a real list, like say the developers for apache [sourceforge.net] (as a list of {name:,email:} objects):
    protobuf: ~1654 bytes
    json: 1915 bytes
    protobuf.lzop: ~744 bytes
    json.lzop: 809 bytes
    What you see is precious little difference in the size of the data even though the json is self-describing. The lzop version is essentially identically sized, and compressing and decompressing with lzo
