
Microsoft Releases Office Binary Formats

kdawson posted more than 6 years ago | from the this-way-lies-madness dept.

Programming 259

Microsoft has released documentation on their Office binary formats. Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see whether it covers those working on GPL software; some believe it doesn't. stm2 points us to some good advice from Joel Spolsky for programmers tempted to dig into the spec and, over a weekend, create an Excel competitor that reads and writes these formats: find an easier way. Joel provides some workarounds that make it possible to make use of these binary files. "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts."


first post? (-1, Troll)

darkob (634931) | more than 6 years ago | (#22486854)

Hmm, first post? That's nice of M$. Why did they do that? Because the .DOC/.XLS, etc. formats will become obsolete in any case? I wonder.

14$7 P0$7 (0)

Anonymous Coward | more than 6 years ago | (#22486872)

The original post is brought to you by the Microsoft corporation

Re:14$7 P0$7 (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#22486936)

And the jewniggers. Never forget to mention the jewniggers. If you forget to mention the jewniggers then one of them will come put his enourmous, hastily-circumcised-with-the-jawbone-of-a-wolverine, ebony jungle peen0r in your butt and then he'll pee up in your colon and give you the AIDS. This would suck for a normal person, but since you're on Slashdot we can assume you're a fan of open sores.

Re:14$7 P0$7 (-1, Offtopic)

cloakable (885764) | more than 6 years ago | (#22487642)

Holy crap you nearly packed in every slashdot troll I've seen, apart from the shit eating one and the goatse.

Re:first post? (3, Insightful)

Timothy Brownawell (627747) | more than 6 years ago | (#22486938)

I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?

Re:first post? (1)

somersault (912633) | more than 6 years ago | (#22486964)

Only took something like 5 years*, eh? :P

* I can't actually remember how long ago it was

Re:first post? (3, Informative)

julesh (229690) | more than 6 years ago | (#22487156)

I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?

As far as I remember, they only insisted on protocols (it was on the basis of a complaint from server OS vendors that MS was tying their market-leading desktop OSs to their server OSs and gaining an unfair advantage).

Open sores faggots (0, Offtopic)

lennyhell (869433) | more than 6 years ago | (#22486860)

You're like Zonk. You unshaved turds. BTW Frist Psot.

Joel (2, Insightful)

Mario21 (310404) | more than 6 years ago | (#22486894)

Joel's articles are a joy to read. No matter what time I receive the email about a new article by Joel, it will be read on the spot.

Re:Joel (1, Insightful)

zootm (850416) | more than 6 years ago | (#22486922)

I agree to some degree, but as a slight contrary point I find his silly insistence that Hungarian is a "good thing" [joelonsoftware.com] and his constant pimping of FogBugz (especially the "this is usually a bad idea, but it's alright when we do it!" attitude of some of the posts) to be a little annoying. He's definitely smart and makes a lot of sense, though.

Re:Joel (4, Insightful)

AKAImBatman (238306) | more than 6 years ago | (#22487000)

If you actually read the article, he's right. His point is that the use of Hungarian notation has been bastardized beyond belief. Programmers didn't understand why Hungarian originally used his famous notation, and thus tend to make an error every time they attempt to replicate his work. That's why we have tons of Java programs that look like crap due to some foolish programmer mindlessly following Hungarian Notation.

On the subject of the Office Document format, I believe that everything he says is also true; but with a few caveats. The first is the subject of Microsoft intentionally making Office Documents complicated. I fully accept (and have accepted for a long time) that Office docs were not intentionally obfuscated. However, I also accept that Microsoft was 100% willing to use the formats' inherent complexity to their advantage to maintain lock-in. The unnecessary complexity of OOXML proves this.

The other caveat is that I disagree with his workarounds. He suggests that you should use Office to generate Office files, or simply avoid the issue by generating a simpler file. There's no need to do this, as it's perfectly possible to use a subset of Office features when producing a file programmatically. Libraries like POI can produce semantically correct files, even if they aren't the most feature-rich.

Re:Joel (4, Informative)

zootm (850416) | more than 6 years ago | (#22487164)

I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.

The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.

Oh god I've started another hungarian argument.

Re:Joel (3, Interesting)

Anonymous Coward | more than 6 years ago | (#22487612)

Hungarian should not be used in any language which has a reasonable typing system;

That's "Systems Hungarian" in the original article, and you are correct.

"Apps Hungarian", which adds semantic meaning (dx = width, rwAcross = across coord relative to window, usFoo = unsafe foo, etc) to the variable, not typing, is what is good and what he is advocating. It is exactly "good variable naming". You can see that you shouldn't be assigning rwAcross = bcText, because why would you turn assign a byte count to a coordinate even though they're both ints. The article is quite good really. How relevant it is in a .NET/Java world is another discussion entirely.

Re:Joel (1)

mOdQuArK! (87332) | more than 6 years ago | (#22487720)

You're not parsing his (Joel's) article correctly. The Hungarian notation that everyone learned to hate is not the type of notation which was originally proposed.

He describes the original form of Hungarian notation as a way to add a concise description of how the data that a particular variable is holding is intended to be used, NOT just a way to restate the type info already maintained by the compiler.

Way, way, down in the article he has a short blurb which has some short examples of how the original notation was meant to be used:

Apps Hungarian had very useful, meaningful prefixes like "ix" to mean an index into an array, "c" to mean a count, "d" to mean the difference between two numbers (for example "dx" meant "width"), and so forth.

He contrasts that with the way that people ended up understanding Hungarian notation:

Systems Hungarian had far less useful prefixes like "l" for long and "ul" for "unsigned long" and "dw" for double word, which is, actually, uh, an unsigned long. In Systems Hungarian, the only thing that the prefix told you was the actual data type of the variable.

Originally Hungarian did not encode type info (0)

Anonymous Coward | more than 6 years ago | (#22487766)

...or at least, not much; encoding type info is not what he intended. And you've just demonstrated AKAImBatman's point that "Programmers didn't understand why Hungarian originally used his famous notation". That's not to say Hungarian Notation is necessarily good or bad (I'm not arguing about it! heh), but you're not making your judgment on the facts.

Re:Joel (1)

cp.tar (871488) | more than 6 years ago | (#22487872)

I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.

The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.

Oh god I've started another hungarian argument.

Hungarian notation has nothing to do with typing systems.
Hell, I'm barely a novice programmer, but even I can see that.

Hungarian notation is a good variable naming practice — as long as you use it to mirror internal program semantics, not create redundant typing information.

So far, I have tried to implement something similar to Hungarian notation in most of my programs; this article taught me a thing or two more, though some aspects touch on things way beyond my level.

Anyway, his article on Hungarian notation and — more importantly — visual code review in general reminds me of feature checking in Chomskyan syntax... easy, mechanical, and rather foolproof if implemented properly.

Re:Joel - Hungarian Notation (1)

mlwmohawk (801821) | more than 6 years ago | (#22488274)

The Hungarian thing - no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system;

A "typing" system doesn't help you read and understand the code. It doesn't give you any clues to the types of data being acted upon in a section of code. While I never bought in to the whole hungarian notation thing, at the time it was an "ism" that people went nuts about, it did address a specific problem with code readability. The concepts addressed by hungarian notation are still valid and some of the naming techniques are still also valid.

One can look at code and see "szKeyName" and know, without having to find the declaration, that it is a zero terminated character string used as a key. That's the crux of hungarian notation, but IMHO Microsoft went crazy with it and focused more on the notation and less on the naming, which actually made things harder to read. Like I said, I didn't go crazy, but even today I still try to incorporate some clue to the type of thing a variable represents in its name.

Hungarian notation is an example of a good idea in moderation that completely destroys itself when overused.

Re:Joel (4, Informative)

mhall119 (1035984) | more than 6 years ago | (#22487294)

Programmers didn't understand why Hungarian originally used his famous notation
It wasn't created by some guy named "Hungarian", it was created by Charles Simonyi.

http://en.wikipedia.org/wiki/Hungarian_notation [wikipedia.org]

Re:Joel (0, Redundant)

Jamu (852752) | more than 6 years ago | (#22487472)

Taking bad examples of code and using them as proof that the other method is good is hardly a convincing argument for Hungarian. I can see how it's useful for some languages, but C++? Someone enlighten me: what is a convincing argument for using Hungarian in a strongly typed language?

Re:Joel (3, Informative)

encoderer (1060616) | more than 6 years ago | (#22487902)

It's not the language that makes it obsolete, it's today's IDEs.

First, understand that nearly every bit of "Hungarian Notation" you've ever seen is misused. The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.

This is still valuable today.

However, in the days of lesser IDEs, even the more common use of Hungarian Notation was helpful, as it was a lot more work to trace a variable back to its declaration to identify the type.

Re:Joel (3, Informative)

encoderer (1060616) | more than 6 years ago | (#22487834)

"Programmers didn't understand why Hungarian originally used his famous notation"

Uhh.. There was never a "Mr. Hungarian" ....

It was invented by Charles Simonyi, and the name was both a play on "Polish notation" and a nod to Simonyi's homeland, Hungary, where the family name precedes the given name.

Re:Joel (1, Interesting)

mike_sucks (55259) | more than 6 years ago | (#22487864)

All design patterns are workarounds for missing language features. See GTK's use of an object-oriented pattern in C, for example. Hungarian is a design pattern (well, naming convention, but same thing) for that same weakly typed language: C.

Modern languages are strongly typed and hence will tell the programmer when they've screwed up, at compile time or later. So there's no need for Hungarian in these languages, just as there's no need for GTK-style object patterns now that C#, Java, and maybe even C++ have built-in support for object-oriented programming.

So again, Joel spins something that was useful historically as being something that is still essential, even though it is now completely redundant. This man is a living, breathing excuse for poor practices based on historical, obsolete artifacts.

Now, to get to your point - it is moot that some programmers are using some bastardised version of Hungarian, because even if done correctly it is now a waste of time when using a modern programming language. It only makes a program harder to read, increasing complexity and reducing maintainability rather than providing any actual benefit.

/Mike

Re:Joel (1)

encoderer (1060616) | more than 6 years ago | (#22487964)

When done properly, it has nothing to do with being strongly or weakly typed. It has nothing to do with knowing when you've "screwed up."

The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.

Outside of HN, the only way to include this semantic information in all the super excellent languages you mentioned is by adding a comment after the variable declaration.

That's do-able, though. Not because of the LANGUAGE, but because of the IDE, where it's trivial now for the IDE to take you back to the declaration of a given variable and then right back to the last position in the codebase.

Re:Joel (1)

mike_sucks (55259) | more than 6 years ago | (#22488310)

If you've got to use an IDE to find the definition of a variable in your codebase, you have a much bigger problem.

Still, how is "rwPosition" any better than "rowPosition"? (from the Wikipedia article) Sure, "i" is kinda ambiguous, but use a modern for-loop instead and get rid of it altogether. Again citing Wikipedia, some of Simonyi's suggested prefixes added semantic information, but not all.

I'll say it again: Hungarian is pointless in a modern language.

/Mike

Re:Joel (1)

richlv (778496) | more than 6 years ago | (#22487254)

the constant bragging and plugging of his own product makes me want to stay away from it as much as possible.

Re:Joel (1)

mike_sucks (55259) | more than 6 years ago | (#22487574)

Ack, no. They always appear to be really nifty on the surface, but they always go wrong in the details.

Take this article, for instance - sure, he's right that trying to implement support for these specs is futile. It's the same reason why Office's OOXML "standard" is a joke. But he didn't really need to spend 6 pages saying so. And sure, the workarounds are fine if you're a Windows shop, but workarounds #2 and #3 are not a simple "half day of work" if you have no experience with Microsoft technologies - it's weeks at least. You're much better off using an existing free or proprietary library to do the work in whatever environment you are familiar with.

And two days' work to allow for an adjustable epoch? To add a constant to a parsed number? I lol'ed! For someone who believes in metrics-based software development, he sure seems to be pulling a lot of numbers out of his arse.
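
For what it's worth, the "adjustable epoch" really does come down to one constant. A minimal sketch in C, assuming the standard 1900 (Windows) and 1904 (Mac) Excel date systems and ignoring the well-known 1900 leap-year quirk; the function name and constants here are illustrative:

    #include <time.h>

    /* Days from each Excel epoch to the Unix epoch (1970-01-01): the
       1900 system effectively counts from 1899-12-30, the Mac system
       from 1904-01-01. */
    #define DAYS_1900_TO_UNIX 25569
    #define DAYS_1904_TO_UNIX 24107

    time_t excel_serial_to_unix(double serial, int use_1904_system)
    {
        int epoch_days = use_1904_system ? DAYS_1904_TO_UNIX
                                         : DAYS_1900_TO_UNIX;
        return (time_t)((serial - epoch_days) * 86400.0);
    }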

I liked his stuff when I first came across it, but there's too much Kool-Aid, not enough reality.

/Mike

patent promise doesn't sound very good (4, Insightful)

Timothy Brownawell (627747) | more than 6 years ago | (#22486898)

Microsoft irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation"), subject to[...]
If your implementation is buggy, does that mean you're not covered?

To clarify, "Microsoft Necessary Claims" are those claims of Microsoft-owned or Microsoft-controlled patents that are necessary to implement only the required portions of the Covered Specification that are described in detail and not merely referenced in such Specification.
This sounds like:
  • If there are any optional parts of the spec, those parts aren't covered.
  • If the spec refers to another spec to define some part of the format, that part isn't covered.

Re:patent promise doesn't sound very good (2, Insightful)

zebslash (1107957) | more than 6 years ago | (#22487054)

Yes, you know, they are afraid that buggy implementations would show their format in a bad light. For instance, that would be like writing your own buggy implementation of Java and then distributing it in order to contaminate the market with a flawed version, just to show it in a bad light. Oh wait...

Re:patent promise doesn't sound very good (0, Troll)

Anonymous Coward | more than 6 years ago | (#22487444)

Yes, you know, they are afraid that buggy implementations would show their format in a bad light. For instance, that would be like writing your own buggy implementation of Java and then distributing it in order to contaminate the market with a flawed version, just to show it in a bad light. Oh wait...
Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

It was that last part that made Sun kill it. They couldn't stand Java running better on Windows than it did on Solaris. (Never mind the fact that Sun's Java to this day runs better on Windows than any other platform.)

So they sued Microsoft and forced Microsoft to stop updating their Java implementation. Because they were no longer updating Java, their implementation fell behind Sun's. And because Sun has this compulsive urge to bloat the Java library, the Microsoft implementation became incomplete over time.

But back in the day, the Microsoft J++ development environment was far superior to anything Sun had to offer. We're talking a good 10 years ago. Sun has finally managed to catch up in the past two or three years, but still, Sun's problem wasn't that the Microsoft implementation was worse: their problem was that it was better.

Re:patent promise doesn't sound very good (5, Informative)

jsight (8987) | more than 6 years ago | (#22487632)

Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.


Among other issues, the BorderLayout manager did not behave properly in MS's implementation. It was buggy in incompatible ways, but you're right, that in and of itself wasn't the big problem. The big problem was their insistence on both not fixing the bugs and not going along with major initiatives (such as JFC/Swing).

But back in the day, the Microsoft J++ development environment was far superior to anything Sun had to offer. We're talking a good 10 years ago. Sun has finally managed to catch up in the past two or three years, but still, Sun's problem wasn't that the Microsoft implementation was worse: their problem was that it was better.


If by "2 or 3 years" you mean about 5 years, then I'd agree. Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.

Re:patent promise doesn't sound very good (2, Insightful)

Anonymous Coward | more than 6 years ago | (#22487722)

Actually, the problem was that Microsoft 'extended' it after the previous step of 'embrace' - and continued to call it Java. These extensions were, of course, Windows-only - which missed the entire point of a cross-platform language.

The old 'embrace, extend, extinguish' strategy has been in the Microsoft playbook for quite a while.

Re:patent promise doesn't sound very good (5, Insightful)

msuarezalvarez (667058) | more than 6 years ago | (#22487976)

Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

If their 'implementation' differed from the specs, then it was not a correct implementation. If it was supposed to be a Java implementation, then by definition it was buggy. If it wasn't supposed to be one, then it had no business being called Java. That is why Sun sued them.

Re:patent promise doesn't sound very good (4, Informative)

ozmanjusri (601766) | more than 6 years ago | (#22488050)

Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

Ah, marketing. Where would we be without it?

Microsoft developed J/Direct specifically to make Java non-portable to other OSs. The MS JVM wasn't better than Sun's; it was just tied heavily into the OS, and code developed for it broke if run on any other VM.

J++ was another lock-in tool, to ensure any "Java" developed in Microsoft's IDE would only run on Microsoft OSs. JBuilder was always a better package anyway.

Re:patent promise doesn't sound very good (5, Interesting)

Ed Avis (5917) | more than 6 years ago | (#22487064)

Basically, Microsoft reserves the right to sue you for software patent infringements. So do thousands of other big software companies and patent troll outfits. The new thing now is that Microsoft likes to generate FUD by producing partial waivers and promises that apply to some people in limited circumstances (Novell customers, people 'implementing a Covered Specification', and so on). The inadequacy of this promise draws attention to the implicit threat to tie you up in swpat lawsuits, which was always there - but until this masterstroke of PR the threat wasn't commented on much.

Ignore the vague language and develop software as you always have.

Re:patent promise doesn't sound very good (5, Informative)

ContractualObligatio (850987) | more than 6 years ago | (#22487132)

If there are any optional parts of the spec, those parts aren't covered.

RTFA. That's in the FAQ. Yes, they are covered.

If the spec refers to another spec to define some part of the format, that part isn't covered.

In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!

I'm not saying that there aren't any flaws, but this kind of ill-informed, badly thought-out comment (a.k.a. "+5 Insightful", of course) has little value.

Re:patent promise doesn't sound very good (1)

mhall119 (1035984) | more than 6 years ago | (#22487482)

In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!
I think the concern is that the "something related to the spec" is actually something vitally important to the spec.

Re:patent promise doesn't sound very good (4, Interesting)

julesh (229690) | more than 6 years ago | (#22487192)

If your implementation is buggy, does that mean you're not covered?

That is my primary concern with the entire promise. Never mind the not-tested-in-court bullshit that came up the other day: the real problem is that it doesn't cover implementations with slight variations in functionality.

This, it seems, is intentional. MS don't want to allow others to embrace & extend their standards.

Re:patent promise doesn't sound very good (1)

Azuma Hazuki (955769) | more than 6 years ago | (#22488044)

Where is the "itsatrap!" tag?

No, seriously. This looks like a patent trap. I didn't read the article, but the sheer number of weasel words in the introduction alone and some posts (like the first, which I'm replying to) make it obvious that this is another attempt to trap and destroy FOSS and its developers. Steer clear!

OOo developers once said, (1, Flamebait)

Enleth (947766) | more than 6 years ago | (#22486914)

that hell would freeze over first - well, looks like Satan is now skating on frozen magma lakes...

Obfuscation (2, Insightful)

Anonymous Coward | more than 6 years ago | (#22486926)

Except... we don't all have this OLE thing on our computers, nor does it work any easier than the languages we deal with now.

But let's say you do. Now you have to find an API to do it for you. As an everyday guy, I can write my own HTTP parser, IP connection manager, and so forth, without requiring a special API to do it. As a smarter guy, I'd look for the libraries that can do some of the heavy lifting for me. It's flexibility. The document structure is going to affect how I write code to work with it.

With Office docs, Joel is arguing, I have to know the one way to interact with them. There's no TIMTOWTDI about it. There's no intuitive way to do it either. Were the format simple - be it "sanely" constructed CSV, XML, RTF, etc. - I'd have more choices. I'd rather use the most well-known, bestest of the best, but sometimes it's not intuitive and just hampers work. It shuts out programmers who would think: open(file); readSomeData(); construct_a_structure();. Now it's: structure = oneOfAHandfulOfParsersThatWillEverWork().

The worst part of that is that *I* have no way to choose how to mess with documents. I have to either a) spend more time figuring out the native format, unless I'm a genius or have an MS crony behind me, or b) parse it incorrectly, and then have to go back and fix any number of things, including my methodology. Remember how the various encodings affected document formats? I.e. UTF-8, UTF-16, Latin-1, Unicode, etc., etc.

Joel, you're not right.

Re:Obfuscation (2, Insightful)

wlandman (964814) | more than 6 years ago | (#22488374)

What Joel is trying to say is that at the time Excel and the other Office products were created, storing documents as XML wasn't an option. Joel also reminds us that as Microsoft shipped new versions of the software, they had to keep compatibility with the older versions.

I think Joel makes a lot of good points and gives great insight into the thinking at Microsoft.

One possible reason for releasing the specs now (5, Insightful)

Stan Vassilev (939229) | more than 6 years ago | (#22486932)

One may wonder, why release the documentation now?

If you read Joel's blog you'll see the formats are very old, and consist primarily of C structs dumped into OLE objects, which are in turn dumped directly into what we see as XLS, DOC, and similar files.

There's almost no parsing/validation at load time.
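
As a rough sketch of what that style amounts to in C (the record layout below is hypothetical, for illustration only, and not the actual BIFF/DOC layout):

    #include <stdio.h>
    #include <stdint.h>

    /* A hypothetical cell record, laid out exactly as it lives in memory. */
    struct cell_record {
        uint16_t row;
        uint16_t col;
        uint16_t format_index;  /* index into a format table elsewhere in the file */
        double   value;
    };

    /* Saving is a single fwrite... */
    int save_cell(FILE *f, const struct cell_record *c)
    {
        return fwrite(c, sizeof *c, 1, f) == 1;
    }

    /* ...and loading is a single fread. Whatever bytes are in the file
       become the struct: fast on early-90s hardware, with no parsing
       or validation anywhere. */
    int load_cell(FILE *f, struct cell_record *c)
    {
        return fread(c, sizeof *c, 1, f) == 1;
    }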

Having this laid out in well-organized documentation may reveal quite a lot of security issues with the old binary formats, which could lead to a wave of exploits - exploits that won't work on Microsoft's new XML Office formats.

So while I'm not a conspiracy nut, I do believe one of Microsoft's goals here is to assist the process of making those binary formats obsolete, to drive Office 2007/2008 adoption.

Re:One possible reason for releasing the specs now (5, Informative)

Chief Camel Breeder (1015017) | more than 6 years ago | (#22487232)

Actually, I think they're releasing it now because they were ordered to in a (European?) court settlement, not because they want to.

Re:One possible reason for releasing the specs now (2, Insightful)

friedman101 (618627) | more than 6 years ago | (#22487416)

Come on. You really think Microsoft wants to increase the vulnerability of old versions of Office (which are still the vast majority in corporate America)? This not only makes their software look bad, it increases the amount of work they have to do to support the older versions (yes, they still support Office 2003). You don't sell new cars by convincing people the last model was rubbish. I think your tin-foil hat fits a little too tight.

Re:One possible reason for releasing the specs now (5, Insightful)

Stan Vassilev (939229) | more than 6 years ago | (#22487604)

Come on. You really think Microsoft wants to increase the vulnerability of old versions of Office (which are still the vast majority in corporate America)? This not only makes their software look bad, it increases the amount of work they have to do to support the older versions (yes, they still support Office 2003). You don't sell new cars by convincing people the last model was rubbish. I think your tin-foil hat fits a little too tight.

Let me break your statement into pieces:

- that would increase the vulnerability of old Office
- the majority of corporate America is stuck on old Office
- you don't sell new cars by convincing people the old ones are rubbish

You know, have you seen those white papers by Microsoft comparing XP and Vista, trying to put XP's reliability and security in a bad light?

Or have you seen those ads where Microsoft rendered people using old versions of Office as... dinosaur-mask-wearing suits?

If the majority of corporate America uses the old Office, then the only way for Microsoft to turn a profit is to somehow convince them it's not good for them anymore, and to upgrade. You're just going against yourself there.

Re:One possible reason for releasing the specs now (1)

hahiss (696716) | more than 6 years ago | (#22487616)

If you go to the following page, you'll see that Microsoft is indeed selling its new product, Vista, by comparing its speed and safety to . . . Windows XP:

http://www.microsoft.com/windows/products/windowsvista/facts.mspx?wt_svl=10355VH_OS_Other1&mg_id=10355VHb1 [microsoft.com]

While clearly they are saying that Vista is "better" (faster to boot, 20% fewer "hangs," and a metric crapload reduction in likelihood of pwnage), the subtext here is that if you run XP, you're vulnerable to pwnage, increased hanging, and slow boot times. Microsoft isn't saying "We rocked before, and now we turned it up to 11"; this page clearly says "Yeah, we realize there's a bunch of stupid stuff that our old OS does, and the new one doesn't".

Microsoft marketing (3, Insightful)

Comboman (895500) | more than 6 years ago | (#22487936)

You don't sell new cars by convincing people the last model was rubbish.

You're kidding right? That's been exactly Microsoft's marketing strategy for the last ten years. Remember the Win9X BSOD ads for Windows XP? Microsoft is in the difficult position where their only real competition is their own previous products.

So, then MS will release the XP source next? (0, Flamebait)

patrixx (30389) | more than 6 years ago | (#22487860)

So that XP gets exploited and thus puts Vista in a better light...

Re:One possible reason for releasing the specs now (0)

Anonymous Coward | more than 6 years ago | (#22487938)

The main benefit is Publisher, etc.; those still use the binary formats and are not popular enough to be reverse engineered.

Re:One possible reason for releasing the specs now (3, Informative)

Jugalator (259273) | more than 6 years ago | (#22488102)

So while I'm not a conspiracy nut, I do believe one of Microsoft's goals here is to assist the process of making those binary formats obsolete, to drive Office 2007/2008 adoption.
Not a chance. Microsoft is bound to release Office 2003 security updates until January 14, 2014 [microsoft.com].

Office Doc Generation on the Server (5, Informative)

VosotrosForm (1242886) | more than 6 years ago | (#22486956)

I would like to point out another good option Joel doesn't have on his list: software called OfficeWriter, from a company named SoftArtisans in Boston. When I last checked (back when I worked there), it was capable of generating Excel and Word docs on the server, and I believe PowerPoint support was probably coming relatively soon. Creating a product that can write Office documents isn't quite as impossible, in terms of labor, as Joel is saying... but it's still way beyond any hobby project.

Plus, he is suggesting that you use Excel automation or the like through scripts to create documents on the server, which is a decent suggestion if you want Excel or Word to constantly crash and lock up your server, and you enjoy rebooting them every day. If you want to do large-scale document generation on a server, you are going to need something like OfficeWriter.

-Vosotros/Matt

Promise not a license (5, Insightful)

G0rAk (809217) | more than 6 years ago | (#22486966)

As PJ pointed out over on Groklaw [groklaw.net], MS are giving a "Promise" not to sue, but this is very, very far from a license. Careful analysis suggests that any GPL'd software using these binaries could easily fall foul of the fury of MS lawyers.

Re:Promise not a license (5, Interesting)

morgan_greywolf (835522) | more than 6 years ago | (#22487082)

As PJ pointed out over on Groklaw, MS are giving a "Promise" not to sue, but this is very, very far from a license. Careful analysis suggests that any GPL'd software using these binaries could easily fall foul of the fury of MS lawyers.
Correct.

Here's my suggestion: someone should use these specs to create a BSD-licensed implementation as a library. Then, of course, (L)GPL programs would be free to use the implementation. Nobody gets sued, everybody is happy.

Hey! That was MY suggestion! (0)

Anonymous Coward | more than 6 years ago | (#22487418)

Unfortunately, I think I BSD released it... :-(

I know! I'll get Theo to rant at you!!! :-)

Re:Promise not a license (1)

Vexorian (959249) | more than 6 years ago | (#22487272)

And it is just a promise, so even if you are not GPLed, you'll live under a perpetual "will Microsoft break the promise tomorrow when I wake up?"

Re:Promise not a license (2, Informative)

Pofy (471469) | more than 6 years ago | (#22487846)

>As PJ pointed out over on Groklaw, MS are giving a "Promise"
>not to sue, but this is very, very far from a license.
Some (hypothetical?) questions:

What would happen if those patents were in some way transferred to someone else?

Despite the promise, are you still actually infringing the patent, just with an assurance from the current patent holder that he won't do anything?

If so, what would happen if it became criminal to infringe a patent (it came quite close to being part of an EU directive not so long ago)? Together with such proposals, there have also been suggestions that police should be allowed (and required?) to act on those crimes even without a complaint from someone suffering infringement. How would that apply to a situation with such a promise?

Why not ODF or OOo? (2, Interesting)

jfbilodeau (931293) | more than 6 years ago | (#22486968)

Why does the author avoid any mention of ODF or OpenOffice as alternatives to work with MS Office docs? He seems stuck on 'old' formats like WKS or RTF.

I know OOo is not a perfect Word/Excel converter, but it has served me marvelously since the StarOffice days. I wish there were a simple command-line tool that could convert .doc or .xls files to ODF or PDF using the OOo code. Anyone know of such a tool?

Re:Why not ODF or OOo? (0)

Anonymous Coward | more than 6 years ago | (#22487788)

It would appear the point of the article was the fact that Microsoft released the specifications. Whether or not alternatives exist or should be used is totally off topic. If you made some argument about the merits of MS's design versus the ODF design, that would be one thing, but that's not what matters. What matters is the *fact* that most people use MS Office binary formats, and this specification provides the resources (if they so desire) to see exactly what is in those files.

Their way out of long-term support (1, Insightful)

Anonymous Coward | more than 6 years ago | (#22486986)

How to look nice and offload some work in one shot.

With this, M$ can shut up critics who say proprietary formats are evil, especially those using the long-term viability argument.

Now that the formats are documented, hordes of open source hobbyists can develop (for free) code and tools to read / convert the old Office formats. Then M$ will say "See, we do not lock out anybody, there are myriads of ways to read our old crap".

Smart indeed. And anyway, these formats do not hold any competitive advantage anymore, since most users are on the new ones now.

Re:Their way out of long-term support (1)

MrNaz (730548) | more than 6 years ago | (#22488158)

"There are myriad ways to read our old crap."

Is there some secret conspiracy I am not aware of to butcher the use of this word? Why does every attempt to use it end in miserable failure?

mo3d up (-1, Troll)

Anonymous Coward | more than 6 years ago | (#22487032)

anything c4n would take about 2 marketing surveys

Retaliation? (2, Interesting)

ilovegeorgebush (923173) | more than 6 years ago | (#22487040)

Is this retaliation for the impending doom of the OOXML format's bid for ISO standard status? Is MS's thinking: "Right, ISO has failed us, so we'll release the binary format specs so everyone keeps using the Office formats anyway"?

Hmm. (1)

Uzuri (906298) | more than 6 years ago | (#22487044)

"[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts..." ...It's something far more sinister.

(Sorry, sometimes ya just gotta get it out)

I thought it was pretty well known (1, Insightful)

erroneus (253617) | more than 6 years ago | (#22487088)

Just as OOXML files and WMF make references to Windows or Office programming APIs, I think it would come as no surprise to anyone that Office binary formats would also make similar references. The strategy behind it would be obvious -- to tie the data to the OS and to the software as closely as possible.

Re:I thought it was pretty well known (2, Informative)

leuk_he (194174) | more than 6 years ago | (#22487200)

Did you read the article? Nah, why would you, when there's some MS bashing to do?

If you read the article, you would notice that the binary format of Winword 97 (which is in fact compatible with its predecessors) was a good solution in 1992, when Word for Windows 2.0 was created. Machines had less memory and processing power than your phone, and still had to be able to open a document fast.

My conclusion is that the OpenOffice devs were crazy to ever support the Word .doc format, and that they did a surprisingly good job.

Re:I thought it was pretty well known (1)

erroneus (253617) | more than 6 years ago | (#22487874)

Yes, I read the article and I don't buy into it.

The fact is, Word in its early versions was NOT significantly faster than its competitors, and neither was Excel. WordPerfect and Lotus 1-2-3 did everything people needed, and they did it within the resource constraints of the day.

The article leads the reader by dwelling on the "limited resources" of the day, because most of us find it amazingly difficult to imagine operating in a 1MB environment. The article also fails to identify the actual timeline of development and what platforms were like with each release of Word, Excel, or Windows. They tried to make it sound like Word 2.0 was linking with Excel from day one. It was not. And it certainly didn't do the things we expect to see (but rarely use) today back in the earlier days.

The article was nothing more than a list of whiny excuses for what Microsoft did when others were able to accomplish the same functionality without all the nonsense.

The reality is that when you tie your documents to the OS and the Office software, it's simply a lot harder to write competing apps that can work with the same data. If the document formats can stand alone, then writing apps that can use the data becomes a lot simpler. And since others were able to accomplish the same ends without the nonsense described in the article, I'd say there must have been some OTHER motivation behind their departure from standard coding practices of the day... and even standard coding practices of TODAY!

I didn't think I'd have to remind anyone that the main reason why OOXML will never be an ISO standard is because the format does not stand on its own. It requires reference to Windows and Office programs to work.

I especially loved the part about how an Excel file is a file system within a file. Sounds like an archive to me. Not like it hasn't been done before.

Re:I thought it was pretty well known (1)

BluenoseJake (944685) | more than 6 years ago | (#22487406)

Maybe they just wanted to reuse code? Using operating system facilities to do useful work instead of reinventing the wheel makes sense, and it's just good programming practice. Maybe that tinfoil hat's on a bit tight; not everything is a conspiracy.

Re:I thought it was pretty well known (4, Interesting)

erroneus (253617) | more than 6 years ago | (#22487644)

It's a DOCUMENT format. You know, you put words and pictures in there? Things you type in with your own keyboard with your fingers? There should be no need to have API calls in a document format. The same is true for WMF. WMF was very exploitable as a result, so not only is it bad style, it's dangerous.

Re:I thought it was pretty well known (1)

Koohoolinn (721622) | more than 6 years ago | (#22487962)

Maybe that tinfoil hat's on a bit tight; not everything is a conspiracy.
M$ has a particularly bad record concerning ulterior motives. People have been bitten so many times that erring on the safe side makes perfect sense.

"compound documents." oh no, run away! (4, Interesting)

radarsat1 (786772) | more than 6 years ago | (#22487122)

You see, Excel 97-2003 files are OLE compound documents, which are, essentially, file systems inside a single file.

I don't see why just because something is organized filesystem-like (not such an awful idea) means it has to be hard to understand. Filesystems, while they can certainly get complicated, are fairly simple in concept. "My file is here. It is *this* long. Another part of it is over here..."
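
For the sake of illustration, the concept can fit in a few lines of C. This toy directory-of-streams layout is hypothetical and far simpler than real OLE compound files (which use 512-byte sectors, FAT chains, and nested storages), but it is the same basic idea:

    #include <stdint.h>

    /* Header at the start of the container file. */
    struct container_header {
        uint32_t magic;        /* identifies the container format */
        uint32_t entry_count;  /* number of directory entries that follow */
    };

    /* One directory entry per embedded "stream": a name, where its
       bytes start, and how long it is. Everything after the directory
       is just the streams' raw bytes. */
    struct dir_entry {
        char     name[16];     /* e.g. "WordDocument" */
        uint32_t offset;       /* "my file is here" */
        uint32_t length;       /* "it is *this* long" */
    };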

They were not designed with interoperability in mind.

Wait, I thought you were trying to convince us that this doesn't reflect bad programming...

That checkbox in Word's paragraph menu called "Keep With Next" that causes a paragraph to be moved to the next page if necessary so that it's on the same page as the paragraph after it? That has to be in the file format.

Ah, I see, you're trying to imply that it's the very design of the Word-style of word processor that is inherently flawed. Finally we're in agreement.

Anyways, it's no surprise that it's all the OLE, spreadsheet-object-inside-a-document stuff that would make it difficult to design a Word killer. (How often do people actually use that anyway?) It would basically mean reimplementing OLE, and a good chunk of Windows itself (libraries for all the references to parts of the operating system, metafiles, etc.), for your application. However, it certainly can be done. I'm not sure it's worth it, and it can't be done overnight, but it's possible. However, you'll have a hard time convincing me that Microsoft's mid-90s idea of tying everything in an application to inextricable parts of the OS doesn't reflect bad programming. Like, what if we need to *change* the operating system? At the very least, it reflects bad foresight, seeing as they tied themselves to continually porting forward all sorts of crud from previous versions of their OS just to support these application monstrosities. This is a direct consequence of not designing the file format properly in the first place, and just using a binary structure dump.

It reminds me of a recovery effort I tried last year, trying to recover some interesting data from some files generated on a NeXT cube years ago. I realized the documents were just dumps of the Objective-C objects themselves. In some ways this made the file parseable, which is good, but in other ways it meant that, even though I had the source code of the application, many of the objects that were dumped into the file were related to the operating system itself instead of the application code, which I did _not_ have the source code to, making the effort far more difficult. (I didn't quite succeed in the end, or at least I ran out of time and had to take another approach on that project.)

In their (MS's) defense, I used to do that kind of thing back then too (dumping memory structures straight to files instead of using extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.

Re: "compound documents." oh no, run away! (4, Insightful)

ContractualObligatio (850987) | more than 6 years ago | (#22487452)

It's interesting you give a nicely egotistical critique of a well-regarded expert's article, but don't suggest a single alternative to how M$ could have met their design goals, nor explain why the no-interoperability assumption was unreasonable at the time. If you can't appreciate the design goals, nor suggest a way to meet them, what's the point of the rest of your post?

Re: "compound documents." oh no, run away! (1)

radarsat1 (786772) | more than 6 years ago | (#22487626)

It's interesting you give a nicely egotistical critique of a well-regarded expert's article, but don't suggest a single alternative to how M$ could have met their design goals, nor explain why the no-interoperability assumption was unreasonable at the time. If you can't appreciate the design goals, nor suggest a way to meet them, what's the point of the rest of your post?


I think the design goals were flawed. That's my point. Their design goals should have included: how can we ensure that our customers' data will be (usefully) readable in the future? Sure, back then maybe it was worth it to skimp on validation in order to squeeze out a few extra microseconds of processing time, because the competition would avoid doing this and beat you with claims of efficiency. I guess we've all learned a lot about how to deal with data since the 90s. A big part of that was learning the importance of metadata (i.e., tagged, extensible formats).

Anyways, just because it was done years ago, under different conditions, doesn't mean it wasn't bad programming. Maybe everyone else would have done it the same way, maybe I would have too. Still doesn't mean it wasn't bad programming. (I shouldn't say "bad programming" of course, the code could be fine for all I know.. I should say "bad design", in hindsight. Like a lot of things.)

By the way, "the no-interoperability assumption" is _always_ unreasonable. (IMHO of course.)

Hmm (1)

woolio (927141) | more than 6 years ago | (#22487476)

In their (MS's) defense, I used to do that kind of thing back then too (dumping memory structures straight to files instead of using extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.

Except for the "1995" part, wasn't that pretty much how Microsoft got started?

They haven't advanced from that point by much....

Re: "compound documents." oh no, run away! (2, Insightful)

Anonymous Coward | more than 6 years ago | (#22487622)

I don't see why just because something is organized filesystem-like (not such an awful idea) means it has to be hard to understand. Filesystems, while they can certainly get complicated, are fairly simple in concept. "My file is here. It is *this* long. Another part of it is over here..."
He didn't say file systems were complex, he said OLE compound documents were complex. Look it up on MSDN. It's a tad painful to work with.

"They were not designed with interoperability in mind."

Wait, I thought you were trying to convince us that this doesn't reflect bad programming...
Wholly out of context, Batman! They made a design decision to ignore interoperability and optimized towards small memory space. What part of that is hard to understand? You think everything should be designed up front for interoperability, regardless of context? In the mid to late 80s, there just wasn't a huge desire for this feature, as Joel states.

but then again I was 15 years old (in 1995) and still learning C.
Ah, now your post makes sense. You completely lack perspective. The Word/Excel doc formats were around 10 years before you. You lack the knowledge about why dumping C data structures directly to disk was necessary--even though Joel spells it out. You don't understand what OLE truly solved (not just embedding spreadsheets inside of word, by the way). And most importantly, you seem to lack the ability to understand design trade-offs.

Re: "compound documents." oh no, run away! (1)

radarsat1 (786772) | more than 6 years ago | (#22488136)

He didn't say file systems were complex, he said OLE compound documents were complex. Look it up on MSDN. It's a tad painful to work with.


I didn't say that. I said I don't see why OLE documents being like file systems (according to TFA) means they must necessarily be complex. I.e., I'm saying file systems aren't necessarily complex concepts, and therefore that's not an excuse for a convoluted file format. Anyways, maybe I'm straining his analogy further than he intended, so I'll give you that.

Wholly out of context, Batman! They made a design decision to ignore interoperability and optimized towards small memory space. What part of that is hard to understand?


What makes you think I don't understand it? It's still bad programming. Not that I have statistics, but there were plenty of examples of software that used the same or less memory than Word but managed to have better document formats.

Ah, now your post makes sense. You completely lack perspective. The Word/Excel doc formats were around 10 years before you. You lack the knowledge about why dumping C data structures directly to disk was necessary--even though Joel spells it out. You don't understand what OLE truly solved (not just embedding spreadsheets inside of word, by the way). And most importantly, you seem to lack the ability to understand design trade-offs.


No, I understand them. I just don't think they made the right trade-offs. It's not like they had no competition at the time - other companies whose software a lot of people besides me still claim was better. Anyways it's sort of a moot argument, since what's done is done. We don't really need to write these formats any more, just read them.

Re: "compound documents." oh no, run away! (0)

Anonymous Coward | more than 6 years ago | (#22487812)



In their (MS's) defense, I used to do that kind of thing back then too (dumping memory structures straight to files instead of using extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.

Well, 1995 Microsoft wasn't much older than you, so it's kind of understandable.

Re: "compound documents." oh no, run away! (4, Informative)

Thundersnatch (671481) | more than 6 years ago | (#22487956)

Anyways, it's no surprise that it's all the OLE, spreadsheet-object-inside-a-document stuff that would make it difficult to design a Word killer. (How often do people actually use that anyway?)

At my company, our users do that every day: Excel spreadsheets embedded in Word or PowerPoint, Microsoft Office chart objects embedded in everything. It's what made the Word/Excel/PowerPoint "Office Suite" a killer app for businesses. MS Office integration beat the pants off the once best-of-breed and dominant Lotus 1-2-3 and WordPerfect. When you embed documents in Office, instead of a static image, the embedded doc is editable in the same UI, and can be linked to another document maintained by somebody else and updated automatically. It saves tremendous amounts of staff time.

Re: "compound documents." oh no, run away! (1)

petermgreen (876956) | more than 6 years ago | (#22488148)

It reminds me of a recovery effort I tried last year, trying to recover some interesting data from some files generated on a NeXT cube years ago. I realized the documents were just dumps of the Objective-C objects themselves.
IMO the powerful serialisation formats of modern languages are even worse than just dumping out C structs. If an app just dumps out C structs, then you can probably figure out the binary format pretty quickly with just the source for the app and a pageful or so of information on the C compiler used. The application designer still has to pay some attention to file format design, because structures containing pointers can't be saved directly.

For a modern serialisation format things are typically far worse: the app developer is less likely to pay attention to KISS when he can serialise any arbitrary graph of objects, and you need both the app's code and a load of information on how the language serialises stuff.

access (1)

oliverthered (187439) | more than 6 years ago | (#22487136)

Still missing the binary format for Access; still, never mind, it's not that hard to work out [sourceforge.net]

Re:access (1)

Hulver (5850) | more than 6 years ago | (#22488348)

Ah, a typical SourceForge project. "We're almost ready for a beta release!" (dated 2002), and a software release (version 0.0.4, also dated 2002).

Oh right, it was so easy they got it right the first time and never had to update it since?

Worst. Workaround. Ever. (4, Interesting)

organgtool (966989) | more than 6 years ago | (#22487206)

FTA:

There are two major alternatives you should seriously consider: letting Office do the work, or using file formats that are easier to write.
His first workaround is to use Microsoft Office to open the document and then save it in a non-binary format. Well, that assumes I already have Microsoft Windows, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, etc. Do you see the problem here?

The second "workaround" is the same as the first, only a little more proactive. Instead of saving my documents as binary files and then converting them to another format, I should save them as a non-binary format from the start! Mission accomplished! Oh wait - how do I get the rest of the world to do the same? That could be a problem.

I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format. If Microsoft didn't want it to be used, they would not have released it. Even if Microsoft tried to take action against open source software for using the specs that they opened, how could Microsoft prove that the open source software used those specs as opposed to reverse engineering the binary format on their own? I think this is a non-issue.

Re:Worst. Workaround. Ever. (1, Troll)

malevolentjelly (1057140) | more than 6 years ago | (#22487462)

I think this workaround is for companies and professionals with resources, not just zealots. Chances are, if you are doing web applications that parse MS Office formats, you're intelligent enough to be running Office on one of your servers, instead of pouring thousands of man-hours into implementing something that you can give away to competitors through the GPL.

If you are an open source zealot, I recommend the following work-arounds:

* Complain that the code is somehow inferior

* Make a conspiracy theory about how Microsoft foresaw open source and were trying to stifle it

* Solve 40% of the problem and claim superiority

* Hack something unreadable together in Perl and pretend that it's more interoperable; once more, claim superiority

Re:Worst. Workaround. Ever. (1)

dedalus2000 (704571) | more than 6 years ago | (#22487972)

First, open source "zealots" are professionals with resources. Second, who better to beta test a non-core software nicety than your competitors?

Re:Worst. Workaround. Ever. (1)

Toone_Town (612696) | more than 6 years ago | (#22487730)

And I love his suggestion to access this using ASP.net under IIS...as if I really want to be running *OFFICE* on my *WEB SERVER*...one more thing to exploit.

Re:Worst. Workaround. Ever. (3, Insightful)

ContractualObligatio (850987) | more than 6 years ago | (#22487740)

I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format

That is almost the stupidest thing I've read today (RTFA with respect to development costs to figure out why), except for this:

If Microsoft didn't want it to be used, they would not have released it.

We can ignore the shockingly poor logic inherent to this statement and just take it at face value: doing something just because M$ wants you to would easily make the Top 10 Stupid Things To Do In IT list. It's particularly bizarre to hear it on Slashdot.

Joel being apologetic (1)

porkThreeWays (895269) | more than 6 years ago | (#22487222)

Joel is being awfully apologetic. I understand why the formats turned out bad, but that doesn't change the fact that they are bad.

Re:Joel being apologetic (2, Informative)

slapout (93640) | more than 6 years ago | (#22488350)

Joel worked on the Excel team.

Don't Adopt. Convert. (5, Insightful)

Doc Ruby (173196) | more than 6 years ago | (#22487278)

Spolsky's advice explains that the format code is extremely bad code from the POV of a programmer picking it up to use starting now, because it grew like a coral reef, starting so long ago that interoperability with anything but the app's codebase at the time was not in the designs. Every new feature was thrown in as a special case, rather than through any general-purpose facility for kinds of features or future expansion. That's the Microsoft legacy that leverages every year's market position into expansion the next year.

But we're not Microsoft, and we don't have the requirements MS had when making these formats. So we should by no means perpetuate them. We should do now what MS never had reason to do: upgrade the code and drop the legacy stuff that makes most of the code such a burden, but doesn't do anything for the vast majority of users today (and tomorrow).

That's OK, because Microsoft has already done that, too. The MS idea of "legacy to preserve" is based on MS marketing goals, which are not the same as actual user requirements. So legacy preservation doesn't mean that, say, Office 2008 can read and write Word for Windows for Workgroups for Pen Computing files 100%. MS has dropped plenty of backwards compatibility for its own reasons. New people opening the format for modern (and future) use can do the same, but based on user requirements, not on protecting product lines that aren't a real requirement.

So what's needed is just converters that use this code to convert to real open formats that can be maintained into the future, not moving this code itself into apps for the rest of all time. Today we have a transition point before us which lets us finally turn our back on the old, closed formats with all their code complexity. We can write converters that get rid of those formats that benefited Microsoft more than anyone else. Convert them into XML. Then, after a while, instead of opening any Word or Excel formats, we'll be exchanging just XML, and occasionally reaching for the converter when an old file turns up. MS will go with that flow, because that's what customers will pay for. Soon enough these old formats will be rare, and the converters will be rare, too.

Just don't perpetuate them, and Microsoft's selfish interests, by embedding them into apps as "native" formats. Make them an import, via a module that can also batch-convert old files. We don't need this creepy old man following us around anymore.

doing the right thing (5, Insightful)

carou (88501) | more than 6 years ago | (#22487298)

From Joel's FA:

There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904. Excel supports both because the first version of Excel, for the Mac, just used that operating system's epoch because that was easy, but Excel for Windows had to be able to import 1-2-3 files, which used 1/1/1900 for the epoch. It's enough to bring you to tears. At no point in history did a programmer ever not do the right thing, but there you have it.
Nonsense.

When Excel started importing 1-2-3 documents, the right way to do it would have been to create an importer into your own native format, not to munge a new, slightly different format into your existing structures. Yes, you'd have had to convert some dates between the 1900 and 1904 epochs (and maybe detect cases where the old 1-2-3 bug could have affected the result), but at least you wouldn't be trying to maintain two formats for the rest of time.

If this is an example of programmers throughout history always doing exactly the right thing, I'd hate to see an example of code where the original author regretted some mistakes that had been made.
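
For reference, the two epochs sit a fixed 1462 days apart in serial-number terms; the 1462-day offset is Microsoft's own documented constant, though the helper names below are made up for illustration. A minimal sketch of the conversion the parent describes:

<ecode>
#include <stdio.h>

/* Offset between Excel's two date systems: in the 1900 system
   (serial 1 = 1 Jan 1900, including the phantom 29 Feb 1900 kept
   for 1-2-3 compatibility), 1 Jan 1904 falls on serial 1462; in
   the 1904 system that same day is serial 0. */
#define EXCEL_1904_OFFSET 1462.0

double serial_1904_to_1900(double s) { return s + EXCEL_1904_OFFSET; }
double serial_1900_to_1904(double s) { return s - EXCEL_1904_OFFSET; }

/* The 1-2-3 leap-year bug in one line: 1900-system serial 60 is
   the nonexistent 29 Feb 1900. */
int is_phantom_leap_day(double serial_1900) { return serial_1900 == 60.0; }

int main(void)
{
    printf("1 Jan 1904 in the 1900 system: %.0f\n", serial_1904_to_1900(0.0));
    return 0;
}
</ecode>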

Re:doing the right thing (1, Insightful)

Anonymous Coward | more than 6 years ago | (#22488284)

Your assumption is that the people writing the 1-2-3 'scripts' (and there were many of those) didn't depend on that bug. Remember the MS mantra 'embrace and extend'. Excel and 1-2-3 have full-blown programming languages built in. It is not easy to build an interpreter that would, say, fit on 3 floppy discs and still have memory left over; that memory can be used for OTHER things, such as features people actually use. Never mind the regression testing to make sure it all works. Converting in place is a perfectly logical thing to do.

Also remember that 'small' features were most likely written by an intern or 'the new guy', not some grizzled veteran. Plus, there have probably been hundreds of coders in there. I would be willing to bet some of that code has not been touched in years. I bet some of it they are afraid to change!

It's easy to sit outside and take potshots at them, but I don't think we fully appreciate the nightmare they have to deal with every day...

Joel's Advice (1, Insightful)

Anonymous Coward | more than 6 years ago | (#22487386)

Joel is usually spot on, but the advice he gave in the article is actually pretty terrible if you are going to have to generate any volume of Excel reports. Automating Excel is slow and unwieldy, and should not be hooked up to a server. You will be limited to a few workbook-generation requests per second, and if you need to handle more, buying another Windows/Office license and load balancing is pretty awful. The only way this might be workable is to set up a process that keeps a "pool" of automated Excel instances launched and waiting for work, so that when there is a high volume of requests they get forwarded to different instances. Still not very scalable.

There are companies out there that have reverse engineered the file format (the one I have experience with is SoftArtisans ExcelWriter, which is buggy), but overall there will be no clean, scalable solution for this until Excel 2007 and the Excel 2003 compatibility pack are prevalent enough that you can just generate XML to represent the workbook.
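
As a sketch of that XML alternative: the Excel 2003 dialect (SpreadsheetML) can be emitted with plain string output, with no Office install on the generating host. The element names below follow the published schema, but this is an illustration of the approach rather than a complete writer (the function name is made up, and real code would XML-escape the cell text):

<ecode>
#include <stdio.h>

/* Emit a minimal one-cell workbook in the Excel 2003 SpreadsheetML
   dialect; Excel 2003 and later open it directly. Caveat: a real
   writer must XML-escape cell_text ("&", "<", etc.); this sketch
   assumes the caller already did. */
int write_spreadsheetml(const char *path, const char *cell_text)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f,
        "<?xml version=\"1.0\"?>\n"
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n"
        "          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n"
        " <Worksheet ss:Name=\"Sheet1\">\n"
        "  <Table>\n"
        "   <Row><Cell><Data ss:Type=\"String\">%s</Data></Cell></Row>\n"
        "  </Table>\n"
        " </Worksheet>\n"
        "</Workbook>\n",
        cell_text);
    return fclose(f) == 0 ? 0 : -1;
}
</ecode>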

I will gladly pay anyone (5, Funny)

flanders123 (871781) | more than 6 years ago | (#22487468)

...to take this spec and create an identical .doc format, circumventing Word's bullet AI.
  • it
    • never
      • ever
  • ever
      • works

Re:I will gladly pay anyone (0)

Anonymous Coward | more than 6 years ago | (#22488156)


        1 I

        2 Completely

3 Agree
        1 It doesn't

        1 Work well

        2 at all.

Seems that these aren't the full specs (3, Interesting)

amazeofdeath (1102843) | more than 6 years ago | (#22487508)

Stephane Rodriguez comments:

"I first gave a cursory look at BIFF. 1) Missing records: examples are 0x00EF and 0x01BA, just off the top of my head. 2) No specification: example is the OBJ record for a Forms Combobox," Rodriguez wrote. "Then I gave a cursory look at the Office Drawing specs. And, again, just a cursory look at it showed unspecified records."
http://www.zdnet.com.au/news/software/soa/Microsoft-publishes-incomplete-OOXML-specs/0,130061733,339286057,00.htm [zdnet.com.au]

No insanely bad programmers ? (1, Insightful)

bytesex (112972) | more than 6 years ago | (#22487512)

Then what's with that 2GB limit? Or what's with the decision to use such formats for mail storage and databases?

The file format is not really important (5, Interesting)

wrook (134116) | more than 6 years ago | (#22487552)

I've worked on some of these file formats quite a bit (I was the text conversion guy when WP went to Corel -- don't blame me, it was legacy code! ;-) ). Anyway, while the formats are quite strange in places, they aren't really that difficult to parse. I would be willing to speculate that parsing was never really much of a problem in writing filters for apps (or at least shouldn't have been).

No, the difficulty with writing a filter for these file formats is that you have no freaking clue what the *formatter* does with the data once it gets it. I'm pretty sure even Microsoft doesn't have an exact picture of that. Hell, I barely ever understood what the WP formatter was doing half the time (and I had source code). File formats are only a small part of the battle. You have all this text that's tagged up, but no idea what the application is *actually* doing with it. There are so many caveats and strange conditions that you just can't possibly write something to read the file and get it right every time.

In all honesty I have at least a little bit of sympathy for MS WRT OOXML. Their formatter (well, every formatter for every word processor I've ever seen) is so weird and flaky that they probably *can't* simply convert over to ODF and have the files work in a backwards-compatible way. And let's face it, they've done the non-compatible thing before and they got flamed to hell for it. I honestly believe that (at some point) OOXML was intended to be an honest accounting of what they wanted to have happen when you read in the file. That's why it's so crazy. You'd have to basically rewrite the Word formatter to read the file in properly. If I had to guess, I'd say that snowballs in hell have a better chance...

I *never* had specs for the Word file format (actually, I did, but I didn't look at them because they contained a clause saying that if I looked at them I had to agree not to write a file conversion tool). I had some notes that my predecessor wrote down and a bit of a guided tour of how it worked overall. The rest was just trial and error. Believe it or not, occasionally MS would send us bug reports if we broke our export filter (it was important to them for WP to export Word, because most of the legal world uses WP). But it really wasn't difficult to figure out the format. Trying to understand how to get the WP formatter (also flaky and weird) to do the same things that the Word formatter was doing... mostly impossible.

And that's the thing. You really need a language that describes how to take semantic tags and translate them into a visual representation. And you need to be able to interact with that visual representation and refer it back to the semantic tags. A file format isn't enough; I need the glue in between -- and in most (all?) word processors that's the formatter. And formatters are generally written in a completely ad hoc way. Write a standard for the *formatter* (or better yet, a formatting language) and I can translate your document for you.

The trick is to do it in both directions too. Things like Postscript and PDF are great. They are *easy* to write formatters for. But it's impossible (in the general case) to take the document and put it back into the word processor (i.e. the semantic tags that generated the page layout need to be preserved in the layout description). That also has to be described.

Ah... I'm rambling. But maybe someone will see this and finally write something that will work properly. At Corel, my friend was put on the project to do just that 5 times... got cancelled each time ;-) But that was a long time ago...

Some "solutions" from TFA (2, Insightful)

mariuszbi (1113049) | more than 6 years ago | (#22487826)

In many situations, you are better off reusing the code inside Office rather than trying to reimplement it. Here are a few examples.
1. You have a web-based application that's needs to output existing Word files in PDF format. Here's how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It'll work. The first time you launch Word it'll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It's fast enough for a reasonable web-based application.
2. Same as above, but your web hosting environment is Linux. Buy one Windows 2003 server, install a fully licensed copy of Word on it, and build a little web service that does the work. Half a day of work with C# and ASP.NET.
So if you are on a Linux system, you are screwed. I think this article was written by some M$ fanboy. Nothing wrong with that in itself. But saying that Linux users should just dump their software and go for Microsoft stuff, just because

It's very helpful of Microsoft to release the file formats for Microsoft and Office, but it's not really going to make it any easier to import or save to the Office file formats.
I think it's wrong, wrong, wrong.

Compatibility is important across systems and time (1)

Grampaw Willie (631616) | more than 6 years ago | (#22487856)

this is a good discussion. compatibility is important to us, not only from one system to another but also across time.

it has often seemed to me that proprietary solutions should be avoided for this reason.

i recently converted my Win 3.11 computer to XP. quite a move, but look how much i saved not doing all the interim updates!

i did have some documents in the old WordPerfect 5.1 format but I managed to acquire a program that will read these and write them as .rtf

I like .rtf and would like to see it become an ISO/ANSI standard

but think how many libraries are loaded with .xls and .doc files that will need to be converted to OOXML or risk becoming un-usable

hmmm

Chunky File Format (5, Interesting)

mlwmohawk (801821) | more than 6 years ago | (#22487892)

While I was a contractor for a now-defunct contracting company, we did a contract for Microsoft. This was pre-Windows 3.1. We did some innovations which I think became the basis for some of the OLE stuff, but I digress: Microsoft had a spec for its "Chunky File Format."

The Office format based on the chunky file format does not have a format, per se. It is more similar to the old TIFF format: you can put almost anything in it, and the "things" that you put in it pretty much define how they are stored. So, for each object type that is saved in the file, there is a call-out that says what it is, and a DLL is used to actually read it.

It is possible for multiple groups within Microsoft to store data elements in the format without knowledge of how it is stored ever crossing groups or being "documented" outside the comments and structures in the source code that reads it.

This is not an "interchange" format like ODF; it is a binary application working format that happens to get saved, and enough people use it that it has become a standard. (With all blame resting squarely on M$'s shoulders.)

It is a great file format for a lot of things and does the job intended. Unfortunately, it isn't intended to be fully documented. It is like a filesystem format such as ext2 or JFS: sure, you can define precisely how data is stored in the filesystem, but it is virtually impossible to document all the data types that can be stored in it.
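
To make the structure concrete, here is a minimal sketch of walking a tagged, length-prefixed record stream of the kind described above. The header layout (16-bit type tag, 16-bit payload length) matches BIFF-style records, but the code is illustrative only: a real .xls additionally wraps its streams inside an OLE2 compound file, and this reader assumes a little-endian host.

<ecode>
#include <stdio.h>
#include <stdint.h>

/* Hypothetical chunk header: a 16-bit type tag followed by a 16-bit
   payload length, in the style of BIFF record streams. A reader can
   skip any record type it does not understand, which is what lets
   separate teams invent record types without coordinating. Assumes
   a little-endian host and no struct padding. */
struct chunk_hdr {
    uint16_t type;
    uint16_t len;
};

int walk_chunks(FILE *f)
{
    struct chunk_hdr h;
    while (fread(&h, sizeof h, 1, f) == 1) {
        printf("record type 0x%04X, %u bytes\n",
               (unsigned)h.type, (unsigned)h.len);
        /* A real reader would dispatch on h.type here and hand the
           payload to whatever module registered that type; unknown
           types are simply skipped. */
        if (fseek(f, (long)h.len, SEEK_CUR) != 0)
            return -1;
    }
    return 0;
}
</ecode>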

lol ... XML is the hindsight, not the foresight (0)

Anonymous Coward | more than 6 years ago | (#22488332)

People... of course it's impossible for anyone, including MS, to produce perfect code, structures, or output in anticipation of future developments. Clearly, coding evolves in response to weaknesses. Wasn't a new standard, XML, engineered for just this reason? If everyone would look beyond complaining and just implement engineering standards, we would all be OK. After all, it is what it is; just deal with it.