Slashdot: News for Nerds

Docvert 3.0 Lessens Reliance On Microsoft Office

kdawson posted more than 7 years ago | from the weaning dept.

Upgrades 108

An anonymous reader writes "After 10 months of development Docvert 3.0 was released today. This open source web service converts DOC files to Oasis OpenDocument 1.0, and then to HTML, RSS, or any XML format. Try the ODF demo or download the source and install it on your own box. Version 3.0 comes with an MS Word Plugin, FTP/WebDAV upload, and an in-browser document editor."

108 comments

It promises to be an interesting battle (3, Insightful)

0racle (667029) | more than 7 years ago | (#17663496)

Ya, I'm on the edge of my seat. It will get adopted as a standard or it won't. Office will use it either way and anyone wanting to interoperate with Office will have to try to implement it as well.

Re:It promises to be an interesting battle (1)

SuperStretchy (1018064) | more than 7 years ago | (#17663670)

Phew I hear you man... I'm staying up every night until I hear more! It's like that one time when the PS3 came out...

Re:It promises to be an interesting battle (5, Insightful)

Zaiff Urgulbunger (591514) | more than 7 years ago | (#17663820)

All true, but if it does get adopted as a standard, then MS can use this to ensure the continued use of MS Office by government agencies around the globe. If it doesn't get adopted, MS will be under pressure to provide a supported, native, ODF format.

Re:It promises to be an interesting battle (1, Interesting)

CastrTroy (595695) | more than 7 years ago | (#17663976)

What are the implications if it does get adopted as a standard? Can anybody implement for free? Can MS get fined for saying they support the standard when in fact their software actually does not (à la Java, CSS, HTML, Kerberos, and others)? If we could just get MS to follow some standard and actually implement it as the standard as written, then I think we could get a long way to interoperability with MS Word. If it's an open standard, and MS can't just go ahead and change it whenever, and they have to actually follow it, then what does it matter who made up the standard?

Re:It promises to be an interesting battle (4, Insightful)

truthsearch (249536) | more than 7 years ago | (#17664408)

In my opinion there are two reasons Microsoft is trying to create their own standard: PR and government contracts. The PR aspect is obvious. The US government is Microsoft's largest customer (by far) and also the most likely to demand open document standards. Other governments will likely do the same long before corporations demand it. So Microsoft needs to have their own standard which they implement first in order to get the contracts.

They don't have to implement it correctly. They can claim support for a standard [msversus.org] for years without actually following it (e.g. CSS, Kerberos, etc.) and still get the contracts. They were actually involved in creating some CSS standards and still didn't follow them.

It's all about the money. Get the big contracts and nothing else matters.

Re:It promises to be an interesting battle (2, Insightful)

Zaiff Urgulbunger (591514) | more than 7 years ago | (#17664688)

I don't *know* the answers but I believe:

Can anybody implement for free?
I think so! But, you'll need to get a copy of the standard first, and I believe ISO normally charge.. rather more than I'd like for that.

Can MS get fined for saying they support the standard when in fact their software actually does not...
I doubt it, but if a test case can be produced to prove the fault, they'll maybe/probably/hopefully/perhaps fix it. Depending on who's asking for a fix!

You're right that *a standard* is far better than no standard at all. But the only reason MS have done this now is because they've been forced into it due to various governments demanding open standards for documents, and thus, by getting this adopted as a standard, they get to keep business! They are not interested in other people using the standard to write competing software, and as such, I expect they'll move the goal posts as soon as anyone gets close.

Re:It promises to be an interesting battle (3, Informative)

mrchaotica (681592) | more than 7 years ago | (#17666226)

Can anybody implement for free?

No, because bits of it are patented (especially the "legacy compatibility" parts that basically just say "emulate old versions of Office").

Can MS get fined for saying they support the standard when in fact their software actually does not (à la Java, CSS, HTML, Kerberos, and others)?

In this case it won't matter, because the OOXML "standard" is effectively defined as "whatever MS Office does." In other words, MS basically documented Office's behavior down to the smallest detail, and submitted it to ECMA and now ISO.

Re:It promises to be an interesting battle (3, Informative)

FireFury03 (653718) | more than 7 years ago | (#17669502)

In other words, MS basically documented Office's behavior down to the smallest detail

They didn't even do that. A lot of the document states that when you encounter certain tags you will emulate an Office bug, but never specifies the details of that bug because that is "beyond the scope of the document". So even if you have the standards document, you can't fully implement the standard without getting all the old versions of Office and reverse engineering their behavior.

Re:It promises to be an interesting battle (1)

Divebus (860563) | more than 7 years ago | (#17664040)

It only figures... Microsoft expecting a compendium of proprietary digital glop to be officially embraced by all. Why even submit this for standardization? De-facto "standards" have worked so well for them [ -and so badly for everyone else].

Application Rejected. Thanks for Playing. Please Try Again.

I originally read OOXML ... (5, Funny)

alispguru (72689) | more than 7 years ago | (#17663520)

as:

Object
Oriented
X
M
L

and whimpered at the thought...

Re:I originally read OOXML ... (0)

Anonymous Coward | more than 7 years ago | (#17663778)

If the thought of object-oriented XML makes you whimper, check out JSON - the fat-free alternative [json.org]. It's object-oriented, with most (though not all) of the benefits of XML, lightweight, and simple.

Re:I originally read OOXML ... (0)

Anonymous Coward | more than 7 years ago | (#17663790)

Object Oriented XML... that'd be SOAP.

Re:I originally read OOXML ... (1)

parvenu74 (310712) | more than 7 years ago | (#17664074)

That is exactly what I thought at first as well!

But let's assume that this OOXML thing gets through the approval process... with an open standard anyone can make import/export functionality for MS Office documents in non-MS applications. From iWork to KOffice to OpenOffice and whatever else is out there, will there be any need to have MS Office in order to read, edit, and forward on "MS Office documents?" To me, it seems like MS is creating a way for everyone else to erode their market share.

Re:I originally read OOXML ... (3, Insightful)

Knuckles (8964) | more than 7 years ago | (#17664546)

That is exactly what I thought at first as well!

And I wonder how you could. Even just reading the /. blurb makes it clear that the "standard" as proposed is non-implementable.

Re:I originally read OOXML ... (2, Interesting)

MoxFulder (159829) | more than 7 years ago | (#17664934)

But let's assume that this OOXML thing gets through the approval process... with an open standard anyone can make import/export functionality for MS Office documents in non-MS applications. From iWork to KOffice to OpenOffice and whatever else is out there, will there be any need to have MS Office in order to read, edit, and forward on "MS Office documents?" To me, it seems like MS is creating a way for everyone else to erode their market share.


Indeed... that would be nice. Try reading the article to find out all the obstacles Microsoft has thrown down to actually prevent this interoperability from ever happening :-)

Re:I originally read OOXML ... (0)

Anonymous Coward | more than 7 years ago | (#17665334)

"with an open standard anyone can make import/export functionality for MS Office documents in non-MS applications..."

Yes, but too bad that is not the case for OOXML. Their "standard" calls for backward compatibility with old MS Word documents but does not tell anybody how to implement this proprietary feature.

Re:I originally read OOXML ... (1)

rucs_hack (784150) | more than 7 years ago | (#17664100)

Me too, not a nice moment.

Re:I originally read OOXML ... (1)

backwardMechanic (959818) | more than 7 years ago | (#17667232)

I read it and thought how childish MS can be. I've no idea what the OO stands for, but I can think of a related open software package it might be designed to frustrate...

Re:I originally read OOXML ... (1)

j1mc (912703) | more than 7 years ago | (#17667476)

I originally read it as "open office xml," which is probably what Microsoft wants me to think . . .

Re:I originally read OOXML ... (1)

ozbird (127571) | more than 7 years ago | (#17673446)

Can't OpenOffice.org do them for trademark infringement? "Office Open XML" is both confusing and misleading.

Re:I originally read OOXML ... (1)

Millenniumman (924859) | more than 7 years ago | (#17674024)

Um, that would be a hard sell coming from them. The reason OpenOffice.org has that stupid .org at the end is because OpenOffice is held as a trademark by someone else. If "OpenOffice.org" does not infringe "OpenOffice", then "Office Open XML" does not infringe "OpenOffice.org".

Please recommend compliance validation tools (3, Interesting)

statusbar (314703) | more than 7 years ago | (#17663596)

One of the things that bugs me is these 'enormous specifications' that are inconsistent. What we need is not just a document, but the tools necessary to verify a generated file. Not just for valid XML, but for all the little microsofty-bits hidden inside.

--jeffk++
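
A rough sketch of the most basic layer of such a check, in Python. ODF and OOXML files are both ZIP packages of XML parts, so a first pass can at least confirm that every XML part parses. The names `check_package` and `make_demo_package`, and the part names in the demo, are made up for illustration. This only tests well-formedness; it says nothing about schema conformance or the application-specific behaviour hidden inside.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_package(data: bytes) -> dict:
    """Map each XML part in a ZIP package to None (parses) or an error string."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        for name in pkg.namelist():
            # Only look at parts that are supposed to be XML.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(pkg.read(name))
                results[name] = None
            except ET.ParseError as exc:
                results[name] = str(exc)
    return results


def make_demo_package() -> bytes:
    """Build a tiny in-memory package with one good and one broken part (hypothetical names)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", "<doc><p>hello</p></doc>")
        pkg.writestr("word/broken.xml", "<doc><p>oops</doc>")  # mismatched tag
        pkg.writestr("media/image1.png", b"\x89PNG")           # not XML, skipped
    return buf.getvalue()


if __name__ == "__main__":
    for part, error in check_package(make_demo_package()).items():
        print(part, "OK" if error is None else "FAILED: " + error)
```

Anything beyond this (schema validation, checking the legacy-compatibility flags) would need the schemas and, going by the complaints in this thread, details the spec may not provide.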

Re:Please recommend compliance validation tools (4, Insightful)

Zaiff Urgulbunger (591514) | more than 7 years ago | (#17663846)

Microsoft isn't doing this for you silly! The whole intent is likely that it is *hard* for anyone to implement.

Re:Please recommend compliance validation tools (2, Insightful)

xenocide2 (231786) | more than 7 years ago | (#17664602)

On the contrary, when taking the article at face value, I think that the whole intent is to make it *easy* for Microsoft to implement the first time, because it's already done. This is of course backwards from how a community standard should work, it should be an effort that is repeatable. Instead we have whatever crap their contractors turned in, with apparent flaws turned into requirements. I doubt even Microsoft could write a second compatible handler for this document format. Well, perhaps instead of "on the contrary" I should say "to clarify"; the standard appears to be designed so that any implementation but the original is near impossible.

It sounds pretty much like business as usual for MSFT, although describing in 6,000 pages how hard it would be to create an interoperating product is new. Their format is the standard, even the flaws that they didn't fix before release.

Re:Please recommend compliance validation tools (2, Insightful)

TheRaven64 (641858) | more than 7 years ago | (#17664728)

I am beginning to think that a requirement for anything being a standard should be two independent implementations (i.e. no shared code). I would even like to require that at least one be under a license no more restrictive than the 3-clause BSD license (GrokLaw FUD aside), and ideally MIT-licensed or Public Domain.

ODF has already been supported by several implementations, and some of these threw up some OpenOffice-isms; if the support had been finished before the standard had been finalised then this would have allowed them to be fixed.

Re:Please recommend compliance validation tools (1)

bigpat (158134) | more than 7 years ago | (#17667202)

So you have been talking to the folks at the IETF have you?

Describing exceptions doesn't make a standard. (4, Insightful)

splutty (43475) | more than 7 years ago | (#17663662)

Despite what Microsoft thinks and how they've been acting in the past with all their 'standards', describing all the exceptions doesn't make something a standard. Describing them in the context of a non-standardized environment makes it even less so.

Although I'm quite sure that Microsoft really doesn't care and will push this through as 'their' standard that everyone else will have to adhere to in order to do anything with Mickyshaft-generated content anyway.

Whether ISO approves of this or not is inconsequential, the only thing that matters is that M$ can now say: Look, we proposed a standard, it's not our fault 'they' think it's not good enough.

Re:Describing exceptions doesn't make a standard. (2)

Gazzonyx (982402) | more than 7 years ago | (#17663784)

Whether ISO approves of this or not is inconsequential, the only thing that matters is that M$ can now say: Look, we proposed a standard, it's not our fault 'they' think it's not good enough.

My response: I proposed a rational solution for the tech department that I control - it's not my fault that we decided to go to another vendor when you no longer support Office2K. Google gives its love and regards. As does OpenOffice, MySQL, and Linux. Sincerely,

The guy who makes the tech decisions

Re:Describing exceptions doesn't make a standard. (2, Insightful)

suggsjc (726146) | more than 7 years ago | (#17667136)

It was mentioned before, but the US gov't is their biggest customer. Until you are the "guy who makes the tech decisions" for them, M$ won't care about your response.

Idealism will only get you so far, especially when it squares off against practicality.

Re:Describing exceptions doesn't make a standard. (4, Insightful)

Anonymous Coward | more than 7 years ago | (#17664102)


Whether ISO approves of this or not is inconsequential, the only thing that matters is that M$ can now say: Look, we proposed a standard, it's not our fault 'they' think it's not good enough.
It matters to governments, who are coming under increasing pressure to rationalize their MS Office upgrade cycle (and why they're not getting out, via standards)

But yeah it doesn't matter much to the private sector / industry.

Re:Describing exceptions doesn't make a standard. (1)

truthsearch (249536) | more than 7 years ago | (#17664534)

Not true. As the AC above states, it does matter to governments who are finally starting to demand open standards for documents. The US government is Microsoft's largest customer. Microsoft would lose them if they finally decided to only use applications that have documented and standard file formats.

Deja Vu Docvert (5, Interesting)

ei4anb (625481) | more than 7 years ago | (#17663724)

Way back before the web I worked in a Unix shop that was a development lab for a big multinational. Head office kept sending us e-mail with large MS Word attachments. We got tired of having to go down to the library, where we kept the only PeeCee in the department, just to see what was in the attachment.

I solved the issue by writing a program that ran on a Windows PC (an old one that had been discarded and was gathering dust in the closet) that received SMTP mail, detached the Word attachment, started up Microsoft's Word Viewer to read the attachment, then "printed" it to a file in PDF format and finally SMTP mailed it back to the sender.

From then on all we had to do was forward the email to the robot and wait for a readable version to bounce back. As I used Microsoft's own Word Viewer there were no problems whenever a new version of Word came out, I just downloaded the latest viewer :-)
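
The detach step of a robot like this could look roughly like the following in Python, using only the standard `email` package. This is an illustrative sketch, not the original program: the addresses, the file name, and the helper names are invented, and both the viewer/print-to-file conversion and the return mail are deliberately left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_word_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) pairs for every .doc attachment in a raw message."""
    msg = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(".doc"):
            # decode=True undoes the base64 transfer encoding.
            found.append((name, part.get_payload(decode=True)))
    return found


def build_demo_message() -> bytes:
    """Build a sample message with one fake .doc attachment (placeholder bytes only)."""
    msg = EmailMessage()
    msg["From"] = "headoffice@example.com"
    msg["To"] = "robot@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"\xd0\xcf\x11\xe0 placeholder, not a real document",
        maintype="application",
        subtype="msword",
        filename="report.doc",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for filename, content in extract_word_attachments(build_demo_message()):
        print(filename, len(content), "bytes")
```

Each extracted file would then be handed to whatever converter is available before the result is mailed back.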

Re:Deja Vu Docvert (1)

CastrTroy (595695) | more than 7 years ago | (#17664194)

That's a pretty smart idea. It's a pretty good way to protect against Word macro viruses too. I'm pretty sure the viewer doesn't support macros, and even if it does, only the computer processing the documents would get infected, and that machine could be reimaged every night if it was a real problem.

Re:Deja Vu Docvert (1)

cnettel (836611) | more than 7 years ago | (#17664290)

It should be noted that the recent Office exploits have been buffer overruns and similar effects, not directly related to macro handling. As the rendering mechanism in the Word Viewer is quite identical, it's not invulnerable. (So for an automated machine stuffed away to do only this, reimaging can be quite needed if no other checks are performed.)

Re:Deja Vu Docvert (1)

TheRaven64 (641858) | more than 7 years ago | (#17664814)

There have been vulnerabilities that affect Word Viewer (I believe one is still in the wild). This could, however, be prevented from becoming a vulnerability by running the viewer in a VM that was restored to a snapshot state immediately after running. Ideally the document would be loaded from a disk image that would be generated before starting the VM and then printed to a generic PostScript printer over IPP on the host machine, which would then convert the PostScript to PDF. No connections other than to the host machine on the IPP port would be allowed.

If the document did contain a virus then all it would be able to do is trash the hard drive on the VM, which would then be restored as soon as it had run.

Re:Deja Vu Docvert (0)

Anonymous Coward | more than 7 years ago | (#17664424)

It would be nice to have the viewer running inside WINE ... then you wouldn't need a separate server. Of course, the main problem would be to instruct the viewer to generate pdf. Hmm ... maybe a nice project. Any takers?

Re:Deja Vu Docvert (1)

kripkenstein (913150) | more than 7 years ago | (#17665046)

That's a nice solution. Any chance you might release it under an open-source license?

Re:Deja Vu Docvert (1)

endrue (927487) | more than 7 years ago | (#17666838)

If you had an old PC lying around, why didn't you just use that one to read the Word docs?

- Andrew

Re:Deja Vu Docvert (1)

sootman (158191) | more than 7 years ago | (#17668584)

Maybe because having dozens of users get up and leave their desks to read documents isn't really effective in a business setting?

Re:Deja Vu Docvert (2, Interesting)

multipartmixed (163409) | more than 7 years ago | (#17668164)

UNIX shop... before the web... only one PC... rendering to -- PDF?!

I think I smell a tiny fib.

I would have believed you if you had told me that you used Windows' (native since 3.1) Apple Laserwriter printer driver set up to print to a file, then mailed the resulting (PostScript) file to yourself to print or view with GhostScript/gv.

Well, except that I didn't think the Word viewer was released until either '95 (or as late as '97?), and it was released because MS broke the Word 6.0 (Office 4.3) document format, which was in fairly widespread use at the time (putting WordPerfect 5.1 out of business).

Re:Deja Vu Docvert (0)

Anonymous Coward | more than 7 years ago | (#17671448)

PDF 1.0 came out in 1993. Same year as the WWW came into being. Microsoft's Word Viewer didn't come out until Office 97.

Re:Deja Vu Docvert (1)

wonkobeeblebrox (983151) | more than 7 years ago | (#17674910)

From then on all we had to do was forward the email to the robot and wait for a readable version to bounce back. As I used Microsoft's own Word Viewer there were no problems whenever a new version of Word came out, I just downloaded the latest viewer :-)

That seems like the hard way to solve the problem...

...you can do largely the same thing with the unix "strings" command. You'll lose formatting, but you're interested mostly in content, right?

Watch my example:
computer:~/Documents wonkobeeblebrox$ ls Op*
Operation Restore Hope Election Night 2006
Operation Restore Hope.pdf
computer:~/Documents wonkobeeblebrox$ file "Operation Restore Hope Election Night 2006"
Operation Restore Hope Election Night 2006: Microsoft Office Document
computer:~/Documents wonkobeeblebrox$ strings "Operation Restore Hope Election Night 2006" | more
jbjbd
Operation Restore Hope*
Join me for a night of celebration as we
watch our favorite shows make us laugh and then
watch our favorite candidates make us cheer!
Tuesday, November 7, 7PM
(soft drinks and snacks will be provided)
* Alternate names considered:
Operation Dumbo-Drop
Operation Its-About-Time!
Operation Investigation
Operation We-Actually-Won-This-Time!

...etc

you lose some of the formatting, but it is dead simple and gets the contextual points across, I think...
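
For anyone without `strings` handy, its core behaviour is easy to approximate. The sketch below is in Python, with function names invented here. The first function pulls out runs of printable ASCII. The second looks for the same runs stored as 16-bit little-endian characters; that is an assumption about how some Word files hold their text, and would explain a plain ASCII scan coming back nearly empty on them.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes, roughly like strings."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def printable_runs_utf16le(data: bytes, min_len: int = 4) -> list[str]:
    """Same idea for text stored as 16-bit little-endian characters (ASCII range only)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up sample: binary junk around one 8-bit run and one 16-bit run.
    sample = (
        b"\x00\x01Operation Restore Hope\x00\xffab\x02"
        + "Tuesday".encode("utf-16-le")
        + b"\x03\x04"
    )
    print(printable_runs(sample))
    print(printable_runs_utf16le(sample))
```

As with the real tool, all formatting is lost and whatever garbage happens to look printable comes along for the ride.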

Open XML is a transliteration (1)

suv4x4 (956391) | more than 7 years ago | (#17663886)

It's to be expected as Open XML is a straight transliteration of the DOC "dumps" to XML format.

I wonder how it ended this way: not enough time to properly develop and implement a more proper standard, or by design.

I feel it's both.

Re:Open XML is a transliteration (5, Insightful)

TheRaven64 (641858) | more than 7 years ago | (#17664078)

The design requirement of Microsoft's XML format was (obviously) that it be possible to convert existing Word documents to it without any loss. In order to do this, there must be a one-to-one mapping between the .DOC semantics and the OpenXML semantics.

The second design requirement was that the spec be developed and released quickly, before ODF had time to gain much traction. Between these two objectives, it's hardly surprising that it ended up the way it did...

Re:Open XML is a transliteration (1)

Maxo-Texas (864189) | more than 7 years ago | (#17664748)

You know that's an interesting take.

Microsoft is so huge now (and I work for a huge corporation as well) that it may be easier/more natural to just let the business mangle things in a natural way than to get a purposeful plan going to do this kind of thing.

The project planning and meeting time alone would be bad plus you wouldn't want a documentation trail showing you intended to lock in to word or it might come back to bite you later.

Re:Open XML is a transliteration (1)

kalpaha (667921) | more than 7 years ago | (#17665184)

Yes. And for the reasons you list, no one should adopt this "standard", which hopefully it will never be declared.

Re:Open XML is a transliteration (1)

spitzak (4019) | more than 7 years ago | (#17666222)

must be possible to convert existing Word documents to it without any loss. In order to do this, there must be a one-to-one mapping between the .DOC semantics and the OpenXML semantics.

Uh, no, that is wrong. A many-to-one mapping would work for that requirement. If doc has many ways of representing the same information, turning them all into the same way would work just fine and would be lossless.

A requirement that it losslessly convert *both* ways, however, would require this. Otherwise converting to XML and back would probably map the many ways of representing the same data to a single one and would be lossy. However I am willing to bet that OOXML does not satisfy that criterion. So this excuse is totally bogus.
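
The distinction can be made concrete with a toy model in Python. Everything here is hypothetical: the token names stand in for redundant legacy encodings and have nothing to do with the real DOC or OOXML structures.

```python
# Two legacy encodings ("bold_v1", "bold_v2") mean the same thing.
LEGACY_TO_CANONICAL = {
    "bold_v1": "bold",
    "bold_v2": "bold",
    "ital": "italic",
}

# Going back, the converter has to pick one legacy encoding per meaning.
CANONICAL_TO_LEGACY = {
    "bold": "bold_v2",
    "italic": "ital",
}


def to_canonical(tokens: list[str]) -> list[str]:
    """Many-to-one conversion: every legacy token keeps its meaning."""
    return [LEGACY_TO_CANONICAL[t] for t in tokens]


def to_legacy(tokens: list[str]) -> list[str]:
    """One-to-one conversion back into the legacy vocabulary."""
    return [CANONICAL_TO_LEGACY[t] for t in tokens]


if __name__ == "__main__":
    original = ["bold_v1", "ital"]
    converted = to_canonical(original)
    round_tripped = to_legacy(converted)

    print(converted)                                # meaning preserved one way
    print(round_tripped == original)                # encoding not preserved on the way back
    print(to_canonical(round_tripped) == converted) # but the meaning still is
```

So a many-to-one mapping is enough for lossless conversion into the new format; only an exact round trip back to the original encoding needs one-to-one.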

Re:Open XML is a transliteration (1)

complete loony (663508) | more than 7 years ago | (#17673476)

In Word documents and other OLE formats, the object responsible for rendering the content is also responsible for loading and saving its own state into the document format. In the DOC format this is mostly a binary dump of the object's state. For OOXML it seems they have done some kind of serialisation based on their internal variable names.

Re:Open XML is a transliteration (2, Interesting)

cnettel (836611) | more than 7 years ago | (#17664082)

Well, the XML notation for Office 2003 was even more so. They broke that one now, and some changes are to the better. The requirement to be able to represent just about anything that was possible in the previous versions, faithfully, is still a great contaminant, as you say.

What, like... (1)

Mythrix (779875) | more than 7 years ago | (#17664256)



(Content of .doc file)

Re:What, like... (Oops, forgot, no xml tags.) (5, Funny)

Mythrix (779875) | more than 7 years ago | (#17664318)

<?xml version="1.0" encoding="UTF-8"?>
<microsoft_word_document>
(Content of .doc file)
</microsoft_word_document>

Re:What, like... (Oops, forgot, no xml tags.) (1)

dascandy (869781) | more than 7 years ago | (#17671454)

Doc files don't parse as UTF-8. Try application/octet-stream?

Re:Open XML is a transliteration (1)

Hal_Porter (817932) | more than 7 years ago | (#17664434)

Any format that allows you roundtrip from .doc to that format and back without altering formatting has to be like this, right? It has to support all the features the .doc format does.

That's the reason for all the "render like WordPerfect 5.x" options that people have complained about [slashdot.org] , because they have to allow people to convert to the XML format and then convert back without reducing the document to an unreadable mess.

I remember reading some interview with the Office program manager where he said roundtripping to HTML was a big feature, long before they thought of making their XML format an open standard.

http://xml.coverpages.org/microsoftHTML971215.html [coverpages.org]

But obviously the HTML or XML that you emit that can be roundtripped back to .doc will be fairly unreadable, since you need to encode all the stuff that .doc supports somehow. Like suppressTopSpacingWP in fact.

The alternative is to not support roundtripping and then wait for slashdot headlines like "Users find that the new Office XML format mangles their documents". More to the point, Office is dead if it does that; the advantage it has is that it's a de facto standard, used for the vast majority of editable documents in big companies. Anything which dilutes that is dangerous. And I doubt somehow that Microsoft really cares if anyone else recognizes their standard; they just need to be able to claim that they have documented it to make it seem open.

Re:Open XML is a transliteration (4, Insightful)

99BottlesOfBeerInMyF (813746) | more than 7 years ago | (#17664978)

That's the reason for all the "render like WordPerfect 5.x" options that people have complained about, because they have to allow people to convert to the XML format and then convert back without reducing the document to an unreadable mess.

There is no reason I know of why the XML format cannot support all the features of Word and round trip, without relying on nasty hacks like this; it just takes more work. The problem with "Open"XML that I've seen is that they concentrate entirely on supporting only the features of .doc files and their interactions with other programs to the exclusion of anything else. Rather than "render like WP 5.x" you need to define how WP 5.x renders that feature, then incorporate it into your conversion script in a way that makes sense in general for documents.

The whole format is built upon the assumption that only MS and Word will be using it and it is not designed to abstract word processing documents in general, but to kowtow to the eccentricities of Word.

The alternative is to not support roundtripping and then wait for slashdot headlines like "Users find that the new Office XML format mangles their documents".

No, the alternative is to do it right and build hacks like the ones you mention into the import and export routines, rather than embedding them, without any definition, into the format.

And the sad thing is... (5, Insightful)

Durkheim (960021) | more than 7 years ago | (#17663908)

...Some people think it's fine that way. A friend of mine, quite pro-MS, told me that all those little strange things in the specification were normal for backwards compatibility, and that reading the specification was a waste of time. Instead, he directed me towards a preview of MS Office 2007. Because for him, as for many more, what's important is the final product, the cuteness of the buttons, the way it works and displays its own format. Why bother using a free program that displays Word documents badly, when Office is already perfect, huh? I feel so misunderstood sometimes. What makes me sad is that they don't see the use of a clear straight-to-the-point format. Maybe only geeks can be horrified by this one.

Re:And the sad thing is... (3, Insightful)

westlake (615356) | more than 7 years ago | (#17664356)

I feel so misunderstood sometimes. What makes me sad is that they don't see the use of a clear straight-to-the-point format. Maybe only geeks can be horrified by this one.

The user cares only for the document he sees in print or on screen. The internal structure of the file interests him not at all.

Re:And the sad thing is... (2, Insightful)

TheRaven64 (641858) | more than 7 years ago | (#17664870)

Give him a few years. Like Free Software, the need for Open Standards is only really apparent to people who haven't yet been bitten by their lack. I still have a load of ClarisWorks 1.0 documents from years ago that I can't open; even if I could find my copy of ClarisWorks and the disk hadn't been corrupted, I don't even own a disk drive that could read it anymore.

Re:And the sad thing is... (1)

mrchaotica (681592) | more than 7 years ago | (#17667940)

Do ClarisWorks files store text as text (i.e., without mangling it)? If so, just use strings to convert the files to plaintext, delete all the garbage (which used to be formatting) and be done with it.

Re:And the sad thing is... (1)

TheRaven64 (641858) | more than 7 years ago | (#17671108)

For a lot of the documents, images (diagrams in particular) are quite significant components. Thus, extracting the text would not benefit me much (it might only be a few hundred words in a lot of cases).

Re:And the sad thing is... (1)

DragonWriter (970822) | more than 7 years ago | (#17665946)

The user cares only for the document he sees in print or on screen. The internal structure of the file interests him not at all.


Actually, many users care about their ability to use the document, not just the appearance. Now, in the short-run, that means the applications that support it now; in the longer-run, though, the underlying format and its adaptability and limitations are a real-but-hidden issue that affects utility, though most users are not able to evaluate it directly.

Re:And the sad thing is... (1)

CastrTroy (595695) | more than 7 years ago | (#17665066)

Doesn't he feel bad that he has to pay hundreds of dollars for a new version of MS Word every couple of years? I mean, that's a lot of money to be spending on a tool that hasn't changed much in the last 10 years (at least in terms of the functionality that most people use). Could you imagine if you had to pay $100 for a hammer, and that even if you didn't use the hammer that much, you still had to buy a new one every 3 years?

Re:And the sad thing is... (1)

hawaiian717 (559933) | more than 7 years ago | (#17665272)

Well, you wouldn't have to buy a new hammer, just like you can keep using Word 97 if you want. It would be more like, if after a few years, they came out with new nails and you had to buy the new hammer to use the new nails. As long as you only need to use the old nails you're fine with the old hammer.

Re:And the sad thing is... (1)

CastrTroy (595695) | more than 7 years ago | (#17666210)

But what happens when you can't buy any old nails because they don't sell them anymore? And they start selling wood that isn't compatible with the old nails, so even if you have old nails you can't use them, because it doesn't work with the new wood.

Re:And the sad thing is... (1)

hawaiian717 (559933) | more than 7 years ago | (#17668994)

Definitely valid problems, and why the analogy mostly works. In the case of Word, there actually is one advantage: You can make your own documents (nails) with Word 97 (the old hammer) as long as you want; the problem comes in when you try to use documents from other people (new nails). Most people can't make their own nails.

Divy it up? (3, Insightful)

plopez (54068) | more than 7 years ago | (#17663928)

I wonder if you could get 60 people to review 100 pages each (or divide up chapters or sections in some logical manner). That may be feasible in 1 month. At least the glaring problems would be flagged. I have no idea how to organize this however.....

Re:Divy it up? (3, Funny)

Dragged Down by the (1004490) | more than 7 years ago | (#17664876)

Why, you could use Office 2007's Auto-Collaborate-Review feature, of course!

Re:Divy it up? (1)

plopez (54068) | more than 7 years ago | (#17665318)

Ooooo.. like using the Death Star to destroy the Imperial fleet!

Cool!

Objection! (1)

Kamineko (851857) | more than 7 years ago | (#17663930)

Submitting contradictions?


Wait a minute, I know this! This is just Phoenix Wright!

For all the students out there (2, Funny)

Catbeller (118204) | more than 7 years ago | (#17663952)

This is why we oldsters hate Microsoft. 25 YEARS of this.

Re:For all the students out there (0)

Anonymous Coward | more than 7 years ago | (#17666468)

And, if I humbly add to why I hate them, is that they have a bizarre way of obfuscating the "worldview" by introducing, puuuushing, then mothballing, which some call innovative, technologies, but which I call as questionable and not thought through; throw it at the wall and let's see if anything sticks methodology. Then we are left to understand and support the obfuscation, forever.

I'm shocked! (0, Troll)

Anonymous Coward | more than 7 years ago | (#17664004)

That Andy Updegrove, who runs a law firm that works for IBM (which has a massive vested interest in making sure that there is one and only one XML word processing format (ODF)) would submit a story to /. pointing out his own article that is critical of Microsoft's XML word processing format.

Amazing. Who would have thought of something like that.

Re:I'm shocked! (2, Insightful)

moranar (632206) | more than 7 years ago | (#17664438)

I'm shocked too, that someone using ad-hominem attacks would resort to anonymous posting. Amazing. This must be Slashdot.

The fact that Updegrove might have a vested interest in ODF succeeding doesn't detract from the OOXML proposed standard being a crock of shit.

Who needs OOXML... (2, Funny)

Chris Mattern (191822) | more than 7 years ago | (#17664034)

...when you can have oo-mox? [memory-alpha.org]

Chris Mattern

Open Source community debugs MS code (3, Funny)

RichMan (8097) | more than 7 years ago | (#17664086)

So it looks like the Open Source community is now debugging Microsoft's Document format. I am sure Microsoft does not itself know what is going on in here half the time and much of this document was generated by code scrapers looking for structures and interfaces.

Congrats to the world community but they should really submit a bill to Microsoft.

Yeah, that's a Microsoft product alright (3, Funny)

Master of Transhuman (597628) | more than 7 years ago | (#17664122)

"additional Microsoft technology that must be emulated (but is not covered by the Microsoft patent pledge); elements that can't be implemented without Microsoft technical assistance; dependencies on Windows itself; mandatory bugs; and more. And then there's also the fact that OOXML heavily overlaps ODF -- a platform-independent, already-adopted ISO/IEC."

Pretty much like everything they do.

Wait - where are the virus APIs? Did they leave those out?

Naah...

Gotta be there somewhere. Keep looking.

Re:Yeah, that's a Microsoft product alright (0)

Anonymous Coward | more than 7 years ago | (#17668552)

"additional Microsoft technology that must be emulated (but is not covered by the Microsoft patent pledge); elements that can't be implemented without Microsoft technical assistance; dependencies on Windows itself; mandatory bugs; and more. And then there's also the fact that OOXML heavily overlaps ODF -- a platform-independent, already-adopted ISO/IEC."

Pretty much like everything they do.

Wait - where are the virus APIs? Did they leave those out?

Naah...

Gotta be there somewhere. Keep looking.


You answered your own question...

Here's a contradiction (0)

Anonymous Coward | more than 7 years ago | (#17664228)

Does the OO mean "Object Orientated" or "Open Office"? Microsoft always arrive at the poolside late and attempt to muddy the water and ruin it for everyone else. I propose Microsoft remove the ambiguity by renaming it OMXML, "Obnoxious Monopolist XML" which is still awful because the fact it uses XML is irrelevant. How would Microsoft feel if the open source community began prefixing all their work with "MS"? That might help us get more F/OSS past management, since they may be under the impression we are talking about Microsoft software. Hell it works for Microsoft!

Anyway, the contradiction is that Microsoft want to be seen to be involved, they participate in community standards for all the wrong reasons. They submit technically unsound proposals and generally waste everyone's time. How is the naming of their proposed "standard" not intended to cause confusion and slow the adoption of ODF and OpenOffice.org?

The OOXML acronym (1)

WebCowboy (196209) | more than 7 years ago | (#17666712)

Does the OO mean "Object Orientated" or "Open Office"?

It means neither. OOXML is shorthand for Opaque and Obfuscated eXception-based Markup Language. However, Marketing rejected the longhand name for the format because it didn't test well in developer focus groups. However, marketing found the shorthand OOXML appealing because psychologists have said the roundness of the O's induces a sense of calmness. BillG liked it because legislators could make an (incorrect) association between OOXML and OpenOffice.org (often abbreviated OO.o), and he hopes the confusion could lead to the inadvertent acceptance of MS' pseudo-open file format in government.

So OOXML stays but it officially stands for about as much as DVD does, which is nothing (or whatever you want it to stand for--it is all about "personal freedom" after all, so OOXML stands for what means the most to you ;-).

Now We'll Now... (2, Interesting)

segedunum (883035) | more than 7 years ago | (#17664350)

...whether ISO has simply become a dumping ground for people simply wanting to market their stuff as standards (ECMA), or a real standards body.

As it is, there is not a snowball in hell's chance that OpenXML can become an ISO standard. It is simply a dump of the existing awful doc format into a nice incomprehensible 6000 page document, and it doesn't even use existing ISO standards. There's even a set group of banners and bullet points defined in there which can by no stretch of the imagination be called international.

I know Microsoft has managed to butter the ECMA up as their usual standards dumping ground, but I simply cannot see how they can get past the shortcomings in that article. To do so would be a huge amount of work (and Office 2007 is already using this format) and it would threaten their Office monopoly - which is what this obfuscation was about in the first place.

Re:Now We'll Now... (0, Troll)

I'm Don Giovanni (598558) | more than 7 years ago | (#17667300)

...whether ISO has simply become a dumping ground for people simply wanting to market their stuff as standards (ECMA), or a real standards body.

You mean like how ISO rubberstamped the half-spec for ODF that OASIS submitted? You don't even have spreadsheet formulas spec'ed for crying out loud! OO.o is the "reference" implementation. Whenever anyone implements ODF and runs into a wall because the spec isn't fully spec'ed, they say, "Just do whatever OO.o does". Some spec. It's a spec based on and written for OO.o, indeed it's derived from OO.o's previous XML format. And OO.o writes lots of stuff in its documents that are NOT in the spec (spreadsheet formulas being the most well known example).

ISO rubberstamped ODF with no revisions, no critiques, or any opposition (despite MS being one of the committee members for the ISO rubberstamping; MS raised no objections, quite apart from IBM crying like a baby and getting outvoted 20 to 1 at the ECMA ratification of OOXML).

And the ECMA process for OOXML was far more rigorous than ISO's rubberstamping of ODF. The ECMA process took over a year, with various revisions. And the parties involved included governments, tech companies (e.g. Apple and Novell), and businesses. The issues that IBM is whining about were raised during that process (Novell themselves (which, I know you guys despise now) raised many of the same issues), and they were dealt with. There were some issues for which doing the so-called "right" thing would have been counter productive as those cases are extremely rare (much more rare than creating spreadsheets with formulas, which OASIS didn't bother to spec for ODF). Others were dealt with the "right" way.

Pragmatism has its place. That's something you guys can't seem to get thru your thick skulls. And your double-standards and hypocrisy are beyond belief. "Don't complain about the spec in your brother's eye while ignoring the log in your own."

And you guys are now running around trying to convince governments to mandate exclusive use of ODF, even when the ODF spec doesn't cover everything that OO.o does, let alone MS Office. And one of the reasons for the latter, is that by mandating exclusive use of a format that doesn't cover MS Office's features, you automatically make those features irrelevant since they couldn't be used, therefore evening the playing field wrt features, not by OO.o catching up, but by government fiat! You guys are too much!

Re:Now We'll Now... (1)

codemachine (245871) | more than 7 years ago | (#17669868)

You mean like how ISO rubberstamped the half-spec for ODF that OASIS submitted? You don't even have spreadsheet formulas spec'ed for crying out loud! OO.o is the "reference" implementation. Whenever anyone implements ODF and runs into a wall because the spec isn't fully spec'ed, they say, "Just do whatever OO.o does". Some spec. It's a spec based on and written for OO.o, indeed it's derived from OO.o's previous XML format. And OO.o writes lots of stuff in its documents that are NOT in the spec (spreadsheet formulas being the most well known example).
No doubt the ODF standard needs some work, but at least the reference implementation is available with source code. The OOXML spec references implementation details in old versions of MS Word and even Corel WordPerfect, but does not document how they work. At least with ODF you can go and look at the code implementing these edge cases, whereas with OOXML, the products being referenced are closed source and likely covered by various patents that wouldn't allow reimplementation anyhow.

Plus OOXML just reinvents the wheel everywhere it has a chance (well actually not quite, it just used the old .doc way of doing things in an XML-ized way), whereas ODF actually uses existing XML standards in many places. ODF, while maybe not feature complete, is an actual attempt at an XML based document standard. OOXML is an ugly reimplementation of .doc whose sole purpose is to obtain "standard" status for government bids, as well as to confuse the general public (even the name is meant to confuse it with ODF - Office Open XML??? Like nobody around the ODF format uses the initials OO and the words Open Office...)

In the end it may help competing office suites import MS Office documents better, but only versions saved in the new format, which currently excludes about 99% of all existing .doc files. Whereas MS gets to still be the only one with full compliance and backwards compatibility (though they aren't 100% at it either).

It has already been proven how easy it would've been for MS to just support ODF (others have written the plugins). If ODF was missing features in MS's opinion, why didn't they contribute to the standard instead of reinventing the wheel?

Re:Now We'll Now... (1)

yo_tuco (795102) | more than 7 years ago | (#17672104)

"Whereas MS gets to still be the only one with full compliance and backwards compatibility (though they aren't 100% at it either)."

And to pour salt on the wound, I hear Microsoft's promise not to sue only applies to those who can fully implement their "Open Standard". And since it appears that only Microsoft can do that, what good is that promise? More smoke and mirrors from Microsoft?

Re:Now We'll Now... (1)

I'm Don Giovanni (598558) | more than 7 years ago | (#17677868)

"And to pour salt on the wound, I hear Microsoft's promise not to sue only applies to those who can fully implement their "Open Standard". "

You heard wrong. That was more of IBM's FUD. When are you going to understand that 99.99% of IBM's statements regarding OOXML are to be disregarded as pure FUD (FUD is an invention of IBM). You do NOT have to fully implement the standard.

Re:Now We'll Now... (1)

segedunum (883035) | more than 7 years ago | (#17672912)

You mean like how ISO rubberstamped the half-spec for ODF that OASIS submitted? You don't even have spreadsheet formulas spec'ed for crying out loud!
Simply doesn't matter. That's not the point. The point is that OOXML is trying to be an ISO standard when it doesn't even respect other ISO, or even accepted, standards - a lot outside of the IT industry. I'm not talking about how complete you might happen to think ODF is.

And the ECMA process for OOXML was far more rigorous than ISO's rubberstamping of ODF.
I've seen no evidence for that whatsoever. I wouldn't call a panic 6000 page dump of an existing closed binary format into XML rigorous in any way.

The simple fact is, if it depends on Windows, depends on features in Windows and Office and depends on the quirks of Windows rather than even other ISO standards then it is a poor pathetic attempt at getting something accepted as a show standard in order to quell any interest in ODF.

That's what OOXML is for, and that is all it's for. I know it. You know it. Microsoft knows it.

Re:Now We'll Now... (1)

complete loony (663508) | more than 7 years ago | (#17673390)

When Microsoft says "Reverse engineer whatever it was that Word 97 did" or worse one of the obsolete DOS word processors that OOXML has special flags for, you think this is better than "Do whatever OO.o does, oh and here's the code"?

Sure the ODF spec could do with some work. But if you want to implement it you *do* have access to everything you need to understand it.

Is OOXML Truly Open Source (1, Informative)

Anonymous Coward | more than 7 years ago | (#17664480)

The biggest question to ask is not whether or not Microsoft provides access to the XML, nor whether Microsoft provides access to a schema, the question to ask is, "Is OOXML Truly Open Source?"

The biggest issue I have with the OOXML "standard" (and I use the word quite loosely) is there are BLOBs (binary large objects) in the OOXML file created by Microsoft. In this BLOB is all the byte code used in the Macros, etc for the file in question (i.e. an Excel file). Since Microsoft has not provided proper instructions (whether it be a schema, or source code) to read the BLOB containing this information, and how to interpret this information, I doubt this will ever pass as a true ISO standard, nor be truly accepted as open source (not to mention macros are still programmed using the Microsoft defined, and patented, VBA rather than using an open source standard such as JavaScript).
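
For anyone who wants to check for such BLOBs themselves, here is a minimal sketch in Python. It assumes the file is a ZIP container (as Office 2007 files are) and that macro byte code, when present, sits in a non-XML part such as xl/vbaProject.bin. The function names and the sample package built in memory are hypothetical, made up purely for illustration; they are not taken from any Microsoft tool or from the spec.

```python
import io
import zipfile


def list_binary_parts(package):
    """Return the sorted names of parts that are neither .xml nor .rels.

    `package` may be a path or a file-like object holding a ZIP archive.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if not name.lower().endswith((".xml", ".rels"))
        )


def build_sample_package():
    """Build a tiny fake package in memory (illustrative contents only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("xl/workbook.xml", "<workbook/>")
        # Stand-in for an opaque macro blob; the bytes are meaningless.
        zf.writestr("xl/vbaProject.bin", b"\x00\x01opaque")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(list_binary_parts(build_sample_package()))
```

Pointing list_binary_parts at a real macro-enabled workbook should show which parts you cannot read with an XML parser, though it tells you nothing about what is inside them.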

ISO maybe, but never an IETF standard (2, Insightful)

Pascal Sartoretti (454385) | more than 7 years ago | (#17664852)

Maybe they'll get an ISO standard, but I have the feeling that an IETF standard would be out of question. Look at the requirement for being just a "Draft standard" (see here [ietf.org] ):

A specification from which at least two independent and interoperable implementations from different code bases have been developed, and for which sufficient successful operational experience has been obtained, may be elevated to the "Draft Standard" level.

Outside Office 2007, who would ever implement this "standard"?

Re:ISO maybe, but never an IETF standard (1)

SwashbucklingCowboy (727629) | more than 7 years ago | (#17667126)

Outside Office 2007, who would ever implement this "standard"?
Corel, in their WordPerfect Office product for one, at least that's what their announcement [corel.com] claims.

Surprised? (1)

UnknowingFool (672806) | more than 7 years ago | (#17665348)

If it's true, is anybody really surprised? This is MS after all.

There is another analysis [groklaw.net] on groklaw.

hope not (1)

suezz (804747) | more than 7 years ago | (#17665364)

Here's to it not getting adopted, and I hope they kick it back out on the proprietary horse it rode in on.

Hey guys, I've an idea (2, Funny)

matt me (850665) | more than 7 years ago | (#17666538)

> there's also the fact that OOXML heavily overlaps ODF -- a platform-independent, already-adopted ISO/IEC.
Couldn't the Microsoft people use the existing standard instead? That way everyone would be able to communicate. Someone should call to let them know about it.

You must be new here (1)

deimios666 (1040904) | more than 7 years ago | (#17669652)

Oh go ahead and call Microsoft if you wish. But I bet they implemented their own phone standard which is incompatible with ours. Embrace, Extend, Exterminate. 'nuff said.

ODF converter ready to go since May 2006 (0)

Anonymous Coward | more than 7 years ago | (#17667826)

So, we've had an ODF "production ready" converter ready since May 2006. http://www.groklaw.net/article.php?story=20060504015438308 [groklaw.net]

Where is it? Why won't Gary Edwards of the OpenDocument Foundation say anything?

Hey people, how about some promoting of ODF?

MS is not worried about the format's success! (1)

PostPhil (739179) | more than 7 years ago | (#17669700)

They already know that everyone is locked-in to their proprietary Office formats. This XML "standard" is not created as a real product that Microsoft hopes to promote. That would be a conflict of interest. Instead, they want to make sure that it is a gigantic convoluted spec that no one can implement. It's designed as a distraction. They want the spec to leave you with a feeling of disgust for open XML formats because:
1. You'll go back to your works-good-enough-for-me Office formats you've already been using (e.g. Word Doc).
2. If you're the typical uneducated business person, you'll get confused between OOXML and ODF and falsely believe that ODF is that bloated mess of a spec you believe you heard about from your fresh-out-of-high-school IT guy. Well he knows about computers, so that ODF (OOXML? Open Office XML? XML? Open Document?) thing must be a bad idea.

Not many people know much about Open Office, even many supposed "techs" in many businesses (at least in the U.S.). Microsoft wants to take advantage of their greater mind-share to control public opinion through their usual tactics of FUD and confusion. They want to make sure that the reputation among developers of XML as being a bloated exchange medium will work in their favor by amplifying that perception thereby killing off ODF and any chance of the industry adopting a common format.

OOXML vs. ODF Deathmatch! (1)

dtabraha (557054) | more than 7 years ago | (#17670556)

Wikipedia has a good doc outlining the difference between OOXML and ODF:

http://en.wikipedia.org/wiki/Comparison_of_OpenDocument_and_Microsoft_XML_formats [wikipedia.org]

It may not be an ISO standard, but it's a heck of a lot better than the completely proprietary older formats.

How about a good "atta boy" for Microsoft at least? :)

Re:OOXML vs. ODF Deathmatch! (0)

Anonymous Coward | more than 7 years ago | (#17672746)

"It may not be an ISO standard"

But that is the subject here. MS is submitting their "Open Document" to be an ISO standard. An ISO standard that relies on proprietary features from a single software vendor! It's a contradiction. "Atta-boy", Microsoft, try again when you understand what "Open" means.

How is OOXML better than older formats? (1)

walterbyrd (182728) | more than 7 years ago | (#17673178)

Or do you mean better for msft?

The old formats are known. OpenOffice and AbiWord can read the old formats - as can older versions of ms-office. With this OOXML cr@p it's right back to square one.

Oasis OpenDocument (1)

VGPowerlord (621254) | more than 7 years ago | (#17673566)

Why is this still being called Oasis OpenDocument? Have you already forgotten that it's an ISO standard [iso.org] ?