MS Office XML Format Now In TextEdit

timothy posted more than 9 years ago | from the beating-vs-joining dept.

Data Storage 86

computerdude33 writes "Apparently, Apple heard of Microsoft Office changing to XML formats. If you have OS X 10.4.2, you can save documents in TextEdit in Word XML Format. They are saved with a *.xml extension, and are riddled with references to Word. Here is an example of one of these documents."

86 comments

Beating MS... at their own game. (3, Funny)

jpsowin (325530) | more than 9 years ago | (#13232794)

Now you just have to find a Microsoft product to read the future Microsoft Word XML file!

Re:Beating MS... at their own game. (1)

toph42 (160730) | more than 9 years ago | (#13232850)

Yeah, it doesn't open for me on Word 2002 as a document. It opens up with all the xml tags shown.

Re:Beating MS... at their own game. (2, Informative)

sycotic (26352) | more than 9 years ago | (#13233729)

I get the same thing in Microsoft Office Word 2003 :\

Re:Beating MS... at their own game. (1)

michokest (893732) | more than 9 years ago | (#13232913)


And you'd better look for a DRM-cracking tool for Office's new fancy tech [slashdot.org].

Re:Beating MS... at their own game. (1)

generic-man (33649) | more than 9 years ago | (#13238821)

Nobody uses Office DRM unless they're willing to shell out thousands of dollars for an "Information Rights" server.

OO.Org (1)

Esine (809139) | more than 9 years ago | (#13232842)

I wonder if OpenOffice.org or KOffice will start supporting this format any time soon..

Re:OO.Org (3, Informative)

EddWo (180780) | more than 9 years ago | (#13235769)

OpenOffice 2.0 Beta already supports WordML.
http://www.openoffice.org/issues/show_bug.cgi?id=33450

Re:OO.Org (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#13236303)

Yes, they are pretty quick to steal patented formats for their crappy suite, which has no formats of its own.

Suckers.

in case you're curious... (3, Interesting)

ubiquitin (28396) | more than 9 years ago | (#13232849)

So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...


?xml version="1.0" encoding="UTF-8" standalone="yes"?
?mso-application progid="Word.Document"?
w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve"o:DocumentProperties/o:DocumentPropertiesw:fontsw:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"//w:fontsw:docPr/w:docPrw:bodywx:sectw:pw:pPr/w:pPrw:rw:rPrw:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/wx:font wx:val="Helvetica"/w:sz w:val="24"/w:sz-cs w:val="24"//w:rPrw:tHot time!/w:t/w:r/w:pw:sectPrw:pgSz w:w="12240" w:h="15840"/w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"//w:sectPr/wx:sect/w:body/w:wordDocument

Re:in case you're curious... (0, Troll)

Smack (977) | more than 9 years ago | (#13232866)

Oh no my 200 GB drive is going to fill up with XML files.

terrible moderation (0)

Anonymous Coward | more than 9 years ago | (#13233067)

I don't see how this is a troll.

Yes, Word XML files are bigger than RTF files, but who the hell cares? XML wasn't exactly designed for leanness, people.

Re:terrible moderation (2, Interesting)

That's Unpossible! (722232) | more than 9 years ago | (#13233810)

I don't see where XML files are bigger than RTF. I just performed a test, and the RTF file was 3 times as large as the XML file.

Re:in case you're curious... (0)

Anonymous Coward | more than 9 years ago | (#13233274)

Yes, but which two words?

Re:in case you're curious... (0, Flamebait)

jo42 (227475) | more than 9 years ago | (#13233621)

Further proof that the computer industry in general is getting more and more brain damaged.

The disease is called "XML-on-the-brain". There is also "Java-on-the-brain" and "Flash-on-the-brain".

Re:in case you're curious... (1)

Golias (176380) | more than 9 years ago | (#13233927)

Do you have any actual objection to XML as a document format, or do you just fear change?

Personally, I'll take .xml over .doc any day of the week, and twice on Sunday.

Re:in case you're curious... (3, Interesting)

Tim Browse (9263) | more than 9 years ago | (#13236418)

XML files can be a little ungainly if you want to partially update them, or just append data. Binary files can be better for this (note: 'can').

As is evidenced by the lovely pause that happens whenever I close an MSN Messenger window of someone I chat to often, and it appends the chat history to the 1.5MB XML file by reading/writing the whole XML file again... wugga wugga wugga.

(Either that, or their append code sucks!)

But other than that, yes. The size argument doesn't stand up - a counter-intuitive result, but seems to be true. Especially when you start zipping XML files.
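To make the append cost concrete, here is a minimal Python sketch. It is not a claim about what MSN Messenger actually does; the `<log>`/`<msg>` layout, the file names, and the helper names are all made up for illustration. Because the new element has to land before the closing root tag, the naive approach parses and rewrites the whole document, while a flat log only writes the new bytes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def append_xml(path: str, text: str) -> int:
    """Append a <msg> by parsing and rewriting the whole file.

    Returns the number of bytes that had to be written (the full file).
    """
    tree = ET.parse(path)
    ET.SubElement(tree.getroot(), "msg").text = text
    tree.write(path, encoding="utf-8")
    return os.path.getsize(path)


def append_log(path: str, text: str) -> int:
    """Append one line to a flat log; returns only the bytes added."""
    data = (text + "\n").encode("utf-8")
    with open(path, "ab") as f:
        f.write(data)
    return len(data)


def demo(n: int = 1000) -> tuple[int, int]:
    """Build two histories of n messages, append one more to each."""
    with tempfile.TemporaryDirectory() as d:
        xml_path = os.path.join(d, "history.xml")  # hypothetical file name
        log_path = os.path.join(d, "history.log")  # hypothetical file name
        root = ET.Element("log")
        for i in range(n):
            ET.SubElement(root, "msg").text = f"message {i}"
        ET.ElementTree(root).write(xml_path, encoding="utf-8")
        with open(log_path, "wb") as f:
            for i in range(n):
                f.write(f"message {i}\n".encode("utf-8"))
        return append_xml(xml_path, "bye"), append_log(log_path, "bye")


if __name__ == "__main__":
    xml_bytes, log_bytes = demo()
    print(f"XML rewrite: {xml_bytes} bytes, flat append: {log_bytes} bytes")
```

A smarter writer could seek back over the closing tag and overwrite from there, so this shows the worst case rather than a limit of XML itself.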

Re:in case you're curious... (2, Informative)

rohanl (152781) | more than 9 years ago | (#13238172)

The PDF format is a particularly good example of this. The file contains a set of atoms, and finally at the end of the file is an index that selects which atoms to include and in what order.

Multiple indexes can be included, and the last one found is used.

This means that you can actually save, and update a PDF file, by just appending to the end. You can even save the file on a WORM device that allows multiple sessions.

Doing this also maintains a full file history too. You can retrieve any version of the file by selecting one of the many indexes.

Of course, whether any programs do this is another matter...
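The append-only idea described above can be sketched in a few lines of Python. This is a toy format invented for illustration, not real PDF syntax (no xref tables, no trailers): each save appends the changed objects, then a JSON index, then a fixed-size footer pointing at that index. The names `append_revision`, `latest`, and `history` are hypothetical.

```python
import json
import struct

# Footer: offset and length of the most recent index (two unsigned 64-bit ints).
FOOTER = struct.Struct("<QQ")


def read_index(buf: bytearray, offset: int, length: int) -> dict:
    return json.loads(bytes(buf[offset:offset + length]).decode("utf-8"))


def latest(buf: bytearray) -> dict:
    """Read the last index in the file; this is the current version."""
    offset, length = FOOTER.unpack_from(buf, len(buf) - FOOTER.size)
    return read_index(buf, offset, length)


def append_revision(buf: bytearray, objects: dict[str, bytes]) -> None:
    """Append new or changed objects plus a fresh index.

    Bytes already in the buffer are never modified.
    """
    if buf:
        prev = list(FOOTER.unpack_from(buf, len(buf) - FOOTER.size))
        index = dict(read_index(buf, *prev)["objects"])
    else:
        prev, index = None, {}
    for name, data in objects.items():
        index[name] = [len(buf), len(data)]
        buf += data
    raw = json.dumps({"prev": prev, "objects": index}).encode("utf-8")
    offset = len(buf)
    buf += raw
    buf += FOOTER.pack(offset, len(raw))


def get(buf: bytearray, index: dict, name: str) -> bytes:
    offset, length = index["objects"][name]
    return bytes(buf[offset:offset + length])


def history(buf: bytearray):
    """Yield every index, newest first, by following the 'prev' links."""
    index = latest(buf)
    while True:
        yield index
        if index["prev"] is None:
            return
        index = read_index(buf, *index["prev"])


if __name__ == "__main__":
    doc = bytearray()
    append_revision(doc, {"body": b"Hot time!"})
    append_revision(doc, {"body": b"Cold time!"})
    for index in history(doc):
        print(get(doc, index, "body"))
```

Since nothing already written is ever touched, the same scheme would work on write-once media, and every older index stays reachable as a previous version.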

Re:in case you're curious... (4, Insightful)

That's Unpossible! (722232) | more than 9 years ago | (#13233766)

So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...

What is your point? Oh lord, this file is 1200 bytes long, for "just two words of text."

I created the same two-word document and saved it in several other formats that preserve the formatting: HTML (2700 bytes), RTF (3600 bytes), PDF (16,600 bytes), and of course, Word .doc format (20,000 bytes).

The XML version is smaller than all four and, I dare say, easier to parse and manipulate with a 3rd party program.

Yeah, if you don't want any formatting information stored with your text, use plain text. But otherwise, XML seems to be as good a format as any of the other markup doc formats commonly used in Office.

Re:in case you're curious... (2, Interesting)

ubiquitin (28396) | more than 9 years ago | (#13234376)


Well, sir, you made the point nicely, although the HTML file that I came up with in vi came in at around 48 bytes. The 33 tags that TextEdit produces for doc-like XML are actually a pretty compact way of describing a document along with formatting.

Here's my $.02 on the bigger picture here: instead of fighting about document formats with Microsoft, we will now be fighting over XML data structures. Same old bully, just a different playground.

Re:in case you're curious... (1)

percent20 (895464) | more than 9 years ago | (#13235836)

Is there a standard XML format for word documents?

Re:in case you're curious... (1)

marmoset (3738) | more than 9 years ago | (#13243813)

Yes. Yes, there is [oasis-open.org] .

Re:in case you're curious... (1)

mwvdlee (775178) | more than 9 years ago | (#13293271)

RTFAT (T for Title)

Re:in case you're curious... (1)

quanticle (843097) | more than 9 years ago | (#13336135)

Open Office (I think) supports the Oasis format for word processor documents. However, if you're talking about MSWord, the only standard is the one Microsoft defines.

Re:in case you're curious... (1)

Photar (5491) | more than 9 years ago | (#13235797)

But what about all those pesky tabs?!

Re:in case you're curious... (1, Interesting)

Anonymous Coward | more than 9 years ago | (#13236016)

what kind of html did you write to get 2700 bytes? Using XHTML and a version of CSS 3 (which, yes, is not yet supported by anything, but is an example of how to model this sort of stuff) I got


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />
        <title>Untitled</title>
        <meta name="generator" content="BBEdit 8.2" />
<style type="text/css" title="text/css"> /* <![CDATA[ */

      @media print and (width: 8.5in) and (height: 11in) {
            @page {
                    margin: 1in; orphans: 2; widows: 2;
            }
      }
body {font-size: 12pt; font-family: "Times New Roman";}
p {font-size: 12pt; font-family: "Times New Roman"; text-indent: 5em;}
span.001 {font-size: 24pt; font-family: Arial;} /* ]]> */
</style>
</head>
<body>
<p><span class="001">Hot time!</span></p>
</body>
</html>


882 bytes. Note that to be fair I left in a reference to the text editor I used (and which supplied the base template) and defined the phrase as a non-standard font. Still just 882 bytes. What are you using, FrontPage??

Re:in case you're curious... (2, Insightful)

Trillan (597339) | more than 9 years ago | (#13236604)

I thought he was demonstrating different exports from Word. Word 2004 (Mac) makes it 2,167 bytes. Granted, that's horrible HTML...

Re:in case you're curious... (3, Insightful)

NutscrapeSucks (446616) | more than 9 years ago | (#13236748)

Granted, that's horrible HTML...

It's also a fair example, because Word-HTML can "round-trip" back to Word with no loss in fidelity. A barebones HTML file can not.

Re:in case you're curious... (2, Informative)

Trillan (597339) | more than 9 years ago | (#13237610)

Well, that's often the case, but I'm betting you could encapsulate two words in a way that could be transported back to Word (with formatting intact) a lot more efficiently.

A lot of the bulk seems to be Word saving unused style sheets, which arguably doesn't need to be done to keep the document true.

Re:in case you're curious... (1)

hunterx11 (778171) | more than 9 years ago | (#13237780)

You can see just how horrible [lavincolindo.net] it is versus handwritten HTML.

Re:in case you're curious... (5, Funny)

commodoresloat (172735) | more than 9 years ago | (#13237376)

<x-html><!x-stuff-for-pete base="" src="" id="0" charset=""><DIV></DIV> <DIV></DIV> <o:DocumentProperties/> <w:fonts> <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/> </w:fonts> <w:docPr/> <w:body> <wx:sect> <w:p> <w:pPr/> <w:r> <w:rPr> <w:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/> <wx:font wx:val="Helvetica"/> <w:sz w:val="24"/> <w:sz-cs w:val="24"/> </w:rPr> <w:t>I agree.</w:t> </w:r> </w:p> <w:sectPr> <w:pgSz w:w="12240" w:h="15840"/> <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/> </w:sectPr> </wx:sect> </w:body>

Re:in case you're curious... (1)

Pope (17780) | more than 9 years ago | (#13270029)

I've always wondered why HTML emails always have "!x-stuff-for-pete>" tags at the top.

Who the fuck is Pete?

Re:in case you're curious... (1)

rbannon (512814) | more than 9 years ago | (#13239055)

LaTeX file is 71 bytes long. Here's the ASCII code:

\documentclass[11pt]{article}
\begin{document}
Hot time!
\end{document}

Try it, you'll like it!

Why remove the greater and less than signs? (1)

edalytical (671270) | more than 9 years ago | (#13234772)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve"><o:DocumentProperties></o:DocumentProperties><w:fonts><w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/></w:fonts><w:docPr></w:docPr><w:body><wx:sect><w:p><w:pPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/><wx:font wx:val="Helvetica"/><w:sz w:val="24"/><w:sz-cs w:val="24"/></w:rPr><w:t>Hot time!</w:t></w:r></w:p><w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr></wx:sect></w:body></w:wordDocument>
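For anyone who wants to poke at a file like this programmatically, here is a minimal Python sketch using only the standard library. It assumes the `w:` namespace URI shown in the post (with the stray spaces removed) and uses a trimmed-down sample rather than TextEdit's exact output; the helper names `extract_text` and `count_elements` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the post, assumed correct once the spaces are removed.
W_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"

# Trimmed-down sample for illustration, not the exact TextEdit output.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<?mso-application progid="Word.Document"?>'
    f'<w:wordDocument xmlns:w="{W_NS}">'
    '<w:body><w:p><w:r><w:rPr><w:sz w:val="24"/></w:rPr>'
    '<w:t>Hot time!</w:t></w:r></w:p></w:body>'
    '</w:wordDocument>'
)


def extract_text(xml_source: str) -> list[str]:
    """Return the text of every w:t (text run) element, in document order."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


def count_elements(xml_source: str) -> int:
    """Count all elements in the document, including the root."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    return sum(1 for _ in root.iter())


if __name__ == "__main__":
    print(extract_text(SAMPLE))
    print(count_elements(SAMPLE))
```

ElementTree addresses elements by namespace URI rather than by the `w:` prefix, which is why the lookup uses the `{uri}t` form.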

Who is maintaining the "standard"? (1)

amichalo (132545) | more than 9 years ago | (#13232857)

It really concerns me that MS is able to create a "standard" due to their market share. What ensures they continue to maintain or even _use_ their own standard?

I think of the browser wars. MS loves it that everyone but them is W3C compliant because that ensures they can break all other browsers simply by being incompatible with one standard. Because of their market share, developers will just 'give up' and code CSS, JavaScript, and the like as IE compatible. Out of frustration with incompatible websites, users won't use Firefox et al. and MS maintains control.

So I welcome the compatibility but I'd like to see an independent standards body regulate the XML DTD.

Re:Who is maintaining the "standard"? (3, Informative)

mroch (715318) | more than 9 years ago | (#13232917)

OpenDocument [oasis-open.org] from OASIS

Re:Who is maintaining the "standard"? (2, Informative)

tsa (15680) | more than 9 years ago | (#13232979)

Don't forget that in the days before IE, Netscape was the market leader and they defined the standard. Nobody cared about that then.

Re:Who is maintaining the "standard"? (0)

wcb4 (75520) | more than 9 years ago | (#13233228)

tags anyone.... really standards compliant...

Re:Who is maintaining the "standard"? (1)

Gorbag (176668) | more than 9 years ago | (#13234390)

Don't forget that in the days before IE, Netscape was the market leader and they defined the standard. Nobody cared about that then.
Perhaps because the browser was free (as in beer), and they didn't also control the (non-free) OS.

lol noob (0)

Anonymous Coward | more than 9 years ago | (#13236642)

Netscape used to cost money.

Re:Who is maintaining the "standard"? (1)

TheRaven64 (641858) | more than 9 years ago | (#13234497)

Really? I recall learning HTML, and there being an official standard, a set of IE extensions and a set of Netscape (2.0 at the time) extensions. If you used any of the extensions you made sure that your page degraded cleanly in other browsers.

Re:Who is maintaining the "standard"? (2, Insightful)

fm6 (162816) | more than 9 years ago | (#13235347)

Netscape never "defined the standard". There have always been W3C specs for HTML. The problem was that in the middle 90s, W3C was taking forever to define specs for more than the most trivial web pages, and Netscape wasn't willing to wait on them.

Nor was it true that "nobody cared". Lots of people bitched about it.

Re:Who is maintaining the "standard"? (1)

blibbler (15793) | more than 9 years ago | (#13235925)

There were certain tags and technologies that (arguably) needed to be made or developed that Netscape had to do, but there were also W3C standards that Netscape blatantly ignored. For example, the CSS standard was made prior to Netscape 4, but Netscape had notoriously poor support for it, while IE had CSS support (albeit very limited) back in version 3.

While IE6's poor standards support is a limitation now, it is nothing compared to the pains that Netscape 4 put people through.

Re:Who is maintaining the "standard"? (2, Insightful)

martinX (672498) | more than 9 years ago | (#13236809)

Which is why even the dedicated MS-haters blanched at having to use NN4. It was bloated, buggy, crappy.

MS didn't achieve browser dominance just through (mis)use of their monopoly. Netscape helped them by releasing NN4.

Re:Who is maintaining the "standard"? (1)

Linus Torvaalds (876626) | more than 9 years ago | (#13237536)

There were certain tags and technologies that (arguably) needed to be made or developed that netscape had to do

Element types. Not "tags".

but there were also W3C standards that Netscape blatantly ignored. For example the CSS standard was made prior to Netscape 4, but Netscape had notoriously poor support for it, while IE had CSS support (albeit very limited) back in version 3.

Get your facts straight. At the time Microsoft were implementing CSS, it wasn't a published W3C recommendation. And Netscape submitted their own stylesheet language, JSSS to the W3C too. Unfortunately for Netscape, the W3C rejected JSSS and published CSS as a recommendation. Netscape was left scrambling to support CSS at the last minute, and did it by transforming CSS to JSSS on the fly, which was understandably limited.

The kicker was that one of the reasons the W3C rejected JSSS was because it violated the Principle of Least Power [w3.org] , while CSS didn't. Once Netscape were out of the way, Microsoft went ahead and added their own proprietary extensions to CSS that also violated the principle of least power.

So, while Netscape do have a history of ignoring the W3C, they certainly tried to work with them in this instance, only to get steamrollered by Microsoft who saw it as an opportunity to get a lead over Netscape in the browser wars.

Re:Who is maintaining the "standard"? (1)

fm6 (162816) | more than 9 years ago | (#13246048)

CSS wasn't nailed down as an official "recommendation" until late 1996. By then, almost all the work that was going to go into the original Netscape browser engine had already been done.

I agree that Netscape should have paid more attention to CSS and other W3C standards once they actually appeared. But that's all kind of beside my point, which was that Netscape never "defined the standard".

"Nobody cared about that then"? (2, Funny)

commodoresloat (172735) | more than 9 years ago | (#13237402)

I guess you're too young to remember bitching about the "BLINK" tag?

Re:Who is maintaining the "standard"? (1)

Nevyn (5505) | more than 9 years ago | (#13233559)

So I welcome the compatibility but I'd like to see an independent standards body regulate the XML DTD.

Given the rest of your message, what would this achieve? The only way anything will get better is if a significant number of people push back at stds. non-compliance ... and then it doesn't matter who created/maintains the std.

The obvious place for this to happen is government bodies, and non-US ones are starting to imply they will do this. How much they push back remains to be seen.

Ugly format.. (0)

ciroknight (601098) | more than 9 years ago | (#13232862)

Yeah, it might be good and all that Office is switching Word over to XML, but it's such an ugly format. Readable, but very, very bloated.

Would probably be more efficient to use straight XHTML to make documents...

Oasis (1)

michokest (893732) | more than 9 years ago | (#13232969)


XHTML has its place: the web.
If you were looking for something witty (and Slashdot-approved) to say, you meant Oasis [oasis-open.org].

Re:Ugly format.. (4, Insightful)

Heisenbug (122836) | more than 9 years ago | (#13233380)

I don't really see the problem with "bloated" xml, when the files are zipped by default. Instead of smushing your efficiency requirements in with your readability and standardization requirements (and screwing all three), you first handle readability and standardization and then wrap it in a standard efficiency layer. The upshot is, not only are the files often *smaller* than the old Word equivalent, but I can also hack through them using a couple of standard perl packages that have come with linux, OS X and cygwin for years.

Where's the downside?
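The "standard efficiency layer" point is easy to check with a rough Python sketch. The markup below is a made-up fragment in the style of the sample file, repeated many times, so it compresses far better than a real document would; treat the ratio as illustrative only. The helper names are hypothetical.

```python
import zlib


def build_doc(paragraphs: int) -> bytes:
    """Build a verbose, WordML-flavoured body with identical paragraphs."""
    run = (
        '<w:p><w:r><w:rPr><w:rFonts w:ascii="Helvetica"/>'
        '<w:sz w:val="24"/></w:rPr><w:t>Hot time!</w:t></w:r></w:p>'
    )
    return ("<w:body>" + run * paragraphs + "</w:body>").encode("utf-8")


def sizes(paragraphs: int) -> tuple[int, int]:
    """Return (raw size, deflate-compressed size) in bytes."""
    raw = build_doc(paragraphs)
    return len(raw), len(zlib.compress(raw, 9))


if __name__ == "__main__":
    raw, packed = sizes(200)
    print(f"raw: {raw} bytes, compressed: {packed} bytes")
```

The repeated tag names that make XML look bloated are exactly what deflate removes, so the verbosity costs little once the file is zipped.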

Re:Ugly format.. (1)

raffe (28595) | more than 9 years ago | (#13238042)

The problem is that they have a patent on it. If you create software and sell it without a license, they can sue you, with or without making the software GPL.

Re:Ugly format.. (1, Insightful)

Golias (176380) | more than 9 years ago | (#13233847)

It's only "ugly" if you are not used to XML. It's certainly not "bloated" at all.

"Verbose" perhaps... but verbosity is kind of the whole point of XML in the first place.

I hate MS as much as the next guy, but I'm thrilled with the fact that they are finally creeping towards some open document standards.

When you consider that their main profit strategy for the last 5-10 years has been "force pointless upgrade sales by screwing with the document format and breaking compatibility with everybody, including our old customers," I think the fact that they are suddenly playing nice like this (even though it may open opportunities for other people to chop them off at the knees) shows that Ballmer & Co. seriously believe that their future does not lie in merely maintaining the MS-Office monopoly. Maybe Cringely's right, and the boys from Redmond are betting the company on the X-Box evolving into a ubiquitous media console.

Then again, maybe they are so cock-sure that they have the best & brightest programmers in the world, that they think they will be able to open the format and still maintain their lead on quality alone. I find it hard to believe that they are that delusional, but you never know.

Re:Ugly format.. (1)

robbieduncan (87240) | more than 9 years ago | (#13234273)

TextEdit can load and save XHTML (1.0 Strict or 1.0 Transitional) with Embedded CSS, Inline CSS or no CSS.

This is just an additional format.

Re:Ugly format.. (1)

snuffdiddy23 (620624) | more than 9 years ago | (#13235964)

I believe that it would be more difficult to have a valid XHTML document that is as flexible as a valid XML document. The nature of self-describing data is that at any point you can add tags that bring new functionality while still maintaining a valid document, whereas you have to get a new XHTML tag ratified by the W3C.

What's more, it is a logical step to use XML, as it is the little brother of the SGML system that dominated documentation for larger companies that could afford development of an SGML system. SGML and XML even have roots implanted in products such as Adobe Designer, Adobe Illustrator, and a myriad of other vector drawing programs via SVG or PDF in the case of Adobe Designer.

What's more, Apple uses XML as the cornerstone of its customization (though it has defected a bit in the latest release for some of its property lists), so supporting it in their little APSL gem that is TextEdit is logical. Apple's Pages and Keynote also use XML as the holder of their data.

I don't know if it would be more efficient, but it would probably be invalid or not look as good, unless you have a little Eric Meyer in your machine making CSS that can accomplish what you can with a commercial XML product.

Re:Ugly format.. (1)

chasingporsches (659844) | more than 9 years ago | (#13237383)

i guess you've never seen what a regular word file generates. you should be thankful!

Firefox?? (-1, Offtopic)

tsa (15680) | more than 9 years ago | (#13233085)

I can't get Firefox to open the file in TextEdit. Does anyone know how to do this? I'm on OSX 10.4.2, FF 1.0.6

Re:Firefox?? (1)

Lavoe (902189) | more than 9 years ago | (#13234776)

Mine opens just fine.. FF 1.0.4, OSX 10.2.8, it shows a warning: This XML file does not appear to have any style information associated with it. The document tree is shown below.
(followed by the source code)

Re:Firefox?? (1)

argent (18001) | more than 9 years ago | (#13235906)

I can't get Firefox to open the file in TextEdit.

Save to disk.

Open in Finder.

Open in TextEdit from Firefox? Please tell me that isn't possible.

Re:Firefox?? (1)

Carthag (643047) | more than 9 years ago | (#13242838)

What's wrong with that? It's a document, it shouldn't be able to pose any kind of security risk at all.

Re:Firefox?? (1)

argent (18001) | more than 9 years ago | (#13243954)

It violates compartmentalization. It should not be possible for anything in a web page to cause a document to be passed off to any application that has not specifically registered itself as a handler for web content.

In the case of Firefox on the Mac, that means it shouldn't trust LaunchServices, because LaunchServices includes any application that wants to handle local content. It should only trust "Library/Internet Plugins", and then when necessary (such as for itms:) add specific cases of LaunchServices entries that are known to be intended for web content. So if YOU want to allow TextEdit to be usable as a handler, YOU should be able to _explicitly_ add it to the list firefox maintains.

If this isn't done, then they haven't learned the lesson of the help: hole and the x-man-page: hole. There will be more holes like that in the future, and a web browser that's supposed to be "secure" should simply close off that whole avenue of attack.

Re:Firefox?? (1)

Carthag (643047) | more than 9 years ago | (#13244702)

That's what I thought he was talking about; setting up FireFox to open .xml documents in TextEdit.

.xml? (2, Interesting)

Stuart Gibson (544632) | more than 9 years ago | (#13233158)

I understood that the new office XML formats had an extension the same as the original with an x at the end, as in .docx.

Possibly this was a wrapper for the format to encapsulate images etc? Can anyone who has actually looked at this clarify?

Thanks,
Stuart

Re:.xml? (1)

sethadam1 (530629) | more than 9 years ago | (#13234028)

Not exactly. Office 2003 includes the complete failure "XML" file format. It's useless, it isn't readable anywhere but in Office 2003 (Office 11) (which was a useless upgrade save the UI changes in Outlook), and it has been, and always will be, a complete flop. .docx is the new XML zip bundle for Office 12, which is unreleased to date, and it's strikingly similar to OpenOffice.org's new OASIS format. It will not be XML, but rather an extractable zip file with content and format properly separated. Of course, Microsoft reinvented the wheel, but it's still better than the mess that is Office 2003 XML.
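The "extractable zip file with content and format separated" idea can be sketched with Python's standard `zipfile` module. The part names `content.xml` and `styles.xml` and the helper names are placeholders chosen for illustration; the real Office 12 package layout is not described in this thread and is not reproduced here.

```python
import io
import zipfile


def make_bundle(parts: dict[str, str]) -> bytes:
    """Pack named XML parts into one deflate-compressed zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


def list_parts(bundle: bytes) -> list[str]:
    """Return the names of all parts stored in the bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        return zf.namelist()


def read_part(bundle: bytes, name: str) -> str:
    """Extract a single part without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    bundle = make_bundle({
        "content.xml": "<doc><p>Hot time!</p></doc>",       # hypothetical part name
        "styles.xml": "<styles><font size='24'/></styles>",  # hypothetical part name
    })
    print(list_parts(bundle))
    print(read_part(bundle, "content.xml"))
```

Keeping content and formatting in separate parts means a tool can read or replace one without parsing the other.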

Holy Riddler, Batman! (1)

fm6 (162816) | more than 9 years ago | (#13233929)

"Riddled with references to Word"? Whatever you mean, I don't see it. There's a reference to the Word XML namespace. But all XML applications have to define a namespace.

Re:Holy Riddler, Batman! (1)

Suppafly (179830) | more than 9 years ago | (#13234863)

I was wondering about that too. Maybe all the W's are considered references to word?

Re:Holy Riddler, Batman! (2, Informative)

hawaiian717 (559933) | more than 9 years ago | (#13236635)

Yes, w: at the start of the XML tag indicates that the tag is part of a namespace, which would be defined somewhere in the file by adding an xmlns attribute to a tag.  In this case, it's in the w:wordDocument tag, and in fact several namespaces are defined:

<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve">
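A small Python sketch shows why the `w:` prefix itself carries no meaning: only the URI it is bound to matters. The URI `urn:example:word` and the element names are invented for the example, as is the helper name `qualified_names`.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in the prefix bound to the same (made-up) URI.
DOC_A = '<w:doc xmlns:w="urn:example:word"><w:t>hi</w:t></w:doc>'
DOC_B = '<x:doc xmlns:x="urn:example:word"><x:t>hi</x:t></x:doc>'


def qualified_names(xml_source: str) -> list[str]:
    """Return each element's expanded name in {namespace-uri}local form."""
    return [el.tag for el in ET.fromstring(xml_source).iter()]


if __name__ == "__main__":
    print(qualified_names(DOC_A))
    print(qualified_names(DOC_A) == qualified_names(DOC_B))
```

A namespace-aware parser sees the two documents as identical, so the w's are only a shorthand chosen by whoever wrote the file.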

that's gotta be the worst XML ever (1)

namekuseijin (604504) | more than 9 years ago | (#13234523)

where's the standard xml header? hell, yeah, it was extended by a proprietary mso-application header. just M$ embrace-and-extend style all over...

any sign of a xmlns attribute anywhere? nope, and yet, they use the ns:tagName notation...

stupid.

has M$ at least released the XML Schemas for the formats? If not, forget it: it's just as illegible as binary...

and let's not forget it'll only display correctly inside MSWord itself...

Re:that's gotta be the worst XML ever (1)

slashusrslashbin (641072) | more than 9 years ago | (#13235040)

Er, maybe you're not looking at it the right way? It's a completely standard XML document with standard XML header and xmlns declarations...

Re:that's gotta be the worst XML ever (1)

larry bagina (561269) | more than 9 years ago | (#13235532)

That XML file was created by Apple's TextEdit.

Why don't you think that over for a couple minutes. It sort of invalidates your M$ bitch fest.

Re:that's gotta be the worst XML ever (1)

EddWo (180780) | more than 9 years ago | (#13235851)

The mso-application header is used by Windows Explorer to identify which application should be used to view the file, and which icon to use in the shell. It wouldn't be very user friendly if all xml files, be they word, excel or otherwise opened in Internet Explorer or notepad.

People expect to see a Word icon, and to be able to launch the appropriate application.
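As a rough illustration of how a program could read that header, here is a Python sketch using the standard library's `xml.dom.minidom`. It only shows how to pull the `progid` out of the `mso-application` processing instruction; how Windows Explorer actually does its lookup is not shown here, and `find_progid` is a hypothetical helper.

```python
import re
from typing import Optional
from xml.dom import minidom

SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<?mso-application progid="Word.Document"?>'
    '<doc>Hot time!</doc>'
)


def find_progid(xml_source: str) -> Optional[str]:
    """Return the progid from a top-level mso-application PI, or None."""
    dom = minidom.parseString(xml_source.encode("utf-8"))
    for node in dom.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            # node.data is the raw text after the target; the pseudo-attribute
            # syntax is a convention, so it is parsed by hand here.
            match = re.search(r'progid="([^"]*)"', node.data)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    print(find_progid(SAMPLE))
```

A processing instruction sits outside the element tree, so the document stays ordinary XML for any parser that does not care about it.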

Re:that's gotta be the worst XML ever (2, Informative)

hawaiian717 (559933) | more than 9 years ago | (#13236664)

For some reason Firefox is hiding the standard xml header and the xmlns declaration. Just save the file to disk and open it in your favorite text editor, and you'll see it's there.

Re:that's gotta be the worst XML ever (1)

namekuseijin (604504) | more than 9 years ago | (#13237399)

oh, that's weird...

Influenced by the Open Source Movement (1)

objeck (890008) | more than 9 years ago | (#13235956)

I think Microsoft is noticing that the open source movement is not a fad. They were big proponents of XML-based web services, and now, by supporting XML document formats for Office, they are finally giving users a choice. So, in the future, instead of saying .NET/J2EE or Office/OpenOffice, people can say both. They might not gain market share with this strategy, but they might lose less in the long run.

XML test M$ App w /ad watch ellen for details (0)

Anonymous Coward | more than 9 years ago | (#13236944)

nothing to see here move along ...go get a real computer

Word XML not necessarily a voluntary move... (4, Informative)

soullessbastard (596494) | more than 9 years ago | (#13237031)

Disclaimer: I am a Mac OS X OpenOffice.org developer and a NeoOffice [neooffice.org] project founder

One thing to note is that the Microsoft XML formats and schemas, either those exported by TextEdit or by the .docx format, are not necessarily done by Microsoft by choice. They're not even in response to OpenOffice.org. In my opinion, they are the result of "government forced technology", similar to how the California clean air regulations back in the 70s started to force Detroit to pour more money into catalytic converters and environmentally friendly cars.

There have been numerous government proposals and mandates that require open document formats. Some of the Massachusetts proposals come to mind. I believe the EU also has proposals on the table that require the use of open document formats. The trick with the EU proposal is that it actually mentioned XML (I believe it's the ISIS proposal, but may have the wrong acronym). Governments are large Microsoft customers and Microsoft doesn't want to lose their business. Including the ability to save in publicly documented XML formats gives them a loophole to continue selling to governments, even if all of the open document format requirements are adopted.

The ability of OpenOffice.org (and NeoOffice/J) to support these formats really is dependent on two things. First, the schemas are licensed from Microsoft on non-OSS compatible terms. Each individual person or application has to enter into a licensing agreement with Microsoft individually. This is directly against the terms of either BSD style or GPL style licensing. Secondly, Microsoft may have software patents involved with their schemas according to their licensing terms. While the patentability of a schema itself is questionable, they seem to have several patents revolving around the interpretation of XML schemas that may apply to their Office schemas. This goes against the CDDL style licensing Sun is now fond of.

Because of these terms, the only ways that OOo/NeoOffice could legally support them would be if either the schemas are clean room reverse engineered from example documents or if Microsoft turns a blind eye to open source folk using their schemas. Since I wouldn't want to rely on Microsoft's generosity, the clean room solution is the only way I can see. Sun won't be the one to clean room them either; they don't have to. StarOffice (and Sun built OpenOffice.org for Linux/Solaris/Win) would be covered under Sun's cross-licensing arrangements with Microsoft as a result of their settlement. Those licenses don't extend to non-Sun OOo developers like me, however, so we're all up shit creek.

Just because you can read it and the format is "open" doesn't mean it's "free". You can be sure that Microsoft's lobbyists will make sure that all of those government directives still refer to "open" and no "free" gets snuck in there by mistake.

ed

Re:Word XML not necessarily a voluntary move... (0)

Anonymous Coward | more than 9 years ago | (#13248289)

This goes against the CDDL style licensing Sun is now fond of.

Does anybody care?

Re:Word XML not necessarily a voluntary move... (1)

constantnormal (512494) | more than 9 years ago | (#13267185)

Sun won't be the one to clean room them either; they don't have to.

However, IBM has the capability to clean-room reverse engineer a free and open spec. So long as they are pushing a J2EE-centric application strategy opposing .NET, they have every reason to make a freely open implementation available to the rest of the world.

Hope springs eternal ...

Re:Word XML not necessarily a voluntary move... (1)

Aram Fingal (576822) | more than 9 years ago | (#13273207)

I think another aspect of this is that Microsoft has been having a little bit of trouble recently supporting their own formats. Here at my workplace we have sometimes had to post different versions of template files (expense reports, travel reports, etc.) for different versions of Word -- like one for Office 2000 and another for Office XP (one or the other will usually work on a given Mac version of Word, but not always). With all the versions of Office out there, Microsoft has got to be having trouble figuring out all the bugs. XML should make this kind of troubleshooting easier in the future.

They're not so stupid as to miss the fact that the reason most businesses use their products is the perception that they will be readable anywhere and any time in the future, as long as you stick with Microsoft products. The reality can only get so far from that perception before people start to look for alternatives. In some cases they have. Microsoft must realize that they have been losing some market share to Adobe Acrobat. PDF is an open (but convoluted) file format and you can download a free viewer. They must also realize that it's only a matter of time before something more open and better designed than Acrobat comes along (I'm talking in terms of business perception).

No offense to OO but I actually prefer Abiword for this kind of purpose (if I could only convince the PHB). It has the same advantages as OO in being GPL'd and having a clear XML based file format but it also (at least in recent versions) has good slick, high performing versions on all major platforms (including OS X Cocoa) and is even smaller to download than Acrobat Reader. I feel comfortable telling people to download it to read files I send them -- it's small, it's free. OpenOffice is a bit too big to tell dialup users to download, and a bit too slow to run on older hardware.

BTW, I do use and like NeoOffice for spreadsheet stuff.

Interesting... (1)

mooniejohnson (319145) | more than 9 years ago | (#13237169)

An interesting thing is that trying to open one of those files in Pages results in a dialog that says "This XML files was created with an unsupported beta version of Word" and it doesn't open it. I'm not drawing any conclusions, I just think it's interesting.

Re:Interesting... (2, Insightful)

King Babar (19862) | more than 9 years ago | (#13237546)

An interesting thing is that trying to open one of those files in Pages results in a dialog that says "This XML files was created with an unsupported beta version of Word" and it doesn't open it. I'm not drawing any conclusions, I just think it's interesting.

Ah, Pages. The program has some neat features, but has all of the hallmarks of being rushed out of the door for the 1.0 release. It's a nifty program for making flyers, and maybe short newsletters, but it's pretty much a loss to do any serious word processing in the thing, as it currently stands. In a way, it doesn't surprise me to hear that TextEdit is leading the way on the XML front, despite the fact that Pages has an XML native format...

I almost got excited... (0, Offtopic)

chemacguevara (896855) | more than 9 years ago | (#13237829)

at using the property list editor from the developer package to do something other than change the CPU requirements of Apps like Soundtrack etc. to run on my archaic G4 400. But sadly it doesn't recognize xml. WTF?

Re:I almost got excited... (1)

CableModemSniper (556285) | more than 9 years ago | (#13249388)

Of course it doesn't. It edits property lists, which are occasionally stored as XML. It's not an arbitrary XML editor.
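
To illustrate the distinction, here is a minimal sketch using Python's `plistlib`. The keys `CFBundleName` and `MinimumCPU` are invented for the example and don't come from any real application; the point is only that XML is one of several serializations of the same property list data.

```python
import plistlib

# A hypothetical property list in its XML serialization.
XML_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleName</key>
    <string>Example</string>
    <key>MinimumCPU</key>
    <integer>500</integer>
</dict>
</plist>
"""

# Parse the XML form into an ordinary dictionary.
settings = plistlib.loads(XML_PLIST)

# The same data written in the binary plist format, which is not XML at all.
binary = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)

# Both serializations round-trip to the same data.
assert plistlib.loads(binary) == settings
print(settings["MinimumCPU"])
```

A property list editor works on the dictionary, not on the markup, which is why an arbitrary XML file like a Word document is of no use to it.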

Re:I almost got excited... (1)

chemacguevara (896855) | more than 9 years ago | (#13262776)

This was meant as a joke. The icon has a big XML at the bottom of it. Whatever, man. /. users are pretty thick sometimes, especially the moderators.

Pages (0, Redundant)

rohanl (152781) | more than 9 years ago | (#13238161)

I created a Word XML doc, and tried to open it in Pages. I got an error saying:
The XML document was created with an unsupported beta version of Word.

The fact that it knew it was a Word doc is promising. Looks like Pages will support it too...

Safari does it too (1)

znesic (611899) | more than 9 years ago | (#13257235)

I just tried to load that page in Safari 1.2.4 under Mac OS X 10.3.8 and it displayed just the content of the file (no XML code), so I suppose that:

1. This is a part of NSTextEdit class (or whatever its name is) and is not specific to TextEdit.app

2. It's been around a bit longer, at least since 10.3.8, it just wasn't exposed in TextEdit.app

The good thing is that all the Cocoa apps that use this class will also get the ability to handle Word XML docs - for free.

Re:Safari does it too (0)

Anonymous Coward | more than 9 years ago | (#13262331)

or maybe... just maybe safari parsed the file...