Is Dedicated Hosting for Critical DTDs Necessary?

Cliff posted more than 7 years ago | from the might-the-W3C-be-interested dept.

The Internet 140

pcause asks: "Recently there was a glitch when someone at Netscape took down a page that hosted an important DTD (for RSS) used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all the time, so the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"


I know! (4, Funny)

Colin Smith (2679) | more than 7 years ago | (#19170719)

ICANN!

Mhahahahaha. Yeah. I know, I crack myself up.

 

Re:I know! (1)

Score Whore (32328) | more than 7 years ago | (#19170793)

Clearly the particular organization is not yet formed, however there is absolutely no question that it should be hosted in Iran.

Re:I know! (1)

rs79 (71822) | more than 7 years ago | (#19171253)

Why there? Did you want to run an MLM?

Google or archive.org come to mind as a more sensible choice.

Re:I know! (2, Insightful)

mollymoo (202721) | more than 7 years ago | (#19172215)

The point was that relying on a single entity isn't a good idea. Google is a single company; the Internet Archive is a single organisation.

I'd suggest something more along the lines of DNS, where although there would be a single ultimate authority, the day-to-day business of serving DTDs would be distributed and handled by multiple levels of servers.

Catalog files? (1)

aamcf (651492) | more than 7 years ago | (#19172159)

Am I missing something here, or is this problem solved by catalog files? Surely any decent XML parser that can download an external DTD subset from a URI can get the DTD subset via a catalog file?
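As a rough illustration of the catalog approach, here is a simplified Python sketch that reads `<system>` and `<public>` entries from an OASIS XML Catalog and maps a DTD's identifiers to a local copy. The local file path is hypothetical, and real catalog resolvers also handle `prefer`, `rewriteSystem`, delegation and more (libxml2-based tools such as xmllint, for example, read the catalogs listed in the `XML_CATALOG_FILES` environment variable).

```python
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Example catalog; the file:// path is a hypothetical install location.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Netscape Communications//DTD RSS 0.91//EN"
          uri="file:///usr/local/share/dtd/rss-0.91.dtd"/>
  <system systemId="http://my.netscape.com/publish/formats/rss-0.91.dtd"
          uri="file:///usr/local/share/dtd/rss-0.91.dtd"/>
</catalog>"""


def load_catalog(catalog_xml):
    """Return (system_map, public_map) built from <system> and <public> entries."""
    root = ET.fromstring(catalog_xml)
    systems, publics = {}, {}
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        # Keep the first entry for each identifier.
        systems.setdefault(entry.get("systemId"), entry.get("uri"))
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        publics.setdefault(entry.get("publicId"), entry.get("uri"))
    return systems, publics


def resolve(systems, publics, public_id=None, system_id=None):
    """Map a DTD's identifiers to a local URI; None means 'not in the catalog'."""
    if system_id is not None and system_id in systems:
        return systems[system_id]  # system entries are consulted first
    if public_id is not None and public_id in publics:
        return publics[public_id]
    return None
```

An application would call `resolve` before any network access and only fall back to (or refuse) a download when it returns `None`.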

Re:Catalog files? (1)

gmack (197796) | more than 7 years ago | (#19173929)

Or better yet, why can't you just copy the blasted thing to your own site if you're going to use it?

Is there some technical reason I'm not aware of that means it has to stay somewhere central?

Re:I know! (4, Funny)

commodoresloat (172735) | more than 7 years ago | (#19172231)

ICANN!

Mhahahahaha. Yeah. I know, I crack myself up.

No you cann't!

Centralization (4, Insightful)

ushering05401 (1086795) | more than 7 years ago | (#19170725)

Nothing too insightful to write, but worth saying in today's volatile political climate. Centralization makes me nervous.

Regards.

Re:Centralization (2, Interesting)

radarsat1 (786772) | more than 7 years ago | (#19170945)

Exactly. How about hosting these important files via a decentralized bittorrent tracker?
Of course, that would eliminate the use of a UNIVERSAL RESOURCE LOCATION, since it would no longer be centralized.
There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

Re:Centralization (4, Informative)

Bogtha (906264) | more than 7 years ago | (#19171069)

There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

This is known as a URN [wikipedia.org]. URLs and URNs are together known as URIs.

Re:Centralization (1)

TheRaven64 (641858) | more than 7 years ago | (#19171079)

Doctypes do not contain a URL indicating the location of the DTD, they include a URI. This URI is typically a URL, but could easily be something else.

Re:Centralization (1)

Doctor Memory (6336) | more than 7 years ago | (#19173867)

Of course, that would eliminate the use of a UNIVERSAL RESOURCE LOCATION, since it would no longer be centralized.
Nope, it just means that your torrent tracker would have to have a way to resolve the reference. Whether something like DNS where you have specific "go-to" hosts, or whether you just ask every host you're connected with, or something else (maybe a kind of dynamic mesh with ad-hoc gateways), the choice is up to you.

Maybe something like NTP, where you have the stratum-1 time servers, then the designated stratum-2 servers, and everyone is encouraged to set up a stratum-3 server for their own subnet. This way nobody's really dependent on anyone else if they don't want to be. Once this is set up, maybe you could even have a dtd: protocol that specifies how to find a server, how to cache DTDs once you get them, and how to expire or supersede them when a new version comes out.
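A rough Python sketch of the tiered idea, assuming each stratum (site server, regional mirror, root) is represented by a hypothetical callable that returns the DTD bytes or `None`. It caches what it finds with a time-to-live; as an added design choice (not something proposed above), it serves a stale copy rather than failing when every tier is unreachable.

```python
import time


class TieredDTDResolver:
    """Hypothetical strata-style lookup: local cache first, then each tier in order."""

    def __init__(self, tiers, ttl_seconds=86400.0, clock=time.monotonic):
        self.tiers = tiers          # callables: identifier -> bytes or None
        self.ttl = ttl_seconds      # how long a cached copy counts as fresh
        self.clock = clock
        self._cache = {}            # identifier -> (fetched_at, data)

    def get(self, identifier):
        hit = self._cache.get(identifier)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]  # fresh cached copy: no tier is contacted
        for tier in self.tiers:
            data = tier(identifier)
            if data is not None:
                self._cache[identifier] = (self.clock(), data)
                return data
        if hit is not None:
            return hit[1]  # every tier failed: keep working with the stale copy
        raise LookupError(f"no tier could supply {identifier!r}")
```

Passing the clock in makes the expiry behaviour easy to test without waiting.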

Don't use them (5, Insightful)

Anonymous Coward | more than 7 years ago | (#19171191)

If the absence of these files will break your app or service, then you need to make your app or service more robust.

Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.

I really don't see what is gained by making the real-time operation of your application dependent on the availability and integrity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need by checking the files during your maintenance and development cycles.
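To illustrate the point with a minimal Python sketch: a non-validating parser never needs the remote DTD just to read a document. Python's `xml.etree.ElementTree` does not retrieve external DTDs, so this RSS 0.91 feed parses even if the Netscape URL is gone (the feed content itself is made up).

```python
import xml.etree.ElementTree as ET

# A made-up RSS 0.91 feed whose DOCTYPE points at the original Netscape URL.
FEED = """<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel><title>Example Channel</title></channel>
</rss>"""


def channel_title(feed_xml):
    """Read the channel title; the external DTD is never downloaded."""
    root = ET.fromstring(feed_xml)
    return root.find("channel/title").text


print(channel_title(FEED))  # Example Channel
```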

Re:Don't use them (4, Informative)

Skreems (598317) | more than 7 years ago | (#19173061)

Exactly. The only point of having a URL associated with a DTD is to assure a unique identifier for each one. It wasn't worth starting a group specifically to regulate DTD identifiers, so they hooked it to a system that's already regulated. Yeah, it's nice to have the DTD live at that location, so if you get a file with a reference to an unfamiliar DTD you can pull it down on the spot, but it shouldn't be required.

Localized hosting (1)

Alien54 (180860) | more than 7 years ago | (#19171623)

What's to stop someone from hosting these files locally, for their own use, on a local server? In some cases this would not be practical (redirects for downloading, etc.), but could it be done in some instances?

Re:Localized hosting (1)

FLEB (312391) | more than 7 years ago | (#19172011)

I think (might be wrong) that most of the problems come from some apps which:

1.) Use the DTD URI to determine a document's type, from a list of known URI/type associations in the application. (For instance, a web browser that checks the DTD to determine whether to render in HTML or XHTML mode.)

and

2.) Validate the document against the DTD from the copy stored at the URI (given that the URI is a URL... it does not necessarily have to be.)

And, if the DTD isn't at the URL (fails on 2), it barfs from not being able to validate the document. However, if the URI is not one from its known list (hosted elsewhere, for instance), it would not know which of its rendering schemes to use to display/process/etc. the document.
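A small Python sketch of step 1, assuming a hypothetical table of known public identifiers. Python's `xml.dom.minidom` reports the DOCTYPE identifiers without fetching the DTD; keying on the public identifier rather than the URL means a document that points at a mirrored copy of the DTD is still recognized. The mirror URL below is a placeholder.

```python
from xml.dom import minidom

# Hypothetical table of document types this application knows how to handle.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xhtml",
    "-//Netscape Communications//DTD RSS 0.91//EN": "rss-0.91",
}


def document_type(xml_text):
    """Classify a document by its DOCTYPE public identifier (no DTD download)."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None or doctype.publicId is None:
        return "unknown"
    return KNOWN_PUBLIC_IDS.get(doctype.publicId, "unknown")


mirrored = ('<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
            '"http://mirror.example.org/dtd/rss-0.91.dtd"><rss version="0.91"/>')
print(document_type(mirrored))  # rss-0.91
```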

Re:Localized hosting (1)

rfreedman (987798) | more than 7 years ago | (#19172077)

Indeed, I've always considered this a must for production applications, particularly intranet applications. The overhead of retrieving the DTD from the web is simply unacceptable in many situations.

Centralization of more than DTDs is good. (1)

MikeFM (12491) | more than 7 years ago | (#19171939)

The trick is to make centralized copies of important or oft-used files available. I wouldn't stop at DTDs. As AJAX, Web 2.0, or whatever you want to call it grows more popular and makes users download more and more JavaScript, images, etc., many of those files are identical across different websites. It could be very useful to store a single copy of those shared files on one server, with caching properly configured, so that users only need to download and store one copy instead of dozens.

You don't have to centralize the originals - just copies. You get the benefits of a centralized resource without the risks of a centralized organization.

Re:Centralization (0)

Anonymous Coward | more than 7 years ago | (#19172177)

Why? It's not as if the XML specification would have to change. People would still be able to host their DTDs wherever they want.

w3c (5, Insightful)

partenon (749418) | more than 7 years ago | (#19170729)

w3c.org [w3c.org]. There's no better place to keep the standards related to the web.

Re:w3c (4, Funny)

JordanL (886154) | more than 7 years ago | (#19170853)

There's no better place to keep the standards related to the web.
Some say that wistfully, others begrudgingly.

Re:w3c (2, Interesting)

inKubus (199753) | more than 7 years ago | (#19171737)

What about a distributed file system that works like DNS? Hierarchical servers that are each responsible for a different level of the DTD. The "root" is a trusted group of servers, which maintains a list of other servers where you can get a copy of the rest of the DTD. Then plugin builders and other sub-entities can have their own servers for extensions to the base DTD.

Unfortunately, the DNS method has proven not to be the best way in every respect, with cache poisoning and the like. Of course, it was designed in the days when not just anyone was allowed on the internet. But if you are paranoid, you can always check your copy against the publisher's, using a signing server or an MD5-style hash of the DTD.

Re:w3c (1)

bofkentucky (555107) | more than 7 years ago | (#19171835)

The TXT record is more than capable of doing this, just like the SPF record that lists your approved mail exchangers.

Re:w3c (1)

flooey (695860) | more than 7 years ago | (#19171923)

w3c.org . There's no better place to keep the standards related to the web.

I'd expand on that and say: whatever organization is responsible for developing the format that the DTD is for. The W3C is responsible for things like XHTML, so they should be hosting the DTD for it. The IETF should have the DTD for Atom. RSS is currently maintained by Harvard and the DTD should be maintained by them.

Re:w3c (1)

J'raxis (248192) | more than 7 years ago | (#19172245)

RSS 0.9x was developed by Netscape; having the originator host it, forever, is how we got in this problem in the first place.

sure there is! (2, Funny)

commodoresloat (172735) | more than 7 years ago | (#19172255)

What's wrong with this website [microsoft.com]?

Re:w3c (0)

Anonymous Coward | more than 7 years ago | (#19172705)

That place is for standards. Just because it's an important dtd doesn't mean it's a standard. A better place would be at the GNAA.

hmm... (0, Redundant)

Anonymous Coward | more than 7 years ago | (#19170733)

ICANN do it!

Re:hmm... (0, Redundant)

pionzypher (886253) | more than 7 years ago | (#19170859)

Yeah? Well whatever you can do, ICANN do better.

Re:hmm... (1)

Short Circuit (52384) | more than 7 years ago | (#19172541)

ICANN do anything better than you.

ICANN song. (1)

Short Circuit (52384) | more than 7 years ago | (#19172695)

I'm sorry...it just has to be done. Source should be obvious. But I butchered it horribly because I kept getting pwned by the line length filter.

Anything you can do, ICANN do better./ICANN do anything Better than you.

No, you can't./Yes, ICANN. No, you can't./Yes, ICANN. No, you can't./Yes, ICANN, Yes, ICANN!

Anything you can be ICANN be greater./Sooner or later, I'm greater than you.

No, you're not. Yes, I am./No, you're not. Yes, I am./No, you're NOT!. Yes, I am./Yes, I am!

ICANN shoot a partridge With a single cartridge./ICANN get a sparrow With a bow and arrow.
ICANN live on bread and cheese.
And only on that?/Yes./So can a rat!
Any note you can reach ICANN go higher.
ICANN sing anything Higher than you.
No, you can't. (High)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN! (Highest)

Anything you can buy ICANN buy cheaper./ICANN buy anything Cheaper than you.

Fifty cents?/Forty cents! Thirty cents?/Twenty cents! No, you can't!
Yes, ICANN, Yes, ICANN!

Anything you can say ICANN say softer./ICANN say anything Softer than you.
No, you can't. (Softly)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer)
YES, ICANN! (Full volume)
ICANN drink my liquor Faster than a flicker./ICANN drink it quicker And get even sicker!

ICANN open any safe.
Without bein' caught?/Sure./That's what I thought--you crook!

Any note you can hold ICANN hold longer./ICANN hold any note Longer than you.

No, you can't.
Yes, ICANN No, you can't/Yes, ICANN No, you can't.
Yes, ICANN
Yes, I-I-I-I-I-I-I-I-I No, you C-A-A-A-A-A-A-A-A-A-A-A-A-N'T--
CA-A-A-A-N! (Cough, cough!)
Yes, you ca-a-a-an!

Anything you can wear ICANN wear better./In what you wear I'd look better than you.
In my coat?/In your vest! In my shoes?/In your hat! No, you can't!/Yes, ICANN Yes, ICANN!

Anything you say ICANN say faster./ICANN say anything Faster than you.
No, you can't. (Fast)
Yes, ICANN. (Faster) No, you can't. (Faster)
Yes, ICANN. (Faster) Noyoucan't. (Faster)
YesIcan! (Fastest)

ICANN jump a hurdle./ICANN wear a girdle.
ICANN knit a sweater./ICANN fill it better!
ICANN do most anything!/Can you bake a pie?
No./Neither can I.
Anything you can sing ICANN sing sweeter./ICANN sing anything Sweeter than you.

No, you can't. (Sweetly)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't, can't, can't (sweeter)
Yes, ICANN, CAN, CAN (Sugary)

Yes, ICANN! No, you can't!

DTD? (4, Insightful)

mastershake_phd (1050150) | more than 7 years ago | (#19170755)

and DTD stands for? Distributed Technical Dependency?

Re:DTD? (4, Informative)

x_MeRLiN_x (935994) | more than 7 years ago | (#19170871)

Document Type Definition

Re:DTD? (0)

Anonymous Coward | more than 7 years ago | (#19170935)

Death by Text Data

Re:DTD? (4, Funny)

Sporkinum (655143) | more than 7 years ago | (#19171727)

It's the sound Carlos Mencia makes...

Re:DTD? (1)

bckrispi (725257) | more than 7 years ago | (#19172017)

^If only Mencia were that funny!

Re:DTD? (1)

Joebert (946227) | more than 7 years ago | (#19172197)

Thank you, my eyes are still watering. Great laugh!

Re:DTD? (1)

Chris Mattern (191822) | more than 7 years ago | (#19173383)

I thought it was the sound Twiki makes.

"DTDTDT, that's right, Buck."

Chris Mattern

In case of death... (4, Insightful)

Kjella (173770) | more than 7 years ago | (#19170785)

...keep a copy, host it on your own site and reference that instead. There was no problem except that some were using that file to download the definitions. Or just expand the definition to include a checksum and a list of mirrors. Is this even a problem worth solving? I mean except for the slashdot post it seemed to me like this went by without anyone noticing.
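A minimal Python sketch of the "checksum plus a list of mirrors" idea, assuming the document (or some side channel) publishes a SHA-256 digest of the DTD. Each mirror is a hypothetical zero-argument callable, so the actual network code is left out.

```python
import hashlib


def fetch_verified(mirrors, expected_sha256):
    """Try each mirror in order; return the first copy whose SHA-256 matches.

    `mirrors` are callables returning bytes, or raising OSError when unreachable.
    """
    for fetch in mirrors:
        try:
            data = fetch()
        except OSError:
            continue  # mirror is down: try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Wrong or tampered copy: ignore it and keep looking.
    raise LookupError("no mirror served a copy matching the published checksum")
```

Because the checksum, not the host, decides what is acceptable, it does not matter which organization runs a given mirror.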

Re:In case of death... (0)

Anonymous Coward | more than 7 years ago | (#19170911)

Many applications have already addressed this with custom DTD readers. Having your mission-critical web application depend on some server you don't control is just asking for trouble.

Re:In case of death... (2, Interesting)

centinall (868713) | more than 7 years ago | (#19171225)

What if you're using a third-party library that has references to the DTD, schema or whatever? You don't really want to go through and change all of them.

What if XML files, for instance, are being exchanged between your application and others, and they include a DTD that doesn't reside within your domain?

I'm sure there are other scenarios as well.

D2D (0)

Anonymous Coward | more than 7 years ago | (#19170827)

"This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. "

Install them into every commercial and consumer router.

Not only DTDs, but also ontology definitions (1, Insightful)

Anonymous Coward | more than 7 years ago | (#19170879)

Such a system should also allow stable storage and management of ontology definitions, used within the semantic web.

I would suggest someone like OSTG or the Mozilla foundation...

Re:Not only DTDs, but also ontology definitions (1)

Achromatic1978 (916097) | more than 7 years ago | (#19171749)

I would suggest someone like OSTG or the Mozilla foundation...
Hahaha. You crack me up.

Why?

"Is this a sane way to build an XML Internet" (-1, Troll)

TodMinuit (1026042) | more than 7 years ago | (#19170913)

Sane XML -- Good one!!!

XML? What? (0, Flamebait)

moderatorrater (1095745) | more than 7 years ago | (#19170951)

You sound like a PHB who thinks to himself, "XML is a buzzword, I'll bet it'll get the job done."

Sane? (5, Insightful)

DogDude (805747) | more than 7 years ago | (#19171009)

Well, I wouldn't call it sane if anybody who is actively using XML and needs a DTD isn't hosting it right along with whatever web site they're using the XML for. Relying on somebody else to maintain a critical DTD that you use isn't sane. It's pretty dumb.

Re:Sane? (1)

sconeu (64226) | more than 7 years ago | (#19171433)

Who says you're using XML for a website?

Re:Sane? (2, Insightful)

DogDude (805747) | more than 7 years ago | (#19172475)

Well, even if you're not, then you should absolutely, positively, and without any doubt, at least in my mind, have a copy of all of your DTDs.

Re:Sane? (2, Insightful)

curunir (98273) | more than 7 years ago | (#19172687)

Exactly. If you write an application that requires a DTD (or XSD for that matter) to parse an XML document, include that file as part of the software. The XML processing code should intercept entity references and load them from the local copy. Not only does this make your application more reliable, it also makes it faster.

Public hosting of schema documents should not be for application use where the application knows ahead of time what kind of document it will be parsing (like the RSS situation). In all likelihood, a change to that schema document will cause an error in the XML parsing anyway, since the parser isn't expecting new or changed elements.

Public hosting of documents should be reserved for editors that create XML documents that must comply with a given format. This allows XML authors to validate their documents against the schema, but nothing breaks when the publicly-hosted document becomes unavailable.
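Regarding intercepting entity references and loading them from a local copy: a minimal sketch using Python's low-level `xml.parsers.expat` module, assuming the DTD ships with the application. The one-line DTD below is a stand-in, not the real RSS 0.91 DTD, and an unknown DTD makes the parse fail instead of triggering a download.

```python
from xml.parsers import expat

# Stand-in for DTDs shipped with the application, keyed by system identifier.
# (Only an nbsp declaration here; the real RSS 0.91 DTD is much longer.)
BUNDLED_DTDS = {
    "http://my.netscape.com/publish/formats/rss-0.91.dtd":
        b'<!ENTITY nbsp "&#xA0;">\n',
}


def parse_with_bundled_dtds(document, bundled=BUNDLED_DTDS):
    """Parse bytes, loading external DTDs only from `bundled`.

    Returns (character_data, requested_system_ids).
    """
    parser = expat.ParserCreate()
    # Tell expat to read the external DTD subset so its entities get defined.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    text, requested = [], []

    def on_external_entity(context, base, system_id, public_id):
        requested.append(system_id)
        local_copy = bundled.get(system_id)
        if local_copy is None:
            return 0  # unknown DTD: report an error rather than fetch it
        subparser = parser.ExternalEntityParserCreate(context)
        subparser.Parse(local_copy, True)
        return 1

    parser.ExternalEntityRefHandler = on_external_entity
    parser.CharacterDataHandler = text.append
    parser.Parse(document, True)
    return "".join(text), requested
```

The same pattern works with higher-level APIs that expose an entity resolver hook; the important part is that the lookup table, not the network, is the source of truth.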

No (5, Insightful)

Bogtha (906264) | more than 7 years ago | (#19171023)

You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.

For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.

The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.

Mod parent up (1)

Mr 44 (180750) | more than 7 years ago | (#19171129)

This is just not an issue worth solving...

Re:No (3, Insightful)

Anonymous Coward | more than 7 years ago | (#19171377)

The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust.

Not sufficiently robust is an understatement. ****ing stupid is what I would call it. If every browser had to hit the W3C site for the HTML DTDs every time they loaded a web page, the web would collapse.

Re:No (1)

rholtzjr (928771) | more than 7 years ago | (#19172233)

Okay, so what you are saying is that we ship the SAME DTD that is already defined with the application that we provide. ???? WHAT ????

This does not follow OO design methodologies! REUSE!!!! The whole point behind OO design is that we reuse existing components. If we cannot do this, then what is the point of OO? If we have defined a DTD that can be used BY the community, then it should be made available FOR the community. Redistributing the DTD does not make sense, as it could be altered from one iteration to another. If we take this approach, we would have MANY iterations of the SAME DTD.

I am not sure I agree with your assessment of "If you use a DTD, then you must provide it". This does not solve the issue.

RelaxNG may provide a mechanism for customizing existing schemas, but providing a modification still does not solve the issue of REGISTERING your schema with every application that uses it.

The real problem here is the fact that an XML document REQUIRES a DTD!!!, whether it's a default or a custom-defined one.

If a default/custom DTD is required for parsing an XML document, then it SHOULD be provided as a service, as this follows the reuse advantages of an OO design pattern!

In other words, if your application requires parsing of a structure, then this structure MUST be publicly available.

The bigger question is HOW we REMOVE this dependency!!!

Re:No (1)

msuarezalvarez (667058) | more than 7 years ago | (#19174185)

From what you write, it is clear that this is among the least of your problems... Anyway: please do not shout so much!

DTDs, XML entities and the non-breaking space (3, Funny)

Darkforge (28199) | more than 7 years ago | (#19172339)

Unfortunately, DTDs aren't just for validation... they're also the only good way to define "entities" (e.g. "&foo;") in XML. This comes up a lot when trying to put HTML in XML feeds, because HTML has a lot of entities that aren't in the XML spec. Specifically, you may notice that you can't type "&nbsp;" in ordinary XML.

It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xA0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
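A quick Python check of that declaration, with one assumption: it is placed in the document's internal DTD subset rather than in an external DTD, so no download is involved. The standard library's `xml.etree.ElementTree` expands internally declared entities.

```python
import xml.etree.ElementTree as ET

# nbsp is declared in the internal subset, so the parser can expand it.
DOC = '<!DOCTYPE p [ <!ENTITY nbsp "&#xA0;"> ]><p>a&nbsp;b</p>'

text = ET.fromstring(DOC).text
print(repr(text))  # 'a\xa0b'
```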

Re:DTDs, XML entities and the non-breaking space (1)

Lachlan Hunt (1021263) | more than 7 years ago | (#19172471)

You're better off using decimal or hexadecimal character references instead, or just encoding the file in UTF-8 and using whatever character you need directly. Although it would have really helped if XML 1.0 had predefined the entire set of entity references defined in HTML 4, instead of just amp, lt, gt, quot and apos. Then they all could have been used without a DTD.
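A small Python sketch of that advice: with no DTD at all, decimal and hexadecimal character references (and the literal character itself) are all well-formed and yield the same text, while `&nbsp;` is rejected as an undefined entity.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is well-formed XML (no DTD involved)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


decimal = ET.fromstring("<p>a&#160;b</p>").text      # decimal reference
hexadecimal = ET.fromstring("<p>a&#xA0;b</p>").text  # hexadecimal reference
literal = ET.fromstring("<p>a\u00a0b</p>").text      # the character itself
assert decimal == hexadecimal == literal == "a\u00a0b"

print(parses("<p>a&nbsp;b</p>"))  # False: nbsp is not one of XML's five predefined entities
```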

A few problems with RelaxNG validation (1)

wowbagger (69688) | more than 7 years ago | (#19172365)

The last time I checked, there is no mechanism by which an XML file can provide a link to the corresponding RelaxNG schema in the same way that it can provide a DTD.

Thus, while an application which expects files conforming to a specific schema can validate against that schema, it is not possible for a program to validate an arbitrary XML file. For example, there is no way xmllint can automatically find the related RelaxNG schema, in the same way that it can find the DTD.

If I am wrong, and there is a way to provide the schema, please enlighten me.

Re:A few problems with RelaxNG validation (1)

SimHacker (180785) | more than 7 years ago | (#19173585)

There's a reason for that!

Here's a discussion about it on the Relaxng-user mailing list:

http://relaxng.org/pipermail/relaxng-user/2003-October/thread.html [relaxng.org]

>> I'm a relatively new "convert" (from XML Schema) to RELAX NG. I understand that there is no standard way to associate a RELAX NG schema with a document. I'm just wondering if there is any plan to make this possible.

> Not really. The theory is that you might want to validate a document against different schemas for different purposes, and no one schema is really preferred.

James Clark weighs in with his usual clarity:

>> In simpler words, the people who designed the technology don't see a consistent way to formally express an association that already exists, or haven't implemented it yet.

> It's part of the general problem of specifying appropriate XML processing; an RNG-specific solution is neither particularly general nor, IMHO, particularly useful.

I would divide the problem of specifying appropriate XML processing for a document into:

(a) how to specify the process to be performed
(b) how to locate the appropriate processing specification

I see (b) as a special case of the problem of how to specify rules that, given an XML document, find a related resource. This is a problem that the XML vocabulary I've designed for nXML mode is intended to solve. It's not specific to RELAX NG or, for that matter, to schemas. You could use the same vocabulary to describe how to find the XSLT stylesheet to use to display an XML document.

[...]

Although it's important to be able to individually specify the schema to use for a particular document, it's also convenient to be able to specify rules that apply to classes of document. For example, on my system I have a rule that says when the namespace URI of the document element is http://relaxng.org/ns/structure/1.0 [relaxng.org] , then the schema is /home/jjc/schema/relaxng.rnc.

James
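To make the rule-based association concrete, a minimal Python sketch of a rule like the one James describes: look at the namespace URI of the document element and pick a schema. This only illustrates the idea; it is not nXML mode's actual locating-rules vocabulary, and the rule table is hypothetical apart from the path quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: namespace URI of the document element -> schema.
SCHEMA_RULES = {
    "http://relaxng.org/ns/structure/1.0": "/home/jjc/schema/relaxng.rnc",
}


def schema_for(xml_text, rules=SCHEMA_RULES):
    """Pick a schema from the root element's namespace, or None if no rule applies."""
    tag = ET.fromstring(xml_text).tag  # "{namespace}local" when namespaced
    if tag.startswith("{"):
        namespace = tag[1:].split("}", 1)[0]
        return rules.get(namespace)
    return None
```

Nothing in the instance names the schema; the association lives entirely in the rule table, outside the document.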

> Hum, that's a place where I would expect the XML Catalogs to take a role in abstracting the file paths.

I think that's an independent issue. If you are in an environment that has a policy of using XML catalogs for URI remapping in XML-related contexts, then it would make sense to use them for remapping both URIs occurring in include/externalRef in schemas and URIs occurring in locating files. However I don't see any need to explicitly couple locating files to catalogs. My personal opinion is that, although XML catalogs are an appropriate solution to the problem of publicId-to-URI mapping, using XML catalogs to perform URI-to-URI mapping is an XML-specific solution to a non-XML-specific problem.

James

> Thus, for me the only reasonable choice is still to use the DOCTYPE declaration for all associations

If you want to use DOCTYPEs, the nXML method can accommodate you (by doctypePublicId rules). However, I find the problems of using DOCTYPEs worse by far than the problem of associations disappearing on a rename. And even with DOCTYPEs, you can still get the problem of the association changing; you still have to associate your DOCTYPEs with schemas. If you force me to put something in the instance, I would much prefer a processing instruction.

There's no single right way to do the association. Different users will legitimately prefer different approaches. A solution needs to be flexible enough to accommodate them.

James

> My opinion is that this association should be obligatory once present and could not be overriden.

It's a basic tenet of RELAX NG that the schema is not inherent in the document and that validation is a process that has two independently-specifiable inputs. Section 8 of the spec says: "A conforming RELAX NG validator must be able to determine for any XML document and for any correct RELAX NG schema whether the document is valid with respect to the schema." If I understand you correctly, you're saying that if the document contains a particular processing instruction, then there should not be a way to validate it against a schema other than that specified in the processing instruction. That's clearly non-conformant. A conforming RELAX NG validator must allow you to use any schema to validate a document, no matter what processing instruction the document contains.

James

A DOCTYPE declaration does nothing more than identify the external subset. Eliot Kimber is eloquent on this one. See for example the thread at http://lists.w3.org/Archives/Public/www-html/2000Jan/thread.html#66 [w3.org]

> Documents have very long and complex lifecycles, and change is inevitable and must be planned for.

Which is exactly why your documents should not contain anything specific to a particular schema language. Who knows what schema language we'll all be using in 20 years?

> But couldn't we at least have one (1) standard way of asserting that a XML document belongs to a (version of) a specific class of documents

The assertion shouldn't be specific to a particular schema language. The assertion should be an assertion that the document belongs to a particular abstract type; an abstract document type involves more than just the (usually infinite) set of documents belonging to the type; there's also semantics, whether formal or informal.

There is no standardized way to make such an assertion. It's not the job of RELAX NG (or indeed of any particular schema language) to standardize such a mechanism. If you want there to be a standard way, I suggest you take it up with the W3C or some other standards body.

I agree that it's often desirable to have a document include information about the abstract type to which it belongs. But it's up to you to decide how your documents should represent this information, just like it's up to you to decide how they should represent any other information. If namespaces aren't enough, then use a PI or use an attribute on the document element. The choice is yours. A schema association mechanism should be able to make use of whatever reasonable way you've chosen rather than mandate a particular way.

James

On Fri, 2003-10-24 at 15:33, George Cristian Bina wrote:

> We solved the association problem in oXygen using a PI

What does your PI look like exactly?

James

Hi James,

For Relax NG we have something like below:

<?oxygen RNGSchema="test/RelaxNG/compact/test1.rnc" type="compact"?>
<?oxygen RNGSchema="test/RelaxNG/XML/test1.rng" type="xml"?>

For NRL:

<?oxygen NRLSchema="test/RelaxNG/NRL/nrlSchema.nrl"?>

I guess these can be put better into something like:

<?schema location="schemaLocation" type="rnc|rng|nrl"?>

For type we can eventually have as values: text/rnc, text/rng or text/nrl. For compact schemas we may also need an encoding pseudo attribute.

Best Regards, George
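For illustration, a minimal Python sketch of reading PIs like oXygen's from the document prolog with `xml.dom.minidom`. The pseudo-attribute regex is a simplification (double-quoted values only), and the PI target is passed in by the caller, which fits James's point that something outside the instance should decide which PI target carries the association.

```python
import re
from xml.dom import minidom

# Matches pseudo-attributes such as RNGSchema="a.rnc" inside a PI's data.
PSEUDO_ATTR = re.compile(r'(\w+)="([^"]*)"')


def schema_pis(xml_text, target="oxygen"):
    """Return pseudo-attribute dicts from prolog PIs with the given target."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == target:
            found.append(dict(PSEUDO_ATTR.findall(node.data)))
    return found
```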

On Thu, 2003-10-23 at 20:43, Daniel Veillard wrote:

the idea growing up seems to have the association done at the toolkit level instead of having it into the instance.

Right. I believe it's quite wrong for a schema language to dictate that a document type designer must use an element name or an attribute for a particular purpose (as W3C XML Schema does with xsi:type or xsi:schemaLocation). The document type designer should be the one who decides what element names and attribute names to use, and the schema language should allow the document type designer to write a schema to reflect their decision. I think this extends to any association mechanism and it also applies to processing instruction target names.

This does not prevent your having something in the instance that influences the association but it does imply that the ultimate authority must be outside the instance. If, for example, you want a processing instruction in the instance to point to the schema, then there must be something outside the instance that says that this is what you want and that specifies the processing instruction target to use for this purpose.

Once you decide that you want something outside the instance to control the association, I think it's an obvious decision to express that something in XML and it's highly desirable to make it application-independent.

> It's unclear to me whether the approach taken by nXML can really be expanded to other frameworks

If it can't, then it's a bug. It was a fundamental design goal to express the association rules in an application-independent way. Eventually, I hope to implement this for Jing as well.

James
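James's point is that the association rules should live outside the instance. Below is a minimal Python sketch of that idea. The rule table, its keys and the schema paths are hypothetical. This is not nXML's actual rule-file format; it is just a lookup keyed on the document element's namespace and local name.

```python
import xml.etree.ElementTree as ET

# Hypothetical association rules kept OUTSIDE the instance (e.g. loaded from a
# per-project file). Keys are (namespace URI, local name) of the document
# element; None as the local name means "any element in that namespace".
RULES = {
    ("http://www.w3.org/1999/xhtml", None): "schemas/xhtml.rnc",
    ("", "rss"): "schemas/rss.rnc",
}


def split_tag(tag):
    """ElementTree encodes namespaced tags as '{uri}local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def locate_schema(xml_text, rules=RULES):
    """Return the schema location chosen by the external rules, or None."""
    uri, local = split_tag(ET.fromstring(xml_text).tag)
    # Most specific rule first: namespace + local name, then namespace only.
    for key in ((uri, local), (uri, None)):
        if key in rules:
            return rules[key]
    return None
```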


Doctypes are a completely broken design. (1)

Ant P. (974313) | more than 7 years ago | (#19171213)

The only other language I know of that even allows file sourcing over HTTP is PHP, and there it's a gaping security hole that defaults to off. In everything else, the dependencies *get installed to the local file system*.

Perhaps something like "pool.ntp.org"? (4, Insightful)

Zocalo (252965) | more than 7 years ago | (#19171279)

NTP.org [ntp.org] maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems the most logical choice) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know a single generic hostname that will always provide a copy of the required DTD.

Just a thought...
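The DNS pool itself needs no client code. A client can still add its own resilience by trying several mirrors in turn. Here is a sketch in Python, assuming hypothetical mirror URLs under the reserved .example domain. Note that this is client-side fallback, which complements the DNS approach rather than implementing it.

```python
import urllib.request
from typing import Callable, Sequence

# Hypothetical mirror base URLs (.example is a reserved TLD). A real pool
# would publish these behind one DNS name as described above.
MIRRORS = (
    "http://dtd-mirror-a.example/",
    "http://dtd-mirror-b.example/",
)


def http_fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_from_pool(path: str,
                    fetch: Callable[[str], bytes] = http_fetch,
                    mirrors: Sequence[str] = MIRRORS) -> bytes:
    """Try each mirror in turn and return the first successful response body."""
    errors = []
    for base in mirrors:
        url = base + path
        try:
            return fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))
```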

using non-local cached copy considered harmful (4, Interesting)

tota (139982) | more than 7 years ago | (#19171305)

Most tools provide a way to refer to a DTD by its public URL while actually using a local copy (e.g. the taglib-location directive in Java).

Doing anything else strikes me as fundamentally dangerous and insecure: it turns a remote DNS vulnerability into an easy application DoS (or worse).
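In Python's SAX API, the hook for "refer to the public URL, use the local copy" is an EntityResolver. Below is a minimal sketch, assuming a hypothetical local path. It refuses unknown identifiers instead of falling back to the network.

```python
import xml.sax
from xml.sax.handler import EntityResolver


class LocalOnlyResolver(EntityResolver):
    """Resolve DTDs to local files only; refuse anything not known in advance."""

    def __init__(self, by_public_id, by_system_id):
        self.by_public_id = dict(by_public_id)
        self.by_system_id = dict(by_system_id)

    def resolveEntity(self, publicId, systemId):
        local = self.by_public_id.get(publicId) or self.by_system_id.get(systemId)
        if local is None:
            # Failing loudly beats silently reaching out to a remote host.
            raise xml.sax.SAXException(f"no local copy for {publicId!r} / {systemId!r}")
        return local  # a local system identifier (file path) for the parser to open


parser = xml.sax.make_parser()
parser.setEntityResolver(LocalOnlyResolver(
    # Hypothetical local path for the RSS 0.91 DTD.
    {"-//Netscape Communications//DTD RSS 0.91//EN": "/usr/share/xml/rss/rss-0.91.dtd"},
    {},
))
```

Whether the parser consults the resolver at all depends on its external-entity features. Since Python 3.7.1 the SAX reader no longer processes external general entities by default, so treat this as the mapping layer, not a complete setup.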

Call me crazy... (4, Interesting)

Nimey (114278) | more than 7 years ago | (#19171311)

but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.

Seriously, what the fuck were they thinking relying on a server to be always available?

Re:Call me crazy... (1)

libkarl2 (1010619) | more than 7 years ago | (#19172217)

Seriously, what the fuck were they thinking relying on a server to be always available?
I've noticed the trend lately. Folks *want* some server to always be available. They want this so badly, they just go about their business as if the server in question would always be available. Even trained pros, who know better, sometimes think and/or act this way. Especially with regards to systems they can't see, and do not have to maintain. Thus, the Hard & Painful Lessons of Life(tm) still have their place in the world. ;(

Re:Call me crazy... (2, Interesting)

Megane (129182) | more than 7 years ago | (#19173219)

Even more stupid is that the URI had a freaking version number in the filename! It's not like someone would update it, and then give it the old version number. It's going to give you the same file even when there's a newer version!

Re:Call me crazy... (4, Funny)

Nimey (114278) | more than 7 years ago | (#19173399)

It's not like someone would update it, and then give it the old version number.


Your trust in the world is cute. :-)

URI vs URL (5, Insightful)

Sparr0 (451780) | more than 7 years ago | (#19171315)

A key mistake in your assumptions was brought up when the Netscape fiasco was news, and I will bring it up again...

"http://my.netscape.com/publish/formats/rss-0.91.dtd" is a URI. It uniquely identifies a file. It *HAPPENS* to also be the URL for that same file, for now, but that is just a fortunate intentional coincidence. Your software should not rely on or require the file to be located at that URL. /var/dtd/rss-0.91.dtd is a perfectly valid location for the file identified by the URI "[whatever]/rss-0.91.dtd". What we need is for XML-using-software authors to support and embrace local DTD caches, AND package DTDs along with their applications (with the possibility of updating them from the web if necessary).

It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.

MOD PARENT UP (1)

timster (32400) | more than 7 years ago | (#19171481)

Don't usually do this, but the above comment is the first one in this conversation that explains why this problem doesn't really exist.

EXACTLY (4, Insightful)

wowbagger (69688) | more than 7 years ago | (#19172435)

Exactly right, but it is even worse than that:

A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.

The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.

If you are running a recent vintage of Linux, look in /etc/xml/: that's where the catalog maps for all the various DTDs in use live.

So:
  1. The application writers SHOULD have added the DTDs to the local system's catalog.
  2. Failing that, the application SHOULD have cached the DTD locally the first time it was fetched, and never fetched it again.
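Point 2 can be sketched in a few lines of Python. This sketch makes several assumptions: the cache directory, the hash-based file naming and the injected `fetch` callable are all choices made for the example. None of them comes from any RSS or XML spec.

```python
import hashlib
import pathlib
from typing import Callable


def cached_dtd(system_id: str, cache_dir: pathlib.Path,
               fetch: Callable[[str], bytes]) -> bytes:
    """Return the DTD for system_id, fetching it at most once and reusing the disk copy."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the identifier so any URI maps to a safe file name.
    path = cache_dir / (hashlib.sha256(system_id.encode("utf-8")).hexdigest() + ".dtd")
    if path.exists():
        return path.read_bytes()
    data = fetch(system_id)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # atomic rename on POSIX, so readers never see a partial file
    return data
```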


Re:URI vs URL (1)

Fnkmaster (89084) | more than 7 years ago | (#19172523)

Actually, I'd go a step further. It might be useful to actually *not* host the DTD itself at that URI. As I recall, there was never a requirement that DTDs actually be located at the URI if it was treated as a URL.

If instead the URL just returned a page that said: "You can find a copy of the appropriate DTD at the following locations..." and listed them, it would remove the temptation to introduce a programmatic dependency on that URL being live but still give people a way to find that resource, and force developers to map the URI to a file internally in their applications.

Re:URI vs URL (1)

uctechdude (921990) | more than 7 years ago | (#19173631)

Agreed... though I don't think it's silly with RSS... it just helps the lazies :D

XML Catalogs (1)

Chris Chiasson (908287) | more than 7 years ago | (#19171565)

I think there is an OASIS standard called XML Catalogs for redirecting offsite schema requests to a local copy...
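For reference, here is what an OASIS XML Catalog entry for the RSS 0.91 DTD might look like, together with a deliberately small Python resolver for it. The resolver handles only `system` and `public` entries: system entries are tried first, then public ones, and the first match wins. It ignores `prefer`, `rewriteSystem`, delegation, `nextCatalog` and public-identifier normalization. The local file path is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def load_catalog(catalog_xml):
    """Collect system and public entries; the first matching entry wins."""
    root = ET.fromstring(catalog_xml)
    systems, publics = {}, {}
    for el in root.iter(f"{{{CATALOG_NS}}}system"):
        systems.setdefault(el.get("systemId"), el.get("uri"))
    for el in root.iter(f"{{{CATALOG_NS}}}public"):
        publics.setdefault(el.get("publicId"), el.get("uri"))
    return systems, publics


def resolve(catalog, public_id=None, system_id=None):
    """Try system entries first, then public ones; return None if nothing matches."""
    systems, publics = catalog
    if system_id in systems:
        return systems[system_id]
    if public_id in publics:
        return publics[public_id]
    return None


EXAMPLE = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Netscape Communications//DTD RSS 0.91//EN"
          uri="file:///usr/share/xml/rss/rss-0.91.dtd"/>
  <system systemId="http://my.netscape.com/publish/formats/rss-0.91.dtd"
          uri="file:///usr/share/xml/rss/rss-0.91.dtd"/>
</catalog>"""
```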

Re:XML Catalogs (1)

holloway (46404) | more than 7 years ago | (#19171925)

Yes, you're right, that's the standard way of caching them locally. I'm not sure that all RSS clients are XML processors though.

HTML clients (browsers) don't go requesting the HTML DTD, so it could be said that RSS clients shouldn't either. RSS clients, though, are more pure in that they take the DTD's definition of entities literally, so they do need access to the DTD.

But you'd expect clients to cache them, using XML catalogs as you say. They should be packaged with the standard DTDs (including a default DTD with all the HTML entities) and only check for updates occasionally, without depending on those checks.

yes: XML Catalogs; no: DTD document hosting (1)

jmaline (26165) | more than 7 years ago | (#19173543)

Kind of, but not really.

Yes, XML catalogs are the answer.

Nothing in the XML specs says that any actual document is hosted at the URI. It's a mechanism to specify a globally unique identifier. It's an identifier, not a promise to host a document. Some folks host the DTD document at the URI, but there's no requirement to do that.

While I'm sure not every RSS client uses a high-quality XML implementation, it seems clearly true that every RSS client is an XML processor. RSS is an XML format. So an RSS client is, um, processing XML...

As for checking for updates, that's a non-issue. Remember, the URI is a unique identifier. If you ever update the DTD, you'd generate a new URI.

Re: Is Dedicated Hosting for Critical DTDs Necess (0)

Anonymous Coward | more than 7 years ago | (#19171691)

Are Critical DTDs Necessary?

As far as I know, this is quite an old story.

And we (the Slashdotters) already came to the conclusion that programmers who write code that relies on this kind of external resource need to be fired, because they're obviously incompetent and a danger to their employer's business.

So it doesn't really matter whether such external resources are hosted one way or the other. You stay away from them. You don't use these external resources in your code.

Uhhm.... I thought we were using XML Schema now??? (1, Offtopic)

SadGeekHermit (1077125) | more than 7 years ago | (#19171767)

People are still using DTDs? I thought everybody switched to XML Schema a while back. God, I can't keep up with this constant flux!

I need some chinese food. Hmm...

Szechuan!

Not again (3, Informative)

dedazo (737510) | more than 7 years ago | (#19171795)

This has been covered before here and elsewhere... anyone who is using a DTD as a URL rather than a URI needs to be taken out and shot. I say bring them all down and let all the apps that rely on them die or be fixed.

DTD? (-1, Redundant)

Anonymous Coward | more than 7 years ago | (#19171855)

Am I the only one who noticed that, when pronounced like a word, "DTD" sounds very much like "dee da dee", à la Carlos Mencia? Heh.

Supply local DTDs with your app (4, Interesting)

Dragonshed (206590) | more than 7 years ago | (#19171933)

I recently (within the last year) deployed an application that end users use to download and view custom content. It is intended to be installed on laptops, tablets, and other portable devices, allowing users to view that content both online and offline.

When prototyping our "offline mode", we ran into this exact problem, because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes. Problem solved.

In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine it being "out of scope" for a less-than-robust website.

The DNS root servers are run by... (1)

_iris (92554) | more than 7 years ago | (#19172107)

Wikipedia's Root nameserver [wikipedia.org] entry says that 4 of the 13 root nameservers are run by private companies.

I have a server in my basement we could use. (4, Funny)

fyoder (857358) | more than 7 years ago | (#19172307)

Linux box with an uptime of 153 days. It does have to go down now and again so I can clean the dust and cat fur out of it, but that doesn't take too long.

Re:I have a server in my basement we could use. (2, Funny)

Skapare (16644) | more than 7 years ago | (#19173253)

I have an old Sun Sparc 5/70 that still works. Rock solid machine and has OpenBSD loaded on it. I even have a static IP address on my dialup service I could put it on.

HTML 5 (1)

somethinghollow (530478) | more than 7 years ago | (#19172359)

I don't know why important DTDs aren't just turned into serializations. HTML 5 (and, in practice, HTML in general) has a text/html serialization because the major browsers don't care about DTDs. It seems like well-published specifications like RSS should just be serialized, with DTDs ignored even when they are declared, instead of breaking when the DTD can't be found. I guess that wouldn't work if a generic XML parser was used for RSS, but for RSS readers, the DTD shouldn't matter.

DTDs are Useless (1)

Lachlan Hunt (1021263) | more than 7 years ago | (#19172367)

Quick, someone register http://all.your.dtds.are.belong.to.us/ [belong.to.us] :-)

Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).

DTDs are bad for several reasons [hsivonen.iki.fi] :

  1. DTDs pollute the document with schema-specific syntax. Since the document itself declares the rules, the question answered by DTD validation is not the question that should be asked. DTD validation answers the question "Does this document conform to the rules it declares itself?" The interesting question is "Does this document conform to these rules?", where the person asking the question chooses the rules the question is about.

  2. DTDs mix a validation mechanism, an inclusion mechanism and an infoset augmentation mechanism. The inclusion mechanism is mainly used for character entities, which solve an input problem (but only if the DTD is processed, and processing it is not required!) by burdening the recipient instead of keeping input matters between the editing software and the document author.

  3. DTDs aren't particularly expressive.

  4. DTDs don't support Namespaces in XML.

Plus, if a UA needs to request the DTD every time it parses the file, that adds significant overhead by the time it fetches the DTD, parses it and checks the document for validity. It's just not worth it. The Netscape RSS DTD issue was a mistake, and it's time to learn from that. There are much better alternatives available for validating XML than DTDs, such as RelaxNG or Schematron.
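Point 1 is easier to see with an example where the recipient, not the document, picks the rules. The Python sketch below uses a hypothetical, illustrative rule table and ignores any DOCTYPE the document carries. The table is not the RSS 0.91 DTD, and the check is neither RELAX NG nor Schematron.

```python
import xml.etree.ElementTree as ET

# Rules chosen by the *recipient*, not declared by the document.
# Illustrative only: required child elements per element name.
REQUIRED_CHILDREN = {
    "rss": {"channel"},
    "channel": {"title", "link", "description"},
    "item": {"title"},
}


def check(xml_text, rules=REQUIRED_CHILDREN):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for el in ET.fromstring(xml_text).iter():
        missing = rules.get(el.tag, set()) - {child.tag for child in el}
        for name in sorted(missing):
            problems.append(f"<{el.tag}> is missing <{name}>")
    return problems
```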

Isn't this addressed already? (1)

Talchas (954795) | more than 7 years ago | (#19172397)

Isn't this what doctypes like this are for:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?
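To key a local lookup on that PUBLIC identifier, a client first has to read it out of the DOCTYPE. Below is a minimal Python sketch using xml.dom.minidom. minidom exposes both identifiers on doc.doctype and, in its default configuration, does not try to load the external DTD.

```python
from xml.dom import minidom


def doctype_ids(xml_text):
    """Return (publicId, systemId) from the DOCTYPE, or (None, None) if there is none."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return None, None
    return doc.doctype.publicId, doc.doctype.systemId
```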

short answer: no (3, Insightful)

coaxial (28297) | more than 7 years ago | (#19172423)

Validation is overrated, especially when it comes to RSS. There are so many competing "compatible" standards that really aren't. feedparser.org [feedparser.org] has a great write-up about the state of RSS. It's pathetic.

If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache the version locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.

Finally, it's trivial to reconstruct a dtd from sample documents.
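How trivial? Below is a naive Python sketch of inferring a DTD from samples. It only records which child elements and text each element contains, so it ignores ordering, cardinality, attributes and namespaces. As the reply below points out, the result will not be the same DTD the documents were written against.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(samples):
    """Build naive <!ELEMENT> declarations from sample documents (no namespaces)."""
    children = defaultdict(set)   # element name -> names of child elements seen
    has_text = defaultdict(bool)  # element name -> saw non-whitespace text inside it
    order = []                    # first-seen order, for stable output
    for text in samples:
        for el in ET.fromstring(text).iter():
            if el.tag not in children:
                order.append(el.tag)
            children[el.tag].update(child.tag for child in el)
            # Direct text lives in el.text and in each child's tail.
            pieces = [el.text] + [child.tail for child in el]
            if any(p and p.strip() for p in pieces):
                has_text[el.tag] = True
    decls = []
    for name in order:
        kids = sorted(children[name])
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"  # any order, any count
        else:
            model = "EMPTY"
        decls.append(f"<!ELEMENT {name} {model}>")
    return "\n".join(decls)
```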

Re:short answer: no (1)

JustNiz (692889) | more than 7 years ago | (#19172537)

>> Finally, it's trivial to reconstruct a dtd from sample documents.

But it won't be the same DTD as the one used to create the documents, which is probably the 'standard' one.

Re:short answer: no (1)

coaxial (28297) | more than 7 years ago | (#19172893)

But it doesn't matter. It's the one that's actually used. Real world data always trumps ideal data.

Builtin DTDs everywhere! (1)

darthflo (1095225) | more than 7 years ago | (#19172547)

Now, I may not have quite grasped the importance of DTDs, but I can think of only one scenario where retrieving a DTD from a to-be-determined location would be useful: validating XML against an arbitrary DTD. (Solution: whoever wants to validate also provides the DTD.)
To my knowledge, any other application could just depend on built-in DTDs for validating the formats it knows, and ignore whatever it doesn't know, since it wouldn't be able to use those intelligently anyway.

Did I forget to take in account one of those nice tiny little huge details somewhere?

URLs were never sane (1)

trimbo (127919) | more than 7 years ago | (#19172721)

Think about it.

A URL has:

  • A hostname
  • A PORT number
  • A path on that machine

The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.

A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.


XML catalog files let your app use local copies... (3, Informative)

KarmaRundi (880281) | more than 7 years ago | (#19173057)

You can map public and system identifiers to local resources. Use them for dtds, schemas, stylesheets, etc. Here's the spec [oasis-open.org] . Google for more information.

DNS? (1)

saltmiser (946687) | more than 7 years ago | (#19173191)

Do it like the DNS system, have a bunch of companies (*cough* google yahoo verizon mozilla microsoft (or not) *cough*) host this stuff, I doubt they'll all go bankrupt at the same time ;P

DTD Critical Hosting (2, Insightful)

liothen (866548) | more than 7 years ago | (#19173195)

Why doesn't the content provider just provide the DTD? Why worry about caching it or random errors popping up in it, when the DTD can be stored on the very same server as the website, or stored with the application? Then it doesn't matter if another company screws up or if some malicious hacker decides to attack the DTD; it doesn't affect your product...
  Some might think: well, what if it changes?
Well, it's obvious: download the new one and update your XHTML/XML or application for the specific changes.

Think first, implement later (0)

Anonymous Coward | more than 7 years ago | (#19173499)

Using URLs is just a non-bureaucratic way to avoid name clashes, which is rather clever. However, using http:// as a prefix is rather brain-dead, because all the other brain-dead people will assume that you have to fetch something from this URL. It would have been smarter to add a dedicated URN prefix for this, like "namespace:", "spec:" or "whatever:".

the best host is localhost (1)

rickla (641376) | more than 7 years ago | (#19173643)

Having anything in a live project linking externally is insane! I never understood how developers can risk this.

We use Maven, and we use DTDs, schemas, WSDL, etc. Much of the WSDL and other files refer to online locations. We download these and alter the references to be local. Otherwise a build could fail because of an internet issue, which is just nuts.

Same with Maven: we have our own local repository where we keep a subset of what we use. Again, same situation. In these cases this is just for building; I can't imagine doing this on a live site. This goes especially for externally referenced JavaScript... local copies are your friend.
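The "download and alter the references" step can be scripted. Below is a minimal Python sketch, assuming a hypothetical mapping from remote URLs to local paths. It rewrites double-quoted schemaLocation and location attributes (as used by XSD and WSDL imports) in the raw text and leaves everything else untouched. Single-quoted attributes are not handled.

```python
import re

# Matches schemaLocation="..." and location="..." attributes, as used by
# xs:import / xs:include in XSD and wsdl:import in WSDL.
LOCATION_ATTR = re.compile(r'\b(schemaLocation|location)\s*=\s*"([^"]*)"')


def localize(text, mapping):
    """Replace remote locations found in mapping; leave all others untouched."""
    def swap(m):
        attr, url = m.group(1), m.group(2)
        return f'{attr}="{mapping.get(url, url)}"'
    return LOCATION_ATTR.sub(swap, text)
```

Editing the text rather than re-serializing it through a parser keeps namespace prefixes intact. That matters because WSDL and XSD refer to types through prefixed QNames inside attribute values.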