Slashdot: News for Nerds



HTML Tags For Academic Printing?

timothy posted about 5 years ago | from the gedankenexperiment-draws-cries-of-use-ps-or-pdf dept.

Education 338

meketrefi writes "It's been quite a while since I got interested in the idea of using HTML (instead of .doc or .odf) as a standard for saving documents, including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} You get the idea... {/page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?



LaTeX (5, Informative)

Anonymous Coward | about 5 years ago | (#28567623)

You seem to be talking about LaTeX. It already exists. Don't reinvent it.

Re:LaTeX (5, Insightful)

Liquidrage (640463) | about 5 years ago | (#28567637)

Good answer. Now all we need is for someone to mod your post up to +5 and then lock the thread.

*sigh* it's a slow news day

Re:LaTeX (-1, Troll)

Anonymous Coward | about 5 years ago | (#28567659)

I like your sig. Perfect for any occasion. What? That's not a sig? Coulda fooled me.

Re:LaTeX (5, Interesting)

Petrushka (815171) | about 5 years ago | (#28568039)

I have a sneaking suspicion that when the OP is saying things like "no CSS" and doesn't mention LaTeX, s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced. What I take from the post is that the OP wants

  • Portability. Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.
  • Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)
  • Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

I'm guessing the OP has been inspired by the use of HTML for slide presentations, in the form of S5. I can see that. But the specifications, if I've deduced them correctly, are not hugely well-thought-out ones. I can kind of see someone not wanting to use LaTeX for the reasons given above, but insisting on no CSS is crazy.

In any case, the OP should certainly give slightly clearer specifications if s/he doesn't want to have people yelling "LaTeX!!!" all day.

Re:LaTeX (3, Informative)

thefringthing (1502177) | about 5 years ago | (#28568321)

Most distributions of LaTeX come with some way to compile it and export to PDF, which is both portable and readable.

Re:LaTeX (1)

Apathy451 (234733) | about 5 years ago | (#28568351)

Also of note is MathJax, which is a full rewrite of jsMath.

It handles a lot of the math rendering needed for the web without the need for the end user to install or do anything. Granted, it doesn't do things like macros or any number of other LaTeX features, but it does quite a lot as-is for taking straight TeX and rendering it properly.

Re:LaTeX (1)

nine-times (778537) | about 5 years ago | (#28568389)

I have a sneaking suspicion that when the OP is saying things like "no CSS" ... s/he is actually giving specifications in a very obfuscated way

Definitely. The problem with this question is that the OP is suggesting an idea for solving a problem, but then not giving a complete description of the problem being solved. A complete description would also give the restrictions on how you can solve the problem so that you can really analyze what the problem is.

Just to add to your list, another thing that popped into my head (which may be on the 'unlikely' side): the aversion to CSS may be that the poster wants people to be able to edit in an office suite, but might have some inkling that WYSIWYG editors don't handle style sheets very well at all. But really, there's no proper way to handle this sort of formatting in HTML itself. It would have to be done with styles, even if they were inline.

Re:LaTeX (0)

Anonymous Coward | about 5 years ago | (#28568391)

Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.

I do believe that web browsers are more common than PDF readers (Windows does come with a web browser but not a PDF reader), but not by much at all. This isn't really much of a point against LaTeX.

Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)

Again, this is easily something LaTeX can take care of.

Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

This *could* be a really trivial advantage for some HTML jiggering over an established standard for such things.

I'd be perfectly happy to learn of a better solution than LaTeX for this situation, but your post doesn't really give any notable reason LaTeX would not work here. Your points are extremely trivial.

Eh-hem: LaTeX!!!

Re:LaTeX (3, Insightful)

AceofSpades19 (1107875) | about 5 years ago | (#28568415)

I'm pretty sure everyone who has a browser also has a PDF reader to read PDFs produced from LaTeX.

Re:LaTeX (5, Informative)

The Snowman (116231) | about 5 years ago | (#28567679)

You seem to be talking about LaTeX. It already exists. Don't reinvent it.

Another alternative is RTF, which is a sister SGML language of HTML. While it may have drawbacks, it would accomplish most if not all of what is required.

Re:LaTeX (3, Informative)

Anonymous Coward | about 5 years ago | (#28567795)

No one really writes RTF by hand though. DocBook is more like what this person is suggesting.

Re:LaTeX (2, Informative)

Liquidrage (640463) | about 5 years ago | (#28567849)

Sadly I have. I've spit out RTF docs from websites many moons ago back when we had lots of printing issues on the web (still sucks, but at least now it's manageable).

In hindsight there was probably a better way...but I was young, and string manipulation is easy.

Re:LaTeX (1)

AceofSpades19 (1107875) | about 5 years ago | (#28568431)

Rich Text Format is nothing like SGML. It has a completely different syntax and it behaves a lot differently.

Re:LaTeX (1)

nevhan (1422601) | about 5 years ago | (#28567755)

Indeed, LaTeX is standard in academia; I keep all my written work in LaTeX format. I then usually convert to PDF for submission. It's pretty.

Re:LaTeX (4, Informative)

fermion (181285) | about 5 years ago | (#28567809)

LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

The nice thing about LaTeX is that, like HTML, it is a pure markup language, but it is a markup language that understands typesetting, so one tends to get a good page layout no matter what. OTOH, HTML merely identifies blocks of text as various generic types, and really does not have a context for the types. The render engine is free to visualize, make a sound, or do whatever it wishes to represent the blocks. CSS is what imposes a consistent visual framework, so what one needs to duplicate LaTeX is in fact CSS.

Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

Re:LaTeX (5, Informative)

Anonymous Coward | about 5 years ago | (#28567861)

Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

Full OpenType support. Looks amazing.

Re:LaTeX (2, Informative)

dangitman (862676) | about 5 years ago | (#28567923)

LaTeX is a solution for certain (usually niche) purposes, but has its drawbacks, like everything else. The problem is that we are dealing with an online world, where things are published in several different formats, online and offline. LaTeX doesn't translate easily or cleanly into HTML, or vice-versa. And good luck getting people outside of math and science academia to use LaTeX.

This is a real problem, and shouldn't simply be brushed aside with "use this" comments. There currently is no workable format, no Lingua Franca for multipurpose documents. A solution should at least be attempted to make the web, word processing, page layout and typography interoperable.

Don't get me wrong, I actually like LaTeX, but it's an exercise in frustration trying to integrate it with the rest of the world.

Re:LaTeX (0)

Anonymous Coward | about 5 years ago | (#28568007)

There are a few LaTeX compilers that output HTML; one of them apparently even outputs ODF (found with Google).

Re:LaTeX (1)

dangitman (862676) | about 5 years ago | (#28568117)

But they don't do a particularly great job of it. To deploy in a publishing system, a more serious solution is warranted.

Re:LaTeX (1)

MightyMartian (840721) | about 5 years ago | (#28568399)

HTML is simply not that good at more complex forms of document creation, nor does it have to be. Now if we're talking about giving browsers the capacity to directly display LaTeX, well, I think that's an interesting idea, but in reality, what the parent is talking about is making an essentially stripped-down version of LaTeX, with the single ability that it can define page size. How would it render properly given various resolutions?

I agree with others, LaTeX is what the parent is looking for. HTML and CSS already involve enough compromises in trying to make browsers do what they were never intended to do. This would take the whole thing to an absurd extreme.

Re:LaTeX (0)

Anonymous Coward | about 5 years ago | (#28568107)

Another option is to use reST (reStructuredText), which can be converted both to nice TeX and nice HTML.

Re:LaTeX (5, Informative)

plasticsquirrel (637166) | about 5 years ago | (#28568159)

Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

LaTeX has limited fonts, but if you use XeTeX (which uses LaTeX), you can not only use all the LaTeX stuff, but also any TrueType or OpenType font, with native Unicode support as well. This is a godsend for typesetting anything that includes words or characters not in English, or just for people who are picky about typography. My personal favorite font is the open source SIL Gentium family, which is not only much more beautiful and readable than Times, but contains a fuller character set that makes it compatible with many more languages. Once you start writing documents with XeTeX and nicer fonts, you see how lacking word processors are for good typography and well-structured documents, and how self-limiting the concept is.

For newcomers to LaTeX and XeTeX, including packages and specifying options can be a bit time-consuming when you just want to get started with a basic A4-sized document. Here is the basic XeTeX file template I use for simple stuff. I'm picky about margins, line spacing, fonts, etc. so you know that it's a safe place to start out.


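\documentclass[11pt]{article} % assumed: the original class line did not survive

% XeTeX packages
\usepackage{fontspec} % provides \setmainfont

% Formatting packages
% (the original package list did not survive)

% These are US Letter dimensions; for A4 use 210mm and 297mm.
\pdfpagewidth 8.5in
\pdfpageheight 11in

% 10pt font: 1.15
% 11pt font: 1.1
\linespread{1.1} % assumed from the two comments above

\setmainfont[Mapping=tex-text]{Gentium Basic}

\begin{document}

Some text...

\end{document}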


Re:LaTeX (1)

scubamage (727538) | about 5 years ago | (#28567883)

LaTeX works. It's about as enjoyable as a proctological exam being performed by a porcupine, but it does work.

Re:LaTeX (5, Informative)

femto (459605) | about 5 years ago | (#28567981)

LyX. I wrote a thesis in it and didn't have to resort to any manual interventions in the generated LaTeX. Couple it with SVG diagrams, generated by Inkscape, and you have a seamless authoring system that handles both text and graphics. SVG means there is no messy task of keeping source and PostScript output synchronised (just right-click a diagram within LyX to edit the SVG source with Inkscape). Use gnuplot to generate your (PostScript) graphs and you have pretty well a complete authoring system. A few years ago, LyX and Inkscape were too immature to use seriously, but they have matured. I recommend the combination.

Re:LaTeX (1)

igny (716218) | about 5 years ago | (#28568035)

You had it easy, young man. I had to use ChiWriter.

Re:LaTeX (1)

scubamage (727538) | about 5 years ago | (#28568043)

You have no idea how much I could have used that when I was still in the academic arena. The head of our math/comp sci department dictated that all work be produced in LaTeX. Hours of my time wasted. Oh well, at least I know should I find myself in grad school. Thanks for the links! :)

Re:LaTeX (2, Informative)

Anonymous Coward | about 5 years ago | (#28568075)

I think you are overstating the enjoyment possible when using LaTeX :)

Re:LaTeX (0)

Anonymous Coward | about 5 years ago | (#28568235)

I see a lot of people knocking this guy for reinventing the wheel. But I don't get why. Let him take his crack at the problem and see what he comes up with. Let the market decide.

I didn't realise Slashdot had turned into a Microsoft convention.


Re:LaTeX (5, Insightful)

moosesocks (264553) | about 5 years ago | (#28568279)

Although I agree with you in that LaTeX is widely used in the scientific community, and unambiguously offers the best typesetting facilities you'll find outside of a publishing house, is it still appropriate today?

The internet as we know it was created at CERN to facilitate the sharing of scientific information. Why are we still publishing in a format designed to be presented on dead trees?

Like it or not, a properly-formatted print article looks horrible on a screen. An article formatted for printing on A4 or Letter-sized paper will use the whole width of the page, be set in 10-point type, and use columns. Unfortunately, modern computer screens don't have nearly enough resolution to display the full width of the page alongside much else. Obviously, PDF files also don't have the ability to flow to fit the width of the screen.

LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format. Things such as high-resolution figures and hyperlinked references would be particularly beneficial for academic uses. It'd also be great to be able to see all articles linking back to what you happen to be reading. (This brings up all sorts of questions about the very nature of scientific publishing, although this is another debate entirely)

Wikipedia (more specifically, MediaWiki) actually offers a promising solution to these (and the original poster's) requirements. It provides a convenient and simple markup for multi-sectioned articles, flows to fit the width of the page, and also provides LaTeX's fantastic mathematical typesetting facilities. Hyperlinking to other parts of the Wiki (and to external sites) is exceedingly easy. I'm sure the DOI system could be integrated to allow linking back to other articles within the constraints of the existing academic publishing regime.

Google could very easily provide the "glue" to hold such a system together, although it would ultimately be better to put a public, non-profit entity in charge. It's absurd and hypocritical that so much of academic research (particularly the publishing part of it) is profit-driven.

Re:LaTeX (3, Insightful)

AigariusDebian (721386) | about 5 years ago | (#28568361)

LaTeX describes the document, just like HTML describes it, but with more structure about it. What you see in a web browser is not HTML, it is a *rendering* of HTML. Different browsers render the same HTML differently, for example a mobile browser will skip stuff and reformat other stuff. In the same way there are different LaTeX renderers - some output PDF, some output HTML, some output ODF. It would be much easier to use LaTeX for the source document and then compile a pretty HTML file and a PDF file from it than to craft it in HTML or some other XML variant directly. You can have links in LaTeX that render as normal links in the HTML output. Why stress?
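To make the one-source, two-outputs idea concrete, here is a rough Python sketch. It assumes pdflatex and htlatex (from TeX4ht) are the tools in use, which is only one possible choice of converters; the function names and the file name paper.tex are made up for illustration.

```python
import shutil
import subprocess


def build_commands(tex_file):
    """Return the commands that turn one .tex source into PDF and HTML."""
    return [
        ["pdflatex", "-interaction=nonstopmode", tex_file],  # writes a PDF
        ["htlatex", tex_file],  # TeX4ht; writes HTML
    ]


def run_available(commands):
    """Run each command whose program is installed; return the ones that ran."""
    ran = []
    for command in commands:
        if shutil.which(command[0]) is None:
            continue  # tool not installed, skip it
        subprocess.run(command, check=True)
        ran.append(command[0])
    return ran


if __name__ == "__main__":
    # Only print the commands here; call run_available() once paper.tex exists.
    for command in build_commands("paper.tex"):
        print(" ".join(command))
```

The .tex file stays the single editable source, and both outputs are disposable build products.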

Re:LaTeX (2, Interesting)

nine-times (778537) | about 5 years ago | (#28568297)

On the other hand, .tex files don't render so well if you drop them into web browsers. I mean, which format you choose really should depend on what your needs are. If you want to be able to store a single copy that can be opened in a web browser or an office suite, HTML isn't necessarily a bad choice, but at this point you won't get great layout control. I really think it's reasonable to hope that as new versions of HTML and CSS come out, they should be aiming towards enabling people to have a good "print" media type CSS that gives professional layout results, but we aren't there yet. We aren't even really close.

If you want people who know what they're doing to be making/editing these documents, then LaTeX may be a good choice. If you want normal everyday people to be able to open the file in an office suite they're comfortable with, then ODF is worth considering. If you want a widely supported format only for display/printing purposes (no editing) and you want tight layout control, then you won't do better than PDF.

At this point, there is no format that does it all without any downsides. You have to pick the best tool for the job.

Re:LaTeX (1)

AigariusDebian (721386) | about 5 years ago | (#28568367)

HTML does not render so well if you drop it into an image viewer.

That is why there are LaTeX renderers. Or at least LaTeX->HTML renderers.

Congratulations! (5, Insightful)

Anonymous Coward | about 5 years ago | (#28567639)

Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!

Please step up to the spiked door in front of the acid pit to claim your prize.

Re:Congratulations! (5, Funny)

Memroid (898199) | about 5 years ago | (#28568393)

Speaking of HTML/CSS... can I be the first person to suggest that we rename "Anonymous Cowardon" back to "Anonymous Coward"?

PDF? (5, Informative)

sys.stdout.write (1551563) | about 5 years ago | (#28567645)

As much as I hate Adobe, there's a reason why PDF files dominate academia.

Re:PDF? (1)

jskora (1319299) | about 5 years ago | (#28568271)

The thing with books is that most folks can read and manipulate them. PDF is similar at this point: relatively ubiquitous and easier to use for most people.

LaTeX (1)

SanguineV (1197225) | about 5 years ago | (#28567649)

LaTeX: it has everything that you are looking for and can be easily compiled to PS, DVI, PDF, and (I am told but haven't used) HTML. It even plays nicely with version control, bibliography management (BibTeX), etc.

As a bonus you can run it on linux via command line.

wondering if we should let go of standard tags (1)

tjstork (137384) | about 5 years ago | (#28567657)

I am wondering if the whole concept of CSS modifying a set of stock tags is unwieldy, and if a simpler HTML might be one that allows you to first specify a page schema with custom tags, then renders those using CSS to define custom tags. So, instead of having pages with div class = "menu", we might have , etc.

Re:wondering if we should let go of standard tags (0)

Anonymous Coward | about 5 years ago | (#28567901)

I suggest you do a little research on XML and XSL.

Re:wondering if we should let go of standard tags (2, Insightful)

Hecatonchires (231908) | about 5 years ago | (#28568209)

"You know, if we abstract this back one level" Now we find the true terror of computer science.

CSS3 is the solution (5, Informative)

Tiles (993306) | about 5 years ago | (#28567673)

This is exactly what CSS is designed for: presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3 enabled browser, not a new specification.
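For what it's worth, the submitter's hypothetical {page ...} attributes map fairly directly onto an @page rule. A rough Python sketch of that mapping follows. It assumes the four border values are listed top, right, bottom, left, and that a name like "bottomleft" corresponds to the @bottom-left margin box; the function name page_rule is made up, and the "startfrom0" numbering option is left out.

```python
def page_rule(size="A4", borders="2.5cm,2.5cm,2cm,2cm", numbering="bottomleft"):
    """Build a CSS @page rule from the submitter's hypothetical page attributes."""
    # Assumption: borders are given as top,right,bottom,left, the same order
    # the CSS margin shorthand uses.
    margins = [part.strip() for part in borders.split(",")]
    if len(margins) != 4:
        raise ValueError("expected four comma-separated border values")

    # Assumption: "bottomleft" style names map onto the CSS margin boxes.
    margin_boxes = {
        "bottomleft": "@bottom-left",
        "bottomcenter": "@bottom-center",
        "bottomright": "@bottom-right",
    }
    box = margin_boxes[numbering]

    return (
        "@page {\n"
        f"  size: {size};\n"
        f"  margin: {' '.join(margins)};\n"
        f"  {box} {{ content: counter(page); }}\n"
        "}"
    )


if __name__ == "__main__":
    print(page_rule())
```

The resulting rule can sit in a style element in the same HTML file, so the single-file requirement still holds. Whether the margin box is actually honored depends on the renderer.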

Re:CSS3 is the solution (1)

rs79 (71822) | about 5 years ago | (#28568067)

Nice. So there's 3 ways to skin a cat.

Normally I'd say use PDF cause the entire planet seems to use PDFs to get the page numbers right but acrobad became such an obnoxious pig in the last couple of releases I regard PDF now more as a warning than anything else.

And yes, stupid new spell checker in Opera, I meant to say "acrobad". Stop underlining it damn you.

Re:CSS3 is the solution (1, Informative)

Anonymous Coward | about 5 years ago | (#28568205)

Why not use a proper PDF viewer, like one derived from the OSS poppler stack? I find xpdf to be a valuable tool on Linux, and prefer it with PDF docs to my old (very old!) standard of ghostview with PS docs. (PostScript being what we academics used to exchange print-media formatted papers before PDF appeared.)

I rather like the newer LaTeX modules that can be enabled when producing PDF outputs, so that tables of contents and references are hyperlinked in the PDF document, while still providing a print-quality typeset document for the audience.

Even for less academic technical documents, I find most reading/reviewing collaborations regarding documents are much better served by "let's now discuss page 74" rather than trying to navigate everyone via section labels, paragraph counts, etc. while they look on in their own re-flowed views.

Re:CSS3 is the solution (1)

Hecatonchires (231908) | about 5 years ago | (#28568221)

I've switched from Adobe to Foxit Reader for my PDF viewing. Much much quicker to load. You do need to uninstall a toolbar tho, which I didn't like.

Re:CSS3 is the solution (1)

Agent ME (1411269) | about 5 years ago | (#28568385)

You need to uninstall a toolbar? I'd call that a plus on most systems I see with too many Yahoo/Google/MSN toolbars installed on the browser.

ODF (2, Insightful)

minsk (805035) | about 5 years ago | (#28567675)

LaTeX already got mentioned, and probably makes more sense.

If you really want an unreadable super-general XML-based format, use ODF.

Why not use CSS? (1)

mckinnsb (984522) | about 5 years ago | (#28567685)

I don't seem to understand why you couldn't simply change the properties of standard HTML tags to fit your needs with a simple CSS sheet. HTML, after all, was designed with the explicit purpose of representing a document.
Otherwise, if you want special tags, use LaTeX.
Otherwise, I'm sorry, it's really a crazy idea.

Have you looked at PrinceXML? (3, Informative)

sandford (1577271) | about 5 years ago | (#28567697)

Is there a reason you don't want to use CSS? Because there are already CSS extensions that do exactly what you want. The book Cascading Style Sheets - Designing for the Web was written using only HTML and CSS and prepped for printing using PrinceXML. The PrinceXML web site has a bunch of similar HTML+CSS samples, including academic papers.

Re:Have you looked at PrinceXML? (2, Interesting)

ccvqc (307904) | about 5 years ago | (#28567909)

I've had great experience with PrinceXML -- same document to generate both interactive web page and printable PDF using CSS3 tailored to the media. If you already know HTML/CSS, extending yourself to CSS3 is a lot easier than learning LaTeX.

Mod parent up (1)

bluej100 (1039080) | about 5 years ago | (#28568313)

If you want to print HTML, Prince is the way to go. It even makes our end-user-generated TinyMCE documents look good.

Why not use CSS? (1)

Homburg (213427) | about 5 years ago | (#28567705)

Something that would make no use of CSS?

Given that CSS does this already, what's the advantage of adding another way of doing it without CSS?

Wrong, in many ways (5, Insightful)

Zaffle (13798) | about 5 years ago | (#28567721)

What you want (being able to define pages) is wrong in many many ways.

You should, as an authoring tool, never define a page, or its dimensions, especially academic works, which will be printed in different formats, on different paper (A4/Letter/Tradeback/etc/etc)

At most, whatever markup you have may define things like page breaks, but even then, they are more a typesetting issue.

What you want is either LaTeX or DocBook.

Re:Wrong, in many ways (4, Funny)

Ambiguous Coward (205751) | about 5 years ago | (#28568065)

You should, as an authoring tool, never...

Who're you callin' a tool?

Static Page Feeds are available (4, Informative)

caffiend666 (598633) | about 5 years ago | (#28567729)

Static configurations are available already, not the intelligent ones being requested. Has sufficed for what I needed:

To force a page break when printing, add: <p style="page-break-before: always">

Also, to hide odd font and underline for links:

<STYLE TYPE="text/css" MEDIA=print> <!-- A { text-decoration: none; color: black } --> </STYLE>

Yes, they have to be massaged a little.
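The static approach is easy to script if the document is assembled from parts. A minimal sketch in Python, assuming each chapter is already an HTML fragment; the function name join_with_page_breaks is made up for illustration.

```python
def join_with_page_breaks(chapters):
    """Wrap HTML fragments in divs so that each one after the first
    starts on a new printed page."""
    parts = []
    for index, fragment in enumerate(chapters):
        # The first chapter needs no break; every later one gets the same
        # page-break-before declaration as the <p> example above.
        style = ' style="page-break-before: always"' if index else ""
        parts.append(f"<div{style}>{fragment}</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(join_with_page_breaks(["<h1>Intro</h1>", "<h1>Methods</h1>"]))
```

This only fixes where breaks are forced; where the text breaks inside a chapter is still up to the browser and printer.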

Re:Static Page Feeds are available (1, Insightful)

Anonymous Coward | about 5 years ago | (#28567761)

1995 called, they want their old HTML back.

Seriously, no browser has needed the HTML comment stuff inside of style tags in many years. And don't even get me started on the uppercase tag names...

hey, why don't we... (2, Insightful)

Anonymous Coward | about 5 years ago | (#28567759)

create yet another little-used and poorly supported document format...

unnecessary (0)

Anonymous Coward | about 5 years ago | (#28567763)

the document size will increase.
normal text like ("this is my file and my image and my link and my e-mail")
will need more tags and elements to let the browser speak with it.
e.g ("this is mymy file") and so on.

this is just an unnecessary waste of bandwidth and time :).

especially when there are alternative solutions, e.g. PDF, DOC, OpenOffice.


Nope (1)

nlawalker (804108) | about 5 years ago | (#28567771)

HTML describes a document. "Document" used to imply printed pages, but it doesn't anymore. HTML doesn't have anything to represent the notion of a page because documents don't have pages.

Re:Nope (3, Informative)

nlawalker (804108) | about 5 years ago | (#28567785)

I should have been more clear: HTML describes the *structure* of a document, of which pages are not a part.

As many have said above, you could use CSS if you really wanted to, since page specifications are presentational aspects of the document. Or, you could use LaTeX, which is designed for this kind of use.

Not a bad idea but it points to a larger issue (1)

jeffgtr (929361) | about 5 years ago | (#28567787)

Actually this makes a great deal of sense to me. I'm not sure on this but I think HTML5 contains tags for many of the things needed. I don't think CSS is the answer though, as it is for presentation only. HTML is for hierarchy and structure of information, as is XML. The part that makes sense with this is that it would be standardized (if you can keep Microsoft out of it) and could easily be transitioned back and forth between the web, ebooks and whatever device came next. PDF is widely used but truly it is a pain to convert into a structured document. Word is a nightmare with all of the jumbled up MS proprietary tags. I've yet to see an online editor that will clean up that mess with a simple copy and paste.

The real issue is standardization in the way we store textual information. It's a huge issue and frankly Microsoft needs to be called on the carpet for manipulating and at the very least getting in the way of standards. Its refusal to recognize standards has caused needless expense to anyone that publishes information on the web. Few people realize the damage MS has caused on the web. Everyone bitches and moans about their operating system, but only those directly involved in creating content for the web seem to complain about IE and their corruption of a standardized open document format. The damage they have done in this arena will haunt us longer than Windows will, in my humble but sincere opinion.

You don't actually want HTML (2, Insightful)

Anonymous Coward | about 5 years ago | (#28567819)

Seriously. It's pretty bad. You can, however, use DocBook (or your own schema, or DocBook extended with your own stuff) and XSLT it into XHTML (or something entirely different) at the end.

Most likely you just want to use LaTeX though.

what do you want to do? (4, Informative)

jipn4 (1367823) | about 5 years ago | (#28567823)

If you want to save the source form or markup, use a language designed for it: LaTeX. LaTeX lets you represent all the things you would want to represent in an academic paper, it's fairly readable, very widespread, and has tons of tools. And LaTeX converts to both HTML and PDF.

If you want to display on the web, use HTML. It's meant for the web. It's not a good representation for paged media. If you must represent paged media, you need to use CSS or XSL, but you probably don't want to.

If you want archival quality paged representations, PDF is the only game in town really. HTML with CSS doesn't come close. But it doesn't make sense to save your own papers only in PDF because PDF is not really editable and doesn't have the semantic information.

page breaks with css (0)

Anonymous Coward | about 5 years ago | (#28567829)

From JavaScript Site: Page breaks with CSS

Don't use HTML (3, Informative)

emandres (857332) | about 5 years ago | (#28567837)

You wouldn't want to use HTML for something like this, especially with newer versions of HTML. There has been a steady transition in HTML away from specification of the aesthetic appearance of a page. For this reason tags like <font> and <center> are now deprecated, mostly because CSS does a way better (and cleaner) job of it.

Re:Don't use HTML (3, Insightful)

Wonko the Sane (25252) | about 5 years ago | (#28567907)

HTML was never supposed to do those things in the first place. The tags you are referring to were hacks invented because CSS did not exist yet.

Unfortunately there is a whole generation of "web developers" who don't understand the concepts of semantic markup and output device-independent layouts.


sgrover (1167171) | about 5 years ago | (#28567841)

I use XML/XSL to render my content as needed, including images and SVG graphics where needed. Then I use the FOP [] project to convert the generated XSL-FO into PDF. Works great and can be scripted easily. But the learning curve is kinda steep. Luckily there are a few tutorials [] out there.


Lil'wombat (233322) | about 5 years ago | (#28567935)

I 2nd the XSL-FO recommendation.

XML is like violence. If it doesn't solve your problems, you're not using enough.


caerwyn (38056) | about 5 years ago | (#28568307)

Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.

Anonymous Coward (0)

Anonymous Coward | about 5 years ago | (#28567873)

Yeah, seriously? This is not a valid Slashdot article. PDF and numerous other formats exist for a reason. Why reinvent the wheel? There were no reasons stated in this article why any of the other, very popular open standards for documents couldn't be used.

Ugh... who submits these articles?

In my day (2, Insightful)

Barny (103770) | about 5 years ago | (#28567885)

I used Netscape Communicator to write all my papers for uni, mainly because it was available under Windows and Unix (IRIX in our case) and its output could be read by anyone on any platform.

It was a reasonably easy to use editor, without all the useless crap most others have.

A few lecturers were quite impressed with the idea, the portability and cost were big factors.

Re:In my day (0, Flamebait)

Antique Geekmeister (740220) | about 5 years ago | (#28567921)

And the rest of us used Emacs, just to get the indenting consistent and make sure we closed our parentheses correctly. But the amount of time people waste on page breaking where they want, font selection, "just so" footnote standards, etc. is a sign of people who don't have anything to actually say.

The exception is people who make visual illusion picture books. Other than that, let's get over our "web designer", IDE driven fascination with layout, and use a straightforward plain text format. Then get on with writing something worth reading, not something to be treasured for its footnote layout.

Re:In my day (1)

Barny (103770) | about 5 years ago | (#28568143)

get on with writing something worth reading, not something to be treasured for its footnote layout.

That's the main reason I used it. I could just open it up on any of the machines they had there (at the time, Netscape was the main browser used on campus) and type without having to think "am I in the right font?" or stuff around with paragraph setups and crap that Office used.

Of course at the time I was reasonably new to unix so it was something I was comfortable with and knew about already.

Re:In my day (1)

brusk (135896) | about 5 years ago | (#28567951)

How did you handle footnotes? Page numbers?

Re:In my day (1)

Lemmy Caution (8378) | about 5 years ago | (#28568169)

Academic citation guidelines allow you to cite paragraph numbers instead of page numbers.

Re:In my day (1)

Barny (103770) | about 5 years ago | (#28568189)

Select everything you want on one page, wrap it in a table tag, set table to 100% of screen height and leave a separate row at the bottom for page numbers (and a second for footnotes if there were any).

Re:In my day (1)

AigariusDebian (721386) | about 5 years ago | (#28568387)

You should be shot for that. Shot to death.

XSL:FO (4, Informative)

Roxton (73137) | about 5 years ago | (#28567959)

There's a little-used standard that came out of the W3C along with XSLT called XSL:FO. You write your document in XSL:FO markup, and then use one of any number of processors like XEP [] to convert it into PDF or what have you. []

One of the original purposes of it was so that you could use XSLTs to transform the same XML data into both XHTML or XSL:FO for publishing. The standard never took off though. XSL:FO just doesn't have enough options to be typographically interesting, compared to SVG.

Of course, the right answer is LaTeX, but you might want to give XSL:FO a try for familiarity's sake.

Re:XSL:FO (0)

Anonymous Coward | about 5 years ago | (#28568335)

We can go back and forth in time through the specifications and find all sorts of interesting ones, but the question is whether they will be portable. That is, will common software decode them as users expect? HTML is derived from SGML, which is derived from IBM's GML, which, like runoff, is a document preparation system and will do everything the OP wants. The disconnect we are seeing here is likely between the desire for a WYSIWYG fixed presentation across platforms, a la Adobe PDF, and the reality that most WYSIWYG editors don't give a user that level of control. For instance, very few web browsers pass Acid 3, and I don't know of any that pass Acid 4, tests that guarantee that things will always look the same.

XHTML is a good answer to the original question, however, as it will allow the bells and whistles needed for an academic document. However, as the question was ill posed (it suggested a solution rather than stating a general problem to be solved with constraints), there is no way an ideal solution will be achieved.

Prince XML (1)

cwt137 (861631) | about 5 years ago | (#28568063)

I think writing papers using XHTML and CSS 2.1 or 3 is a good idea. Then you can use Prince XML to convert it to PDF. Their site has a nice sample [] or two of journal articles / conference papers. The quality of the renderer is great. It was even used to create a professional book, Cascading Style Sheets: Designing for the Web [] .

Docbook, definitely (1)

ishmalius (153450) | about 5 years ago | (#28568073)

It has exactly what you need, an html-like format, but tagged by meaning, not presentation. The project has tools to convert it to printable formats.

The spec: []

The tools: []

Use DITA (4, Informative)

wooden pickle (1006975) | about 5 years ago | (#28568087)

Someone mentioned XML/XSL/FO. Don't try to write your content in XSL-FO. You'll hate every minute of it.

I'd look into using DITA (Darwin Information Typing Architecture). It's a set of canned XML structures, plus a specification for how to process and customize those structures. It includes tags for stuff like footnotes... I bet it covers a lot of your use cases. There are some good intros to how these XML structures work here: []

As DITA is XML, you can convert it to HTML and whatever else you feel like, pretty easily. There's an open-source implementation of the DITA spec called the DITA Open Toolkit. The DITA Open Toolkit includes stylesheets/scripts to publish HTML and PDF, among other things. PDFs are published via XSL-FO. Just like HTML needs a web browser to render something useful, XSL-FO requires an FO processor to create a PDF. So, in the end you write DITA, XSLT and other scripts transform that DITA to XSL-FO, then an FO processor consumes the XSL-FO and spits out a PDF. The DITA Open Toolkit comes with an open-source FO processor (Apache FOP). FOP doesn't fulfill everyone's needs, but it might work very well for you.

Unfortunately, working with the Open Toolkit and customizing its output can be a bit unwieldy. [] is a pretty good place to look for help.

RDFa to model bibliographical data (1)

TwistedPants (847858) | about 5 years ago | (#28568115)

Don't reinvent, as so many have already said. CSS works for print media, LaTeX works wonderfully, PDFs work wonderfully. RDFa lets you really define the semantics of anything (people, businesses, bibliographic data) in a workable way.

LyX + LaTeX ... DUH! (1)

WolphFang (1077109) | about 5 years ago | (#28568161)

LyX + LaTeX ... DUH! It even makes it easy to take public domain OCR'ed books and reset them into something extremely nice.... *quickly*

texexplorer (4, Interesting)

e**(i pi)-1 (462311) | about 5 years ago | (#28568163)

Yes, LaTeX is nice, but it would be even better if basic TeX were understood by browsers. About 10 years ago, IBM had a cool plugin called techexplorer. The plugin would compile LaTeX on the fly. No need to publish a PDF. It worked pretty well for basic documents that did not rely on macros.

Still, to address the question of the submitter, it would be nice to have something like

$\int_0^1  \frac{\sqrt{\sin(x)}}{1+x^2} \; dx$.

It would not have to be the full LaTeX stack, just the ability to place mini LaTeX pages into HTML documents. It's a pity the techexplorer technology seems to have disappeared. If IBM would open-source it, it could become an add-on for Firefox.

Re:texexplorer (1)

Hecatonchires (231908) | about 5 years ago | (#28568249)

+1 neat idea

Re:texexplorer (0)

Anonymous Coward | about 5 years ago | (#28568435)

Take a look at the XKCD forums [] . It's not exactly what you're looking for, but it's close.

A solution requires a problem (1)

carlzum (832868) | about 5 years ago | (#28568185)

What is it you're trying to accomplish? Non-standard HTML is certainly not a solution for whatever printing problem you're having, and it eliminates the benefits of HTML. Listen to everyone else that's responded. LaTeX solves most gripes people have with word processors, stick with CSS if you have a compelling reason to use HTML, and look into Docbook XML if you're not happy with the first two options.

If you want to use HTML just to prove it can be done, go for it if you think it sounds fun. But if you're serious about using it for publishing, forget it. No one's going to accept a homegrown HTML file for printing.

Which paper size? (1)

TalkingToes (244859) | about 5 years ago | (#28568195)

What size paper would we all agree upon? You listed "A4", I like "Letter". Close in size, but different. Get the world to agree, and maybe you have your wish one step closer. I'd not vote for "Business Card" sized.

yes, it's a crazy idea. (1)

porky_pig_jr (129948) | about 5 years ago | (#28568227)

This is exactly what HTML was *not* intended to be. We're talking about viewing a document with different browsers. No standard display is guaranteed, no matter what you try. For academic documents, use software like LaTeX and create a PDF file, or use Microsoft Word and create a .doc file, or whatever. I remember reading a discussion somewhere of why LaTeX cannot be mapped exactly to HTML (maybe it was the TeX FAQ, not sure), and that was pretty much it. Different goals in either case.

Editors should be ashamed. (0)

Anonymous Coward | about 5 years ago | (#28568229)

As mentioned by everyone else in this thread, LaTeX is exactly what you're looking for. HTML is absolutely not, and should never be made into, a page description language.

The editor of this Slashdot summary should be ashamed for not being familiar with LaTeX, one of the greatest open source projects.

Learn the tools first, then worry about changing (2, Informative)

crmartin (98227) | about 5 years ago | (#28568233)

See, as someone has already pointed out, there's at least one such tool that's in wide use already: TeX and LaTeX. If you don't like that one, it turns out that HTML, with CSS and a little bit of Javascript, is perfectly capable of doing all the things you want, too. You just have to learn how. Have a look at Lie's Cascading Style Sheets: Designing for the Web [] (written and typeset in HTML/CSS) and at Prince XML [] for detailed examples.

Themes (0)

Anonymous Coward | about 5 years ago | (#28568325)

I had the same idea as the OP, while looking I found LaTeX and I find it quite perfect for writing pretty much anything, however there is one point which makes it mostly unusable for normal people: themes.

While writing in LaTeX is easy and powerful, in order to theme (typeset?) a document you have to suffer quite a bit: read docs, learn lots of stuff, etc. I believe what the OP wants is to be able to write documents easily (HTML) and also to create a presentation easily (CSS). Think about it: CSS is easy, simple, and clean, and it could be an awesome companion to something like LaTeX or any other markup language. There are a lot of styles for LaTeX that let you create many kinds of documents, but when you want to customize some part of the presentation (like adding a section with a little image to the right and a yellow border) you are in a world of hurt.

I have yet to find an easy way to create print documents and have a good control over the presentation. So far the closest thing are word processors, but I hate the broken visual editing (I prefer to stick with good old code syntax).

You're on the right track, for the wrong reason. (2, Informative)

mellon (7048) | about 5 years ago | (#28568327)

The ability to cite an HTML document is something that would indeed be useful. The ability to hard code page numbers into an HTML document isn't. The reason why academia and the press have been so resistant to HTML, historically, is that you don't get any control over page layout. Which means that you can't refer to things by page number.

The solution isn't to fix HTML so that you can number pages. It is to fix the bibliographic references to not use page numbers. Generally speaking, it's not hard to number documents by section, and you can make the numbering fine-grained enough for bibliographic references. Then refer to the chapter and section, rather than the page number in your bibliography, and you're done. No need to "fix" HTML.

It might make sense to ID paragraphs in HTML, so that you could simply refer to the paragraph ID in your bibliography. If this were simply document metadata, and didn't have anything to do with layout, it would work pretty well. As a bonus, you wouldn't need to renumber, because the ID would just be an arbitrary cookie, and wouldn't need to make sense to a human.

Of course, with hypertext, there's really no need for a bibliography anyway. Just link to the text you're referencing... But I realize that that's impractical in academia at the moment. I'm just saying...
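
For what it's worth, the paragraph-ID idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the input is well-formed XHTML without namespaces, `add_paragraph_ids` is a made-up helper, and the ID is a short hash of the paragraph text. Paragraphs that already carry an ID are left alone, so existing citations keep working even if the text is later edited.

```python
import hashlib
import xml.etree.ElementTree as ET


def add_paragraph_ids(xhtml):
    """Give every <p> that lacks an id a short, content-derived one."""
    root = ET.fromstring(xhtml)
    for p in root.iter("p"):
        if p.get("id") is None:
            # Hash the visible text; the result is an opaque cookie,
            # not something a human needs to read.
            text = "".join(p.itertext())
            digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
            p.set("id", "p-" + digest)
    return ET.tostring(root, encoding="unicode")


doc = "<div><p>First claim.</p><p id='keep'>Second claim.</p></div>"
result = add_paragraph_ids(doc)
print(result)
```

A citation then becomes the document URL plus `#p-...`, with no page number involved.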

Learn the truth about Slashdot. (0)

Anonymous Coward | about 5 years ago | (#28568345)

Behold Anti-Slash [] , the jihad HQ for the holy war against the Slashdot hive-mind. See our extensive documented failures of Slashdot, and make today the last day of being a robot.

You want more tags? You want XML. (1)

jrharshath (1386973) | about 5 years ago | (#28568355)

Since HTML won't add new tags for you, you could write your paper as XML and use a stylesheet to display it in whatever fashion you want. That way you could have a "one column stylesheet", a "two column stylesheet", etc. formatting the same XML document in your favourite way of presenting it :)
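
A rough sketch of what I mean, with a plain Python function standing in for a real XSLT stylesheet (the standard library has no XSLT processor, so this is a swapped-in technique, not XSLT itself). The `<paper>` vocabulary and the `render` function are invented for illustration; the only thing that changes between "stylesheets" is the CSS `column-count` value.

```python
import xml.etree.ElementTree as ET

# A made-up semantic vocabulary: the source says nothing about layout.
PAPER = """<paper>
  <title>On Paged Media</title>
  <section><heading>Intro</heading><para>HTML has no pages.</para></section>
</paper>"""


def render(paper_xml, columns):
    """Turn the same <paper> source into HTML with the requested column count."""
    paper = ET.fromstring(paper_xml)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "style").text = "body { column-count: %d; }" % columns
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = paper.findtext("title")
    for section in paper.iter("section"):
        ET.SubElement(body, "h2").text = section.findtext("heading")
        for para in section.iter("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


one_col = render(PAPER, 1)
two_col = render(PAPER, 2)
print(two_col)
```

The content is written once; swapping the presentation never touches the XML.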

Re:You want more tags? You want XML. (1)

theillien (984847) | about 5 years ago | (#28568395)

Seconded. This is the sort of custom application that XML was created for.

Universality of HTML (0)

Anonymous Coward | about 5 years ago | (#28568381)

You can embed CSS in HTML pages; that should do what you want, if you have another way of dividing up the right amount of information per page.

Although this is slightly more complicated, I'd look to an XML/XSLT/CSS solution instead. It would enable you to take a source document, split it into pages by paragraph or size, and then format those pages, all while keeping the raw data in XML in the case the user wanted to use another reader.

Re:Universality of HTML (1)

theillien (984847) | about 5 years ago | (#28568417)

You can embed CSS in HTML pages; that should do what you want, if you have another way of dividing up the right amount of information per page.

Please don't encourage people to do things the wrong way. Stylesheets are for style and markup is for content. Style should only be applied as attributes using the stylesheet definitions. Encouraging people to embed their style in their markup is a step backward.

He wants us to reverse-engineer SGML from HTML? (1)

kenh (9056) | about 5 years ago | (#28568405)

SGML pre-dated HTML; in fact, HTML is (in many ways) an application of SGML.

I suspect the poster never heard of SGML, or its predecessor, GML.

Here's a link to a good book on the subject in Google Books: The SGML Book []

There is also DocBook and LaTeX.

seriously? (0)

Anonymous Coward | about 5 years ago | (#28568419)

Why are obvious trolls being posted as if they were serious questions?
