Slashdot: News for Nerds

What's in Your HTML Toolbox?

Cliff posted more than 7 years ago | from the utilities-of-the-trade dept.

Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed-case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS-neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"

192 comments

What's in your... (3, Funny)

AKAImBatman (238306) | more than 7 years ago | (#16035457)

So what's in YOUR toolbox?


CAPITAL ONE!

[...]

Wait, what was the question again?

The tools I use. (-1, Redundant)

Anonymous Coward | more than 7 years ago | (#16035591)

vim to edit the XHTML, the W3C HTML validator to check its correctness, and Konqueror to test how it looks.

Re:The tools I use. (0)

Anonymous Coward | more than 7 years ago | (#16035615)

But, but do you have a Capital One card?!?

Re:The tools I use. (0)

Anonymous Coward | more than 7 years ago | (#16035668)

vim to edit the XHTML, the W3C HTML validator to check its correctness, and Konqueror to test how it looks.

Re:The tools I use. (0)

Anonymous Coward | more than 7 years ago | (#16036035)

But, but do you have a Capital One card?!?

FTW (1)

mr_stinky_britches (926212) | more than 7 years ago | (#16035464)

Dreamweaver FTW! It would be a huge timesaver in this situation.

Good luck!

Re:FTW (2, Informative)

mr_stinky_britches (926212) | more than 7 years ago | (#16035486)

You can also do batch file processing with vim. Open the files with vim *.match.files.* and then, once in vim, run :argdo %s/^M//ge | w to remove the funky Windows line endings (mind you, ^M is a single character, typed as ctrl-v ctrl-m in vim).
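For anyone who would rather script this outside an editor, here is a minimal Python sketch of the same line-ending cleanup. It assumes the pages match `*.html` and that rewriting them in place is acceptable; the function names `normalize_newlines` and `normalize_tree` are made up for illustration.

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    # Handle CRLF first so the lone-CR pass does not turn it into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_tree(root, pattern="*.html"):
    """Rewrite every matching file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        original = path.read_bytes()
        fixed = normalize_newlines(original)
        if fixed != original:
            path.write_bytes(fixed)
            changed += 1
    return changed
```

Working on bytes rather than text avoids guessing each file's encoding. Run it on a copy of the site first.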

Obligatory (2, Informative)

hahafaha (844574) | more than 7 years ago | (#16035508)

> :argdo %s/^M//ge | w to remove the funky Windows line endings (mind you, ^M is a single character, typed as ctrl-v ctrl-m in vim).

Or, in emacs

M-% (AKA Meta(usually Alt)-Shift-5)
Query Replace: ^M with [nothing] :-)

P.S. Note that ^M is not Caret-M. It is a single character. I usually just copy it out of the file, and then do it in emacs.

Re:Obligatory (2, Informative)

mr_stinky_britches (926212) | more than 7 years ago | (#16035550)

Or, in emacs
M-% (AKA Meta(usually Alt)-Shift-5)
Query Replace: ^M with [nothing] :-)

Question for you: how would you do that across multiple files in emacs?
The global search and replace command using vim on a single file would be simply:
:%s/^M//g

Re:FTW (1)

Southpaw018 (793465) | more than 7 years ago | (#16035598)

Eek. I'd say avoid Dreamweaver at all costs. It causes exactly these kinds of problems. And, if not configured correctly, it can even put out malformed code itself.

Debugging HTML/PHP/etc files: UltraEdit-32 [ultraedit.com] . $40 for the single best Swiss Army Editor I've ever found. In combination with Tidy (which it has 100% integrated in the interface), it can handle every file-related problem mentioned except for the names themselves. Out of the box, it can do everything from line ending conversion to your standard syntax highlighting (though it doesn't have as many languages out of the box as Scintilla) to assisting you with actually correcting the HTML errors.

Re:FTW (1)

mr_stinky_britches (926212) | more than 7 years ago | (#16035659)

Only UltraNoobs use UltraEdit. You need to spend some time learning VIM (http://www.vim.org [vim.org] ) and you'll never look back.
Cheers.

Re:FTW (2, Interesting)

Baricom (763970) | more than 7 years ago | (#16035764)

Agreed. I dismissed people who kept suggesting vim as "crazy UNIX people." I still felt that way about a week into playing with it, but soon after, I realized how powerful it is once you've figured out how the keystrokes work. Since then, I've used vim on every computer I've worked with and gvim (the GUI-enhanced version of vim) is my primary editor on my Windows box.

vim has excellent syntax highlighting, predictive typing, line numbers, search and replace (with regular expressions), code folding, spell-check, built-in help, and more.

Give yourself two weeks with an open mind, and you might be surprised by it. The easiest way to get started is to type vimtutor from almost any shell account.

Macros (0, Flamebait)

soloport (312487) | more than 7 years ago | (#16036264)

Can't even consider vim because of the macro capability in emacs. I have remapped Ctrl-z to be equivalent to 'Ctrl-x e' (repeat last macro -- since I don't use 'suspend', the normal Ctrl-z function). Then I can record a macro ('Ctrl-x (' and type *anything* then close with 'Ctrl-x )') and use Ctrl-z to rapid-repeat the last macro. Makes repetitive editing very efficient. Can also do 'Ctrl-u 50 Ctrl-z' to repeat a macro 50 times, etc.

I'd move to vim if it had similar ease with macro creation / execution. Does it? Huh? Well, does it? Come on, preach it, brother! Make me a vim believer!

Re:Macros (3, Informative)

Mad Merlin (837387) | more than 7 years ago | (#16036338)

Can't even consider vim because of the macro capability in emacs. I have remapped Ctrl-z to be equivalent to 'Ctrl-x e' (repeat last macro -- since I don't use 'suspend', the normal Ctrl-z function). Then I can record a macro ('Ctrl-x (' and type *anything* then close with 'Ctrl-x )') and use Ctrl-z to rapid-repeat the last macro. Makes repetitive editing very efficient. Can also do 'Ctrl-u 50 Ctrl-z' to repeat a macro 50 times, etc.

I'd move to vim if it had similar ease with macro creation / execution. Does it? Huh? Well, does it? Come on, preach it, brother! Make me a vim believer!

q<register> to record a macro, q to finish recording. Execute the macro with @<register>, then you can execute it again with @@. Obviously the @ commands can be prefixed with a number to repeat them that many times, 5@@ would repeat the last macro 5 times, for example.

Re:Macros (1)

Baricom (763970) | more than 7 years ago | (#16036347)

I've heard that these discussions can be dangerous [wikipedia.org] ... ;)

My main complaint about emacs (I tried it for about a month) was the key structure. I didn't like holding down Ctrl whenever I want to do something - I prefer vim's modal command system. I could see how it could annoy some people, however.

I honestly haven't found the need for particularly sophisticated macros while I'm editing. The . (repeat last command) and ! (pipe) keys have always been enough for what I need.

I'm still learning vim, but I like what I've seen so far.

windows... (1)

Lord Prox (521892) | more than 7 years ago | (#16036291)

Try Notepad++. Syntax highlighting for HTML, PHP and JS, conversion between Windows/Unix line endings, macros, a hex editor, an HTML tidy-ier-upper, and more. Lots o' nifty stuff and it's OSS.



Place a curse on Microsoft [i-curse.com]

Re:windows... (1)

Lord Prox (521892) | more than 7 years ago | (#16036298)

Damn, I forgot the Link [sourceforge.net]

Re:windows... (1)

antic (29198) | more than 7 years ago | (#16036449)

Do any of these options have an inbuilt FTP application so I can edit files on a live server? (No "editing live is suicide" responses please, I don't have time.)

Re:FTW (1)

G-funk (22712) | more than 7 years ago | (#16036172)

jEdit is your friend!

Re:FTW (1)

afd8856 (700296) | more than 7 years ago | (#16036400)

Amen brother. Although for python coding I've been using Eclipse with pydev a lot lately.

Re:FTW (1)

Jaruzel (804522) | more than 7 years ago | (#16036473)

Debugging HTML/PHP/etc files: UltraEdit-32. $40 for the single best Swiss Army Editor I've ever found.


Seeing as you got flamed for this opinion, I thought I'd help you out.

UltraEdit-32 is damn good, I'm sure it's not as slick as some of the other Editors out there, but it has good syntax highlighting, tabs, and the ability to run macros or spawn sub processes and capture the output. Yes you have to put a bit of work in to get it how you like it, but overall it fits the bill, and if you can actually write code without needing noobie popup helps all the time, then it's a good editor.

Yes, an Editor. It's NOT an IDE. Good for many things, although master of none. Your mileage may vary however.

-Jar.

Re:FTW (1)

ozbon (99708) | more than 7 years ago | (#16036609)

Rather than UltraEdit32, I always swear by TextPad [textpad.com] . It's just about the first thing I install on any computer I'm working on - hell, I even paid for a license!

OK, most of the time it's just minor-annoyance nagware, but I figured, I use it so much, might as well pay them for some kind of use.

The only other thing I absolutely swear by for HTML/CSS is BradSoft's "TopStyle Editor" for CSS. Yeah sure, I can use a text editor for the same thing, but TopStyle makes my life easier.

Perl (2, Informative)

hahafaha (844574) | more than 7 years ago | (#16035465)

I know many of the geeks out there have forsaken Perl, but it is still, in my opinion, an indispensable tool. I am currently fixing up a website similar to the one you described, especially in terms of the HTML problems. Write a Perl script to fix capitalization, closing of tags, etc. But understand that if code is not written well to begin with, then in many cases it is impossible to automate the process of fixing it. You are going to have to do some things by hand.

Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.
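The capitalization fix is one of the parts that can be scripted. Below is a rough sketch in Python rather than Perl, under loud assumptions: the target convention is all-lowercase names, links live in plain `href`/`src` attributes, and the site sits on a case-sensitive filesystem. The names `lowercase_local_links` and `lowercase_names` are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import re
from pathlib import Path

# href="..." or src="...", with either quote style and any capitalization.
LINK_ATTR = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE | re.DOTALL,
)
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def lowercase_local_links(html):
    """Lowercase the path part of relative links; leave everything else alone."""

    def fix(match):
        url = match.group("url")
        # Absolute URLs, mailto: links and in-page anchors are not ours to rename.
        if HAS_SCHEME.match(url) or url.startswith(("#", "//")):
            return match.group(0)
        # Only the path is lowercased; query strings and fragments keep their case.
        path, rest = re.match(r"([^?#]*)(.*)", url, re.DOTALL).groups()
        return match.group("attr") + match.group("q") + path.lower() + rest + match.group("q")

    return LINK_ATTR.sub(fix, html)


def lowercase_names(root):
    """Rename files and directories under root to lowercase; return the renames."""
    renamed = []
    # Deepest paths first, so a directory is renamed only after its contents.
    paths = sorted(Path(root).rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        target = path.with_name(path.name.lower())
        if target == path:
            continue
        if target.exists():
            # Two names differ only by case; leave both for a human to sort out.
            continue
        path.rename(target)
        renamed.append((path, target))
    return renamed
```

A regex is a blunt instrument for HTML, so links built by scripts or hidden in unusual attributes will be missed and still need a manual pass.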

Re:Perl (1)

Somatic (888514) | more than 7 years ago | (#16035618)

> Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.

I'll second that. If you have your own system of page templates, CSS, etc, just junk all of the old code entirely. Think of how easy it is to paste text into your own (working) template vs. how hard it would be to go through and manually correct every mistake in a large website.

The tagless text could be gotten easily enough via a Perl script. A script smart enough to grab exactly what you want and put it where you want would be impossible, so you'd want to oversee the process yourself, obviously, but it would become a whole lot simpler.
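As a rough illustration of the "get the tagless text" step, here is a sketch in Python instead of Perl, using only the standard library's `html.parser`. The choice of which tags count as block-level and the name `html_to_text` are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring markup, scripts and stylesheets."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Start a new line at block boundaries, even when end tags are missing.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Because line breaks are keyed on start tags, paragraphs with no closing tag still come out on separate lines. Deciding where each piece of text belongs in the new template remains a manual job.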

I can't even find and correct the mistakes in one of my own pages. A whole site, written by someone else (who didn't know what they were doing)? Forget it. If you have a job where you are being trusted with large tasks like that, then you must have your own system for designing sites by now. Start from scratch, toss out everything but what the user needs to see, and stick it in your own framework.

Tidy or Meyer (4, Informative)

hedronist (233240) | more than 7 years ago | (#16035477)

There are two approaches: live with it and make as few changes as possible, or bite the bullet and do a complete rebuild. To do a cleanup, check out tidy - it does a good analysis of the existing pages and can generate CSS that is OK, but not beautiful. If you want the final pages to look the same, but be standards compliant, see meyerweb.com and read his books on rebuilding pages ("Eric Meyer On CSS" and "More Eric Meyer on CSS"). Pragmatic is his keyword: lots of examples and he makes sense.

  Good luck. You're going to need it.
 

HTML Tidy (3, Informative)

d3ik (798966) | more than 7 years ago | (#16035478)

Been there, try this [sourceforge.net]

Re:HTML Tidy (1)

itwerx (165526) | more than 7 years ago | (#16035623)

Parent is absolutely spot on, Tidy rocks!
      And if you're of the OSX persuasion there's a port here [balthisar.com] .

JavaScript and batch files (1)

Asmor (775910) | more than 7 years ago | (#16035479)

I know it's a huge mickey mouse and there's probably (scratch that-- definitely) better ways, but when I need to do something repetitive, but relatively simple, that can be done via command line, I use JavaScript to automatically create all the commands, copy them into a batch file, and done.

Why use static HTML? (1)

The MAZZTer (911996) | more than 7 years ago | (#16035483)

I use PHP. Server side includes are perfect for standard headers/footers. I check server variables to change behavior based on whether it's on the dev server or the final webserver.

I'd paste an example, but slashdot seems to think PHP code is "junk characters".

Re:Why use static HTML? (1)

Fraew (10491) | more than 7 years ago | (#16035507)

I use PHP includes too - saves a hell of a lot of effort, but even so, dealing with close to 2000 static page elements still leaves a lot of room for dead links, incompleteness etc... maybe some day I'll teach myself to implement a CMS setup. So does anyone have a dead link checker? I've never thought about finding one before...

Re:Why use static HTML? (1)

jZnat (793348) | more than 7 years ago | (#16035616)

W3C has a link validator [w3.org] you might be interested in.

Re:Why use static HTML? (1)

larry bagina (561269) | more than 7 years ago | (#16035541)

in that situation, server side includes are just as useful, but faster and more secure.

Re:Why use static HTML? (2, Interesting)

Phroggy (441) | more than 7 years ago | (#16035592)

in that situation, server side includes are just as useful, but faster and more secure.

If what you need is very simple (including footers would count as simple), here's more information about server side includes [apache.org] (SSI). Either rename your pages .shtml, or keep the .html name but set the files as executable (chmod a+x *.html) using the XBitHack.

If you want something more complex, you can use SSI to include a mini-CGI script into the middle of your HTML. CGI scripts can be written in any language, even a shell script:

#!/bin/sh
echo Content-type: text/html
echo
echo (insert HTML here)

Re:Why use static HTML? (1)

SanityInAnarchy (655584) | more than 7 years ago | (#16035660)

I'd paste an example, but slashdot seems to think PHP code is "junk characters".

It is.

PHP was designed for about what you're describing, when there were better technologies out there to do the same thing. It really looks like it was just supposed to be a bunch of PHP tags you'd mix in with your HTML tags, so you didn't have to think too much like a programmer, and could think more like a web coder.

This is a bad idea in the first place -- if you want to do dynamic stuff, learn to program. Worse, for some inexplicable reason, PHP has been adapted to much more general things.

At my current job, I'm developing a Wordpress plugin. Wordpress is written in PHP. I try to offload as much of the logic to a ruby script and into the MySQL database, because PHP is so ugly and difficult to work with. Much more so than Perl, with none of the upside.

And Wordpress seems to be a really decent, well-designed app. Imagine having to deal with some idiot's homebrew PHP... *shudder*

Anyway, why even bring up PHP here? Unless OP ends up redoing the entire site, what they're really looking for is tidy.

HTMLKit for Windows (4, Interesting)

SocialEngineer (673690) | more than 7 years ago | (#16035506)

HTMLKit [htmlkit.com] has a lot of great options for developers, and a good plugin system.

Hey, Windows/Linux Refugees! (-1, Troll)

Anonymous Coward | more than 7 years ago | (#16035517)

The only thing more pathetic than a PC user is a PC user trying to be a Mac user. We have a name for you people: switcheurs.

There's a good reason for your vexation at the Mac's user interface: You don't speak its language. Remember that the Mac was designed by artists [atspace.com] , for artists [atspace.com] , be they poets [atspace.com] , musicians [atspace.com] , or avant-garde mathematicians [atspace.com] . A shiny new Mac can introduce your frathouse hovel to a modicum of good taste, but it can't make Mac users out of dweebs [atspace.com] and squares [atspace.com] like you.

So don't force what doesn't come naturally. You'll be much happier if you stick to an OS that matches your personality. And you'll be doing the rest of us a favor, too; you leave Macs to Mac users, and we'll leave beige to you.

Re:Hey, Steve 'Head' Jobbs! (0)

Anonymous Coward | more than 7 years ago | (#16035816)

The only thing more pathetic than a Mac zealot, is a Mac zealot who believes that the type of computer he owns will get him laid.

Even more pathetic than that, is one who brags about this belief on slashdot, while at the same time exposing his sexual isolation by linking to fugmo's who he thinks are attractive women (who consequently, would quit their jobs as call-girls to avoid ever having to touch him).

MY toolbox... (3, Funny)

grammar fascist (239789) | more than 7 years ago | (#16035518)

My toolbox has a little white pill that I take every time I get a hankering to work with HTML. It fixes me up right quick.

Creating white space (3, Interesting)

M0b1u5 (569472) | more than 7 years ago | (#16035527)

The disaster that was "s.gif" (or "trans.gif" in some circles) used as a layout tool was horribly over-used - and the 'net is a worse place because of it. In most projects now, I seek to replace all instances with a "compatible" approach.

I create a class: .spacer{
        line-height:0;
        font-size:0;
}

Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:



I use a span sometimes, as required - if the DIVs alone cause layout issues.

Say hello to faster web pages instantly!

Re:Creating white space - apologies (4, Informative)

M0b1u5 (569472) | more than 7 years ago | (#16035538)

Oops Sorry!

<div class="spacer" style="width:Xpx; height:Ypx;"></div>

Re:Creating white space - apologies (4, Informative)

masklinn (823351) | more than 7 years ago | (#16036128)

This is worse than image spacer, please go die in a fire

Re:Creating white space - apologies (4, Informative)

Anonymous Coward | more than 7 years ago | (#16036303)

Or you could just use the padding / margin features provided by CSS.

margin-top: 1px;
margin-right: 2px;
margin-bottom: 3px;
margin-left: 4px;
or margin: 1px 2px 3px 4px;

padding-top: 1px;
padding-right: 2px;
padding-bottom: 3px;
padding-left: 4px;
or padding: 1px 2px 3px 4px;

Re:Creating white space - apologies (2, Informative)

julesh (229690) | more than 7 years ago | (#16036408)

Err.. this approach just doesn't work. Images are inline elements, you can't replace them with an equivalently sized block element and expect the page layout to be the same. And setting the CSS 'width' attribute of an inline element doesn't work in Explorer, so the entire approach is flawed. Sorry.

Re:Creating white space (1)

suv4x4 (956391) | more than 7 years ago | (#16036228)

It was called "spacer.gif". It was not abused at the time since empty div/span didn't work. In fact Netscape barely supported any div/span.

Also, the same can be said for table cells with &nbsp; in them, which in some browsers would collapse or misbehave.

You create empty space with a "spacer div" today, which is no better than what people did back then. In fact it's worse, since they didn't have many alternatives while you do: padding/margin/border where applicable.

White space is rarely just a block of empty space floating around.

Oh, the usual (1)

davidwr (791652) | more than 7 years ago | (#16035535)

left- and right-angle-brackets, the more useful tags like <b> <i> <p> <br> <a> <ol> <ul> <li> <dl> <dt> <dd> <em> <strong> <tt> <blockquote> <div> <ecode>, squiggley-braces, parentheses, periods, quotation marks, and more.

Those and a bare-bones text editor, what more do you need?

Re:Oh, the usual (1)

stubear (130454) | more than 7 years ago | (#16035576)

What, no <blink>?!?

Re:Oh, the usual (1)

Southpaw018 (793465) | more than 7 years ago | (#16035617)

The problem is a bunch of the tags you just listed no longer exist - at least, not in XHTML or HTML 4.1 Strict. <b> and <i> are both gone totally, as are <em> and <strong>. <br> is now <br />. Everything has shifted over to CSS, which is far more powerful anywho.

Re:Oh, the usual (0)

Anonymous Coward | more than 7 years ago | (#16035711)

em and strong are still part of XHTML (all versions) and HTML 4.1 strict. Remember, em and strong are not presentational elements...... i and b are.

Re:Oh, the usual (2, Informative)

reanjr (588767) | more than 7 years ago | (#16035773)

Nope.

br is not now br /, one must simply write well-formed documents. Well-formed HTML (with all tags closed) also uses br /.
em and strong are still alive and well as of XHTML 2.0.
b and i are still available in XHTML 1.0.
There is no HTML 4.1. Presumably you meant 4.01 strict, which is pretty much XHTML 1.0 Strict.

Re:Oh, the usual (1)

jZnat (793348) | more than 7 years ago | (#16036013)

What the fuck are you smoking? em and strong are semantic tags (emphasis and strong text), so they're here to stay. b and i are gone because they were just presentational.

Re:Oh, the usual (1)

masklinn (823351) | more than 7 years ago | (#16036174)

WTF? Not a single one of these is gone in either XHTML or HTML 4.01... TT is still in, I is still in, B is still in, BIG is still in and SMALL is still in, the only elements that have been deprecated from HTML3 are STRIKE, S and U... And "<br>" is now not "<br/>", BR is an element, <br> is a self-closing empty HTML tag and <br/> is an empty XML tag. The former is semantics, the latters are grammar.

Re:Oh, the usual (1)

Yvan256 (722131) | more than 7 years ago | (#16035825)

Don't forget to replace those <b>'s with <strong>'s and those <i>'s with <em>'s. We want XHTML, not HTML 3.2

Re:Oh, the usual (1)

maxwell demon (590494) | more than 7 years ago | (#16036511)

Don't just blindly make this change. Instead determine if the goal was really to emphasize/strongly emphasize, or if it actually had another purpose (e.g. italics is commonly used for foreign words like et al., which certainly should not be marked with <em>. Better replace it with <span class="foreignword"> and add .foreignword { font-style: italic } to your CSS, unless there exists an HTML tag specifically designed for the use, as e.g. <q> for quotations, which also are often done in italics).
Remember, there's a reason that <i> and <b> are removed, and that reason is surely not to force you to type more or make web pages larger.

Ohh! (1)

chris_eineke (634570) | more than 7 years ago | (#16035544)

Vim, grep, and sed. I heard they make movies, too! :-)

Easy. (0)

Anonymous Coward | more than 7 years ago | (#16035545)

Bash, Sed, Awk, Perl and vi.

Dreamweaver (1)

statikuz (523906) | more than 7 years ago | (#16035558)

I've used Dreamweaver pretty successfully to clean up a lot of poor HTML since it has pretty good functionality. I don't really have any suggestions as far as other tools go but for general single page cleanup I like DW. I've cleaned up quite a few huge documents that someone just saved as a webpage out of Word and ended up with 2 MB of HTML. Not really sure if that would work for your batch processing needs but if you have excessive issues with single pages I would recommend it.

Tidy, script, then manually clean (1)

Bitsy Boffin (110334) | more than 7 years ago | (#16035593)

Really, the only way to do a cleanup of your typical dog's breakfast collection of html is

1. Tidy the pages (using htmltidy)
2. Use a custom written script in whatever language (perl is good) to do as much of the task as possible automatically (things like replacing static headers with includes) - you'll need to be good with regex
3. Open the pages manually, and finish the job - I like Dreamweaver for this particularly if it's a complicated table based layout

whatever the case, it's going to take you a lot of time and energy, there is no quick fix.

Firefox with plugins (3, Interesting)

bhav2007 (895955) | more than 7 years ago | (#16035605)

Firefox with the IE Tab (or IE View), Web Developer, View Formatted Source, and HTML Validator extensions.

Re:Firefox with plugins (1)

Southpaw018 (793465) | more than 7 years ago | (#16035626)

I'll second all that, though I have no experience with View Formatted Source. I'll make a note to check it out - in debugging page display and layout issues, those other three extensions are absolutely indispensable.

Re:Firefox with plugins (1)

deek (22697) | more than 7 years ago | (#16035766)

Web Developer gets the two thumbs up from me. An absolutely essential plugin for html creators.

I'd also recommend Live HTTP Headers. OK it's not a html tool, strictly speaking, but it is extremely useful for debugging any web server issues. There's no other way to track HTTP issues down, I believe, unless you telnet to the webserver. Error/access logs on the webserver don't often contain enough info, unfortunately.

Vim and Emacs (1)

jZnat (793348) | more than 7 years ago | (#16035606)

Vim for the editing, Emacs for the web server, interpreted language, games, database, web browser to check it with, source code management, image editor, vector graphics editor, e-mail client, e-mail server, ...

Here's what I got... (-1, Troll)

Anonymous Coward | more than 7 years ago | (#16035621)

Go fuck yourself. Do you want us to do your job for you, you fucking n00b.

Go back to sucking dicks.

jEdit (0)

Anonymous Coward | more than 7 years ago | (#16035643)

I've been using jEdit for a few years now. I've used almost every text editor out there, from Crimson to UltraEdit, and I still think jEdit is the best. When combined with the WebDeveloper extension and DOM inspector for Firefox, it can't be beat.

http://www.jedit.org/ [jedit.org]

Actually... Frontpage (2, Funny)

Planesdragon (210349) | more than 7 years ago | (#16035651)

Actually... Frontpage.

No, really, stop laughing.

Frontpage, once you convince it to stop the WYSIWYG crap, has three tools that will make fixing a non-technical user's webpage easy. (Never, ever, let a non-technical user use Frontpage without supervision. It's worse than Word.)
  1. "Site Management", where you can let Frontpage check for dead files, orphan files, broken links, and do mass re-names of all HTML-based links. (No script correction here, but non-techies don't do that.)
  2. Regular Expressions (or a workable subset thereof)
  3. VBA, to invoke things like "optimize HTML" and "standardize name"

I'd be shocked if there aren't better tools out there -- but by and large either they don't do as much, or they cost a significant chunk of change.

(Hey, you, with the laughing -- point me to an app that can do #1 with compatible replacements for #2 and #3, and, er, you'll get good karma for being so mean and laughing.)

Re:Actually... Frontpage (1)

masklinn (823351) | more than 7 years ago | (#16036188)

I know that DreamWeaver has very strong "Site Management" features but I don't know if it can check for dead files/broken link and do mass renames (I tend not to use dreamweaver when I can avoid it). It also has a very good (PC)RE support, and you can build "extensions" to the software by using CSS & the DOM to manipulate your documents.

on macosx (1, Informative)

Anonymous Coward | more than 7 years ago | (#16035667)

TextWrangler or BBEdit Lite, vi, telnet, ftp, Photoshop CS (not CS2), GraphicConverter, Firefox, Safari.

Dude... (1)

shoolz (752000) | more than 7 years ago | (#16035670)

Leave it as a giant tangled mess and secure your job for the next 3 years. When they threaten to lay you off, tell them you need at least 1 more year of work before you can straighten up the code and 'hand off' the job to the new webmaster.

Steve Irwin Just died (0, Offtopic)

Frogbert (589961) | more than 7 years ago | (#16035674)

I know this is completely off topic but Steve Irwin died about 3 hours ago. He was killed by a stingray.

Re:Steve Irwin Just died (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#16035704)

Truly an American, err, truly an Australian icon. I can only hope he's drinking a beer and talking to Stephen King about crocs up in heaven right now.

Re:Steve Irwin Just died (0, Offtopic)

elronxenu (117773) | more than 7 years ago | (#16035818)

Steve Irwin promoted tourism by demonstrating the "wild" part of "wildlife".

You'd have to agree that's a fitting way for a "crocodile dundee" to die - much more respectable than, say, being run over in the street by a drunk driver, or choking to death on a peanut.

Probably a lot of people wish they'd lived a lifestyle like him, but he actually did it day in, day out. Just an accident, but I doubt he would have wanted to die in an aged nursing home.

Two tidbits (2, Interesting)

ptaff (165113) | more than 7 years ago | (#16035708)

Tidy is great, as others mentioned. If you feel confident, it will even let you cherry-pick the data you want to scavenge with XSLT.

Separating grain from chaff

A static HTML project has numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on - files that you just aren't sure are useful?

Copy the project directory (touch all the files) and do a wget -r on the tree; by looking at the access times afterwards, you'll know which files are referenced internally. Alternatively, scan the webserver logfiles to know which files are useful.

Be sure your filesystem is configured to register access times if you pick the first method...

(As a bonus, a close peek on the 404s might give you some answers on mis-used capitalization of filenames.)
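If access times are not recorded on your filesystem, the same idea can be approximated offline by following links on disk. This is only a sketch, and it swaps the wget crawl for a local one: it assumes a static site whose entry page is `index.html` and sees only plain `href`/`src` attributes, so anything linked from scripts or CSS will be reported as unreferenced. `reachable_files` and `unreferenced_files` are invented names.

```python
import os
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the value of every href and src attribute in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def reachable_files(root, start="index.html"):
    """Return the set of files reachable from the start page via local links."""
    root = Path(root).resolve()
    seen = set()
    queue = [root / start]
    while queue:
        page = queue.pop()
        if page in seen or not page.is_file():
            continue
        seen.add(page)
        if page.suffix.lower() not in (".html", ".htm"):
            continue  # images, stylesheets etc. are recorded but not parsed
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            parsed = urlparse(link)
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # external link or in-page anchor
            rel = unquote(parsed.path)
            base = root if rel.startswith("/") else page.parent
            queue.append(Path(os.path.normpath(base / rel.lstrip("/"))))
    return seen


def unreferenced_files(root, start="index.html"):
    """List files under root that the crawl from the start page never reached."""
    root = Path(root).resolve()
    seen = reachable_files(root, start)
    return sorted(p for p in root.rglob("*") if p.is_file() and p not in seen)
```

On a case-sensitive filesystem, a link with the wrong capitalization simply fails to resolve, so its target shows up in the unreferenced list, which mirrors the 404 trick above.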

Lynx / Links / ELinks

Can be used to dump the text data of old and unmaintainable HTML documents; most useful when trying to scavenge only the text contents to put in a database or so.

First, you better learn HTML before complaining .. (1)

tomhudson (43916) | more than 7 years ago | (#16035729)

First, before bitching about something, you should take a moment to learn about it.

"It has all the usual problems: paragraph tags with no ending tag"

There's no end tag required for paragraphs, as per the official spec: http://www.w3.org/TR/REC-html40/index/elements.html [w3.org]

HTML is not XML. Closing tags are optional for some elements, and forbidden for several others, and putting a slash at the end of a tag that doesn't have a closing tag, so it looks "xml-y", is an affectation and a waste of bytes.

Re:First, you better learn HTML before complaining (1)

BigFootApe (264256) | more than 7 years ago | (#16036085)

Hey, 1998 is calling, it wants its post back.

HTML is not XML.


It is now. [w3.org]

Re:First, you better learn HTML before complaining (1)

masklinn (823351) | more than 7 years ago | (#16036202)

no it's not [w3.org] HTML is still SGML, and still alive and well.

Re:First, you better learn HTML before complaining (2, Insightful)

Bazman (4849) | more than 7 years ago | (#16036222)

The great thing about web standards is... there's so many of them!

Re:First, you better learn HTML before complaining (1)

discord5 (798235) | more than 7 years ago | (#16036351)

Although you're right, flamewar in 5..4..3..2..1.. *ducks*

Search and Replace (1)

Seraphim_72 (622457) | more than 7 years ago | (#16035731)

Actual Search and Replace [divlocsoft.com], $30, Windows only. But Lord have mercy, if you need to do massive replacement in text files it is worth every cent. It does Perl regex searches and plain old English pattern matching. Good customer service... yada yada yada, and no, it is not mine; I'm just a satisfied customer.

Sera
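For anyone without a Windows box, the core of a batch regex replace can be sketched in a few lines of Python. This is not how the tool above works, only a minimal illustration; `batch_replace` and its parameters are hypothetical names, and it defaults to a dry run so nothing is written until the list of affected files looks right.

```python
import os
import re

def batch_replace(root, pattern, replacement,
                  extensions=(".html", ".htm"), dry_run=True):
    """Apply a regex substitution to every matching file under root.

    Returns the relative paths of files that would change (dry run)
    or that were changed (dry_run=False).
    """
    regex = re.compile(pattern)
    changed = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # surrogateescape round-trips bytes that are not valid UTF-8;
            # newline="" keeps the file's existing line endings untouched.
            with open(path, encoding="utf-8", errors="surrogateescape",
                      newline="") as fh:
                original = fh.read()
            updated = regex.sub(replacement, original)
            if updated != original:
                changed.append(os.path.relpath(path, root))
                if not dry_run:
                    with open(path, "w", encoding="utf-8",
                              errors="surrogateescape", newline="") as fh:
                        fh.write(updated)
    return sorted(changed)
```

For example, `batch_replace("site", r"</?font[^>]*>", "")` would report which pages contain lowercase font tags; prefix the pattern with `(?i)` to catch uppercase tags too, and run it on a copy or under version control before passing `dry_run=False`.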

My toolbox... (0)

SanityInAnarchy (655584) | more than 7 years ago | (#16035754)

At my last job, I had to do a LOT of this. Basically, I had to duplicate someone's web site look'n'feel, given nothing more than a URL, and put our (dynamic) content in the middle of it. Then, they could link to our page, and we'd essentially have one page of their site under our control.

First thing: Crack open the source. I would try not to clean it up if I didn't have to. If I didn't like it, that means I had to -- MS FrontPage and all of its DAMN CAPS ON EVERY TAG meant I'd run it through HTML Tidy [sourceforge.net] .

Second thing: Fix the URLs. Since it was on our server, I had to make everything into absolute URLs. Rather than write a general-purpose script for this, I just wrote semi-generic regex search-and-replace in Vim. Replace href="/ with href="http://example.com/. Replace href="../ with href="http://example.com/foo/. And so on, and also with src.
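The same search-and-replace can be sketched in Python, letting `urllib.parse.urljoin` resolve `/` and `../` against the page's original address instead of hand-writing one substitution per prefix. The helper name `absolutize` and the regex are assumptions for illustration; the regex only handles quoted attribute values and is not a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Captures the attribute name, the "=" with any spacing, the quote, and the value.
ATTR_RE = re.compile(r'''\b(href|src)(\s*=\s*)(["'])(.*?)\3''',
                     re.IGNORECASE | re.DOTALL)

def absolutize(markup, base_url):
    """Rewrite href/src values in markup as absolute URLs relative to base_url."""
    def fix(match):
        attr, eq, quote, value = match.groups()
        return f"{attr}{eq}{quote}{urljoin(base_url, value)}{quote}"
    return ATTR_RE.sub(fix, markup)
```

Pass the URL the page was fetched from as `base_url`. Links that are already absolute, and `mailto:` links, come back unchanged; same-page `#anchor` links become absolute too, which may or may not be what you want.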

Now the real challenge: Fix the structure of the document. Some don't need much. Some need major surgery -- fixed table widths, images set to those exactly, fixed heights, all kinds of other stuff in a layout... The worst were the ones where their main textual content was split up arbitrarily, to create things like columns.

Or worse, Adobe GoLive. I simply refused to work with it -- absolutely everything on the page, no matter how small or meaningless the distinction -- list items, everything -- was wrapped in its own div and positioned absolutely with separate CSS. The structure of the code did not match the structure of the visual document at all. And the menu (something I'd always have to customize) was generated entirely from some difficult-to-read JavaScript -- I wish I'd known about the web developer's "view generated source"...

Two main things to remember here: Dom Inspector and the Web Developer Toolbar. Dom Inspector to find where what you're looking for lives in the code, and the Web Developer extension (for Firefox) to edit the CSS and see changes reflected in realtime, as well as way, way more stuff than I could possibly mention here, including "view generated source".

Sometimes I couldn't fix their layout, and I'd have to make a brand new document and paste their content into a brand new layout. Sometimes it worked, often it didn't. So keep that in mind -- I know others have said it, but sometimes it makes sense to just throw the whole thing out. But yours looks like it could work with some simple search/replace in Vim -- look for href=, src=, and in CSS, url('...

Re:My toolbox... (1)

Osty (16825) | more than 7 years ago | (#16036137)

Two main things to remember here: Dom Inspector and the Web Developer Toolbar. Dom Inspector to find where what you're looking for lives in the code, and the Web Developer extension (for Firefox) to edit the CSS and see changes reflected in realtime, as well as way, way more stuff than I could possibly mention here, including "view generated source".

For IE, you should look at the IE Developer Toolbar [microsoft.com] . It does for IE much of what Firefox's DOM Inspector and Web Developer Toolbar can do. Works with IE6 and IE7.

Yes, I know, IE is "teh evil", but sometimes you have to work with IE-only pages, or pages that do things in different ways for IE and Firefox. It's nice to have a tool in your toolbox that will let you inspect things in IE just as you would in Firefox.

crocodile hunter steve irwin dead at 44 (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#16035784)

Even if u didn't see him wrestle a croc u prolly saw him do some other crazy ass shit. Truly an australian icon.

WebdevTML Survival Kit (1)

Kent Brewster (324037) | more than 7 years ago | (#16035809)

Previous posts have mentioned Perl and PHP; seconding those for high-intensity search-and-destroy missions. As for software, you can't go wrong with TextPad [textpad.com] , WinSCP [winscp.net] , and PuTTY [greenend.org.uk] .

For best practices (separation of content from structure from behavior, mostly), keep an eye on what's listed in and around A List Apart [alistapart.com] and the Web Standards Project [webstandards.org]. And if you're looking for several sets of outstanding presentation and behavior tools, check out the YUIBlog [yuiblog.com] and the Yahoo! Developer Network [yahoo.com]. (Hint: their page grid layout [yahoo.com], font normalization [yahoo.com], and CSS reset [yahoo.com] libraries are an excellent place to start.)

Unix tools (1)

dascandy (869781) | more than 7 years ago | (#16035876)

Fromdos, grep, sed and awk. Possibly some normal pretty printer too.

Cheat with PHP (2, Informative)

GloomE (695185) | more than 7 years ago | (#16035890)

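<?php
libxml_use_internal_errors(true);  // collect parse warnings from malformed markup instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($junky_html);       // $junky_html holds the page source as a string
echo $doc->saveHTML();             // serialize the repaired DOM tree back to HTML
Reads in your crappy HTML, parses it into a DOM tree (closing and re-nesting tags as it goes), then dumps it out as nice clean HTML.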

tidy, web developer FF extension, search & rep (3, Informative)

Tumbleweed (3706) | more than 7 years ago | (#16035910)

Tidy, as others have already mentioned, will be your very best new friend.

Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.

Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").

If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).

And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.

Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.

Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.

Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).

Oh, and learn to indent your freaking HTML!

Some nice resources:

Activating the Right Layout Mode Using the Doctype Declaration [hsivonen.iki.fi]

Quirksmode [quirksmode.org] - a GREAT resource. Awesome info here. Memorize it.

vi anyone? (1)

4D6963 (933028) | more than 7 years ago | (#16035950)

When I clicked the page I was so sure I would see at least one "What's in My HTML Toolbox? vi!" comment, modded Funny of course, but no...

Maybe I should check again later...

Re:vi anyone? (1)

Ashtead (654610) | more than 7 years ago | (#16035999)

Of course vi is used. I use it myself, for everything, and I've seen a couple others above mentioning it.

I've also made me a little utility that would take a string from the document I'd be writing in, and generate a link <a href="..."> ... </a> style, and put it back in, courtesy of vi. Useful when writing up documentation about programming and configuration.

Then there is the matter of getting rid of carriage-returns and get the text file into the true format, with lines separated by newline characters.
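Getting rid of the carriage returns is simple enough to sketch in Python. The function names are made up for the example; it works on bytes so the file's character encoding does not matter, but it should only be pointed at text files (HTML, CSS, JS), never at images or other binaries.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # CRLF must be handled first, otherwise each pair would become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def normalize_file(path):
    """Rewrite path in place if needed; return True if the file was changed."""
    with open(path, "rb") as fh:
        original = fh.read()
    fixed = normalize_newlines(original)
    if fixed != original:
        with open(path, "wb") as fh:
            fh.write(fixed)
    return fixed != original
```

This handles files with mixed endings as well, since every CRLF and every stray CR is converted independently.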

CSSEdit (1)

rampant mac (561036) | more than 7 years ago | (#16035971)

CSSEdit by Macrabbit.

Awesome program and worth checking out if you use a Mac.

wget (1)

nicholaides (459516) | more than 7 years ago | (#16035994)

If you're using all static HTML, you can get rid of dead pages with wget. Do "wget www.website.com/whatever -r" to download it, and then just use what you've downloaded as your base.

To find broken links, I like to use Xenu. Google it.
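For a purely local check before anything is uploaded, here is a hedged sketch (not a replacement for Xenu, which crawls the live site). It reports links whose target file is missing, and separately flags links that only match when case is ignored, which is the "works on Windows, breaks on Linux" problem. The name `check_links` and the regex are assumptions; links ending in `/` are skipped because which index file they resolve to depends on the server.

```python
import os
import posixpath
import re
from urllib.parse import unquote, urlparse

LINK_RE = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)

def check_links(root):
    """Return (page, link, problem) tuples for local links that do not resolve."""
    root = os.path.abspath(root)
    exact = set()  # every file, as a /-separated path relative to root
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            exact.add(rel.replace(os.sep, "/"))
    by_lower = {rel.lower(): rel for rel in exact}
    problems = []
    for page in sorted(exact):
        if not page.lower().endswith((".html", ".htm")):
            continue
        with open(os.path.join(root, page), encoding="utf-8",
                  errors="replace") as fh:
            text = fh.read()
        for link in LINK_RE.findall(text):
            parsed = urlparse(link)
            path = unquote(parsed.path)
            if parsed.scheme or parsed.netloc or not path or path.endswith("/"):
                continue  # external, fragment-only, or directory links
            if path.startswith("/"):
                target = posixpath.normpath(path.lstrip("/"))
            else:
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(page), path))
            if target in exact:
                continue
            if target.lower() in by_lower:
                problems.append(
                    (page, link, "case mismatch: " + by_lower[target.lower()]))
            else:
                problems.append((page, link, "missing"))
    return problems
```

Because the comparison is done against the directory listing and not by opening the target, the case check behaves the same on a case-insensitive Windows or Mac filesystem as it does on Linux.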

Re:wget (0, Redundant)

byolinux (535260) | more than 7 years ago | (#16036155)

In Soviet Russia, Xenu likes to use you!

Mein Toolboxen (1)

cheese-cube (910830) | more than 7 years ago | (#16036010)

My HTML toolbox, which is my little 64Mb thumbdrive, has really only the bare essentials for website development: Notepad++ and WS_FTP.

What's in my toolbox? (1)

lewp (95638) | more than 7 years ago | (#16036147)

A hammer for hitting myself over the head, and a bottle of whiskey to numb the pain... of dealing with HTML.

OpenSP (1)

aamcf (651492) | more than 7 years ago | (#16036192)

I've used OpenSP [sourceforge.net] a lot. It's a suite of tools that includes onsgmls, the parser that lies at the heart of the W3C validator. Combined with find, it lets you easily validate local copies of all the files. It's faster than using the validator for multiple pages. It also includes osgmlnorm, which is used to normalize SGML. If you have a load of "XHTML without closing p tags" type HTML, change the doctype to an HTML doctype, run it through osgmlnorm, switch the doctype back, and all the closing ps are there. (It's not quite that simple though - you have to clean up lots of spurious > s which get introduced for sensible but obscure SGML reasons, usually after img elements. It's trivial to do the cleanup automatically.)

Paradigm shift (1)

suv4x4 (956391) | more than 7 years ago | (#16036269)

Changing old code to new code can rarely be automated. It's not a simple syntax change, it's a paradigm shift, and computers are not yet smart enough to figure out the semantics of old code and rewrite it as an HTML/CSS combo.

HTML Tidy is something free and available which will do the very basic work of cleaning up and fixing the HTML where possible.

Version Control (0)

Anonymous Coward | more than 7 years ago | (#16036375)

Set up a subversion repository, or whatever your version control of choice is.

Add everything to it, even the .olds etc. Then remove all the old stuff, to what you 'think' is current, commit.
Checkout the repo to a webserver, see if anything is broken. (someone previously suggested wget, this too would work). Basically, get yourself a nice starting point.

Then go to town on the code. Everything is in version control, so if you accidentally delete something, you can always look back and figure out what it was and re-add it.

It's not FOSS, but... (1)

bigHairyDog (686475) | more than 7 years ago | (#16036406)

I use Adobe Golive for this, and it's served me well. It detects errors like broken links, and offers batch fixing.

Failing that, perl is probably your best bet.

Dreamweaver (2, Interesting)

Leroy_Brown242 (683141) | more than 7 years ago | (#16036425)

As much as WYSIWYG editors sometimes suck, Dreamweaver is alright. I like that it helps with the organization but also lets me get as geeky as I'd like.

My HTML Toolbox (IAAWD - I am a web developer) (1)

Qbertino (265505) | more than 7 years ago | (#16036481)

jEdit (www.jedit.org) - best editor in existence, unmatched functionality
Dreamweaver 8 (on OS X) - DW is an outdated way to do things, but it is still very powerful
Quanta (Quanta Gold for Win or OS X - > http://www.thekompany.com/products/quanta/ [thekompany.com] ; Quanta Plus for Linux -> http://quanta.kdewebdev.org/ [kdewebdev.org] )
PHPEclipse (has annoyances but very good PHP tools)

For a redo of that old site of yours I recommend simply installing a CMS and migrating the content by hand if necessary. That's probably faster and more effective than anything else. Static HTML just isn't the way to go these days, which eliminates most of the need for a large-type HTML editor. Check out joomla! (www.joomla.org)

Start tag: required, End tag: optional (0)

Anonymous Coward | more than 7 years ago | (#16036561)

> It has all the usual problems: paragraph tags with no ending tag

You said it was HTML, right? Ever read the specification? The closing tag for paragraphs is optional.