Fixing Broken Links With the Internet Archive

Soulskill posted about 9 months ago | from the maintain-URIs-or-T.B-L.-will-beat-you-up dept.

The Internet 79

eggboard writes "The Internet Archive has copies of Web pages corresponding to 378 billion URLs. It's working on several efforts, some of them quite recent, to help deter or assist with link rot, when links go bad. Through an API for developers, WordPress integration, a Chrome plug-in, and a JavaScript lookup, the Archive hopes to help people find at least the most recent copy of a missing or deleted page. More ambitiously, they instantly cache any link added to Wikipedia, and want to become integrated into browsers as a fallback rather than showing a 404 page."
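For readers wondering what a lookup through the developer API might look like, here is a minimal Python sketch. It assumes the Wayback availability endpoint lives at `https://archive.org/wayback/available` and answers with JSON containing `archived_snapshots.closest`. The helper names and the sample reply are hypothetical and only illustrate the shape of the exchange.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Wayback availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a possibly dead link (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # YYYYMMDD[hhmmss]; asks for the copy closest to that date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON reply, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative reply only; the real service's fields may differ.
SAMPLE_REPLY = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
})

print(availability_query("example.com/old-page", "20140101"))
print(closest_snapshot(SAMPLE_REPLY))
```

The sketch deliberately stops short of making a network request, so it can be read and run without depending on the service being reachable.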

machine generated content? (-1, Offtopic)

Anonymous Coward | about 9 months ago | (#46060873)

avoiding machine generated content will be the challenge

Re:machine generated content? (0)

Anonymous Coward | about 9 months ago | (#46063857)

I don't know why that was modded down because it's true. I have found several sites that were entirely taken over by spammers and IA mirrored the content exactly. If they'd have had a better way of reporting such content I'd have helped.

Over Reach (-1)

Anonymous Coward | about 9 months ago | (#46060891)

When you try to do too much you over reach and tend to fail.

Stick with what you know, not the internet of internet links gone by.

Re:Over Reach (0)

Anonymous Coward | about 9 months ago | (#46061085)

But what they know is the "internet of internet links gone by".

Please no? (3, Insightful)

DMiax (915735) | about 9 months ago | (#46060923)

...want to become integrated into browsers as a fallback rather than showing a 404 page

Fuck no. If a page does not exist it does not exist.

Re:Please no? (0)

Anonymous Coward | about 9 months ago | (#46061015)

It's gonna blow the fuck up trying to archive 4chan's 404'd threads anyway.

-- Ethanol-fueled

Re:Please no? (1)

bobbied (2522392) | about 9 months ago | (#46061551)

Not until you run out of stack space....

Recursion for the masses!

Re:Please no? (1)

Xtifr (1323) | about 9 months ago | (#46062519)

You do realize we're talking about the Internet Archive/Wayback Machine here? They already have backups of the old 4chan threads! This is just a way to make it easier to access that data.

The Archive, like Google, already crawls the web and caches pretty much everything. The difference is that Google replaces the contents of their caches when they crawl something a second time, while the Archive archive keeps both copies, with timestamps.

Re:Please no? (1)

thunderclap (972782) | about 9 months ago | (#46063425)

The archive doesn't archive any Chan. It doesn't archive the deep web or any hidden onion site. So no it won't.

Re:Please no? (1, Offtopic)

Mashdar (876825) | about 9 months ago | (#46061109)

You're absolutely right. Fuck me for wanting to read that article on first wort hopping from three years ago. What was I thinking?

Re:Please no? (0)

Anonymous Coward | about 9 months ago | (#46061181)

You're absolutely right. Fuck me for wanting to read that article on first wort hopping from three years ago. What was I thinking?

You selfish bastard. Why don't you think about the rest of us for once, you insensitive clod!

And what about the children? Why isn't anyone thinking about the children??

Re:Please no? (0)

Anonymous Coward | about 9 months ago | (#46062211)

I'm thinking about the children.

--
Support NAMBLA [nambla.com] .

Re:Please no? (0)

Anonymous Coward | about 9 months ago | (#46062437)

What was I thinking?

1) you were thinking that adding hops at the beginning of the boil would be just as good, and that first wort hopping is bullshit.

2) You were thinking you could always just google it and read some other article about the same thing.

3) You were thinking that old article's URL can 301 to the new location.

4) If you get this far without a solution, you're not really trying.

Re:Please no? (2)

mcgrew (92797) | about 9 months ago | (#46065891)

Methinks a moderator needs more coffee, that wasn't offtopic. Let me explain the parent's point, since at least one person was too dense to understand.

The GP said "when a page is gone it should be gone", WHY? That's insane. Say you want to get out that old Quake game and want to look up console commands. You're not going to find that great site because it lapsed a decade ago (the parent used beermaking as his example).

Archive.org [archive.org] to the rescue.

The suggestion is that when you click that bookmark you saved a decade ago, rather than a 404 you get archive.org's copy. However, this might not work in some situations, like when a site is abandoned and someone else registers the name.

If you don't want your site archived, they'll take their copy down.

Re:Please no? (1)

SigmundFloyd (994648) | about 9 months ago | (#46067201)

Methinks a moderator needs more coffee, that wasn't offtopic.

Yes it was, at least in part. The details on the old article's topic were completely unnecessary in order to make the point.

Re:Please no? (1)

bob_super (3391281) | about 9 months ago | (#46061117)

Whoa! What else do you want?
Tell the kids when they suck? Tell bad drivers the rollover was their fault? Admissions of guilt as part of multi-million dollar settlements?

Too much reality is bad for you, mate...

Re:Please no? (1)

dak664 (1992350) | about 9 months ago | (#46062649)

Presumably the Wayback redirect would tell you that the page does not exist, but that the last time it could be loaded, this was the content. What's wrong with that?

Resurect page (1)

DrYak (748999) | about 9 months ago | (#46063771)

There's a Firefox extension called "Resurrect Pages" which already does this tastefully:

In case of error, it does display the error page, but the extension gives you the choice to look for the missing link in a few places (archive, Google cache, etc.)

As long as they don't simply replace 404 errors, but give a choice to the end user, I'm for it.

Re:Resurect page (0)

Anonymous Coward | about 9 months ago | (#46066431)

I wish it could first ping the archives/caches and only give me a list of those that have a copy. It's really annoying to click on all those buttons only to come up with more 404s.

As long as they don't simply replace 404 errors

I would actually like that, on an opt-in basis.

Re:Please no? (1)

Wycliffe (116160) | about 9 months ago | (#46064027)

I would agree that transparently going to the lost page is a bad idea but I would not be opposed to a 404 error page that has a link to the last known copy of the link. What would be so bad about that? It would save me the step of trying to find it in the google cache and/or the internet archive which is what I tend to try to do if it is a link that I want.

No. 404 is important! (1, Insightful)

mrchaotica (681592) | about 9 months ago | (#46060997)

To everyone who might think of subverting the HTTP standard to "helpfully" show me an alternative to a page that does not exist: fuck you.

I don't give a shit whether you're doing it because you want to advertise to me or because you want to altruistically show me what I'm looking for even if it doesn't exist anymore. Either way, you are still lying to me and breaking everything that relies on accurate error reporting. So quit it!

Re:No. 404 is important! (4, Insightful)

Sarten-X (1102295) | about 9 months ago | (#46061065)

Supply HTTP code 404, and provide the content of the old page, preferably with a large banner saying "we couldn't find it, but here's what we had before".

I believe that meets all applicable standards. Automated systems should recognize the 404 code, and human systems (which won't likely see the underlying code) will see the banner.
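A rough Python sketch of that idea, assuming a hypothetical `archived_copies` lookup table rather than any real archive service: the numeric status stays 404 in every case, and only the human-readable body changes.

```python
from http import HTTPStatus


def not_found_response(path, archived_copies):
    """Return (status, headers, body) for a missing page.

    archived_copies is a hypothetical dict mapping paths to saved HTML.
    The status is 404 either way; only the body shown to humans differs.
    """
    status = int(HTTPStatus.NOT_FOUND)
    headers = {"Content-Type": "text/html; charset=utf-8"}
    saved = archived_copies.get(path)
    if saved is None:
        body = "<h1>404 Not Found</h1>"
    else:
        banner = (
            "<p><strong>We couldn't find it, "
            "but here's what we had before.</strong></p>"
        )
        body = banner + saved
    return status, headers, body


archive = {"/hops.html": "<p>First wort hopping notes</p>"}
status, headers, body = not_found_response("/hops.html", archive)
print(status)  # automated clients should key off this number only
print(body)
```

A test suite that checks only the status code sees the same result whether or not an archived copy was attached.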

Re:No. 404 is important! (2)

game kid (805301) | about 9 months ago | (#46061133)

Absolutely agree. Give a nice little popup balloon, visibly separate from the web page (i.e. not like one of those in-client-area IE information bars; make it pop out as window size allows), that says "This page could not be accessed (error err_code). Below is an earlier version from archival_group. [ [ ] Do not show archived versions ever again, you dummy ]". (Maybe with more user-friendly language.) Problem solved.

Re:No. 404 is important! (-1)

Anonymous Coward | about 9 months ago | (#46061251)

Sorry but that violates the standard as well. It must return a 404 or you break testing.
The only way this can be implemented without causing problems for others is to have it be an option in the browser for those who want it to do the additional lookup.
Then all the others who depend on the standard to work do not have to figure out how to work around all the helpful sabotage some insist on making.

Someone who breaks the standard should be forced to solve all the problems they create, especially all the ones who do it for their own gain and to hell with everyone else.

Re:No. 404 is important! (4, Insightful)

Minwee (522556) | about 9 months ago | (#46061483)

Sorry but that violates the standard as well. It must return a 404 or you break testing.

RFC 2616 mandates a 4xx error code followed by an optional human readable reason phrase. While the reason phrase is usually "Not Found" for a 404 error, there's nothing keeping it from being augmented by "...but a copy of a previous version is over there."

If your testing relies on anything beyond the numeric error code, then it's probably already broken.

Re:No. 404 is important! (1)

butchcassidy1717 (1129219) | about 9 months ago | (#46061627)

What about a new 4xx code. Like 444 - Original not Found, but Archived Version Available [here]?

This is being done client-side (1)

pavon (30274) | about 9 months ago | (#46063087)

A new error code won't help, because for that to work the original website would have to send it. But if a link is broken, then they already neglected to send a useful response code. This feature is about how the client responds to a 404 error, in which case the most honest thing to do is show the user the 404 message that the site provided, but also let them know that they can access an older version of the page if they wish. Which is pretty much how the existing Wayback plugins work.
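The client-side behaviour described here could be sketched roughly as follows in Python. The function and parameter names are hypothetical; an actual browser plugin would work differently in detail.

```python
def render_for_user(status_code, page_content, archived_url=None):
    """Decide what a browser or plugin shows (hypothetical helper).

    The site's own response is always shown unchanged. For a 404, a known
    archived copy is only offered as an extra link, never swapped in.
    """
    if status_code != 404 or archived_url is None:
        return page_content
    offer = "\n[An older copy of this page is available: " + archived_url + "]"
    return page_content + offer


print(render_for_user(404, "404: no such page here.",
                      "https://web.archive.org/web/2013/http://example.com/"))
```

Nothing about the server's reply is altered, so tools that rely on the status code keep working.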

Re: This is being done client-side (1)

butchcassidy1717 (1129219) | about 9 months ago | (#46063179)

While I'm not familiar with exactly how the Wayback Machine works, I would assume it caches a copy and serves it up, so the error code could be handled on the server side. It may not be what the author proposed, but it seems sensible for archival purposes.

Re:No. 404 is important! (3, Informative)

amorsen (7485) | about 9 months ago | (#46061539)

The only way this can be implemented without causing problems for others is to have it be an option in the browser for those who want it to do the additional lookup.

That is the proposal. The browser does it. The web server still returns 404, so your code does not have to work around anything. This is not the NXDOMAIN redirection fiasco.

Re:No. 404 is important! (1)

thunderclap (972782) | about 9 months ago | (#46063483)

Sorry but that violates the standard as well. It must return a 404 or you break testing.

And if someone has a copy of that page, guess what, it's not a 404 because it's found. So they DON'T BREAK TESTING BECAUSE IT'S FOUND. No one said, hey let's get rid of all 404s and on the pages that are actually dead send to goatse or the internet archive.

mod up (1)

schneidafunk (795759) | about 9 months ago | (#46061227)

Absolutely. That's a great solution.

Re:No. 404 is important! (4, Interesting)

SunTzuWarmaster (930093) | about 9 months ago | (#46062021)

So let's say that my company has three lines of products on three different webpages. We decide to discontinue two of the lines of products for being unprofitable, and remove the pages. Google search results still show the pages, and archive.org still shows them to users. These products are still shown to my potential customers, who experience frustration when they attempt to get them.

Alternately, I create a temporary webpage for displaying some demo content to a potential client. It is a demo page, and ridden with bugs, holes, and other areas that need improvement. Archive.org still shows this page as part of search results? What will potential clients think of my company, given that it put up a buggy/terrible page?

Alternately, let's just say that I rename a longstanding webpage (technology.slashdot.org to tech.slashdot.org) and delete the old URL. Should archive.org redirect to false content?

Or, let's say that my restaurant decides to take down its 2013menu.html page, and doesn't wish customers to be able to compare its new and old menu side by side to see where prices inflated.

Error messages have purpose. While the most common case is that the page/server went offline, there are many times where a page URL changes as a result of regular website updates, where you don't want users to obtain old content.

Sometimes things are deleted for a reason.

Re:No. 404 is important! (1)

rubycodez (864176) | about 9 months ago | (#46062309)

so you use the robots.txt to keep internet archive and any other respectable crawlers out
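As an illustration, here is how a well-behaved crawler might read such a file, using Python's standard `urllib.robotparser`. The robots.txt content is made up, and `ia_archiver` is assumed to be the user-agent token the archive's crawler answers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one archiving crawler out of /demo/,
# leave everything else open. "ia_archiver" is an assumed user-agent token.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /demo/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
print(parser.can_fetch("ia_archiver", "http://example.com/demo/page.html"))
print(parser.can_fetch("ia_archiver", "http://example.com/products.html"))
print(parser.can_fetch("SomeOtherBot", "http://example.com/demo/page.html"))
```

This only shows what a crawler that chooses to honor the file would do; nothing in the file itself enforces anything.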

Re:No. 404 is important! (1)

Oligonicella (659917) | about 9 months ago | (#46062421)

This does not give the site owner control, only a voice to plead with. It's a suggestion, not a door, stopping no one who wishes to ignore it.

The type of browser that would show old pages in the first place should be looked on as more likely to break robots.txt as well. As pointed out in other posts there are perfectly valid reasons to want a page gone, not archived. This precludes that choice.

Re:No. 404 is important! (1)

mcgrew (92797) | about 9 months ago | (#46065953)

First, archive.org won't keep anything you don't want them to -- DMCA. Second, you must have never been to archive.org; there's a banner at the top telling you it's archive.org and from the banner you can access earlier versions of the file.

Re:No. 404 is important! (1)

rubycodez (864176) | about 9 months ago | (#46069673)

the major search sites respect it, it's good enough

Redirect, don't 404. (3, Insightful)

pavon (30274) | about 9 months ago | (#46063037)

None of those examples should result in a broken link if you are maintaining your website correctly. And this feature is only "fixing" broken links; that is links that once existed and are now 404'ed.

If you want to discontinue a product, then replace those pages with one that explains that the product is discontinued, and provides links to similar current products, as well as the support page for the discontinued product. If users are clicking on links in reviews or forum posts about your old product and receive 404s, or redirection to a completely unrelated and unhelpful page on your site, they will be frustrated with or without this feature.

In the second case, just redirect the entire demo website URL tree to a current list of examples.

In the third case, you shouldn't do that without redirecting the old url to the new one. Seriously, are you trying to make your content hard to find?

Again, redirect to the new menu.

In no case is sending a user a 404 useful or beneficial, nor is it the most appropriate thing to do according to the HTTP standard. If you really want to be pedantic then send a 301 or 303 to perform the redirect, otherwise use URL rewriting, or just change the contents of the existing URL, whichever is easiest. The user should only see a 404 if they clicked an invalid link that was never a real URL for your website. Otherwise, you have failed your users, and it's no one's fault but your own if they choose to use a service that tries to make up for your shortcomings.
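A minimal Python sketch of the redirect approach, assuming a hypothetical `MOVED` table of retired URLs. A real site would more likely do this in the web server's configuration.

```python
# Hypothetical table of retired URLs and where their content lives now.
MOVED = {
    "/2013menu.html": "/menu.html",
    "/products/old-widget.html": "/products/discontinued/old-widget.html",
}


def respond(path, existing_paths):
    """Return (status, headers) for a request path."""
    if path in existing_paths:
        return 200, {}
    if path in MOVED:
        # 301 Moved Permanently: clients and crawlers should update the link.
        return 301, {"Location": MOVED[path]}
    # Only URLs that were never real should fall through to here.
    return 404, {}


live = {"/menu.html", "/products/discontinued/old-widget.html"}
print(respond("/2013menu.html", live))
print(respond("/never-existed.html", live))
```

With a table like this in place, old bookmarks and forum links keep landing somewhere useful, and a 404 is left for URLs that never existed.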

Re:Redirect, don't 404. (0)

Anonymous Coward | about 9 months ago | (#46064549)

> Again, redirect to the new

+100

Wish I had mod points right now.
This is how the web is supposed to work.

Web 2.0 and people still do not understand Web 1.0 and intentionally break links.

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46066469)

But more often than not, content disappears from the web because some company goes broke or goes premium (same effect) or because of IP or some silly law. In Germany public broadcasters are legally forced to "depublish" all content after a short time because otherwise they would be deemed unfair competitors to private companies. So in effect they're writing and filming for the memory hole. To circumvent this stupidity is a worthy cause.

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46066473)

With your robots.txt (or its absence), you've told Google etc. that the content is indexable. So don't complain.

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46062059)

The banner will end up having 0 pixel dimensions for many reasons that one will find a way to rationalize. So no humans can ever observe anything going wrong.

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46063461)

Nah, I say 404 the 404. If you are looking for content, the worst thing is a 404. So why not have the actual content there. Just date stamp it. And beside it a tombstone logo. 'This site is dead'

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46061127)

404 - the entity you were trying to deliver your message to was not found. /sarcasm

I agree with you. Leave the 404 handling up to the website that is responsible for the link.

Re:No. 404 is important! (5, Informative)

bill_mcgonigle (4333) | about 9 months ago | (#46061313)

Chillax, dude, it's simply a matter of implementation and preferences.

While archive.org might think this is a new idea, I've been using Errorzilla mod [jaybaldwin.com] for the good part of a decade. When a 404 is encountered, you get the regular error page, and then it adds some buttons that let you look at the Google cache, Coral cache, Wayback archive, etc.

Quite useful and non-harmful.

Re:No. 404 is important! (0)

mrchaotica (681592) | about 9 months ago | (#46061435)

Well if you want to use a browser extension to do something interesting when you get the error, that's perfectly fine. The problem is that if this idea were implemented, it would break things like Errorzilla and nobody would have a choice about it anymore!

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46062093)

...........

Quite useful and non-harmful.

It is a user's personal judgment, and so it's the user's preference to explicitly pick a cache. In other words, it is a browser's or a plugin's job to show the user other options upon getting a 404, and not the server's.

Re:No. 404 is important! (1)

Janek Kozicki (722688) | about 9 months ago | (#46062187)

wow, thanks. I need to check it out (replying only to "mark" your post in my comment history ;)

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46064091)

Me too!

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46064901)

Except the Archive has actually been doing this for 15 years.

Re:No. 404 is important! (0)

Anonymous Coward | about 9 months ago | (#46063449)

To everyone who might think of subverting the HTTP standard to "helpfully" show me an alternative to a page that does not exist: fuck you.

I don't give a shit whether you're doing it because you want to advertise to me or because you want to altruistically show me what I'm looking for even if it doesn't exist anymore. Either way, you are still lying to me and breaking everything that relies on accurate error reporting. So quit it!

I don't know what internet you are surfing but accurate error reporting ended the day Eternal September began.

Useful if (1)

diorcc (644903) | about 9 months ago | (#46061025)

It's made apparent that this is an out-of-date page pulled out of the archives. In the case of information recency not being pertinent it's useful - otherwise, misleading.

Solving one problem and creating 3 more. (2)

jellomizer (103300) | about 9 months ago | (#46061039)

If there is a dead link, there is usually a reason why it went dead.
Sure we get the odd server down. But we also have cases where we have a deliberate take down of information, due to legal, or personal reasons.
Heck, they may just no longer be in business, and don't want people to think they are.

Also the Last Page, may not be a good page to point to, as it may have been a victim of an attack and have harmful information on it.
404 means the page is dead, we should deal with that. Also, there are some web services that use the http error messages to send information across, having the browser say otherwise can prevent debugging.
Also it can create lazy companies, why bother hosting your stuff, when you got someone else to do it for you, and you just have it up for some time and take it off. No more hosting for you.

If it's gone use 410, not 404 (1)

John Bokma (834313) | about 9 months ago | (#46062063)

Sure we get the odd server down. But we also have cases where we have a deliberate take down of information, due to legal, or personal reasons.

If it's gone use 410, not 404, see: http://www.w3.org/Protocols/rf... [w3.org]

Re:If it's gone use 410, not 404 (1)

Oligonicella (659917) | about 9 months ago | (#46062521)

People setting up and deleting web pages are typically not the ones controlling the server. You're suggesting that every time someone deletes a hosted page, they request the provider serve a 410 for that page.

Not gonna happen. Either the request or - presuming the slim chance the request *is* made - the serving of the 410.

I have my own domains and I don't even do it because I make a lot of trash files for test purposes and that's an incredible pain in the ass for pages no one is supposed to see but me anyway and *despite* having a robots.txt, i have found them in search engines and archives. So archives are basically worthless as historic collections of functioning pages.

Re:If it's gone use 410, not 404 (0)

Anonymous Coward | about 9 months ago | (#46063541)

So basically you are griping about the archive breaking a protocol you abuse because you are too lazy to use the correct protocol? As for the trash files, have you ever considered not hosting on a live website...hmmm? If you set up a server that feeds only to you, then they wouldn't be collected. Apparently you are too lazy to do that too.

Re:If it's gone use 410, not 404 (1)

John Bokma (834313) | about 9 months ago | (#46064029)

.htaccess

As for trash files for test, you could keep them in a single directory and make a rule that if the file doesn't exist to return a 410.
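The rule itself would normally live in the server configuration. As a language-neutral illustration of the logic only, here is a Python sketch in which the `/scratch/` directory name is a made-up example.

```python
# Hypothetical directory reserved for throwaway test pages.
SCRATCH_DIR = "/scratch/"


def status_for(path, existing_paths):
    """Pick a status code for a request path."""
    if path in existing_paths:
        return 200
    if path.startswith(SCRATCH_DIR):
        # 410 Gone: the file was removed on purpose and is not coming back.
        return 410
    # 404 Not Found: no statement about whether it ever existed.
    return 404


live = {"/index.html", "/scratch/current-test.html"}
print(status_for("/scratch/old-test.html", live))
print(status_for("/typo.html", live))
```

One rule covers the whole directory, so nothing has to be requested from the hosting provider each time a test file is deleted.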

Re:Solving one problem and creating 3 more. (1)

mcgrew (92797) | about 9 months ago | (#46065973)

Heck, they may just no longer be in business, and don't want people to think they are.

Have you never used the wayback machine? They leave no doubt where you are.

But we also have cases where we have a deliberate take down of information, due to legal, or personal reasons.

They won't archive anything you don't want archived.

Also the Last Page, may not be a good page to point to, as it may have been a victim of an attack and have harmful information on it.

Archive.org doesn't host malware.

Broken link fixery could be good for education (2)

GoodNewsJimDotCom (2244874) | about 9 months ago | (#46061101)

Right now the Internet is an excellent place to get an education... if you're an active learner.

Someday spoonfed education will be there with these new "universities" online compiling information and lessons

Right now if you wanted to, you could write a webpage that links to a zillion different small lessons that would build into one real lesson to get you day to day on subjects from K-12-College. The reason I never wrote this "index of lessons:virtual textbook of interactive material" was because of link rot.

I could spend several months compiling up a "virtual textbook of interactive material", but link rot would destroy it over time.

I just assumed it wouldn't be worth my time because I wasn't certain I could out-index the link rot. Now if link rot is fixed with the Internet Archive, someone could sit down and link all these links, adding in a time too. This way you'd have a URL with time/date data. So if the link ever changes into something that is not the lesson you wanted, like a new blog entry, or even a shock pic, the old time/date data would tell the Internet Archive to serve the copy from that date instead.
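A small Python sketch of what such a dated link might look like, assuming the archive's URL layout is `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. The helper name and the example lesson URL are hypothetical.

```python
from datetime import datetime

# Assumed URL layout for a dated archive copy:
#   https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web/"


def dated_link(url, when):
    """Combine a lesson URL with the date it was known to be good."""
    return WAYBACK_PREFIX + when.strftime("%Y%m%d%H%M%S") + "/" + url


lesson = "http://example.com/lesson1.html"
print(dated_link(lesson, datetime(2014, 1, 24, 12, 0, 0)))
```

An index of lessons could store the original URL and the date together, and fall back to the dated link only if the live page disappears or changes.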

So I think what they're doing is a good idea if I know one application I'd personally use it on. I'm sure there'd be others.

Cool! I can stop paying my hosting provider! (3, Insightful)

barlevg (2111272) | about 9 months ago | (#46061175)

While I honestly think this is an awesome idea, I wonder, if this takes off, whether anyone who currently pays for web hosting of a static site will decide, "fuck it--it's backed up on Internet Archive. Might as well save the $N a month I pay to maintain the website and lease the domain name."

It's being done without paying to begin with (0)

Anonymous Coward | about 9 months ago | (#46062143)

Wonder no more, temporary code testing pages [archive.org] can be archived permanently, using the wayback machine kind of like version control.

Re:Cool! I can stop paying my hosting provider! (1)

tlhIngan (30335) | about 9 months ago | (#46062315)

While I honestly think this is an awesome idea, I wonder, if this takes off, whether anyone who currently pays for web hosting of a static site will decide, "fuck it--it's backed up on Internet Archive. Might as well save the $N a month I pay to maintain the website and lease the domain name."

Until some domain name spammer goes and hijacks the name and puts up a generic redirect for all URLs back to the home page full of ads.

This thing gets rid of 404s. It doesn't help if the 404 is replaced with a valid, but different, page.

Better yet, some evil genius might decide to re-post the missing content altered in some way...

Re:Cool! I can stop paying my hosting provider! (1)

mcgrew (92797) | about 9 months ago | (#46065993)

You could do that as long as you intend to never change it. My old sites are there, although they're not complete. But who hangs on to a completely static page?

Ugly as sin... (2)

gr4nf (1348501) | about 9 months ago | (#46061301)

The idea's in the right place but I'd hesitate to let anybody with so ugly and poorly maintained a web presence as archive.org into the inner workings of my browser. Seriously, guys... get it together.

Historical Archives? (1)

krelvin (771644) | about 9 months ago | (#46061675)

I have a number of historical archives to provide reference to what was being said and done back in the past rather than having that info disappear forever. they are not really web pages but rather web access to mailing lists etc...

I get requests to update the sites in them all the time, but have them set as read-only so keep the context under which they were written. So some of the web links included don't go anywhere any more but the main body of the text is valid for that period of time.

Temporary 404 (0)

Anonymous Coward | about 9 months ago | (#46061699)

What about all those links that are 404 today, but may be correct tomorrow?!

I suggest keeping the original links. And *add* new suggested mirror link(s).

I like! (Better than OpenDNS) (2)

rueger (210566) | about 9 months ago | (#46061733)

We use a service to fake Netflix into believing that our TV is the US and not Canada. Many Canadians do this.

However, the service that we use replaces our ISP's DNS with OpenDNS.

Instead of presenting a nicely formed 404 message, with the offending URL in the location bar, OpenDNS offers up a useless message:

"Oops! www.bvyhuigyi.com is unavailable. Please check domain for spelling errors and try again."

And replaces the URL that you had entered with www.website-unavailable.com

In practical terms, it means that if you mistype a URL you can't just go "oops" and fix the one character that was in error - you need to retype the whole damned address.

I'm sure that someone at OpenDNS could argue for this being a "feature," but I'd call it a bug.

I really wish it was possible (or at least easy) to turn off this thing and just get a regular 404 message. And yeah, having the option of clicking through to an archived version of page would be good.

ah ... URLs as SMTP status messages (2)

oneiros27 (46144) | about 9 months ago | (#46061821)

Just today, I sent some mail and got :

----- The following addresses had permanent fatal errors -----
<[CENSORED]@aol.com>
      (reason: 521 5.2.1 : (CON:B1) http://postmaster.info.aol.com... [aol.com] )
<[CENSORED]@aol.com>
      (reason: 521 5.2.1 : (CON:B1) http://postmaster.info.aol.com... [aol.com] )

Too bad AOL seems to have taken those URLs down. A quick hop to archive.org told me that my ISP's been blocked for sending spam ... oh, joy.

Re:ah ... URLs as SMTP status messages (1)

rubycodez (864176) | about 9 months ago | (#46062319)

see, your problem is you have friends / coworkers / clients that are still on AOL in the 21st century. they should be dead to you.

what internet archive needs (1)

rubycodez (864176) | about 9 months ago | (#46062375)

they need a search engine (with optional date ranges). then they'd have something

Re:what internet archive needs (1)

mcgrew (92797) | about 9 months ago | (#46066017)

Re:what internet archive needs (1)

rubycodez (864176) | about 9 months ago | (#46069661)

no! try it. that is NOT a search engine on their page! that just searches for site names

and using google with site:archive does not search their snapshots of web sites over the years, nor is there a way to search with date ranges.

the thing of which I speak does not exist, and they should build it.

FP SHIt (-1)

Anonymous Coward | about 9 months ago | (#46062445)

some in7ell1gent [goat.cx]

I Have Experience in Internet Archaeology (3, Interesting)

IonOtter (629215) | about 9 months ago | (#46062857)

There was a fascinating website dedicated to high-energy weapons and experiments, called svbxlabs.com

It was run by a young man who'd been born in the US to Ukrainian immigrants, which is actually important to keep in mind. He was brilliant, at least in my eyes, putting together the most incredible devices. HERF cannons, railguns, Tesla coils; you name it. He was the first to explain what the OptiCom traffic Light Changer [fleetsafety.com] was, and how it worked.

In short, he was doing a lot of work on things a LOT of people would much rather he didn't. Things were zipping along nicely, and his college professor was very excited to see what he came up with next.

Then 9/11 happened. Within four months, the site was gone. And Slava Person vanished from the Internet not long after that. Other people took up the mantle of his work, such as powerlabs.org, but it's not as good as Mr. Slava's work had been.

But if you put svbxlabs.com into WBM/A.O, you can find most of what he did. Also, one of the problems of WBM/A.O is that you can't just click on the links. Sometimes you have to copy them, then enter them into the WBM window, otherwise your browser tries to go to the direct link. Which no longer exists.

I've also used it to find all kinds of fan fiction, role-playing games, artwork and more.

I approve of this.

Re:I Have Experience in Internet Archaeology (0)

Anonymous Coward | about 9 months ago | (#46065527)

I would suggest always saving an article or webpage of interest. Why rely on Archive.org? It could be gone as soon as someone gets the domain and puts a robots.txt there. I disapprove of that sort of thing Archive.org does when a new owner puts a robots.txt file in the domain.

Isn't there a trick where you can save a webpage in base64 as a URL or something? Couldn't that be done on wikipedia citation links?

/. effect (1)

confused one (671304) | about 9 months ago | (#46063577)

Can they handle the traffic from all the redirected links?

firefox add-on: resurrect-pages (0)

Anonymous Coward | about 9 months ago | (#46065613)

Firefox has an add-on called resurrect-pages [github.com]: when you hit a missing page/site, you are also shown a table with possible ways to see the page, using Google cache, Bing, the Internet Archive, etc.

Not fast enough to be useful (1)

drinkypoo (153816) | about 9 months ago | (#46065733)

There is no way I'm sending visitors to my site to the internet archive without careful forethought. It's just too slow. They won't thank me for it. I'll send them there for a particular media download, but not on a lark.

Favorites Shortcut can do the same thing. (0)

Anonymous Coward | about 9 months ago | (#46066389)

An old coworker of mine gave me a URL once and told me to save it under my favorites. If I hit a 404 page all I had to do was hit his URL next and the archive.org page for my 404'd page would appear.

It was just a very short URL. I managed to lose it. (I am not a smart man.) I have missed it ever since. Robert, if you're reading this, may I have it again please? :P
