
Why Power Failures Can Always Lead To Data Loss

timothy posted more than 6 years ago | from the when-velcro-snags-shoelaces dept.

Data Storage 456

bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."

456 comments

Well no shit, Sherlock (5, Insightful)

Skyshadow (508) | more than 6 years ago | (#24306965)

Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there was some sort of law that covered the tendency of things that can go wrong to go wrong...

Next week: Fires can make things warm, floods can make things wet.

Re:Well no shit, Sherlock (5, Funny)

Anonymous Coward | more than 6 years ago | (#24307031)

I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

Re:Well no shit, Sherlock (5, Funny)

Skyshadow (508) | more than 6 years ago | (#24307069)

I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

As a former sysadmin, I would think that any machine reliant on 'happy thoughts' would be the most crash-prone system in the history of computing.

Re:Well no shit, Sherlock (5, Funny)

Anonymous Coward | more than 6 years ago | (#24307523)

I can offer you a Happy Thought UPS. It's a box of puppies. Be careful though, it only has 500 puppy Amps of capacity.

It can be done! (3, Funny)

GameboyRMH (1153867) | more than 6 years ago | (#24307617)

...If you're a Mac fanboy running a network of Apple computers. If anything goes wrong, it's an artistic expression and anyone who criticizes the problem is a closed-minded square who "doesn't get it." Then you sit back in self satisfaction listening to alternative pop, thinking about how hip and different and enlightened you are.

Happy thoughts power supply: Dead stable.

Linux networks can run on happy thoughts as well as long as you run on electricity during the setup and installation stages and then switch to happy thoughts once everything's running properly...you just have to make sure you never, ever run emacs, vi, or Gpaint.

Re:Well no shit, Sherlock (4, Funny)

ArsonSmith (13997) | more than 6 years ago | (#24307725)

Except the server that runs http://youporn.com/ [youporn.com]

Re:Well no shit, Sherlock (4, Funny)

NFN_NLN (633283) | more than 6 years ago | (#24307291)

My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

I like this setup, but please, tell me more about this cotton candy technology. Is it superior?

Re:Well no shit, Sherlock (3, Insightful)

MightyMartian (840721) | more than 6 years ago | (#24307443)

My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

That is until the 10,000 volt spike when the power company improperly brings the grid back up bakes the RAM, the battery, RAID controller and the hard drives.

Re:Well no shit, Sherlock (1)

linuxpyro (680927) | more than 6 years ago | (#24307497)

My servers run on Electricity

My servers run on Love.

Re:Well no shit, Sherlock (5, Funny)

Anonymous Coward | more than 6 years ago | (#24307561)

Your mom loves you and pays for the electricity. That doesn't mean that your servers run on love.

Mine run on evil thoughts and hatred (1)

Joce640k (829181) | more than 6 years ago | (#24307493)

And I bet they have a longer uptime than yours....

Re:Well no shit, Sherlock (1)

dreamchaser (49529) | more than 6 years ago | (#24307041)

Yes, where is the 'Duh' tag when you really need it? Or maybe slownewsday...

Re:Well no shit, Sherlock (5, Informative)

Anonymous Coward | more than 6 years ago | (#24307091)

Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

Re:Well no shit, Sherlock (2, Funny)

Midnight Thunder (17205) | more than 6 years ago | (#24307157)

if only there was some sort of law that covered the tendency of things that can go wrong to go wrong.

I hear Murphy might have one :)

Re:Well no shit, Sherlock (5, Funny)

Timothy Brownawell (627747) | more than 6 years ago | (#24307163)

No, it really does have some interesting observations, with some very scary implications:

One of the first things that will happen, is that the memory DIMMs will no longer be refreshed properly (DRAM needs to be refreshed constantly otherwise it will loose it's data) and very rapidly, the memory will contain only garbage. The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle [hackaday.com] . This has very scary implications: the RAM knows what's happening, and behaves differently (loses data immediately on power-off or remembers it for several seconds) in order to cause the most difficulty for the owner of the machine.

Not only are computer components intelligent and self-aware, they're also out to get us!

no, that's not the scary thing (2, Funny)

ScentCone (795499) | more than 6 years ago | (#24307451)

The scary thing is that yet one more person can't freakin' tell the difference between "loose" and "lose." It's becoming an epidemic.

Re:no, that's not the scary thing (1)

clone53421 (1310749) | more than 6 years ago | (#24307587)

I've searched through the parent, the grandparent, the summary, and even TFA (gag)... the only misuse I can find is this statement in the article:

DRAM needs to be refreshed constantly otherwise it will loose it's data

Is this what you were referring to? All the other uses of lose, loses, and loss have been correct...

Re:no, that's not the scary thing (1)

ScentCone (795499) | more than 6 years ago | (#24307659)

Nope, that's it. I'm just pitching a fit over that thing, right there. But it's probably the 10th time I've seen it today, in print, in e-mails, and in numerous discussion threads. Just gets under my skin, since it suggests that, as usual, people don't actually think about what they're typing. It's not just a typo - it's right up there with people who say "I could care less about..." when they actually mean the exact opposite. It's just laziness and thoughtless communication, that's all.

Re:no, that's not the scary thing (0)

Anonymous Coward | more than 6 years ago | (#24307651)

Oh, loosen up.

Re:Well no shit, Sherlock (1)

yukk (638002) | more than 6 years ago | (#24307685)

Well known fact. The Reg has been carefully tracking [theregister.co.uk] this phenomenon for quite a while now.

Re:Well no shit, Sherlock (2, Funny)

ArsonSmith (13997) | more than 6 years ago | (#24307463)

We just need to get that guy that declared Pluto is no longer a planet to declare that electricity no longer causes data loss.

Side note: He also declared that north is no longer a direction, blue is no longer a color, and your sister is no longer a virgin.

Re:Well no shit, Sherlock (1)

sootman (158191) | more than 6 years ago | (#24307611)

If ever there was an article that deserved to be tagged 'duh,' this is it. And even so, it managed to skip over two key points: even if you could perfectly restore a system and not lose a byte of data, unexpectedly cold-rebooting a server means 1) downtime and 2) a restore, which is a pain in the ass. Sometimes less painful, sometimes more painful, but a pain nonetheless. UPSs are very cheap insurance.

Re:Well no shit, Sherlock (1)

dkeisling (1328667) | more than 6 years ago | (#24307741)

I've shipped my data with UPS many times and they've never lost anything.

Re:Well no shit, Sherlock (1)

alta (1263) | more than 6 years ago | (#24307773)

I, once again, ask for the ability for us to be able to Mod stories in addition to comments.

Any story that reaches -1 (as this would have nearly instantly) will come off the front page.

Not me! (0)

Anonymous Coward | more than 6 years ago | (#24306997)

What if your data's on the cloud?

First!

Re:Not me! (4, Funny)

sm62704 (957197) | more than 6 years ago | (#24307311)

If there's clouds in your server room, your server's probably been slashdotted and is on fire!

Illiteracy (5, Funny)

carou (88501) | more than 6 years ago | (#24307005)

From TFA:

(DRAM needs to be refreshed constantly otherwise it will loose it's data)

Fly, little data! Be free!

Re:Illiteracy (0)

Anonymous Coward | more than 6 years ago | (#24307169)

Thank you!

To the author of TFA, it's "lose," dammit, not "loose"! Oh, and the previous sentence shows how to properly use "it's".

Re:Illiteracy (2, Funny)

Ngarrang (1023425) | more than 6 years ago | (#24307437)

Get off my lawn, you little bits!

Re:Illiteracy (0)

Anonymous Coward | more than 6 years ago | (#24307713)

Loose data... do they mean porn?

can always lead to data loss? (5, Funny)

internerdj (1319281) | more than 6 years ago | (#24307009)

Definitely maybe?

Re:can always lead to data loss? (1)

jpellino (202698) | more than 6 years ago | (#24307631)

No, it's "maybe definitely."
Sloppy title either way.
Perhaps a better headline might be "Pray to God but row towards shore."

UPS - more than just a backup. (4, Informative)

Zebadias (861722) | more than 6 years ago | (#24307011)

UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

UPS is more than just saving your data.

Re:UPS - more than just a backup. (4, Informative)

linuxpyro (680927) | more than 6 years ago | (#24307173)

It's also important to get a decent UPS if you're using it for something like a server. I think the cheapy ones basically just use a transfer relay, whereas the higher end ones actually run the hardware off of the battery via the inverter all the time. While I would think that with the former (called "standby" UPSs maybe?) the transfer time wouldn't be enough to cause too many problems, you still don't have the buffer that you'd get with a true uninterruptible power supply.

I think a lot of the cheaper ones don't put out a true sine wave either, though for their intended purpose of letting you shut down your desktop cleanly they're probably fine.

Re:UPS - more than just a backup. (4, Informative)

SuperQ (431) | more than 6 years ago | (#24307709)

Yup the 3 major types of battery UPSs I know of:

Offline - Relay or simple failover. (APC Back-UPS)

Line Interactive - Can correct line over/under voltage to a point (APC Smart-UPS)

Online - Full AC -> DC -> AC conversion. (APC Symmetra, Liebert, anything that doesn't suck)

Basically outside of home use you want an online type UPS.

There are other systems like motor/generator flywheel types, but they need a very fast backup generator to sustain anything more than 30 seconds of outage. But they're great for smoothing out some types of line issues.

Re:UPS - more than just a backup. (2, Informative)

Anonymous Coward | more than 6 years ago | (#24307335)

>UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

A true UPS smooths out the spikes. Most of today's UPSes (at least consumer models) are off-line [wikipedia.org] supplies. The batteries don't kick in unless the power is out. Worse than that, the cheap ones don't output sine waves, they output square waves. These UPSes also take some time to switch to batteries, leaving your computer without power for that time.

Now, some of those UPSes have filtering technology like you find in expensive powerbars, sure. But it isn't the same as an always-on UPS at all.

Re:UPS - more than just a backup. (1)

Sandbags (964742) | more than 6 years ago | (#24307529)

Exactly. Even more important than simple power backup is AVR. The microfractures that can be created in system chips by voltage swings of as little as +/- 10 volts will, over time, cause systems to malfunction.

I can't find the article, but at one point when I was a reseller for APC (maybe it was Liebert), their marketing material stated that 95% of all system component failures were due to voltage irregularity, and that properly filtered line voltage could extend the life of electronics threefold.

You should not only have your systems and servers on UPS, but at this point, all your Home theater equipment too, TVs and all. Battery backup is not so important for these other devices, but AVR and line conditioning is.

He's right about drives continuing to write bad data. Sure, you can restore from backup, but you still lose what was created since that point. Critical databases can be replicated and mirrored in real time, and come close to preventing 100% of data loss, but not so for most other system uses, and real-time record-level syncing is out of the budget for most companies.

Beyond UPS, not a bad idea to have an on-site generator too...

Duh! (4, Insightful)

mlwmohawk (801821) | more than 6 years ago | (#24307029)

I remember a discussion on the PostgreSQL hacker's list about recoverability and transaction logs.

You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.

Re:Duh! (3, Insightful)

sm62704 (957197) | more than 6 years ago | (#24307433)

You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

Gotta love these slashvertisements, I wonder whose UPSes they're pimping? It's not like we don't all know you need a UPS. What's next, a FA about how you need fire insurance?

Re:Duh! (0)

Anonymous Coward | more than 6 years ago | (#24307689)

You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

Redundant power supplies in 90% of modern servers. RAID in 99% and offsite backups performed by any good admin. I fail to see any point in your comment.

Re:Duh! (1)

smbarbour (893880) | more than 6 years ago | (#24307779)

Power supply failure - Redundant power supplies
Hard Drive failure - RAID array (Though losing multiple drives at the same time... Very bad, unless you are using an "exotic" RAID level such as RAID 6 or 5+1 (Striped set with double distributed parity and mirrored striped set with distributed parity, respectively))

Building burns down - You have bigger problems than just losing some data.

Re:Duh! (1)

GXTi (635121) | more than 6 years ago | (#24307595)

The goal of transactions is to make the window of data loss on the database's side infinitesimally small. PostgreSQL's default configuration will not tell you it has committed a transaction until it can guarantee that nothing short of lying hardware will cause that transaction to be lost. Hence, it's up to the software to handle errors (like the database server disappearing) by informing the user that their action failed, or to put the work aside until the database comes back. That said, the probability of losing data is very very small but still not zero (primarily due to lying hardware like RAID controllers), so you don't want to take any chances. Unless you're running MySQL, a few minutes is all you need to cleanly stop the database and shut down.
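
To make the "don't report a commit until it is durable" idea concrete, here is a minimal sketch in Python. It is not PostgreSQL's actual implementation; the function names and the log file name are made up for illustration. The only point is the ordering: write the log record, fsync, and only then tell the client the transaction is committed.

```python
import os
import tempfile

def durable_append(path: str, record: bytes) -> None:
    """Append one record and force it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, record + b"\n")  # lands in the OS page cache first
        os.fsync(fd)                  # block until the OS reports it is on the device
    finally:
        os.close(fd)

def commit(log_path: str, record: bytes) -> str:
    """Report success only after the log record has been flushed."""
    durable_append(log_path, record)
    return "COMMITTED"

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "toy.log")  # hypothetical log file name
    status = commit(log_path, b"txn 1: debit 10")
    with open(log_path, "rb") as f:
        logged = f.read()
    print(status, logged)
```

Note that fsync is only as honest as the hardware underneath it. A drive or RAID controller with a volatile write cache can acknowledge the flush before the data is really on the platters, which is the "lying hardware" case mentioned above.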

Of course (1)

Naurgrim (516378) | more than 6 years ago | (#24307063)

As my nieces would say, Durrrr! Yes, of course - you need a UPS. Next question please.

So, big HD writes (1)

Trigun (685027) | more than 6 years ago | (#24307077)

into a huge cache on the drive don't get written permanently if the power quits? Why didn't somebody tell me about this before?

Silly Me! (0)

Anonymous Coward | more than 6 years ago | (#24307079)

I always thought Gremlins caused data loss.

Since when did power have anything to do with it?

Well of course you need UPSs, but (5, Informative)

pembo13 (770295) | more than 6 years ago | (#24307103)

APC is the only UPS maker on the market that has at least spent some small effort so that their UPSs can be properly integrated with a Linux machine. I made the mistake of purchasing an Ultra UPS as it was cheaper than the APC.

Re:Well of course you need UPSs, but (0)

Anonymous Coward | more than 6 years ago | (#24307261)

You only need a UPS if the chance of a power failure is bigger than the chance of your UPS failing. The decision isn't always as clear cut as in the US.

Re:Well of course you need UPSs, but (1)

pembo13 (770295) | more than 6 years ago | (#24307307)

Well I am studying in the US, and there have been quite a few power surges that have been subdued by the UPS.

Re:Well of course you need UPSs, but (0)

Anonymous Coward | more than 6 years ago | (#24307327)

I dunno about that. Just about 4:30 every morning, the power flickers for about half a second where I live. If it weren't for the UPS, every morning my server would be rebooting.

Re:Well of course you need UPSs, but (1, Interesting)

raddan (519638) | more than 6 years ago | (#24307401)

Actually, UPS devices are useful for other kinds of things as well. Need to distribute load more evenly across your circuits? If you have the machine plugged into a UPS, you simply unplug the UPS and plug it into the other circuit. Heck, you could even do something really dumb like physically move the machine while it's running if you had it connected to a UPS.

Re:Well of course you need UPSs, but (2, Interesting)

bruceg (14365) | more than 6 years ago | (#24307537)

been there, and done that! We recently moved a few servers this way. Just be careful, and go slow.

Re:Well of course you need UPSs, but (0)

Anonymous Coward | more than 6 years ago | (#24307609)

The best (indeed only) real use my cheapo UPS ever got was in the big blackout of a couple of years ago that took down power in Ontario and some northeastern states for a couple of days - we plugged a clock radio into it and were able to keep up to date on the news (we had a battery powered radio, but loaned it to our neighbours who didn't).

Re:Well of course you need UPSs, but (0)

Anonymous Coward | more than 6 years ago | (#24307603)

MGE Systems supports the NUT project which supports a number of different UPSs. They now own APC, but didn't when they started supporting NUT.
Since buying APC they don't seem to be selling MGE branded units in the US, but I bought a couple of Nova 1100s a while back and they seem to work OK.

What this really points out... (2, Insightful)

JesseL (107722) | more than 6 years ago | (#24307121)

is a weak spot in the design of most computers.

Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data, and operating systems and critical apps should be able to handle an emergency shutdown and save critical data in very short order.

This is old hat in embedded systems.

Re:What this really points out... (4, Informative)

mlwmohawk (801821) | more than 6 years ago | (#24307219)

Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data

Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.

Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for location INSIDE the computer box.
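
For anyone who wants to run the numbers, here is a rough back-of-the-envelope sketch in Python. The 12 V rail and the 10% allowed voltage sag are assumptions picked for illustration, not figures from the post, and the capacitor is treated as ideal (no ESR, no converter losses).

```python
def capacitor_farads(power_w: float, seconds: float,
                     v_start: float, v_min: float) -> float:
    """Capacitance needed to supply power_w for `seconds` while the
    capacitor voltage sags from v_start down to v_min (ideal, lossless)."""
    energy_j = power_w * seconds                       # E = P * t
    usable_j_per_farad = 0.5 * (v_start**2 - v_min**2) # E = 1/2 * C * (V1^2 - V2^2)
    return energy_j / usable_j_per_farad

# Assumed values: 200 W load, 5 minutes, 12 V rail allowed to sag to 10.8 V.
farads = capacitor_farads(200, 5 * 60, 12.0, 10.8)
print(f"{farads:.0f} F")
```

Under those assumptions the answer comes out in the thousands of farads, which supports the "it's BIG" hint. The required capacitance scales linearly with hold-up time, so asking for seconds instead of minutes shrinks it by the same factor.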

Re:What this really points out... (4, Insightful)

JesseL (107722) | more than 6 years ago | (#24307319)

Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

Re:What this really points out... (1)

mlwmohawk (801821) | more than 6 years ago | (#24307485)

Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

I'm not sure what your system is, but for this to be a general purpose device, it needs to work within the realm of real life systems. Have you ever typed "sync" on a busy system and had it go away for a minute or more?

5 minutes is a "safe" number. It takes time to detect a power failure more than a mere "spike." You don't want to start a shut down and suddenly have the power come back on. How would you know to restart the system?

If this were a real product, it would need a hell of a lot more than just a big capacitor.

Re:What this really points out... (3, Insightful)

Locklin (1074657) | more than 6 years ago | (#24307355)

Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

####@johncash:~$ time sync

real 0m0.004s
user 0m0.004s
sys 0m0.000s
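
A timing like the one above depends heavily on how much dirty data is waiting to be written when you run it. As a rough illustration, this Python sketch times an fsync after writing a small and a larger payload to a temporary file. The payload sizes are arbitrary, and the results will vary wildly with the disk, the filesystem, and whatever else the machine is doing.

```python
import os
import tempfile
import time

def timed_flush(payload: bytes) -> float:
    """Write payload to a temp file and return the seconds spent in fsync."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(payload)
        f.flush()                    # push Python's buffer down to the OS
        start = time.perf_counter()
        os.fsync(f.fileno())         # ask the OS to push it to the device
        return time.perf_counter() - start

small = timed_flush(b"x" * 4096)               # one page of data
large = timed_flush(b"x" * (16 * 1024 * 1024)) # 16 MiB of data
print(f"4 KiB: {small:.4f}s, 16 MiB: {large:.4f}s")
```

An idle desktop will usually finish almost instantly, as in the output above. A busy server with a large write-behind cache is a different story, and that is the case the 5-minute figure is meant to cover.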

Re:What this really points out... (1)

mlwmohawk (801821) | more than 6 years ago | (#24307503)

A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

As I said in another post, it is very much more complicated than just a few seconds.

Re:What this really points out... (3, Informative)

Firehed (942385) | more than 6 years ago | (#24307705)

Other than the lack of communication at present between the PSU and the rest of the system (on a hardware and software level), what you're describing really seems to be the computer equivalent of throwing your hands in front of your nuts as you spot the incoming baseball. It helps the immediate problem of data (or testicle) loss, but it's really just a small amount of damage control.

This is why a proper UPS that can trigger a full system shutdown once you hit a certain power remaining threshold is far preferable. Granted I'd rather have a controlled crash than the risky nonsense that would come from the power cord being yanked, but (right now) computers can only go so far to help themselves in a couple-second window.
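
The threshold logic itself is simple. Below is a minimal Python sketch of the decision a UPS monitoring daemon might make on each poll. The `UpsStatus` fields, the function name, and the 60-second safety margin are hypothetical and not taken from any real UPS software; a real daemon would fill in the status from the UPS over USB, serial, or the network.

```python
from dataclasses import dataclass

@dataclass
class UpsStatus:
    on_battery: bool        # True once mains power has dropped
    runtime_left_s: float   # UPS's own estimate of remaining battery runtime

def next_action(status: UpsStatus, shutdown_needs_s: float,
                safety_margin_s: float = 60.0) -> str:
    """Decide what to do on one poll of the UPS."""
    if not status.on_battery:
        return "run"        # mains is fine, nothing to do
    if status.runtime_left_s <= shutdown_needs_s + safety_margin_s:
        return "shutdown"   # not enough headroom left, start a clean shutdown
    return "wait"           # on battery, but power may still come back

print(next_action(UpsStatus(on_battery=True, runtime_left_s=600), shutdown_needs_s=120))
print(next_action(UpsStatus(on_battery=True, runtime_left_s=150), shutdown_needs_s=120))
```

Waiting while there is still headroom is what lets the system ride out a short blip without an unnecessary shutdown, which a couple-second capacitor window cannot do.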

Re:What this really points out... (5, Interesting)

Macman408 (1308925) | more than 6 years ago | (#24307321)

This is old hat in embedded systems.

Yes, but embedded systems usually have lower power requirements, or at the very least, a smaller range of power requirements. You can't add 3 PCIe cards, a few extra drives, and a few more GB of RAM to most embedded systems.

I worked on the design of an embedded system a few years ago that had a holdup spec - I think it was supposed to survive for 50 ms with no power. So a 50 ms power interruption would result in continued operation, while an outage longer than that was allowed to reset the board. However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle. It also caused airflow problems, because the giant capacitors would prevent air from getting to other components on the board, like the CPU. In the next version of the spec, I believe the holdup requirement was eliminated - apparently we weren't the only ones having trouble meeting that requirement.

Re:What this really points out... (0)

Anonymous Coward | more than 6 years ago | (#24307399)

If only there were some kind of...battery-powered...device that might perform this function, allowing machines to run for a time after electric mains failed. Hmm...

Our Tandem (5, Interesting)

PIPBoy3000 (619296) | more than 6 years ago | (#24307541)

This reminds me of my favorite power loss story. The facility was doing a generator test, where we were supposed to switch over from city power to the generator. Unfortunately it didn't happen smoothly and the UPS kicked in. Sadly it turned out that so many servers had been added since the original design, the UPS was really only good for fifteen minutes or so. The final problem was that our operator didn't notice the issue quickly enough and so the next thing everyone in IT knew was that our main data center had just lost power.

We spent most of the day getting our servers back up from various states of disrepair (confirming the article, power loss is superbad). It turns out that our main medical software ran on a Tandem. Though the drives and such lost power, the CPU had a backup of D-batteries and survived the power loss just fine. Needless to say, we stopped making fun of their seemingly primitive emergency backup power.

Re:What this really points out... (2, Interesting)

natoochtoniket (763630) | more than 6 years ago | (#24307599)

The problem is that different application systems have different amounts of state that must be saved. An RT app usually only has a memory buffer that can be written in a small number of IOs. Many business apps have relatively lots of data, in non-contiguous buffers, that require hundreds of IOs to store. Many business systems have hundreds of such apps running in the machine at the same time. Some systems can have gigs of data, in thousands of buffers, in their write-behind cache. And, some businesses have systems that must not shut down, except for actual emergencies like fire or flood.

How does the hardware designer of a general-purpose computer guess what kinds of apps will run in that machine? He/she cannot.

The external power supply (aka, the UPS) can be configured to accommodate the needs of the application. An application that needs lots of power for a long time can be configured with a big UPS. And, an app that doesn't need it, doesn't have to pay for it.
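
As a rough illustration of that sizing exercise, here is a small Python sketch that estimates runtime from battery capacity and load. The 500 Wh battery, 400 W load, and 85% inverter efficiency are made-up example values. The linear model is also an assumption: real lead-acid batteries deliver less usable energy at high discharge rates, so treat the result as an upper bound.

```python
def runtime_minutes(battery_wh: float, load_w: float,
                    inverter_efficiency: float = 0.85) -> float:
    """Naive linear estimate of UPS runtime in minutes at a constant load."""
    usable_wh = battery_wh * inverter_efficiency  # energy that reaches the load
    return usable_wh / load_w * 60.0

minutes = runtime_minutes(battery_wh=500, load_w=400)
print(f"{minutes:.1f} minutes")
```

The same formula shows why adding servers to an existing UPS quietly eats the safety margin: doubling the load at least halves the runtime.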

Re:What this really points out... (1)

JesseL (107722) | more than 6 years ago | (#24307657)

I'm not saying that UPS are completely unnecessary, I'm saying that most computers are excessively vulnerable.

It happened to someone (4, Insightful)

Joebert (946227) | more than 6 years ago | (#24307123)

The funny part is someone had to have thought they were safe without a UPS for this to become news.

Re:It happened to someone (4, Funny)

Verteiron (224042) | more than 6 years ago | (#24307187)

Yes. My first reaction upon reading the summary was.. "Duh?" What, did they have it plugged into the wall before that? A UPS becomes MORE critical, not less, as the cost of hardware (RAID arrays are expensive) goes up.

Re:It happened to someone (1)

thermian (1267986) | more than 6 years ago | (#24307513)

Would you believe that a certain major UK university runs its entire computer science dept without either UPS or power spike protection?

I was surprised, especially since I saw how the regular power spikes blew computer after computer and nothing was done.

As for trying to run experiments that took more than a day or two to complete, well, can you picture a post grad who's just found that a week's work has once again been wasted because some bean counter refused to pay for department wide UPS?

Alas I don't have to imagine, and I became quite well versed in the experience.

Don't for get to test people, TEST! (5, Insightful)

sco_robinso (749990) | more than 6 years ago | (#24307133)

In my company, everything is behind UPSs. Our SAN is even behind 2 separate UPSs. We thought everything was configured properly, but you'd be surprised what comes to roost when you test everything.

We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.

Equally important to any protection measure, be it RAID, Power Protection, whatever - is testing!

Re:Don't for get to test people, TEST! (4, Interesting)

Darkk (1296127) | more than 6 years ago | (#24307419)

I 100% agree with the idea of testing under controlled conditions. The oops you guys discovered is a good thing to catch early on. I can imagine the look on your support team's faces when the UPS suddenly turned itself off while the remaining servers were still trying to perform a safe shutdown. I'm sure the secondary UPS was left running as a precaution until the test was successful.

I have seen a screw-up where somebody cut into a live power cord, thinking it was a tie wrap, and caused a major short in the PDU. The guy thought he was safe until he discovered that whoever installed the servers hadn't double-checked the power connections and loads, so it created a cascading failure in several racks and lost several tons of data. Recovery took a while.

Needless to say it was not a good day.

Re:Don't for get to test people, TEST! (1)

raddan (519638) | more than 6 years ago | (#24307459)

It is also important to note that on {many | most} SAN-connected RAID enclosures, there's actually a battery backup unit that writes pending transactions to disk before the unit switches itself off due to power loss. Now, this doesn't help you when one of the SAN clients starts blatting out garbage, but assuming your clients are connected to UPS devices, that shouldn't happen.

Re:Don't for get to test people, TEST! (0)

Anonymous Coward | more than 6 years ago | (#24307491)

...2 separate UPSs...

Hah! I've got a Beowulf cluster of UPSs!

On the other hand.. (2, Interesting)

m0i (192134) | more than 6 years ago | (#24307161)

you can recover your RAM minutes after losing power.. no kidding! http://citp.princeton.edu/memory/ [princeton.edu]

I know PHB's try to cut costs.... (1)

knarfling (735361) | more than 6 years ago | (#24307177)

I know that PHB's will try to cut costs, and that unnecessary hardware is the first to be cut, but is there ANYONE who believes that a UPS is not needed? Are there really people out there that think, "We don't need the UPS right now. We can wait until we have more money."

It boggles my mind that there is even a need for such an article.

Re:I know PHB's try to cut costs.... (0)

Anonymous Coward | more than 6 years ago | (#24307693)

Uh yes, there are. Why should I bother with the expense of a UPS? I don't even back up. I migrate to new disks as needed. My subversion repository/media server runs on an old laptop, and so has a built in UPS.

That's what I always say sometimes (1)

RiffRafff (234408) | more than 6 years ago | (#24307209)

Well, duh. Thank you Captain Obvious.

Here's a question for you all. I have a cheap Conext (made by APC) IPS. Yes, it's an interruptible power supply. It used to work fine, but once I added a Samsung b/w laser printer, whenever the printer's heating element first comes on, the UPS drops out immediately and the computer restarts. Even put a new battery in it; no help. The printer, btw, is NOT plugged into the UPS. The line voltage appears to get yanked down just momentarily and the computer ignores it, when off the UPS. The UPS, with nothing plugged in to it, always clicks off then back on once during the printer's warm-up cycle. Is the UPS just too small (900 AVR)?

Re:That's what I always say sometimes (1)

CastrTroy (595695) | more than 6 years ago | (#24307337)

Most UPS devices should have a test button. Try pushing the test button when your computer isn't doing anything critical to see if it really can stand up to the load. If you don't have a test button, just yank the cord from the wall (or the back of the UPS unit). If it fails, it means that you don't have enough power for the devices hooked up to it.

Re:That's what I always say sometimes (1)

RiffRafff (234408) | more than 6 years ago | (#24307771)

If I yank the UPS power cord, it goes to battery with nary a hiccup. There is something about the high current draw of the printer on the line voltage that messes with it. When the printer warms up, the UPS never goes to battery, but just loses power (or voltage level drops too low) to the computer. Very odd.

Re:That's what I always say sometimes (1)

Reziac (43301) | more than 6 years ago | (#24307343)

The docs for every UPS I've ever seen say do NOT attach a laser printer to them, for exactly the reasons you've seen -- when the printer comes on, the startup drawdown is just too much (it's generally about four computers worth).

Your printer should be on a good surge protector, but there is no reason it needs to be attached to the UPS. Some UPSs now have spare plugs for exactly this use -- they provide surge protection but not continuous power.

Re:That's what I always say sometimes (1)

timster (32400) | more than 6 years ago | (#24307543)

You should read his whole comment -- the printer is not plugged in to the UPS.

Re:That's what I always say sometimes (1)

RiffRafff (234408) | more than 6 years ago | (#24307673)

Hi. Thanks. But, um, in my post I wrote "The printer, btw, is NOT plugged into the UPS."

I don't understand how the printer could yank the line voltage down so that the UPS faults, and yet a computer plugged directly into the wall can handle it. Unless my computer's power supply buffers better than the UPS.

Maybe I'm not explaining it well enough. If I plug my computer into the UPS, and plug the printer into a different wall outlet in the room, when I print, I hear the UPS click, and then the computer resets. If I unplug the computer from the UPS, and then plug the computer into yet another wall outlet, when I print the UPS still clicks (with nothing plugged into it), but the computer is fine.

Printer is plugged into a reasonably high-end Tripp-Lite Isobar suppressor (but acts the same without it).

Re:That's what I always say sometimes (5, Interesting)

alta (1263) | more than 6 years ago | (#24307397)

Rule #1.

NEVER plug a laser printer into a UPS. The power that the fuser draws is WAY too much.

Look at some of the cheap office units, they show little pictures on them, notice the printer icon is on the surge side, NOT battery/surge side.

If the power goes out, you should NOT be trying to print.

http://articles.techrepublic.com.com/5100-10878_11-6085460.html [com.com] See #6

http://arstechnica.com/guides/other/ups.ars/3 [arstechnica.com]

http://www.jetcafe.org/npc/doc/ups-faq.html#0405 [jetcafe.org] see 04.05

Would you put a space heater on a UPS? Shredder? Vacuum? Table Saw? If you put a laser printer on it, you may as well.

Re:That's what I always say sometimes (1)

alta (1263) | more than 6 years ago | (#24307557)

Maybe I'm the dumbass here, WTF is an interruptible power supply? And why is it called a UPS, when a UPS is an UNinterruptible power supply?

Re:That's what I always say sometimes (1)

RiffRafff (234408) | more than 6 years ago | (#24307701)

Thanks for responding; please see my reply to Reziac.

Re:That's what I always say sometimes (2, Informative)

bgat (123664) | more than 6 years ago | (#24307453)

Yes, quite. It can't handle the substantial inrush current needed by the laser printer.

The "click" you hear in the UPS when the laser printer warms up is the UPS noting the drops on the power mains, which gives you some idea just how much current that printer needs.

I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.

Take the laser printer off the UPS. If you really need printer capability during a power failure, switch to an ink jet.

Re:That's what I always say sometimes (1)

natoochtoniket (763630) | more than 6 years ago | (#24307749)

Is the UPS just too small (900 AVR)?

Duh? You think it might?

Read the power labels on all of the devices that you intend to plug into that power supply. Add up the volt amps (volts times amps), or the watts (almost the same thing). The total needs to be smaller than the power-supply.

Even if the capacity numbers look good, batteries lose capacity as they age.
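The label arithmetic above can be sketched in a few lines of Python. This is only an illustration: the device names and current draws are made up, and the 80% headroom factor is an assumption added here to leave margin for aging batteries, not a figure from any vendor.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """One device plugged into the UPS, as read off its power label."""
    name: str
    volts: float
    amps: float

    @property
    def volt_amps(self) -> float:
        # Apparent power: volts times amps.
        return self.volts * self.amps


def total_va(loads):
    """Sum the volt-amp figures of every attached device."""
    return sum(load.volt_amps for load in loads)


def fits(loads, ups_va_rating, headroom=0.8):
    """True if the total load stays under the rating scaled by headroom.

    headroom=0.8 is an assumed safety margin, not a standard value.
    """
    return total_va(loads) <= ups_va_rating * headroom


# Hypothetical label values, for illustration only.
loads = [
    Load("desktop", 120, 3.0),
    Load("monitor", 120, 0.5),
    Load("router", 120, 0.25),
]
laser_printer = Load("laser printer", 120, 8.0)

print(total_va(loads), fits(loads, 900))
print(fits(loads + [laser_printer], 900))
```

With these invented numbers the three small devices fit on a 900 VA unit, while adding the printer does not, which matches the advice elsewhere in the thread to keep laser printers off the battery side.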

Get a UPS (3, Insightful)

Chemisor (97276) | more than 6 years ago | (#24307229)

I really can't understand people who don't have a UPS. Don't you care about your data? At all? The UPS is not very expensive (My BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then). Once you are on UPS, you can stop worrying about any power issues, journalling file systems, crash recovery, and all that. The computer will never fail due to power. If you run Linux, it will also never fail due to the OS. If you are a normal user, that means your computer will never fail, period. Seriously, there is no excuse for not having a UPS. Go and get one right now!

Re:Get a UPS (1)

Reziac (43301) | more than 6 years ago | (#24307415)

I always tell people the same thing. For about $100 for a decent home-type UPS, you will never have your hardware trashed by power spikes and sags, and you'll never have your work rudely interrupted or destroyed by a power outage.

Re:Get a UPS (2, Insightful)

LBArrettAnderson (655246) | more than 6 years ago | (#24307517)

Unless your PSU breaks...

Is this bring your kid to work day? (4, Funny)

alta (1263) | more than 6 years ago | (#24307279)

Ok, now everyone has something to give to their kid for the sysadmin-in-training class.

For the rest of us... back to work, nothing here you didn't learn your first year.

For the poster... Shame shame... Turn in your card.

Uhm, no..? (1)

TheDarkener (198348) | more than 6 years ago | (#24307345)

If you back up religiously, assuming you have the backups on some sort of removable media, why would recovering from them be impossible when data loss via electrical outage occurs?

Dur-durdur!

Carefully proofredded article (2, Funny)

Intron (870560) | more than 6 years ago | (#24307391)

"3.2. (Ecrypted) file systems"

Please tell me more about these ecrypted file systems. Do they also do gurnalling?

Don't forget the simple case... (1)

s31523 (926314) | more than 6 years ago | (#24307411)

So a UPS is needed, really. You're working on a long block of code, you haven't hit save in a while, and no autosave is on... Bam, power is out and you just lost 100 lines of code you spent hours on. Go get a UPS.

And this is what ZFS looks out for (3, Interesting)

E-Lad (1262) | more than 6 years ago | (#24307425)

...by design. TFA doesn't delve into too much detail, but a sudden power loss on such software RAID systems is a condition that ZFS accounts for. Its copy-on-write (COW) and write-length striping strategy prevents things such as the RAID5 write hole [sun.com] condition, which is most likely to occur when power is lost partway through a write.
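For anyone who hasn't run into the write hole, here is a toy Python model of it. It is a deliberately simplified sketch (two data blocks plus XOR parity, with a dictionary standing in for on-disk stripes) and is not how ZFS or any real RAID implementation is written; all names in it are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: two data blocks and their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# In-place update: the new d0 reaches disk, then power fails
# before the matching parity is written.
d0_new = b"CCCC"
stale_parity = parity

# Later the disk holding d1 dies and is rebuilt from d0_new and the
# stale parity. The result is not the original d1: silent corruption.
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
print(rebuilt_d1 == d1)

# Copy-on-write style: write a complete new stripe somewhere else,
# then switch a single pointer. A power cut before the switch leaves
# the old, still consistent stripe live.
stripes = {0: (d0, d1, parity)}
live = 0
stripes[1] = (d0_new, d1, xor_blocks(d0_new, d1))
live = 1

new_d0, _, new_parity = stripes[live]
print(xor_blocks(new_d0, new_parity) == d1)
```

The point of the second half is only that data and parity are never half-updated in place; whichever stripe the pointer names is internally consistent.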

Re:And this is what ZFS looks out for (0, Troll)

X0563511 (793323) | more than 6 years ago | (#24307579)

What's with all the ZFS spam lately? Did I miss something?

A UPS is good to have. Even at home. (3, Interesting)

Forge (2456) | more than 6 years ago | (#24307473)

Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

At work, power is a whole enterprise within the company I work for.

Dual gas powered Generators at each location, Rooms full of Batteries for the Telecoms gear (most is straight DC) and Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

Which means the phone network is a lot more reliable than the Power grid where I live.

As for Data loss. I have over the years done a lot of recovery work. "Morfy" of "Murfy's Law" fame isn't a guy or a girl. He is a deamon from the darkest pits of hell sent to torment the souls of IT workers everywhere.

Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server.

Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

Eventually we sent the failed drives off to a Data recovery company in Florida because

#1. The customer can afford it.
#2. Simply "skipping" a few days of Email is not an option for a bank (hence the ability to afford data recovery).

So yeah. A UPS is essential. Just like RAID, Clustering and Backups but in the end it can all fail.

Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost Data (or anything else).

Other reasons to run a UPS (3, Interesting)

rwa2 (4391) | more than 6 years ago | (#24307481)

UPS units are relatively cheap, it's well worthwhile to invest in one, not just to protect from data loss:

* Hardware loss: I've seen a lot of hardware blown up from power interruptions. Do you trust your power company that much to provide clean power to you? Sure, surge protectors help a bit, but a decent UPS costs maybe twice as much as a good surge protector.

* Time lost restoring your session after blackouts / brownouts: OK, maybe you're used to restarting your computer every morning anyway. But I like to leave things open and return to my desktop just the way I left it arranged.

* Stats: Using NUT and Munin, you get to monitor and log your power, so you can see things like exactly when your electricity went out and for how long, what load your PC is drawing after that last upgrade, etc. e.g.: http://hairball.bumba.net/cgi-bin/nut/upsstats.cgi?host=apc@localhost [bumba.net]

* Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

Frankly, I'm a little surprised a backup battery isn't built into PC power supplies already, so they'd work a bit more like laptops. Same with networking gear.
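As a rough idea of what the stats side looks like: NUT's `upsc` client prints one `variable: value` pair per line, which is easy to pick apart. The sketch below parses a sample of that output in Python; the sample readings are invented, and the helper names `parse_upsc` and `on_battery` are hypothetical, not part of NUT.

```python
def parse_upsc(text):
    """Turn 'variable: value' lines into a dict of strings."""
    readings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            readings[key.strip()] = value.strip()
    return readings


def on_battery(readings):
    """True if the status flags include OB (on battery)."""
    return "OB" in readings.get("ups.status", "").split()


# Invented sample in the shape upsc prints; not real measurements.
sample = """battery.charge: 100
ups.load: 23
ups.status: OL
"""

readings = parse_upsc(sample)
print(readings["ups.load"], on_battery(readings))
```

In practice you would feed it the output of something like `upsc apc@localhost` and log the values over time, which is roughly what a graphing plugin does for you.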

Duh! (1)

slashname3 (739398) | more than 6 years ago | (#24307629)

Post this under most obvious thing ever!

I guess the author wasn't worried about any events or transactions that were in the process of being committed. Nor has he managed any production databases.

Next thing you know there will be an article about not being able to surf the web when the Internet connection is down.

(plus one In73ormative) (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#24307717)

He didn't mention RAID batteries (1)

Percy_Blakeney (542178) | more than 6 years ago | (#24307747)

He fails to mention battery packs for RAID cards. They maintain power to the disk cache memory on the card in the event of a power failure, which allows the card to finish writing the cache to disk once main power is up again. That's one of the arguments for a hardware RAID solution.