How Power Failures Corrupt Flash SSD Data

Soulskill posted about a year and a half ago | from the not-so-solid-state dept.

Data Storage 204

An anonymous reader writes "Flash SSDs are non-volatile, right? So how could power failures screw with your data? Several ways, according to a ZDNet post that summarizes a paper (PDF) presented at last month's FAST 13 conference. Researchers from Ohio State and HP Labs tested 15 SSDs using an automated power fault injection testbed and found that 13 lost data. 'Bit corruption hit 3 devices; 3 had shorn writes; 8 had serializability errors; one device lost 1/3 of its data; and 1 SSD bricked. The low-end hard drive had some unserializable writes, while the high-end drive had no power fault failures. The 2 SSDs that had no failures? Both were MLC 2012 model years with a mid-range ($1.17/GB) price.'"

build in some power storage (5, Insightful)

X0563511 (793323) | about a year and a half ago | (#43049765)

Seriously... slap in some basic power circuitry and some caps - enough that the drive can finish the cycle it is on and do whatever it needs to do to power off safely.

Re:build in some power storage (2, Insightful)

Anonymous Coward | about a year and a half ago | (#43049801)

I'll quote the great CliffyB: Vote with your dollars!

What? It's valid thinking, not at all 9:th grade.

We encountered something like this (5, Interesting)

AliasMarlowe (1042386) | about a year and a half ago | (#43050073)

We encountered extensive and progressive file corruption on SSDs in an industrial device. It used the FAT file system, and after every loss of power, it ran its equivalent of chkdsk/f at the next boot. If power was lost again while this command was running, then it was guaranteed that the file system would become corrupt (despite the fact that we were writing nothing to the SSD; it held only files which were opened for reading). The window of opportunity was described as "very short", and the possibility of corruption was "very small" according to the vendor. In our experience in the field, and in our internal testing, the window of opportunity exceeded 20 seconds, and the possibility of corruption was "utter certainty".

The vendor fixed the problem in a very easy way. They changed the file system from FAT to a commercial journaling FS. In our subsequent tests, we never found any file corruption, even on iterated power loss at random intervals after power on.
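A journaling file system protects the file system's own structures. For a single file, an application can get a similar all-or-nothing update by writing a temporary file and renaming it over the original. The sketch below is only an illustration of that general technique, not what the vendor did; `atomic_write` is a hypothetical helper name. It also assumes the drive honors the flush request, which is exactly what the paper calls into question, and that the file system offers an atomic rename (POSIX file systems do; FAT generally does not).

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader sees the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If power is lost before the rename, the original file is untouched; if it is lost after, the new file is complete.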

Re:We encountered something like this (4, Insightful)

TheRealMindChild (743925) | about a year and a half ago | (#43050199)

First, running an SSD on an "industrial device"

Second, using FAT

Third, "commercial journaling FS". What does that even mean?

If you are industrial, where is your UPS?

Re:We encountered something like this (5, Insightful)

yurtinus (1590157) | about a year and a half ago | (#43050391)

Likely as part of an embedded system - monitoring or control software. Systems where you just flip the power switch on when you need them and off when you're done, so a UPS wouldn't apply.

I'm not saying their implementation was right, just saying that you can't imply from his post that it was wrong :P

Re:We encountered something like this (3, Informative)

thejynxed (831517) | about a year and a half ago | (#43050513)

If it was a drive being used to read schematics for CNC for instance, there isn't a manufacturer out there that currently offers a machine-tied UPS for the CNC machine. If the CNC machine loses power, then so does the drive, and vice versa, since it's all on the same circuit (usually you'll find the power stuff hidden in a cabinet along a nearby wall, and that stuff takes power directly from the mains).

Re:We encountered something like this (-1)

Anonymous Coward | about a year and a half ago | (#43050931)

Third, "commercial journaling FS". What does that even mean?

It means NTFS, a real file system. Shitty journaling file systems are non-commercial and are what you silly faggots use on your linux "boxen".

Re:We encountered something like this (5, Informative)

certsoft (442059) | about a year and a half ago | (#43050289)

We use USB flash drives for a data logger. Most of the time the data is being buffered in the ARM based Linux board's RAM to save power. Once we get a complete file's worth (4MB at the present) we power up, validate, write the file, and power down. Supercaps have been a lifesaver. There's even enough capacity to do the write cycle if the flash was powered down when a power fail is detected. That lets us avoid losing whatever was already in the RAM buffer.
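The buffer-then-write pattern described here can be sketched roughly as follows. This is not the poster's code: `BufferedLogger`, its file naming, and the `flush_on_power_fail` hook are all made up for illustration, and powering the drive up and down is left out.

```python
import os


class BufferedLogger:
    """Collect samples in RAM and write one complete file when the buffer fills."""

    def __init__(self, directory: str, file_size: int = 4 * 1024 * 1024):
        self.directory = directory
        self.file_size = file_size  # 4 MB, matching the figure in the comment
        self.buffer = bytearray()
        self.files_written = 0

    def log(self, sample: bytes) -> None:
        self.buffer += sample
        # Write out as many complete files as the buffer currently holds.
        while len(self.buffer) >= self.file_size:
            self._write_file(bytes(self.buffer[: self.file_size]))
            del self.buffer[: self.file_size]

    def flush_on_power_fail(self) -> None:
        """Hypothetical hook: write the partial buffer while the supercaps hold."""
        if self.buffer:
            self._write_file(bytes(self.buffer))
            self.buffer.clear()

    def _write_file(self, data: bytes) -> None:
        path = os.path.join(self.directory, f"log_{self.files_written:06d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # do not cut drive power before this returns
        self.files_written += 1
```

Writing whole files in one burst keeps the window in which the drive is mid-write short, which is what makes a small energy reserve sufficient.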

Re:We encountered something like this (3, Interesting)

Darinbob (1142669) | about a year and a half ago | (#43050715)

I hate a lot of USB drives and CompactFlash. They're all designed as dumb commodity devices for the undiscriminating user, and trying to get any solid spec sheets out of the manufacturers is impossible if you're not also a giant corporation. Instead their data sheets are just marketing literature (you rarely get anything more technical than "8x speed"). Almost all are designed to work with Windows with no concern to work with embedded systems or production automation, etc. So you end up buying a wide variety to test with and see which ones are barely adequate to work with your system.

Re:We encountered something like this (2)

certsoft (442059) | about a year and a half ago | (#43050823)

Fortunately the client has facilities to test various drives over a wide temperature range (down to -40, not sure how hot they test) while running. And yes, a lot of them are crap.

Re:We encountered something like this (1)

Anonymous Coward | about a year and a half ago | (#43050727)

How do you tell if a drive has supercaps? I have a unique application and we need a super small SSD, but we ran into a problem with SSDs being too unreliable. I concluded it was probably due to the occasional loss of power. We were actually hoping to use MicroSDHC cards until this issue was realized after significant testing. This was with the highest rated drives/the most expensive. We are now looking into mSATA SSDs. Any thoughts on this?

Re:build in some power storage (-1)

Anonymous Coward | about a year and a half ago | (#43049831)

Oh! Basic power circuitry! ...AND caps!

I bet no one ever thought of that!!

Re:build in some power storage (1)

sjames (1099) | about a year and a half ago | (#43049917)

I bet no one ever thought of that!!

Based on the paper, I guess they didn't

Re:build in some power storage (2)

hawguy (1600213) | about a year and a half ago | (#43050171)

I bet no one ever thought of that!!

Based on the paper, I guess they didn't

Some SSDs already have capacitors that do just this, so yes, they did think of it. Did you really think that SSD manufacturers aren't aware of this issue?

But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in. And since most SSD's are probably sold as OEM equipment where a few pennies can make the difference between getting the sale or not, then it's even harder to justify.

It's not something I'd be willing to pay extra for - my computer hasn't lost power in years (thanks to a UPS that automatically shuts down my computer), and my computer writes to disk so rarely that there's probably only a 1 in 100 chance that it will be in the middle of a write if I just walk up and pull the plug. If I do lose data, there are always backups to fall back on.

Re:build in some power storage (1)

TheRealMindChild (743925) | about a year and a half ago | (#43050247)

But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in

This isn't buying a car. $3 or even $20 isn't going to be detrimental to the purchase opportunity when the consumer can TELL it is of quality above the competitors. Blaming the consumer in this case sounds like you are on the other side

Re:build in some power storage (2)

TechyImmigrant (175943) | about a year and a half ago | (#43050369)

It would be great if they told you about the feature so you could make an informed purchasing decision.

Re:build in some power storage (2)

yurtinus (1590157) | about a year and a half ago | (#43050441)

Exactly, this is buying consumer computer equipment. Put a label on the side with a bullet point touting your unexpected power fault protection and I can pretty much guarantee it will have no impact on your product sales. You know what will? The extra $2 price that puts you below the other guy on the "lowest price first" product sorting.

Re:build in some power storage (0)

TechyImmigrant (175943) | about a year and a half ago | (#43050611)

My employee discount beats any $2 price difference.

Re:build in some power storage (1)

hawguy (1600213) | about a year and a half ago | (#43050745)

My employee discount beats any $2 price difference.

What kind of employee discount do you have that can take a $120 drive and a $122 drive and make the prices equivalent?

Re:build in some power storage (2)

hawguy (1600213) | about a year and a half ago | (#43050479)

But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in

This isn't buying a car. $3 or even $20 isn't going to be detrimental to the purchase opportunity when the consumer can TELL it is of quality above the competitors. Blaming the consumer in this case sounds like you are on the other side

How can the consumer TELL if its quality is above the competitors? The presence of capacitors doesn't mean that it's a better drive than a drive without capacitors. It just means that you have more protection from one rare set of circumstances -- potentially with less reliability overall, since big electrolytic capacitors are known to fail, especially cheap ones.

I suspect that most SSD's are bought as OEM drives buried inside laptops and desktops where the end user may not ever know what brand and/or model the drive is, so how will a higher cost for a feature that may offer no real benefit for most users help sell more drives?

Don't believe me? Here's proof: Manufacturers aren't promoting it as a feature in big letters on the side of the box. If they thought they could add $5 of circuitry and sell the drive for $10 more, they would.

If you're reading Slashdot, then you're not a typical consumer, and maybe you really are enough of an SSD expert to compare features to know what makes one SSD better than another, but for the other 99% of consumers, they will either buy an SSD with their next computer, or they'll buy the one at Best Buy that has the lowest price and the highest transfer rate since that's a number they can understand. How would you even quantify "Power protection capacitors" to know if it's worth $5, $50 or $100 to you? If it's really important to you, you can always buy an enterprise class SLC drive that includes the capacitors.

Blaming the consumer in this case sounds like you are on the other side

Is this one of those George Bush "If you're not with us, you're against us" false dichotomies? Believe it or not, it's possible for people to have different opinions without being enemies.

Re:build in some power storage (1)

Bill_the_Engineer (772575) | about a year and a half ago | (#43050755)

Component manufacturers target OEMs for the bulk of their sales. They will build to the price point that may win them a sale. The $3 to $20 amount may not make a difference to a consumer purchasing one from NewEgg, but to someone who purchases in blocks of 1000 it may.

Re:build in some power storage (1)

Anonymous Coward | about a year and a half ago | (#43050879)

Most consumers don't know what an SSD drive is and those who have one in their systems have one because they were upsold on a system by a clueless salesperson. It's pure coincidence they have one.

Re:build in some power storage (1)

sjames (1099) | about a year and a half ago | (#43050261)

It's not like they need a great deal of hold up time. Done well, they need only hold power long enough to successfully write a commit bit or decide not to.
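One way to picture the "commit bit" idea is a log where each record's payload is written first and a one-byte commit marker last, so recovery can tell a finished record from one that was cut off. The sketch below is an assumption about how such a scheme might look, not a description of any drive's firmware; the record layout, the `COMMIT` value, and the function names are invented for illustration.

```python
import struct
import zlib

COMMIT = b"\xC5"  # arbitrary marker value chosen for this sketch


def append_record(log: bytearray, payload: bytes) -> None:
    """Append length, payload and CRC first; write the commit byte last."""
    header = struct.pack("<I", len(payload))
    crc = struct.pack("<I", zlib.crc32(payload))
    log += header + payload + crc
    log += COMMIT  # the record only counts once this byte is on the medium


def recover(log: bytes) -> list[bytes]:
    """Return committed records, stopping at the first incomplete or corrupt one."""
    records = []
    pos = 0
    while pos + 4 <= len(log):
        (length,) = struct.unpack_from("<I", log, pos)
        end = pos + 4 + length + 4 + 1
        if end > len(log):
            break  # torn write: power was lost mid-record
        payload = bytes(log[pos + 4 : pos + 4 + length])
        (crc,) = struct.unpack_from("<I", log, pos + 4 + length)
        if log[end - 1 : end] != COMMIT or crc != zlib.crc32(payload):
            break  # never committed, or corrupted
        records.append(payload)
        pos = end
    return records
```

With this layout the hold-up time only has to cover the final one-byte write, or nothing at all if the device decides to abandon the record.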

Re:build in some power storage (1)

NatasRevol (731260) | about a year and a half ago | (#43050161)

They thought of it. They just didn't want to pay for it.

Re:build in some power storage (5, Insightful)

v1 (525388) | about a year and a half ago | (#43049853)

space is at an extreme premium in those drives. There's a reason they feel so heavy/dense. Given the quilting layout of the chips, adding a single cap would prevent several memory chips from fitting. So you may as well then fill that remaining space with more caps. But you will reduce capacity, and that's what sells SSDs.

There's already a substantial amount of circuitry in them, far from "basic". It's essentially a CPU. I'd be interested to see some numbers as to average power drain during idle, read, and write.

The ones that did the best during the power blips probably did have caps and a bit more in their power system to handle it though. It certainly does surprise me that the mid-range, not the high-end, were the best performers in this test.

Re:build in some power storage (2)

Mad Merlin (837387) | about a year and a half ago | (#43049923)

space is at an extreme premium in those drives. There's a reason they feel so heavy/dense.

I don't know what SSDs you've been using, but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.

Re:build in some power storage (3, Informative)

Mashiki (184564) | about a year and a half ago | (#43050341)

I don't know what SSDs you've been using, but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.

Consumer drives are usually lightweight, they don't need the extra cooling. Enterprise drives depending on who they're made by and what they're for can have heatspreaders or heatsinks within, or attached to each chip adding to the weight.

Re:build in some power storage (0)

Anonymous Coward | about a year and a half ago | (#43050107)

They don't feel heavy at all, and the drives are much smaller than the 3.5" bays they go into, leaving lots of space for a capacitor.

You ever look inside one? (1)

Sycraft-fu (314770) | about a year and a half ago | (#43050133)

There is all kinds of extra space in a 2.5" SSD. They have a lil' CPU, some flash chips, and that's it more or less. They are quite small. In smaller form factors, then ya space can become an issue but there's plenty in a 2.5" unit.

Re:build in some power storage (1)

AmiMoJo (196126) | about a year and a half ago | (#43050279)

Most SSDs are 2.5" so there would be plenty of room for a large capacitor or small battery. You really don't need a lot of energy to finish flushing a small RAM buffer.

Re:build in some power storage (2)

edmudama (155475) | about a year and a half ago | (#43051019)

Most of the enterprise grade SSDs on the market that are outfitted with power-loss protection circuitry fit these capacitors within the 2.5" form factor.

Re:build in some power storage (1)

TechyImmigrant (175943) | about a year and a half ago | (#43050385)

>space is at an extreme premium in those drives.
So put them in a desktop drive form. The first thing I do with SSDs is put them in one of those adaptors to make them fit in a normal drive tray.

Re:build in some power storage (2)

Guspaz (556486) | about a year and a half ago | (#43049867)

Most enterprise SSDs do have small supercapacitors or capacitor arrays onboard for exactly this reason. Some of the higher-end consumer drives do too. But most consumer drives don't.

The answer? Get a UPS.

Re:build in some power storage (1)

sjames (1099) | about a year and a half ago | (#43049933)

The answer? Get a UPS.

Because those never fail.

Re:build in some power storage (0)

Anonymous Coward | about a year and a half ago | (#43050117)

Yes, lets just not use anything that fails once and a while, even if it is even less than the thing it is protecting.

Re:build in some power storage (0)

Anonymous Coward | about a year and a half ago | (#43050151)

"once and a while"?

It is "once IN a while".

Re:build in some power storage (1)

sjames (1099) | about a year and a half ago | (#43050281)

I didn't say don't use a UPS, I said they DO fail sometimes so don't pretend it can't happen.

Re:build in some power storage (1)

dgatwood (11270) | about a year and a half ago | (#43050205)

The answer? Get a UPS.

You're assuming a desktop-sized drive in a desktop computer, yet nearly all computers sold today are portables, and laptop users are more likely to buy bus-powered external drives than mains-powered drives.

So the five most likely causes of power failure in consumer hard drives (and presumably, in the future, SSDs), ordered from most likely to least likely, are probably:

  • Somebody yanking a USB cable before the device is fully unmounted.
  • The laptop's battery dying earlier than expected.
  • Somebody yanking a FireWire cable before the device is fully unmounted.
  • Somebody yanking an eSATAp cable before the device is fully unmounted.
  • An electrical power disruption caused by the hinge pinching the inverter cable.

An unexpected mains power failure with a non-battery-backed device falls somewhere around #87. A UPS won't help with any of the above.

Re:build in some power storage (3, Funny)

TechyImmigrant (175943) | about a year and a half ago | (#43050637)

>yet nearly all computers sold today are portables

What I really want is a potable computer, so I can drink it if I get thirsty.

Re:build in some power storage (1)

AmiMoJo (196126) | about a year and a half ago | (#43050297)

Or maybe attach a capacitor or battery to the power connector (with diodes so you don't try to power the entire PC).

UPS does nothing for the common fault case. (3, Informative)

stoploss (2842505) | about a year and a half ago | (#43050359)

Most enterprise SSDs do have small supercapacitors or capacitor arrays onboard for exactly this reason. Some of the higher-end consumer drives do too. But most consumer drives don't.

The answer? Get a UPS.

A UPS is no panacea: I experience grid failure very rarely.

However, relatively speaking I experience many more kernel lockups that require an ACPI-initiated poweroff by holding down the power button until the machine abruptly powers off. What do you do when a reboot/poweroff command causes your Linux/BSD machine to hang? I/O handle leaks in the Samba SMB client (ie. *not* the smbd daemon) and the Samba Winbind code are notorious for this. The only times I have ever had to "yank power" from a production Linux database machine were due to SMB share mount zombies or Winbind that the kernel couldn't kill even during an issued reboot command.

I have several OCZ Vertex 4 SSDs, and this concerns me—especially due to the fact that the paper/presentation does not disclose the test results. I guess I will just have to hope that my device models aren't affected and/or that waiting a minute or two during a hung poweroff/reboot means the kernel has stopped attempting to write to the devices and everything has flushed.

PS. If you compare the vague results in the summary with the paper you will find that only two of the fifteen drives passed the tests, yet four of the devices were cited to have power protection capacitors.

Re:UPS does nothing for the common fault case. (0)

Anonymous Coward | about a year and a half ago | (#43050933)

I don't understand how if they claim that it takes up to 20 sec for the final write to finalize that a computer that simply shuts down in 10 sec won't have the same problem.

Re:UPS does nothing for the common fault case. (0)

Anonymous Coward | about a year and a half ago | (#43050969)

Suggest you use MS Windows so it won't lock up all the time.

Re:build in some power storage (1)

WillgasM (1646719) | about a year and a half ago | (#43049869)

You would think. The only SSD I'm running is on my computer at home and my house is sufficiently UPS'd. It's always cool when the power goes out at my apartments but all my electronics keep going. I just wish there was a battery on that Time Warner box outside my door.

Re:build in some power storage (1)

PRMan (959735) | about a year and a half ago | (#43050263)

I just wish there was a battery on that Time Warner box outside my door.

Strange. My DirecTV DVRs just keep on working...

Re:build in some power storage (1)

WillgasM (1646719) | about a year and a half ago | (#43050473)

I'm mostly talking about the Internet. Netflix only buffers a minute or two.

Re:build in some power storage (-1)

Anonymous Coward | about a year and a half ago | (#43049913)

A lot of newer ones do this, actually. Certainly the "enterprise" targeted drives.

Re:build in some power storage (1)

Beardo the Bearded (321478) | about a year and a half ago | (#43050097)

That was my first thought as well, throw in one supercap and you'll solve this problem.

Already done (1)

rgbrenner (317308) | about a year and a half ago | (#43050119)

enterprise-class SSDs have capacitors designed to last long enough for the SSD to finish any writes if the power fails.

Capacitors cost money though.. so this is one of the things that gets stripped out of consumer-level drives to reduce the price.

Re:build in some power storage (0)

Anonymous Coward | about a year and a half ago | (#43050255)

High-end SSD drives do have ultracaps to deal with power interruption.
High-end as in enterprise class drives that each cost more than you spent on your whole gaming rig.

The reason why power interruption causes data loss is simple. It comes down to how flash storage works. While I'm sure most of you assume that writing to flash is a simple bit-by-bit operation that happens instantly, nothing could be further from the truth.

Flash memory is accessed in blocks and only blocks. Even if you need to write to a single bit, the entire block that that bit resides in needs to be re-written. This means before you can write, the entire block has to be read and stored in temporary RAM. If power is interrupted during a write operation then there is a very good chance the entire block will be lost because the contents of the flash controller's RAM will be lost.

And it gets worse, because flash writes don't always work. Yep. You heard that correctly. Here's how a typical flash write operation works:
1. Read block and calculate changes
2. Erase block (Oh yeah, you can't write to blocks with data. You can only write to flash cells that have been zeroed)
3. Attempt write
4. Check for write success by reading block and making sure it matches what you attempted to write.
5. If step 4 fails, go back to step 2 (Number of iterations required depends on the type and quality of the flash memory)

The above explains why flash writes are so much slower than flash reads, but it also means that there is a greater chance of data loss during power failure because the writes take a long time.
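The five steps as this comment describes them can be sketched against a fake device. This follows the comment's description only; whether real controllers work this way is disputed in the replies. `FakeFlashBlock` and `update_block` are hypothetical names, the erased state is modeled as 0xFF bytes, and write failures are simulated by corrupting the first few attempts.

```python
class FakeFlashBlock:
    """Stand-in for one flash block; the first `bad_attempts` writes get corrupted."""

    def __init__(self, size: int, bad_attempts: int = 0):
        self.data = bytearray(b"\xff" * size)
        self.bad_attempts = bad_attempts

    def read(self) -> bytes:
        return bytes(self.data)

    def erase(self) -> None:
        self.data[:] = b"\xff" * len(self.data)

    def write(self, new_data: bytes) -> None:
        self.data[:] = new_data
        if self.bad_attempts > 0:
            self.bad_attempts -= 1
            self.data[0] ^= 0x01  # simulate a cell that failed to program


def update_block(block: FakeFlashBlock, offset: int, patch: bytes, max_tries: int = 5) -> int:
    """Apply `patch` at `offset`; return the number of write attempts used."""
    # Step 1: read the block into RAM and compute the new contents.
    target = bytearray(block.read())
    target[offset : offset + len(patch)] = patch
    for attempt in range(1, max_tries + 1):
        block.erase()               # step 2: erase before writing
        block.write(bytes(target))  # step 3: attempt the write
        if block.read() == target:  # step 4: read back and compare
            return attempt
        # Step 5: verification failed, so loop back to the erase.
    raise IOError("block could not be written")
```

Between the erase and a verified write, the only good copy of the block is `target` in RAM, which is the exposure to power loss that the comment is pointing at.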

Re:build in some power storage (2)

TechyImmigrant (175943) | about a year and a half ago | (#43050661)

>Flash memory is accessed in blocks and only blocks. Even if you need to write to a single bit, the entire block that that bit resides in needs to be re-written. This means before you can write, the entire block has to be read and stored in temporary RAM. If power is interrupted during a write operation then there is a very good chance the entire block will be lost because the contents of the flash controller's RAM will be lost.

You are wrong.

Flash is written word by word. The size of the word depends on the chip.
Flash is *erased* a block at a time.

That is what makes flash more efficient than EEPROM, the block erase plane.

Re:build in some power storage (1)

TechyImmigrant (175943) | about a year and a half ago | (#43050697)

You can write individual words in a flash chip.
It takes longer to write than read because you have to force a bunch of electrons through an insulator.

If you want to write over existing data, you have to erase the block it is in, because you can only erase whole blocks, but there is nothing to stop you incrementally writing to unused parts of a block.
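The distinction between word-level writes and block-level erases can be shown with a toy model. This is a simplification with assumed details: 8-bit words, an erased state of all ones, and programming that can only clear bits. `FlashBlock` is a made-up class, not any chip's interface.

```python
class FlashBlock:
    """Toy model: programming only clears bits (1 -> 0); erase resets the whole block."""

    def __init__(self, words: int):
        self.words = [0xFF] * words  # erased state: every bit set

    def program(self, index: int, value: int) -> None:
        current = self.words[index]
        # A bit already at 0 cannot be raised back to 1 without an erase.
        if value & ~current & 0xFF:
            raise ValueError("word already programmed; erase the block first")
        self.words[index] = current & value

    def erase(self) -> None:
        # Erase works on the whole block, never on a single word.
        self.words = [0xFF] * len(self.words)
```

Programming words 0 and 1 of a fresh block succeeds one word at a time; overwriting word 0 with different data raises until `erase()` has wiped every word in the block.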

Re:build in some power storage (1)

hairyfeet (841228) | about a year and a half ago | (#43050387)

Uhhh...we solved this problem ages ago with UPS. If you care about your data put the machine on a UPS. I've had my business customers on UPS systems for years, showed them how to test the batteries and swap 'em when they get worn out, no problems. I just had to swap the PSU and HDD out of my netbox at the shop because a transformer blew on my block and managed to give the old gal enough of a shock even through a surge protector that it cooked the PSU and the HDD, but since it's just a netbox I don't care enough about it to waste money on a UPS. I just slapped in some parts from my parts box, slapped the disc image and was back up in less than an hour, no biggie.

At the end of the day if the unit has data you care about? UPS, just like the one I have at home is running on. If it's something you don't give a shit about, like my 8 year old netbox at the shop? Just use a surge protector and be ready with a disc image if anything craps out. Really folks this ain't rocket science, a little common sense goes a long way.

UPS irony (1)

stoploss (2842505) | about a year and a half ago | (#43050569)

Uhhh...we solved this problem ages ago with UPS. If you care about your data put the machine on a UPS. I've had my business customers on UPS systems for years, showed them how to test the batteries and swap 'em when they get worn out, no problems.

That may help, but it isn't sufficient. I had one client on an APC SmartUPS that caused more power failures than it prevented. Why? Ambient thermal shutdown of the SmartUPS resulted in it abruptly powering off repeatedly even while the grid was up. So, if they did not have a UPS installed they would not have had any of those power outages, and, for bonus irony, grid failures were quite rare and never occurred while I was there.

This may seem like it goes without saying, but the installation context matters.

Re:UPS irony (1)

drinkypoo (153816) | about a year and a half ago | (#43050627)

After the third used APC UPS that still didn't work properly after battery replacement, I gave up. None of them could handle vaguely anywhere near the load they were supposed to. I don't know whose UPSes to buy, but I wouldn't buy anything from APC any more. It's unfortunate, because they used to follow a simple formula (fat traces, quality components, sturdy enclosures, priced accordingly) and they were a good value proposition.

Re:UPS irony (1)

stoploss (2842505) | about a year and a half ago | (#43050833)

The ironies caused by attempting to prevent faults through increased complexity are multifarious.

I had one client whose installation was at a datacenter with the "standard" triple redundancy of power supply: grid, UPS, and generator. Furthermore, all racks had an "A" and "B" power distribution network. One day they were attempting to bring the "A" power distribution system back into code after inspection (WTF?), so they took "A" offline to make changes. No planned effects, because all the units in the rack had dual power supplies... but then a worker dropped a wrench and "B" went down too after the breaker tripped—for the whole datacenter. Total power loss... and yet the grid was still up.

Lovely.

There's no escape from the fact that increased complexity increases the risk of catastrophic failure. To wit: if the 2013 Super Bowl didn't have the multi-grid relay installed, they wouldn't have had any outage. The power failure prevention mechanism caused the outage. Simpler is usually better, unless people want to pay massively to eliminate all their single points of failure in order to escape the bathtub curve of ironically-diminished reliability caused by the increased complexity of a naively-implemented "redundant system". Even in "properly engineered, no-single-point-of-failure" systems there are often hidden failure vectors, so I normally advocate simplicity and "warm standby" approaches.

Re:UPS irony (1)

drinkypoo (153816) | about a year and a half ago | (#43050911)

That multi-grid relay was supposed to be a warm standby approach, but it was too clever for its own good, which brings us back to your opening sentence. Manual switches would have been preferable. A brief power outage while a maintenance guy scrambles for the relay (hopefully there's someone stationed near the control...) is acceptable when the utility goes down. Having the system screw up and imagine itself an emergency is just lame.

Re:UPS irony (1)

stoploss (2842505) | about a year and a half ago | (#43051015)

That multi-grid relay was supposed to be a warm standby approach, but it was too clever for its own good

Haha, absolutely. Once I had a client who "upgraded" their gigabit ethernet topology to provide multipath IO from their production servers to their SAN. All these switches had dual power supplies connected to dual distribution power systems and the servers had dual power supplies, dual NICs on separate VLANs, etc. Fair enough; I enabled MPIO for the iSCSI SAN access on the production servers.

What *actually* happened? A month later we experienced massive data loss from production server process output because the multipath IO to the SAN triggered a firmware bug in the SAN and made it abruptly drop offline completely (and also corrupt recently-written data FTW). Comparatively speaking, how often was it likely that we would lose network access from the server to the SAN (which was in the same rack) in a single-path ethernet configuration? What, failure of the solid-state ethernet switch? Ethernet cable failure in unmoving cables? Ha.

Later on in a separate incident, we lost access to the SAN because someone was doing maintenance to the switch plug configuration during business hours and redundantly unplugged the redundantly-powered switches.

Re:build in some power storage (1)

the eric conspiracy (20178) | about a year and a half ago | (#43050771)

I've had lots more failures due to UPSs going tits up than through data loss on SSDs.

Re:build in some power storage (1)

K. S. Kyosuke (729550) | about a year and a half ago | (#43050399)

Seriously... slap in some basic power circuitry and some caps

A small, stupid, retro NiMH battery might work even better.

Re:build in some power storage (0)

Anonymous Coward | about a year and a half ago | (#43050455)

Seriously... slap in some basic power circuitry and some caps - enough that the drive can finish the cycle it is on and do whatever it needs to do to power off safely.

That would add to the cost, and most folks don't care about the details of data integrity enough to know that this is a good idea. They'd just see the higher price and purchase the next model over which is $1 cheaper.

Re:build in some power storage (1)

Darinbob (1142669) | about a year and a half ago | (#43050667)

You can design the firmware to avoid corruptions as well, it doesn't need a hardware solution. The manufacturers just have to be aware that power failures will occur and take that into account during the design. Extra capacitors won't fix the problem of shortcuts in the design.

Before you ask. (5, Informative)

eddy (18759) | about a year and a half ago | (#43049767)

The paper doesn't disclose the brands.

Re:Before you ask. (1)

war4peace (1628283) | about a year and a half ago | (#43049837)

Of course it doesn't. Naming/Shaming is not allowed.
I was sarcastic, of course. They don't do it, though, because it'd probably put them in a crossfire of lawsuits coming from powerful companies. Nobody wants that. They will lose simply by being bullied financially. It's all about who brings more lawyers to the table, not who's right or wrong.

Re:Before you ask. (1)

Mad Merlin (837387) | about a year and a half ago | (#43049941)

Which is unfortunate. That was the main reason I opened the PDF.

Re:Before you ask. (1)

PRMan (959735) | about a year and a half ago | (#43050273)

Somebody should tell that to Consumer Reports...

Re:Before you ask. (1)

TechyImmigrant (175943) | about a year and a half ago | (#43050403)

MLC == Intel. But they were the good ones.

Remember kids (0)

Anonymous Coward | about a year and a half ago | (#43049813)

Always RAID and have battery backup, it saves lives.

Re:Remember kids (0)

Anonymous Coward | about a year and a half ago | (#43049843)

Kids can't even afford SSDs, let alone that other stuff, you insensitive clod!

Anyone ever hear of a battery-backed cache? (1)

Midnight_Falcon (2432802) | about a year and a half ago | (#43049823)

Last time I checked, standard platter-based disks had the same issue -- a problem that is solved in server/enterprise environments by placing a write-cache battery in the RAID controller.

In a desktop environment I suppose one could embed a write cache battery into the SSDs to abate the issue, but in a laptop environment it'd be unlikely you'd even encounter it since you'd have to be writing data while running out of battery, in which case, you might well deserve it :)

Re:Anyone ever hear of a battery-backed cache? (1)

LunaticTippy (872397) | about a year and a half ago | (#43049839)

A capacitor could hold enough power to finish a write cycle on SSD no problem. It wouldn't even have to be very large.

Re:Anyone ever hear of a battery-backed cache? (1)

noelhenson (691861) | about a year and a half ago | (#43050225)

Actually, you might be surprised at how large it would be. F=It/V. 10mS write time for a sector, 2S for a file. Voltage tolerances being about +/-5%. 3.3V operating voltage. 5V source voltage. Say 80mA for operating current.

1.7V drop over 10mS should require about 470uF. To write a file (a 2-second file) would be something like: 80mA*2S/1.7V or about 94,000 uF.

I'm not trying to be a killjoy here, but trying to store data during bad-voltage and power-down situations is a nontrivial problem.
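The arithmetic above can be reproduced with a short sketch. It uses the constant-current approximation C = I*t/dV with the figures quoted in the comment (80mA, a 1.7V drop from 5V to 3.3V, 10ms and 2s hold times). The function and variable names are made up for illustration, and a real design would also have to account for tolerances and regulator behavior.

```python
def hold_up_capacitance(current_a, hold_time_s, allowed_drop_v):
    """Capacitance in farads needed to supply a constant current for
    hold_time_s while the voltage sags by allowed_drop_v (C = I * t / dV)."""
    return current_a * hold_time_s / allowed_drop_v


# Figures taken from the comment above.
OPERATING_CURRENT_A = 0.080   # 80 mA
ALLOWED_DROP_V = 5.0 - 3.3    # 5 V source down to 3.3 V operating voltage

sector_f = hold_up_capacitance(OPERATING_CURRENT_A, 0.010, ALLOWED_DROP_V)
file_f = hold_up_capacitance(OPERATING_CURRENT_A, 2.0, ALLOWED_DROP_V)

print(f"10 ms sector write: about {sector_f * 1e6:.0f} uF")
print(f"2 s file write:     about {file_f * 1e6:.0f} uF")
```

Both results land close to the roughly 470uF and 94,000uF quoted above.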

Re:Anyone ever hear of a battery-backed cache? (1)

noelhenson (691861) | about a year and a half ago | (#43050267)

*F=Farads. I guess I should have written C=It/V.

Re:Anyone ever hear of a battery-backed cache? (1)

nabsltd (1313397) | about a year and a half ago | (#43050527)

Actually, you might be surprised at how large it would be. F=It/V. 10mS write time for a sector, 2S for a file.

You don't need enough power to finish the OS-level task...you only need enough to write out the data in the drive's RAM cache. Since that is 256MB or less on most current SSD drives (512MB is found on some drives greater than 500GB), it's not as much as you estimate.

Then, too, when you use the correct timings (10ms for a sector is about 200 times too long), you see that even the slowest SSD takes only about 5ms to write out 1MB (with an average around 3ms). At the average rate, that's around 0.75 seconds to flush the whole cache, resulting in about 1/3 the power you estimate. And that's assuming that the whole cache needs to be flushed; it's possible that only a few blocks need to be written.

You can also estimate the time in another way, in that these drives can sustain 300-500MB/sec write rates. That means that copying from the drive's RAM cache to flash must be at least that fast, which gives between 0.5 and 1.7 seconds to flush. This time is similar to my estimate above, so I suspect that 0.75 seconds isn't far off from correct.
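Both estimates in this comment are simple enough to check in a few lines. The sketch below only restates the numbers given above (256MB or 512MB of cache, 3-5ms per MB, 300-500MB/sec sustained writes); the helper names are hypothetical and nothing here was measured on a real drive.

```python
def flush_seconds_from_latency(cache_mb, ms_per_mb):
    """Time to flush the cache given a per-megabyte write latency."""
    return cache_mb * ms_per_mb / 1000.0


def flush_seconds_from_throughput(cache_mb, mb_per_s):
    """Time to flush the cache given a sustained write rate."""
    return cache_mb / mb_per_s


# First estimate: 256 MB cache at the quoted average and slowest latencies.
average_case = flush_seconds_from_latency(256, 3)
slowest_case = flush_seconds_from_latency(256, 5)

# Second estimate: smallest cache at the fastest rate, largest at the slowest.
best_case = flush_seconds_from_throughput(256, 500)
worst_case = flush_seconds_from_throughput(512, 300)

print(f"latency-based:    {average_case:.2f} s average, {slowest_case:.2f} s slowest")
print(f"throughput-based: {best_case:.2f} s to {worst_case:.2f} s")
```

The average-latency figure comes out near the 0.75 seconds quoted, and the throughput-based range matches the 0.5 to 1.7 seconds above.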

Re:Anyone ever hear of a battery-backed cache? (1)

Midnight_Falcon (2432802) | about a year and a half ago | (#43050405)

True, but, then when the cap dries out and eventually bursts open it'd probably be a major cause of drive failure and lack of longevity.

Re:Anyone ever hear of a battery-backed cache? (1)

drinkypoo (153816) | about a year and a half ago | (#43050721)

Use a solid cap, and/or socket the cap at the edge of the drive someplace.

Re:Anyone ever hear of a battery-backed cache? (1)

FoolishBluntman (880780) | about a year and a half ago | (#43049939)

In the paper presented, they also include both a consumer and enterprise class disk drive.
The consumer class drive had the same problems as the cheap SSD.

Re:Anyone ever hear of a battery-backed cache? (1)

wisnoskij (1206448) | about a year and a half ago | (#43050703)

OR the battery fails, is taken out, or falls out.

Re:Anyone ever hear of a battery-backed cache? (0)

Anonymous Coward | about a year and a half ago | (#43051021)

What the fuck are you on, faggot? The issue is with the cache in the god damnned SSD not getting flushed due to a power outage. How the fuck is the battery backed write cache on your fucking RAID controller going to help that situation any?! Does it magically keep the drive powered? No.

Power corrupts... (5, Funny)

preflex (1840068) | about a year and a half ago | (#43049855)

... Power failure corrupts absolutely.

Re:Power corrupts... (0)

PRMan (959735) | about a year and a half ago | (#43050287)

MOD parent up! That's hilarious!

UPS (1)

rossdee (243626) | about a year and a half ago | (#43049859)

Why should a power failure corrupt anything? The UPS will shut the computer off if there is a prolonged outage.

penis (-1)

Anonymous Coward | about a year and a half ago | (#43049865)

penis penis vagina penis penis. Viagra penis vagina anus butthole. Ass tits butthole vagina penis penis. Shit penis penis vagina asshole cunt anus vagina!

Re:penis (-1, Offtopic)

maxwell demon (590494) | about a year and a half ago | (#43049891)

Did your brain have a power failure? It seems to be heavily corrupted.

Unsurprising (3, Insightful)

Anonymous Coward | about a year and a half ago | (#43049893)

These devices have an elaborate internal database for the management of block remapping. For this to survive power failures it needs to use transactional updates. Getting this right is hard - it takes years for file systems and databases to become robust. I'd guess that many devices don't even attempt to do it and the ones that do probably have obscure failure modes. A UPS is essential.
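To make "transactional updates" a bit more concrete, here is a toy sketch of one common idea: append each mapping update to a log together with its length and a checksum, and on recovery discard anything after the first record that fails the check. This is purely an illustration with made-up names and record formats; it is not how any particular SSD firmware works.

```python
import struct
import zlib

# Each record: 4-byte payload length, 4-byte CRC32 of the payload, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one update so a torn (partial) write can be detected later."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> list:
    """Return every complete, checksum-valid record; stop at the first bad one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupted tail: ignore it and everything after it
        records.append(payload)
        offset = start + length
    return records


# Simulate power loss partway through the second record.
log = encode_record(b"block 7 -> page 42") + encode_record(b"block 9 -> page 13")
torn_log = log[:-4]
print(recover(log))
print(recover(torn_log))
```

The point of the sketch is only that the second update either appears in full or not at all after recovery; getting the same guarantee for a whole remapping table under arbitrary power loss is the hard part the comment describes.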

Finally somebody said it! (5, Informative)

Dishwasha (125561) | about a year and a half ago | (#43049907)

I had some original Vertex drives from OCZ that kept absolutely corrupting when my laptop got accidentally unplugged and I powered on the machine. I had to RMA them over and over and over again. I finally figured out that my battery was getting old and, although everything was functional even on battery power and it would boot, the initial large draw of power on boot must have created a voltage drop (i.e. brownout) which the SSDs weren't designed to compensate for. Within an hour of boot (even back on plugged power) they would choke, freeze the OS, and be rendered unusable from then on out.

Several SSD manufacturers are probably not engineering well for fluctuating power. Rather than fixing the problem with better engineering, OCZ simply changed their warranty policy to void the warranty if the customer is not providing proper power. Correct me if I'm wrong, but I don't think rotating-disk hard drive manufacturers have ever had that in their warranty clauses.

Re:Finally somebody said it! (1)

Anonymous Coward | about a year and a half ago | (#43050065)

Properly designed digital circuitry has a power monitor circuit that prevents the system from running if the power is below a spec'd level. Flash circuitry definitely needed this kind of protection when it first came out, because it used 12V power for programming, and 5V power otherwise, and it was very common for the two power rails to ramp up (or down) at different rates during power on and off. It was very easy to trigger a spurious write cycle because of noise on the lines.

Re:Finally somebody said it! (1, Insightful)

citylivin (1250770) | about a year and a half ago | (#43051011)

Well, that's probably because you were using OCZ crap. I have never had a quality product from that company.

However, that said, I have noticed the same thing with the Crucial m4s I have. In one particular laptop, it keeps bricking drives because the battery doesn't hold much of a charge any more. Luckily, I can "unbrick" them by plugging in the power (but not data) for 20 minutes, then plugging in the data connection, then rebooting the machine. Has worked more than once.

And Crucial has put out a bunch of firmwares trying to deal with this. Last time it happened was a few months ago. I have approx. 15 other drives deployed and it only happens to one or two of them; it seems to always be in laptops or after some sort of power surge. Crucial will always RMA the drive as well, the one or two times I did not get it going.

And before anyone says "why that's why I don't use SSDs, too new and unstable!" I say that I would not give up my SSD for all the SCSI 15Ks in the world. SSDs are the single greatest speed increase in computer performance in the last 15 years. Make backups, as you should anyways, and don't be afraid of SSDs. When you fly close to the sun, you are going to get burned. Still I would rather FLY so high and roll the dice on reliability (which is still stellar in most circumstances).

Rotational hard drives are such a pain now as an OS drive, and they still die eventually. I recommend SSDs to everyone now, with the caveat above that you always need good backups.

not naming names = data "pulled out of my ass" (2, Insightful)

citizenr (871508) | about a year and a half ago | (#43050201)

Useless paper/test.

Re:not naming names = data "pulled out of my ass" (0)

Anonymous Coward | about a year and a half ago | (#43050719)

No, no ... in fact it is very useful.

It says that SSDs are not ready for the big show, as opposed to the expectation that having no moving parts was the benefit.

They just remove the latency. If you can live with that, go ahead; if not, stay with regular rotating hard disks.

Re:not naming names = data "pulled out of my ass" (1)

edmudama (155475) | about a year and a half ago | (#43051049)

SSDs are already in the big show, and have been demonstrated reliable in those applications. The key is choose your vendors carefully, ask how they were qualified, etc.

Re:not naming names = data "pulled out of my ass" (1)

Theovon (109752) | about a year and a half ago | (#43050901)

If they do that, they won't get any more free SSDs to test, and that'll impact their ability to write papers criticizing SSDs. What would you prefer? A paper biased towards SSDs too small/cheap to be useful to you, or one that doesn't name names? Anonymity is VERY important in this kind of research.

up/down/up/brown/fried (2, Insightful)

h8sg8s (559966) | about a year and a half ago | (#43050239)

What some folks don't realize is that it's the seesaw nature of many power events that's primarily behind both data corruption and SSD failure. It's a rare rack system that has its own power conditioning and UPS these days (HP NonStop comes to mind) and without it you're subject to whatever the event provides in the way of under/over voltage, spikes, drops, etc. Many times these happen in timeframes too fast for power switching equipment to react and in some cases it's that stuff that gets fried first.

Interesting failure mode for Crucial SSDs (1)

ckthorp (1255134) | about a year and a half ago | (#43050293)

There is a protection mechanism that I know exists in Crucial SSDs which makes the drive appear dead after some unclean shutdowns of the drive while it performs a firmware-level integrity check of the drive. It may exist in other brands as well. Sometimes it takes 2 runs of 30-60 minutes to get the drive to re-enumerate via SATA. I'd be curious to know if the "dead" drive was affected by this bug.

Re:Interesting failure mode for Crucial SSDs (1)

drinkypoo (153816) | about a year and a half ago | (#43050743)

There is a protection mechanism that I know exists in Crucial SSDs which makes the drive appear dead after some unclean shutdowns of the drive while it performs a firmware-level integrity check of the drive.

I don't know if they're violating a spec or not and it's probably a life's work to find out, but that seems very rude to me. They really ought to identify as busy or something, so that they don't just scare the piss out of you. If you almost-brick an Xperia phone by scragging the bootloader so bad you can't even reflash it, whatever handles the comms is still working and lurking in the background and it will enumerate via USB with the service interface. That way you know whether you should even bother. Would still prefer a more segregated bootloader, though.

Re:Interesting failure mode for Crucial SSDs (2)

Voyager529 (1363959) | about a year and a half ago | (#43050939)

You got this too? I just ordered a Crucial M4 on sale a few weeks ago. The day after I installed and cloned it, I had the same situation where it wouldn't start. I called Crucial, expecting to need an RMA. Luckily I got an informed gentleman on the phone who told me to leave it at the failed POST screen for 20 minutes, reboot, and give it another 20 minutes, and reboot again. It worked. Supposedly it's not so much a 'bug' as an 'obscure feature'. ...I'm keeping my spinning rust drive around just in case.

yeah, looked at the pdf.... (0)

Anonymous Coward | about a year and a half ago | (#43050349)

"We use synchronized I/O (O_SYNC), which means each write operation does not return until its data is flushed to the device."

Sure about that? Most of the devices I've seen will report "command complete" while data is either in DRAM or in flight even with write cache disabled. There's only a few that don't do that, and they aren't the cheap ones. You may get lucky on a major player stuffing some decent code in a consumer grade SSD for the sake of fewer firmware versions in manufacturing, but it's usually not the case.
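For reference, this is roughly what a synchronized write looks like from the application side. It is a minimal sketch, not the paper's test harness: `synced_write` is a made-up helper, `O_SYNC` is only used where the platform defines it, and, as noted above, a successful return only means the OS believes the device accepted the data, not that the drive's own cache has reached flash.

```python
import os
import tempfile


def synced_write(path, data):
    """Write data with O_SYNC (where available) and fsync before closing."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_SYNC", 0)  # not defined on every platform
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        os.fsync(fd)  # ask the OS to push its buffers to the device
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")
        print(synced_write(target, b"important data"), "bytes written")
```

Whether the bytes survive a power cut after this returns still depends on the drive honoring the flush, which is the point of the comment.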

Any device with a "super cap" over 2 years old is suspect. They degrade. All of them are using ceramic arrays now, and only guarantee data in flight if you're really pestering them on a design review.

Also the "brick" may not be a brick. When these drives have to rebuild translation tables, it can take a while. I've seen 60+ minutes on a 400G device. Leave the power on and wait a couple of hours. Reboot. You might get your drive back, maybe even most of your data. I wouldn't count on the last write, but you may get that too if their raid works.

UPS a solution for this (0)

Anonymous Coward | about a year and a half ago | (#43050407)

This reinforces my personal policy - if it is not a self-powered laptop (usually running on mains), all of my other computers are powered through UPS (Uninterruptible Power Supplies), e.g. Belkin, APS, et al. I learned this the hard way when a brief power interruption scrambled several conventional hard drives on an old Mac Quicksilver.

Buy a SSD with a battery or capacitor (2)

thue (121682) | about a year and a half ago | (#43050417)

This is old news; see e.g. Wikipedia's coverage [wikipedia.org]. Only buy SSDs with a battery or capacitor, or whatever is in the DRAM cache of the SSD will be lost on power failure.

My Personal Policy (2, Insightful)

wisnoskij (1206448) | about a year and a half ago | (#43050765)

This is why I don't use prototype tech that is really not ready to be used in the real world. And if you do, expect loads of bugs and bricking.

But either way, thanks for funding the development of something I am excited to try out in 2-4+ years when it will be a mature usable technology.
