Ask Slashdot: Do You Test Your New Hard Drives?

timothy posted about a year and a half ago | from the just-bite-the-corner-a-little dept.

Data Storage 348

An anonymous reader writes "Any Slashdot thread about drive failure is loaded with good advice about EOL — but what about the beginning? Do you normally test your new purchases as thoroughly as you test old, suspect drives? Has your testing followed the proverbial 'bathtub' curve of a lot of early failures, but with those that survive the first month surviving for years? And have you had any return problems with new failed drives, because you re-partitioned it, or 'ran Linux,' or used stress-test apps?"

348 comments

Heh (4, Insightful)

Deekin_Scalesinger (755062) | about a year and a half ago | (#42375759)

Like, never. Out of the box and away she goes...good luck to thee!

Re:Heh (4, Insightful)

JMJimmy (2036122) | about a year and a half ago | (#42375987)

Add to the above:

HDD tools are useless. I recently tried a bunch of them - they all reported my HDD in perfect condition... while it was doing the click of death. HDD failed within a week.

Re:Heh (3, Informative)

PlusFiveTroll (754249) | about a year and a half ago | (#42376199)

Sounds more like your hard drive's S.M.A.R.T. was useless. The tools can only report what the drive tells them; if SMART isn't reporting reallocated sectors, resets, or whatever other terrible malfunction, then they are left in the dark.

Re:Heh (1)

spire3661 (1038968) | about a year and a half ago | (#42376441)

SMART itself is mostly useless and we should ignore it completely.

Re:Heh (4, Interesting)

hairyfeet (841228) | about a year and a half ago | (#42376407)

The problem is that the best damned tool ever made for testing drives hasn't been updated in years and now won't work on drives bigger than 500GB. I am of course talking about SpinRite. With SpinRite on level 2 you just bypass the firmware, write patterns of zeroes and ones, and then read back what it reports; if it's spitting errors right off the bat then you know to send it back. Problem is, Gibson hasn't updated the thing since '06, so it can't handle drives bigger than 500GB, which makes it all but useless today.

So if anybody has found something that works similarly to SpinRite but handles large drives, I too would like to know. I get drives coming into the shop from all over the place with ZERO history, so I don't know if they've been barely used or thoroughly abused, and having a tool I can run on them would be a big help.

Re:Heh (1)

danomac (1032160) | about a year and a half ago | (#42376069)

The only new computer component I always test out-of-the-box is RAM - I've had many bad experiences over the last 10 years with instability due to bad RAM.

As far as hard drives go, I never test them. I run several RAID arrays in the house, and I actually have had a replacement drive fail in a week (one of Seagate's recertified drives.) I noticed odd behaviour and rebooted the server and the RAID array was degraded. Oops!

I guess in a way I do test them - if the new drive fails shortly after rebuilding the array, it was likely a lemon to start with.

I don't think the hard drive tools really test anything other than the SMART information on the drives anyway. You're at the mercy of the drive failing badly (or hard) enough to actually trip a SMART error. I've also had a drive that clicked and made awful noises for a year before it finally died. And no, it didn't ever report a SMART error, it just crapped out randomly.

Re:Heh (1)

war4peace (1628283) | about a year and a half ago | (#42376291)

On a more general note: I never move important data. What I do is: I copy data from old HDD to new HDD and then use KLS Backup to set up incremental back-up. I still use old HDD until it fails. When that happens, the old HDD is taken out of the system, the "new" HDD becomes the "old" HDD and a brand new HDD becomes... yes, you guessed it: new HDD :)

Unimportant data never gets backed up (e.g. installed games or large ISOs I keep for some reason, music, uncompressed video captures, etc). It goes straight to the new HDD because that's usually larger than the old one.

Re:Heh (-1)

Anonymous Coward | about a year and a half ago | (#42376321)

I bet you $100 that you are a Windows user. I also bet $100, that you never in your life looked at a log file. And I’ll even bet $50 that you never cleaned a computer on the inside.

On the system of a real computer user, every disk has a line like this in /etc/smartd.conf: /dev/disk/by-id/ata-COMPANY_MODEL_SERIAL -a -d sat -n never -m root@intranet.myhomenetwork -M diminishing -s (L/../../5/17)

So I get an e-mail on my home server's admin account whenever something goes bad, and the disk gets a full checkup every Friday at the end of the work day.

What’s so hard about this? Fire and forget. Doing that part of keeping your data safe doesn’t cost any relevant effort at all.

Same thing with backups. Backup script or tool, run via cron job or shutdown script, write to another medium, and pick how safe you want your data to be (e.g. by picking the number, type and storage location of the media).

Re:Heh (1, Informative)

Anonymous Coward | about a year and a half ago | (#42376393)

And I bet you're living in the past. Computer hardware is cheap, easily replaceable commodity parts these days. Why the fuck would I bother running worthless burn in tests when it's so easy and/or cheap to replace faulty parts? I don't care about the drive, just my data, which is always backed up with the most important stuff doubly redundant.

No (0)

Anonymous Coward | about a year and a half ago | (#42375765)

No.

lol (-1, Flamebait)

Anonymous Coward | about a year and a half ago | (#42375767)

Yes. I shove them up my ass and if I cum within 5 seconds I know they'll last forever.

No (0)

Anonymous Coward | about a year and a half ago | (#42375775)

No

SSDs (-1)

Anonymous Coward | about a year and a half ago | (#42375785)

With SSDs any extensive testing is going to take away life-time. Who cares about HDDs anymore these days?

Re:SSDs (5, Insightful)

roc97007 (608802) | about a year and a half ago | (#42375799)

> Who cares about HDDs anymore these days?

Anyone with a need for a massive amount of storage space.

Re:SSDs (0, Insightful)

Anonymous Coward | about a year and a half ago | (#42375865)

The massive storage requirements cause massive backup time, making a RAID setup of some kind necessary. At which point a dying disk now and then no longer is an issue.

Re:SSDs (3, Insightful)

White Flame (1074973) | about a year and a half ago | (#42375947)

Not really. People usually don't modify gigantic footprints of data per day, so standard incremental backup strategies are still very applicable. Most of the large data tends to be read-only over time, typically media, archives, large installation files, etc.

Re:SSDs (3, Insightful)

aaarrrgggh (9205) | about a year and a half ago | (#42376127)

Rebuild time. It takes our hardware raids about 24 hours to rebuild, and software raids about 72 hours. If the disk failure isn't detected immediately, even with RAID-6 you are pushing your luck.

RAID is not backup.

Re:SSDs (0)

Anonymous Coward | about a year and a half ago | (#42376243)

If you prioritize rebuild over user I/O that drops to <8h for a 20*3T SW raid60.
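
As a rough sanity check on rebuild-time figures like these, a back-of-envelope estimate is just capacity divided by sustained write rate. The sketch below assumes decimal units (1 TB = 1,000,000 MB), a constant rate, and a rebuild bound by writing one replacement disk; `rebuild_hours` is a hypothetical helper name, not part of any RAID tool.

```shell
# Estimate the hours needed to rewrite one whole member disk.
# $1: disk capacity in TB (decimal), $2: sustained rate in MB/s.
# Assumes a constant rate and no competing I/O, so treat it as a lower bound.
rebuild_hours() {
    awk -v tb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

rebuild_hours 3 105   # one 3 TB disk at about 105 MB/s; prints 7.9
```

Under those assumptions a 3 TB disk needs a little over 100 MB/s sustained to finish in under 8 hours, which is only plausible when the rebuild really is given priority over user I/O.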

Re:SSDs (1)

guruevi (827432) | about a year and a half ago | (#42376387)

HW RAID and SW RAID have been on par in performance for at least a decade. SW RAID these days is actually exceeding HW RAID performance because of the large difference in performance and calculation capabilities of the CPU (especially with data checksumming and compression).

Re:SSDs (1)

stuporglue (1167677) | about a year and a half ago | (#42376443)

I just bought a new ThinkPad which had several SSD options. I chose the slower 1 terabyte disk instead. I'd rather have everything I need with me, even if it is a little slower.

As for backups, I have a daily cron job which rsyncs between my laptop and my home server.

When I have massive changes I make sure I'm hooked up to the wired home network, otherwise it just goes on over wifi.

Re:SSDs (3, Informative)

cpghost (719344) | about a year and a half ago | (#42375977)

Who cares about HDDs anymore these days?

We do here at work. We need some modest 120+ TB of storage right now, and 30% of that content is highly dynamic (PostgreSQL databases). Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive. SSDs are just for laptops or so, not for real data storage requirements.

Re:SSDs (1)

jmichaelg (148257) | about a year and a half ago | (#42376209)

> Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive.

Guess Google is silly then using the cheapest possible hard drives and accommodating the inevitable failures.

Re:SSDs (0)

Anonymous Coward | about a year and a half ago | (#42376273)

Google isn't running PostgreSQL databases on those expendable machines.

Re:SSDs (1)

PlusFiveTroll (754249) | about a year and a half ago | (#42376249)

> SSDs are just for laptops or so, not for real data storage requirements

Yep, just for laptops

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-910-series.html [intel.com]

http://www.equallogic.com/products/default.aspx?id=10857 [equallogic.com]

SSD isn't great for bulk data storage, but where you need high IOPS a few SSDs in arrays replace a truckload of drives.

Re:SSDs (1)

war4peace (1628283) | about a year and a half ago | (#42376335)

Nope. SSDs are reliable enough to be used in server-grade implementations. The only issue with them is that they're highly specialized. If your regular HDDs become the bottleneck, you will need SSDs. Also, if you have some small implementations where you need fast access to read/write/modify data (some MMOs come to mind) and need to protect it against a power failure or RAM going awry, you should use SSDs.

Re:SSDs (4, Interesting)

cpghost (719344) | about a year and a half ago | (#42376411)

Actually, the only use we currently have for SSDs is ZILs (ZFS intent logs), and we're evaluating whether to put PostgreSQL transaction logs on an SSD, but that's another story. Our main storage farm is still HDD-based.

Re:SSDs (1)

Desler (1608317) | about a year and a half ago | (#42376021)

People who need reliable, long-term storage care about HDDs. Just like how people still used tape drives even when CDs and DVDs came along.

Re:SSDs (3, Insightful)

PlusFiveTroll (754249) | about a year and a half ago | (#42376255)

Depending on your definition of reliable and long term, people still use tapes.

Re:SSDs (0, Offtopic)

Anonymous Coward | about a year and a half ago | (#42376129)

Let's see. I have three drives connected to this laptop. The internal, which is 1TB, an external that is 3TB and another external that is 4TB.

Let me know when I can buy an SSD for $100 that matches the size of any of those.

dban followed by smartctl (3, Interesting)

X0563511 (793323) | about a year and a half ago | (#42375807)

If dban can write out every sector and not have smartctl show any pending sectors after the fact (and the average speed of the dban wipe was normal) then you've got good chances the drive will be fine.
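
A minimal sketch of the "check smartctl afterwards" step: it pulls the raw pending-sector count out of `smartctl -A` output. It assumes the usual ATA attribute table, where attribute 197 is Current_Pending_Sector and the raw value is the 10th column; `pending_sectors` is a hypothetical helper name.

```shell
# Print the raw Current_Pending_Sector count from `smartctl -A` output on stdin.
# Assumes the usual ATA attribute table (ID in column 1, raw value in column 10).
# Exits non-zero if the drive does not report attribute 197 at all.
pending_sectors() {
    awk '$1 == 197 { print $10; found = 1 } END { exit !found }'
}

# Typical use once the wipe has finished (needs root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | pending_sectors
```

Anything other than 0 after a full write pass is worth a closer look, since a successful write normally clears a pending sector or forces it to be reallocated.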

Re:dban followed by smartctl (0)

Anonymous Coward | about a year and a half ago | (#42375907)

While I don't know what dban is, the idea of writing every sector and then using smartctl seems to be a good one. I personally usually just do a long format, which works in either Windows or Linux and, oddly enough, has identified a couple of bad drives early, as they didn't survive the long format. Of course, the last time I replaced a failed drive in a RAID 6 set I just replaced it, counting on the redundancy left in the array to last long enough to make sure the drive was good. Also, the array repair automatically writes every sector...

As far as time goes, well, it's not as if you have to pay attention to the testing, so it's not really a delay unless you need that drive online right then...

Re:dban followed by smartctl (0)

Anonymous Coward | about a year and a half ago | (#42376231)

>> While I don't know what dban is,
Really? I just found this to be kind of odd in a comment on /.

Re:dban followed by smartctl (5, Interesting)

bill_mcgonigle (4333) | about a year and a half ago | (#42375913)

Yes, this. I do it online:

dd if=/dev/zero of=/dev/sdX bs=8M

and then check smartctl. If I'm making a really big zpool, I fill them up and let ZFS fail out the turkeys:

dd if=/dev/zero of=/tank/zeros.dd bs=8M
zpool scrub tank

If I'm building a 30-drive storage server for a client I'll often see 1-2 fail out. Better to catch them now than when they're deployed (especially with the crap warranties on spinning rust these days). I need to order in staggered lots anyway, so having 10% overhead helps keep things moving along.

Re:dban followed by smartctl (1)

mathew7 (863867) | about a year and a half ago | (#42376133)

I actually do read-write-read tests (with dd_rescue, because it can keep going after an error), running smartctl in between each step and judging for myself what changed. I did it so often that I bothered to make a bash script to do everything and run overnight (I think it takes 3 hours for each step on my 2TB drives).
The idea of the first read is to update the "pending" list (if needed), followed by a write to rectify it, then a final read to see that everything is OK (no pending increase).
Although I have had several HDDs, I only ever had 1 pending sector on a new HDD, and that was on a 2TB 4KB/sector WD Green drive when I played with the 63-sector compatibility jumper. I still have that after 1 year.

Yes, it's happened. (-1, Flamebait)

Anonymous Coward | about a year and a half ago | (#42375811)

I've had one unfortunate incident where a vendor has refused to RMA a new drive, which had failed, as I'd apparently voided the warranty by installing BSD on it.

According to them, their hard drives were guaranteed only if the end user was running Windows.

Re:Yes, it's happened. (1)

zidium (2550286) | about a year and a half ago | (#42375827)

And you don't even tell us the vendor?!

Re:Yes, it's happened. (2)

ArchieBunker (132337) | about a year and a half ago | (#42375933)

Sounds like a really old troll.

Yes. I mean no. (0)

JustinFreid (1723716) | about a year and a half ago | (#42375815)

I actually throw mine in a bathtub.
How did this make it to the front page, especially with SSD prices being what they are?

Re:Yes. I mean no. (0)

Anonymous Coward | about a year and a half ago | (#42375925)

i think you are misunderstanding the bathtub failure curve

Re:Yes. I mean no. (2, Funny)

Anonymous Coward | about a year and a half ago | (#42375953)

Let me guess... if it sank to the bottom it was a good drive, but if it floated it was a bad drive and needed to be burnt at the stake.

Re:Yes. I mean no. (1)

geminidomino (614729) | about a year and a half ago | (#42375971)

Of course! Fucking witches are getting into everything these days!

Re:Yes. I mean no. (1)

davester666 (731373) | about a year and a half ago | (#42376027)

Well, to be sure, if your HDD does float in water, it probably is possessed...

Re:Yes. I mean no. (0)

Anonymous Coward | about a year and a half ago | (#42376083)

It's nice that you have 800 dollars to spend on a TB worth of storage, but many of us need more than a couple TB, and don't want to spend upwards of 2 grand for a relatively small storage array.

Re:Yes. I mean no. (1)

nabsltd (1313397) | about a year and a half ago | (#42376109)

How did this make it to the front page, especially with SSD prices being what they are?

I have a 20TB RAID array that cost me about $0.10/GB, including controllers. If you can afford to build a 20TB array using SSD, you have far more money than I do. You will also need more controllers than I do (port multipliers divide the bandwidth, which you don't want to do for SSDs), since you'd need at least 20 SSDs (if you were willing to pay about $2.50/GB), but more likely more than 45 (at about $0.85/GB).

You also need special controllers that understand SSDs and can pass TRIM commands, and that will add about $0.15/GB. And, you'll need a much more expensive motherboard, since you need at least 24 PCIe lanes that can all be used for something other than video cards, but likely more than 40. Last, since this is Slashdot, you might not be able to use those special controllers, as not all of them have drivers for the kernel version you want to use.

So, yeah, for a boot drive, SSDs kick ass, but for storing your movie collection, not only are they 10 times more expensive than magnetic disks, but they are way overkill as far as performance is concerned.
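
The arithmetic behind that comparison can be sketched in a few lines, using only the per-GB figures quoted above and assuming decimal units (1 TB = 1000 GB). `array_cost` is a hypothetical helper, and the motherboard premium is left out because no figure is given for it.

```shell
# Total cost in dollars of an array, given its size in TB and a price per GB.
# Assumes decimal units (1 TB = 1000 GB).
array_cost() {
    awk -v tb="$1" -v per_gb="$2" 'BEGIN { printf "%.0f\n", tb * 1000 * per_gb }'
}

array_cost 20 0.10   # magnetic disks, controllers included; prints 2000
array_cost 20 1.00   # SSDs at $0.85/GB plus $0.15/GB for controllers; prints 20000
```

On those numbers the SSD array comes out at ten times the price of the magnetic one, which is where the "10 times more expensive" figure lands.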

Re:Yes. I mean no. (1)

PlusFiveTroll (754249) | about a year and a half ago | (#42376285)

>So, yeah, for a boot drive, SSDs kick ass, but for storing your movie collection, not only are they 10 times more expensive than magnetic disks, but they are way overkill as far as performance is concerned.

And where performance is concerned the raid of SSDs replaces many many more disks.

Use a sledgehammer to drive railroad spikes
Use a finishing hammer to drive finishing nails.

Re:Yes. I mean no. (0)

Anonymous Coward | about a year and a half ago | (#42376221)

>> How did this make it to the front page, especially with SSD prices being what they are?
There are lots of widely different needs when it comes to consumer vs corporate/small business, and even hobbyists who deal with large amounts of data. If SSDs meet all your needs, that's great but you're missing the point about why the OP asked about conventional hard drives.

Normally I don't test them... (1)

Anonymous Coward | about a year and a half ago | (#42375821)

...but I have never had a hard drive fail catastrophically on me.

I do test flashcards, and their survival rate is about 50% :-(. (tar czvf /dev/sdb ..., and another flashcard dead...)

Used to never test (2)

AK Marc (707885) | about a year and a half ago | (#42375829)

My first help desk job included every computer in the company. We had a server drive fail, so I had Compaq send a replacement. The new arrival didn't work. So then I spent more time looking at RAID configuration and such, but we got a second replacement. That one didn't work either. But I tested it on arrival. The third replacement worked fine, just when I was worried it was something stupid I was missing. Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.

I test every "used" part as if it's suspect. The question was about new, but they are still new to me.

Re:Used to never test (1)

Hentes (2461350) | about a year and a half ago | (#42376277)

Not to mention that some shadier shops tend to resell used or returned parts as new.

Re:Used to never test (3, Interesting)

PlusFiveTroll (754249) | about a year and a half ago | (#42376361)

Two DOA of the same part isn't out of the question, a good amount of the time the same part number is from the same batch, which may suffer from the same manufacturing defects. I see things like that pretty often in batches of disks that fall out of RAIDs.

Anyone actually does this? (1)

tantrum (261762) | about a year and a half ago | (#42375833)

I haven't even considered testing my personal hard drives. If they break I try to retrieve whatever is on them, but I just buy new drives instead of spending any amount of time fixing them. I've never returned a disk; I just buy a couple of new ones whenever I need more space.

At work we're using properly configured SANs with 24x7 support, so I couldn't be arsed to test disks there either. We don't have multiple racks of disks, so I don't see any good reason to test everything.

If you're testing new disk drives you must be really bored or very broke.

Re:Anyone actually does this? (1)

darkHanzz (2579493) | about a year and a half ago | (#42375881)

It used to be that stress-testing HDs with random disk access for one day could flush out a lot of bad ones. The ones that did survive tended to last many years. It's a tricky thing with RAID drives: if you happen to have bought a 'bad' batch, chances are pretty high that more than one will fail before you can replace the first. So testing makes sense sometimes. A while ago, Google published some research showing that drives do not fail randomly, but in clusters, making RAID a bit more susceptible to data loss than one might expect.

Re:Anyone actually does this? (0)

Anonymous Coward | about a year and a half ago | (#42376377)

If you can retrieve whatever is on them, they are not broken!

In other words: If they were already broken, YOU DID NOT RETRIEVE EVERYTHING ON THEM. YOU LOST DATA.

If you are OK with losing data, I may call you completely insane, but in the end it's your thing. But don't expect anyone else to be as crazy as you are.

smartmontools (5, Informative)

WD (96061) | about a year and a half ago | (#42375851)

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)
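
For reference, a sketch of what such a smartd.conf entry can look like. The schedule regex follows smartd's `T/MM/DD/d/HH` form; the hours chosen (02:00 and 03:00), the Saturday slot, the device path and the recipient are all illustrative assumptions, and `smartd_directive` is a hypothetical helper that only prints the line.

```shell
# Print one smartd.conf directive for a device.
# $1: device path, $2: mail recipient (both are placeholders supplied by the caller).
#   -a  enable smartd's default set of health, attribute and log checks
#   -s  self-test schedule: short test every day at 02:00, long test Saturdays at 03:00
#   -m  address to mail when smartd detects a problem
smartd_directive() {
    printf '%s -a -s (S/../.././02|L/../../6/03) -m %s\n' "$1" "$2"
}

smartd_directive /dev/sda root
```

The printed line can be reviewed and then added to /etc/smartd.conf by hand; smartd has to be restarted or reloaded before it takes effect.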

Re:smartmontools (5, Funny)

Deekin_Scalesinger (755062) | about a year and a half ago | (#42375969)

This should be modded up for your username alone lol

Re:smartmontools (1)

Gaygirlie (1657131) | about a year and a half ago | (#42376379)

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)

This is the setup I've used on my server for a while now.

I see all these defrags, fscks and such as inferior to S.M.A.R.T. self-tests, simply because the drive itself will always know more about its condition than any 3rd-party tool that just tries to guess its state via secondary effects. As such, it sometimes baffles me how many people, even in this day and age, ignore S.M.A.R.T. I recommend smartmontools and smartd under Linux and Hard Disk Sentinel under Windows, though HD Sentinel ain't free.

Exercise the drive (1)

roc97007 (608802) | about a year and a half ago | (#42375855)

Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.

But having a returned drive rejected because I repartitioned it or "ran linux"? Never heard of that.

Re:Exercise the drive (1)

whoever57 (658626) | about a year and a half ago | (#42376033)

Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.

Rather ineffective tests.

Use smartctl and schedule long tests. Also try something like:
dd if=/dev/sda of=/dev/sda bs=64k

Yes, always (0)

Anonymous Coward | about a year and a half ago | (#42375859)

Yes, after four drives failed in their first 48 hours of use. Not a very nice situation.

man this is sad (0)

Anonymous Coward | about a year and a half ago | (#42375867)

if you want to develop a testing protocol for hard drives, there are lots of papers.

a common technique is to stress the drive over its rated capacity (i.e. thermal) in order to develop a curve,
so you don't have to wait the years required for it to fail under normal conditions.

if instead you care about operations, you use some mitigation strategy like raid.

dealing with drive failure is bread and butter for this crowd. why does such a lame, useless question end up on slashdot?

betteridge's law of headlines (1)

whoever57 (658626) | about a year and a half ago | (#42375869)

betteridge's law of headlines applies here. Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist. As for problems with RMAs for hard drives used under Linux, repartitioned, etc. No.

Did ketchup lead to the extinction of dinosaurs? (1)

Dogtanian (588974) | about a year and a half ago | (#42376275)

betteridge's law of headlines applies here.

No, it doesn't. This is an actual, legitimate question.

As I correctly predicted earlier this year [slashdot.org], lots of Slashdotters have seized upon Betteridge as the latest fad kneejerk response, and are misapplying it without understanding what it means. In his own words [wikipedia.org], Betteridge's Law applies to cases where journalists "know the story is probably bollocks, and don’t actually have the sources and facts to back it up, but still want to run it."

For example, without the evidence to back it up, a headline saying "Tomato ketchup caused AIDS that led to extinction of dinosaurs" would be obvious crap and lead to criticism of the paper and/or journalist. OTOH, "Did Tomato ketchup cause AIDS that led to the extinction of the dinosaurs?" gives them the weasellish get-out of "Well, we didn't actually *claim* that it did".

Even then, if a question headline was a genuine attempt to present a plausibly-supported but not universally-accepted idea (possibly because it was new and/or divisive), then Betteridge's wouldn't apply.

In short, Betteridge's original observation was insightful where he claimed it applied, but it was never a blanket dismissal of question headlines, so please stop the tedious, kneejerk misapplication.

Re:betteridge's law of headlines (0)

Anonymous Coward | about a year and a half ago | (#42376369)

Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist.

Seagate constellations with a consistent 3-5% DOA disagree.

never had early failure (1)

iggymanz (596061) | about a year and a half ago | (#42375871)

Manufacturers do a burn-in before shipping, and that gets most of the early failures. Of course, some will still win the lottery and get a crappy early-failure drive, but it has never happened to me.

Every time (1)

Anonymous Coward | about a year and a half ago | (#42375873)

Every single platter HD I get I scan for bad sectors. I got sick and tired of returning faulty WD Black drives to different suppliers because of huge bad sector counts. Since I have been testing I have returned about 5 drives due to sector issues. I don't run any tests on SSDs.

Lifetime of bathtubs (2)

cvtan (752695) | about a year and a half ago | (#42375875)

Old bathtubs lasted longer than old hard drives. Now it's the other way around.

Tools (0)

Anonymous Coward | about a year and a half ago | (#42375879)

I run badblocks -sw /dev/sdX on both new and old disks before use.

This does a 4 pass read/write scan of the disk and reports on errors it finds.

Yet to come across a new disk with issues, but it has saved my bacon several times on old disks that were about to die.

I also run smartctl -t long /dev/sdX and then smartctl -a /dev/sdX to read the results.

RAID them and you're OK (0)

Anonymous Coward | about a year and a half ago | (#42375895)

Buy more than one. Then you're OK if one is bad. Though some testing does seem like a decent idea.

Yes! Especially before adding them to an array. (5, Interesting)

Anonymous Coward | about a year and a half ago | (#42375901)

I run some ZFS systems at work. With the current version of the filesystem, you can expand the zpools but you can't shrink them, so adding a bad drive causes immediate problems.

I've found that some drives are completely functional but write at extremely slow rates: maybe 10% of normal. With typical consumer drives, maybe 1/20 is like this. To ensure I don't put a slow drive into a production zpool array of disks, I always make a small test zpool consisting of just the new batch of drives and stress-test them.

This catches not only obviously bad drives, but also the slow or otherwise odd ones.
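
One way to pick the slow ones out of a batch, sketched under assumptions: the write rate of each drive has already been measured somehow (for example by timing a large sequential write), and a drive counts as "slow" when it falls below half the batch median. Both the 0.5 default and the `flag_slow_drives` name are illustrative, not taken from the post above.

```shell
# Flag drives whose measured write rate is far below the rest of the batch.
# stdin: one line per drive, "<device> <MB_per_s>".
# $1:    fraction of the batch median below which a drive is flagged (default 0.5).
flag_slow_drives() {
    sort -k2,2n | awk -v frac="${1:-0.5}" '
        { name[NR] = $1; rate[NR] = $2 }
        END {
            if (NR == 0) exit
            mid = int((NR + 1) / 2)
            # Median of the sorted rates: middle value, or mean of the two middle values.
            median = (NR % 2) ? rate[mid] : (rate[mid] + rate[mid + 1]) / 2
            for (i = 1; i <= NR; i++)
                if (rate[i] < frac * median) print name[i], rate[i]
        }'
}
```

Fed the lines `sda 150`, `sdb 148`, `sdc 14` and `sdd 152`, it prints `sdc 14`. Using the median rather than the mean keeps one very slow drive from dragging the reference value down with it.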

Re:Yes! Especially before adding them to an array. (1)

mrmeval (662166) | about a year and a half ago | (#42376137)

How have you been treated when returning them? I'd like to know what brands and what vendor. I'm always looking for success stories especially on commodity hardware. Thanks.

Re:Yes! Especially before adding them to an array. (0)

Anonymous Coward | about a year and a half ago | (#42376365)

No problems at all, from either WD or Seagate for RMA. That surprised me a bit, since in some cases I'm asking them to take back a drive that works, it just works -slow-... in their place, I would be skeptical.

I have had nothing but good luck with commodity hardware: my personal theory is that poor shipping and handling practices are responsible for most of commodity hard drive failure. I always buy them with overnight or two-day shipping (to a business address, never thrown on a doorstep), and they always just work.

Murphy's Law of Testing (2)

White Flame (1074973) | about a year and a half ago | (#42375957)

Trying to coax an error will never reveal one. Only when you start using it "for real" will the problem manifest.

Do you test third party software components? (1)

thePowerOfGrayskull (905905) | about a year and a half ago | (#42375983)

Do you perform extensive functional tests against third party software libraries before including them in your system? In most situations, no -- if it's established and proven. You trust that it does what it advertises, and only when it doesn't do you dig further.

Same goes for hard drives.

Re:Do you test third party software components? (1)

PlusFiveTroll (754249) | about a year and a half ago | (#42376419)

Wat? [destroyallsoftware.com] Do you download your software over UDP without any error checking or means of correction? Do your dll's and exe's not verify their size and signature? I tend to verify my packets, files, and packages [youtube.com].

format and secure erase (1)

danlip (737336) | about a year and a half ago | (#42376001)

I always do a format and a secure erase (one pass of zeros). In addition to finding bad sectors I want to be sure to get rid of any trace of whatever crap they put on it at the factory (viruses, kiddie porn, crapware, etc).

This isn't real estate (0)

Anonymous Coward | about a year and a half ago | (#42376007)

Real estate is a scam business filled with thieves, liars, hypocrites and leeches. You need to inspect before you buy, after you buy, you need insurance, lawyers, and notaries just to be sure that in 2012 a roof doesn't leak water. Real estate is filled with people who couldn't do any better in life.

Hard drives are built by engineers and technicians with a built-in sense of ethics. There's a whole lot less to worry about.

Testing SSDs? (0)

Anonymous Coward | about a year and a half ago | (#42376023)

Does anyone have experience or a good protocol for stress testing SSDs while minimizing wearout?

Or should I even bother?

Yes. And Everything Else. (1)

zenlessyank (748553) | about a year and a half ago | (#42376025)

I do burn-ins on ALL rigs I build for clients. Hard drives are just one of many things that get tested. As others have stated, it sucks to have to listen to clients complain when their NEW rig or hard drive is malfunctioning. It may cost the client an extra day to get the new rig, but I don't have to eat crow!

Badblocks/Shred (1)

SealBeater (143912) | about a year and a half ago | (#42376029)

badblocks -w -t random /dev/sdX && shred /dev/sdX

Badblocks checks for bad sectors while writing random data to the drive (the -w write test, which destroys anything already on it), and after all is good, I run shred once or twice to fill the drive with random data. You can probably get by with just badblocks tho.

Yep! (0)

Anonymous Coward | about a year and a half ago | (#42376063)

I always do (or at least when I get some time) and I've found a bad sector a couple of times now. The supplier has always sent me a new one straight out, and I was glad I found out sooner rather than later. For me at least, it seems worth it, and if you run a check overnight and get the computer to shut down after it's done, you're not losing much.

No - I just assume they will fail (1)

turkeyfeathers (843622) | about a year and a half ago | (#42376081)

I buy hard drives in pairs, using one for live data and one kept offline until it's time to back up the live drive (I use Unison sync to quickly determine what's changed between the two drives). My boot drive gets backed up every night with Macrium Reflect. The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.

SMART + badblocks (5, Interesting)

SuperBanana (662181) | about a year and a half ago | (#42376087)

I run smartctl and capture the registers, then run badblocks, and compare smartctl's output to the pre-bad-blocks check.

If there are any remapped blocks, the drive goes back, as the factory should have remapped the initial defects already, and that means new failed blocks in the first few hours of operation.
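
A minimal Python sketch of that before/after comparison, assuming the two captures are saved text from smartctl -A (or -x) in its usual attribute-table layout. The function names and the short list of watched attributes are illustrative choices, not anything smartctl provides.

```python
# Attributes that should not move on a healthy new drive (illustrative selection).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from smartctl attribute-table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the tenth column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def changed_attributes(before_text, after_text, watched=WATCHED):
    """Return {name: (before, after)} for watched attributes whose raw value changed."""
    before = parse_smart_attributes(before_text)
    after = parse_smart_attributes(after_text)
    return {
        name: (before.get(name, 0), after.get(name, 0))
        for name in watched
        if before.get(name, 0) != after.get(name, 0)
    }
```

Feed it the capture taken before badblocks and the one taken after; anything it returns is a counter that moved during the test.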

Re:SMART + badblocks (1)

sribe (304414) | about a year and a half ago | (#42376185)

Great idea, thanks. I always test new drives, but this one had not occurred to me.

Re:SMART + badblocks (0)

Anonymous Coward | about a year and a half ago | (#42376279)

I do:
DRIVE="/dev/sdb"
smartctl --xall $DRIVE > sm_data_0.txt; smartctl -t short $DRIVE; smartctl --xall $DRIVE > sm_data_1.txt ; badblocks -vws $DRIVE; smartctl --xall $DRIVE > sm_data_2.txt
In addition, if I copy a significant amount of data to the drive I also recheck (and save) the smartctl output.

For drives larger than 1TB I am seeing lots of raw read errors on new drives, so I have to agree with other observations I've read about larger-than-1TB drive technology -- it's not there yet ....

Google Whitepaper Answers Your Questions (1)

idealego (32141) | about a year and a half ago | (#42376091)

This answers most of your questions and does so using data based on a large dataset.
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf [googleusercontent.com]

If you are concerned about reliability I suggest using an Intel SSD. Their failure rate is very low.

I saw linear failure rates. (1)

AlecC (512609) | about a year and a half ago | (#42376169)

I used to work for a manufacturer of video raid arrays. While I was writing software, not on hardware QA, I saw a lot of drives go past. I saw no sign of high early failures, bathtub style. It seemed to me essentially random. The only tip I would have would be to monitor your bad block count. Most drives only showed one or two "grown" as opposed to factory marked bad blocks. If the bad block list grows into the teens, swap that drive.

Quick Disk Test tool (0)

Anonymous Coward | about a year and a half ago | (#42376193)

Some time ago I wrote a Java app to perform a quick, non-scientific test on USB sticks and hard drives. It is here:

http://sourceforge.net/apps/mediawiki/filereadtest/index.php?title=Main_Page#Quick_Disk_Test_tool

Reallocated Sector Count = lost data? (0)

Anonymous Coward | about a year and a half ago | (#42376203)

If my SMART data is showing the following:

Reallocated Sectors Count = 15
Reallocation Event Count = 15
Current Pending Sector Count = 0
Uncorrectable Sector Count = 0

Could the reallocated sectors mean data was lost? I've seen conflicting information on whether reallocated sectors means data was lost. Are there any other SMART attributes I can look at to determine if data was lost on the drive?

Re:Reallocated Sector Count = lost data? (1)

Gaygirlie (1657131) | about a year and a half ago | (#42376429)

If my SMART data is showing the following:

Reallocated Sectors Count = 15
Reallocation Event Count = 15
Current Pending Sector Count = 0
Uncorrectable Sector Count = 0

Could the reallocated sectors mean data was lost? I've seen conflicting information on whether reallocated sectors means data was lost. Are there any other SMART attributes I can look at to determine if data was lost on the drive?

You would know if data had been lost. Normally the drive silently copies the data off a failing sector to a spare one, reallocates the sector, and you don't notice anything. But if the sector is completely unreadable or fails its CRC check (the drive's internal CRC, which is independent of how the drive is formatted), the drive returns an error to the operating system and you are notified. The drive does not automatically reallocate such sectors; it waits until the OS writes to the broken sector, precisely so that files are not silently corrupted without the user's knowledge.

Case in point: the power supply on my server caught fire and disrupted the other electrical components, and one of my drives ended up with a bunch of sectors with broken internal CRCs. There was nothing I could do about it, but at least I was informed of which files I had lost when I tried to read them. I deleted the files in question and wrote random data to the affected sectors, after which the reallocated sectors count increased.
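
As a rough Python sketch of how to read those counters, following the explanation above. The function name and the wording of the notes are made up for illustration, and treating a nonzero pending count as "unreadable, waiting for a write" is the usual reading of that attribute rather than a guarantee for every drive.

```python
def interpret_smart_counts(reallocated, pending, uncorrectable):
    """Return human-readable notes for three raw SMART counters."""
    notes = []
    if pending > 0:
        # Sectors the drive could not read; it remaps them only once they are rewritten.
        notes.append(f"{pending} pending sector(s): reads should return I/O errors until rewritten")
    if uncorrectable > 0:
        notes.append(f"{uncorrectable} uncorrectable sector(s): data in them is likely unreadable")
    if reallocated > 0:
        # Remapping alone does not say whether the old contents were recovered first.
        notes.append(f"{reallocated} reallocated sector(s): remapped to spares, no read errors outstanding")
    if not notes:
        notes.append("no remapped or unreadable sectors reported")
    return notes
```

For the numbers in the question, interpret_smart_counts(15, 0, 0) gives only the "reallocated" note: sectors were remapped and nothing is currently waiting to be rewritten.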

hundreds of drives... (1)

spywhere (824072) | about a year and a half ago | (#42376207)

...bought and installed in desktops & laptops over the last decade, and what I've learned is to buy Seagate drives. I have seen way fewer defects and first-year failures on Seagate than WD, and I was happy to see Maxtor go away.

Re:hundreds of drives... (0)

Anonymous Coward | about a year and a half ago | (#42376433)

I have never seen a Maxtor drive take a complete dump. They were probably the most reliable drives to be had for a good while. I miss that company.
Seagate has pulled themselves together and made something very good in recent times, but there was a time not so long ago that they were the recycled Korean plastic of hard disks.

Disk Utility (1)

hackertourist (2202674) | about a year and a half ago | (#42376283)

When installing a new disk in a Mac, I run Disk Utility with the Secure Erase option enabled. This will write 7 or 30 passes of 0000 to every block, that should find any early problems...

Do You Test Your New Hard Drives? (1)

OneWordReply (2702891) | about a year and a half ago | (#42376287)

Never.

My testing methodology (2)

dpidcoe (2606549) | about a year and a half ago | (#42376293)

I thoroughly test any new hdd I get for my desktop PC:

The first thing I do is format it and install Windows. If that works, then we know the drive isn't DOA.
From there I torture test it by copying several hundred gigabytes of software and movies, as well as installing some more programs.
After that, I let it run for a few months, using it normally. If it crashes during that time, then I know it was bad.

Manufacturer Tools (1)

SrLnclt (870345) | about a year and a half ago | (#42376319)

Recently picked up a couple 3TB Seagate drives and a Synology box for a new NAS at home. Since I was planning to move all my music, pictures, video, and general documents to the new box, I decided to download the manufacturer HDD tools and scan the drives first just in case. I think Seagate's is called SeaTools, I'm sure WD has a program as well. No errors reported on either drive, and no errors so far with the RAID array after a couple months of use.

Because you ran linux? (1)

damn_registrars (1103043) | about a year and a half ago | (#42376325)

Well, the last drive I returned to a manufacturer was one that I was running FreeBSD on and they didn't seem to care. Granted, the experience with the manufacturer (Seagate) was less-than-pleasant but that had nothing to do with my choice of OS which I don't think they ever asked.

I now buy only Western Digital.

Spinrite (0)

Anonymous Coward | about a year and a half ago | (#42376331)

When I purchase any computer, I always do a spinrite cycle on it.

Burn In Testing for New Gear (1)

hackus (159037) | about a year and a half ago | (#42376343)

This is part of a process for testing new server gear.

Since I use Fedora, currently at 17, burn in testing is important.

Quick tip: Most distros currently do not detect SSDs during the install and do not add the "discard" option to the fstab entries for the device.

If you do use a modern distro and install or use an SSD with it, make sure to mount the device with the mount option for TRIM support set.

For example:

UUID=xxxxxxxxxxxxxxxxx /mnt/ssd2 ext4 discard,defaults 1 2

Where xxx... is the UUID of the device and discard enables TRIM support.
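
A small Python sketch for checking an fstab for that option. The function name is hypothetical, and note the limits: it only looks at the text of the file, so it cannot tell whether a given entry actually lives on an SSD.

```python
def fstab_entries_without_discard(fstab_text, fstypes=("ext4",)):
    """Return mount points of matching filesystems whose options lack 'discard'."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "discard" not in options.split(","):
            missing.append(mountpoint)
    return missing
```

Run it against the contents of /etc/fstab and compare the result with the mount points you know are on SSDs.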

The burn-in process for hard disks usually involves writing a file the entire size of the disk, reading it, random seeking within it, then deleting it.
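
Roughly, in Python, that write/read/seek/delete cycle could look like the sketch below. This is not the script I actually use; the names, block size and seek count are placeholders, and on a real run the reads may be served from the OS page cache unless the file is much larger than RAM.

```python
import hashlib
import os
import random


def block_pattern(index, block_size, seed=0):
    """Deterministic filler for block `index`, so it can be regenerated when verifying."""
    digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    return (digest * (block_size // len(digest) + 1))[:block_size]


def burn_in_file(path, n_blocks, block_size=1 << 20, seeks=100, seed=0):
    """Write, re-read, randomly seek and delete a test file; return mismatched block indices."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    bad = set()
    try:
        with open(path, "wb") as f:
            for i in range(n_blocks):
                f.write(block_pattern(i, block_size, seed))
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out to the device
        with open(path, "rb") as f:
            # Sequential read pass over the whole file.
            for i in range(n_blocks):
                if f.read(block_size) != block_pattern(i, block_size, seed):
                    bad.add(i)
            # Random seek pass.
            rng = random.Random(seed)
            for _ in range(seeks):
                i = rng.randrange(n_blocks)
                f.seek(i * block_size)
                if f.read(block_size) != block_pattern(i, block_size, seed):
                    bad.add(i)
    finally:
        if os.path.exists(path):
            os.remove(path)  # the last step of the cycle: delete the test file
    return sorted(bad)
```

Pick n_blocks so the file nearly fills the disk; an empty list back means every block read as written.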

I also use a custom script to drive sysbench with some common fileio tasks.

This is important for disks as it can reveal differences in firmware between SSDs used in arrays. For example, a customer of mine had a really badly performing RAID array, and it was due to the mixing and matching of firmware between drives. (It worked well for a while, but then went bad when one of the drives in the RAID 5 array died and he replaced it with a new one with different firmware.)

-Hack

Plug it in (2)

mbone (558574) | about a year and a half ago | (#42376427)

Testing is simple - plug it in, and run it till it fails. Might as well use it in the mean-time.

Always (0)

Anonymous Coward | about a year and a half ago | (#42376435)

I write /dev/zero to ALL blocks and I check SMART statistics and /var/log/messages for any timeout/IO issues or for defective blocks. Sometimes it's not a defective drive, but a bad cable.
I've also found FreeBSD to be a lot more informative and restrictive when something isn't working 100%. I believe this is because of the GEOM framework. Linux can be more forgiving, which can let a minor problem become a major one, because you didn't know about the minor problem before it got worse.
