Slashdot: News for Nerds

The Lies Disks and Their Drivers Tell

Soulskill posted about 2 years ago | from the designed-at-odds dept.

Data Storage 192

davecb writes "Pity the poor filesystem designer: they just want to know when their data is safe, but the disks and drivers try so hard to make I/O 'easy' that it ends up being stupidly hard. Marshall Kirk McKusick writes about the difficulties in making the systems work nicely together: 'In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.'"
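For readers who have not seen what "issue a cache-flush request after every ... fsync" means from user space, here is a minimal shell sketch. It assumes GNU dd (the `conv=fsync` option) and uses a hypothetical helper name, `durable_copy`. It only asks the kernel to flush; whether the drive then honors the flush is exactly the question the article raises.

```sh
#!/bin/sh
# durable_copy SRC DST (hypothetical helper): copy SRC to DST and ask the
# kernel to push the data to stable storage before returning.
durable_copy() {
    src=$1
    dst=$2
    tmp="$dst.tmp.$$"
    # conv=fsync (GNU dd) makes dd call fsync() on the output file before it exits.
    dd if="$src" of="$tmp" conv=fsync 2>/dev/null || return 1
    # Rename only after the data has been flushed, so DST never names a partial file.
    mv "$tmp" "$dst" || return 1
    # Flush whatever is still dirty, including the rename itself.
    sync
}
```

Usage would be something like `durable_copy journal.new journal`. If the disk acknowledges the flush while the data is still in its volatile cache, none of this helps.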

192 comments

almost clicked the link... (5, Funny)

adturner (6453) | about 2 years ago | (#41265333)

But you lost me the moment you mentioned ATA drives.

Re:almost clicked the link... (1)

Anonymous Coward | about 2 years ago | (#41265469)

it's a bit difficult to parse, but the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.

Re:almost clicked the link... (5, Insightful)

Lunix Nutcase (1092239) | about 2 years ago | (#41265515)

And yet fails to name any. Looking at Seagate's site about NCQ, pretty much every consumer model since 2004 has NCQ. This seems overblown.

Re:almost clicked the link... (3, Interesting)

h4rr4r (612664) | about 2 years ago | (#41265543)

I still bet those drives if you pull power on them will lose the data in their onboard caches.

Which means they are lying about fsync.

Re:almost clicked the link... (2, Insightful)

Lunix Nutcase (1092239) | about 2 years ago | (#41265631)

As weighty of an argument as your bet might seem to you, I'd prefer actual evidence.

Re:almost clicked the link... (1)

h4rr4r (612664) | about 2 years ago | (#41265723)

Try it.

There are some decent tools out there to test it.

Re:almost clicked the link... (2)

MikeBabcock (65886) | about 2 years ago | (#41265791)

If so, the article should link a proper study or basic attempt at surveying drives and how well they survive such behaviour instead of surmising.

Re:almost clicked the link... (2)

Lunix Nutcase (1092239) | about 2 years ago | (#41265885)

Ok. All my drives, which are at least 4-5 years old, support it and they are all the same models that Seagate lists support for. So once again, this sounds like overinflated sensationalism. If it was really such a problem he could have listed a few models to support his claim instead of nebulous handwaving, no?

Re:almost clicked the link... (0)

h4rr4r (612664) | about 2 years ago | (#41265923)

They claim to support it.
Have you tested them?

http://brad.livejournal.com/2116715.html [livejournal.com]

Re:almost clicked the link... (1)

Lunix Nutcase (1092239) | about 2 years ago | (#41265995)

Yeah, it's all a Seagate conspiracy to lie to me. Sorry, but a 7-year-old LJ post hardly has much weight considering NCQ didn't become common in consumer drives until late 2005/early 2006.

Re:almost clicked the link... (5, Informative)

Eponymous Hero (2090636) | about 2 years ago | (#41266803)

you didn't bother to RTFA, good for you. it says quite plainly that (only part of) the problem is not drives that don't support ncq, but those drives that have it and disable it. and that was a relatively small portion of TFA. here's how the disks lie:

File systems need to be aware of the change to the underlying media and ensure that they adapt by always writing in multiples of the larger sector size. Historically, file systems were organized to store files smaller than 512 bytes in a single sector. With the change in disk technology, most file systems have avoided the slowdown of 512-byte writes by making 4,096 bytes the smallest allocation size. Thus, a file smaller than 512 bytes is now placed in a 4,096-byte block. The result of this change is that it takes up to eight times as much space to store a file system with predominantly small files. Since the average file size has been growing over the years, for a typical file system the switch to making 4,096 bytes the minimum allocation size has resulted in a 10- to 15-percent increase in required storage.
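The "up to eight times" figure in the quote above is just rounding each file up to a whole allocation block. A small sketch of that arithmetic follows; the function name `alloc_size` and the 300-byte example file are made up for illustration, and treating an empty file as one block is a simplifying assumption.

```sh
#!/bin/sh
# alloc_size FILE_BYTES BLOCK_BYTES (hypothetical helper): bytes consumed on
# disk when a file is rounded up to whole allocation blocks.
alloc_size() {
    file_bytes=$1
    block_bytes=$2
    # Integer ceiling division.
    blocks=$(( (file_bytes + block_bytes - 1) / block_bytes ))
    # Simplifying assumption: even an empty file occupies one block.
    [ "$blocks" -lt 1 ] && blocks=1
    echo $(( blocks * block_bytes ))
}

# A 300-byte file under the old and the new minimum allocation size.
echo "512-byte blocks:  $(alloc_size 300 512) bytes"
echo "4096-byte blocks: $(alloc_size 300 4096) bytes"
```

The same small file goes from 512 to 4,096 bytes on disk, which is the worst-case factor of eight. Larger files lose proportionally less, which is consistent with the much smaller overall increase the article quotes for a typical file system.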

just to clarify what the author's point was:

The conclusion is that file systems need to be aware of the disk technology on which they are running to ensure that they can reliably deliver the semantics that they have promised. Users need to be aware of the constraints that different disk technology places on file systems and select a technology that will not result in poor performance for the type of file-system workload they will be using. Perhaps going forward they should just eschew those lying disks and switch to using flash-memory technology—unless, of course, the flash storage starts using the same cost-cutting tricks.

if you want to argue that, great, go nuts. nobody who actually RTFA thinks the argument is really about ncq. the ac you responded to said

the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.

well, here's what TFA actually said:

Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.

In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ

i hope it's painfully obvious by now that the point about ncq is not that some drives don't have it; it's that some don't use it -- mostly so you don't go giving their drives bad reviews for being slow but unnoticeably reliable. if it's disabled, you can enable it. what sata drives don't have ncq? i asked wikipedia:

SATA revision 1.0 (SATA 1.5 Gbit/s) .... During the initial period after SATA 1.5 Gbit/s finalization, adapter and drive manufacturers used a "bridge chip" to convert existing PATA designs for use with the SATA interface. Bridged drives have a SATA connector, may include either or both kinds of power connectors, and, in general, perform identically to their PATA equivalents. Most lack support for some SATA-specific features such as NCQ. Native SATA products quickly eclipsed bridged products with the introduction of the second generation of SATA drives.

so yeah, probably not a whole lot of these drives being sold new, but there are lots of shops that buy used gear because it's cheap. these older sata drives haven't all just disappeared when revision 2.0 came out.

Re:almost clicked the link... (1)

FranTaylor (164577) | about 2 years ago | (#41266069)

They SAY they support it

How can you really tell? It's well established that drives lie about their capabilities.

Re:almost clicked the link... (1)

Lunix Nutcase (1092239) | about 2 years ago | (#41266141)

Because you have no evidence showing my drives don't whereas Seagate lying would be fraud? Prove the assertion rather than merely repeating it.

Re:almost clicked the link... (1)

fuzzytv (2108482) | about 2 years ago | (#41266323)

That is not the point - you'll loose data whenever there's a cache without a battery backup involved. The problem is that with some drives (good ones) you'll get at least a consistent filesystem (or easy to fix thanks to the journal), because the operations may be ordered somehow. The bad drives don't respect the ordering, making the corruption much more serious and potentially unfixable.

Re:almost clicked the link... (5, Insightful)

hoggoth (414195) | about 2 years ago | (#41266943)

LOSE LOSE LOSE LOSE! YOU WILL LOSE DATA!

Sorry... I'm usually a calm rational person. I almost never become a grammar-nazi, spelling nazi, or troll. It's just that I see this so often I'm afraid one day Webster will just give up and switch the definitions of Lose and Loose.

lose the kraken! (1)

Anonymous Coward | about 2 years ago | (#41267247)

hey GN, don't loose your cool when you see someone play lose with grammer.

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41267547)

Actually, I think he meant that your data will physically escape the confines of the drive and run rampant around the house. I loosed some data once, and I regret it to this day.

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41265959)

Do you realize who Kirk McKusick is? If he says it is broken, you damn well know it is broken.

Re:almost clicked the link... (2)

Lunix Nutcase (1092239) | about 2 years ago | (#41266031)

Fallacious appeal to authority. I know who he is, yet if it was as common as he claims he could do better than nebulous handwaving.

Re:almost clicked the link... (2)

ak3ldama (554026) | about 2 years ago | (#41267491)

Given that we are talking about Kirk McKusick an appeal to authority is entirely fair. Just because he didn't have a bunch of citations or references listed at the bottom of the article does not mean they do not exist somewhere. For you to say it is a "fallacious" appeal to authority is unfair - it has not been proven as fallacious. (You assert it to be fallacious due to a lack of reference... the culture created by Wikipedia and all the "[Citation Needed]" slackers never fails to impress me.) Surely there exist blacklists in source in Linux/FreeBSD/other publicly viewable code; I also will not hold your hand [slashdot.org] and show you where.

I have personally seen these kinds of issues (with writes not happening soon enough and fsync calls introduced for data integrity) with flash media, which is something mentioned in the beginning of the article. I would like to further comment that the article talked about other things such as sector size side effects and the impact on useful space. ++Great article. Does anyone else remember how he (Kirk McK.) used to sell shirts and pc stickers? I still have the bsd daemon logo sticker on the case of my first pc.

Re:almost clicked the link... (4, Informative)

TheGratefulNet (143330) | about 2 years ago | (#41266285)

yeah, well, I have quite a bit of experience with samsung (not seagate branded but the older samsungs) drives.

they REPORTED having ncq but you always had to disable them.

I got so that I do this at bootup:

if [ -e /sys/block/sda/device/queue_depth ] ; then
      echo " sda NCQ now off"
      # a queue depth of 1 means only one command is outstanding at a time, i.e. no NCQ
      echo 1 > /sys/block/sda/device/queue_depth
fi

and so on.
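The "and so on" for the remaining disks could be written as a loop. This is only a sketch of the same idea, not the poster's actual boot script: the function name `set_queue_depth_1` is hypothetical, and the optional directory argument exists only so it can be tried against a scratch directory instead of the real `/sys/block`.

```sh
#!/bin/sh
# set_queue_depth_1 [SYSFS_BLOCK_DIR] (hypothetical helper): write 1 to the
# queue_depth file of every sd* disk that has a writable one.
set_queue_depth_1() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/queue_depth; do
        # Skip unmatched globs and files we may not write (e.g. not root).
        [ -w "$f" ] || continue
        echo 1 > "$f" || continue
        disk=${f#"$root"/}            # drop the leading directory
        echo "${disk%%/*} NCQ now off"
    done
}
```

On a real system this needs root and would be called with no argument from a boot script. The setting does not persist across reboots, which is why the original runs at bootup.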

performance does not suffer (that I would care about) BUT the data reliability was more than making up for it. no more timeouts, no more syslog 'scaries'.

vendors really do fuck up the protocol implementations. seagate is 'strange' in ways, so is WD, so is hitachi and ibm (I know they are not even in the biz anymore, at least for consumer drives).

windows has a 'blacklist' of what things to not use when talking to drives and so does linux. its a fact of life.

drive vendors are borderline idiots. sad but true ;(

Re:almost clicked the link... (2)

greg1104 (461138) | about 2 years ago | (#41266863)

Intel's early SSDs such as the Intel X25-E were the last time I really got screwed by SATA drives that screwed this up very badly. See the PostgreSQL page on Reliable Writes [postgresql.org] for a lot more details on this subject.

Re:almost clicked the link... (4, Informative)

anomaly256 (1243020) | about 2 years ago | (#41267385)

Green drives from Seagate do not appear to have NCQ. As per below, I have 1 normal and 4 greens in this box:

~$ cat /sys/block/sd?/device/queue_depth
31
1
1
1
1

~$ cat /sys/block/sd?/device/queue_type
simple
none
none
none
none
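To get the same information with the disk name next to each value, something like the following sketch could be used. The function name `ncq_report` and its wording are made up, and the optional argument is only there so it can be pointed at a test directory. Note that a depth of 1 shows what the kernel is currently using, not necessarily what the drive is capable of; a controller running in IDE mode or a kernel blacklist entry could produce the same reading.

```sh
#!/bin/sh
# ncq_report [SYSFS_BLOCK_DIR] (hypothetical helper): print each sd* disk
# together with the queue depth the kernel is currently using for it.
ncq_report() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/queue_depth; do
        [ -r "$f" ] || continue       # glob did not match, or file unreadable
        disk=${f#"$root"/}            # drop the leading directory
        disk=${disk%%/*}              # keep only the disk name, e.g. sda
        depth=$(cat "$f")
        if [ "$depth" -gt 1 ]; then
            echo "$disk queue_depth=$depth (queueing in use)"
        else
            echo "$disk queue_depth=$depth (no queueing)"
        fi
    done
}

ncq_report
```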

Re:almost clicked the link... (1)

anomaly256 (1243020) | about 2 years ago | (#41267401)

Btw, these are new drives, less than a year old. Manufactured November 2011

Re:almost clicked the link... (1)

anomaly256 (1243020) | about 2 years ago | (#41267463)

Further info if you want it:

~$ sudo hdparm -I /dev/sd[abcde] | egrep "(Native|Model)"
Model Number: ST2000DM001-9YN164
* Native Command Queueing (NCQ)
Model Number: ST2000DL003-9VT166
Model Number: ST2000DL003-9VT166
Model Number: ST2000DL003-9VT166
Model Number: ST2000DL003-9VT166

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41267651)

Funny. Have the same disks (four) on my NAS, and don't have the same issue:

PESSOA> hdparm -I /dev/sd[abcde] | egrep "(Native|Model)"
                Model Number: ST2000DL003-9VT166
                      * Native Command Queueing (NCQ)
                Model Number: ST2000DL003-9VT166
                      * Native Command Queueing (NCQ)
                Model Number: ST2000DL003-9VT166
                      * Native Command Queueing (NCQ)
/dev/sde: No such device or address
                Model Number: ST2000DL003-9VT166
                      * Native Command Queueing (NCQ)

Re:almost clicked the link... (1)

LordLimecat (1103839) | about 2 years ago | (#41265527)

Which is basically none of them. I would be astonished if anyone could link me a drive sold on newegg, amazon, or by Dell that does not implement NCQ when set to AHCI mode.

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41266009)

That's the problem. They SAY they support NCQ, when in reality they do it improperly or incompletely.
The industry's history is rich with examples of this sort of practice. The storage/hard disk industry in particular! Are you old enough to remember all of the issues with the first drives that supported the "Ultra DMA" ATA33 standard? There were whole lines of drives that would have inevitable corruption if you turned it on. Similar issues with some controller chips too.

Re:almost clicked the link... (1)

Lunix Nutcase (1092239) | about 2 years ago | (#41266051)

So it's claimed. Provide evidence by listing the models which do so rather than handwaving supposition.

Re:almost clicked the link... (-1)

Anonymous Coward | about 2 years ago | (#41266481)

This is what the fucking article is about you jackass. Jesus titty fucking christ. We're trying to warn you about shady practice in the industry. I could point to numerous places where I've seen "Disable NCQ to fix the issue" is the fix, clearly because of bad NCQ implementations on the part of device manufacturers. You can read about it in Linux kernel patch notes. I'm not going to hold your hand and list devices and firmware revisions because it's pointless. The idea you should come away with is "NCQ can't be trusted on consumer hard drives. It may or may not work properly. Disable by default"

Me: That object is hot, don't touch it.
You: So it's claimed. Provide evidence b - AHH GOD FUCK MY HAND IS ON FIRE

Re:almost clicked the link... (1)

jedidiah (1196) | about 2 years ago | (#41266661)

Read it? Just did. Nothing concrete in there, just vague scare mongering. I am likely to get more useful information from the peanut gallery here. I've already seen one guy with an actual real world example.

Re:almost clicked the link... (1)

AK Marc (707885) | about 2 years ago | (#41266505)

You are making the assertion they are wrong, yet are unwilling to support your position. Why do you only demand proof from the other side? The drive manufacturers have been shown to lie previously. So if you are claiming they aren't lying this time, perhaps the proof should be provided by you.

Re:almost clicked the link... (1)

LordLimecat (1103839) | about 2 years ago | (#41266829)

Except the OS, drive, and driver all claim that NCQ is working on every single SATA disk I have seen that has been set to AHCI. To say "yea but theyre still lying".... why shouldnt we ask for specifics? Are we expected to go out and test every extant drive to see whether it supports NCQ?

If the author had specifics that he tested and found to improperly implement NCQ, perhaps he should have included his data so that it could be verified. All he gave was a general overview of tagged queuing and NCQ, and then declared "but not everyone does it right". Thats so vague as to be worthless.

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41267105)

Kingston 32G SSD causes hardware lockups and timeouts when queue depth exceeds about 4.

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41267049)

Oh CMD646, someday I will cross the rainbow bridge and get those lost 1990s files back.

Re:almost clicked the link... (1)

FranTaylor (164577) | about 2 years ago | (#41266105)

how can you actually tell that it's implemented properly, or at all? What if it's all a big lie, just like other parts of the protocol?

Re:almost clicked the link... (3, Informative)

TheGratefulNet (143330) | about 2 years ago | (#41266355)

you'll see it in syslog!

timeouts, retries, even exiting the bus and doing full bus resets (which are slow and you'll NOT miss them).

as I posted before, older (5yr) samsungs were notorious for SAYING they support ncq but you would be foolish to let it just negotiate it and use it.

this was how things were in the very early days of 10/100 ethernet and full/half duplex. yes, the early models 'negotiated' duplex but many of them got it wrong and you'd have to manually set this on hubs/switches since you knew better than the equipment. there were even early NIC chips that worked better at 10meg ethernet than 100baseT! we would do ftp transfer tests and quite often a GOOD 10baseT was more reliable (over time) than 100baseT. the same happened to gig-e, too, in the early years.

Re:almost clicked the link... (1)

sexconker (1179573) | about 2 years ago | (#41266455)

Which is basically none of them. I would be astonished if anyone could link me a drive sold on newegg, amazon, or by Dell that does not implement NCQ when set to AHCI mode.

Nobody who cares uses AHCI. People who care go into the BIOS/UEFI and set their controller to RAID.
Once that's done, you have no clue what the drive is fucking doing and what the Intel / Nvidia RAID controller ROM is doing, and what the corresponding driver is doing.

NCQ? Caching? Hot-swapping? No one can reliably tell you wtf is going on.
You can yank a drive and then add a drive, but when shit explodes (and it will!) you'll be left wondering who's at fault. The drive? The BIOS/UEFI? The RAID controller? The driver? The OS?

If you trust a hard drive manufacturer on the face of things, I've got a wireless router to sell you. It has all these features for QoS and firewalling and it's secure and it gets 300 mbps and it's got the DLNA,I swear. See? It's on the box, it must be true. And I promise I'll keep updating the firmware and never push out a firmware update that removes features and prevents you from rolling back.

Re:almost clicked the link... (1)

LordLimecat (1103839) | about 2 years ago | (#41266887)

I've got a wireless router to sell you. It has all these features for QoS and firewalling and it's secure and it gets 300 mbps and it's got the DLNA,I swear.

You mean this thing? [newegg.com] (after installing dd-wrt)

On a serious note, exactly how is one supposed to purchase a drive if we cant trust anything on the product page? Just guess?

Re:almost clicked the link... (0)

Anonymous Coward | about 2 years ago | (#41265851)

The article was one T short of an AT-AT.

2 out of 3 (4, Insightful)

ardmhacha (192482) | about 2 years ago | (#41265345)

Cheap, fast and reliable.

Pick any two.

Re:2 out of 3 (1)

h4rr4r (612664) | about 2 years ago | (#41265435)

Only because of market segmentation. They sell the same drives as Enterprise Grade SATA with NCQ turned on in firmware as they do to consumers with it turned off.

Even worse are the RAID controllers(looking at you DELL) that do not disable the cache on the drives when you tell them to disable the write cache. You think your data is safe, then you lose power and what should be an oops has you going to your backups and doing a rebuild or swapping over to a replicated box.

Re:2 out of 3 (1)

LordLimecat (1103839) | about 2 years ago | (#41265597)

They sell the same drives as Enterprise Grade SATA with these NCQ turned on in firmware as they do to consumers with it turned off.

What you get with an "enterprise" sata drive is higher MTBF and a firmware tweaked to work well with RAID (desktop drives try to be more forgiving for IO errors, while the enterprise drives are more quick to decide "ive failed, let the raid controller do its work").

Im not aware of any sata drive that doesnt support NCQ-- its certainly on every desktop drive ive used excepting MAYBE the very first sata drive I bought in 2003. Certainly all SSDs I am aware of (except niche super-low-end ones) and all mass-market desktop drives do.

Re:2 out of 3 (1)

FranTaylor (164577) | about 2 years ago | (#41266283)

Even worse are the RAID controllers(looking at you DELL)

You buy RAID controllers from DELL? You deserve what you get. Buying DELL server gear is like bringing a Schwinn Varsity to the Tour de France.

Re:2 out of 3 (0)

Anonymous Coward | about 2 years ago | (#41267357)

Aha, the corollary to "Nobody ever got fired for buying IBM"...

Re:2 out of 3 (1)

craigminah (1885846) | about 2 years ago | (#41265445)

I hear that in the project management realm...great quote and forces people to think about the interdependence of the three variables.

really? (1)

Anonymous Coward | about 2 years ago | (#41265381)

I haven't seen a drive in at least a couple of years that didn't support NCQ. Is this really an issue? It sounds blown out of proportion.

Re:really? (1)

AK Marc (707885) | about 2 years ago | (#41266557)

You haven't seen a drive that didn't claim NCQ support. But did you test all of them you've seen?

Performance degradation (1)

Jerry Smith (806480) | about 2 years ago | (#41265415)

One can't have one's cake and eat it. Speed or reliability, there should be more differentiation and more clarity in the specs. I want my backup-disk to be very reliable, I want my boot-disk to be fast. Best performance for both, but different circumstances.

Re:Performance degradation (1)

h4rr4r (612664) | about 2 years ago | (#41265523)

In that case boot should just be on an SSD, where these issues pretty much disappear anyway.

Sorry, what? (3, Insightful)

Compaqt (1758360) | about 2 years ago | (#41265449)

We're talking about ATA drives?

As in non-SATA drives?

Who has those anymore?

While the article is good for publication in an academic journal like ACM, it's useless for the real world.

For that, the author should tell us whether most drives on the market have NCQ already or not. Popular drives like WD Green and Seagate's various lines.

Otherwise, saying "$A is useless without $Y" is pointless.

Re:Sorry, what? (1)

spongman (182339) | about 2 years ago | (#41265743)

i'm guessing that since he's talking about 4K sectors, he means SATA since none of the PATA drives were large enough to warrant the switch from 512.

Re:Sorry, what? (2)

MikeBabcock (65886) | about 2 years ago | (#41265837)

An SATA drive is a subset of ATA drives. You're thinking of PATA or IDE drives.

http://en.wikipedia.org/wiki/Serial_ATA [wikipedia.org]

In other words, when someone says "ATA drives" they aren't exclusively talking about non-SATA drives.

Re:Sorry, what? (1)

Lunix Nutcase (1092239) | about 2 years ago | (#41265951)

Wrong. ATA is the original name of what was renamed to PATA once SATA was introduced. So if he is saying what you claim he is using the term incorrectly.

Re:Sorry, what? (1)

AK Marc (707885) | about 2 years ago | (#41266663)

So, if someone is going to New York, and someone corrects them to "New Amsterdam", is the corrector correct in that the area was once called something different, or is "New York" the correct term, as that's the current name and eliminates confusion?

ATA doesn't exist anymore. It's like saying you are going to New York by saying "I'm going to the USA". It might be technically correct, but entirely useless, especially if one is in California telling all his friends he's going to the USA. It's not only technically correct, but confusing, meaningless, and quite useless, just like saying ATA still means PATA. There are two ATAs now, SATA and PATA, and ATA means ATA, which SATA and PATA are both subsets of.

Re:Sorry, what? (1)

Compaqt (1758360) | about 2 years ago | (#41266861)

Regardless, the author's choice of terms plus lack of additional clarification totally muddled what he might have been trying to say.

Also, there's no context for what he's saying ("SATA without NCQ is bad"). It's like saying MySQL without foreign keys is bad, without mentioning the context that MySQL does have foreign keys these days.

Re:Sorry, what? (1)

AK Marc (707885) | about 2 years ago | (#41267551)

Perhaps more applicable if all builds of MySQL claimed to have foreign keys, but only some actually had them.

Re:Sorry, what? (1)

abirdman (557790) | about 2 years ago | (#41266111)

ATA came before SATA. One use I've found for ATA is to increase the number of drives supported on a motherboard. I use one as a boot disk for a FreeNAS box. The drive is basically read-only, so I don't expect write cache issues. ATA drives are very slow and noisy, and the reason that technology is obsolete.

Re:Sorry, what? (1)

TheGratefulNet (143330) | about 2 years ago | (#41266385)

dude, the ata vs sata is ONLY on the controller card!

the drive spindle is the same. its funny to hear someone say that older ide drives are 'noisier'.

you CAN say that older drives are noisier than new ones. and I'd respond with "DUH!"

but scsi, sata, ide, sas, fc: the drives are still the same. controllers are what varies.

Re:Sorry, what? (1)

abirdman (557790) | about 2 years ago | (#41267251)

Dude, thanks for the information. I did not know they were all the same. My noisy drive is a Seagate Bigfoot 20 gig drive that's around 15 years old, 5 1/4 format half-height that weighs five pounds. I can't believe I blamed the noise on the interface.

Re:Sorry, what? (0)

Anonymous Coward | about 2 years ago | (#41266239)

It would never pass review.
All the reviewers would say that ATA is dead.
Hell, the reviewers even reject things about a current technology because it's on the way out (according to them).

Re:Sorry, what? (0)

Anonymous Coward | about 2 years ago | (#41267077)

SATA drives are still ATA drives, and IDE (PATA) drives are equally ATA drives as well.

I think the author means to say, manufacturers LIE (which is in the summary). They claim NCQ but don't actually support it.

ATA drives...? WTF (3)

poet (8021) | about 2 years ago | (#41265463)

We shouldn't even be writing for ATA drives anymore. And any name brand manufacturer that you would trust (on a mediocre level) WD, Seagate etc... all support NCQ.

Re:ATA drives...? WTF (2)

FranTaylor (164577) | about 2 years ago | (#41265947)

Are you saying we should cast the ATA driver out of the kernel and dispose of all our ATA hardware?

Even though it's not in new hardware any more, we still need to support it in existing hardware. The driver still needs work when the kernel APIs change.

I use zfs (1)

the_humeister (922869) | about 2 years ago | (#41265481)

I put my important files (pr0n, etc.) on my zfs mirror file server and scrub each week. The really important stuff (tax returns, etc.) I put in a safe deposit box at the bank.

Re:I use zfs (0)

Anonymous Coward | about 2 years ago | (#41267439)

zfs triple mirror random read speeds are great for spinning disks too, and the writes are about equal to a single disk.
Nothing like getting 500mb/s sequential read off consumer rust with 200% redundancy.

I work in the storage industry. (3, Informative)

Anonymous Coward | about 2 years ago | (#41265483)

Don't assume that "enterprise" disks do this correctly either.

Many have options to make them behave properly but out of the box have write back caches and ignore FUA or similar, leading to the same problems.

Duh (2)

rickb928 (945187) | about 2 years ago | (#41265557)

I never recommended ATA drives for servers. Really old stuff that used MFM and RLL drives was back in the era where there just wasn't anything else. I used ATA drives for my home stuff and lab where it wasn't expected to be very reliable, and SCSI was all I used for a very long time. Even today I recommend against SATA though it seems tolerable, but SCSI drives are still my standard.

Mostly I thought SCSI drives were also made better, but Seagate and WD convinced me otherwise.

And yes, MFM drives in a Novell DCB setup were among my first servers. Making NW 2.15c mount a 4 GB volume just so you can say you did it would not be fun today, but back then it was work, and clients paid for it. I'm glad it wasn't a VINES server.

Re:Duh (0)

Anonymous Coward | about 2 years ago | (#41265749)

Wow you really are a computer super hero.

Re:Duh (1)

rickb928 (945187) | about 2 years ago | (#41266691)

Naw, just old.

Re:Duh (1)

FranTaylor (164577) | about 2 years ago | (#41265987)

Funny, google has tens of thousands of servers and they put cheapo SATA drives in them.

Re:Duh (1)

rickb928 (945187) | about 2 years ago | (#41266771)

And Google relies on multiply redundant servers and data, both for performance and reliability. Not many small businesses are gonna want to put in 5-way clustering.

O_Direct Works Quite Nicely (0)

Anonymous Coward | about 2 years ago | (#41265577)

Opening a file with attribute O_DIRECT seems to work quite nicely in bypassing the pagefile caching system, and getting the data to the disk in a timely fashion.

Re:O_Direct Works Quite Nicely (1)

greg1104 (461138) | about 2 years ago | (#41266797)

Except on the many Linux versions where O_DIRECT doesn't work properly. I have kernels where it works as expected; ones where it quietly fails to sync to disk; and ones where using it causes a PANIC. It's never been a priority for that API to function correctly given that Linus thinks direct IO is totally braindamaged [lkml.org] .

ATA (1)

Anonymous Coward | about 2 years ago | (#41265579)

Systems for which both speed and reliability are important should not use ATA disks.

Ok, I'll keep that in mind next time I buy ATA disks.

Acronyms (1)

puddingebola (2036796) | about 2 years ago | (#41265667)

Implementing the NCQ specification in nonhierarchical file system can easily be accomplished by passing an FMGH array through an EMH converter, while maintaining the NCQ specification via a THGN override. All NCQ specification still conform to the YTUR standard established in 1987 at the CMSD conference in Barcelona. If that helps at all.

Re:Acronyms (0)

Anonymous Coward | about 2 years ago | (#41266207)

You forgot to reroute warp power to the deflector dish!

Re:Acronyms (1)

RoverDaddy (869116) | about 2 years ago | (#41266859)

This always sounds to me like it would be equivalent to saying in this world: 'Reroute 120V from the mains to your TV antenna', and the results would be about as useful as one would expect. Seriously, was the deflector dish -designed- to accept warp power before the insane crew of the Enterprise came up with the idea?

Not about ATA, about enterprise data storage (4, Informative)

MSTCrow5429 (642744) | about 2 years ago | (#41265759)

1) This article isn't about ATA, ignore it.

2) The article's point on NCQ is that many consumer drives do not implement it, so the system has to disable the write cache on the disk or issue cache-flush requests to stay reliable. Those safeguards are often turned off to increase performance, leading to possible file-system failures if there is a power outage.

I think this article is saying that for the enterprise, buy enterprise drives, not consumer drives. Most consumers use laptops now, so power failure doesn't fit in, and consumers prefer speed over reliability, which is why I've always been stuck using laptops lacking ECC RAM.

Re:Not about ATA, about enterprise data storage (1)

danomac (1032160) | about 2 years ago | (#41265921)

When the power goes out, all cards are in the air anyway. We had a UPS boo-boo and our enterprise drives (both SCSI & SAS) managed to corrupt data, even with a battery on the controller itself (battery was in good health.)

Shit happens. It's pretty damn difficult to account for power failures... even with battery backups on the local controllers you can only do so much.

Re:Not about ATA, about enterprise data storage (0)

Anonymous Coward | about 2 years ago | (#41266629)

But if your filesystem is sensible, then so long as it's not being lied to about what's actually made it to disk, your data should not be inconsistent.

Re:Not about ATA, about enterprise data storage (2)

MSTCrow5429 (642744) | about 2 years ago | (#41265979)

In Windows 7's Device Manager, there is a Policies tab, allowing you to "Enable write caching on the device" and additionally to "Turn off Windows write-cache buffer flushing on the device." The former warns "a power outage or equipment failure might result in data loss or corruption." The latter states "do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure." In Windows 7, by default, write-caching is on, and write-cache buffer flush is off. It does note that not all drives allow you to change these settings, possibly indicating that the article's author recommends any modern drive that allows one to manually choose reliability over performance. The major issue with both is that data may still reside in primary memory, not yet written to the drive, when there's a power failure, and your data disappears.

Re:Not about ATA, about enterprise data storage (5, Informative)

ChumpusRex2003 (726306) | about 2 years ago | (#41267207)

The "Turn off Windows write-cache buffer flushing on the device" option activates an ancient windows bug, and should never be used.

When Windows 3.11 was released, MS accidentally introduced a bug, whereby a call to "sync" (or whatever the windows equivalent was called) would usually be silently dropped. At the time, a few programmers noticed that their file I/O appeared to have improved, and attributed this to MS's much marketed new 32-bit I/O layer. What a lot of naive developers didn't notice was that the reason their I/O appeared to be faster was that the OS was handling file streams in an aggressive write-back mode, and then calls to "sync" were being ignored by the OS.

Because of this, there was a profusion of office software, in particular, accounting software, which would "sync" frequently - some packages would call "sync" on every keypress, or every time enter was pressed, or the cursor moved to the next data entry field. As on 3.11, this call was effectively a NOP, a lot of packages made it onto client machines, and because it was fast, no one noticed.

With Win95, MS fixed the bug. Suddenly, corporate offices around the world had their accounting software reduced to glacial speed, and tech support departments at software vendors rapidly went into panic mode. Customers were blaming MS, Win95 was getting slated, lawyers were starting to drool, etc. Developers were calling senators and planning anti-trust actions. The whole thing was getting totally out of hand.

In the end, MS decided the only way to deal with this bad PR was to put an option into windows, where the bug could be reproduced for software which depended upon it. The option to activate the bug was hidden away reasonably well, in order to stop most people from turning it on, and running their file-system in a grossly unstable mode. However, in Win95 - Vista, it had a rather cryptic name "Advanced performance", which meant that a lot of hardware enthusiasts would switch it on, in order to improve performance, without any clear idea of what it did. At least in Win7 it now has a clear name, even though it still doesn't make clear that it should only be used when running defective software.

Re:Not about ATA, about enterprise data storage (1)

FranTaylor (164577) | about 2 years ago | (#41266001)

Or you can buy a real RAID controller with battery backup for the cache, in which case you are just fine with the cheap SATA drives.

Is this a real problem? (0)

Anonymous Coward | about 2 years ago | (#41265899)

The thrust of the article seems to be that desktop-market SATA drives don't support native command queuing and that means filesystems can't guarantee integrity right before a power failure. That sounds a little out-of-date to me, I thought most SATA drives supported NCQ these days. A quick unscientific skim through the top three desktop drive manufacturers suggests this is true:

Seagate website:
"Since late 2004, most new SATA drive families have supported NCQ"

Western Digital's website does not make a similar statement, but at least the "green" and "black" lines of desktop drives appear to support NCQ, which covers most if not all of their popular drives

Hitachi does not make statements on their website but searching product descriptions shows that at least their most popular "deskstar" line supports NCQ

Which would suggest that only a very small population of old or ultracheap hard drives are affected.
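
For checking your own machine rather than vendor websites, here is a rough Python sketch, assuming a Linux box where the kernel exposes each disk's queue depth under /sys/block/<dev>/device/queue_depth. The function name reported_queue_depths is made up for illustration. A depth above 1 suggests the kernel is queueing commands to that device. It only shows what the drive and driver report, not whether the firmware honors it correctly.

```python
import os


def reported_queue_depths(sys_block="/sys/block"):
    """Map block device name -> queue depth reported through sysfs."""
    depths = {}
    if not os.path.isdir(sys_block):
        return depths  # not Linux, or sysfs is not mounted
    for name in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, name, "device", "queue_depth")
        try:
            with open(path) as f:
                depths[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # virtual devices and unreadable entries are skipped
    return depths
```

Calling reported_queue_depths() with no arguments reads the live system; passing another directory is only there to make the function easy to try against a fake tree.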

Re:Is this a real problem? (1)

FranTaylor (164577) | about 2 years ago | (#41266025)

They might SAY they support it, but HOW CAN YOU REALLY TELL?

We all know that hardware LIES all the time about its ACTUAL capabilities, just READ the article!

Re:Is this a real problem? (1)

Lunix Nutcase (1092239) | about 2 years ago | (#41266113)

They do despite the people parroting his words without being able to back up the statements beyond a fallacious appeal to authority.

NCQ - Native Command Queueing (4, Informative)

wonkey_monkey (2592601) | about 2 years ago | (#41266055)

Native Command Queueing [wikipedia.org]

Because not everybody knows everything™

Get Hardware RAID (4, Insightful)

FranTaylor (164577) | about 2 years ago | (#41266179)

The people who make hardware RAID know all about the lying drives, they get good information from the manufacturer on how to make the drives play nice with the RAID controller.

Just read the compatibility charts for your RAID controller, many drives have footnotes with minimum drive firmware requirements and other odd behavior.

Re:Get Hardware RAID (1)

Anonymous Coward | about 2 years ago | (#41266421)

Then test your RAID controller since many of them don't actually check parity on read. They only use the parity bits to reconstruct the data if one of the devices fails. That would be fine if consumer drives always failed catastrophically. Alas, SATA drives can fail in ways that cause corrupted reads.

Test is simple enough: write known data to the RAID, shutdown, remove a disk and use dd to corrupt the data, re-install, power up, read data and check. In many cases the raid will happily return the bogus data.
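
The "write known data, then read and check" halves of that test can be sketched in Python as below. This is an illustrative sketch only: the block size, the pattern scheme, and the function names are assumptions, and the corruption step with dd on the pulled disk still has to be done by hand. Run the check after a reboot or power cycle so the reads come from the array and not from the page cache.

```python
import hashlib
import os

BLOCK = 4096  # assumed block size for the test pattern


def pattern(index):
    """Deterministic block contents derived from the block index."""
    digest = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    return digest * (BLOCK // len(digest))


def write_pattern(path, nblocks):
    """Fill a file on the array with known data and push it to disk."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())


def find_corrupt_blocks(path, nblocks):
    """Return the indexes of blocks that no longer match the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            if f.read(BLOCK) != pattern(i):
                bad.append(i)
    return bad
```

If find_corrupt_blocks comes back empty after you deliberately damaged a member disk, the controller either repaired the data or you never hit the damaged stripe; a non-empty list means it handed back bogus data without complaint.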

Re:Get Hardware RAID (3, Interesting)

randallman (605329) | about 2 years ago | (#41266725)

The only real advantage to "Hardware RAID" is the battery backed cache. Hardware RAID comes with the disadvantage of a whole other operating system "firmware" with its own bugs and often proprietary disk layout. Parity calculations are nothing for current CPUs, so the onboard processor is not so useful. Advanced filesystems such as ZFS or BTRFS need direct access to the disks. I'd like to see drives and/or controllers with battery backed cache. Until then, I rely on my UPS.

The solution is to use logging and hashes (1)

Anonymous Coward | about 2 years ago | (#41266413)

You can avoid the need for NCQ if you use a log-structure and protect references with strong checksums. In that way you will know after a crash if say a child tree node referenced is what the referencing parent thinks it should be, and you can use double-buffering or logging to roll back to a known good state. I believe ZFS does this, as does the experimental Lithium distributed file system developed by VMware. Don't bother with NCQ.
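
A toy version of the log-plus-checksum idea, in Python, to make it concrete. This is a sketch under assumptions, not how ZFS or Lithium actually lay out data: the record format (4-byte length, SHA-256 digest, payload) and the function names are invented for illustration. On recovery the log is replayed until the first record that is truncated or fails its checksum, and everything before that point is treated as the known good state.

```python
import hashlib
import struct

# Hypothetical record header: payload length, then SHA-256 of the payload.
HEADER = struct.Struct("<I32s")


def append_record(log, payload):
    """Append one checksummed record to an in-memory log (a bytearray)."""
    digest = hashlib.sha256(payload).digest()
    log += HEADER.pack(len(payload), digest) + payload


def recover(log):
    """Return the payloads of the valid prefix of the log."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, digest = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # torn write: the record was cut short
        payload = bytes(log[start:end])
        if hashlib.sha256(payload).digest() != digest:
            break  # contents do not match what the writer recorded
        records.append(payload)
        offset = end
    return records
```

The point is that the reader never trusts the disk's claim that a write completed; it trusts only records whose contents match their checksum.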

how about a utility or SMART patch (1)

Anomalyst (742352) | about 2 years ago | (#41266495)

That would test and identify a drive for NCQ and cache disable/enable operation correctness that would report the model/serial and result to a central website

Re:how about a utility or SMART patch (1)

greg1104 (461138) | about 2 years ago | (#41266711)

Whether this sort of thing works correctly can change based on drive firmware. So even a given model/serial number combination can change which type of results it gives over time. There is no substitute for testing yourself.

Linus's Input on Write Cache (3, Interesting)

randallman (605329) | about 2 years ago | (#41266877)

I think this is quite interesting.

http://yarchive.net/comp/linux/drive_caches.html [yarchive.net]

While I've often gotten the impression that the write cache opens up a large "write hole", Linus says that data is cached only for milliseconds, not held in the cache for several seconds. Still, I'd like to see battery backed caches in regular drives and/or controllers.

Would be nice to hear from some drive firmware writers.

Does everyone here think that ATA = PATA? (0)

Anonymous Coward | about 2 years ago | (#41267211)

Are you even real nerds? What's up with you Slashdot??

It's cool that a US Marshall is doing this stuff. (0)

Anonymous Coward | about 2 years ago | (#41267447)

Way to go, Kirk McKusick!

And does this mean I have a shot at being a US Marshall?

The Lies Dicks and Their Drivers Tell (1)

SnarfQuest (469614) | about 2 years ago | (#41267487)

Misread this as "The Lies Dicks and Their Drivers Tell" at first, and was wondering what new crap the DNC was doing.

Read the solution here .. (1)

dgharmon (2564621) | about 2 years ago | (#41267525)

Put some flash ram on the HD with its own on-board battery backup ...

ATA? (1)

erc (38443) | about 2 years ago | (#41267715)

ATA? Does anyone use that anymore? Hasn't the world gone to SATA, FC, or SCSI-? This seems a lot of ado about nothing...