
Jens Axboe On Kernel Development

ScuttleMonkey posted more than 7 years ago | from the nuts-and-bolts dept.

Operating Systems

BlockHead writes "Kerneltrap.org is running an interview with Jens Axboe, 15-year Linux veteran and the maintainer of the Linux kernel block layer, 'the piece of software that sits between the block device drivers (managing your hard drives, cdroms, etc) and the file systems.' The interview examines what's involved in maintaining this complex portion of the Linux kernel, and offers an accessible explanation of how IO schedulers work. Jens details his own CFQ, or Completely Fair Queueing, scheduler, which is the default Linux IO scheduler. Finally, the article examines the current state of Linux kernel development, how it's changed over the years, and what's in store for the future."
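For readers who want to poke at this themselves: in 2.6 kernels the active IO scheduler is exposed per device through sysfs, so it can be inspected and switched at runtime. A minimal sketch in C (the device name sda is an assumption; adjust it for your disk, and note the write requires root):

    /* Query and switch the active IO scheduler for one disk via sysfs.
     * Reading lists the compiled-in schedulers with the active one in
     * brackets, e.g. "noop anticipatory deadline [cfq]"; writing a
     * scheduler's name selects it. */
    #include <stdio.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");
        if (!f) { perror("open scheduler"); return 1; }
        if (fgets(line, sizeof(line), f))
            printf("schedulers: %s", line);
        fclose(f);

        /* Selecting CFQ; requires root. */
        f = fopen("/sys/block/sda/queue/scheduler", "w");
        if (!f) { perror("reopen for write"); return 1; }
        fputs("cfq\n", f);
        fclose(f);
        return 0;
    }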



Khmm... Block devices? How quaint! (2, Informative)

mi (197448) | more than 7 years ago | (#17830042)

FreeBSD dispensed with them altogether years ago...

Character devices only, thank you very much.

*Duck*

No block devices = no disk scheduling? (4, Interesting)

Kadin2048 (468275) | more than 7 years ago | (#17831220)

So how does that work?

At risk of starting a holy war, is there any reason why one approach would be superior? And do they lend themselves to different methods of scheduling? In TFA, Axboe talks about [1] the scheduling mechanism used in later versions of the 2.6 kernel series, which alleviates a problem that I (and most other people, probably) have run into before.

I'm curious because, although I don't use any of the 'real' BSDs very often, I spend most of my time (at home, anyway) using either Mac OS X, which uses the Mach/XNU kernel (derived from 4.3BSD, although I don't know if the I/O scheduler has been rewritten since then), or Linux with the 2.6 kernel, and it seems to me that OS X's disk I/O leaves something to be desired compared to Linux's.

Does BSD handle I/O differently in some fundamental fashion than Linux? It sounds like, by eliminating block devices, they basically remove the kernel from doing any re-ordering or caching of data, which makes things "safer" (in the event of a crash) but seems like it would have big performance penalties when using drives that aren't very smart and don't do a lot of caching and optimization on their own. It seems like getting rid of I/O scheduling altogether is a stiff price to pay for "safety."

[1] (quoting because there don't seem to be anchors in TFA)

Classic work conserving IO schedulers tend to perform really poorly for shared workloads. A good example of that is trying to edit a file while some other process(es) are doing write back of dirty data. ... Even with a fairly small latency of a few seconds between each read, getting at the file you wish to edit can take tens of seconds. On an unloaded system, the same operation would take perhaps 100 milliseconds at most. By allowing a process priority access to the disk for small slices of time, that same operation will often complete in a few hundred milliseconds instead. A different example is having two or more processes reading file data. A work conserving scheduler will seek back and forth between the processes continually, reducing a sequential workload to a completely seek bound workload. ...
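A toy model of that last effect, with invented numbers: two processes each read a sequential file, but the files live in different regions of the disk. Compare total head travel when the scheduler alternates per request (work conserving) versus serving each process for a slice of requests:

    /* Two sequential readers, GAP sectors apart. A per-request
     * interleave seeks across the gap constantly; a time-sliced
     * scheduler pays the gap only when slices switch. All numbers
     * are made up for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    #define REQUESTS 1000     /* requests per process           */
    #define GAP      500000L  /* sectors between the two files  */
    #define SLICE    100      /* requests served per time slice */

    static long total_seek(long slice)
    {
        long head = 0, dist = 0;
        long pos[2]  = { 0, GAP };          /* next sector per process */
        long left[2] = { REQUESTS, REQUESTS };
        int  p = 0;

        while (left[0] > 0 || left[1] > 0) {
            for (long i = 0; i < slice && left[p] > 0; i++, left[p]--) {
                dist += labs(pos[p] - head); /* head movement for this IO  */
                head = pos[p]++;             /* sequential within the file */
            }
            p ^= 1;                          /* switch to the other process */
        }
        return dist;
    }

    int main(void)
    {
        printf("work conserving (slice=1): %ld sectors of seek\n",
               total_seek(1));
        printf("time sliced (slice=%d):   %ld sectors of seek\n",
               SLICE, total_seek(SLICE));
        return 0;
    }

With these numbers the interleaved case crosses the gap on nearly every request, while the time-sliced case crosses it only when slices switch, roughly a hundredfold reduction in head travel.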

Re:No block devices = no disk scheduling? (1)

stsp (979375) | more than 7 years ago | (#17833310)

Does BSD handle I/O differently in some fundamental fashion than Linux? It sounds like, by eliminating block devices, they basically remove the kernel from doing any re-ordering or caching of data, which makes things "safer" (in the event of a crash) but seems like it would have big performance penalties

Good question.

The FreeBSD people claim that no one is using block devices anyway (source [freebsd.org] ):

no serious applications rely on block devices, and in fact, almost all applications which access disks directly take great pains to specify that character (or "raw") devices should always be used. Because the implementation of the aliasing of each disk (partition) to two devices with different semantics significantly complicated the relevant kernel code FreeBSD dropped support for cached disk devices as part of the modernization of the disk I/O infrastructure.

So does Oracle not rely on the block layer they pay Axboe to maintain? Or did FreeBSD's block layer implementation simply suck so badly that no one was using it? I'm using FreeBSD btw, and I don't really notice much of a difference to Linux wrt disk i/o (but I don't run busy databases).

I'd like to get a satisfying answer from a disk i/o guru as well please :-)
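For what it's worth, the "great pains" the FreeBSD document alludes to look roughly like this on Linux, where O_DIRECT is the usual way to bypass the kernel's cache; a sketch, with the device path and the 4 KiB alignment chosen as assumptions:

    /* Cache-bypassing ("raw") disk read via O_DIRECT, the rough Linux
     * analogue of reading a FreeBSD character disk device. O_DIRECT
     * requires the buffer, offset and length to be suitably aligned,
     * hence posix_memalign instead of malloc. Needs read access to
     * the device (usually root). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        if (posix_memalign(&buf, 4096, 4096)) { close(fd); return 1; }

        ssize_t n = read(fd, buf, 4096);   /* one uncached 4 KiB read */
        printf("read %zd bytes, bypassing the page cache\n", n);

        free(buf);
        close(fd);
        return 0;
    }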

Err, no (2, Informative)

Fweeky (41046) | more than 7 years ago | (#17833536)

"It sounds like, by eliminating block devices, that they basically remove the kernel from doing any re-ordering or caching of data, which makes things "safer""

No; FreeBSD's shifted the buffer cache away from individual devices and into the filesystem/VM, where it caches vnodes rather than raw data blocks. The IO queue (below all this block/character/GEOM stuff) is scheduled using a standard elevator algorithm [wikipedia.org] called C-LOOK. It's showing its age in places, and there's been some effort towards replacing/improving it, making it pluggable etc (e.g. Hybrid [freebsd.org]); sadly it's a tricky problem to solve properly. See this recent thread [freebsd.org].
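C-LOOK itself is simple enough to sketch: sort the pending requests, service everything at or above the current head position in ascending order, then wrap to the lowest pending request and finish the sweep. A toy version, with made-up sector numbers:

    /* Toy C-LOOK elevator: one upward sweep from the head position,
     * wrapping once to the lowest pending request. Not FreeBSD's
     * actual code, just the algorithm's shape. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    static void c_look(long head, long *req, size_t n)
    {
        qsort(req, n, sizeof(*req), cmp);

        size_t start = 0;
        while (start < n && req[start] < head)  /* first request ahead of head */
            start++;

        for (size_t i = 0; i < n; i++)          /* sweep up, wrap once */
            printf("service sector %ld\n", req[(start + i) % n]);
    }

    int main(void)
    {
        long pending[] = { 95, 180, 34, 119, 11, 123, 62, 64 };
        c_look(50, pending, sizeof(pending) / sizeof(*pending));
        return 0;
    }

With the head at sector 50 this services 62, 64, 95, 119, 123 and 180, then jumps back for 11 and 34.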

Re:No block devices = no disk scheduling? (2, Interesting)

jd (1658) | more than 7 years ago | (#17839910)

Block devices lend themselves nicely to offload engines, as you can RDMA the processed data into a thin driver that basically just offers the data to the userspace application in the expected format but does little or no actual work. You can even do direct data placement into the application and just use the kernel as a notification system. So, the smarter the hardware, the more you can get from being able to handle large chunks of data or large numbers of commands in a single shot. Arguably, you can still do some of this with a character device - you can RDMA into the kernel, but direct data placement would be a headache and I can't see you getting much from either offloading or kernel bypass.

However, that is actually one of the benefits of character devices. They're lightweight on the hardware and the software, making "routine" activity extremely fast and efficient, and making it easier to be sure everything is correct and robust. For most "normal" activity, you're not wanting to do anything particularly complex. Word processors, by and large, are not based on scatter/gather algorithms, and it is rare to find non-sequential MP3s. Also bear in mind that most CPUs outpace memory tens, if not hundreds, of times over - they are certainly going to outpace any peripherals a person might have. Why accelerate the kernel, if the kernel isn't the bottleneck? That just risks introducing bugs with no obvious gain.

Myself, I believe that it's stupid to design limitations into one component because of limitations in another. The limitations in the other component will be subject to change, but the designed limitations will hang around for much longer. I also think it's stupid to look at current typical use. Current typical use is dictated by what is currently practical. If you change what is practical, you will change what is typical use. The OS and the users are not independent of one another. What people wanted is unimportant, it's what people want to want that should dictate what OS writers should want to offer. And, yes, I believe that direct data placement has the potential to eliminate the need for both binary-only drivers and heavy-weight kernels.

(Linux contains a huge number of very low-level drivers, and is limited in what it can absorb in the way of new high-level functionality because of the risk of breakage and the difficulty of maintaining such a gigantic tree. If those had all been intelligent peripherals, the same amount of effort and coding would have produced a kernel with staggering capabilities and electronic superpowers. The drivers can't go away, even if intelligent devices replace the dumb ones of today, because people will use legacy stuff. Actually, it's worse. As Microsoft showed with Winmodems and Winprinters, it's possible to sell people dumber-than-dumb devices and even heavier-weight software that does a worse job, slower.)

sounds hard (1)

192939495969798999 (58312) | more than 7 years ago | (#17830080)

"the piece of software that sits between the block device drivers (managing your hard drives, cdroms, etc) and the file systems.'"

That sounds REALLY hard. I'd be more interested if there's a development strategy he could recommend re: complex development projects.

Scared me... (2, Funny)

creimer (824291) | more than 7 years ago | (#17830128)

I thought the title was: Ewe Boll On Kernel Development...

Disagree with Mr. Axboe... (5, Interesting)

isaac (2852) | more than 7 years ago | (#17830154)

JA: In your opinion, with the increased rate of development happening on the 2.6 kernel, has it remained stable and reliable?

Jens Axboe: I think so. With the new development model, we have essentially pushed a good part of the serious stabilization work to the distros.

I respectfully disagree that the new development model works well from an end-user's perspective (an "end user" of many thousands of Linux hosts, not a toy desktop environment). Minor point releases now contain major changes in e.g. schedulers. This makes for a lot of work for real Linux users, backporting the useful bugfixes while retaining older algorithms for which workloads are optimized. Result: a severely splintered kernel and a lot more work for us.

If core changes of such magnitude are no longer sufficient to merit a dev branch or even a major point release, why bother with the "2.6" designation at all? Just pull a Solaris and call the next release "Linux 20" or "Linux XX."

-Isaac

Re:Disagree with Mr. Axboe... (1)

archen (447353) | more than 7 years ago | (#17830720)

This is one thing I really like about FreeBSD: they aren't afraid of versions. You have a development branch and a production branch. Changes are typically moderate until a major revision, e.g. 5.x to 6.x. It's also nice that you typically have stability within a version, and often a backwards compatibility layer. For instance, nVidia drivers work in FreeBSD 5.x, and all that's needed for 6.x is to compile an option into the kernel (there by default).

Many of us, myself included, are getting tired of having a buggy kernel, since we can remember the stability of the 2.4 days. Linus needs to get over his fear of versions and step up to the plate. If I were Linus I'd throw away the old even-odd stable versioning and increase the version by .1 every year on a schedule.

Re:Disagree with Mr. Axboe... (5, Insightful)

Kjella (173770) | more than 7 years ago | (#17830758)

Well, on the other side distros were backporting *huge* amounts of patches from 2.5 to 2.4, so while plain vanilla 2.4 was stable, almost no one was running it. The 2.6 releases mean the distros are shipping "stabilized unstables" instead of "destabilized stables"; I guess that works out better for some and worse for others. Are the RHEL, SLES, or Debian stable kernels not good enough to start from, if stability is what you need? There are quite a few things I find great that arrive in a timely fashion, not at the release of 2.8 in a few years. I think most people who use a distro's kernel feel that way.

If you're the kind of kernel hacker who likes to get yours directly from kernel.org, then yes, it sucks. But IMO the kernel has grown too big for just the core devs; think of it as an "extended" kernel team including the distros, where kernel.org releases are "internal betas". I think if you cut it back and expect just kernel.org to deliver stable kernels with the resources they have (which admittedly, they used to), then kernel development will slow way down.

Re:Disagree with Mr. Axboe... (1)

isaac (2852) | more than 7 years ago | (#17830996)

But IMO the kernel has grown too big for just the core devs; think of it as an "extended" kernel team including the distros, where kernel.org releases are "internal betas". I think if you cut it back and expect just kernel.org to deliver stable kernels with the resources they have (which admittedly, they used to), then kernel development will slow way down.

I live with the fragmentation and vendor lock-in that comes with distro-engineered kernels because I have to, but I don't like it. I'm just saying that from my perspective, I greatly preferred having a stable kernel tree that was not distro-specific.

Vendor lock-in like that makes me queasy in an open-source world. I guess I'm just nostalgic.

-Isaac

Re:Disagree with Mr. Axboe... (1)

dhasenan (758719) | more than 7 years ago | (#17831546)

Then choose a kernel--a stock kernel would do best for you, most likely--and stick with it. If you really need something from a newer kernel, you can do the work for backporting, pay someone else to do it, or live without (you were before, no?).

Re:Disagree with Mr. Axboe... (1)

gmack (197796) | more than 7 years ago | (#17831664)

This actually reduces fragmentation, since only bug fixes get backported. I don't know why he didn't mention it, but the older branches are still maintained. If you want bug fixes then get 2.6.18.5 or something, and only move between versions if you want new features. The distros are sending their fixes upstream.

that's bullshit (0)

Anonymous Coward | more than 7 years ago | (#17831740)

But IMO the kernel has grown too big for just the core devs

Those guys, including Linus, are just fucking around with shit as a live experiment with users. They've already complained since 2.3 that people don't test the dev kernels enough. Not even bothering to attempt to prove if it works, they release the van Riel VM and it totally screws users for about six point releases before they remove it. We also get Linus saying he'll screw with the ABI intentionally to mess up binary-only modules and no real direction in terms of overall architecture stabilization.

I disagree. It's not too big. They just can't be bothered to manage it properly.

Re:Disagree with Mr. Axboe... (4, Insightful)

ComputerSlicer23 (516509) | more than 7 years ago | (#17830766)

Don't take this the wrong way, but your complaint sounds a lot like the story about a patient and a doctor:

"Doctor, when I do this, it hurts", and the doctor replies, "Well don't do that".

I mean, if you are following bleeding-edge kernels and complaining that they aren't as stable as you'd like, why not just follow a vendor's kernel? If you use or install "many thousands", you are either maintaining your own de-facto distribution or you are using someone else's distribution. Vendors do exactly the work you want done, on your behalf.

I patiently wait for my vendor kernel, which might be 10 point releases behind, to integrate bug fixes, and then upgrade in a year or two to a much newer point release (I think RedHat has used 2.6.9 and/or 2.6.13 in recent memory)... Incrementing a different number wouldn't really make any difference anyway. At that point it's all semantics; if you know the rules of the game, it's not hard to tell what's dangerous as an upgrade and what's not.

It's not like 2.4.13 (or whichever one in the 2.4 series it was that introduced serious disk corruption) was safe merely because it was a point release... Kernels are safe because somebody took one out back and beat on it for a while and it didn't cause any problems. If you upgrade without proper testing and it breaks, you get to keep the pieces.

Kirby

That /also/ goes for companies! (0)

Anonymous Coward | more than 7 years ago | (#17832938)

I mean, if you are following bleeding edge kernels, and complaining that they aren't as stable as you'd like. Why not just follow a vendors kernel?

And people still wonder why major vendors are slowly but steadily dropping their consumer distributions... RedHat dropped its stuff and moved it into Fedora, SuSE dropped theirs and moved it into 'OpenSuSE'. Why?

My theory is simple: the model doesn't just plain suck from an end-user's point of view, it also puts a massive overhead on companies who try to actually run their business on Linux. It keeps amazing me, time and time again, how readily people jump to conclusions like "so run a vendor kernel" without seemingly even the will to realize that the burden on the end user also applies to the vendor. The big difference being that while it may simply annoy the end user, it can really hurt the company.

Linux had better get its act together, otherwise I foresee it sinking right back to the depths it came from, fully overrun by massive discussions in a desperate attempt to make it more commonly appealing again. Because if I compare Linux and Solaris at this point, I see a lot more stable software being released by Sun. It's free, it's more reliable, and if I wish to fork their software to run something of my own it doesn't put the same massive overhead on my shoulders. Granted, don't get me wrong here, naturally it's not perfect. Even Sun has its flaws, and right now one of the most-heard complaints is that they promise and only deliver at a much later moment. That protects stability and reliability, but it doesn't really help the people who were actually interested in those features.

And then there's of course also the BSD tree. I can't fully comment on these since my experience is very limited. But here too you see some "cutting edge" environments (if you want them to be) like FreeBSD, but also environments which take it much slower and rely on robustness and stability (OpenBSD, for example).

So, that's a long way of basically saying that I don't think your comment is fair. Please try to think beyond your own little space. What applies to the end user applies to the vendor. And the latter is a very important factor when it comes to boosting Linux.

Re:That /also/ goes for companies! (1)

ComputerSlicer23 (516509) | more than 7 years ago | (#17834928)

Last time I checked, there are approximately 5-10 major distros. The vendors have to do their due diligence on any kernel. It's not like RedHat picks a kernel and sits on it for several years. Cherry-picking fixes isn't tons of fun, but it's not impossible, and if the bugs don't affect anyone, why exactly do they need to be fixed again? It's not like those people don't communicate, or that you can't work with them if you choose to. The SuSE and RedHat guys talk. SuSE and Gentoo talk. Having everyone attempt to stabilize their trees independently is insane... that's why I'm saying everyone attempting to do their own is crazy. But attempting to make one group of folks stabilize one tree for the whole world is at least as insane.

Pretty much everybody else bases their kernel sources off of one of those 10-20 trees (counting major kernel developers that maintain reasonably stable trees). There are no free lunches: RedHat and SUSE get free engineering and major movement forward, and to acquire that they take on QA and stabilization work. More importantly, they get to pick when they suffer their pain. Attempting to state that the developers have to absorb all that and never allow anything that isn't stable and tested out the door is insane; we just left that madness. It wasn't stable then anyway, and pretending that you can merely engineer your way towards perfect software that is moving as fast as Linux currently does is ignoring reality. They get to pick what they feel is the most stable base to move to. It's an engineering decision. The forward advancements that Linux is making make moving forward worth the effort and expense of stabilizing a version every 18 months or so.

In terms of Solaris or OpenBSD, vote with your feet. Move towards them and let the people who have figured out a way to deal with the pains you describe do it. Linux works, because it works... That sounds like a tautology, but it's true. What they are doing works for them. By nearly all accounts, everyone is much happier with the current model than the disaster that was the 2.2 to 2.4 transition. It took forever, and when you finally got it, it wasn't stable until the vendors did their vendor thing to it. I ran the gamut of kernels between 1.3 and 2.4... 2.6 is a much nicer program to be on...

The 2.6 kernel is moving forward and getting major developments out into the hands of as broad a set of people as possible. It's a thing of beauty.

Kirby

Re:Disagree with Mr. Axboe... (2, Insightful)

diegocgteleline.es (653730) | more than 7 years ago | (#17832002)

The kernel development model is optimized to make distros happy, not end users. Just like Gnome/KDE, BTW. This is because, well, in the Real World most desktops/servers use (or should use) the kernel shipped by their distro. And because distros are who employ most of the kernel hackers.

In other words, the previous development model made, say, 1% of people happy (you) and 99% unhappy (distros, and hence people using distros). The current model makes 99% of people happy (distros) and 1% unhappy.

IMO it was a good change. And if you don't like it, just use OpenSolaris. There's nothing wrong with it.

Re:Disagree with Mr. Axboe... (1)

noz (253073) | more than 7 years ago | (#17835230)

Minor point releases now contain major changes in e.g. schedulers.

As much as this is now commonplace, I believe the virtual memory management subsystem was entirely replaced half-way through the 2.4 series. This management style has always been a concern for production users.

Where are they now? (3, Interesting)

LaminatorX (410794) | more than 7 years ago | (#17830216)

I did a double take when I saw this, as Jens was an exchange student at my high-school way back when. Small internet.

Exhilarating! (1)

Annoymous Cowherd (1036734) | more than 7 years ago | (#17830218)

An excellent read!

There's something exciting about delving into the low-level logic that gives you the feeling that there's always something more to learn!

I guess always being two steps behind is the motivation that makes it all worthwhile.

Wow ... (2, Funny)

ravee (201020) | more than 7 years ago | (#17830288)

15 year Linux veteran and the maintainer of the linux kernel block layer,...

In the interview he says he is now 30 years old. Wow, that means he started working on Linux at the age of 15 -- a real prodigy. A very interesting interview.

Btw, it is nice that kerneltrap.org has finally had a makeover. The earlier website design looked rather drab.

Re:Wow ... (3, Informative)

Error27 (100234) | more than 7 years ago | (#17833580)

Marcelo was only 18 when he took over the 2.4 branch. He was working for Conectiva at age 13 or 14... Debian has had a bunch of really young package maintainers for critical packages.

What about the process' priority? (4, Insightful)

mi (197448) | more than 7 years ago | (#17830372)

CFQ now uses a time slice concept for disk sharing, similar to what the process scheduler does. Classic work conserving IO schedulers tend to perform really poorly for shared workloads.

I wonder if the originating process's priority is taken into account at all... It has always annoyed me that "nice" (and especially idle-only) processes are still treated equally when it comes to I/O...

Re:What about the process' priority? (1)

undertow3886 (605537) | more than 7 years ago | (#17830516)

The article mentions an "ionice".

Re:What about the process' priority? (2, Insightful)

mi (197448) | more than 7 years ago | (#17830574)

The article mentions an "ionice".

Indeed, it does — but should not the I/O-niceness be automatically derived from the process' niceness?

Re:What about the process' priority? (1)

MartinG (52587) | more than 7 years ago | (#17831254)

Maybe there is a case for a userland tool that sets both at once combining the nice and ionice commands into one, but they certainly should not be tied together in the kernel. The kernel is there to provide mechanisms for setting these things, not for deciding what should be linked to what.

Re:What about the process' priority? (1)

mi (197448) | more than 7 years ago | (#17831326)

nice(1) should be doing that (with the help of the kernel-provided mechanisms) then, in my not so humble opinion. Some kind of ionice can be used for finer tuning, but by default a nicer process should be nicer on everything — IO included.

Re:What about the process' priority? (1)

MartinG (52587) | more than 7 years ago | (#17832036)

I would probably agree with you there. In fact, one command could easily handle all of this, doing what you suggest by default and having additional arguments for selecting different CPU and I/O nice values.

I suspect nice(1) was not changed for backwards compatibility reasons. There would perhaps be corner cases where a process expected its fair share of I/O time but didn't need much CPU (e.g., tar zcf scripts for backups?) and would suffer too much, or not complete, if it were suddenly I/O-starved.
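Such a combined tool is easy to sketch on Linux, assuming the ioprio_set syscall (merged in 2.6.13; it had no glibc wrapper, hence the raw syscall(2) invocation). The nice-to-ioprio mapping below mirrors the kernel's own default of (nice + 20) / 5:

    /* Hypothetical "nicer" wrapper: set CPU niceness, derive a
     * best-effort IO priority from it, then exec the command.
     * Constants match linux/ioprio.h. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define IOPRIO_CLASS_BE     2   /* best-effort scheduling class */
    #define IOPRIO_WHO_PROCESS  1
    #define IOPRIO_CLASS_SHIFT  13

    int main(int argc, char **argv)
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <nice> command [args...]\n", argv[0]);
            return 1;
        }
        int nice_val = atoi(argv[1]);

        if (setpriority(PRIO_PROCESS, 0, nice_val) < 0)
            perror("setpriority");

        /* Map nice -20..19 onto the best-effort IO levels 0..7. */
        int io_level = (nice_val + 20) / 5;
        long ioprio = (IOPRIO_CLASS_BE << IOPRIO_CLASS_SHIFT) | io_level;

        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, ioprio) < 0)
            perror("ioprio_set");

        execvp(argv[2], argv + 2);   /* replace ourselves with the command */
        perror("execvp");
        return 1;
    }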

Re:What about the process' priority? (0)

Anonymous Coward | more than 7 years ago | (#17832946)

If a process were I/O-bound, you would never nice it in the first place because it wouldn't be chewing up CPU. I'd have to work pretty hard to contrive a situation where I would want a process to use up only idle CPU but be able to thrash the disk as much as it wants.

dom

Re:What about the process' priority? (1)

diegocgteleline.es (653730) | more than 7 years ago | (#17832060)

You can make it behave that way if you want, but nobody forces you to.

Yes, CPU priority is taken into account (1)

Sits (117492) | more than 7 years ago | (#17835100)

I wonder if the originating process's priority is taken into account at all... It has always annoyed me that "nice" (and especially idle-only) processes are still treated equally when it comes to I/O...

Are you sure they are? See the ionice man page [die.net] here:

Best effort. This is the default scheduling class for any process that hasn't asked for a specific io priority. Programs inherit the CPU nice setting for io priorities.

CFQ not the default scheduler? (4, Informative)

rehabdoll (221029) | more than 7 years ago | (#17830470)

Anticipatory is, according to my menuconfig:

The anticipatory I/O scheduler is the default disk scheduler. It is
generally a good choice for most environments, but is quite large and
complex when compared to the deadline I/O scheduler, it can also be
slower in some cases especially some database loads.

Anticipatory is also preselected with a fresh .config

Re:CFQ not the default scheduler? (2, Informative)

darkwhite (139802) | more than 7 years ago | (#17830600)

CFQ was committed relatively recently and there was discussion for a while as to whether and when to make it default. I think 2.6.19 uses Anticipatory by default, but 2.6.20 will use CFQ by default (not 100% sure though).

Re:CFQ not the default scheduler? (4, Informative)

zdzichu (100333) | more than 7 years ago | (#17830732)

CFQ has been the default since 2.6.18 [kernelnewbies.org], released back in September 2006.

Re:CFQ not the default scheduler? (1)

rehabdoll (221029) | more than 7 years ago | (#17830902)

well, not in my 2.6.20-rc7 default config.

Scheduling better than no scheduling? (4, Interesting)

Kadin2048 (468275) | more than 7 years ago | (#17832658)

Are there any hard metrics on what the performance advantages are of various schedulers, under typical load conditions?

Reading TFA piqued my interest in I/O scheduling and I've been doing some reading on it, and it seems like there are several competing schools of thought, of which Axboe (and potentially the Linux kernel developers generally) represents only one.

An alternative view, such as this from Justin Walker (a Darwin developer) on the darwin-kernel mailing list [apple.com] , holds that it's not worthwhile for the OS kernel to do much disk scheduling, since "the OS does not have a good idea of the actual disk geometry and other performance characteristics, and so we [kernel developers] leave that level of scheduling up to the controllers in the disk drive itself. I think, for example, that recent IBM drives have some variant of OS/2 running in the controller. Since the OS knows nothing about heads, tracks, cylinders for modern commodity disks, it's futile to try to schedule I/O for them." (written Mar 2003)

Axboe seems to acknowledge that this may sometimes be the case, because they do have the 'non-scheduling scheduler,' which he recommends only for use with very intelligent hardware. However, it seems like some people think that commodity drives are already 'smart enough' to do their own scheduling.

It seems like determining which approach is superior would be relatively straightforward, and yet I've never seen it done (although maybe I'm just not looking in the right places). Anecdotally, I'm tempted to agree with Axboe, since it seems like, when several processes are all thrashing the disk simultaneously, my Linux machine feels faster than my OS X one. But this is by no means scientific (they don't have the same drives in them, they aren't working with the same datasets, etc.).

On what drives, and under what conditions, is it advantageous to have the OS kernel perform scheduling, and on which ones is it best just to pass stuff to the drive and let the controller do all the thinking?

Re:Scheduling better than no scheduling? (1)

Sits (117492) | more than 7 years ago | (#17835402)

The sort of "scheduling" you are talking about sounds like block reordering. This is where you try to group together requests for blocks that you guess are in a similar part of the disk, in the hope of speeding things up. It's absolutely true that today's disks bear less and less resemblance to the cylinders, sectors and heads of old disks, and most disks have their own cache which can do reordering (not to mention the silent remapping that modern disks do when sectors go bad). Unless the disk cache queue is very deep, though, I suspect there is some advantage to doing I/O scheduling, because you might be able to wait longer before making a decision as to what to read, thus avoiding the worst-case scenario where you seek between disparate files continuously.

This sort of I/O scheduling also goes further. For example, with cfq you can arrange for a daily file scan to have its I/O queued up behind anything else, so your web browser is not made to stall (so long) when writing to disk. When disks become capable of this type of reordering (I believe MS are pushing for this), then again there will be less need for the OS to do it.

Re:Scheduling better than no scheduling? (3, Informative)

axboe (76190) | more than 7 years ago | (#17840118)

It depends on what you need to schedule. If your drive does queuing and only one process's IO is active, then the OS can do very little to help. The OS usually has a larger depth of IOs to work with, so it's still often beneficial to do some sorting at that level as well.

IO scheduling is a lot more than that, however. If you have several active processes issuing IO, the IO scheduler can make a large difference to throughput. I actually just did a talk at LCA 2007 with some results on this; you can download the slides here:

LCA2007 CFQ talk [kernel.dk]

Re:Scheduling better than no scheduling? (1)

Slashcrap (869349) | more than 7 years ago | (#17840970)

An alternative view, such as this from Justin Walker (a Darwin developer) on the darwin-kernel mailing list [apple.com], holds that it's not worthwhile for the OS kernel to do much disk scheduling, since "the OS does not have a good idea of the actual disk geometry and other performance characteristics, and so we [kernel developers] leave that level of scheduling up to the controllers in the disk drive itself. I think, for example, that recent IBM drives have some variant of OS/2 running in the controller. Since the OS knows nothing about heads, tracks, cylinders for modern commodity disks, it's futile to try to schedule I/O for them." (written Mar 2003)

You. Have. Got. To. Be. Shitting. Me.

Hard drives running OS2 on their controllers? Why would anyone do that? What do you even gain from having a whole OS take the place of embedded firmware?

I think he's been smoking crack.

share the code (1)

kokoko1 (833247) | more than 7 years ago | (#17830748)

JA: Is there anything else you'd like to add?

Jens Axboe: Share the code! :)

Hehe. (1)

Zaurus (674150) | more than 7 years ago | (#17830824)

Am I the only one that misread that as "an interview with Jens Axboe, 15 year old Linux veteran" ?

Re:Hehe. (1)

stu42j (304634) | more than 7 years ago | (#17831022)

No.

Re:Hehe. (1)

diegocgteleline.es (653730) | more than 7 years ago | (#17832116)

Apparently yes. Man, you should buy new glasses.

High disk usage (1)

twistedcubic (577194) | more than 7 years ago | (#17830914)

Is this the part of the kernel that's responsible for making systems really slow during extended disk writes, while the CPU utilization is minimal?

Sort of :) (1)

Kadin2048 (468275) | more than 7 years ago | (#17831332)

I think it would be more correct to say:
[His] is the part of the kernel that's responsible for making systems slightly less slow during extended disk writes, while the CPU utilization is minimal.

And even that's not quite true; where the scheduler really comes into play is when you have two or more processes trying to access the disk at the same time. During an extended, sustained read or write, the scheduler probably just needs to stay the hell out of the way and pass data as fast as it can.

You could also say that, as a secondary priority, he's responsible for keeping CPU utilization minimal during those disk writes...

Re:Sort of :) (0)

Anonymous Coward | more than 7 years ago | (#17836682)

It's also the part that consumes 100% of a CPU when the bandwidth on a disk goes over 500 MB/sec...

Plugging explained (0)

Anonymous Coward | more than 7 years ago | (#17831148)

Plugging itself is a mechanism to slightly defer starting an IO stream until we have some work for the device to do. You can compare it to inserting the plug in the bathtub before you fill it with water, the water will not flow out of the tub until you remove the plug again.
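In code, the bathtub analogy comes down to buffering submissions behind a flag and flushing either when enough work has piled up or when someone explicitly pulls the plug. A toy illustration (not the kernel's actual implementation; the threshold and sector numbers are invented):

    /* Requests queue up behind a "plug" so the device sees a batch
     * instead of a dribble; the plug is pulled explicitly (e.g. when
     * someone waits on the IO) or automatically once enough work is
     * queued. */
    #include <stdio.h>

    #define UNPLUG_DEPTH 4

    static long queue[64];
    static int depth;
    static int plugged = 1;

    static void unplug(void)
    {
        plugged = 0;
        for (int i = 0; i < depth; i++)
            printf("  dispatch sector %ld\n", queue[i]);
        depth = 0;
    }

    static void submit(long sector)
    {
        queue[depth++] = sector;
        printf("queue sector %ld (depth %d)\n", sector, depth);
        if (plugged && depth >= UNPLUG_DEPTH) {
            puts("auto-unplug: enough work batched");
            unplug();
        }
    }

    int main(void)
    {
        submit(10); submit(11); submit(12); submit(13); /* hits threshold */
        plugged = 1;                  /* re-plug for the next stream */
        submit(99);
        puts("explicit unplug: a reader is waiting");
        unplug();
        return 0;
    }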

Missing Question: How do you pronounce your name? (2, Interesting)

chuck (477) | more than 7 years ago | (#17831396)

As a native English speaker, comfortable with Spanish and aware of the basics of French (so I'm not entirely uneducated), I am entirely unequipped to reason the pronunciation of "Jens Axboe." Can someone help me out?

Re:Missing Question: How do you pronounce your nam (3, Interesting)

LaminatorX (410794) | more than 7 years ago | (#17831704)

Back in school we pronounced it with a "y" sound for the "j": "Yens", rhymed with "mens." Now, as to whether that was actually the correct pronunciation or merely something close enough that he didn't bother correcting us, I couldn't say.

Re:Missing Question: How do you pronounce your nam (1)

value_added (719364) | more than 7 years ago | (#17832532)

Now, as to whether that was actually the correct pronunciation or merely something close enough ...

Close enough. ;-)

Re:Missing Question: How do you pronounce your nam (3, Informative)

axboe (76190) | more than 7 years ago | (#17840082)

Hi John!

That is correct, like a "y", rhymes with "mens". I saw another question on the last name; I typically tell foreigners that it is pronounced ax-bow. Europeans often think the 'oe' is like the Danish "ø", however that is not the case.

Re:Missing Question: How do you pronounce your nam (1)

LaminatorX (410794) | more than 7 years ago | (#17857488)

Hey there, good to see you still exist! So, is kernel.dk always STFU or is that just up for /.ing?

Re:Missing Question: How do you pronounce your nam (1)

krestenk (919883) | more than 7 years ago | (#17832858)

I am not proficient in sound-writing (or whatever it is called), but since Jens is a Dane like myself, I'll give it a shot.
Yens Aksbo
Where Yens is pronounced with the stress on the e:
Yêns
And boe is pronounced without the e, with the stress on the a:
âksbo
Hope this helps.

Re:Missing Question: How do you pronounce your nam (2, Informative)

Ysangkok (913107) | more than 7 years ago | (#17833290)

Well, he's a Dane. I'm a Dane too so I'll tell you how I would pronounce it:

Jens is NOT pronounced "Djens". "J" is pronounced as a palatal approximant [wikipedia.org] in Danish - just like "y" in English. Yens is somewhat more correct, but the "e" has to be pronounced like the IPA [æ]. Danish is not logical at all. If it were, "Jens" would be spelled with an "æ". Take a look at Jens [wikipedia.org].

IPA: [jæns]

Axboe is more complicated:

  • A is pronounced flat. (like when you say "aah" at the dentist. Just like "spa". Take a look at Open back unrounded vowel [wikipedia.org] )
  • X is pronounced "ks". When the word is pronounced quickly it may sound like "gs".
  • B is b.
  • "Oe" is usually pronounced like the Danish "ø". See Close-mid front rounded vowel [wikipedia.org]

IPA transcription of Axboe would be something like: Open back unrounded vowel + [ksbø]

(I can't get the IPA sign for "Open back unrounded vowel" to display in Slash)

Re:Missing Question: How do you pronounce your nam (1)

chuck (477) | more than 7 years ago | (#17833548)

That's awesome! Thanks.

Re:Missing Question: How do you pronounce your nam (1)

Ysangkok (913107) | more than 7 years ago | (#17834076)

I made a mistake. In this case (Axboe) "oe" is just pronounced "o". This is the Close-mid back rounded vowel [wikipedia.org] IPA: [o].

This is what Slashdot is about (3, Interesting)

bcmm (768152) | more than 7 years ago | (#17832278)

Thank you very much. Much of this article is informative, technical and really, really nerdy. I for one sit through dupes and rubbish like today's meaningless benchmarking of differing minor kernel versions in the hope of reading articles like this.

BTW, does anyone have a good set of benchmarks of the performance of different IO schedulers when running one, two, or three IO-intensive tasks, when running one intensive and many small tasks, etc.? That would actually help me decide whether to rebuild my kernel with CFQ.

Also, ionice would have made my old machine much more usable when doing backups... Oh well.

Re:This is what Slashdot is about (1)

phrasebook (740834) | more than 7 years ago | (#17840064)

Also, ionice would have made my old machine much more usable when doing backups... Oh well.

Is it any different with your new machine? My Athlon X2 (SATA disks, 2GB etc) crawls when I start rsyncing my /home.

return to old stable/unstable please (0)

Anonymous Coward | more than 7 years ago | (#17838356)

seriously, 2.6.19 and above is unusable on my computer, because the i2c driver crashes ipw2200

not enough testing is being done. you people are ruining your own kernel

Reiser4 (1)

scott_karana (841914) | more than 7 years ago | (#17839408)

I was a little disappointed when he said filesystems like Reiser4 and ZFS don't affect the block layer. I'm not sure about ZFS, but I do know that Reiser4 can do stuff above and beyond what the block device layer can do, these days.
How do I know? Why, it's on the Namesys webpage!

Re:Reiser4 (2, Interesting)

axboe (76190) | more than 7 years ago | (#17840140)

That's largely because they do more than traditional file systems. Some of the ZFS functionality Linux would put in other layers, for instance. Once the IO is issued to the block layer, there's no difference.