×

Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

The Completely Fair Scheduler

CmdrTaco posted more than 6 years ago | from the i'm-always-late dept.

Linux 292

hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."

cancel ×
This is a preview of your comment

No Comment Title Entered

Anonymous Coward 1 minute ago

No Comment Entered

292 comments

Isnt this called Cron ? (5, Funny)

Anonymous Coward | more than 6 years ago | (#18831995)


I thought Linux used Cron as a scheduler ?

Re:Isnt this called Cron ? (5, Informative)

ozamosi (615254) | more than 6 years ago | (#18832027)

That is for scheduling background tasks that run once a day (or whatever you set it to)

This is for scheduling CPU resouces in real time. To decide if Firefox or Apache is going to be executed the following split second.

Re:Isnt this called Cron ? (5, Funny)

LiquidCoooled (634315) | more than 6 years ago | (#18832103)

Can't we just give the processes weapons and let them decide which follows?

Re:Isnt this called Cron ? (5, Interesting)

sphealey (2855) | more than 6 years ago | (#18832197)

> Can't we just give the processes weapons and
> let them decide which follows?

That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!

sPh

Re:Isnt this called Cron ? (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#18832413)

Why would such a stupid question end up on an exam? What would be the 'correct' answer?

The Mother of All Comp-Sci Flame Wars (5, Funny)

mosel-saar-ruwer (732341) | more than 6 years ago | (#18832419)


GP: Can't we just give the processes weapons and let them decide which follows?

P: That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!

Behold: The Mother of All Possible Comp Sci Flame Wars: The Darwinistically Selected Genetic Algorithms -vs- the Intelligently Designed Algorithms.

Bumper Stickers $4.95; T-Shirts $19.95:

$DEITY does not play dice with the Turing Machine.

Re:Isnt this called Cron ? (1)

Jah-Wren Ryel (80510) | more than 6 years ago | (#18832483)

> Can't we just give the processes weapons and
> let them decide which follows?

That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams...
Operations Research being closely related to Industrial Engineering, your prof proves the old joke about how IE's are Imaginary Engineers.

Ring-0 (1)

TiCL (1063546) | more than 6 years ago | (#18832297)

Can't we just give the processes weapons and let them decide which follows?
Ring-0 motherfucker! Do you use it?

Well duh (0)

Anonymous Coward | more than 6 years ago | (#18832007)

Obviously O(1) is a bad choice. Nice to see Linux learning something from Windows here.

Fair? (4, Funny)

alienmole (15522) | more than 6 years ago | (#18832029)

If scheduling was completely fair, this would have been a frist ps0t.

goatseGOATSEgoatseGOATSEgoatseGOATSEgoatseGOATSE (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#18832217)

Srry, d00d. May b l8r.

Smells like Communism (5, Funny)

0xdeadbeef (28836) | more than 6 years ago | (#18832039)

Free software: because some processes are more equal than others.

Re:Smells like Communism (5, Funny)

Anonymous Coward | more than 6 years ago | (#18832135)

Yes, each process according to its needs,
each CPU according to its abilities.

Re:Smells like Communism (1)

Kjella (173770) | more than 6 years ago | (#18832699)

And for the capitalist owning the CPUs and giving the processes work: Outlook not so good. Fortunately, our slave laborers are very loyal.

Re:Smells like Communism (0)

Anonymous Coward | more than 6 years ago | (#18832289)

two jiffies good
four jiffies bad

wait, strike that. reverse it.

This is communism! (0)

Anonymous Coward | more than 6 years ago | (#18832041)

Life is not fair and neither are kernels.

History has shown that the truly successful kernels use a form of Darwinian scheduling that assign more processor time to the most productive tasks along with light-weight memory protection to act as a safety net for tasks that fall through the cracks.

Re:This is communism! (1, Insightful)

Anonymous Coward | more than 6 years ago | (#18832339)

This sort of scheduler is more suited for desktops, where you want everything to react responsively at the cost of lower throughput.

Surprised? (5, Funny)

WombatDeath (681651) | more than 6 years ago | (#18832055)

You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks.

No I won't, because I don't know what the hell it means.

Hah! In your face, Taco!

Re:Surprised? (5, Informative)

Eevee (535658) | more than 6 years ago | (#18832101)

In computer science, Big O notation [wikipedia.org] is used for the complexity of a task.

Re:Surprised? (-1, Troll)

Anonymous Coward | more than 6 years ago | (#18832205)

We don't care, you pencil-necked dork.

Re:Surprised? (4, Funny)

BiggerIsBetter (682164) | more than 6 years ago | (#18832323)

In computer science, Big O notation is used for the complexity of a task.

Meanwhile, outside computer science, Big O faces are used for the completion of a task.

And that relates to "fairness" how? (0)

Anonymous Coward | more than 6 years ago | (#18832355)

Because it doesn't.

Re:And that relates to "fairness" how? (1)

Rakshasa Taisab (244699) | more than 6 years ago | (#18832431)

It doesn't, it relates to the particular implementation used in Linux, which has a constant complexity. The new scheduler uses RB-trees, which would make it O(log(N)) for insertion.

Re:And that relates to "fairness" how? (3, Funny)

wellingj (1030460) | more than 6 years ago | (#18832457)

The current O(1) schedule is not entirely fair to attain so named O(1) performance. That's how jackass.

Re:And that relates to "fairness" how? (1)

ResidntGeek (772730) | more than 6 years ago | (#18832587)

Fairness has absolutely nothing to do with algorithmic complexity. O(1) means the scheduler runs in the same amount of time, every time, and says nothing about how fair the scheduling is. Jackass.

Re:Surprised? (5, Informative)

jrschulz (684749) | more than 6 years ago | (#18832235)

A scheduler is the piece of software that brings you the illusion of multi-tasking. Because a single processor (with a single core) can only run one process at the same time, the operating system switches the process currently running. And it does this very fast (IIRC up to 1000 times a second in the case of linux).

The scheduler decides which process runs when and has to make sure that no process has to wait in the queue forever without getting his share of CPU time (this is what is called "starving").

Since the scheduler is a program by itself, it has a specific runtime characteristic, usually dependent of the number of programs waiting for their CPU share. The special property of the current scheduler in linux is that its runtime is in fact independent of this number. That's expressed in CS by O(1).

mod parent up (1)

bakuun (976228) | more than 6 years ago | (#18832387)

Good explanation of the scheduler, and what significance it has that it is O(1). (as opposed to some other complexity, where the scheduler would take longer and longer to execute the more processes there are in the queue).

Would have modded up myself if I had the points to do it.

Re:Surprised? (1, Insightful)

Anonymous Coward | more than 6 years ago | (#18832447)

Yeah, I know what O(1) means, but I still don't know what "You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks." means.

SMP and RT capabilities? (2, Interesting)

Anonymous Coward | more than 6 years ago | (#18832057)

Will the new scheduler be more efficient at scheduling single processes across a multi-core system?

On another not[e], when are we going to be able to build a default kernel capable of low latency I/O for A/V work?

I/O prioritisation (5, Interesting)

Doug Neal (195160) | more than 6 years ago | (#18832069)

Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse. CPU power is fairly abundant these days so managing its usage is less of an issue than it used to be, but I/O bandwidth is often in short supply and I/O-bound applications can choke a system and make interactive processes a pain in the ass to use. I'd like to see some way of reserving and limiting bandwidth to particular devices for particular processes. And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.

Re:I/O prioritisation (4, Interesting)

tepples (727027) | more than 6 years ago | (#18832123)

And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.
Windows has this to an extent. I press Ctrl+Shift+Esc to open Windows Task Manager, then View > Select Columns... > turn on Page Faults Delta, and I see the penalty for sticking with the 6-year-old paid-for PC that I still use.

Re:I/O prioritisation (4, Informative)

jawtheshark (198669) | more than 6 years ago | (#18832405)

Now, page faults are indeed a form of I/O, but a page fault is technically seen just the fact that some memory required isn't in physical memory. I don't think the parent poster was talking about that. One of the most common reasons for page faults are simply that a block of memory has been swapped to disk, and then suddenly it is required, and as such the block of memory needs to be read into the physical memory.

I'd say: add some memory to that box of yours.

You can read up on it here [wikipedia.org]

Re:I/O prioritisation (1)

tepples (727027) | more than 6 years ago | (#18832585)

I'd say: add some memory to that box of yours.
I know that. But do they make PC133 anymore?

Re:I/O prioritisation (1)

$RANDOMLUSER (804576) | more than 6 years ago | (#18832639)

I'd say: add some memory to that box of yours.
That assumes that the loader is going to try to bring the whole program into memory at load time, which is almost never the case. It completely ignores the notion of working sets [wikipedia.org], or how they change over the lifetime of a process. As an example, why should a program keep its "open all files" code in memory once all the files are opened? Vomit Making System loaded the first page of a program (the entry point) and forced the program to demand in other (512 byte) pages as required. Picking a (code) page to discard was simply done LRU. The working set size was an OS tunable parameter which was affected by things like average number of users, average process size, average process runtime, etc.

Re:I/O prioritisation (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#18832139)

fs_usage is what you're looking for. Oh wait, what's that? Linux is stuck in the '90s and doesn't have fs_usage? Well, make that one more thing for you to rip off other OSes, then.

Re:I/O prioritisation (5, Informative)

Anonymous Coward | more than 6 years ago | (#18832149)

> What it could really do with is I/O prioritisation.
ionice [die.net]. Available since 2005-08-28.

> And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy

I agree, that would be nice.

Re:I/O prioritisation (1)

Kjella (173770) | more than 6 years ago | (#18832663)

I see ionice isn't installed by default on Debian etch, but found it in the schedutils package. However, I still don't see how to use it in a useful way. For example, I would like to load kplayer with high priority on both CPU and IO, every time I run it (i.e. every time I start the process). I understand I can dig out the process ID, open up a terminal, renice and ionice it, but is there any way to make it run that way by default?

Re:I/O prioritisation (0)

Anonymous Coward | more than 6 years ago | (#18832743)

Read the manpage? Know nice?

nice -n -19 ionice -c 1 kplayer

Re:I/O prioritisation (2, Informative)

hackstraw (262471) | more than 6 years ago | (#18832249)

Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse.

I dunno. Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing". Recently, the I/O block layer got a new scheduler (linky http://kerneltrap.org/node/7637 [kerneltrap.org]). Regarding other I/O prioritization, I can't say with authority that this is needed or not.

Maybe all of these things are related, but in my selfish world, I want Linux to scale better. Scaling well beyond 64 CPUs or cores, but I believe this is a much greater task than ~60 hours of coding. It took a while before Linux got SMP clean, and today SMP is normal, not something that reserved for servers. The monolithic high clockrate CPU is dead, and this has been overtaken by multi-core processors, and more than one of them inside of one box. In my eye, this is where all OSes should be focusing their attention.

Re:I/O prioritisation (2, Informative)

makomk (752139) | more than 6 years ago | (#18832625)

Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing".

Which, presumably, is why they're thinking of taking out its current O(1) process scheduler and replacing it with an O(log(n)) one?

Re:I/O prioritisation (1)

JohnFluxx (413620) | more than 6 years ago | (#18832319)

I'm the author of the kde task manager thing. It would be interesting to add a view for I/O. If you find out how to do it code wise, let me know :-)

Re:I/O prioritisation (0)

Anonymous Coward | more than 6 years ago | (#18832359)

Exactly right. Anyone with a quad core will tell you the hard drive is the bottleneck now.

Re:I/O prioritisation (5, Interesting)

Stephen Williams (23750) | more than 6 years ago | (#18832379)

And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy

I'd love something like that.

There's a way of logging I/O; it's pretty rough-and-ready, not really suitable for permanent use, but can be handy for figuring out what keeps causing a laptop HDD to spin up, for example. As root, do:

echo 1 >/proc/sys/vm/block_dump
I/O is then logged to the kernel ring buffer, and can be retrieved with dmesg. The entries look like:

pdflush(138): WRITE block 1161864 on dm-4
pdflush(138): WRITE block 0 on dm-3
pdflush(138): WRITE block 524328 on dm-3
pdflush(138): WRITE block 786952 on dm-3
pdflush(138): WRITE block 786960 on dm-3
When you've finished, do

echo 0 >/proc/sys/vm/block_dump
as root to turn it off again.

Like I said, very rough-and-ready, nowhere near as nice as a proper I/O top would be, but there it is.

-Stephen

Re:I/O prioritisation (2, Informative)

Mike McTernan (260224) | more than 6 years ago | (#18832641)

iostat is probably an easier way to achieve similar output.

It still doesn't break it down by task though, so not really the iotop people are looking for.

Re:I/O prioritisation (2, Informative)

Anonymous Coward | more than 6 years ago | (#18832383)

Linux has had I/O prioritization for a few years now. There are a couple of I/O schedulers adapted to different workloads (deadline, anticipatory, fair, no op) that can be set globally or per block device. The default scheduler since 2.6.18 is cfq (the "fair" one) and it supports manual prioritization too (using the "ionice" command).

In fact, I don't think Linux has ever suffered from the same I/O problems that people often complain about in Windows.

Re:I/O prioritisation (1)

Endymion (12816) | more than 6 years ago | (#18832733)

speaking of fairness in I/O and windows...

I know most unix flavors have had proper elevator-scheduling I/O for a long time, and you can hear it on the disks - it's fairly rare to hear hard random thrashing. When I tried similar tasks on windows, it's always doing some really nasty back-and-forth seeking, slowing everything way down.

Does windows even use a sane elevator I/O yet?

Re:I/O prioritisation (1)

jhines (82154) | more than 6 years ago | (#18832397)

If i remember correctly VMS used to give a job a temporary priority boost, upon completion of the IO for just this reason, to get it to the top of the cpu queue and get it going again.

Re:I/O prioritisation (1)

wild_berry (448019) | more than 6 years ago | (#18832545)

Version 3 of the Completely Fair Queueing IO scheduler was put into the mainline Linux Kernel in 2.6.13 and has been the default IO scheduler since 2.6.18. Jens Axboe, who wrote and maintains it, says that CFQ "has also grown more advanced features such as IO priority support, allowing a user to define the IO priority of a process with ionice, similar to CPU scheduling." in an interview at KernelTrap (http://kerneltrap.org/node/7637 [kerneltrap.org]).

Re:I/O prioritisation (2, Interesting)

Anonymous Coward | more than 6 years ago | (#18832605)

I read an article somewhere on Sun's site about I/O scheduling in SunOS, but unfortunately I can't find it now. I'll try to summarize it. The basic point was that when a task wants to write data, it can typically hand that data off to the kernel and then keep running. That means that it's making best use of system resources, since it will use the processor (for example) while the kernel writes to the disk in the background. When a task wants to read data, however, it usually can't do anything until the data's read from disk. So what SunOS will do is schedule I/O unfairly, prioritizing reads over writes, to minimize the amount of time that tasks spend blocked. (My experience is that it will even aggressively swap out pages to expand the write cache.) I wish I could find that article, I seem to remember it having some interesting nitty-gritty details about the way this is actually implemented, as well as some benchmarks from a customer.

Re:I/O prioritisation (1)

tokul (682258) | more than 6 years ago | (#18832609)

CPU power is fairly abundant these days so managing its usage is less of an issue than it used to be,
Remember that when multiuser system starts choking due to unoptimized process handling code. Linux can run on setups that differ from your single user workstation.

Re:I/O prioritisation (2)

Doug Neal (195160) | more than 6 years ago | (#18832787)

Remember that when multiuser system starts choking due to unoptimized process handling code. Linux can run on setups that differ from your single user workstation.
I know this all too well and had this in mind in my original post, most of my Linux work is on servers (although I do have a Linux workstation) - being able to reserve/restrict I/O and CPU per user or better still per website and database would be like, totally useful.

Re:I/O prioritisation (1)

dropadrop (1057046) | more than 6 years ago | (#18832687)

This is absolutely true. Nothing is more annoying then trying to do a low priority but large & stressing disk operation on a host where there are small high priority disk operations active all the time. Currently the low priority task will be handled like the others, causing everything important to a grinding halt.

Re:I/O prioritisation (4, Interesting)

Animats (122034) | more than 6 years ago | (#18832747)

Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation.

QNX has that, which is essential for real-time work.

QNX has the advantage that I/O, like almost everything else in QNX, is done via inter-process message passing operations. The message passing system uses priority queues, and so requests to file systems and devices get handled in priority order. So resource managers (file systems, device drivers, etc.) don't have to explicitly handle priorities; it's done for them. Some resource managers, like disk handlers, process multiple requests at a time so they can reorder them to optimize access, but network devices and such are FIFO at the resource manager level and priority ordered at the message level.

The end result is that you can compile or surf the web on a system that's also doing real time work without interfering with the real time tasks.

credit goes to Con Kolivas (3, Insightful)

phrasebook (740834) | more than 6 years ago | (#18832085)

This new scheduler may have 'Ingo Molnar' written all over it but I'll be giving the credit to ck!

Re:credit goes to Con Kolivas (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#18832361)

Ingo the ego maniac? Lots of egos among the kernel hackers.
People should read the thread ( http://kerneltrap.org/node/8059 [kerneltrap.org] )
to see the politics of Linux kernel development.

Re:credit goes to Con Kolivas (2, Insightful)

Anonymous Coward | more than 6 years ago | (#18832435)

Perhaps if we had more 'ego maniacs' like Ingo, stuff would work better?

I don't think your characterization is fair, I read the entire thread and all I see is someone doing his job.

Re:credit goes to Con Kolivas (0)

Anonymous Coward | more than 6 years ago | (#18832737)

Yeah right. doing his job.. Take all the credits. Diss other people's ideas, reimplement them and claim innovation. That used to define the Microsoft notion of innovation.

Re:credit goes to Con Kolivas (1)

wellingj (1030460) | more than 6 years ago | (#18832517)

But like it or not, Linux is going to go more places with Ingo's Real Time additions that without it.

Con has been vindicated and acknowledged (0)

Anonymous Coward | more than 6 years ago | (#18832363)

We want small, simple and efficient. Without comparing the respective patch sets I believe Ingo had valid concerns about fragmentation. There's nothing personal in it, sometimes you have to stand back from a proposed solution to accurately assess the problem.

Another developer could work on Tux and rewrite or extend it by having a fresh perspective and identifying an area where Ingos code could be improved. It's just the way it goes and in the end it's not bruised egos but technical merit that matters. Let's see how the code develops.

Re:credit goes to Con Kolivas (1)

Wonko the Sane (25252) | more than 6 years ago | (#18832599)

CFS vs SD isn't decided yet. Con is still releasing new versions as the people testing it find new bugs. Not to mention that SD is in the mm- tree and CFS isn't (yet)

Re:credit goes to Con Kolivas (5, Informative)

jb.cancer (905806) | more than 6 years ago | (#18832723)

Well, yes & no. It really was Con's RSDL that got ppl looking seriously into changing the mainline scheduler (there already being several out-of-mainline alternatives like nicksched, etc.)

Con's scheduler seemed to work better at higher workloads than the mainline, by just trying to distribute load evenly and not trying pretty interactivity tricks. But several ppl did say it didn't perform well for certain X client workloads. That's when Ingo's CFS was posted.

There really is 2 alternatives Ingo's CFS & Con's SDL that's being simultaneously tested by the kernel developers now, and none is accepted into mainline.

So it wouldn't be fair to say that CFS is *the* next Linux scheduler. It could be SDL as well.

Optimizations leading to less optimized code (3, Insightful)

suv4x4 (956391) | more than 6 years ago | (#18832107)

It's a very interesting phenomenon.

After enough number of iterations trying to optimize a software program to do everything very well compared to the base "naive" solution, you end up with an OS that does everything poorly.

It's counterintuitive, but we see it it every day around us.

Re:Optimizations leading to less optimized code (1)

tomstdenis (446163) | more than 6 years ago | (#18832233)

What are you smoking? This is supposed to be an improvement on the O(1) scheduler which was an improvement on the previous scheduler.

I don't know about you but my 2.6 based box runs just fine, certainly no slower than my previous 2.4 boxes, and it packs a lot more features, drivers and other internal changes/improvements.

Re:Optimizations leading to less optimized code (2, Insightful)

Chandon Seldon (43083) | more than 6 years ago | (#18832629)

After enough number of iterations trying to optimize a software program to do everything very well compared to the base "naive" solution, you end up with an OS that does everything poorly.

That's blatantly false. Sure, there are tradeoffs. There are also cases where a better algorithm is an outright win 100% of the time.

O(1) - what a huge misnomer (1, Informative)

Anonymous Coward | more than 6 years ago | (#18832133)

Doing a constant time reevaluation for n processes is still O(n) whether you do it individually or at once. Granted you don't have a pause as all processes are reevaluated but the amount of work the scheduler does is NOT independent of the # of processes.

Re:O(1) - what a huge misnomer (3, Informative)

Watson Ladd (955755) | more than 6 years ago | (#18832315)

Actually the Linux scheduler is O(1). Processes are placed on O(1) queues, and the oldest processes on each queue are compared to each other. The number of queues is fixed, so Linux only has 5 nice levels. This is a scheduler because newer processes on a queue do not deserve the time more then the processes ahead of them.

Re:O(1) - what a huge misnomer (1)

pclminion (145572) | more than 6 years ago | (#18832335)

I was going to agree with you but you're wrong. Process priority is the sum of an intrinsic priority and another value which increments as time goes on, in order to ensure that EVERY process can run at least SOME of the time. This means that at every scheduling pass, all of these dynamic priorities must be updated. That's OBVIOUSLY O(n).

Re:O(1) - what a huge misnomer (3, Informative)

Watson Ladd (955755) | more than 6 years ago | (#18832527)

No. You are wrong. It looks O(n) unless you notice that the processes back on the queue ran more recently then the ones at the front. Therefore they have lower priority then the ones at the front, so we only need to worry about the processes at the front of the queues. To evaluate these priorities we just store the last time they ran, and then subtract from the current time, then add that to the inherent priority.

Re:O(1) - what a huge misnomer (5, Informative)

ASBands (1087159) | more than 6 years ago | (#18832559)

Here's how Linux 2.6.x task scheduler works:

  • A "runqueue" keeps track of all essential tasks assigned to a given CPU [Queue is O(1) efficient]
  • Each runqueue contains two priority arrays, the active and the expired. As a task completes, it is moved (during the move, the new timeslice is calculated - the point of debate and largest fundamental change to the task scheduler) [Moving from static array to static array is O(1) efficient]
  • When the active priority array is finished, the expired array becomes the active array [Swapping 2 pointers is O(1) efficient]
  • The priority arrays are of fixed length 140, as that is the amount of priority levels Linux has [O(140) = O(1)]
The point of debate comes if two threads have the same priority, in which case they are put on a round robin in the priority array. However, the confusion comes in that the execution is O(n) (which makes sense if you think about it), but the scheduler itself handles these at O(1) efficiency.

Re:O(1) - what a huge misnomer (4, Insightful)

xenocide2 (231786) | more than 6 years ago | (#18832791)

It seems like the difference is that people call it an O(1) scheduler to reflect the fact that all the work required to be done at the end of a timeslice (record keeping, picking a new process etc) is done in constant time, and people arguing for O(n) are referring to the cost to schedule all processes. Nobody's saying they can find a schedule for n processes without looking at all of them.

Re:O(1) - what a huge misnomer (1, Informative)

Anonymous Coward | more than 6 years ago | (#18832343)

I think what it really means is that the time to select a job is O(1) time whenever the scheduler is called. And this doesn't seem to be unintuitive for me. Plus that's totally reasonable: in any time-sharing system where you have a time-slice, you have an unbounded amount of times that the scheduler is called. All you can possibly hope for is that each time the scheduler is called, it gives you a job in constant amount of time. (You cannot expect that to be sub-constant, really.)

Re:O(1) - what a huge misnomer (4, Informative)

ASBands (1087159) | more than 6 years ago | (#18832351)

If you want a good description of the old way it works (which is what you're talking about), [PDF Warning] here it is [trancesoftware.com]. As this PDF describes, the 2.4.x kernel uses a O(n) fast algorithm for choosing the next task, going through the array of tasks to re-evaluate them. The the Linux 2.6.x scheduler recalculates timeslices as each task uses up its timeslice. In this fashion, the kernel should always know the most important thread, it is simply the next element. Granted, you are doing the same amount of work, but the pauses for re-evaluation are spread out so that you don't get long lapses where nothing important happens.

Re:O(1) - what a huge misnomer (1)

ookabooka (731013) | more than 6 years ago | (#18832371)

I don't know. . .take something simple like FIFO (first in first out). The "scheduler" in that scenario does very little and would definitely be a O(1) algorithm since nothing is evaluated and it just keeps a list of who asked for attention first. I'm not familiar with the scheduler in question so I can't say for sure what it is, but if it evaluates 1 process every context switch, that is O(1) with regards to switching tasks as it is independent on how many processes exist. Overall if you have n jobs and you want to know how much overhead the scheduler adds, then yes it adds O(n) because it has to run at least n evaluations (it would probably be something higher unless each processes terminated after being run). Consequently if it evaluated all processes on context switches it would be O(n) for the individual switch and O(n^2) total overhead. . . .God I hate my Operating Systems class.

Re:O(1) - what a huge misnomer - INCORRECT (4, Informative)

ArbitraryConstant (763964) | more than 6 years ago | (#18832523)

You are incorrect because the act of scheduling the next process to run requires constant time regardless of how many processes there are. The O(1) refers to this, correctly.

The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.

O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.

So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.

How does this work? (1)

Watson Ladd (955755) | more than 6 years ago | (#18832137)

I would love to know how the algorithm actually works. Is this just a new data structure for organizing the processes, or is there other stuff involved?

The Multics scheduler always seemed very nice (4, Interesting)

Anonymous Coward | more than 6 years ago | (#18832211)

On Multics, tasks that used up their CPU were put in a lower priority run queue and tasks that didn't use up their CPU time were put into a higher priority run queue.

So interactive tasks naturally floated to the top and compute bound tasks naturally sank.

(Anyone remember Multics?)

Re:The Multics scheduler always seemed very nice (3, Funny)

Hognoxious (631665) | more than 6 years ago | (#18832231)

I used Multics, but I thought it prioritised the tasks by how thick the cards were.

Re:The Multics scheduler always seemed very nice (1)

Guy Harris (3803) | more than 6 years ago | (#18832711)

I used Multics, but I thought it prioritised the tasks by how thick the cards were.

I guess that explains the slow response I got - I mostly used Multics [multicians.org] from an IBM 2741 [wikipedia.org], so I didn't have any cards.

Re:The Multics scheduler always seemed very nice (0)

Anonymous Coward | more than 6 years ago | (#18832563)

This is the basis for the usual BSD style scheduling, as well.

What I really want (1, Insightful)

Anonymous Coward | more than 6 years ago | (#18832295)

I want a nice-like utility that works for more than just CPU time. I want to be able to do hddnice -5 firefox and networknice -7 pr0ndownloader. Why can't the principles of nice work for various forms of I/O scheduling as well as CPU scheduling? Does anyone know of a Unix that has these sorts of things. I just hate it when my system is unresponsive because some process, which is niced into all get out, is swapping like crazy, and is allowed to do so because "it's not using CPU time". Bah!

Re:What I really want (0)

Anonymous Coward | more than 6 years ago | (#18832621)

Cool idea(s). I suppose you'll be told that you don't understand how things work and that's what its unreasonable to ask for this. But it makes sense to me.

I love this. (1)

Progman3K (515744) | more than 6 years ago | (#18832399)

FTA:
>> [...] the core scheduler got smaller by more than 700 lines:

Someone who I respect tremendously once said
"Perfection is attained when you remove everything that is not essential"

I have to agree

Re:I love this. (1)

fimbulvetr (598306) | more than 6 years ago | (#18832623)


Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. ~Antoine de Saint-Exupéry

Re:I love this. (1)

Paul Crowley (837) | more than 6 years ago | (#18832731)

Antoine de Saint-Exupéry:

Il semble que la perfection soit atteinte non quand il n'y a plus rien à ajouter, mais quand il n'y a plus rien à retrancher.

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

Completely fair? (4, Funny)

tji (74570) | more than 6 years ago | (#18832437)

No, I think I'll wait for the unbelievably fair scheduler, or perhaps the ridiculously fair scheduler.

"Stolen idea"? (comments below TFA) (0)

Anonymous Coward | more than 6 years ago | (#18832449)

Somebody into the kernel devs community care to read the user comments below TFA and explain what's going on to us mere mortals?

Interactive tasks (3, Insightful)

Zarhan (415465) | more than 6 years ago | (#18832601)

For this reason, I've been using Con Kolivas' patches to replace the scheduler. http://members.optusnet.com.au/ckolivas/kernel/ [optusnet.com.au] - very helpful especially if you don't have the fastest computer around. Also seems to help a bit with I/O - if my hard drive is trashing for whatever reason, interactive stuff still remains reasonably responsive. Or at least it doesn't make my mouse cursor skip...

Even so, I'd prefer to have IO better scheduled - ionice doesn't really seem to work at least for me.

Wait wait wait! What about fixing disk IO first? (2, Informative)

Anonymous Coward | more than 6 years ago | (#18832673)

There's a huge thread about broken disk IO in kernels > 2.6.17.
http://forums.gentoo.org/viewtopic-t-482731.html [gentoo.org]

How about fixing that first?

I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?

But at a higher level there's been such brokenness.
K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
Firewire performance (for me) started fluctuating after ~2.6.7. Although grabbing DV over firewire doesn't take much CPU time, for some reason having video preview enabled during capture may/may not cause dropped frames (on a modern amd64! pathetic!) depending on kernel version.
And the promise of 2.6 being responsive and feeling interactive under heavy load? Also ended after 2.6.7 for me.
Recent kernels, like 2.6.18, seem to take minutes to balance tasks out. Playing video during heavy CPU usage? For me it gives priority to everything else BUT the new video process for about 30 seconds. (CFM or anticipatory scheduler)

So I'm not sure if we should be welcoming all this new "choice" with scheduler-du-jour or try to give us a good default choice that actually works right first.

I agree (0)

Anonymous Coward | more than 6 years ago | (#18832769)

There's no development tree which has been a blessing and a curse. With practically everyone using 2.6 now, the devs should work on regressions and stability and then branch 2.7 for development of 2.8.

I'd also like to see ingos RT patches merged into mainline and some of the more esoteric stuff moved out into patchsets.

The new Linux scheduler must be intelligent (1)

linuxIsLife (1044762) | more than 6 years ago | (#18832739)

To be smarter the new Linux scheduler will use artificial neural networks and for speed-up it will be written in Lisp.
Load More Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Sign up for Slashdot Newsletters
Create a Slashdot Account

Loading...