Removing the Big Kernel Lock

CmdrTaco posted more than 6 years ago | from the wait-i-thought-locks-made-it-secure dept.

Operating Systems 222

Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"

FIRST! (-1, Offtopic)

Helmholtz (2715) | more than 6 years ago | (#23446146)

muahahahah

I don't understand (1)

Nimey (114278) | more than 6 years ago | (#23446164)

Why did they remove the preemptable BKL?

RTFAing says that temporarily forking the kernel with a branch dedicated to experimenting with the BKL is being considered. Maybe they can call it 2.7...

Re:I don't understand (4, Interesting)

Vellmont (569020) | more than 6 years ago | (#23446302)


Why did they remove the preemptable BKL?

I'm not a kernel developer, but I'd say it's because there's widespread belief that the preemptable BKL is "the wrong way forward". Statements like these lead me to believe this:

"all this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong."


In any large software project there's always a path to get from where you are, to where you want to be. It sounds like any version of BKL is considered ugly and causes problems, and patching it just won't work. In other words, fixing this part of the kernel isn't really possible, so they need to start over and change any code that relies on it to rely on something different entirely.

Bad interaction with the generic semaphores. (5, Informative)

Anonymous Coward | more than 6 years ago | (#23446634)

The recent semaphore consolidation assumed that semaphores are not timing critical. Also it made semaphores fair. This interacted badly with the BKL (see [1]) which is a semaphore.

The consensus was to not revert the generic semaphore patch, but to fix it another way. Linus decided on a path that will make people focus on removing the BKL rather than a workaround in the generic semaphore code. Also, Linus doesn't think that the latency of the non-preemptable BKL is too bad [2].

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/msg03526.html
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3e076c5a78519a9f64cd384e8f18bc21882ce0

Re:I don't understand (4, Funny)

johannesg (664142) | more than 6 years ago | (#23447034)

Are you arguing for a microkernel style solution, sir? If so, I salute your bravery! ;-)

Re:I don't understand (4, Informative)

QX-Mat (460729) | more than 6 years ago | (#23446356)

New semaphore code was introduced that simplified locking. Unfortunately, in many kernel situations it has proven to cost something like 40% in performance - which isn't just considerable, it's disastrous.

Rather than merge the old locking code back in and reintroduce the many different locking primitives they had, someone decided to simply re-enable the BKL - the downside of which is that they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove the places where the semaphore code is likely to be called (the BKL).

Matt

LWN has a summary of the issue (3, Informative)

toby (759) | more than 6 years ago | (#23446470)

here [lwn.net] (for subscribers. I dare not post a free link here :)

Re:I don't understand (2, Interesting)

kestasjk (933987) | more than 6 years ago | (#23446722)

new semaphore code was introduced that simplified locking. Unfortunately in many kernel situations it's proven to affect performance at around something like 40% - which isn't just considerable its disastrous. rather than merge the old locking code back in, and reintroduce the many different locking primitives they had, someone decided to simply reenable the BKL - the downside of which is they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove instances of where the semaphore code is likely to be called (the BKL). Matt
Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on? Why try to make a non-preemptible kernel preemptible for the sake of real-time, if it affects non-real-time performance?

(Performance != Speed) // in an RT system (5, Insightful)

Arakageeta (671142) | more than 6 years ago | (#23446854)

That's a terrible excuse. There are many applications where a real-time Linux kernel is highly desired. Besides, it is important to note that real-time systems do not focus on speed. This is a subtle difference from "performance", which usually carries speed as a connotation; it doesn't for a real-time system. The real-time system's focus is on completing tasks by the time the system promised to get them done (meeting scheduling contracts). It's all about deadlines, not speed. So from this point of view, the preemptible BKL, even with the degraded speed, could still be viewed as successful for a real-time kernel.
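To put some made-up numbers on the deadline-versus-speed distinction, here is a minimal sketch. The latency traces and the 10 ms deadline are invented for illustration; they are not measurements of any kernel.

```python
# Hypothetical per-request latencies in milliseconds; both traces are invented.
fast_but_spiky = [1, 1, 1, 1, 1, 1, 1, 1, 1, 40]
slow_but_steady = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]

DEADLINE_MS = 10  # assumed scheduling contract


def mean(samples):
    """Average latency: the usual 'speed' number."""
    return sum(samples) / len(samples)


def missed_deadlines(samples, deadline=DEADLINE_MS):
    """Count how many requests finished after the deadline."""
    return sum(1 for s in samples if s > deadline)


for name, trace in [("fast but spiky", fast_but_spiky),
                    ("slow but steady", slow_but_steady)]:
    print(name, "| mean:", mean(trace), "| worst:", max(trace),
          "| missed:", missed_deadlines(trace))
```

The spiky trace wins on average latency but breaks the contract once; the steady trace is slower on average and never misses. For a real-time system only the second property counts.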

Re:I don't understand (0)

Anonymous Coward | more than 6 years ago | (#23446874)

Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on?

Sure. We'll stop kernel development while we're at it too.

Re:I don't understand (0)

Anonymous Coward | more than 6 years ago | (#23446768)

If the situation is really as simple as you describe it, isn't that very naive? There is a reason there are lots of different locking algorithms out there. Sometimes you want a fair lock, sometimes you don't. Sometimes you want to reschedule while waiting for the lock, sometimes you don't. If you take out all the lock algorithms and replace them with one, it's easy to see a loss in performance if the code is especially tuned for a particular type of lock.
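As a rough illustration of what "fair" means here, this is a sketch of a ticket lock in user-space Python. It is not the kernel's semaphore code; the class name and fields are made up for the example. Waiters are served strictly in arrival order, whereas Python's plain `threading.Lock` makes no promise about which waiter gets in next.

```python
import threading


class TicketLock:
    """FIFO ("fair") lock: waiters are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed in

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Sleep until it is this ticket's turn, even if the lock looked free.
            while self._now_serving != my_ticket:
                self._cond.wait()
            return my_ticket

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()   # wake waiters; only the next ticket proceeds


lock = TicketLock()
first = lock.acquire()
lock.release()
second = lock.acquire()
lock.release()
print(first, second)
```

The price of fairness is visible in `acquire`: a thread that is already running cannot jump the queue, even when handing it the lock immediately would be cheaper than waking the thread at the head of the line. Code tuned for an unfair lock can slow down when it is switched to a fair one for that reason.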

Re:I don't understand (5, Informative)

diegocgteleline.es (653730) | more than 6 years ago | (#23446368)

Because these days the BKL is barely used in the kernel core, or so Linus says [lkml.org] : the core kernel, VM and networking already don't really do BKL. And it's seldom the case that subsystems interact with other unrelated subsystems outside of the core areas. IOW, it's rare to hit BKL contention - and in those cases, you want the contention period to be as short as possible. And spinlocks are the faster locking primitive, so making the BKL a spinlock (which is not preemptable) makes the BKL contention periods shorter. A mutex/spinlock brings you "preemptability" and somewhat hides the fact that there's a global lock being used, sometimes at the expense of performance, which may be a good thing for RT/low-latency users, but apparently Linus prefers to choose the solution that is faster and doesn't hide the real problem.

Re:I don't understand (3, Informative)

diegocgteleline.es (653730) | more than 6 years ago | (#23446442)

A mutex/spinlock brings you "preemptability"

Duh, I meant mutex/semaphore. And Linux semaphores have become slower, while mutexes are still as fast as the old semaphores were, as #23446368 says. The options were to move from a semaphore to mutexes or spinlocks, but Linus chose spinlocks because the RT/low-latency crow will notice it and will try to remove the remaining BKL users.

Re:I don't understand (0)

Anonymous Coward | more than 6 years ago | (#23447136)

The problem is obviously that they have a Crow working on the RT kernel. They should replace the Crow with an Owl as they can see further ahead.

Re:I don't understand (1)

Elladan (17598) | more than 6 years ago | (#23447268)

Or better yet, replace it with Tom Servo. What sort of a lunatic would have Crow do kernel development?

Re:I don't understand (5, Interesting)

lorenzo.boccaccia (1263310) | more than 6 years ago | (#23446446)

Also, from reading the full arguments on the list, preempting the BKL has hit a dead end where going in any direction broke code in various other parts of the kernel.
So they want to try this other road: make the BKL work as intended, add more debugging information, make each call of the BKL more visible to the kernel developers, and then remove the calls to the BKL by changing the BKL client code to use other synchronization primitives. This won't fix the BKL, but it renders it useless and removable.
It's good to see those decisions made inside the Linux kernel, as being backward compatible is the road to madness that hindered the windows kernel.

Re:I don't understand (0)

Anonymous Coward | more than 6 years ago | (#23446762)

> being backward compatible is the road to madness that hindered the windows kernel.

Do you have any specific instances to cite of this happening in the NT kernel?

Re:I don't understand (5, Interesting)

lorenzo.boccaccia (1263310) | more than 6 years ago | (#23446876)

There are some, yes. For example, Windows for years carried a workaround for SimCity inside its memory management API. Previous versions of Windows didn't clear freed memory regions, so SimCity (which was buggy, but worked under Win 3.1) forced the Windows 95 API to have a special case in which the free() call wouldn't really release the memory. Another bug, in the way NT handled streams on NTFS, had to be maintained for certain versions of Microsoft Office, which relied on the fact that streams were not deleted when files were deleted, so recreating a deleted file kept the same stream.
The first one: http://www.joelonsoftware.com/articles/APIWar.html [joelonsoftware.com]
For the second one I couldn't find any reference; I think I first read it on the Russinovich blog.
The fact is, there are a lot of bugs that couldn't be resolved because resolving them would break backward compatibility, and a lot of APIs that couldn't be cleaned up for the same reason. One example is the filesystem API, which has led to the curious situation in which programs using different portions of the API show different file open/file save dialogs, and so on. There are a lot of strange things happening in Windows, all the time: you can look at some of them on this blog: http://blogs.msdn.com/oldnewthing/default.aspx [msdn.com]

Re:I don't understand (2)

siride (974284) | more than 6 years ago | (#23447154)

You sure that's all in the kernel? I have a feeling that's mostly, if not entirely, in userland APIs, which is not uncommon to happen on Linux either. Witness X and the toolkits.

Re:I don't understand (4, Insightful)

SpinyNorman (33776) | more than 6 years ago | (#23446528)

If that is true then it sounds like a bad decision.

If the BKL code is rarely used then the general usage performance impact is minimal and the efficiency of a spinlock vs mutex is irrelevant. If this is not true then saying it is rarely used is misleading.

However for real-time use you either do or don't meet a given worst case latency spec - the fact that a glitch only rarely happens is of little comfort.

It seems like it should have been a no-brainer to leave the preemptable code in for the time being. If there's a clean way to redesign the lock out altogether then great, but that should be a separate issue.

BKL is again a big source of latency (4, Informative)

Sits (117492) | more than 6 years ago | (#23446546)

Matthew Wilcox replaced the per platform semaphore code with a generic implementation [lwn.net] because it was likely to be less buggy, reduced code size and most places that are performance critical should be using mutexes now.

Unfortunately this caused a 40% regression in the AIM7 benchmark [google.com] . The BKL was now a (slower) semaphore and the high lock contention on it was made worse by its ability to be preempted. As the ability to build a kernel without BKL preemption had been removed [google.com] Linus decided that the BKL preemption would go. Ingo suggested semaphore gymnastics to try and recover performance but Linus didn't like this idea.

As the BKL is no longer preemptible [google.com] it is now a big source of latency (since it can no longer be interrupted). People still want low latencies (that's why they made the BKL preemptible in the first place) so they took the only option left and started work to get rid of the BKL.

(Bah, half a dozen other people have replied in the time it's taken me to edit and re-edit this. Oh well...)

Re:I don't understand (1)

NoSCO (858498) | more than 6 years ago | (#23447068)

I understood considerably less than you, it seems, and I consider myself a reasonably proficient Linux junkie. "As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again." *blink*

Linux? (1, Funny)

Anonymous Coward | more than 6 years ago | (#23446220)

What's linux?

Re:Linux? (0, Funny)

Anonymous Coward | more than 6 years ago | (#23446318)

it's part of the GNU operating system.

Re:Linux? (2, Funny)

Anonymous Coward | more than 6 years ago | (#23447224)

Specifically, it's just the bootloader for GNU Emacs, the finest, most complete operating system known to mankind.

Re:Linux? (4, Informative)

Dunbal (464142) | more than 6 years ago | (#23446332)

What's linux?

The future.

Re:Linux? (-1, Troll)

Toreo asesino (951231) | more than 6 years ago | (#23446628)

The future.
jah, because it's gone from 0% to like nearly 1% in, er, the entire time it's existed! Wow! It's really making its presence felt!

Re:Linux? (2, Insightful)

ichigo 2.0 (900288) | more than 6 years ago | (#23446686)

Perhaps its growth is exponential.

Re:Linux? (3, Insightful)

RiotingPacifist (1228016) | more than 6 years ago | (#23446714)

It's gone from 0% to 100% on my PCs; everything is relative.

1%? (2, Informative)

zogger (617870) | more than 6 years ago | (#23447012)

Linux is something like nearly half the servers in existence and most of the top supercomputers. Desktop is a slower road of course, but it is still chugging along slowly but surely. Look at Apple: originally a big percentage of desktops, then dropped to almost nothing, now inching its way back up because it got good. Stuff changes.

The Linux desktop market is big enough for there to be a lot of credible choices just within "Linux" itself; there are half a dozen or so really good desktops and dozens of pretty good desktop Linuxes out there now. And word gets around. It will be like FF, 0% to now upwards of one quarter to one half depending on where you look around the planet. There's some magic number that is hard to pinpoint, but once anything reaches a certain level of use/adoption it really takes off, usually (near as I can see) around 10%, and then it makes huge jumps.

Bad car analogy time: the Toyota Prius is now at more than one million cars sold, from zero cars ten years ago, and it was the first mass-market hybrid system that they really tried to make and sell in decent numbers (compared to Honda, for example, who only fooled around with their Insight). Now look, all the major manufacturers either have their own hybrids or will have them shortly. Ten years, that's all it takes, once some threshold hits and it looks "real" to Joe Consumer, to go from exotic to normal.

I think this year the Asus Eee PC made Linux "real" to a lot of people, so I am expecting ubiquitous Linux as a choice to be along shortly with most computer makers as an option. And that is leaving out all the gadgets people use day to day running some smallish embedded Linux: GPS systems, cellphones, etc.

Re:Linux? (0)

Anonymous Coward | more than 6 years ago | (#23447048)

You might be retarded. You're talking about "on the desktop".

Re:Linux? (3, Funny)

MobileTatsu-NJG (946591) | more than 6 years ago | (#23446424)

What's linux?
oh!! You saw that IBM ad a few years ago, too!

Re:Linux? (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23446598)

jah, because it's gone from 0% to like nearly 1% in, er, the entire time it's existed! Wow! It's really making its presence felt!

Re:Linux? (1)

RiotingPacifist (1228016) | more than 6 years ago | (#23446694)

Why is this funny^H^H^H troll?

Looks like "Worse is Better" all over (5, Insightful)

paratiritis (1282164) | more than 6 years ago | (#23446236)

Worse is Better [dreamsongs.com] (also here [wikipedia.org] ) basically says that fast (and crappy) approaches dominate in fast-moving software, because they may produce crappy results, but they allow you to ship products first.

That's fine, but once you reach maturity you should be trying to do the "right thing" (the exact opposite.) And the Linux kernel has reached maturity for quite a while now.

I think Linus is right on this.

Re:Looks like "Worse is Better" all over (0)

93 Escort Wagon (326346) | more than 6 years ago | (#23447028)

And the Linux kernel has reached maturity for quite a while now.
Trying...not...to...morph...into...Grammar...Nazi...

(hey, it's the only way I could participate in this discussion).

Fascinating. (0, Funny)

Anonymous Coward | more than 6 years ago | (#23446238)

Let's be sure to get every thread from the Linux kernel mailing list on the front page.

Re:Fascinating. (4, Funny)

ResidntGeek (772730) | more than 6 years ago | (#23446270)

If this bores you, every lkml thread would cause your head to explode.

Re:Fascinating. (0)

Anonymous Coward | more than 6 years ago | (#23446282)

Perhaps that's why I don't subscribe then?

Re:Fascinating. (3, Insightful)

pla (258480) | more than 6 years ago | (#23446388)

If this bores you, every lkml thread would cause your head to explode.

Hey, I consider myself a code junky (and yes, even consider the issue of the BKL somewhat interesting), but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn.

Re:Fascinating. (3, Funny)

hostyle (773991) | more than 6 years ago | (#23446482)

Hey, it's not easy to keep every blade of grass within 0.3mm in length and maintain cross-colour length rules while keeping a close watch on weather-judged per-species expected length margins, you insensitive clod!

Keep off my lawn too, you pesky kernel hackers^WWkids ...

Re:Fascinating. (5, Insightful)

ResidntGeek (772730) | more than 6 years ago | (#23446812)

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else.

Re:Fascinating. (1)

joeman3429 (1288786) | more than 6 years ago | (#23446846)

it's called Digg

Re:Fascinating. (1)

tomhudson (43916) | more than 6 years ago | (#23446884)

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else.

s/somewhere else/Faux News/gi;

Fixed it for ya ;-)

Re:Fascinating. (0)

Anonymous Coward | more than 6 years ago | (#23447306)

This is only to weed out the true geeks from the fake geeks.

Geek wannabees: Move along, nothing interesting for you to see here.

Re:Fascinating. (2, Insightful)

IntlHarvester (11985) | more than 6 years ago | (#23447326)

Hey, I consider myself a code junky (and yes, even consider the issue of the BKL somewhat interesting),
but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn.
This topic is probably mainly of historical interest. (BKL used to be one of those bread-n-butter slashdot stories in the early days)

The funny thing is that the reply quality here is quite high for technical topics, but over time slashdot management has found that retarded political threads are much more popular.

Translation? (1)

Pazy (1169639) | more than 6 years ago | (#23446262)

Is there any chance someone who understands this can translate it a bit? I may be a nerd but I don't do much with kernels or much coding, and would really appreciate it if someone could simplify this a bit so I could understand it.

Re:Translation? (4, Funny)

kcbanner (929309) | more than 6 years ago | (#23446286)

It's like rubbing cheetah blood on the engine of your car to make it go faster.

Re:Translation? (1)

maxume (22995) | more than 6 years ago | (#23446304)

So it makes things cooler?

Re:Translation? (5, Informative)

Burdell (228580) | more than 6 years ago | (#23446340)

When the Linux kernel first supported multiprocessor systems, it was done with a single lock protecting access to all the kernel (the Big Kernel Lock); the kernel could still only do one thing at a time. Over time, most sections of the kernel have introduced their own fine-grained locking and moved out from under the BKL, allowing many parts of the kernel to be running at the same time on multiple processors. The BKL has shrunk over time, but it still exists over a chunk of the kernel. The kernel hackers recently tried to replace the hard lock with a preemptable lock, but that had some bad interactions with the scheduler (which determines what process/kernel thread runs when), so Linus switched back to the old-style BKL.

Now, a group is trying to see if it is possible to weed out all the remaining uses of the BKL and replace them with localized locking for specific sections of the kernel. This is tricky, as there are side-effects of the BKL that are not always obvious.
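A toy illustration of the difference between one big lock and fine-grained locks, in user-space Python rather than kernel C. The lock names are made up, and a non-blocking "try-lock" from the same thread stands in for a second CPU trying to get in.

```python
import threading

big_kernel_lock = threading.Lock()   # one lock guarding everything
fs_lock = threading.Lock()           # hypothetical per-subsystem locks
net_lock = threading.Lock()


def can_enter(lock):
    """Return True if `lock` is free right now (try-lock, then release)."""
    got_it = lock.acquire(blocking=False)
    if got_it:
        lock.release()
    return got_it


# Coarse: while "filesystem" code holds the big lock, "network" code must wait.
with big_kernel_lock:
    net_blocked_by_bkl = not can_enter(big_kernel_lock)

# Fine-grained: holding the filesystem lock leaves the network lock free.
with fs_lock:
    net_free_with_fine_locks = can_enter(net_lock)

print(net_blocked_by_bkl, net_free_with_fine_locks)
```

With the single lock, unrelated work is shut out for as long as anyone is inside; with separate locks, only code touching the same subsystem has to wait.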

Re:Translation? (1)

Pazy (1169639) | more than 6 years ago | (#23446358)

Thanks so much, looks like I've learned something today :D

Re:Translation? (1)

PeterKraus (1244558) | more than 6 years ago | (#23446382)

Thank you.

Wouldn't it be easier and mainly better to start all over? You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?

Re:Translation? (0)

Anonymous Coward | more than 6 years ago | (#23446432)

Wouldn't it be easier and mainly better to start all over?
You mean, hop in a time machine and go back to 1992?

Re:Translation? (0, Troll)

Hal_Porter (817932) | more than 6 years ago | (#23446746)

Windows NT was around before 1992 and it doesn't have a big kernel lock - individual resources are protected with spinlocks.

But that's because it was designed from the ground up to be SMP friendly, as opposed to being a clone of a 1970's operating system.

Re:Translation? (0)

Anonymous Coward | more than 6 years ago | (#23447164)

Which decade did VMS come from?

Re:Translation? (5, Informative)

Anonymous Coward | more than 6 years ago | (#23446548)

"That part of the code" is the difficult part. The BKL assumption is present in thousands of places all around the kernel, and nobody really knows where. You can have two pieces of code that look totally unrelated but happen to work only because the BKL is taken in every code path leading to them. Removing the BKL and "coding it all over again" would turn each such hidden dependency into a new race condition.

There would be thousands of them, and you'd probably never succeed in debugging them all.

The approach suggested in the article is to replace the BKL with a true lock, then "push it down", which means understanding WHY that code wants the BKL and taking smaller locks in subroutines instead.

For instance, one piece of code could take the BKL because it changes three data structures. You could then remove the BKL and use a finer-grained lock in each of the three parts of the code that change those structures.

By iterating this way, you should always get a somewhat working kernel, and slowly kill the BKL.
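The "three data structures" case could look roughly like this. It is a user-space Python sketch, not kernel code; the structure names (`inodes`, `routes`, `ttys`) and the functions are invented for the example.

```python
import threading

# Hypothetical structures that one BKL-protected routine used to update.
inodes, routes, ttys = {}, {}, {}

bkl = threading.Lock()


def update_all_coarse(key):
    """Before: one big lock is held across all three updates."""
    with bkl:
        inodes[key] = "inode"
        routes[key] = "route"
        ttys[key] = "tty"


# After: each structure gets its own lock, taken only around its own update.
inodes_lock = threading.Lock()
routes_lock = threading.Lock()
ttys_lock = threading.Lock()


def update_inodes(key):
    with inodes_lock:
        inodes[key] = "inode"


def update_routes(key):
    with routes_lock:
        routes[key] = "route"


def update_ttys(key):
    with ttys_lock:
        ttys[key] = "tty"


def update_all_fine(key):
    """Same end result, but no lock is held longer than one update."""
    update_inodes(key)
    update_routes(key)
    update_ttys(key)


workers = [threading.Thread(target=update_all_fine, args=(k,)) for k in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(inodes), len(routes), len(ttys))
```

The refactoring is only safe if nothing depended on the three updates being seen as one indivisible step. Finding that out for each caller is the "understanding WHY" part, and it is where the real work is.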

Re:Translation? (5, Informative)

LordNimon (85072) | more than 6 years ago | (#23446656)

Wouldn't it be easier and mainly better to start all over?

No.

You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?

It's not that simple. When it comes to locking, there is no "part of the code" that can be replaced. Locking governs interaction between two pieces of code, sometimes two pieces that are very different but have some small thing in common.

Besides, the kernel is too big to just start throwing parts of it out and redoing them from scratch. It's much better to make incremental improvements, because then the people working on them will actual learn how to solve the problem. The BKL is not just a coding problem, but also a people and project management problem.

Re:Translation? (0)

Anonymous Coward | more than 6 years ago | (#23446556)

Scheduler huh? Then it seems to me that this particular problem will go away when schedulers die.

In around 10 years we will have more processors than processes and threads, so each process will have its own private processor and no scheduling will be necessary (actually it will seldom be used, like HDD swap today with 4GB+ RAM). Think 100 to 1000 processors per machine.

Re:Translation? (3, Insightful)

tomhudson (43916) | more than 6 years ago | (#23446902)

Scheduler huh? Then it seems to me that this particular problem will go away when schedulers die.

In around 10 years we will have more processors than processes and threads, so each process will have its own private processor and no scheduling will be necessary (actually it will seldom be used, like HDD swap today with 4GB+ RAM). Think 100 to 1000 processors per machine.

Keep dreaming ...

With all those processors, you'll want to be saving energy, so you'll be aiming to turn off individual processors until needed, and run the remaining processors at full load, so you'll still need a scheduler, locks, etc.

And yes, it's possible even today to use up more than 4 gig of ram and have to hit swap.

Re:Translation? (1)

sticks_us (150624) | more than 6 years ago | (#23446794)

I didn't read all of the kerneltrap stuff, but isn't it true that the CONFIG_PREEMPT_RT [kernel.org] folks have made some good headway here?

I don't believe the -rt patch is expected to perform well as a drop-in replacement everywhere, or on systems that have management interrupts, but I think for people interested in real-time programming, the rt patch is a good starting point.

Re:Translation? (3, Interesting)

Anonymous Coward | more than 6 years ago | (#23446342)

Anytime you have more than one application running, they could get into an argument about who gets to use the serial port, the video display, memory, or drive storage. This is especially critical in multi-processor systems.

The answer is to allow sections of code to "lock" access for a brief duration -- "I'm working with this right now, don't anyone else touch it." Simple in theory, very difficult in practice.

Note that I'm speaking generically; I'm not an expert on the Linux kernel. Ideally, though, you want locks to be "granular" -- in other words, you only lock that specific hardware and/or portion of memory that you need exclusive access to. Apparently, the "big kernel lock" takes a brick wall and hammer approach, locking access (and claiming exclusive access during the lock, preventing anything from running). It's not granular.

If I'm wrong, someone else here can correct me. Like I said, I'm not an expert on the Linux kernel.

Re:Translation? (1)

Pazy (1169639) | more than 6 years ago | (#23446370)

Thanks for the help, always looking to learn new things.

Re:Translation? (4, Informative)

DarkOx (621550) | more than 6 years ago | (#23446658)

You're not wrong, and like you I am going to continue in an oversimplified style so the non-programmers can understand. The part you are leaving out is why you want your locking to be granular.

The granularity is important because you want other threads (jobs) to be able to get something done. At some point there is this thing called a scheduler that assigns your thread to execute, because every job needs a CPU. You get to work until your time allotment expires or you have to stop because something you need in order to continue is not available, because it's, say, "locked".

Think of this like working in a shop alongside someone else. You have one set of tools; you need a little screwdriver and a big one to do your work. The other guy needs the little screwdriver and a pair of pliers. You want him to put the screwdriver down while he is using the pliers so that you can use it if you need to. If he instead puts it in his breast pocket you are going to have to wait to finish your job until he finishes his. Even though it's your turn at the workbench (CPU) you can't do anything with it because you don't have what you need. So all you can do is yield the rest of your time to the other task, and hope he finishes up soon.

In the kernel world this really short-circuits the work of the scheduler. It might want to give time to other threads, and it will, but they are just going to turn around and give that time up, because whichever thread is holding the BKL is likely the only one that can actually do any work. As an end user this means something like data gets read from your network card OK but your sound keeps skipping.

The tricky part with more granular locks though is avoiding circular conditions, these can crash the system. Imagine: Job One needs resource A and B and has A locked, its waiting for B. Job Two needs B and C and has B locked and is waiting on C. Job Three needs A and C and has C locked and is waiting on A. Unless the system can detect this condition which is hard to do in many cases none of these threads will ever be able to run. The kernel contributors likely have some work ahead to eliminate the BKL and not cause these types of problems.
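The three-job example can be checked mechanically by following "who is waiting on whom" until it either ends or loops back. This is a simplified sketch that assumes each job waits for at most one lock; `find_cycle` is a made-up helper, not something the kernel provides under that name.

```python
def find_cycle(waits_for):
    """Return the jobs forming a circular wait, or None if there is none.

    `waits_for` maps each blocked job to the job holding the lock it wants.
    """
    for start in waits_for:
        seen = []
        job = start
        # Follow the chain until it ends or revisits a job.
        while job in waits_for and job not in seen:
            seen.append(job)
            job = waits_for[job]
        if job in seen:
            return seen[seen.index(job):]
    return None


# The scenario from the text: who holds what, and who wants what.
holder = {"A": "One", "B": "Two", "C": "Three"}
wants = {"One": "B", "Two": "C", "Three": "A"}

waits_for = {job: holder[resource] for job, resource in wants.items()}
print(find_cycle(waits_for))
```

Here every job ends up waiting on another job in the same ring, so none of them can ever run. Detection after the fact is the expensive route; a common way to avoid the situation in the first place is to make every job take its locks in one agreed order (always A before B before C), which makes a ring impossible.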

Re:Translation? (2, Interesting)

suck_burners_rice (1258684) | more than 6 years ago | (#23447256)

Since removing the BKL will cause deadlock situations like the one you describe, perhaps a solution to this problem is to re-think the way locking is implemented. If a program knows that it will need access to resources A, B, and C, it could put in a request to reserve all three of those resources simultaneously. If the three resources are available at that moment, they will all be locked simultaneously, the task will execute, and then they will be unlocked simultaneously. But if one or more of the resources are not available at that moment, that task will simply stop executing (it won't be scheduled) until the first instance that all three become available. This way, a resource doesn't become locked until it is actually going to be used.
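A minimal sketch of that all-or-nothing idea in user-space Python. The helper names are made up, and this is not how the kernel's locking works; it only shows the "take everything or take nothing" step.

```python
import threading


def try_acquire_all(locks):
    """Take every lock in `locks`, or none of them. Never blocks."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            # One resource is busy: back out so nothing stays locked.
            for held in reversed(taken):
                held.release()
            return False
    return True


def release_all(locks):
    """Release locks previously taken by try_acquire_all."""
    for lock in reversed(locks):
        lock.release()


a, b, c = threading.Lock(), threading.Lock(), threading.Lock()

b.acquire()                              # pretend another task is using B
first_try = try_acquire_all([a, b, c])   # fails, and A is left unlocked
a_left_free = not a.locked()
b.release()

second_try = try_acquire_all([a, b, c])  # now all three are available
release_all([a, b, c])
print(first_try, a_left_free, second_try)
```

This avoids the circular wait, because nobody sits on one resource while waiting for another. The costs are that a task has to know its full resource list up front, and a task needing many popular resources can be starved indefinitely by tasks that each need only one.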

Re:Translation? (5, Informative)

Anonymous Coward | more than 6 years ago | (#23446364)

Ok so here's the deal:
Linux is a preemptive multi-tasking kernel. What this means is that a hardware interrupt like a keyboard click or the system timer will interrupt whatever is currently running on the CPU, and an interrupt handler in the kernel starts running code. In order to make sure that all the states of the kernel are consistent (ie: not corrupt), the different parts of the kernel are supposed to lock the data that they are using or modifying (ie, readlock or writelock) in case another code path gets run at the same time trying to modify the same data. It becomes even more important in a multi-CPU environment, where taking a lock has to be atomic (a single indivisible step, so two CPUs can never both believe they hold it). So what you are supposed to do is only lock the resources you currently need (a file system driver would only lock parts of the filesystem, not a character device). Because some programmers are lazy, or not sure what they are doing, they just use the big kernel lock, which locks pretty much everything in the kernel. This is bad for multi-tasking and multi-processing because it means you can only have one codepath using the lock at a time.

Note: it's been a while since I've done kernel work, so I'm sure this is not 100% true, but hope it helps you understand.
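
A rough illustration of the difference described above between one big lock and per-resource locks, written in Python for readability (nothing here is kernel code; the lock names and `chardev_can_proceed` are made up for the example). It asks one question: while a filesystem code path holds its lock, can a character-device code path take the lock it needs?

```python
import threading

# Hypothetical locks: one coarse lock for everything, and two fine-grained ones.
big_kernel_lock = threading.Lock()
fs_lock = threading.Lock()
chardev_lock = threading.Lock()


def chardev_can_proceed(fs_path_lock, chardev_path_lock):
    """Return True if the char-device path could take its lock right now,
    while the filesystem path is still holding its own lock."""
    with fs_path_lock:
        got_it = chardev_path_lock.acquire(blocking=False)
        if got_it:
            chardev_path_lock.release()
        return got_it


def demo():
    # Both paths share the one big lock: the char device has to wait.
    coarse = chardev_can_proceed(big_kernel_lock, big_kernel_lock)
    # Each path has its own lock: the two do not interfere.
    fine = chardev_can_proceed(fs_lock, chardev_lock)
    return coarse, fine
```

With the shared lock the second path is blocked even though it touches unrelated data; with separate locks it is not. That is the whole argument for finer granularity, at the price of having to think about which lock protects what.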

Re:Translation? (1)

Pazy (1169639) | more than 6 years ago | (#23446378)

Whoa, I've learned more today than in college since August lol

Re:Translation? (2, Funny)

weicco (645927) | more than 6 years ago | (#23446738)

And you learn more when you write your own (virtual) device driver which crashes your kernel and renders it unbootable :)

Not that I know anyone who has done so... or at least I won't admit it!

Re:Translation? (1)

larien (5608) | more than 6 years ago | (#23446572)

Getting the locks right can make a huge difference in performance. The problem for the developer is that he has to lock enough to prevent race conditions and data corruption, but not so much so as to destroy performance. As you say, developers have taken the "easy" route of locking everything because they can.

There should be very few things which lock up an entire kernel, even for a few nanoseconds.

Re:Translation? (3, Funny)

93 Escort Wagon (326346) | more than 6 years ago | (#23447066)

Ok so here's the deal:
Linux is a preemptive multi-tasking kernel. What this means is that a hardware interrupt like a keyboard click or the system timer will interrupt whatever is currently running on the CPU, and an interrupt handler in the kernel starts running code. In order to make sure that all the states of the kernel are consistent (i.e. not corrupt), the different parts of the kernel are supposed to lock the data that they are using or modifying (i.e. take a read lock or write lock) in case another code path gets run at the same time trying to modify the same data. It becomes even more important in a multi-CPU environment where locks have to be atomic (happen at the same time on all CPUs). So what you are supposed to do is only lock the resources you currently need (a file system driver would only lock parts of the filesystem, not a character device). Because some programmers are lazy, or not sure what they are doing, they just use the big kernel lock, which locks pretty much everything in the kernel. This is bad for multi-tasking and multi-processing because it means you can only have one code path using the lock at a time.
Like putting too much air in a balloon!

Re:Translation? (1)

mikael (484) | more than 6 years ago | (#23446550)

The Big Kernel Lock was designed to only allow one CPU in a multiprocessor system to access the kernel at a time. This wasn't too bad for a two-core system, but it becomes a very big problem when many CPUs are in use. In a large SMP system (say a rack of 4-core Intel Xeons), all processes/threads on all 16+ CPUs could end up halting while waiting for one system call on one CPU to complete.

And each CPU might want access to the kernel for a completely different reason. One CPU might be wanting to access the hard disk drive, while another is making a shared-memory request, and yet another is sending data to the network card.

If the BKL is broken up and replaced by a lock for each subsystem, then the latency problem can be eliminated. Though there is a risk of deadlock whenever any CPU holds more than one lock at a time.
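
One common way to manage the deadlock risk mentioned above is to agree on a single global order for the locks and always take them in that order. The sketch below is illustrative Python, not kernel code; the subsystem names, `LOCK_ORDER`, `acquire_in_order` and `release_all` are all hypothetical.

```python
import threading

# Hypothetical per-subsystem locks with one agreed global order.
LOCK_ORDER = ["disk", "net", "shm"]
LOCKS = {name: threading.Lock() for name in LOCK_ORDER}


def acquire_in_order(names):
    """Take the requested locks sorted by the global order, whatever
    order the caller listed them in."""
    ordered = sorted(set(names), key=LOCK_ORDER.index)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered


def release_all(ordered):
    # Release in reverse order of acquisition.
    for name in reversed(ordered):
        LOCKS[name].release()


def run_demo(rounds=500):
    finished = []

    def worker(names):
        for _ in range(rounds):
            held = acquire_in_order(names)
            release_all(held)
        finished.append(tuple(names))

    # The two workers ask for the same locks in opposite orders, which is
    # the classic deadlock recipe if each took them in the order requested.
    threads = [
        threading.Thread(target=worker, args=(["disk", "net"],)),
        threading.Thread(target=worker, args=(["net", "disk"],)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)
```

Since every thread climbs the same ladder, no thread can hold a later lock while waiting for an earlier one, so a cycle of waiters cannot form. The hard part in a real kernel is getting every code path to agree on the order.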

Re:Translation? (1)

maxwell demon (590494) | more than 6 years ago | (#23447144)

But couldn't the deadlock risk be removed by adding a strategy where you have to acquire all your locks at the same time, with a single call?

That is, have one call, say

lock(resource1|resource2|resource3);
to lock the three resources at once, and disallow any further call to lock before unlock was called by the process. The lock call would wait until all three resources are unlocked by other threads, and then lock all three of them atomically. This would prevent a deadlock, because you could only lock a resource when you don't currently hold a lock.
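
A small sketch of what such a call could look like, in Python for illustration only. `MaskLock`, the `RESOURCE*` bit values, and the rule that a nested `lock()` raises an error are assumptions made for the example, not an existing API.

```python
import threading

# Hypothetical resource bits, combined with | as in the call above.
RESOURCE1, RESOURCE2, RESOURCE3 = 0b001, 0b010, 0b100


class MaskLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._locked = 0                 # bitmask of resources currently held
        self._mine = threading.local()   # what the calling thread holds

    def lock(self, mask):
        # Enforce the "no further lock before unlock" rule per thread.
        if getattr(self._mine, "mask", 0):
            raise RuntimeError("lock() called while already holding resources")
        with self._cond:
            # Wait until none of the requested bits is held, then take all.
            self._cond.wait_for(lambda: (self._locked & mask) == 0)
            self._locked |= mask
        self._mine.mask = mask

    def unlock(self):
        mask = getattr(self._mine, "mask", 0)
        with self._cond:
            self._locked &= ~mask
            self._cond.notify_all()
        self._mine.mask = 0

    def locked_mask(self):
        with self._cond:
            return self._locked


def demo():
    m = MaskLock()
    m.lock(RESOURCE1 | RESOURCE3)
    during = m.locked_mask()
    try:
        m.lock(RESOURCE2)      # second lock() before unlock() is refused
        nested_allowed = True
    except RuntimeError:
        nested_allowed = False
    m.unlock()
    return during, nested_allowed, m.locked_mask()
```

The catch is the same as with any all-at-once scheme: code that discovers halfway through that it needs one more resource has to drop everything and start over.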

Fundamental kernel structures such as this... (-1, Flamebait)

Viol8 (599362) | more than 6 years ago | (#23446326)

... should have been sorted out and locked down a long time ago; this isn't exactly cutting-edge technology. But then who am I kidding, look at the memory manager fun and games in early 2.6 versions.

Re:Fundamental kernel structures such as this... (0)

Anonymous Coward | more than 6 years ago | (#23446444)

You obviously don't know anything about Linux kernel development. So why bother giving your useless opinions on it? Seriously, do you think they are worth anything at all?

Re:Fundamental kernel structures such as this... (1)

turgid (580780) | more than 6 years ago | (#23447054)

He has a point. All of this stuff in Solaris, for example, was sorted out in Solaris 2.7 which came out well over a decade ago.

Linux is great, but its development is weird. Remember all the problems in 2.4 that didn't get sorted out until about 2.4.23? Then there's 2.6 which didn't become usable until 2.6.13 or so.

In my very humble opinion, there should be a 2.7.x development branch for these sorts of experiments. But, I'm not Linus, and I suppose I should write my own damned kernel instead of complaining.

Re:Fundamental kernel structures such as this... (1)

RiotingPacifist (1228016) | more than 6 years ago | (#23446576)

Yeah fixing this is sooooo easy, as a slashdot reader ofc I know how to do it better than those kernel mailing list noobs.

In fact I've got the code right here.
what you want to see it?
Oh look over there a flying car.

Re:Fundamental kernel structures such as this... (1)

rubycodez (864176) | more than 6 years ago | (#23446652)

No, finer-grained locking has been evolving in Linux for over five years (and similar efforts are under way in the BSDs). It will take years more work; there is nothing simple or obvious about it except to slashdot posters talking out of their ass.

Re:Fundamental kernel structures such as this... (1)

hxnwix (652290) | more than 6 years ago | (#23446744)

Fundamental kernel structures such as this... should have been... locked down a long time ago
Yeah, well that's the whole problem, isn't it? A long time ago, we had the Big Kernel Lock.

It was ALL LOCKED.

Now, we're trying to UNLOCK it. See? Locking semantics are tricky.

Re:Fundamental kernel structures such as this... (2, Interesting)

HiThere (15173) | more than 6 years ago | (#23447016)

I'll agree that this should have been sorted out long since. But it wasn't, and very few people thought that it was reasonable to expend time on something so obviously unreasonable. (Multiprocessors were things like Illiac IV, huge monsters that were utterly impractical.)

Time passes, technology changes, and now it's become urgent to deal with this, so now it's being dealt with.

One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!! The things I can think of divide neatly into two camps. 1) We don't know enough to even get started, and 2) It really seems utterly implausible, even given this example to work from. Unfortunately, somewhere in there is something that's being overlooked, and I don't know what. Kernel support for Actors? Kernel security to control Actors? Kernel support for Language parsing? They all seem implausible.

What is clearly needed soon is software that facilitates the use of multi-processor environments. Dataflow languages have promise, but there may be other reasonable choices. Possibly some interface that would easily allow different computer languages to work together, but that may be a real impossibility. Or even a language basically like C or C++, but extended with a "foreach" operator that allowed parallel execution of the loop body...but the language would need to be smart enough to tell what needed to be read locked and what needed to be write locked, and what could just be ignored. This implies that use of pointers is *severely* circumscribed! And if you're going to do that, you probably ought to have garbage collection. It might sound like I'm talking about Java, but that would be wrong. This language would need to be close to the metal, so it could adapt itself (at run time!!) to the local machine. And since we want as much efficiency as possible, virtual machines, interpreters, etc. are probably out.

I don't know of any language that meets the specs I've outlined, but I know of many languages that meet large parts of them. Of the languages I know, D (Digital Mars D) comes the closest, but it's totally missing even the parallelization that C/C++ have (as an add-on).

But that doesn't really say where the kernel should be going...except that possibly C isn't the best language to use for a multiprocessor environment. (But C is still the most efficient in most places, and it DOES have add-ons for parallelization...though whether you can use those add-ons in kernel programming isn't something I've investigated.)

Punchline (5, Informative)

Anonymous Coward | more than 6 years ago | (#23446422)

Since the summary doesn't cut to the chase, and the article was starting to get a little boring and watered-down, I read Ingo's post and here's what I got from it: the BKL is released in the scheduler, so a lot of code is written that grabs the lock and assumes it will be released later, which is bad. Giving it the usual lock behavior of having explicit release will break lots of code. Ingo created a new branch that does this necessary breakage so that the broken code can be detected and fixed. He wants people to test this "highly experimental" branch and report error messages and/or fixes.

Assuming everything is stable and correct, the next step is to break the BKL into locks with finer granularity so that the BKL can go the way of the dodo.
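
To illustrate the implicit-release behavior described above, here is a toy model in Python (illustration only; `ToyBKL` and its methods are hypothetical and are not the kernel's implementation). It mimics two properties of the BKL: it nests via a per-task depth counter, and it is dropped whenever the holder calls `schedule()` and retaken afterwards.

```python
import threading


class ToyBKL:
    """Toy model of a lock that nests and is dropped across schedule()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._task = threading.local()  # per-thread nesting depth

    def _depth(self):
        return getattr(self._task, "depth", 0)

    def lock_kernel(self):
        # Only the outermost call really takes the lock.
        if self._depth() == 0:
            self._lock.acquire()
        self._task.depth = self._depth() + 1

    def unlock_kernel(self):
        if self._depth() == 0:
            raise RuntimeError("unlock_kernel() without lock_kernel()")
        self._task.depth -= 1
        if self._task.depth == 0:
            self._lock.release()

    def schedule(self, while_asleep):
        # The lock is silently dropped while the task is "asleep" and
        # silently retaken before it continues.
        held = self._depth() > 0
        if held:
            self._lock.release()
        try:
            while_asleep()
        finally:
            if held:
                self._lock.acquire()

    def is_locked(self):
        return self._lock.locked()


def demo():
    bkl = ToyBKL()
    seen = []
    bkl.lock_kernel()
    bkl.lock_kernel()                   # nested acquisition
    seen.append(bkl.is_locked())        # held before sleeping
    bkl.schedule(lambda: seen.append(bkl.is_locked()))  # state during the sleep
    seen.append(bkl.is_locked())        # held again afterwards
    bkl.unlock_kernel()
    bkl.unlock_kernel()
    seen.append(bkl.is_locked())        # fully released
    return seen
```

The point of the toy is the second entry in `seen`: the caller never wrote an unlock, yet anyone else could have taken the lock and changed the protected data during the sleep. Code written around that assumption breaks once the lock gets ordinary explicit-release semantics.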

Interesting (0, Flamebait)

Anonymous Coward | more than 6 years ago | (#23446502)

For years Linux users have bashed *BSD with the Giant Lock stating that Linux had it removed years ago. It appears that Linux still has parts of their lock still present. The point here is that you shouldn't throw stones in glass houses.

PS: I am sure I will be marked as a troll. For the record; this is a point to stop the flame wars. Yes, netcraft has confirmed the Giant Lock.

Re:Interesting (4, Funny)

RiotingPacifist (1228016) | more than 6 years ago | (#23446590)

Yeah we got rid of the Giant lock, this is just the big lock, totally different things.

What does this mean for us mere mortals? (1)

analog_line (465182) | more than 6 years ago | (#23446518)

Will this affect anything I do if I am eventually given an option to install this kernel version? (Or am presented with a distro that has this kernel as the default?)

I know (or think I know) low latency is important for audio work, and I know people who do a lot of audio work under Linux. Should I be giving them a heads up to avoid upgrading their kernel until this gets fixed, or should I start looking for unofficial, special low latency versions of the kernel to recommend to them?

Re:What does this mean for us mere mortals? (1)

dziman (415307) | more than 6 years ago | (#23446600)

Test it out and let others know your results. Useful to you, your friends, and to the developing community.

It means pain for long term gain (1)

Sits (117492) | more than 6 years ago | (#23446640)

I imagine a kernel will come out that just uses the BKL far less (I don't think it will be a compilation option). There is a risk of instability (especially if you are using SMP/preemption) while overlooked code that needs locking is sorted out (this could lead to deadlocks or, in an extreme case, memory corruption). Over time this risk should decrease.

This work won't go into 2.6.26 (it's too late). It may not even go into 2.6.27 (it's been done outside of the mainline tree). This may mean that, until it is done, kernels from 2.6.25 may have better "worst case" latencies than the kernels that follow. Once it does go in, that kernel may have even better "worst case" latencies than 2.6.25.

For folks not doing audio recording (listening to your MP3s doesn't count) and without the need for hard realtime the worse latencies in 2.6.26 are too small to matter.

However, if you can afford to risk your machine, testing this experimental work will result in issues being found and fixed quicker and a better end result for people without old hardware that few people still have.

Re:What does this mean for us mere mortals? (0)

Anonymous Coward | more than 6 years ago | (#23446676)

If Ingo's post is any indication, this new branch is going to be very unstable at first. That said, you can test it out and report problems. Of course, be prepared for it to hang a lot. And you should back up your data, etc. (So probably not on a production system...)

I would expect that if this branch is successful, we should be seeing it merged back eventually.

Re:What does this mean for us mere mortals? (1)

ttldkns (737309) | more than 6 years ago | (#23446708)

Unless you are in a position where you compile your own kernel every few weeks, this won't affect you in any noticeable way.

I'm sure all the major distros will be watching what's happening here and making sure it doesn't affect their end users. From what I've read, the changes mean better performance at the expense of latency, but they're working to eliminate the BKL altogether.

By the time your distro's next kernel upgrade comes around, I'm sure none of this will matter. :)

Re:What does this mean for us mere mortals? (1)

eh2o (471262) | more than 6 years ago | (#23446756)

AFAIK you already need a patched or at least properly configured kernel to get really good latency response. Planet CCRMA and the Debian Multimedia Project are two distros that come with LL kernels. Obviously these distros will stay on older versions until the BKL situation is resolved.

Re:What does this mean for us mere mortals? (1)

LVSlushdat (854194) | more than 6 years ago | (#23446914)

Don't forget the UbuntuStudio derivative.. Been doing a RT kernel option since 7.04, to my knowledge..

Re:What does this mean for us mere mortals? (1)

Alex Belits (437) | more than 6 years ago | (#23446786)

Unless they run real-time audio processing applications, it won't matter.

If they DO run real-time audio processing applications, they most likely have their kernel specifically configured for it, and won't get updates until developers make sure that no additional latency has been introduced.

Keep these on Front Page... (5, Insightful)

TheNetAvenger (624455) | more than 6 years ago | (#23446566)

Keep these on Front Page...

This is the type of stuff that needs to be kept in the news, as many of the people who post here have no understanding of it, and the ones that do have the opportunity to explain it, bringing everyone up to a better level of understanding.

Maybe if we did this, real discussions about the designs and benefits of all technologies could be debated and referenced accurately. Or even, dare I say, people won't go ape when someone refers to a good aspect of NT's kernel design.

Re:Keep these on Front Page... (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23446836)

"Maybe if we did this, real discussions about the designs and benefits of all technologies could be debated and referenced accurately."

Maaaaaaaaaaahahahahaa. Are you for real?

You seriously think knowledge of preemptive locking mechanisms in multi-tasking kernel architectures will make people educated in decisions about global warming, the prospects of space travel or the privacy implications of the digital society?

Wake up, propeller boy.

Mod parent up (0, Redundant)

HiThere (15173) | more than 6 years ago | (#23447196)

Mod parent up

Is this story in English? (0, Redundant)

Evildonald (983517) | more than 6 years ago | (#23446610)

No really. Is it? BK-Whatsit?

Re:Is this story in English? (1)

davolfman (1245316) | more than 6 years ago | (#23446814)

I think it has something to do with kernel multi-threading. Beyond that my head starts to hurt.

Sounds like the Linux kernel needs some tests... (2, Informative)

Crazy Taco (1083423) | more than 6 years ago | (#23446692)

He noted that the various dependencies of the lock are lost in the haze of 15 years of code changes, "all this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong."

Wow. It sounds like it's about time someone on the kernel team reads Working Effectively With Legacy Code [amazon.com] by Michael Feathers.

I'm a software developer on a very large project myself, and this book has absolutely revolutionized what I do. Having things break silently in the kernel is a sure sign that dependency problems in the code exist, and most of this book is about ways to break dependencies effectively and get code under test. And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easily checked for correctness when a change is made, can fail silently easily, and can't be understood as easily.

That's what this book is about, and if things in the kernel have deteriorated to such a state then they need to swallow their pride and take a look at resources designed to cope with this. I know they are all uber-coders in many respects, but everyone has something they can improve on, and from the description they give of their own code, this is their area for improving.

Re:Sounds like the Linux kernel needs some tests.. (4, Interesting)

Alex Belits (437) | more than 6 years ago | (#23446858)

And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easily checked for correctness when a change is made, can fail silently easily, and can't be understood as easily.
On the other hand, code WITH tests also can't be easily checked for correctness when a change is made. There is only a very small scope of possible mistakes that a test can detect, and if you try to make a test verify everything, the test will grow larger (and buggier, and more incomprehensible) than your code. It's also possible that the intended behavior of the code and the expected behavior that the test checks for diverge because of some changed interface. Tests help with detection of obvious breakage, but you can never rely on anything just because it passed them.

In other words:

TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.

Re:Sounds like the Linux kernel needs some tests.. (1)

gilesjuk (604902) | more than 6 years ago | (#23446910)

You do wonder if they need some proper test strategy to test regression etc..

Also, I wonder if the Linux kernel can carry on expanding or if it's time for the form of the kernel to change.

I know people like the monolithic kernel, but lack of change does not promote new techniques. Doesn't have to be a microkernel or have to fit in any existing box.

Re:Sounds like the Linux kernel needs some tests.. (5, Insightful)

pherthyl (445706) | more than 6 years ago | (#23446912)

Whatever your large project is, I'm willing to bet it's nowhere near as complex as the kernel. Whenever you get the feeling that they must have missed something that seems obvious, you're probably the one that's wrong. No offense, but they have a lot more experience dealing with unique kernel issues than you do.

You talk about unit testing, but how exactly are you going to unit test multi-threading issues? This is not some simple problem that you can run a test/fail test against. These kinds of things can really only be tested by analysis to prove it can't fail, or extensive fuzz testing to get it "good enough"..

Tough to test drivers for hardware you don't have (4, Informative)

Sits (117492) | more than 6 years ago | (#23446930)

It's hard to test whether you've broken a driver when you don't have the hardware to test with. Perhaps the future will be Qemu emulation of all the different hardware in your system : )

This is not to say there shouldn't be tests for things that can be caught at compile time or run time regardless of hardware, but there is only so far you can take it.

It's not like the kernel doesn't have any testing done on it though. There's the Linux Test Project [sourceforge.net] which seems to test new kernels nightly. If you ever look in the kernel hacking menu of the kernel configuration you will see tests ranging from Ingo Molnar's lock dependency tester [mjmwired.net] (which checks to see locks are taken in the right order at run time), memory poisoning, spurious IRQ at un/registration time, rcu torture testing, softlockup testing, stack overflow checking, marking parts of the kernel readonly, changing page attributes every 30 seconds... Couple that with people like Coverity [coverity.com] reporting static analysis checks on the code. Tools like sparse [kernel.org] have been developed to try and do some of the static checks on kernel developer machines while they are building the code.
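
To give a feel for what a run-time lock-order check does, here is a deliberately tiny toy in Python. It is not how the kernel's lock dependency tester is implemented; `LockOrderChecker` and the lock names are invented for the example, and it only catches direct two-lock inversions in a single thread of execution, not longer cycles.

```python
class LockOrderChecker:
    """Toy checker: remembers which lock was held while another was taken
    and complains when the opposite order shows up later."""

    def __init__(self):
        self.order = set()  # pairs (held_first, taken_second) seen so far
        self.held = []      # locks currently held, in acquisition order

    def acquire(self, name):
        # Check before recording anything, so a failed acquire leaves
        # the recorded ordering untouched.
        for h in self.held:
            if (name, h) in self.order:
                raise RuntimeError(
                    f"lock order inversion: taking {name} while holding {h}, "
                    f"but {name} was held before {h} earlier"
                )
        for h in self.held:
            self.order.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def demo():
    checker = LockOrderChecker()
    # First code path: inode, then tty.
    checker.acquire("inode")
    checker.acquire("tty")
    checker.release("tty")
    checker.release("inode")
    # Second code path: tty, then inode. No deadlock happens in this run,
    # but the two paths could deadlock if they ever ran concurrently.
    checker.acquire("tty")
    try:
        checker.acquire("inode")
    except RuntimeError as err:
        return str(err)
    return None
```

The useful property is that the problem is reported the first time the conflicting order is observed, even though nothing actually hung, so a rare timing-dependent deadlock turns into a repeatable warning.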

But this is not enough. Bugs STILL get through and there are still no go areas of code. If you've got the skills to write tests for the Linux kernel PLEASE do! Even having more people testing and reporting issues with the latest releases of the kernel would also help. It's only going to get more buggy without help...

This is why monolithic kernels do real-time badly (5, Insightful)

Animats (122034) | more than 6 years ago | (#23446928)

This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch.

This is where microkernels win. When almost everything is in a user process, you don't have this problem.

Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs. All these operations either have a hard upper bound in how long they can take (a few microseconds), or are preemptable. Real time engineers run tests where interrupts are triggered at some huge rate from an external oscillator, and when the high priority process handling the interrupt gets control, it sends a signal to an output port. The time delay between the events is recorded with a logic analyzer. You can do this with QNX while running a background load, and you won't see unexpected delays. Preemption really works. I've seen complaints because one in a billion interrupts was delayed 12 microseconds, and that problem was quickly fixed.

As the number of CPUs increases, microkernels may win out. Locking contention becomes more of a problem for spinlock-based systems as the number of CPUs increases. You have to work really hard to fix this in monolithic kernels, and any badly coded driver can make overall system latency worse.
