
Too Perfect a Mirror

timothy posted about a year and a half ago | from the computer-behaving-badly dept.

KDE

Carewolf writes "Jeff Mitchell writes on his blog about what almost became 'The Great KDE Disaster Of 2013.' It all started as a simple update of the root git server and ended with a corrupt git repository being automatically mirrored to every mirror, deleting every copy of most KDE repositories. It ends by discussing the problem with git --mirror and how you can avoid similar problems in the future."


Learn how your tool works? (5, Insightful)

gweihir (88907) | about a year and a half ago | (#43262511)

Preferably before using them? This sounds very much like plain old incompetence, possibly coupled with plain old arrogance. Thinking that using a version control system absolves one from making backups is just plain old stupid. Then again, given what I have seen from the KDE project, that would be consistent.

Re:Learn how your tool works? (5, Insightful)

maxwell demon (590494) | about a year and a half ago | (#43262559)

Also, mirrors are not backups. Mirrors are intended to be identical to the original, so mirroring worked as expected. How should the software know that the removal of most repositories was not intentional?

Re:Learn how your tool works? (0)

gweihir (88907) | about a year and a half ago | (#43262613)

Indeed. Mirrors, RAID, version control, all are _not_ backups. Anybody halfway competent knows that. The detailed analysis just shows these people had no clue. Well, maybe they will be a bit more careful and professional now.

Re:Learn how your tool works? (0)

Anonymous Coward | about a year and a half ago | (#43263061)

The file system became corrupted. Is it too much to ask that a mirror doesn't automatically copy damaged files? Shouldn't this be the simplest type of corruption to prevent?

Re:Learn how your tool works? (4, Insightful)

gweihir (88907) | about a year and a half ago | (#43263251)

Yes, it is too much. How would the mirror operation ever know without full checks on everything? Quit asking for nanny-software that treats its users as incompetent and illiterate. Is it too much to ask for the admins to actually have a brief look at the description of the operation they are using as their primary redundancy mechanism? I don't think so. If they had done this very basic step, they would have known to run a repository check before mirroring. If they had any real IT knowledge, they would have known that mirrors are not backups and that you need backups in addition.

Also, what I gather from their grossly incomplete "analysis" is that they had a file that read back differently on multiple reads (not sure; they seem not to have checked that), which is not a filesystem corruption (the OS checks for that on access, to some degree) but a hardware fault. Filesystems and application software routinely do not check for that. It is one of the reasons to always do a full data compare when making a backup.
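
(A full data compare of that kind is cheap to script; a minimal sketch, with the hypothetical paths /srv/git for the original and /backup/git for the copy:)

    # read every file on both sides and compare byte-for-byte; a block that
    # reads back differently on a second pass has a chance of surfacing here
    diff -r /srv/git /backup/git && echo "backup verified"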

Re:Learn how your tool works? (0)

Anonymous Coward | about a year and a half ago | (#43263571)

We are talking about the mirror. There is more than one screwup here. The failure to actually have a backup is the first. The failure to run an intelligent mirror is the second.

How would the mirror operation ever know without full checks on everything?

Is a checksum too much to ask for (after each update)? Oh wait, git already does this. So all a mirror has to do is check it. Why shouldn't they have that functionality?
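
(The check in question is presumably git fsck, which walks the object database and verifies each object's SHA-1 against its content; run inside any repository:)

    # verify connectivity and validity of all objects in the repository
    git fsck --full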

Re:Learn how your tool works? (1)

Artifakt (700173) | about a year and a half ago | (#43262975)

You got in quick with a valid point, and completely shot yourself down with unsupported opinions. Why? Why say, in effect, "This is a provably avoidable mistake, and now I'm going to throw around vague hints of some totally unspecified complaint list, full of sound and fury but signifying nothing in particular," and so make everyone ignore the part that is both a defensible point and the only point actually pertinent to the article? Why shoot yourself in the foot like that?

Re:Learn how your tool works? (0)

gweihir (88907) | about a year and a half ago | (#43263143)

Huh? Since when do opinions need support? But I admit that from time to time I like to just check the effects of such statements. I have been at the karma cap for about 10 years now (except the one time when I kept pointing out their obvious reasoning errors to a bunch of religious nuts and lost all 50 karma points in a single thread), so there is really not a lot I will lose.

I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history. It is also relevant and related, as this incident indicates the opinion I have gotten from the outside was spot-on for this aspect of the project, making the assumption it is pervasive more likely.

You may also have noticed that some person just saying "make backups" got modded down into oblivion. Sometimes the only thing that /. moderation does is show that many, many people with moderation points are fundamentally stupid.

Re:Learn how your tool works? (4, Interesting)

vurian (645456) | about a year and a half ago | (#43263759)

"I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history." Actually, if you would investigate the history of the KDE sysadmin team you would find out that this handful of volunteers are doing a job that many full-time, well-funded sysadmins cannot rival. And.. Anyone who talks about "the KDE team" as if it's a single, monolithic entity doesn't know what they're talking about.

Should have gone with windows.. (0)

Anonymous Coward | about a year and a half ago | (#43263395)

Doesn't everyone know about the file system corruption that so often happens on Linux ext4-formatted systems?

Oh well.. I guess the neckbeards have been successful in blaming the victims instead of ext4 devs.

Not git related (5, Insightful)

Rob Kaper (5960) | about a year and a half ago | (#43262513)

This is not a problem with git --mirror: rsync or any other mirroring tool would end up in the same situation.

It's up to the master to deliver the goods and upgrading a master should include performing a test run as well as making a backup prior to the real upgrade. This was a procedural failure, not a software failure. But good to hear disaster was averted.

Re:Not git related (0)

Anonymous Coward | about a year and a half ago | (#43262567)

If emacs devs had that attitude they would never have implemented autosave.

It should be made hard to well and truly destory datasets.

Re:Not git related (1)

garyebickford (222422) | about a year and a half ago | (#43262773)

'destory' - I like that. It should be a word. :) Like the opposite of history? The removal of one's history. In fact, it applies rather well in this case. "The repository was destoried."

Re:Not git related (0)

Anonymous Coward | about a year and a half ago | (#43262807)

I pulled a Homer!

Re:Not git related (3, Insightful)

Carewolf (581105) | about a year and a half ago | (#43262595)

True, but git does have a mechanism for checking integrity, and the discussion here is about where you should use the fast git --mirror, which has no checks, and where the slower mechanism that does check fits in.

Re:Not git related (3, Interesting)

gweihir (88907) | about a year and a half ago | (#43262681)

You can --mirror any time. If you actually have backups, not just mirrors and hope.

Re:Not git related (2)

gweihir (88907) | about a year and a half ago | (#43262647)

Indeed. Git is blameless here. Git is also not a backup tool; you need backups in addition, just for cases like this one.

No Git also failed (5, Informative)

Anonymous Coward | about a year and a half ago | (#43262699)

The files were corrupted, and Git didn't report squat about the problems. The sync got different versions each time. Sure, there are two layers of failure here, but one of them certainly is Git.

What he's saying is simple: Torvalds' comment is not completely true:
"If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums."

He's saying that if the commits are corrupted:
"If a commit object is corrupt, you can still make a mirror clone of the repository without any complaints (and with an exit code of zero). Attempting to walk the tree at this point will eventually error out at the corrupt commit. However, there’s an important caveat: it will error out only if you’re walking a path on the tree that contains that commit. "

So there's clearly room for improvement. Sure, the original fault was a corrupt file, but the second layer of protection, Git's checking, ALSO FAILED. Denial isn't helpful here; Git should also be fixed.
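
(One mitigation, for what it's worth: since git fsck does walk all objects, a mirror clone can at least be verified after the fact. A sketch, with a hypothetical repository URL:)

    # clone --mirror can exit 0 even when the source holds a corrupt commit
    # object, so check the result explicitly before trusting it
    git clone --mirror git://example.org/project.git project.git
    cd project.git
    git fsck --full || echo "corrupt objects found in mirror"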

Re:No Git also failed (3, Insightful)

gweihir (88907) | about a year and a half ago | (#43262733)

Well, so this was _not_ a git failure, as there was an explicit warning that it does not cover this case. Not the fault of git, but of those who did not bother to find out. That a "mirror" operation does not check the repository is also no surprise at all.

Incidentally, even if git had failed, that is why you have independent and verified backups. A competently designed and managed system can survive the failure of any one component.

But it is SUPPOSED to (2, Insightful)

Anonymous Coward | about a year and a half ago | (#43262791)

"Not the fault of git but those that did not bother to find out"

No, Git has the integrity check, the integrity check didn't work. If the integrity check had worked as claimed then their backups were solid.

I know people are saying "keep backups", but they're really missing the point. A backup is a copy of something, the more up to date the better, better still if it keeps a historic set of backups. Perhaps with some sort of software to minimize the size, perhaps only keep changes..... you can see where I'm going with this.

Git sync to a lot of drives IS A BACKUP. It is exactly what an ideal backup should be, historic, up to date, minimizes storage. What is that system if it isn't an automatic backup!

Except for this bug, which needs to be fixed; and a little less faith in git would also be a good thing.

It's really no different than if you used backup software that made careful backups and kept historic copies, and then one day your disk got corrupted, and you promptly went to your backups only to find the backup software had been chomping them, because it didn't notice the corruption and had happily been corrupting the backups it was keeping.

So I see comments saying they didn't have backups, OMG! But no, their problem was they only used ONE TYPE OF BACKUP SOFTWARE: Git sync. I bet all of you use only ONE type of backup software and are equally vulnerable to this failure.

Re:But it is SUPPOSED to (4, Informative)

gweihir (88907) | about a year and a half ago | (#43262883)

Git does not have the magic "integrity check" on making mirrors. If they had bothered to look at the documentation they would have known. If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off in a fast mirror operation. If they had been even a bit careful, they would have checked the documentation and known. They failed in every way possible.

Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly.

And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up. Stop spreading nonsense.

So don't trust git (0, Troll)

Anonymous Coward | about a year and a half ago | (#43263007)

"Git does not have the magic "integrity check" on making mirrors"

Right, so, it returns OK (0), yet the commit may be corrupt, it hasn't walked the full tree, and it may corrupt all copies. Good job you warned me about this flaw! I know to stick with p4s!

"Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly."

Your backup tool is taking backups of the corrupt archive. Keep it independent or not, it's corrupted when you come back to it.

The lesson here is not to trust one piece of software.

Re:But it is SUPPOSED to (0)

vurian (645456) | about a year and a half ago | (#43263463)

Stop defending the tool. The tool is shit. Start praising the KDE sysadmins who are volunteers, all of them, and who are doing their job better than any professional sysadmin I've ever seen.

Re:But it is SUPPOSED to (2, Interesting)

Anonymous Coward | about a year and a half ago | (#43263517)

Git does not have the magic "integrity check" on making mirrors.

Why on earth not?

If they had bothered to look at the documentation they would have known.

There's no mention of this in any of the git-clone, git-push, git-pull or git-fetch man pages on my system, at least not near any instance of the word "mirror".

If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off in a fast mirror operation.

Why? The point of the mirror option (at least as far as the documentation mentions) is to propagate all branch additions/deletions/forced updates automatically, not to make it fast. Git is advertised as having strong integrity checking as a feature, so why would you assume that would ever be turned off, except maybe with an explicit --no-check-hashes option?

If they had been even a bit careful, they would have checked the documentation and known. [...] This is correct and documented behavior.

Not documented in any of the obvious places to look, at least. Maybe if they'd bothered to read literally the entire Git documentation they might have found a mention of this somewhere, but reading the entire documentation every time you start using a new option just in case there might be some special non-obvious caveat goes way beyond "even a bit careful".

And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up.

The whole point of the mirrors is that they're not the same system as the original.

Re:No Git also failed (2)

jankoh (2547488) | about a year and a half ago | (#43262781)

Did you read the whole article?
Even the part about "git fsck"?
I assume it was a design choice of Linus NOT to run fsck each time when performing, say, a mirror.
Anyway, you can adjust your sync scripts to include the fsck and carry on.
(Or better yet, run git fsck after each filesystem fsck???)
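
(Something like the following, presumably; a rough sketch of an fsck-guarded sync, where the path and the "mirror" remote name are made up:)

    #!/bin/sh
    # refuse to propagate the repository unless its object database checks out
    cd /srv/git/project.git || exit 1
    if git fsck --full; then
        git push --mirror mirror
    else
        echo "git fsck failed; sync to mirrors aborted" >&2
        exit 1
    fi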

Re:No Git also failed (2)

gweihir (88907) | about a year and a half ago | (#43262811)

Indeed. And it is absolutely no surprise that a fast mirror operation does not do a full consistency and data check. The most you can expect is a check whether data was copied correctly, and even for that you should check the documentation to make sure.

Also, not knowing that backups are both mandatory and not somehow "automagically" done is basic IT operations knowledge. These people did not bother to find out and now blame git, when it is only their own lack of skill they have to blame.

And the other thing (0)

Anonymous Coward | about a year and a half ago | (#43262865)

"Anyway, you can adjust just your sync scripts to include the fsck and carry on."

And what if the corruption occurs after the fsck and before the sync?
The git sync shouldn't return OK if a commit object is corrupt. It's a bug, it needs to be fixed, no big deal, and there's no reason to defend a simple bug as though it's a feature! Adding an fsck call is a temporary workaround, but for solid faith in git this needs to be fixed.

But I also think a healthy lack of faith in backup software (even if git's making the backup) is important. How many of those nightly backups could be silently corrupted by a bug in the backup software! Your disk fails, you try the backups and ...

Re:No Git also failed (1)

garyebickford (222422) | about a year and a half ago | (#43262787)

I recently read an article (sorry, don't recall where) that said that Git was a 'functional data structure' akin to a functional programming language, and that was why it was so reliable.

Re:Not git related (0)

Anonymous Coward | about a year and a half ago | (#43262815)

I agree. I'm more familiar with Windows, but my thinking here is screw the mirrors.

What you really need is to make sure that your primary backups are not corrupted so that a restore is possible. Make a backup copy on a second system and run consistency checks before backing it up. That way you aren't affecting the primary system with a resource intensive process and you make sure that you have a good backup to restore from. You can always rebuild the mirrors after the restore process.

Re:Not git related (0)

Anonymous Coward | about a year and a half ago | (#43262905)

The main function of version control IS backup, and git has failed that task.

Re:Not git related (2)

qbast (1265706) | about a year and a half ago | (#43263589)

No, the main function of version control is ... version control.

Re:Not git related (1)

sadboyzz (1190877) | about a year and a half ago | (#43263471)

Yes. But silent data corruption is obviously a problem of the filesystem, ext4 in this case. Too bad btrfs is still years from stable.

The 'K' stands for ... (4, Funny)

Anonymous Coward | about a year and a half ago | (#43262517)

You know, calling it a disaster really depends on your point of view.

Re:The 'K' stands for ... (1)

gbjbaanb (229885) | about a year and a half ago | (#43263147)

It nearly was. If KDE disappeared completely we'd all have to use Gnome... which would be a true definition of the word.

RAID is not a backup (0)

Anonymous Coward | about a year and a half ago | (#43262527)

Neither are online mirrors.

A thousand times. (Unless online mirrors roll back (1)

raymorris (2726007) | about a year and a half ago | (#43262591)

A thousand times this. Say it with me - a mirror is not a backup. A RAID mirror is not a backup, a cluster mirror is not a backup, and a git mirror is not a backup.

Unless of course the mirroring system integrates rollback to earlier mirrors, something like Clonebox for example.

Re:A thousand times. (Unless online mirrors roll b (1)

Anonymous Coward | about a year and a half ago | (#43262621)

Rollbacks are also not backups.
Practice what you preach.

Re:A thousand times. (Unless online mirrors roll b (0)

Anonymous Coward | about a year and a half ago | (#43262639)

Common sense would dictate that git manages its own backup automatically anyways, so you don't need additional ones. Well, that didn't work out that great in this case.

Re:A thousand times. (Unless online mirrors roll b (3, Insightful)

gweihir (88907) | about a year and a half ago | (#43262701)

No. Backup is out of scope for version control. Anybody with actual common sense would not expect it to make backups "magically" by itself, and would check to make sure. Then they would implement backups. But that does actually require said common sense.

Re:A thousand times. (Unless online mirrors roll b (1, Insightful)

Antique Geekmeister (740220) | about a year and a half ago | (#43262939)

May I respectfully disagree? I've often seen such focus on what is "out of scope" used to limit cost and to limit the "turf" on which an employer or contractor needs access. But backup is _certainly_ a critical part of source control, just as security is. The ability to replicate a working source control system to other hardware or environments due to failure or corruption of the primary server is critical to any critical source tree. Calling it "out of scope" is like calling security "out of scope". By ignoring the consequences at the design stages of a source control system, very real risks are often taken without even thinking of the possible consequences, and the resources necessary to provide such critical features later can, and often do, multiply the cost of a project in unexpected ways.

A nightly mirror on low-cost hardware with snapshot capability, for example, can provide very useful fallback capability. Even hardlink-based software snapshots can work well. It requires thought to configure correctly, to schedule the mirrors and make sure they don't conflict with other high-bandwidth operations such as tape backup, and to handle "churn" diskspace requirements. And I've had some very good success with partners and clients who took such modest backup tools and saved enormous cost on high-speed tape backup systems and high-bandwidth connections for remote mirroring facilities, or who had difficulties meeting very short backup windows, by using the mirror, or the snapshots, to do the tape backups for archival. It does inject a phase delay into the tape backups, and recovery from tape has to be tested, but it's been extremely effective.

Several times, I've found that the problem is a political one. The backup system is often a very expensive, high performance capital cost, or some kind of proprietary "turf" of a manager who is very comfortable with and enamored of it, and they're concerned that adding this layer will make them look foolish for spending the money, or cost them their job as a proprietary owner of critical infrastructure. They already had the political battle purchasing the hardware in the first place and don't care to rehash their previous work. But it's often amazing what staging the backups this way can do for performance and user access to their backed up data. Most restoration cases are due to accidental file deletion or editing, and the users no longer need access to the tape backup system or off-site archival, and only to the snapshots which have read-only access with the same privileges as the original source material.

Re:A thousand times. (Unless online mirrors roll b (1)

Anonymous Coward | about a year and a half ago | (#43263055)

May I respectfully disagree? I've often seen such focus on what is "out of scope" used to limit cost and to limit the "turf" on which an employer or contractor needs access. But backup is _certainly_ a critical part of source control, just as security is. The ability to replicate a working source control system to other hardware or environments due to failure or corruption of the primary server is critical to any critical source tree. Calling it "out of scope" is like calling security "out of scope". By ignoring the consequences at the design stages of a source control system, very real risks are often taken without even thinking of the possible consequences, and the resources necessary to provide such critical features later can, and often do, multiply the cost of a project in unexpected ways.

THIS.

But while we're at it - from TFA: "The root of both bugs was a design flaw: the decision that git.kde.org was always to be considered the trusted, canonical source. The rationale behind this decision is relatively obvious; it's a locked-down, authenticated resource that runs customized hooks to validate the code being pushed to it. It's perfectly reasonable to decide that it should be considered to be correct."

Several times, I've found that the problem is a political one. The backup system is often a very expensive, high performance capital cost, or some kind of proprietary "turf" of a manager who is very comfortable with and enamored of it, and they're concerned that adding this layer will make them look foolish for spending the money, or cost them their job as a proprietary owner of critical infrastructure. They already had the political battle purchasing the hardware in the first place and don't care to rehash their previous work. But it's often amazing what staging the backups this way can do for performance and user access to their backed up data. Most restoration cases are due to accidental file deletion or editing, and the users no longer need access to the tape backup system or off-site archival, and only to the snapshots which have read-only access with the same privileges as the original source material.

If, at the end of the day, we do what TFA suggests, and propose that one machine be considered "the" authoritative centralized source, we've just given the backup-dude/sysadmin his job back.

The elephant in the room here is back in that section of TFA that refers to "the trusted, canonical source."

Congratulations, now that you've migrated from git, you discover you still need something that functions as the "centralized" part of a centralized version control system. There are many reasons to argue for DVCS over centralized, but eliminating big iron central server and the concept of backups "because the source is on everybody's laptops!" isn't one of them.

Re:A thousand times. (Unless online mirrors roll b (1)

gweihir (88907) | about a year and a half ago | (#43263295)

There are many reasons to argue for DVCS over centralized, but eliminating big iron central server and the concept of backups "because the source is on everybody's laptops!" isn't one of them.

Well, sort of. If they had done full repo updates on the "mirrors", this issue would likely not have happened. The core problem was that they did el-cheapo mirroring without understanding what the consequences are. They would still have to do full checkouts and detach them afterwards to make them proper backups; after all, the git software could have flaws. So while it does not need to be a "big iron central server", setting up several systems specifically doing backups is non-optional. In a sense they will be "central" systems then.

Re:A thousand times. (Unless online mirrors roll b (4, Informative)

gweihir (88907) | about a year and a half ago | (#43263057)

I believe you are not talking about backup. A backup allows system recovery after a disaster and cannot ever be stored in the system itself. What you are talking about is availability improvement. That _can_ be part of the primary system. RAID, for example, exclusively serves this purpose (except RAID0). But backups must also protect against user and administrator error, software errors, the data-center burning down, sabotage, etc.

Replication is not the tool for that. The problem is that any data copy that is part of the system itself can be corrupted by the system, as the system still has access to it. That is why a backup must be removed from the system so it is independent, and must allow full reconstruction even if the original system is completely destroyed.

Now, improving uptime and reducing downtimes is important, but it is not what a backup does. A backup makes sure you do not lose your data permanently. What uptime improvement does is to make it less likely that you need to go back to the backup.

Or to put it differently: backup is for Disaster Recovery. Uptime improvement reduces DR cost by reducing the probability of DR becoming necessary, and it reduces downtime cost.

I do agree with the political angle, though.

Re:A thousand times. (Unless online mirrors roll b (1)

gweihir (88907) | about a year and a half ago | (#43263079)

Oh, and I should say that backup is very much in scope for a version control system installation! (We do nightly full and hourly incremental backups, for example.) It is just not in scope for the version control system software itself, as it solves a different problem.

Re:A thousand times. (Unless online mirrors roll b (0)

Anonymous Coward | about a year and a half ago | (#43263275)

But backup is _certainly_ a critical part of source control, just as security is.

Interesting example, given that git also doesn't do security or authentication (hence the need for gitolite)

It was, shall we say, "surprising" to discover that having commit access to a git repository allowed you to delete the history of other people's work.

Re:A thousand times. (Unless online mirrors roll b (0)

Anonymous Coward | about a year and a half ago | (#43262669)

Also, an SCM is not a backup, not even git! Any piece of software can fuck up.

Re:A thousand times. (Unless online mirrors roll b (1)

vurian (645456) | about a year and a half ago | (#43263489)

Especially backup software.

Re:RAID is not a backup (1)

gweihir (88907) | about a year and a half ago | (#43262653)

Indeed. Online snapshots are a different matter, but mirroring can never replace backups. Quite obvious in fact.

No backups?! (5, Insightful)

Blymie (231220) | about a year and a half ago | (#43262597)

Good grief!

After all of that, not a single proposed solution is a proper, rotational backup.

This is what rotational backups are FOR. They let you go back months in time, and even do post-corruption, or post-cracking examination of the machine that went down!

Backups do *not* need to be done to tape, but a mirror or a RAID card is NOT a backup. This is really simple, simple stuff, and it seems like the admins at KDE are a bit wet behind the ears in terms of backups.

They probably think that because backups used to mean tape, that's old tech, and no one does that.

Not so! Many organizations I admin, and many others I know of, simply do off-site rotational backups using rsync + rotation scripts. This is the key part: copies of the data as it changes over time. You *never* overwrite your backups, EVER.

And with proper rotational backups, only the changed data is backed up, so the daily backup size is not as large as you might think. I doubt the entire KDE git tree changes by even 0.1% every day.

Rotational backups -- works like a charm, would completely prevent any concern or issue with a problem like this, and IT IS WHAT YOU NEED TO BE DOING, ALWAYS!
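
(For the curious: the usual idiom behind such rotation scripts is rsync's --link-dest option. Each day's tree hardlinks unchanged files against the previous day's snapshot, so every snapshot looks like a complete copy but only changed data consumes disk. A minimal sketch; the host, paths, and dates are made up:)

    # pull today's snapshot, hardlinking unchanged files against yesterday's
    rsync -a --delete \
        --link-dest=/backup/kde-git.2013-03-22 \
        server:/srv/git/ \
        /backup/kde-git.2013-03-23/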

Re:No backups?! (0)

tangent3 (449222) | about a year and a half ago | (#43262665)

A git repository itself acts as a rotational backup...
The article itself suggests ZFS snapshots of the git repository, which works just as well.

Re:No backups?! (3, Informative)

Blymie (231220) | about a year and a half ago | (#43262707)

Git has no rotational backup ability in it. You can't do rotational backups of the machine, on the machine, for starters!

ZFS is not a rotational backup either!

Failure, 101, backups. Go back to school.

Both of the above solutions do not prevent slow corruption, and they do not prevent issues where the machine is suspect. (Yes, ZFS can have bugs). They also do not help if the machine has been hacked into. They don't help if there is a fire, flood, or theft of the local box.

Modern backup methodology has been developed over decades of people suffering JUST THROUGH THIS VERY THING. If you plan to just throw all that away, and pretend everyone doing backups is an idiot -- MAKE SURE YOU KNOW WHAT YOU ARE DOING.

Because -- this very issue would not have been even a tiny concern if proper, off-machine, rotational backups were being done. And if you aren't going to follow proper backup methodology, then you'd better sit down in a quiet place for a few hours and think of every possible disaster scenario, AND issues with the code you're going to be using for those backups.

Hell, this whole KDE problem started because the people using git did not even know 100% how it works! Now you're suggesting that using another tool, ON THE SAME BOX, is the answer? What will someone miss on ZFS?

No, please, think about this more carefully.

Re:No backups?! (4, Insightful)

gweihir (88907) | about a year and a half ago | (#43262849)

What really surprises me is that people still do not understand backup, after it has been a solved problem for decades. Backup _must_ be independent. It _must_not_ be on the same hardware. It _must_not_ even be on the same site, if the data is critical. It must protect against anything happening to the original system. Version control, mirrors, RAID: none of these qualify as backup. They are not independent of the system being backed up.

However, the amount of incompetence displayed in the original story and the comments here explains a lot. Seems that in this time of "virtual everything" people do not even bother to learn the basics anymore and are then surprised when they make very, very basic mistakes.

Re:No backups?! (1)

Blymie (231220) | about a year and a half ago | (#43263009)

Yeah :(

Re:No backups?! (1)

gweihir (88907) | about a year and a half ago | (#43262751)

No, a git repository is not a backup. It is a version-control tree. Backups are always _independent_ of the working system and for very good reasons. Come on, people, this is beginner's stuff.

Re:No backups?! (1)

Kjella (173770) | about a year and a half ago | (#43262921)

A git repository itself acts as a rotational backup... The article itself suggests ZFS snapshots of the git repository, which works just as well.

That still smells like a single point of failure to me, because they didn't say anything about actually backing up those snapshots to another machine. So if you can crack this one root server, you can delete all the snapshots, corrupt all the projects and boom all the good copies are gone.

Re:No backups?! (1)

Carewolf (581105) | about a year and a half ago | (#43262695)

The very first proposed solution is a backup:

One thing that will be put into place as a first effort is that one anongit will keep a 24-hour-old sync; in the case of recent corruption, this can allow repositories to be recovered with relatively recent revisions. The machine that projects.kde.org is migrating to has a ZFS filesystem; snapshots will be taken after every sync, up to some reasonable number of maximum snapshots, which should allow us to recover the repositories at a period of time with relatively fine granularity.

So: one 24-hour-old backup, and another machine saving backups of every single sync as ZFS snapshots.
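
(In ZFS terms that per-sync snapshot is a one-liner; a sketch, with a hypothetical dataset name tank/git:)

    # read-only snapshot after each successful sync; prune the oldest
    # once some reasonable maximum count is reached
    zfs snapshot tank/git@sync-$(date +%Y%m%d-%H%M%S)
    zfs list -t snapshot    # inspect what is recoverable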

Re:No backups?! (4, Insightful)

Blymie (231220) | about a year and a half ago | (#43262755)

A 24 hour old sync isn't a backup. It's a slightly delayed mirror.

"Rotational backups" isn't just a single thing. It's a whole ball of wax. Part of that ball of wax, are test restores. Another part of that are backups that only sync changes, something exceptionally easy with rotational backups, but not as was with a filesystem snapshot.

In 10 seconds, I can run 'find' on a set of rotational backups I have that go back FIVE YEARS and find every instance of a single file that has changed on a daily basis. How does someone do that with ZFS snapshots? This is something that is key when debugging corruption, or when looking for a point to start a restore from (say, someone hacks in).
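
(Something along these lines, say, assuming dated snapshot directories under /backup; the file path is illustrative:)

    # print the modification date of one file as it appears in every
    # snapshot, oldest first
    find /backup/*/some/project/file.cpp -printf '%TY-%Tm-%Td %p\n' | sort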

Not to mention that ZFS could be producing corrupt snapshots -- what an annoyance to have to constantly restore those, then run tests on the entire snapshot to verify the data.

What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.

Re:No backups?! (2)

Carewolf (581105) | about a year and a half ago | (#43262841)

More accurately, the problem is that the hardware resources available to KDE are very limited, and the KDE repository is one of the largest git repositories in the world. Back when Subversion was the hot new thing, the thing that carried it forward was KDE, because KDE had been trying to migrate to SVN for several years before Subversion was even capable of handling a repository that large. Git still can't remotely handle a project that large, which is why KDE is now split into a thousand different git projects.

How often would you do complete backups of KDE? How many would you save? How much hardware would that require? ZFS snapshots sound like an ideal way to handle the backups, since ZFS can deduplicate. It does add another point of failure, but ZFS is pretty professional and high quality, and this is something it is designed to handle.

Re:No backups?! (1)

gweihir (88907) | about a year and a half ago | (#43262863)

What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.

There is psychological research into this: people making stupid decisions often invest considerable effort in convincing themselves that the decisions are not stupid.

My take-away message is that many, many slashdotters have a data-disaster in their future, as they do not understand what backup is for or how to do it so it actually fulfills its purpose.

Re:No backups?! (3, Informative)

Doc Hopper (59070) | about a year and a half ago | (#43262767)

I do storage & backup for a living on an extremely large scale. Your post is correct in the main, except for this:

You *never* overwrite your backups, EVER.

You must overwrite tapes if you want to keep media costs reasonable. In our enterprise, we typically use $30,000 T10Kc tape drives with $300 T10K "t2" tapes. Destroyed/broken/worn-out media already eats the equivalent of several well-paid sysadmin salaries each year. Adding indefinite retention on top would be a huge and unnecessary cost.

Agreed, though, this KDE experience isn't quite like that. Source code repositories commonly have 7-year-retention backups for SLA reasons with customers. Most of my work deals with customer Cloud data, which is kind of by definition more ephemeral; we typically only provide 30-, 60-, or 90-day backups at most, in addition to typical snapshotting and near-line kinds of storage.

No reasonable-cost disk-based storage solution in the world today provides a cost-effective way to store over a hundred petabytes of data on site, available within a couple of hours, and consuming just a trickle of electricity. But if you have a million bucks, a Sun SL8500 silo with 13,000+ tape capacity in the silo will do so. All for the cost of a little extra real-estate, and a power bill that's a tiny fraction of disk-based online storage.

Tape has a vital place in the IT administration world. Ignore this fact to your peril and future financial woes.

Re:No backups?! (2)

drinkypoo (153816) | about a year and a half ago | (#43262855)

No reasonable-cost disk-based storage solution in the world today provides a cost-effective way to store over a hundred petabytes of data on site, available within a couple of hours, and consuming just a trickle of electricity.

Lots of businesses (and most open source projects) are still dealing with only a couple terabytes of data or far less, and so they not only can but probably should use disk-based backups for reasons of both cost and convenience as nothing else will be cheaper, faster, or easier.

Tape is now an enterprise-only thing, and good riddance.

Re:No backups?! (0)

Anonymous Coward | about a year and a half ago | (#43263271)

That's why tapes are worse in many scenarios. If you overwrite and reuse tapes they don't last as long - wear and tear.

If you use HDDs you'd have more storage and the media comes with its own drive rather than costing 30k. And the capacity goes up as the tech improves without you having to buy a new super expensive tape drive.

Also means you can get more bandwidth if you do it right.

Re:No backups?! (1)

WuphonsReach (684551) | about a year and a half ago | (#43263345)

Tape has a vital place in the IT administration world.

Tape is expensive, fragile and requires special hardware. Removable or external magnetic hard drives, OTOH, are cheap, sturdy and will work on any system that you can scrounge up.

Given the costs of tape drives and tape media, it's not surprising that a lot of small/medium businesses just use hard drives for backups. External 2.5" 1TB drives are dirt cheap, and you could do weekly off-site backups using them, with 13 generations, for less than $2000. You can't even buy a large-capacity tape drive for $2000, much less the tapes needed to run a proper backup cycle.

Unless there are legal reasons to keep 5-10 years of backups, or you are dealing with more than 3-5 TB of storage to be backed up, or you are taking things off-site daily via courier, tape is just too expensive.

Rotation backups wouldn't fix it! (0)

Anonymous Coward | about a year and a half ago | (#43263235)

Except they suspect the corruption was there a long time, unnoticed, so your rotational copies have the corruption too! Worse, because it's rotational, sooner or later the oldest one is gone....

Really, you're putting your faith in MAGICSOFTWAREBACKUP and saying "well, Git mirrors aren't proper mirrors", except they ARE proper mirrors and they DO keep historic backups! That's what distributed server versioning software *IS*: it too never overwrites old versions, it too only stores differences, it too only syncs the differences, it too is physically distributed among many machines and locations!

The problem here is that git has a flaw, and your MAGICSOFTWAREBACKUP could equally have a flaw. Perhaps it's not copying files ending in _fred; who knows, software is software, bugs are bugs! Don't assume your software (whatever it is) that describes itself as backup software is somehow less problematic than a git sync!

I hate incremental backups (the kind you describe) particularly because I've had a corrupt root file and couldn't recover from a backup. I got back data that was 2 months old; even if the backup had worked, it would still have been a disaster to lose more than 2 months.

IMHO this is a simple git bug: the synced copies were not only corrupted BUT NOT EVEN IDENTICAL, so there's clearly a problem here. Oh well, software is software; find the bug, fix it, and don't rely on one type of backup, ever again, not even your rotational backups.

A git sync to multiple machines, plus a second type of backup, is the way to go. The git mirror counts as one type of backup; you need another type, some other software, some other way. It could be rotational backups, it could be as simple as a filecopy on a cron job, it could be a second versioning server (e.g. a Perforce repo mirrored from git), but some *second* backup strategy.

Re:No backups?! (1)

TheRealMindChild (743925) | about a year and a half ago | (#43263297)

And with proper rotational backups, only the changed data is backed up

I hate you. This is why, at a couple of orgs I worked at, when a restore needed to be done we had to start from two years back and then apply the changes from every backup going forward from the first. They're on tape? Even longer wait. A complete backup should be done every time if it can be; if not, it should at least be done on a regular basis.

Re:No backups?! (1)

Blymie (231220) | about a year and a half ago | (#43263369)

Don't hate me. ;) Typically, you do a full backup every $x period of time.

Trusting that your *only* full backup is good, isn't a great policy either. I tend to do full backups every quarter, but it depends upon the data set, and of course, the size of the data set. If the data set is trivial... then who cares? Do it weekly.

Sounds like... (1)

eexaa (1252378) | about a year and a half ago | (#43262605)

...someone has been using Internets as a backup machine? :)

Re:Sounds like... (4, Funny)

bmo (77928) | about a year and a half ago | (#43262659)

There is nothing wrong with using the internet as a backup machine - with the caveat that you know what you're doing and you're using the right service/tool properly.

Personally, I have all my very important documents in an encrypted archive labelled "Area_51_Aliens_Proof.rar" with the note "It is dangerous for me to provide the key, but in the event of my death or imprisonment, a key will be provided EXPOSING EVERYTHING!!!" and uploaded to various paranormal bittorrent trackers and mirrored by various denizens of /x/.

I expect my documents to be archived in perpetuity.

--
BMO

Re:Sounds like... (0)

Anonymous Coward | about a year and a half ago | (#43262731)

An excellent idea. Get all the paranoid people together to make sure your backup is redundant forever.
People who are paranoid enough to make sure they have a copy. Who think that hidden truths lie around the corner.
It's not like they'd ever be able to crack the code. What could possibly go wrong? http://blog.zorinaq.com/?e=42

Re:Sounds like... (1)

bmo (77928) | about a year and a half ago | (#43262797)

You know, it was a joke, but Julian Assange has a file called "insurance" that is mirrored by a lot of people.

1. Nobody has cracked the archive, even though the payload could be spectacular. It's not like nobody is trying.

2. It really could just be his automobile insurance contract. Nobody knows.

3. Sufficient key length and a strong algorithm /can/ stretch brute-forcing time into "end of the universe" length.

--
BMO

Re:Sounds like... (1)

CBravo (35450) | about a year and a half ago | (#43263033)

From what I was reading there is not much interesting in it... I already included the torrent in the TBL (torrent blacklist). Noone will be seeding it anymore.

Re:Sounds like... (1)

bmo (77928) | about a year and a half ago | (#43263211)

> Noone will be seeding it anymore.

This Noone guy really gets around because I hear about him all the time, even though I never see him.

--
BMO

Apropos (0)

bmo (77928) | about a year and a half ago | (#43262609)

"With great power comes great responsibility" - Spider Man, issue #1.

--
BMO

Re:Apropos (0)

Anonymous Coward | about a year and a half ago | (#43262667)

Kids, come here, there's a guy who signs his posts with his user name, when said user name is already displayed on top of his post!
That's a rare creature.

Re:Apropos (1)

bmo (77928) | about a year and a half ago | (#43262705)

The fact that this bothers you only strengthens my resolve to never change my signature.

Have a great day.

--
BMO

Re:Apropos (1)

odie_q (130040) | about a year and a half ago | (#43262727)

"With great power comes great responsibility" - Spider Man, issue #1.

"Who said that? I'll kill them with my power!" - Homer Simpson, S19E03

Re:Apropos (1)

bmo (77928) | about a year and a half ago | (#43262833)

Quidquid latine dictum, altum videtur. - unattributed

--
BMO

delayed update to servers.. (1)

vanuda (1539873) | about a year and a half ago | (#43262641)

Set up servers that get delayed updates, i.e. a 1-day delayed copy, a 1-week delayed copy, and perhaps a 1-month delayed copy. Hopefully someone will notice and stop the sync between servers before everything is gone. Even if some part is lost, not all is lost.
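
(As cron entries on the delayed hosts this might look something like the following; the schedules, host name, and paths are illustrative only:)

    # each copy updates on a different cadence, so an unnoticed deletion
    # on the master cannot reach every copy at once
    0 3 * * *    rsync -a --delete master:/srv/git/ /mirror/daily/
    0 4 * * 0    rsync -a --delete master:/srv/git/ /mirror/weekly/
    0 5 1 * *    rsync -a --delete master:/srv/git/ /mirror/monthly/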

Re:delayed update to servers.. (4, Informative)

gweihir (88907) | about a year and a half ago | (#43262785)

And another amateur-level solution. Does nobody know how to do backups anymore? O.k., here are the very basic, mandatory characteristics of a backup:

- Backup data storage independent of the system being backed up
- Several generations of backups, kept long enough to be absolutely sure you can recover (yes, that can mean years) and taken frequently enough that the loss between backups is acceptable.
- Expect that one backup generation can be faulty and ensure that even then recovery is possible and data losses are acceptable.
- Full disaster recovery possible, even if your original system is stolen by aliens.
- Disaster recovery is tested regularly
- Data is verified (full compare or 2-sided crypto-hash compare) on backup

This really is "IT operations 101". Forget about all these halve-ba(c)ked amateur stuff, IT DOES NOT WORK.

Re:delayed update to servers.. (1)

RyuuzakiTetsuya (195424) | about a year and a half ago | (#43262895)

I haven't administered a git repo before, but with something like git that has historical commit data, do you need more than, say, a month or so of backup data?

Re:delayed update to servers.. (1)

gweihir (88907) | about a year and a half ago | (#43262965)

It is very simple: Determine how long it will take to notice a problem in the very worst case, then make sure you have at the very least two full backups that cover this time.

In most cases that makes keeping backups for only a month gross negligence. If you run a full data consistency check every week, then taking backups every week and keeping them for a month may be adequate if the project is not important. But what if, for example, you notice after 3 months that somebody hacked into the repository and changed things? Or hacked into some user's machine where the user has commit permissions? True, retroactively changing the history is not possible in git, but making new changes is quite possible. So, are you confident you can reliably spot any malicious action within less than a month _and_ stop the oldest backup from being overwritten before it is too late?

Re:delayed update to servers.. (0)

Anonymous Coward | about a year and a half ago | (#43263245)

With a vast number of people doing git pull --rebase (which fails in very ugly ways if the upstream repository is manipulated) etc., there is a good chance it would be noticed fairly quickly.
Plus, it will be hard to corrupt the copies of people who only do a git pull, so you probably have a lot of backups around, even though they will be very inconvenient to get and would rely on trusting those who provide the copies.
Either way, you can probably get away with much rarer backups for active git projects than for arbitrary data - especially if you regularly check the repository via "git fsck", for example. Though that doesn't mean I'd recommend it.

Re:delayed update to servers.. (1)

Todd Knarr (15451) | about a year and a half ago | (#43263417)

Yes. Think about this: how do you recover the repository when the historical commit data is what's been damaged? Note that it doesn't have to be data corruption, although that's fairly common. One of the worst problems to recover from is human error, e.g. an administrator makes a mistake cleaning up obsolete projects and permanently deletes more projects than intended, or makes a mistake on the filesystem itself and deletes the files associated with part of the repository. And yes, you need more than a month's worth of backups to recover from that, because sometimes the damage may not be apparent for months.

I've got a project at work in version control that's incredibly critical; without it, several major customers are totally off-line. Changes to it are very rare, measured in years per change, but when we do need changes they're high priority (again, the customer is totally off-line until the change goes in). If someone makes a mistake and wipes out that project, it might literally be years before someone has a reason to look for it and notices it's gone missing. If we only have a couple months' worth of backups, what are we going to do?

No backup of the KDE sources! (2, Informative)

Anonymous Coward | about a year and a half ago | (#43262663)

They had/have no fucking backup! And complain about some git mirror issues. I can't fucking believe it that they can be so stupid.

The solution: MAKE BACKUPS!

Re:No backup of the KDE sources! (1)

Sulphur (1548251) | about a year and a half ago | (#43263017)

They had/have no fucking backup! And complain about some git mirror issues. I can't fucking believe it that they can be so stupid.

The solution: MAKE BACKUPS!

"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

what almost became 'The Great KDE Disaster Of 2013 (1)

Anonymous Coward | about a year and a half ago | (#43262717)

Isn't that what every major release is called? Except for the "2013" part?

Re:what almost became 'The Great KDE Disaster Of 2 (1)

garyebickford (222422) | about a year and a half ago | (#43262823)

Could be worse - Unity, Gnome 3, ...
I'm playing this on KDE 4, trying it out. All I really want to do is run Compiz and some other stuff in my highly tuned environment - I use the Desktop Cube, with a transparent desktop, and Cairo Dock. I left KDE back about 6-7 years ago, but right now it's closer to what I want and am used to than anything else. I have Bodhi/Enlightenment running on another machine. It's nice too, but right now I'm like a man without a country.

Welcome to "rsnapshot" (2, Informative)

Anonymous Coward | about a year and a half ago | (#43262805)

Rsnapshot provides cheap, userland, hardlinked rotating snapshots that work very well. Simply do the rsnapshots in one location, and there are a dozen ways to make the completed, synchronized content accessible for download or to other mirrors once the mirror is complete.

The only thing I dislike about it is the often-requested, always-refused feature of using "daily.YYYYMMDD-HHMMSS" or a similar naming scheme instead of the rotating "daily.0, daily.1, daily.2" names, which are quite prone to rotating in mid-download for anyone accessing the snapshots via NFS or a web browser. The only way you can tell the rotations apart is by the timestamp on the top-level directory, and that's very confusing when it rotates out from under you in mid-operation.

duh (1)

sribe (304414) | about a year and a half ago | (#43262843)

Replicated systems need regular backups too. No shit, sherlock...

btrfs (1)

ssam (2723487) | about a year and a half ago | (#43262845)

If only Linux had a filesystem that checksummed all your data and checked the checksum at every read. We could call it Better FS, or something like that.
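
(To be fair, btrfs does exactly that already: data and metadata are checksummed by default and verified on every read, and a scrub forces a full verification pass. A sketch, with a hypothetical mount point:)

    btrfs scrub start /mnt/git     # re-read and verify every block in the background
    btrfs scrub status /mnt/git    # report any checksum errors found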

Moral of the story.... (2)

Lumpy (12016) | about a year and a half ago | (#43262859)

you ALWAYS have incremental backups on MULTIPLE MEDIUMS.

If you think your Git repositories are your backup, then you need to learn what the word Backup means.

Re:Moral of the story.... (1)

CBravo (35450) | about a year and a half ago | (#43263045)

It seems someone just did learn that aspect...

programming != IT (1)

SuperBanana (662181) | about a year and a half ago | (#43262875)

Most IT people would have said "Where are your backups?" When the programmers say "We're using mirrors", the IT person would say, "Where are your backups?" a second time.

$50 says that whoever handles IT for KDE said "Hey guys, we need backups" and the programmers all said "Nah, we've got mirroring."

Seriously: why doesn't an organization as large as KDE have backups? I understand if Save the Fuzzy Wuzzies doesn't have good IT, but a major open source project?

Always amazes me how I don't tell programmers how to do their job, yet I've had a decade and a half of programmers arguing with me about how to do mine. Which is particularly funny, since if the server under their desk dies, it's magically my fault/responsibility.

Re:programming != IT (1)

fa2k (881632) | about a year and a half ago | (#43262971)

I think the problem is that a code repository is very much a moving target. They didn't say whether they had backups, so they probably didn't and that's stupid, but it would also be a problem if they had a week old backup

Re:programming != IT (1, Informative)

vurian (645456) | about a year and a half ago | (#43263741)

"an organization as large as KDE have backups?" You mean one full-time secretary and a couple of volunteer sysadmins? That's how large KDE's support organization is. How much money do you think KDE has? It is less than 200k euros. That's how large the budget is -- and it has to pay for everything.

ZFS (1)

fa2k (881632) | about a year and a half ago | (#43262893)

The article suggests using ZFS because of its protections against bad hardware.

It implies that ZFS protects against bad RAM but *this is not the case*. The ZFS developers recommend using ECC memory.

Re:ZFS (1)

toby (759) | about a year and a half ago | (#43262981)

If *ZFS* isn't proof against bad RAM, imagine how poorly conventional filesystems fare. ECC memory is advisable in situations demanding integrity anyway.

IT AIN'T CALLED GIT FER NUTHIN !! (0)

Anonymous Coward | about a year and a half ago | (#43262945)

Because, you know you are a redneck !!

Three Letters: ZFS. (1)

toby (759) | about a year and a half ago | (#43262977)

Just use it. Write-in-place filesystems are obsolete from an integrity point of view.

Re:Three Letters: ZFS. (0)

Anonymous Coward | about a year and a half ago | (#43263025)

Still not a replacement for backups. Backing up from ZFS snapshots may be a place to start, but there are tons of FOSS ways to accomplish this task.
