
NetBSD - Live Network Backup

dvl writes "It is possible, but inconvenient, to manually clone a hard disk drive remotely using dd and netcat. der Mouse, a Montreal-based NetBSD developer, has developed tools that allow remote partition-level cloning to happen automatically, on an opportunistic basis. A high-level description of the system has been posted at KernelTrap. This facility can be used to maintain complete duplicates of remote client laptop drives on a server system. The network mirroring facility will be presented at BSDCan 2005 in Ottawa, ON, on May 13-15."
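(For context, the manual approach the summary calls inconvenient looks roughly like this; device names are illustrative for a NetBSD/i386 box, and the nc listen flags vary between netcat implementations.)

On the receiving (backup) machine:
# nc -l -p 9000 | dd of=/dev/rwd0d bs=64k
On the machine being cloned:
# dd if=/dev/rwd0d bs=64k | nc backuphost 9000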
  • Mac OS X (Score:1, Interesting)

    by ytsejam-ppc ( 134620 )
    I'm not up on my xBSD's, so can someone explain how hard this would be to port to the Mac? This would be perfect for cloning my son's Mac Mini.
  • use rsync (Score:2, Informative)

    by dtfinch ( 661405 ) *
    It's much less network and hardware intensive, and with the right parameters it will keep past revisions of every changed file. Your hard disks will live longer.
    • Comment removed based on user account deletion
    • Re:use rsync (Score:5, Informative)

      by FreeLinux ( 555387 ) on Friday April 29, 2005 @11:06AM (#12383656)
      This is a block-level operation, whereas rsync is file-level. With this system you can restore the disk image including partitions. Restoring from rsync would require you to create the partition, format the partition, and then restore the files. Also, if you need the MBR...

      As the article says, this is drive imaging whereas rsync is file copying.
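      (A rough sketch of the difference, with NetBSD-flavoured commands and illustrative device names; the exact nc flags depend on your netcat.)

      Restoring an image is essentially one step:
      # nc -l -p 9000 | dd of=/dev/rwd0d bs=64k

      Restoring a file-level rsync backup means rebuilding the disk first:
      # fdisk -u wd0                  (recreate the MBR partition table)
      # disklabel -i -I wd0           (recreate the BSD disklabel)
      # newfs /dev/rwd0a              (format the filesystem)
      # mount /dev/wd0a /mnt && rsync -a backuphost:/backups/wd0a/ /mnt/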
      • Re:use rsync (Score:3, Insightful)

        by Skapare ( 16644 )

        In most cases, file backups are better. Imaging a drive that is currently mounted writable and actively updated can produce a corrupt image on the backup. This is worse than what can happen when a machine is powered off and restarted. Because the sectors are read from the partition over a span of time, things can be extremely inconsistent. Drive imaging is safest only when the partition being copied is unmounted.

        The way I make backups is to run duplicate servers. Then I let rsync keep the data files i

        • Re:use rsync (Score:3, Interesting)

          by spun ( 1352 )
          From the article, it sounds like they are using a custom kernel module to intercept all output to the drive. This would keep things from getting corrupted, yes?
      • Restoring from rsync would require you to create the partition, format the partition and the restore the files.

        Sure, but that's not difficult. Systemimager [systemimager.org] for Linux keeps images of disks of remote systems via rsync, and has scripts that take care of partition tables and such.

        Yes, it's written for Linux, but it wouldn't be difficult to update it to work with NetBSD or any other OS. The reason it's Linux specific is that it makes some efforts to customize the image to match the destination machin

        • I think you're talking about distributing a built system to multiple machines in a file farm. At least that's what Brian built SystemImager for originally: to mass-install a system image to a server farm. As long as the source image is in a stable state, that's fine. But if you are making backups of machines, backing up their actively mounted and working partition by disk image is a bad idea, regardless of the tool. I used to do that once after I built a system just so I have an image of the whol

    • Re:use rsync (Score:2, Insightful)

      by x8 ( 879751 )
      What's the fastest way to get a server running again after a disk crash? With rsync, if I back up /home and /etc, I still have to install and configure the OS and other software. That could take a significant amount of time (possibly days). Not to mention the time spent answering the phone (is the server down? when will it be back up?)

      But if I have a drive image, I could just put it on a spare server and be back up and running almost immediately. That would require an identical spare server though.

      What
      • Re:use rsync (Score:3, Informative)

        by dtfinch ( 661405 ) *
        Just make sure the backup server is properly configured (or very nearly so) I guess.

        Our nightly rsync backups have saved us many times from user mistakes (oops, I deleted this 3 months ago and I need it now), but we haven't had a chance to test our backup server in the event of losing one of our main servers. We figure we could have it up and running in a couple hours or less, since it's configured very closely to our other servers, but we won't know until we need it.

        • I recall the last place I was a developer at, we tested our IT department like that a few times, haha... We'd "simulate" a hardware failure, usually by pulling the power, but sometimes we'd get a little more scientific with it... Or we'd simulate a database crash and ask for a backup from our IT department.

          We were developers plagued with an IT department that wanted to take control of the application and add red tape to our deployment cycle. While we understood there was a place for it, we worked for a
          • It was IIS instead of Apache, and IIS is a piece of shit, so no one wanted to bother learning it because it was a piece of shit.

            Your IT dept. probably has more of a clue than you do.
            • I never made the choice for IIS, but I didn't use that as an excuse not to know how to administer it, and neither should they.

              Besides, there are many companies larger than those guys who use IIS...
      • What do the big enterprises who can't afford downtime do to handle this?

        They have hot standby servers and/or clusters (so that individual server downtime becomes irrelevant), automated installation procedures (so that reinstalling machines takes maybe a couple of hours at the most) and centralised configuration management tools (so that restoring the new machine to the same state as the old one is simply a matter of kicking off the config management tool and letting it reconfigure the machine appropriately).

    • rsync doesn't scale to huge numbers of files. It also doesn't work so well when all of those are changing at once. Finally, the protocol and algorithms may work for imaging an entire disk as if it was a file, but the program doesn't -- it can ONLY copy device nodes as device nodes, and will NEVER read a block device as a normal file. There have been patches to fix this, which have been rejected.

      We use a scheme which actually seems better for systems which are always on: DRBD for Linux [drbd.org]. Basically, ever
      • Our nightly rsync backups consist of roughly 400,000 files. They were hourly in the past, and it was so transparent that we never noticed any problems or performance degradation, but we switched to nightly after two hard disks in the backup server died the same week.
    • How is rsync on windows? Especially on giant files like 30 to 50 gigs.

      Anybody have any experience with that?
  • Pros and Cons (Score:5, Insightful)

    by teiresias ( 101481 ) on Friday April 29, 2005 @11:01AM (#12383593)
    This would be an extremely sensitive server system. With everyone's harddrive image just waiting to be blasted to a blank harddrive, the potential for misdeeds is staggering. Even in an official capacity, I'd really feel uneasy if my boss were able to take a copy of my harddrive image and see what I've been working on. Admittedly, yes, it should all be work, but here we are allowed a certain amount of freedom with our laptops and I wouldn't want to have that data at my boss's fingertips.

    On the flipside, this would be a boon to company network admins especially with employees at remote sites who have a hard crash.

    Another reason to build a high speed backbone. Getting my 80GB harddrive image from Seattle, while I'm in Norfolk would be a lot of downtime.
    • The duplication is done right away, as the modification occurs on the main disk.
      (from the comments [kerneltrap.org] below article)

      Another reason to build a high speed backbone. Getting my 80GB harddrive image from Seattle, while I'm in Norfolk would be a lot of downtime. (parent)

      Seems that this thing will sync up every time you call home. So when you're on the road downloading that just-updated massive PPT presentation for your conference.... you'll be downloading one copy from the server while the server is desperately try
    • Even in an offical capacity, I really feel uneasy if my boss was able to take a copy of my harddrive image and see what I've been working on.


      Your boss has the right and the ability (at least at my company) to do that. Plus, I leave my personal and secret stuff on my box at home, not at work, where it belongs. If I were a boss, I would want the ability to see what my employees are working on. That's why I pay them.
    • Sorry. As an IT guy, I routinely peruse people's harddrives looking for interesting material. I use Windows Scripting Host to search everyone's drives for mp3's, wma's, avi's and mpg's.

      It isn't your laptop. You have no freedom to do anything with it.
      • It's that sort of attitude which makes people work against, rather than with, the IT department.

        So what if people have some MP3s on their hard disk - if listening to music is affecting their work then it's the responsibility of their supervisor to deal with that.

        I've worked support before, and as much as users can be a pain in the ass, the only reason you have a job is because of them - without users, there is no point in an IT department.
      • I'm pretty sure this is just a troll, but since there are probably quite a few inexperienced people out there who really do think like this...

        Sorry. As an IT guy, I routinely peruse people's harddrives looking for interesting material. I use Windows Scripting Host to search everyone's drives for mp3's, wma's, avi's and mpg's.

        Idiots like you are why IT departments have to struggle to do their jobs properly.

        It isn't your laptop. You have no freedom to do anything with it.

        It isn't *yours*, either, hots

    • This isn't a magic wand your boss waves over your box. If s/he has access to your box s/he has access to your box - regardless of how perfect the copy of your stuff will be.
    • I really feel uneasy if my boss was able to take a copy of my harddrive image and see what I've been working on.
      I agree that such things are a valid concern, but this technology isn't meant to solve that problem. That's what block-level encryption is for. I presume you can do that on NetBSD, as well as encrypting your swap space so that no data hits the disk unencrypted. Not that you'd want to back up your swap space, though ;)
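      (If memory serves, NetBSD's cgd(4) driver is the usual way to do that; a rough sketch, with device and mount names purely illustrative:)

      # cgdconfig -g -o /etc/cgd/wd0e aes-cbc 256     (generate a parameters file)
      # cgdconfig cgd0 /dev/wd0e                      (attach the encrypted device)
      # disklabel -i -I cgd0                          (label the new cgd device)
      # newfs /dev/rcgd0a && mount /dev/cgd0a /private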
  • by LegendOfLink ( 574790 ) on Friday April 29, 2005 @11:02AM (#12383599) Homepage
    ...when you get that idiot (and EVERY company has at least 1 of these guys) who calls you up asking if it's OK to defrag their hard-drive after downloading a virus or installing spyware. Then, when you tell them "NO", they just tell you that they did it anyways.

    Now we can just hit a button and restore everything, a few thousand miles away.

    The only thing left is to write code to block stupid people from reproducing.
    • Can defragging really cause the spread of a virus? I always assumed defraggers worked at the sector level.
        • The biggest problem usually is that the virus and/or spyware will corrupt files. Inept Windows users for some reason think defragging a harddrive is the answer to every computer problem in the universe. They defrag, and next thing you know, you can't boot the machine up.

        Theoretically, a drive defrag should have no effect on how an operating system runs; it only re-sorts data on the physical drive to make file access faster. But for some reason, it messes things up.
      • Defragging won't spread a virus unless the virus attached itself to the defragger. I haven't heard of any viruses that do this. If the virus is the sort that will delete files, then defragging is the worst thing to do. After removing the virus, it's easiest to reclaim lost data with the correct tools when nothing new is written to disk. The files are still "there", but if the file is written over where it physically occupies drive space, then salvage becomes much harder, or even impossible, for most.
    • by SecurityGuy ( 217807 ) on Friday April 29, 2005 @12:06PM (#12384358)
      The only thing left is to write code to block stupid people from reproducing.


      Unfortunately the user interface for the relevant hardware has a very intuitive point and shoot interface.

    • What would be more perfect is simply being a competent admin in the first place, and not letting your users have permissions to fuck everything up. Never mind that this is for NetBSD, which doesn't have a whole lot of viruses, nor a defrag program.
  • by Bret Tobey ( 844402 ) on Friday April 29, 2005 @11:02AM (#12383603) Homepage
    Assuming you can get around bandwidth monitoring, how long before this becomes incorporated into hacking tools. Add this to a little spyware and a zombie network and things get very interesting for poorly secured networks & computers.
    • This requires cooperation from the kernel. If you have enough privileges on the target machine to install a kernel module then the game is already over. Rootkits have been around for decades that do things just as invasive, such as monitor all network traffic passing through the box for passwords.

      If somebody has the ability to install this on a machine then the problem is not with this module, it's that the person somehow got root privileges. In that sense this is no more of a "hack" than ssh, rsync, ne
      • You're right, root = compromised. Tools like this aren't good or bad by themselves, it's the user. I could see this tool being modified to scoop data on already compromised systems, kind of like a virtual "smash & grab." It will be interesting to see how this gets incorporated into other methods & kits, good or bad.
  • by OutOfMemory ( 879817 ) on Friday April 29, 2005 @11:04AM (#12383625)
    I've been using der Mouse to copy files for years. First I use der Mouse to click on the file, then I use der Mouse to drag it to a new location!
  • Doesn't NetBSD support dump -L the way FreeBSD does? This strikes me as a much more powerful and general solution than this custom tool...
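    (For reference, a hedged sketch of a live dump on FreeBSD; -L takes a filesystem snapshot first, 0 means a full dump, a auto-sizes, u records the dump in /etc/dumpdates, and f - writes to stdout. Host and path are illustrative.)

    # dump -0Lauf - /usr | ssh backuphost 'cat > /backups/usr.0.dump'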
  • by hal2814 ( 725639 ) on Friday April 29, 2005 @11:04AM (#12383631)
    Maybe setup is inconvenient. Remote backups using dd and ssh (our method) were a bit of a bear to set up initially, but thanks to shell scripting, cron, and key agents, they haven't given us any problems. I've seen a few guides with pretty straightforward and mostly universal instructions for this type of thing. That being said, I do hope this software will at least get people to start looking seriously at this type of backup, since it lets you store a copy off-site.
  • If one tries to clone an FS that is active, can this cloning tool handle open/changing files (often the most important/recent-in-use files on the system)? I remember an odd bug in a Mac OS X cloning tool that would create massive/expanding copies of large files that were mid-download during a cloning.
  • Isn't there an automated network disk backup tool for paranoids like me?

    Well, I'm not really paranoid, but I had some cases where faulty file system drivers or bad RAM modules changed the content of some of my files and where I have then overwritten my backup with these bad files.

    Isn't there any automatic backup solution that avoids such a thing? What I have in mind: there should be several autonomous instances of backup servers (which may actually reside on desktop PCs linked via LAN) that control each o
    • Use rsync and hardlinked snapshots. There are lots of examples out there. I rolled my own a while back, but if you want something relatively nicely polished and based on that idea, check out dirvish [dirvish.org] (I didn't find that until after I already had my system set up).

      I really like having several months worth of nightly snapshots, all conveniently accessible just like any other filesystem, and just taking up slightly more than the space of the changed files.
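      (The idea, roughly, using rsync's --link-dest so unchanged files become hardlinks into the previous snapshot; paths and host names are illustrative.)

      # rm -rf /backups/daily.6
      # for n in 5 4 3 2 1 0; do mv /backups/daily.$n /backups/daily.$((n+1)); done
      # rsync -a --delete --link-dest=/backups/daily.1 client:/home/ /backups/daily.0/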
    • Yes, there is, but it's expensive

      IBM Tivoli Storage Manager [ibm.com] Just Works (after a rather complicated setup process), does its job in the background on whatever schedule you choose, does it without complaint, maintains excruciatingly detailed logs, maintains multiple back-revisions of files, works over a network, SAN, or shared media, and talks to tape drives and optical drives and pools of cheap disk. If you want, backups can be mirrored across multiple TSM servers, and you can always fire up the (simple, ug

  • by Anonymous Coward
    Well, not a solution for BSD people (unless you're running a BSD under Xen and the top-level Linux kernel is doing the DRBD).

  • by RealProgrammer ( 723725 ) on Friday April 29, 2005 @11:26AM (#12383846) Homepage Journal
    While this is cool, as I thought when I saw it on KernelTrap, disk mirroring is useful in situations where the hardware is less reliable than the transaction. If you have e.g., an application-level way to back out of a write (an "undo" feature), then disk mirroring is your huckleberry.

    Most (all) of my quick restore needs result from users deleting or overwriting files - the hardware is more reliable than the transaction. I do have on-disk backups of the most important stuff, but sometimes they surprise me.

    I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-oldest-first algorithm.

    Just a thought.
    • disk mirroring is your huckleberry

      WTF?
    • by gordon_schumway ( 154192 ) on Friday April 29, 2005 @12:00PM (#12384303)

      I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-oldest-first algorithm.

      Done. [netcabo.pt]

    • I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-oldest-first algorithm.

      Why modify the system calls? Keep the system calls simple and orthogonal, so the kernel codebase stays small(er). Write this functionality in userland, starting wherever you are
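      (A userland sketch of the idea: a shell function standing in for rm, plus a cron job to expire old trash. The function name and retention period are made up.)

      del() {
          mkdir -p "$HOME/.Trash" && mv -- "$@" "$HOME/.Trash/"
      }

      # crontab entry: purge anything in the trash older than 30 days
      0 3 * * * find "$HOME/.Trash" -mindepth 1 -mtime +30 -exec rm -rf {} +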

      • Why modify the system calls? ... Write this functionality in userland ... write wrappers to the C calls to do this (etc.)

        So, are you saying that the parent should modify every single binary on the system???? Including binaries that he may not have source to? Sounds pretty much unworkable. While I wouldn't propose that the parent poster actually implement such a system, the only reasonable place to do this IS at the system call level where it can be applied to everything.

        Personally, I think you are better
          • Here is the problem: existing programs are written knowing that deleted files disappear immediately. Therefore, since programs may be writing temporary files to /tmp or elsewhere, or even have their own backup systems, a garbage system with limited space could end up playing housekeeper for thousands of unused or redundant files, and few of the legitimate ones.

            Yes, my solution only works for future use; but the system call solution breaks the expectations of already-written programs, and muddles the un

    • This seems very similar to Network Appliance's Filer "SnapMirror" product. It copies changed disk blocks across the net to another system, mainly for disaster recovery purposes, but it could also be used for read-only use (e.g., publishing). NetApp's license fees for this feature are huge, like $40K per side I think.

      I'd really like to use this for backup and disaster recovery. Couple it with FreeBSD's snapshot and you have a large part of the NetApp functionality.

  • nothing new (Score:2, Interesting)

    by Afroplex ( 243562 )
    Novell ZENworks has had this capability for some time in production environments. It also integrates with their management tools, so it is easy to use on an entire network. To say this technology is newly discovered is a far cry from the truth. They also use Linux on the back end of the client to move the data to the server.

    It is nice, though, to have something like this in the open source world. Competition is good.
  • How soon do you think this will be available in the major Linux distros? I would love to have this for my Debian machine. Perhaps I wouldn't have had to spend all last Saturday rebuilding my machine and restoring individual files.

    SIGS!!!We don't need no stinkin sigs

  • Wacky idea (Score:3, Insightful)

    by JediTrainer ( 314273 ) on Friday April 29, 2005 @11:44AM (#12384080)
    Maybe I should patent this. Ah well, I figure if I mention it now it should prevent someone else from doing so...

    I was thinking - I know how Ghost supports multicasting and such. I was thinking about how to take that to the next level. Something like Ghost meets BitTorrent.

    Wouldn't it be great to be able to image a drive, use multicast to get the data to as many machines as possible, but then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

    I think that would really rock if someone wanted to image hundreds of machines quickly and reliably.

    I'm thinking it'd be pretty cool to have that server set up, and find a way to cram the client onto a floppy or some sort of custom Knoppix. Find server, choose image, and now you're part of both the multicast AND the torrent. That should take care of error checking too, I guess.

    Anybody care to take this further and/or shoot down the idea? :)
    • Wouldn't it be great to be able to image a drive, use multicast to get the data to as many machines as possible, but then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

      Multicast will work across subnets (you just need to set the TTL > 1).
      • BitTorrent would not be required, as you probably don't want to be distributing your custom OS images to the outside world.

        With BitTorrent you could set up your server as the tracker and multicaster for your images. BitTorrent doesn't HAVE to make it out onto the internet, you just keep the BT traffic inside your corporate network. The BT would be extremely helpful to distribute the load across multiple computers instead of just hitting one machine.

        Another thing, I was thinking (usually a bad thi

        • Multicasted BitTorrent is a complete waste. The idea with multicast is that there is no real "load" on the sender -- you can run an open-loop multicaster with your image, and people can join the group to download it. Alternately, you can use a protocol like MTFTP to make it a bit more "on-demand".

          Either way, bittorrent is completely useless in an environment where multicast is available.
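          (A rough sketch of that kind of open-loop multicast imaging, assuming the udpcast tools; if I remember right they read stdin/write stdout when no --file is given, and the device names here are illustrative.)

          On the machine holding the master image:
          # dd if=/dev/sda bs=64k | udp-sender
          On each machine being imaged:
          # udp-receiver | dd of=/dev/sda bs=64k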
    • Re:Wacky idea (Score:1, Informative)

      by Anonymous Coward
      Check out Frisbee (emulab.net) for fast, reliable multi/unicasting of system images.
    • Re:Wacky idea (Score:3, Insightful)

      by evilviper ( 135110 )
      I must shoot down your idea. I have lots of experience with this sort of thing.

      then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

      Bittorrent poses NO advantage for this sort of thing. Why not just a regular network service, unicasting t

  • by Anonymous Coward
    I've used Linux for years to do this using md running RAID1 over a network block device. It works very well unless you have to do a resync. Is this better than that?

    I'm asking because I'm backing-up about a dozen servers in real-time using this method, and if this method is more efficient, then I might be able to drop my bandwidth usage and save money.
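    (For anyone curious, that setup looks roughly like this with the older nbd-server command-line syntax; newer versions use a config file, and the devices and hosts here are illustrative.)

    On the backup machine, export a spare partition:
    # nbd-server 2000 /dev/sdb1
    On the primary, attach it and mirror the local partition onto it:
    # nbd-client backuphost 2000 /dev/nbd0
    # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/nbd0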
    • In fact, I can't think of any way that could possibly be worse than what you are doing now. Running RAID1 over a network block device is horribly inefficient, and slow as all hell. This just backs things up when you want it to, not constantly with every trivial change like a network mirror does.
  • I did that 12 years ago on AIX with no problems, as long as (a) the drives you dd from and to are sound and (b) there are no transmission failures beyond what rsh (at that time) would retry and mask.
  • ghost 4 unix (Score:3, Interesting)

    by che.kai-jei ( 686930 ) on Friday April 29, 2005 @12:14PM (#12384449)
    not the same?

    http://www.feyrer.de/g4u/ [feyrer.de]
  • I just took one of our mailservers offline a minute ago to do a block-level copy, so this would be fantastic. I develop images for our machines, e.g., mailserver, etc., and then dd them onto other drives. When I update one machine, I then go around and update the others with the new image. This saves me tons of time, and we do a similar thing with desktops and Norton Ghost (although, if I'm not mistaken, that is actually a file-level copy).

    And since we're running OpenBSD on those machines, porting this sho

  • How about disk cloning across servers, for on-demand scalability? As a single server reaches some operating limit, like monthly bandwidth quota, disk capacity, CPU load, etc, a watchdog process clones its disks to a fresh new server. The accumulating data partition may be omitted. A final script downs the old server's TCP/IP interface, and ups the new one with the old IP# (/etc/hostname has already been cloned over). It's like forking the whole server. A little more hacking could clone servers to handle loa
  • WTF (Score:5, Informative)

    by multipartmixed ( 163409 ) on Friday April 29, 2005 @12:52PM (#12384963) Homepage
    Why on earth are people always so insistent on doing raw-level dupes of disks?

    First of all, it means backing up a 40GB disk with 2GB of data may actually take 40GB of bandwidth.

    Second of all, it means the disk geometries have to be compatible.

    Then, I have to wonder if there will be any wackiness with things like journals if you're only restoring a data drive and the kernel versions are different...

    I have been using ufsdump / ufsrestore on UNIX for... decades! It works great, and it's trivial to pump over ssh:

    # ssh user@machine ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /newdisk && ufsrestore rf -)

    or


    # ufsdump 0f - /dev/rdsk/c0t0d0s0 | ssh user@machine 'cd /newdisk && ufsrestore rf -'

    It even supports incremental dumps (see: "dump level"), which is the main reason to use it over tar (tar can do incrementals with find . -newer X | tar -cf filename -T -, but it won't handle deletes).

    So -- WHY are you people so keen on bit-level dumps? Forensics? That doesn't seem to be what the folks above are commenting on.

    Is it just that open source UNIX derivative and clones don't have dump/restore utilities?
    • WHY are you people so keen on bit-level dumps? Forensics?

      Yes!

      EnCase Enterprise Edition costs $10,000 per license. This software basically mimics EnCase's functionality for free.

      If der Mouse were to port this to the Windoze world and get CFTT (http://www.cftt.nist.gov/ [nist.gov]) to validate its forensic soundness, he could make a fortune undercutting Guidance Software.
    • There are situations when this is desirable - especially in testing environments.

      I used to QA a series of imaging tools on Windows boxes, which involved performing a series of regression tests over the software install and operation. The software had to work on 98/2K/2000/XP, with or without any number of updates and service packs, and in concert with several versions of either IE or Netscape (4, 6, and 7 series). Having a block-level copy of the disk of a test machine in various system configuration states
    • Re:WTF (Score:2, Interesting)

      by JonMartin ( 123209 )
      I hear ya. We've been cloning our labs with dump/restore over the net for years. Works on everything: Solaris, *BSD, Linux. Wrapper scripts make it a one line command.

      I know some Linux distros don't come with dump/restore. Maybe that's why more people don't use it.
    • Re:WTF (Score:3, Interesting)

      by evilviper ( 135110 )

      Why on earth are people always so insistent on doing raw-level dupes of disks?

      I can think of a few reasons. It makes time-consuming partitioning/formatting unnecessary. It does not require as much work to restore the bootable partition (i.e. no need to bootstrap to run "lilo", "installboot" or whatnot). But mainly, because there are just no good backup tools...

      I have been using ufsdump / ufsrestore on UNIX for... decades! It works great, and it's trivial to pump over ssh:

      Full dumps work fine, despite

    • Re:WTF (Score:3, Interesting)

      by setagllib ( 753300 )
      You missed the point. Here you only need to copy the image once and then all subsequent writes are done on both images at once (the on-disk and the network one). That means that everything after the initial copy (assuming you begin doing this on an existing fs) is as efficient and real-time as possible, requiring no polling for changes or any scheduling. It is essentially RAID1 over a network. Although it doesn't do much against system crashes (since neither side will have the final syncs and umount writes)
  • by RonBurk ( 543988 ) on Friday April 29, 2005 @12:59PM (#12385069) Homepage Journal
    Image backups have great attraction. Restoring is done in one big whack, without having to deal with individual applications. Absolutely everything is backed up, so no worries about missing an individual file. etc. So why haven't image backups replaced all other forms of backup? The reason is the long list of drawbacks.

    • All your eggs are in one basket. If a single bit of your backup is wrong, then the restore could be screwed -- perhaps in subtle ways that you won't notice until it's too late to undo the damage.
    • Absolutely everything is backed up. If you've been root kitted, then that's backed up too. If you just destroyed a crucial file prior to the image backup, then that will be missing in the restore.
    • You really need the partition to be "dead" (unmounted) while it's being backed up. Beware solutions that claim to do "hot" image backups! It is not possible, in the general case, for a backup utility to handle the problem of data consistency. E.g., your application stores some configuration information on disk that happens to require two disk writes. The "hot" image backup software happens to backup the state of the disk after the first write, but before the second. If you then do an install, the disk is corrupted as far as that application is concerned. How many of your applications are paranoid enough to survive arbitrary disk corruption gracefully?
    • Size versus speed. Look at the curve of how fast disks are getting bigger. Then look at the curve of how fast disk transfer speeds are getting faster. As Jim Gray [microsoft.com] says, disks are starting to behave more like serial devices. If you've got a 200GB disk to image and you want to keep your backup window down to an hour, you're out of luck.
    • Lack of versioning. Most disk image backups don't offer versioning, certainly not at the file level. Yet that is perhaps the most common need for a backup -- I just messed up this file and would like to get yesterday's version back, preferably in a few seconds by just pointing and clicking.
    • Decreased testing. If you're using a versioned form of file backup, you probably get to test it on a fairly regular basis, as people restore accidental file deletions and the like. How often will you get to test your image backup this month? Then how much confidence can you have that the restore process will work when you really need it?

    Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users. And under some circumstances, they might also want to go with RAID for home computers [backupcritic.com].

    • Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users.

      Great. Now, could you please enlighten us as to what a good, automatic, versioning file-based backup system might consist of?

      AFAICT, this doesn't seem to exist. It doesn't matter how much sense it makes, or how perfect the idea is. It is simply unavailable.

      In fact, the glaring lack of such a capable s
      • Ummm. Well, there's DAR [linux.free.fr] and there's kdar [sourceforge.net]. I think there's even a win32 version for the clueless.

        It doesn't get much easier than this. You can have a sane, incremental backup setup in a single line cronjob or even point and click one up.

        If that's not simple enough for you, then you have no business storing or working with sensitive data.
    • It's not that complicated. Disk image backups and file-level backups are not intended to serve the same purpose.

      Disk image backups are pure disaster recovery or deployment. Something is down and needs to be back up ASAP, where even the few minutes of recreating partitions and MBRs is unwanted. Or it's about deploying dozens or hundreds of client systems as quickly as possible with as few staff as possible.

      File level backups are insurance for users. Someone deletes/edits/breaks something important and
  • The facility today supports symmetric cryptography, based on a shared secret. The secret is established out-of-band of the network mirror facility today. User identification, authentication and session encryption are all based on leveraging the pre-established shared secret.
    ----------- Confucius say: "The shared secret is no longer a secret."
  • And what about Ghost For Unix [feyrer.de]? This does network backup with only one 1.44" floppy.
  • I run rsync on a backup server, and save the files without compression on removable disks.

    It makes it a lot easier to find a file, since it exists in the same location, uncompressed.

    The huge advantage though, is that rsync only transfers those files that have changed. Which means that backups are very quick.

    I also mount samba shares on the backup server, and do rsync backups of "My Documents" folders for the windows boxes. Works great there too!

    Even better, the My Documents folders are available as
