Ask Slashdot: Simple Way To Back Up 24TB of Data Onto USB HDDs?

samzenpus posted about 2 years ago | from the save-often dept.

Data Storage 405

An anonymous reader writes "Hi there! I'm looking for a simple solution to back up a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (USB, FireWire, whatever). I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something or is there no such thing already done, and am I doomed to code it myself?"

405 comments

USB and disk Speed (4, Insightful)

gagol (583737) | about 2 years ago | (#40943507)

May be your limiting factor here.

Re:USB and disk Speed (4, Informative)

gagol (583737) | about 2 years ago | (#40943529)

If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!
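
The 140-hour figure can be checked with a little arithmetic. This is only a minimal sketch: the helper name `transfer_hours` is made up for illustration, and it assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes). With decimal units the same calculation comes out a few hours lower.

```python
def transfer_hours(total_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600


TIB = 2 ** 40  # binary terabyte (assumption: the post used binary units)
MIB = 2 ** 20  # binary megabyte

binary_hours = transfer_hours(24 * TIB, 50 * MIB)
decimal_hours = transfer_hours(24 * 10**12, 50 * 10**6)

print(f"binary units:  {binary_hours:.1f} h")
print(f"decimal units: {decimal_hours:.1f} h")
```

Either way the answer is measured in days, which is the point of the comment.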

Re:USB and disk Speed (3, Insightful)

drsmithy (35869) | about 2 years ago | (#40943651)

If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

I'd be willing to bet his change rate isn't 24TB/day.

Re:USB and disk Speed (5, Funny)

jamesh (87723) | about 2 years ago | (#40943779)

If the OP's porn collection can be logically broken up at some level, eg:

/porn/blonde
/porn/brunette
/porn/redhead

then the backup software could create one job for each directory, and multiple USB disks could be attached at once giving increased throughput. USB3 also increases speed to the point where the 7200RPM disk itself will become the bottleneck.

So at 100MB/second per disk write speed with 4 disks going at once (assuming the source disks are capable of supplying this volume of data and there are no other throughput limitations), you could do it in about 17 hours, or 24 hours with more realistic margins.
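
A rough sketch of the "one job per directory, several disks at once" idea, to show why the split matters. Everything here is an assumption for illustration: the function names, the directory names and their sizes are invented, and each target disk is assumed to be big enough for whatever lands on it. The slowest disk decides when the whole backup finishes, so an uneven split costs time.

```python
import heapq


def assign_jobs(dir_sizes: dict[str, int], n_disks: int) -> list[list[str]]:
    """Greedy split: biggest directory first, always onto the least-loaded disk."""
    heap = [(0, i) for i in range(n_disks)]  # (bytes assigned so far, disk index)
    plan = [[] for _ in range(n_disks)]
    for name, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        plan[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return plan


def wall_clock_hours(dir_sizes, plan, bytes_per_second):
    """The busiest disk determines when the whole backup is done."""
    busiest = max(sum(dir_sizes[d] for d in jobs) for jobs in plan)
    return busiest / bytes_per_second / 3600


TB = 10 ** 12
# Hypothetical layout: four top-level directories of unequal size, 24 TB total.
sizes = {"/data/a": 8 * TB, "/data/b": 6 * TB, "/data/c": 6 * TB, "/data/d": 4 * TB}
plan = assign_jobs(sizes, 4)
hours = wall_clock_hours(sizes, plan, 100 * 10**6)

print(plan)
print(f"{hours:.1f} h")
```

With a perfectly even 6 TB per disk the same calculation gives just under 17 hours; the 8 TB directory in this made-up layout stretches it to over 22.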

If it turns out that the source data is not porn (unlikely) and is highly compressible, then it could be done in far less time.

Bacula can do all of this.

Re:USB and disk Speed (2, Funny)

Anonymous Coward | about 2 years ago | (#40943869)

Or, he could watch the content as it is copied. At 600 Mbytes/hour (assuming standard mpeg compression), it would be a month of 24/7 nonstop action!

"- Hey boss, I need to, uhh, work from home for the next four weeks to handle the backup..."

Re:USB and disk Speed (4, Funny)

Pieroxy (222434) | about 2 years ago | (#40943883)

then the backup software could create one job for each directory,

Is that what we call a blow job?

Re:USB and disk Speed (1)

Anonymous Coward | about 2 years ago | (#40944167)

Is that what we call a blow job?

We're talking about multiple "directories" (know-what-I-mean? notch-notch. say-no-more), so I think we would call it blow bang, DP, gang bang or orgy.

Now, imagine a beowulf cluster fuck of those... :-)

Re:USB and disk Speed (4, Funny)

ilikejam (762039) | about 2 years ago | (#40944173)

No. No it is not.

Re:USB and disk Speed (5, Interesting)

Anonymous Coward | about 2 years ago | (#40943837)

Agreed. Best thing I ever did was get a computer case with a SATA sled bay, like one of these [newegg.com] . It won't help with breaking up the files, but a plain SATA connection will be many times faster and many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply). After you copy it over, you just store the bare drives in a nice safe place.

This assumes it's a one-time or rare thing. If you do want access or the backup process is a regular thing, then an NAS or RAID setup is probably more convenient so that you don't have to keep swapping drives in and out.

Re:USB and disk Speed (2)

shokk (187512) | about 2 years ago | (#40943933)

If he's looking for reliability in a backup, then his choice of disks is going to be a factor. A drive with consumer grade chances of URE is going to die in a handful of writes and reads. USB grade drives (Caviar Green anyone?) aren't known for their reliability. Something like a Hitachi Ultrastar RE has a very very low chance of encountering a URE, so will be much more reliable.

solution (1)

Anonymous Coward | about 2 years ago | (#40943513)

1.take all hard drives out of USB enclosures
2.install in PC with multiple SATA cards
3.samba

Re:solution (4, Informative)

aglider (2435074) | about 2 years ago | (#40943797)

3.samba

Uh? Why?
cp -a is all you need once you put the HDD inside the target machine.
And if you put it into another machine on the same network, then rsync is the answer.
Forget about the buggy and slow SAMBA.

Re:solution (1)

Pieroxy (222434) | about 2 years ago | (#40943891)

Agreed. Samba should be at the very bottom of the list. It is the best solution only when there's no other solution.

Re:solution (1, Informative)

myowntrueself (607117) | about 2 years ago | (#40944119)

3.samba

Uh? Why?
cp -a is all you need once you put the HDD inside the target machine.
And if you put it into another machine on the same network, then rsync is the answer.
Forget about the buggy and slow SAMBA.

cp copies file by file.

A more efficient way is something like

tar -cf - . | (cd /somewhere/ && tar -xf -)

tar treats the directory contents as a data stream. It's much faster for large numbers of files and large amounts of data.

doomed? (-1)

Anonymous Coward | about 2 years ago | (#40943521)

Really? Coding something that simple is a burden?

DaisyChain (1)

Anonymous Coward | about 2 years ago | (#40943525)

I believe you can daisy chain external drives together if you have the right cases.
For ease though, I'd consider a DroBo http://www.drobo.com/products/professionals/drobo-5d/index.php

Re:DaisyChain (2)

Captain Hook (923766) | about 2 years ago | (#40943577)

It's not mentioned by the author, so I might be assuming too much, but if he's trying to write to USB drives as opposed to a RAID of some sort, I figured he wanted to be able to read the drives individually, perhaps on a different machine without a network connection between them.

The drobo won't allow that, the file system is spread across all the drives.

I guess it kind of depends on what the author needs to do with the drives when he's finished writing to them.

Re:DaisyChain (1)

GCsoftware (68281) | about 2 years ago | (#40943953)

An 8 drive DroboPro with 3 TB disks might just about do it.

Check out:
http://www.drobo.com/products/professionals/drobo-pro/index.php [drobo.com]

Re:DaisyChain (2)

GCsoftware (68281) | about 2 years ago | (#40943963)

Actually 8x4 TB disks will do it, with the overhead etc, giving you 24.96 TB usable space.

JBOD or more accurately, spanned volume (0)

Anonymous Coward | about 2 years ago | (#40943541)

http://en.wikipedia.org/wiki/Spanned_volume
http://macs.about.com/od/usingyourmac/ss/raidjbod.htm

JBOD allows you to create a large virtual disk drive by concatenating two or more smaller drives together. The individual hard drives that make up a JBOD RAID can be of different sizes and manufacturers. The total size of the JBOD RAID is the combined total of all the individual drives in the set.

Re:JBOD or more accurately, spanned volume (2)

sumdumass (711423) | about 2 years ago | (#40943865)

How transportable is that though?

I mean, if I copied 200 gig across 3 drives in a JBOD RAID, could I plug just one drive in to access the information on another machine? Suppose my laptop only has 2 USB ports and I do not have a hub, plus I'm running a different OS; does this mean I can't look for information on the set?

I have never used JBOD for RAID. I have, however, used regular mirrored and striped RAIDs with and without fault tolerance (RAID 5 and 10 or a mirrored stripe, for instance) and know this can be a problem. In fact, I've even seen issues reading a complete RAID set across systems when you aren't using a true hardware RAID controller.

Re:JBOD or more accurately, spanned volume (2)

hawkinspeter (831501) | about 2 years ago | (#40944063)

Seems like a very bad idea to me. You'll have trouble creating a JBOD device without connecting all the drives simultaneously. Also, you're basically increasing the chance that the entire JBOD volume will be broken as the number of drives goes up. If you've got one drive failing, you'll be lucky to get any data back at all.

To my mind, Bacula would be a good choice as you can set up virtual tapes that will correspond to the drives and you can set the backup to wait for the operator to swap over the drive and then continue the backup. Also, once you've got Bacula installed and working, it's easy to do incremental backups and thus not need to write out the whole dataset again.

Bacula is your friend (4, Informative)

bernywork (57298) | about 2 years ago | (#40943543)

http://www.bacula.org/en/ [bacula.org]

There's even a howto here:

http://wiki.bacula.org/doku.php?id=removable_disk [bacula.org]

Re:Bacula is your friend (1)

richlv (778496) | about 2 years ago | (#40943571)

was going to suggest bacula as well, but came a bit late :)

Re:Bacula is your friend (3, Informative)

Anonymous Coward | about 2 years ago | (#40943643)

Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium. As long as your MySQL catalog is intact, restoration is a cinch...

Did I mention it supports backup archiving as well if you want duplicate copies for Tapes being shipped off site...

Re:Bacula is your friend (5, Informative)

arth1 (260657) | about 2 years ago | (#40944049)

Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.

Except for good old tar, which is present on all systems.

Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar:
-L <max-size-in-k-per-tarfile> -M -F myscript.sh ... where -M turns on multi-volume mode and -F names myscript.sh, the script that supplies the name to use for the next tar file in the series. It can be as easy as a for loop checking where the tar file already exists and returning the next hooked up volume where it doesn't.
Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.

One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.

One caveat: GNU tar refuses to combine multi-volume mode with its built-in compression, so compress the files beforehand if you need to.
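
The "for loop checking where the tar file already exists" could look roughly like the sketch below. It is written in Python rather than shell purely for illustration; the function name `next_volume`, the archive file name and the mount-point layout are all assumptions. How the chosen name is handed back to tar is left out on purpose; the GNU tar manual's section on the info script covers that.

```python
import os
import tempfile


def next_volume(mount_points, archive_name="backup.tar"):
    """Return the path for the next archive volume, or None if every disk is used.

    A mount point counts as "used" once it already holds archive_name.
    """
    for mount in mount_points:
        candidate = os.path.join(mount, archive_name)
        if os.path.isdir(mount) and not os.path.exists(candidate):
            return candidate
    return None


# Demo with temporary directories standing in for mounted USB disks.
with tempfile.TemporaryDirectory() as root:
    disks = [os.path.join(root, f"usb{i}") for i in range(3)]
    for d in disks:
        os.mkdir(d)
    # Pretend the first disk already holds a volume of the archive.
    open(os.path.join(disks[0], "backup.tar"), "wb").close()
    chosen = next_volume(disks)
    print(os.path.relpath(chosen, root))
```

A real script would probably also check free space and prompt the operator when `next_volume` returns None.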

Re:Bacula is your friend (1)

hoover (3292) | about 2 years ago | (#40943659)

Another thumbs up for bacula if you need more than a single backup of your data (like copying it to drives only once)

Re:Bacula is your friend (0)

Anonymous Coward | about 2 years ago | (#40944131)

Bacula is really good from what I've seen, but a little difficult to install, configure, and get running in the end. I believe tar is a good and fast solution too! You can start with it and then take the time to set up Bacula. BTW, the array of disks is a really fast, easy, cheap solution that requires no manual work, compared with USB HDDs.

use 'dd' in linux (1)

Anonymous Coward | about 2 years ago | (#40943545)

Use 'dd' in linux

Why USB HDDs? (1, Interesting)

Anonymous Coward | about 2 years ago | (#40943547)

Are you REALLY sure that you want to use USB HDDs? The cost savings of using a box of HDDs may well be offset by the hassle in finding the backup software, the manual labor of swapping them, finding the correct drive to retrieve a certain file, etc.

How about a pair of Synology DS1512+ NASes? In addition to getting all of the storage online at all times, you get RAID support, etc.

Re:Why USB HDDs? (1)

jamesh (87723) | about 2 years ago | (#40943795)

Are you REALLY sure that you want to use USB HDDs? The cost savings of using a box of HDDs may well be offset by the hassle in finding the backup software, the manual labor of swapping them, finding the correct drive to retrieve a certain file, etc.

How about a pair of Synology DS1512+ NASes? In addition to getting all of the storage online at all times, you get RAID support, etc.

No reason why they can't all be attached at once. With 3TB disks and 8 USB3 ports, you'd only need to plug them all in to do the backup, then remove them all to take them offsite when the backup is done.

A few portable NASes holding 4 disks each might be a better option, but don't exclude USB, given its simplicity.

Split into multiple tar files? (5, Informative)

Anonymous Coward | about 2 years ago | (#40943549)

I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?

Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:

How to Create a Multi Part Tar File with Linux [simplehelp.net]

A Full 24TB using only 2 USB ports (2)

Bondolon (1000444) | about 2 years ago | (#40943551)

Assuming you're not worried about backup speed, you could use a four-bay external hard-drive enclosure in combination with RSYNC and LVM on any linux variety. I don't know if they all do, but the MediaSonic HF2-SU3S2 supports 3TB hard drives per bay, which means that two of them could be used in conjunction to provide 24TB of backup storage. Since you can make a large volume out of the full 24TB using LVM, you could even use something like dd to write to the disk (RSYNC with the archive option would be a better choice though, imho).

RAID (5, Informative)

Anonymous Coward | about 2 years ago | (#40943553)

For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.

Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.

For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
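
The capacity arithmetic behind the 6+2 layout, as a small sketch. The helper name `raid_usable_tb` is invented for illustration, and the calculation ignores filesystem overhead and the gap between marketing terabytes and binary ones, so real usable space will be somewhat lower.

```python
def raid_usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity when parity_drives' worth of space goes to redundancy."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * drive_tb


raid6 = raid_usable_tb(8, 4, 2)  # RAID-6: two drives' worth of parity
raid5 = raid_usable_tb(8, 4, 1)  # RAID-5 for comparison: one drive's worth

print(f"RAID-6, eight 4TB drives: {raid6} TB usable")
print(f"RAID-5, eight 4TB drives: {raid5} TB usable")
```

Note that eight 4TB drives in RAID-6 land exactly on 24TB before overhead, which leaves no headroom for growth.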

Re:RAID (1)

dutchwhizzman (817898) | about 2 years ago | (#40943705)

If not a RAID (those tend to fail just as hard) get at least two, possibly three copies of each file on separate drives. The last thing you want is to wait for RAIDs to recover and watch them fail during recovery, with your only copy of a file on them.

Re:RAID (2)

Kjella (173770) | about 2 years ago | (#40943875)

For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning. Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files. For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.

Yeah... though I suspect with the price premium for 4TB drives - they're huge - and the cost of an 8-port RAID6 capable RAID card you're considerably above the budget he was going for. If this is like "projects" or something I'd probably suggest the human archiving method - split your live disk into three areas, "work in progress" and "to archive" and "archive". Your WIP you back up completely every time, your "to archive" you add to the latest archive disk (plain, no RAID), and make an index of it so you can easily find on which archive disk it is then move it to "archive" on the live disk. Very low tech incremental backup but this seems like a hobby project. I certainly hope it's not a company's backup / disaster recovery plan...

Re:RAID (1)

Anonymous Coward | about 2 years ago | (#40944235)

Very low tech incremental backup but this seems like a hobby project.

Call me old, but I wasn't expecting anyone calling 24TB a hobby project...

Now get off my lawn!

Re:RAID (-1, Redundant)

fatphil (181876) | about 2 years ago | (#40943943)

For that much data, I'd recommend just keeping the original HD-DVDs and Blu-ray media for your "20 GB" files, and the original CDs for your "3 MB" files.

We are helping someone back up his music and audio collection, aren't we?

And they won't be pirated, will they?

Re:RAID (1)

DRJlaw (946416) | about 2 years ago | (#40944195)

We are helping someone back up his music and audio collection, aren't we?

Well, actually you're not helping someone do anything. You're just vomiting up speculative accusations.

And they won't be pirated, will they?

See above. Then ignore children, CD rot, and every other legitimate reason for backing up the optical media that you've spent your hard-earned money on.

Now run along and sue everyone who's provided actual, helpful advice. However, you may want to look up the standard for "contributory infringement" first...

Re:RAID (-1)

Anonymous Coward | about 2 years ago | (#40944001)

Repeat after me:

RAID IS NOT A BACKUP SOLUTION.
RAID IS NOT A BACKUP SOLUTION.
RAID IS NOT A BACKUP SOLUTION.

___
Filter error: Don't use so many caps. It's like YELLING. Me error: No shit, sherlock? Yelling was invented for a purpose. This is is.

Julian? (5, Funny)

WinstonWolfIT (1550079) | about 2 years ago | (#40943569)

Out on bail mate?

Re:Julian? (0)

Anonymous Coward | about 2 years ago | (#40943915)

Kim DotCom?

git-annex (4, Informative)

Anonymous Coward | about 2 years ago | (#40943585)

You might want to look into git-annex:
http://git-annex.branchable.com/ [branchable.com]

I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.

NAS Box (1)

second_coming (2014346) | about 2 years ago | (#40943591)

http://www.synology.com/products/product.php?product_name=DS2411%2B&lang=uk [synology.com] Still portable enough to do your backup then take offsite.

Re:NAS Box (0)

Anonymous Coward | about 2 years ago | (#40943621)

This brings the question. Who does the data belong to, where will it go when it's copied? And honestly, why would you use USB drives for backup? There are better professional solutions out there. If you like experimenting, why not go for tape? I'm told a lot of major institutions/corporations use them. Is that exotic enough for ya?

Re:NAS Box (-1)

Anonymous Coward | about 2 years ago | (#40943817)

This brings the question. Who does the data belong to, where will it go when it's copied

This brings the answer. None of your fucking business, you nosey twat.

Re:NAS Box (1)

symes (835608) | about 2 years ago | (#40943663)

I have a Synology NAS and I'm very pleased with it. I don't have anywhere near the volume of data the OP has though. One thing with a NAS is that you'll be subject to the network's available bandwidth and, depending on your setup, this could make backing up lots of data pretty darn tedious. And it might annoy the admin (and other users). So while a decent portable RAID might be the better option, it might be better to find one that just plugs in rather than use the network. You might find one that can be set up to use SSDs as well.

Re:NAS Box (0)

Anonymous Coward | about 2 years ago | (#40943863)

24 TB SSD ??? It would severely stress MY bank account :-(

Sometimes Simple is Harder... (0)

Anonymous Coward | about 2 years ago | (#40943619)

If you have 24TB of data to back up, it would be easier to just build another 24TB storage array. The amount of time you would spend swapping disks and then validating that disks don't go bad would sap any "savings" of not building a big array to begin with.

So, I would buy up some cheap dual-core dual processor xeon systems that ebay is flooded with currently, buy as much raid 5 and sata disks as it takes to get to 24tb with raid 5, and then you can actually do a meaningful backup that doesn't have a labor cost factored to each iteration.

I'm assuming the original 24TB exists in RAID 5 already, so if you have access to the existing hardware infrastructure, just build a RAID 5 mirror. If you're doing a web mirror, RAID 5 should be good enough, and if you lose more than one disk, then worry about restoring from the other mirror members.

Tape? (5, Insightful)

mwvdlee (775178) | about 2 years ago | (#40943625)

Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
24TB is well within the range that a professional solution would be required.
Given a hard disk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
More than likely at least one of those disks will die before its time.
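
Whether "more than likely" holds depends on the per-disk failure rate, which nobody in the thread states. A back-of-the-envelope sketch, assuming the disks fail independently and using made-up per-disk failure probabilities (the three values below are placeholders, not measured rates):

```python
def p_any_failure(n_disks: int, p_single: float) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_single) ** n_disks


# Hypothetical per-disk failure probabilities over the storage period.
for p in (0.01, 0.03, 0.05):
    chance = p_any_failure(24, p)
    print(f"per-disk {p:.0%}: {chance:.0%} chance of losing at least one of 24")
```

Under these assumptions the odds of at least one dead disk pass 50% once the per-disk probability is a little under 3%.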

Re:Tape? (4, Insightful)

Lumpy (12016) | about 2 years ago | (#40943783)

Yup. Spool to tape. Get an SDLT600 tape cabinet and call it done. If you get a 52-tape robot cabinet, you will have space to hold not only a complete backup but also a second full backup in incrementals that will all run automatically. Plus it has the highest reliability.

And to anyone whining about the cost: if your 24TB of data is not worth that much, then why are you bothering to back it up?

Re:Tape? (0)

Anonymous Coward | about 2 years ago | (#40943793)

"24TB is well within the range that a professional solution would be required."

Wot? It's my movie collection!

Re:Tape? (1)

mwvdlee (775178) | about 2 years ago | (#40943945)

Assuming the 24TB is worthy to backup as a single backup.
Your movie collection is probably (A) not worthy of backup and (B) far more easily backed up as individual movies.

This is the key question (0)

Anonymous Coward | about 2 years ago | (#40944145)

Let's say both the primary file and a 1TB backup disk fail. Is the damage felt by OP equal to 1/24 of his happiness or less? Then multiple drives is very justified. Is there a chance this 1TB drive contains a database that makes half the data completely useless when it crashes? Then multiple backups/redundancy is required. This is a vital piece of information to make a recommendation.

Re:Tape? (0)

Anonymous Coward | about 2 years ago | (#40943871)

Parent is right on the money: this type of backup requires a professional solution, and tape or some sort of SAN is definitely the method that you will want to use. I've seen plenty of Hadoop systems use this method.

Re:Tape? (5, Informative)

Anonymous Coward | about 2 years ago | (#40943955)

No kidding. For $2400, you get 24x 1TB HDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.

Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.

Re:Tape? (0)

Anonymous Coward | about 2 years ago | (#40943999)

More than likely at least one of those disks will die before its time.

Depending on what is stored (porn collection?) and how those 24 disks are handled, it might not be a big problem. You lose about 4.17% of the data if one disk fails. But this is the backup, so you would have to lose the original and the backup at pretty much the same time to actually lose data.

Having them in a "one-fail-everyone-fail"-mode would be idiotic.

Just plain old rsync... (0)

Anonymous Coward | about 2 years ago | (#40943627)

The Btrfs filesystem allows you to merge multiple physical disks to a single filesystem.
(AFAIK it's not stable yet, but it just had to be mentioned!)

tar --multi-volume (5, Interesting)

jegerjensen (1273616) | about 2 years ago | (#40943653)

Evidently, our UNIX founding fathers had similar challenges...

Tar already does this (3, Informative)

cyocum (793488) | about 2 years ago | (#40943669)

Have a look at tar and its "multi-volume" [gnu.org] option.

Re:Tar already does this (5, Informative)

leuk_he (194174) | about 2 years ago | (#40943739)

multi volume tar [gnu.org] Just mount a new usb disk whenever it is full.

However, to have a reasonable retrieval rate (going through 24 TB of data will take some days over USB2), you had better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (and consumer-grade USB disks will crash!), your entire dataset is not lost.

For that reason (disk failure), do not use some Linux spanning disk feature. A file system is lost when one of the disks it is written on is lost, unless you use a feature that can handle lost disks (RAID / RAID-Z).

And last but not least: test your backup. I have seen cheap USB interfaces fail to write the data to disk without a good error message. All looks OK until you retrieve the data and some files are corrupted.
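
One way to "test your backup" is to compare checksums of every source file against its copy. The sketch below is an illustration, not a recommendation of a specific tool: the function names are invented, and it assumes the backup is a plain file tree mirroring the source. On a real system it is worth unmounting and remounting the disk first, so the comparison reads from the drive rather than from the operating system's cache.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tree(source_root, backup_root):
    """Return relative paths that are missing from, or differ in, the backup."""
    bad = []
    for dirpath, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                bad.append(rel)
    return sorted(bad)


# Demo: one good copy and one silently corrupted copy.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, data in (("a.bin", b"alpha"), ("b.bin", b"bravo")):
        with open(os.path.join(src, name), "wb") as fh:
            fh.write(data)
    with open(os.path.join(dst, "a.bin"), "wb") as fh:
        fh.write(b"alpha")
    with open(os.path.join(dst, "b.bin"), "wb") as fh:
        fh.write(b"brav0")  # corrupted on purpose
    damaged = verify_tree(src, dst)
    print(damaged)
```

For 24TB this doubles the read time, but it is the only way to know the copy is actually readable.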

Linuxquestions thread on multi-disk backups (2, Informative)

Anonymous Coward | about 2 years ago | (#40943671)

Here's a Linuxquestions thread [linuxquestions.org] outlining multi-disk backup strategies.

The gist of the discussion is to use DAR [linux.free.fr] .

No. (1, Insightful)

AdmV0rl0n (98366) | about 2 years ago | (#40943679)

I'm not sure if you posed the question out of being naive, or if it's just being daft. You don't want to be moving 24TB over the USB bus. End of discussion really - at least in terms of USB.

However you ended up looking at USB for this, it was the wrong way to go.

You have lots of choice in terms of boxes, servers, NAS boxes, locally attached storage. 24TB is in the range of midrange NAS boxes.

Once you have this, you can start to make choices on the many backup, replication, and duplication bits of software that already exist, both free and proprietary.

Re:No. (1, Informative)

ledow (319597) | about 2 years ago | (#40943851)

USB 2.0 provides 480Mbps of (theoretical) bandwidth. So unless you go Gigabit all over your network (not unreasonable), you won't beat it with a NAS. Even then, Gigabit is only about twice as fast as USB working flat-out (the difference being that if you have multiple USB buses, you can get multiple drives working at once). And USB 3.0 would beat it again. And 10Gb between the client and a server is still an expensive network to deploy.

Granted, eSATA would probably be faster but there's nothing wrong with USB for such tasks if you *don't* want to provide Gigabit connections everywhere and (presumably) greater-than-gigabit backbones.

Re:No. (0)

Anonymous Coward | about 2 years ago | (#40944025)

Re: picking the media

Go with SAS, eSATA, or USB 3.0. (in that order.)
Your main constraints are the speed at which the first array can read blocks and the speed at which the second array can write blocks. And, the risk of hitting a data hazard at some later point.
You may be wise to wrap the output using a Run Length Limited (RLL) coding scheme and a FEC coding scheme.

Re: Archiving by splitting across multiple disks.
There is always dd and tar. ;-)

You know... (5, Funny)

marsu_k (701360) | about 2 years ago | (#40943683)

Porn is a renewable resource, there's no need to store so much of it.

Yeah, you missed basic common Linux knowledge (1)

Anonymous Coward | about 2 years ago | (#40943695)

Script your own solution for your specific problems.

That’s kinda the whole point of having a computer... as opposed to a set of appliances that happen to run on a computer you never use directly.

Seriously: Build your own homebrew NAS. (4, Interesting)

Qbertino (265505) | about 2 years ago | (#40943719)

What you're attempting isn't easy, it's actually difficult.
Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer; it might even go faster and be cheaper if you can get the parts fast.

My 2 cents.

Bash.... (4, Informative)

djsmiley (752149) | about 2 years ago | (#40943733)

First bash script to grab the size of the "current" storage;

compress the files up until that size;

Move compressed file onto storage;

request new storage, start again.

----------

Or, if you've got all the storage already connected: for i in $(seq 0 "$x"); do cp "$archive$i" "/mount/$i/"; done :D
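
The "fill one disk, then ask for the next" steps above can be sketched as a simple planner. This is an assumption-laden illustration in Python rather than bash: the function name `plan_volumes` and the file names and sizes are invented, the compression step is left out, and files are never split, so any single file must fit on one disk.

```python
def plan_volumes(files, capacity):
    """Assign (path, size) pairs to volumes in order, starting a new one when full.

    Returns a list of volumes, each a list of paths. Files are never split.
    """
    volumes, current, used = [], [], 0
    for path, size in files:
        if size > capacity:
            raise ValueError(f"{path} does not fit on an empty volume")
        if used + size > capacity:
            volumes.append(current)  # this is the "request new storage" point
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        volumes.append(current)
    return volumes


GB = 10 ** 9
# Hypothetical file list; real sizes would come from os.path.getsize or similar.
files = [("a.img", 20 * GB), ("b.img", 15 * GB), ("c.img", 10 * GB), ("d.img", 3 * GB)]
layout = plan_volumes(files, capacity=30 * GB)

for i, vol in enumerate(layout, start=1):
    print(f"disk {i}: {vol}")
```

Keeping the plan as a list also gives you the index you need later to find which disk holds a given file.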

we generate a lot of data (3 GB/min)... (1)

acidfast7 (551610) | about 2 years ago | (#40943751)

... by employing a detector with a size of 2463 x 2527 pixels (6M) at 12 Hz (12 times / sec). When run continuously for a set of data (roughly 900 degrees) ...

we collect 900 frames in roughly 2 minutes including hardware limitations for starting/stopping.

In proper format for processing, this works out to about 6MB/image and roughly 3GB/min for 2 minutes.

With an experienced crew of 3-4 people ... one handling the samples, one handling the liquid nitrogen, one running the software and one taking notes (overall monitoring also) ... we can run through 600 samples in a 24-hour shift ...

Which roughly works out to about 600 x 6GB = 3.6 TB on a "working" day.

To answer your question ... we never make physical copies of stuff ... the data stays online in multiple places on multiple continents ... and when something is published the data becomes publicly available in a central database

Why do you need a physical copy anyway?

Re:we generate a lot of data (3 GB/min)... (1)

ledow (319597) | about 2 years ago | (#40943819)

I'm not the OP but:

Because downloading 3.6TB to restore from a backup for just one day is pretty ridiculous for someone on home broadband?

Backup to external servers is ridiculous for anyone without university-sized access to the net. Hell, the school I work for tries to back up 10GB to a remote server each night and it often fails because it took too long (and we're only allowed to do that because we're a school - the limits for even business use on the same connection are about 100GB a month).

Absent a stupidly fast connection for a home, you have to have a physical copy that you can put somewhere else.

The fact that you *don't* see that, tells me that you probably have far too much hardware and connectivity available to you.

Re:we generate a lot of data (3 GB/min)... (1)

acidfast7 (551610) | about 2 years ago | (#40943901)

Two quick things:

1. Why do a complete restore of the 3.6TB? Just take the files that you want to use again or that have been lost.

2. Why work at home? It's home, not work.

USB is not for backup (1)

aglider (2435074) | about 2 years ago | (#40943757)

USB is for a second working copy.
Backups should also ensure durability of the copy, while USB HDD have a shorter lifespan than a normal HDD which in turn has shorter lifespan than tapes, the usual medium for durable backups.

Re:USB is not for backup (1)

Anonymous Coward | about 2 years ago | (#40943843)

USB HDD have a shorter lifespan than a normal HDD which in turn has shorter lifespan than tapes

What planet are you from?

Re:USB is not for backup (1)

phillymjs (234426) | about 2 years ago | (#40943977)

I think he might mean that a HDD sitting in a server and running 24/7 will likely last longer than a HDD that's in an external enclosure and gets physically moved around and powered on/off frequently.

Use DAR or KDAR (2, Informative)

pegasustonans (589396) | about 2 years ago | (#40943771)

If you don't want to invest in new hardware, you could use DAR [ubuntugeek.com] or KDAR [sourceforge.net] (KDE front-end for DAR).

With KDAR, what you want is the slicing settings [sourceforge.net] .

There's an option to pause between slices, which gives you time to mount a new disk.

eSATA instead of USB or Firewire (0)

Anonymous Coward | about 2 years ago | (#40943773)

My experience is that eSATA II (3G) is about 4X faster than USB2. The benchmarks I have seen show that it is still faster than USB3. Today you can probably get eSATA III (6G).

Use purpose designed backup media. (1)

gallondr00nk (868673) | about 2 years ago | (#40943789)

Backup tapes were designed precisely for the problem you have. LTO-5 tapes are about 1.5TB, if I remember right. Stored correctly they shouldn't give any problems when you come to retrieve whatever is backed up. Most archiving efforts use backup tape, and they can't all be wrong :)

Re:Use purpose designed backup media. (1)

Antique Geekmeister (740220) | about 2 years ago | (#40944007)

Actually handling all those tapes and recovering data from them is very expensive in manpower and time, and can be very awkward. Those tapes, and tape drives, are also _expensive_. They're useful for sites that require secure off-site storage, or encrypted off-site storage, but for most environments today they are pointless. Easily detachable physical storage has become very inexpensive, far more economical, and is far less prone to the problems of mishandled SCSI connections. I've seen far, far too many SCSI setups for tape drives and external media fail due to misconfiguration, miscabling, and the very poor driver integration of SCSI controllers in far too many operating systems. USB has proven startlingly simple, resilient, and _cheap_ to manage.

I use the external drive approach very frequently for data center migration and virtualization OS image migration, though usually I only back up the configuration files from the virtualized hosts, not the complete images. It's very effective. 24 TB is bigger than I've personally done this way, but it's certainly feasible if it's not treated as a single lump. If the data can be factored reasonably before transferring, don't simply duplicate it every time. Split it up into reasonably sized chunks and _mirror_ it onto the USB drives, so the first backup is lengthy but following backups are far more efficient.

Assuming that the backup system is Linux based, the "rsync" tool can be written into a script to see which media are attached, to mount those media, and to mirror the contents of a set of directories to those media. It's also reasonable to use a USB hub to allow mounting multiple USB devices simultaneously so it can be done all at one time, rather than having to swap media.
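A minimal sketch of such a script, written in Python around the rsync command line. The mount points and source directories in PLAN are hypothetical, and the sketch assumes the drives are already mounted (by the desktop, fstab or udev) rather than mounting them itself. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess

# Hypothetical layout: which source directories belong on which backup drive.
PLAN = {
    "/mnt/backup01": ["/data/projects-a", "/data/projects-b"],
    "/mnt/backup02": ["/data/video"],
}


def build_commands(plan, is_attached=os.path.ismount):
    """Return one rsync command per source directory whose target drive is mounted."""
    commands = []
    for mount_point, sources in plan.items():
        if not is_attached(mount_point):
            continue  # drive not plugged in or not mounted; skip it this run
        for src in sources:
            # -a preserves permissions and timestamps; --delete removes files
            # from the copy that no longer exist in the source (a true mirror).
            commands.append(["rsync", "-a", "--delete", src, mount_point + "/"])
    return commands


def run(plan, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(plan):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(PLAN)  # dry run: prints the commands for whichever drives are attached
```

Because rsync only transfers what changed, rerunning this with whichever drives happen to be plugged in gives the "lengthy first backup, cheap later backups" behaviour described above. Note that --delete is destructive on the backup side, so the dry run is worth reading before switching it off.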

Re:Use purpose designed backup media. (1)

tomtomtom (580791) | about 2 years ago | (#40944091)

Actually, for a data set this large it will probably work out only very slightly more expensive - and the benefit to be gained is worth it IMHO (in speed if nothing else - USB disks are *slow* and eat a lot of CPU). I live in the UK so I'll work in GBP. I think US prices are likely to be cheaper but the relative sizes will be similar.

I'd figure around ~£1100 for drive and SAS interface plus £500-700 for 24TB worth of media. Throw in an extra 2TB drive to spool to before you write to tape as well for say £150 (if you are buying SAS) and you get to around £81/TB at the top of that range (which works out roughly the same as current external hard drive costs). If your data is precious though you'll want double the amount of media so you can store offsite (or at least have a spare backup). Then the lower marginal cost of tape vs disk will become apparent.
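The arithmetic behind that estimate can be checked with a short sketch. All prices are the ones quoted in the paragraph above; nothing else is assumed except that a second copy on tape needs only more media, since the drive is already paid for.

```python
def cost_per_tb(drive_and_interface, media, spool_disk, capacity_tb):
    """Total outlay in GBP divided by the capacity backed up."""
    return (drive_and_interface + media + spool_disk) / capacity_tb


CAPACITY_TB = 24

# First full copy: drive + SAS interface, media (low/high estimate), spool disk.
low = cost_per_tb(1100, 500, 150, CAPACITY_TB)
high = cost_per_tb(1100, 700, 150, CAPACITY_TB)
print(f"first copy: £{low:.2f} to £{high:.2f} per TB")

# Second (offsite) copy on tape: only the media has to be bought again.
second_low = 500 / CAPACITY_TB
second_high = 700 / CAPACITY_TB
print(f"second copy: £{second_low:.2f} to £{second_high:.2f} per TB")
```

The first copy comes out at roughly £73 to £81 per TB, and the second at roughly £21 to £29 per TB, which is the "lower marginal cost" the comment refers to.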

Yes, tape can be harder to configure correctly and swapping tapes over etc will be a pain for a set this large. But that's equally true for disk; and we all know that it's not a backup until you've checked that you can restore from it. User error in configuration of the backup scripts is way more likely to cause an issue than any kind of hardware error and for that reason alone, you are stupid in my opinion if you don't test your backups. If you test them, then you will spot any SCSI misconfiguration etc immediately.

I agree that for moving data around, disk (or network) is much much easier. But that wasn't what the submitter asked about.

use zip (0)

Anonymous Coward | about 2 years ago | (#40943877)

" I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...)" - the split archive functions in the Linux zip program might be able to do this. But, I've never used this feature in Linux but remember using it on good old pkzip on dos when trying to span files across multiple floppy disks.

PAR (3, Informative)

fa2k (881632) | about 2 years ago | (#40943967)

I have just seen "PAR" a couple of times here on slashdot, haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive [wikipedia.org] . You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than having to deal with external USB drives. Get "green" drives, they are slow but cheap.
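To illustrate the idea of "enough redundancy to allow one drive to fail", here is a deliberately simplified sketch using plain XOR parity. This is not how PAR2 works internally (it uses Reed-Solomon coding and can survive more than one loss); it only shows the principle that one extra block lets you rebuild any single missing one. The three byte strings stand in for the contents of three drives and are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """The parity block is simply the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# Stand-ins for three drives' worth of data (all the same length).
drives = [b"drive-one-data!!", b"drive-two-data!!", b"drive-three-data"]
parity = make_parity(drives)

# Pretend the second drive died: rebuild it from the other two and the parity.
rebuilt = recover_missing([drives[0], drives[2]], parity)
print(rebuilt == drives[1])
```

With real data the blocks would be padded to equal length and the parity stored on its own drive, so the scheme costs one extra drive regardless of how many data drives there are.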

24TB? (1)

codman1 (904493) | about 2 years ago | (#40943989)

Some sort of NAS or tape would be your best option without knowing more. How often do you need to do the "backup"? Is it really a "backup" or data replication, e.g. do you need to restore the data after a serious failure? Have a look at this; it seems to have some good advice and I think it could be a solution to your issue, as I see the big problems being the amount of time and the restorability of the data after a failure. http://www.smallnetbuilder.com/nas/nas-howto/31485-build-your-own-fibre-channel-san-for-less-than-1000-part-1 [smallnetbuilder.com]

Work-related? Get a REAL backup solution (0)

Anonymous Coward | about 2 years ago | (#40943991)

If this is work-related, and the 24 TB of data is critical to your company, DON'T FUCK AROUND WITH TOYS.

Get a real backup solution - before they get a real sysadmin.

NAS (2)

Wolfling1 (1808594) | about 2 years ago | (#40943995)

A 24TB NAS is not very hard to assemble. Relatively cheap, and basically transfers data at Gb speed - assuming that you populate it with fast disks. Set one up with RAID and you're away. Personally, I would do it with a low end server and a big-ass RAID array. That way, you can really control its behaviour via the OS. Linux is ferpect for this kind of thing.

Madness (0)

Anonymous Coward | about 2 years ago | (#40944043)

What I want to know is this:

Who would have managed to get 24TB of data, without already having a backup solution in place?

24TB is a lot of data. It isn't something you get overnight. It should have been apparent a *long* time ago that some kind of backup was going to be needed.

If this is business data, then someone has been negligent.

eSATA (1)

tmshort (1097127) | about 2 years ago | (#40944057)

Your best bet for speed is likely to be eSATA.

Have you looked into something like this:
http://eshop.macsales.com/shop/NewerTech/Voyager/Hard_Drive_Dock [macsales.com]

The cost becomes noise when you consider how many drives you will end up needing, and per TB, will be cheaper than USB solutions.

I don't know how your data is organized, but if possible, you may want to back it up by project/directory/etc.

There are also online backup systems that can do what you want, but it'll take an extremely long time...

I know (2, Funny)

Anonymous Coward | about 2 years ago | (#40944069)

The iCloud! ;-)

Going about it all wrong (1)

Charliemopps (1157495) | about 2 years ago | (#40944071)

Get an old computer... anything will work really. You have to know someone that has one lying in their basement. Plug your drives into that. Share the drives on your network. Use any general backup software and sequentially back up what you need to back up over the network. Now it will do it overnight and you really don't care how long it takes. It can even do it every night. If you want it safe from fire and such.... build a box out of 2x4s and drywall scraps from Home Depot. Make it 5 sheets thick and it'll withstand any house fire you could possibly have. If you really want to go hardcore you can pour a box out of concrete, but that'll be hard to move.

Read it from Torvalds' lips (4, Funny)

zapyon (575974) | about 2 years ago | (#40944083)

"Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds [wikiquote.org]

(Isn't that prescience of "The Cloud"?)

––––––––––
* replace this with your favorite backup media of today ;-)

ZFS (0)

Anonymous Coward | about 2 years ago | (#40944101)

Connect 8 x 3 TB USB drives (more if you want RAIDZ), add all the disks to a pool and copy your data. If you need more space later, just add more disks to the pool. This will obviously be slow, but if what you need is a navigable copy of a lot of data, once you've made the copy, it won't matter.

This is what ZFS was designed for. I use Solaris & OpenIndiana, but there's a MacOS port, MacZFS.

Considerations (0)

Anonymous Coward | about 2 years ago | (#40944103)

1 You need to maximise your computer's RAM.
2 Firewire is definitely preferable to USB; it's much faster, and you can get discs which offer both.
If you are actually going to be copying from another USB device and not an internal hard disc, that's even more necessary.
3 Use high capacity, high speed discs, minimum a terabyte.
4 I use the OS facilities to copy from my internal drives to the BU drives and just copy the whole drive. If there is space left for another drive I'll put more on. I never split a drive across multiple BU drives. Thus it's easy to keep track of what's where. Wastes space, but GB are cheap.
5 If you are going to overwrite data on a BU drive then format it first; it will then work faster and there is less likelihood of errors.
6 If the data is really important to you, then you have an additional bind. For that sort of data, you also need a copy in another location in case the place where the computer is gets wrecked for some reason, which would take out both the original and the backups. If it's at work then your home should suffice to store the BU.
JS

Kim Dotcom (1)

Albinoman (584294) | about 2 years ago | (#40944111)

It's a little late to be asking that now.

ZFS + Tape Backup (0)

Anonymous Coward | about 2 years ago | (#40944115)

Online RaidZ ZFS with dual parity + ongoing offsite tape backups is the only way I would conduct this backup.

Count Bacula (2)

freaker_TuC (7632) | about 2 years ago | (#40944153)

Count Bacula as your friend ;) -> http://www.bacula.org/ [bacula.org]

What do you have now? (1)

DarwinSurvivor (1752106) | about 2 years ago | (#40944155)

Sometimes the easiest way to duplicate (back up) data is to simply duplicate the hardware it's already on. If it's on a 16-disk (x 2TB) NAS system, build another one. If it's on tape, buy more tapes. If it's on random HDDs scattered all over the place, then you have bigger problems to deal with first (like building a NAS box)!

Backup advice (2, Insightful)

Anonymous Coward | about 2 years ago | (#40944171)

I do things like this all the time with a data set about half of that, ~12TB. You didn't say anything about what the data is, but from the request and the fact you mentioned USB I would gather this is your typical warez hoarding (mp3/flac, mkv, apps) plus a personal picture and video collection of the family.

Here is a checklist I would execute, similar to mine. I find the most reliable way to keep your data over the years is by following a checklist or procedure and choosing when to move to the next storage platform.

Step 0: Get USB out of your head. Pop open the drive and attach it to the native bus, PATA or SATA. If SATA, you may want to invest in eSATA cases. It's not solely the speed. I have done stupid things like this, in which the data backup takes over 2 days, and on the 2nd day some unrelated event affecting my USB bus causes all kinds of problems with the transfer. Over time doing cheesy things like this affects other things, like doing stupid shit in real life, usually with duct tape or Gorilla Glue, then you have your wife on you. Right now your wife may not catch on to this, but it will escalate. Just do shit the right way.

Step 1: Organize. Actually understand what you are backing up. I never got into these tools like Google Desktop that allow a user to accept the fact that he/she has no idea where their files are. Understand and make an effort to organize your files before you back them up, and know the capacity of each 'genre' of crap you are backing up. Run a tool like 'jdiskreport' to find this information out after you organize. Create a mapping on paper of where shit is going, Zork style. If you have really important shit like family pictures, taking up say 200GB, and your mkv collection is 12TB, you may want to make 2x copies of your family shit. Anything you download off the internet is easily replaceable, despite how obscure your tastes may be, and will turn up again. I would question even backing it up but that is another conversation.

Step 2. Label your drives according to your documentation.

Step 3. Format the drives in the most likely native format you will use and are familiar with. If you are a noob Linux guy who runs Windows 7 all the time, don't be an idiot and experiment with your backup on ext3. It is not that ext3 is a bad filesystem, but you may not be the most skilled in restoring your data in various scenarios. For example, I'm a Linux and Solaris geek but am just getting into Macs; I'm not comfortable enough with Mac failures to store my crap on a Mac fs. Whatever your skillset is, don't use the most optimal file system on paper, use what you know, even if it is NTFS (which imo is very reliable).

Step 4. Copy your shit over using your knowledge of your data organization and native OS commands or tools.

Step 5. Run a checksum on your important stuff and store the hashes to verify everything is fine over time. Odd situations occur when backing up data. I have run into cases where I didn't realize the files I was about to back up were bad/corrupt until I saw the good copy on a backup drive I was about to incrementally overwrite.

Step 6. Store the shit somewhere else if you can reasonably do this and feel confident in the security of your data. If you have to start encrypting your crap, you add some more complexity that can affect the reliability of your restoration, but again if you proceduralize and keep up on it you will be fine.

Backup design and integrity is hard work and serious business when dealing with large volumes. It reminds me of the Seinfeld episode where he goes to the car rental place and they don't have his car and he goes into his "Anyone can take the ticket" diatribe. Anyone can back up their data. But can you get it back? I am not an expert in this area and don't pretend to be; I am just a seasoned IT administrator who has performed a lot of backups in my day and has managed to keep most of my data safe over the years.

Use checksums (MD5 etc.) against bit errors (0)

Anonymous Coward | about 2 years ago | (#40944183)

When moving really large amounts of data it is not unlikely to see an incidental bit error, especially when new hardware is involved. Data on disk is generally safe because of ECC. But pumping that much data through RAM, associated controllers and all the non-ECC protected buses on a mainboard will increase the chance of experiencing bitrot because of tolerance or thermal issues. At some point it is just a matter of statistics.
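A minimal sketch of how such a check might look: hash every file on the source and on the backup, then list the paths whose digests differ or that are missing from the backup. SHA-256 is used here as the default instead of the MD5 named in the subject line; that is a substitution on my part, and passing "md5" as the algorithm works as well. The file name and contents in the demo are invented.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so that 20 GB files never sit in RAM."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def find_mismatches(source_root, backup_root):
    """Relative paths that are missing from the backup or differ from the source."""
    src = build_manifest(source_root)
    dst = build_manifest(backup_root)
    return sorted(p for p in src if dst.get(p) != src[p])


# Demo with throwaway directories: the "backup" copy has one corrupted byte.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    for root, payload in ((src_dir, b"original"), (dst_dir, b"origimal")):
        with open(os.path.join(root, "photo.raw"), "wb") as f:
            f.write(payload)
    demo_result = find_mismatches(src_dir, dst_dir)

print(demo_result)
```

In practice the source manifest would be written to a file and kept next to the backup, so the drives can be re-verified later without the originals being online.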

Why USB? (0)

Anonymous Coward | about 2 years ago | (#40944201)

I really have to ask why USB? You're looking at a top speed of 40MB/s on USB 2.0; more commonly you get 20 to 30MB/s.

Either a cable designed to hot swap drives, or a drive bay would work a lot better if a NAS is out of the question. The cable solution involves a SATA cable which has both the data and power lines bundled together on the HDD side. Reduces the risk of damaging the HDD when pulling the plug or plugging another in. A drive bay would cost a bit more, but is significantly less risky and much easier to use.

Even putting the destination drives in another machine and connecting the machines with a crossover cable will be much faster than USB 2.0 speeds. I really wouldn't suggest going through a router unless you take care to keep the router cool (a desk fan should be sufficient).

That leads me to my last point. When transferring that much data, the machine(s) are going to get much hotter than they do even in intense computational work. You're going to want to pop the side of the case off and set up a good fan to pull heat away from it. The easiest way to kill a HDD is to let it run hot for long periods of time.

Keep it simple (2)

jampola (1994582) | about 2 years ago | (#40944225)

# rsync -avz /this /that. Split your directories to match the sizes of your drives. If on Linux, run smartctl -H /dev/sdX to check your disk health and, if possible, take the HDDs out of their USB enclosures and connect them directly to SATA for faster transfer speeds. 9 times out of 10 these drives will mount just like a normal drive, since usually they are just a normal drive housed in an enclosure.

Good luck :)
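The "split your directories to match the sizes of your drives" step can be sketched as a first-fit-decreasing assignment: sort the directories by size, then put each one on the first drive that still has room. The directory names, their sizes and the 3000 GB drive capacity below are all made up for illustration; real sizes would come from something like du.

```python
def assign_to_drives(dir_sizes, drive_capacity):
    """First-fit decreasing: place each directory on the first drive with room.

    dir_sizes maps a directory name to its size; drive_capacity uses the same
    unit. Returns a list of drives, each a list of directory names.
    """
    drives = []  # each entry: {"free": space_left, "dirs": [names]}
    ordered = sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > drive_capacity:
            raise ValueError(f"{name} is larger than a single drive")
        for drive in drives:
            if drive["free"] >= size:
                break  # found an existing drive with enough room
        else:
            drive = {"free": drive_capacity, "dirs": []}  # start a new drive
            drives.append(drive)
        drive["dirs"].append(name)
        drive["free"] -= size
    return [d["dirs"] for d in drives]


# Hypothetical directory sizes in GB, packed onto hypothetical 3000 GB drives.
sizes_gb = {"/data/video": 2500, "/data/raw": 1800, "/data/photos": 1000, "/data/docs": 400}
for number, dirs in enumerate(assign_to_drives(sizes_gb, 3000), start=1):
    print(f"drive {number}: {dirs}")
```

Each resulting group can then be handed to its own rsync run. A directory bigger than one drive has to be split by hand into subdirectories first, which is why the function refuses it instead of guessing.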