Ask Slashdot: Best *nix Distro For a Dynamic File Server?

timothy posted about a year and a half ago | from the when-birdwatching-goes-too-far dept.

Data Storage

An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing for my workplace a particular sort of 'dynamic' file server for handling scientific data. We have all the hardware in place, but can't figure out what *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: We have sensor units scattered across a couple square miles of undeveloped land, which each collect ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server has to boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out-of-the-box without messing with manual recompiling? Why is documentation for *nix always so bad?"

234 comments

Do you need a unified filesystem at all? (2)

TheSunborn (68004) | about a year and a half ago | (#41123311)

Why do you need a unified filesystem? Can't you just share /myShareOnTheServer and then mount each disk to a subfolder in /myShareOnTheServer (such as /myShareOnTheServer/disk1)?
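
Roughly like this, say (the drive labels, mount points and share name below are just placeholders, not anything from the summary):

    # mount each returned drive under the shared tree, one subfolder per disk
    for n in $(seq 1 12); do
        mkdir -p /myShareOnTheServer/disk$n
        mount /dev/disk/by-label/disk$n /myShareOnTheServer/disk$n
    done

    # then export the parent directory once, e.g. in smb.conf:
    # [sensors]
    #     path = /myShareOnTheServer
    #     read only = yes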

Re:Do you need a unified filesystem at all? (1)

TheSunborn (68004) | about a year and a half ago | (#41123331)

Seems someone ate the rest of my text. But if you do it this way, then the Windows computers will see a single system with a folder for each hard disk. The only reason this might cause problems is if you really need files from different hard disks to appear as if they are in the same folder.
 

Re:Do you need a unified filesystem at all? (4, Insightful)

Anrego (830717) | about a year and a half ago | (#41123377)

I have to assume they are using some clunky Windows analysis program that lacks the ability to accept multiple directories, or something along those lines.

Either way, the aufs (or whatever they use) bit seems to be the least of their worries. They bought and installed a bunch of gear and are just now looking into what to do with it, and they've decided they want it to boot in 30 seconds (protip: high end gear can take this long just doing its self-checks, which is a good thing! Fast booting and file servers don't go well together).

Probably a summer student or the office "tech guy" running things. They'd be better off bringing in someone qualified.

Re:Do you need a unified filesystem at all? (1)

Anonymous Coward | about a year and a half ago | (#41123581)

OP here:

No, it's mainly for user convenience. People will be looking at the share manually and it's easy to lose track of what you're doing when you have a dozen folder views open with the same names.

As for the gear, it was sourced from a defunct project with similar goals. The board was specifically bought for booting fast and is configured to get through the bios in under 5 seconds (you can disable a lot when you don't need raid and there's only one persistent drive). We have many WinXP and linux systems that go from cold-to-desktop in under 30 seconds, so this isn't really a big concern, I just threw it in there to weed out suggestions involving massive ubuntu installs that take 2+ minutes to load a bunch of default shit we don't need.

Re:Do you need a unified filesystem at all? (2)

Knuckles (8964) | about a year and a half ago | (#41123697)

Massive Ubuntu installs taking 2 minutes to boot? Whatever its faults, Ubuntu was the one distro most focused on boot time for a long while, and even a standard desktop install goes from BIOS hand-off to login screen in 10 - 12 secs with a standard HD.

Re:Do you need a unified filesystem at all? (3, Interesting)

spire3661 (1038968) | about a year and a half ago | (#41123717)

After looking through your proposal you need 2 pieces. You need a WORKSTATION to accept the drives as well as cleanse (you are going to verify the data as non-malicious, right?) and catalog the data, and be able to shut down and boot up on command. Then you need a SERVER that hosts the data to be served. Thinking you are going to serve directly from the hotswaps is a bad idea.

Re:Do you need a unified filesystem at all? (4, Informative)

Anonymous Coward | about a year and a half ago | (#41123969)

FreeNAS is based on FreeBSD, and boot speed (no matter what the OS) is based entirely on the hard drive speed + CPU speed + 'automagic' configuration.

FreeBSD boots pretty fast, but you need to turn off things like the bootloader menu delay, and set fixed IP addresses. Same on Linux, but Linux tends to be sloppy about starting up services.

In either case you can usually just turn anything you don't need off, and just turn on what you do need.

FreeBSD's ZFS is better than anything you can setup on Linux, but unless the box has a lot of RAM you're not going to get the expected performance.

Most of the NAS devices you see for sale run FreeNAS if they're based on x86-64 CPUs, or Linux if they're not (PPC/MIPS/ARM), but they're not particularly great pieces of hardware. You pretty much end up with something stupidly simple like:
OS -> UFS/EXT2/EXT3 -> Samba share
for Windows clients. You can also do this on FreeBSD/FreeNAS (ZFS is terrible under Linux-FUSE):
FreeBSD -> ZFS (using all drives, even remote drives) -> iSCSI
iSCSI is something that you must have GigE/10Gb fiber for, and decent processing power. Most of the systems you see (including Dell) that do iSCSI are woefully underpowered for a small server, or extreme overkill (enterprise).

Windows however supports iSCSI out of the box. So you can do something theoretically stupid like this:
FreeBSD -> ZFS ->iSCSI ->Windows box accesses iSCSI and shares it with other Windows machines.

So it depends what you really want to do. From your description, it sounds like what you really want to do is hotplug a bunch of drives into a system, have that system "union" them via filesystem mounts (nobody says you have to mount everything at root), and then share them under Samba.

But another possibility, not clearly indicated, is that the drives have overlapping file systems that you want to see as one (e.g. same directory structure, different file names). That is more complicated to deal with, but I'd probably not try to share off the hotswapped drives, and instead rsync all the drives to another filesystem and share that instead.
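
A rough sketch of that rsync approach, assuming the sleds are already mounted under /mnt/sled* and the landing filesystem lives at /srv/pool (both placeholder paths):

    # pull everything off the hotswapped drives into one local tree, then share the tree
    for d in /mnt/sled*; do
        rsync -a "$d"/ /srv/pool/
    done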

Re:Do you need a unified filesystem at all? (4, Informative)

Anonymous Coward | about a year and a half ago | (#41123521)

OP here:

I left out a lot of information from the summary in order to keep the word count down. Each disk has an almost identical directory structure, and so we want to merge all the drives in such a way that when someone looks at "foo/bar/baz/" they see all the 'baz' files from all the disks in the same place. While the folders will have identical names the files will be globally unique, so there's no concern about namespace collisions at the bottom levels.
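
The kind of merge I mean looks roughly like this with mhddfs (aufs or overlayfs are similar); /mnt/d1../mnt/d12 and /srv/merged are placeholder paths, and the packaging shown is a Debian/Ubuntu assumption:

    apt-get install mhddfs                       # package name may differ on other distros
    mhddfs /mnt/d1,/mnt/d2,/mnt/d3 /srv/merged -o allow_other
    # foo/bar/baz under /srv/merged now lists the baz files from every mounted drive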

Wow (5, Insightful)

Anonymous Coward | about a year and a half ago | (#41123315)

I know I’m not going to be the first person to ask this, but if I understand it the plan here was:

1 - buy lots of hardware and install
2 - think about what kind of software it will run and how it will be used

I think you got your methodology swapped around man!

Why is documentation for *nix always so bad?

You are looking for information that your average user won’t care about. Things like boot time don’t get documented because your average user isn’t going to have some arbitrary requirement to have their _file server_ boot in 30 seconds. That’s a very weird use case. Normally you reboot a file server infrequently (unless you want to be swapping disks out constantly..). I’m assuming this requirement is because you plan on doing a full shutdown to insert your drives... in which case you really should be looking into hotswap

Also mandatory: you sound horribly underqualified for the job you are doing. Fess up before you waste even more (I assume grant) money and bring in someone that knows what the hell they are doing.

Re:Wow (4, Insightful)

LodCrappo (705968) | about a year and a half ago | (#41123405)

[...]

Wow.. I completely agree with an AC.

The OP here is in way over his head and the entire project seems to have been planned by idiots.

This will end badly.

Re:Wow (2, Interesting)

Anonymous Coward | about a year and a half ago | (#41123471)

He still hasn't told us what filesystem is on these drives they're pulling out of the field. That's the most important detail...........

Re:Wow (4, Informative)

mschaffer (97223) | about a year and a half ago | (#41123477)

[...]

Wow.. I completely agree with an AC.

The OP here is in way over his head and the entire project seems to have been planned by idiots.

This will end badly.

Like that's the first time. However, we don't know all of the circumstances and I wouldn't be surprised if the OP had this dropped into his/her lap.

"Wow I completely agree with an AC" (0)

Anonymous Coward | about a year and a half ago | (#41124197)

Says somebody hiding behind his alias "LodCrappo" and trolling the AC's for a flame war.

Re:Wow (0, Insightful)

Anonymous Coward | about a year and a half ago | (#41123433)

Agreed.

Also: the submitter asks about a "distro". A distro is a pre-packaged solution for a broad group of users. He has to build and test his own solution.

If you know what you are doing, it does not matter which distro you are using.

To the boss of the submitter: fire him and hire somebody who has a clue.

Re:Wow (0)

Anonymous Coward | about a year and a half ago | (#41123687)

Op here:

> If you know what you are doing, it does not matter which distro you are using.

No, it shouldn't, but there's a lot to be said about the value of your time. I know I can spend a week futzing with basically anything and get it to work, but I'm hoping to save time by having someone point out a couple good candidates to start with.

> A distro is a pre-packaged solution for a broad group of users

To an extent, sure, but "distro" also covers things like Arch, which is pretty damn close to "build and test our own solution"

Re:Wow (1)

LVSlushdat (854194) | about a year and a half ago | (#41123775)

Jesus Christ, WHAT AN ASSHOLE!! Telling the boss to fire him, with jobs as scarce as they are..... For all YOU know, AC-Asshole, the boss might be the one TELLING him the hardware he has to use.. LOVE AC's hiding behind anonymity so they can spew their hatred.. I know, I know, replying to AC's... just encourages em... So shoot me, I HATE people like this AC...

Re:Wow (4, Informative)

arth1 (260657) | about a year and a half ago | (#41123565)

Yeah. Before we can answer this person's questions, we need to know why he has:
1: Decided to cold-plug drives and reboot
2: Decided to use Linux
3 ... to serve to Windows

Better yet, tell us what you need to do - not how you think you should do it. Someone obviously needs to read data that's collected, but all the steps in between should be based on how it can be collected and how it can be accessed by the end users. Tell us those parameters first, and don't throw around words like Linux, samba, booting, which may or may not be a solution. Don't jump the gun.

As for documentation, no other OSes are as well-documented as Linux/Unix/BSD.
Not only are there huge amounts of man pages, but there are so many web sites and books that it's easy to find answers.

Unless, of course, you have questions like how fast a distro will boot, and don't have enough understanding to see that that depends on your choice of hardware, firmware and software.
I have a nice Red Hat Enterprise Linux system here. It takes around 15 minutes to boot. And I have another Red Hat Enterprise Linux system here. It boots in less than a minute. The first one is -- by far -- the better system, but enumerating a plaided RAID of 18 drives takes time. That's also irrelevant, because it has an expected shutdown/startup frequency of once per two years.

Re:Wow (2)

Crudely_Indecent (739699) | about a year and a half ago | (#41123773)

I would further enhance the question by asking: What the hell are you collecting that each sensor stores 500GB in 24 hours - photos? Seriously, these aren't sensors - they're drive fillers.

Seriously, if "sensor units scattered across a couple square miles" means 10 sensors - that's 5 Terabytes to initialize and mount in 30 seconds. I suspect that the number is greater than 10 sensors because the rest of the requirements are so ridiculous.

And why the sneakernet? If they're in only a couple of square miles - why not set up a mesh network and deliver real-time data without the need for daily collection? 30 seconds to boot probably wouldn't be a requirement if the system is only booted once.

All of the questions about why this person is even involved are probably moot. He'll be outed as an idiot in short order.

Re:Wow (3, Insightful)

plover (150551) | about a year and a half ago | (#41124229)

While I'm curious as to the application, it's his data rates that ultimately count, not our opinions of whether he's doing it right.

500GB may sound like a lot to us, but the LHC spews something like that with every second of operation. They have a large cluster of machines whose job it is to pre-filter that data and only record the "interesting" collisions. Perhaps the OP would consider pre-filtering as much as possible before dumping it into this server as well. If this is for a limited 12 week research project, maybe they already have all the storage they need. Or maybe they are doing the filtering on the server before committing the data to long term storage. They just dump the 500GB of raw data into a landing zone on the server, filter it, and keep only the relevant few GB.

Regarding mesh networking, they'd have to build a large custom network of expensive radios to carry that volume of data. Given the distances mentioned, it's not like they could build it out of 802.11 radios. Terrain might also be an issue, with mountains and valleys to contend with, and sensors placed near to access roads. That kind of expense would not make sense for a temporary installation.

I don't think he's an idiot. I just think he couldn't give us enough details about what he's working on.

Re:Wow (3, Interesting)

Anonymous Coward | about a year and a half ago | (#41123795)

Op here:

1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

> Better yet, tell us what you need to do

- Take a server that is off, and boot it remotely (via ethernet magic packet)
- Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
- Share out the unioned virtual tree in such a way that it's easily accessible to mac/win clients
- Do all this in under 30 seconds

I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get linux to do this.....
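
For the curious, the wake-up side is just a magic packet from any box on the LAN, and the boot-time trimming is mostly a matter of turning services off; the MAC address and interface below are placeholders:

    # wake the server remotely
    wakeonlan 00:11:22:33:44:55        # or: etherwake -i eth0 00:11:22:33:44:55

    # on the server (systemd-based distros), see where the boot time actually goes
    systemd-analyze blame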

> huge amounts of man pages

quantity != quality

Re:Wow (3, Informative)

msobkow (48369) | about a year and a half ago | (#41123861)

The "under 30 seconds part" is not as easy as you think.

You're mounting new drives -- that means Linux will probably want to fsck them, which, with such volume, is going to take way more than 30 seconds.

Re:Wow (0)

Anonymous Coward | about a year and a half ago | (#41123955)

the nineties called and they want their inefficient filesystems back.

Re:Wow (0)

Anonymous Coward | about a year and a half ago | (#41124043)

Not to mention, the BIOS may want to do its checks on the system and the disk controller may too, depending on the setup. If you know a regular schedule, I'd set timers so the real time clock wakes the system before you need it. But as it stands, there is not enough detail as to what you are doing and what hardware you have, which makes it next to impossible to give either generic descriptions ("any system will work if you cut back on services" or "none will work") or just complain about how stupid the project seems.

Re:Wow (1)

Anonymous Coward | about a year and a half ago | (#41123991)

Better question: Is this that DEA license plate camera project out in the desert somewhere? That would definitely fit all the mentioned criteria and be JUST bandwidth intensive enough to not make the mesh network idea feasible.

Re:Wow (4, Informative)

Anonymous Coward | about a year and a half ago | (#41123653)

Op here:

The gear was sourced from a similar prior project that's no longer needed, and we don't have the budget/authorization to buy more stuff. Considering that the requirements are pretty basic, we weren't expecting to have a serious issue picking the right distro.

>You are looking for information that your average user won’t care about.

Granted, but I thought one of the strengths of *nix was that it's not confined to computer illiterates. Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

As for the 30 seconds thing, there's a lot of side info I left out of the summary. This project is quirky for a number of reasons, one of them being that the server itself spends a lot of time off and needs to be booted (and halted) on demand. (Don't ask, it's a looooooong story).

Re:Wow (1)

Coz (178857) | about a year and a half ago | (#41123685)

Consider setting up several servers and GlusterFS, auto-replicating the data when it's mounted and presenting a unified shared file system. You can run CentOS or RHEL6 for the OS, and the FS will take care of data persistence, replication, and presenting a CIFS or NFS view.
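
A rough sketch of the Gluster side, assuming two CentOS boxes called node1 and node2 with bricks under /data/brick (all names are placeholders):

    # create and start a 2-way replicated volume
    gluster volume create sensors replica 2 node1:/data/brick node2:/data/brick
    gluster volume start sensors

    # mount it wherever you re-export it from (CIFS/NFS)
    mount -t glusterfs node1:/sensors /mnt/sensors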

Re:Wow (1)

Nutria (679911) | about a year and a half ago | (#41123699)

Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

Debian (The Universal OS)
RHEL/CentOS/Scientific
Gentoo
Slackware

Re:Wow (1)

quist (72831) | about a year and a half ago | (#41123917)

...know which distros can be stripped down ... with a minimum of fuss.[?]

Debian (The Universal OS)
RHEL/CentOS/Scientific [...]

And don't forget to compile a bespoke, static kernel.

OpenAFS+Samba (1)

Zombie Ryushu (803103) | about a year and a half ago | (#41123317)

Use OpenAFS with Samba's modules. Distribution doesn't matter.

Re:OpenAFS+Samba (1)

wytcld (179112) | about a year and a half ago | (#41123417)

Looking at the OpenAFS docs, they're copyright 2000. Has the project gone stale since then?

Re:OpenAFS+Samba (1)

Monkius (3888) | about a year and a half ago | (#41123487)

OpenAFS is not dead. IIRC, any Samba AFS integration probably is. This doesn't sound like a job for AFS, however.

Mechanical Hard Drive (3, Insightful)

Anonymous Coward | about a year and a half ago | (#41123329)

Why does it have to be a mechanical hard drive? Why not use an SSD for the boot drive?

Re:Mechanical Hard Drive (2)

mspohr (589790) | about a year and a half ago | (#41124251)

It sounds like they inherited a bunch of hardware and don't have a budget for more stuff.
So... make do with what you have.

Is this a joke? (0)

Anonymous Coward | about a year and a half ago | (#41123335)

Is this a joke? A troll?

Any Linux distribution will boot in less than 30 seconds if you turn off all the services you don't need, which is probably most of them in your case. Any modern distribution will have packages for aufs ready to install. Any modern distribution will tell you, via D-Bus, when a removable disk is plugged in so you can run whatever program you want to handle it e.g. a script that mounts it at the right place in your tree.

Re:Is this a joke? (2)

marcosdumay (620877) | about a year and a half ago | (#41123541)

Any Linux distribution will boot in less than 30 seconds if you turn off all the services you don't need... will have packages for aufs ready to install... will tell you, via D-Bus, when a removable disk is plugged...

You know, I was in the "it doesn't matter" camp until I read your post. Now I just changed my mind.

Yes, any distro will do it. You'll have the same (lack of) trouble configuring the service on any distro. So, choose a distro that is easy to get into bare bones and to upgrade, because those are the two main differentiators here.

I suggest Slackware. Probably somebody else knows about something simpler, but not so simple that it will end up giving you more work.

Re:Is this a joke? (1)

fearlezz (594718) | about a year and a half ago | (#41123543)

Any Linux distribution will boot in less than 30 seconds if [..]

Linux does. Too bad it takes the BIOS and RAID array of a server up to several minutes to do their checks...

Didn't you post this already to hardforum? (-1)

Anonymous Coward | about a year and a half ago | (#41123337)

Why didn't you take the advice from there?

I would automate the copying (4, Informative)

guruevi (827432) | about a year and a half ago | (#41123345)

Really, singular hard drives are notoriously bad at keeping data around for long. I would make sure you have a copy of everything. So make a file server with RAIDZ2 or RAID6 and script the copying of these hard drives onto a system that has redundancy and is backed up as well.
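
Something along these lines, where the pool name, disk names and sled mount point are placeholders:

    # double-parity landing pool for the copies
    zpool create tank raidz2 sdb sdc sdd sde sdf sdg
    zfs create tank/sensors

    # scripted copy off a returned field drive
    rsync -a /mnt/sled1/ /tank/sensors/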

How many times have I seen scientists come out with their 500GB portable hard drives only to find them unreadable... way too many. If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year. Most of our drives (500GB 2.5" portable drives) last a few months; once they have processed about 6TB of data full-time they are pretty much guaranteed to fail.

Re:I would automate the copying (1)

Anonymous Coward | about a year and a half ago | (#41123833)

OP here:

We don't need persistence, this data is pretty ephemeral and there's little point in backing it up. If we lose the data from one sensor one day, it's no big thing.

The analysis generated from this WILL be backed up though, but that's a different system that's already covered.

>If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year

These are fullsize desktop drives for exactly that reason.

Re:I would automate the copying (1)

Amouth (879122) | about a year and a half ago | (#41124243)

>If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year

These are fullsize desktop drives for exactly that reason.

You realize that being "full size desktop drives" makes zero difference for write duty cycle on mechanical drives?

As long as you're not on the bleeding edge of platter density, the manufacturers use the same process for all platters, both large and small. For lower capacity larger drives they just reduce the number of platters in the drive.

Re:I would automate the copying (0)

Anonymous Coward | about a year and a half ago | (#41124077)

Most of our drives (500GB 2.5" portable drives) last a few months, once they have processed about 6TB of data full-time they are pretty much guaranteed to fail.

Jesus, and I thought SSDs were bad at 30k writes per site. By my reckoning, you're getting on average 12 writes per site. Are you transporting your drives by booting them down the corridor?

Re:I would automate the copying (1)

Dr_Barnowl (709838) | about a year and a half ago | (#41124413)

It could well be physical shock; I've changed from spinning disks in a 2.5" caddy to an SSD. Some of the problems I had were to do with cheap-assed caddies with lousy power electronics that would fail. But I had three disks die in about a year; I changed to SSD and it's been going strong ever since.

These disks were only transported twice a day; to work, and then home. But they inevitably got dropped sooner or later.

CentOS, its enterprise class (1)

perpenso (1613749) | about a year and a half ago | (#41123395)

CentOS may be your best bet. It's Red Hat Enterprise Linux rebuilt from the Red Hat source code, minus the Red Hat trademark.

Re:CentOS, its enterprise class (3, Informative)

e3m4n (947977) | about a year and a half ago | (#41123437)

Scientific Linux is also a good option for similar reasons. Given it's a science grant, they might like the idea that it's used at labs like CERN.

Re:CentOS, its enterprise class (0)

Anonymous Coward | about a year and a half ago | (#41123439)

What about Scientific Linux?

Re:CentOS, its enterprise class (4, Insightful)

wytcld (179112) | about a year and a half ago | (#41123491)

"Enterprise class" is a marketing slogan. In the real world, all the RH derivatives are pretty good (including Scientific Linux and Fedora as well as CentOS), and all the Debian derivatives are pretty good (including Ubuntu). Gentoo's solid too. "Enterprise class" doesn't mean much. The main thing that characterizes CentOS from Scientific Linux - which is also just a recompile of the RHEL code - is that the CentOS devs have "enterprise class" attitude. Meanwhile, RH's own devs are universally decent, humble people. Those who do less often thing more of themselves.

For a great many uses, Debian's going to be easiest. But it depends on just what you need to run on it, as different distros do better with different packages, short of compiling from source yourself. No idea what the best solution is for the task here, but "CentOS" isn't by itself much of an answer.

Re:CentOS, its enterprise class (3, Insightful)

Anonymous Coward | about a year and a half ago | (#41123649)

"Enterprise class" means that it runs the multi-million dollar crappy closed source software you bought to run on it without the vendor bugging out when you submit a support ticket.

Eventual migration to RHEL (1)

perpenso (1613749) | about a year and a half ago | (#41124393)

The nice thing about CentOS is that if/when you wind up on RHEL (it comes with hardware, it's what your hosting provider is using, etc.) the migration will be pretty simple.

Re:CentOS, its enterprise class (1)

NemoinSpace (1118137) | about a year and a half ago | (#41123533)

I disagree. This guy:
  • doesn't like to read man pages
  • wants other people to tell him what buttons to push.

Redhat, with a support contract is for him.

Re:CentOS, its enterprise class (1)

perpenso (1613749) | about a year and a half ago | (#41124355)

I disagree. This guy:

  • doesn't like to read man pages
  • wants other people to tell him what buttons to push.

Redhat, with a support contract is for him.

Well if he starts with CentOS that migration will be pretty simple.

unRaid (1, Informative)

Anonymous Coward | about a year and a half ago | (#41123397)

unRaid FTW, I use this to handle TB's of data and it works fine.

Re:unRaid (0)

Anonymous Coward | about a year and a half ago | (#41123415)

I forgot to mention that it also has redundancy and shows up as a single share!

Here we go again (3, Insightful)

Anonymous Coward | about a year and a half ago | (#41123441)

Another "I don't know how to do my job, but will slag off OSS knowing someone will tell me what to do. Then I can claim to be l337 at work by pretending to know how to do my job".

It's called reverse psychology, don't fall for it! Maybe shitdot will go back to its roots if no one comments on junk like this and the slashvertisements?

What Greyhole isn't (4, Insightful)

NemoinSpace (1118137) | about a year and a half ago | (#41123445)

  • Enterprise-ready: Greyhole targets home users.

Not sure why the 30s boot-up requirement is there; it depends on what you define as "booted". Spinning up 12 hard drives and making them available through Samba within 30s guarantees your costs will be 10x more than they need to be.
This isn't another example of my tax dollars at work is it?

Re:What Greyhole isn't (1)

nullchar (446050) | about a year and a half ago | (#41123593)

This isn't another example of my tax dollars at work is it?

I hope not! Or my university tuition fees, or really any other spending, even other people's money.

Who cares if the server boots up in 30 seconds or 30 minutes? The OP now has up to 12 500GB drives to either copy off or access over the lan. There's hours of data access or data transfer here.

Questionable (2, Informative)

Anonymous Coward | about a year and a half ago | (#41123467)

Why would you want a file server to boot in 30 secs or less? Ok, let's skip the fs check, the controller checks, the driver checks, hell let's skip everything and boot to a recovery bash shell. Why would you not network these collection devices if they are all within a couple of miles and dump to an always-on server?

I really fail to see the advantage of a file server booting in under 30 seconds. Shouldn't you be able to hot swap drives?

This really sounds like a bunch of kids trying to play server admin. My apologies if this is not the case, but given the parameters provided this IS what it sounds like.

You don't need a union file system (1, Interesting)

Anonymous Coward | about a year and a half ago | (#41123493)

There's no reason you need a union filesystem. Just mount the data at an appropriate point in a directory tree. Union file systems are designed to solve a different problem.

What you boot from has nothing to do w/ what you read the data from.

Samba is a really strange choice. Given the data volume I'd expect you to be using a large Linux cluster to process the data for which NFS would be more appropriate. It certainly sounds like microseismic data in which case the processing will benefit from making duplicate copies of the data and mounting read only via NFS so the first available server provides the data. Multiple ethernets are needed to get full benefit from doing that though.

*nix documentation is actually very good. But there is a lot of it, so you tend to have grey hair by the time you've read all of it.

BTW Does the CEO play guitar? I play harmonica.

Nas4free + zfs (1)

Anonymous Coward | about a year and a half ago | (#41123503)

Check out nas4free. It's basically FreeNAS based on the newer FreeBSD 9, which has ZFS v28. I have been running it in a heavily used production system for 2 months with zero issues. I have 3 raidz2 setups that are shared out via NFS, CIFS and AFP. This setup is snapshotted every hour and also replicated via zfs send to an offsite DR location.

If you go this route, invest in an SSD for the ZIL and one for L2ARC if you decide to dedup.
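
In ZFS terms the pieces above look roughly like this (pool, dataset and snapshot names, the DR host, and the device names are all placeholders):

    # hourly snapshot plus incremental replication to the DR box
    zfs snapshot tank/data@hour-1500
    zfs send -i tank/data@hour-1400 tank/data@hour-1500 | ssh drhost zfs recv tank/data

    # dedicated SSDs for the intent log (ZIL) and the read cache (L2ARC)
    zpool add tank log /dev/ada1
    zpool add tank cache /dev/ada2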

Openfiler (0)

Anonymous Coward | about a year and a half ago | (#41123531)

I run 100TB of storage using Openfiler. It is particular about hardware but rock solid. Matched with many NICs it is wicked fast. Also I would spread your I/O load across a few machines to increase transfer speeds.

Distributed file system (0)

Anonymous Coward | about a year and a half ago | (#41123555)

Have you considered using a distributed file system such as ceph [ceph.com]?

You will need more drives as the data takes twice the space, but on the other hand you won't need to worry about boot times or scalability anymore.

"Why is documentation for *nix always so bad? (-1, Offtopic)

Zedrick (764028) | about a year and a half ago | (#41123611)

Perhaps because new versions spring up so fast or something?

I'm running a fairly new Mint Cinnamon that I'm not quite used to yet, and after a few hours of trial & error and reading various googled suggestions I failed to install any kind of drivers for my external AU-25. Ok, so I just plugged in my guitar as front mike instead.

But then I wanted to play along with my Megadeth MP3s. The only really good MP3 player I know on Linux is XMMS, and that one is apparently not very popular anymore, so I can't install it with apt (just something called XMMS2, which apparently can do anything except play MP3s).

Well, nobody can prevent me from building it, right? I've been using Linux since 1994, but after 2 hours of trying to crack "configure: error: *** GLIB >= 1.2.2 not installed - please install first ***" by installing just about every dependency on the entire internet, I gave up and apt-get installed Rhythmbox.

Which kind of worked fine (though it couldn't sort my mp3's in a sane order, which is why I wanted XMMS in the first place) until I got tired of music and wanted to stop. So, I clicked the X in the upper-right corner... the window disappeared, but the music continued. Great. Had to ps and kill, and now my beer supply is out of sync with my enthusiasm for music.

So yeah, Unix/Linux documentation sucks.

Re:"Why is documentation for *nix always so bad? (3, Insightful)

Knuckles (8964) | about a year and a half ago | (#41123809)

Saying "only good mp3 player" makes no sense unless you specify your criteria. Amarok, Banshee, VLC, Rhythmbox, or smplayer are all capable mp3 players by various criteria and easily found by googling for "linux mp3 player". If you use Ubuntu, searching for mp3 player in Software Center finds a plethora of good players. Googling "list of linux audio software" easily finds other things besides just mp3 players: maybe something like Audacity satisfies your requirements better. Search for "mp3" on xmms2.org finds the answer in the first link - your xmms2 install needs have the MAD library, maybe your distro does not install that.

Does not seem like the problem is with bad docs.

Re:"Why is documentation for *nix always so bad? (1)

b4dc0d3r (1268512) | about a year and a half ago | (#41124305)

OP rather clearly stated criteria for "good mp3 player". Here it is, since you missed it the first time: "sorts music like XMMS and since I'm used to XMMS does most everything in a similar fashion as well."

Re:"Why is documentation for *nix always so bad? (1)

Knuckles (8964) | about a year and a half ago | (#41124437)

First of all, xmms seems to be really outdated, and if his distro does not include it and he cannot compile it himself for whatever reason (despite copious info on how to install dev packages and compile for any distro), I fail to see how this is a failure of the un*x docs specifically. Current documentation for WinPlay3 is also rather scarce.

It's also hard to believe that there is not a single mp3 player out there that sorts music like xmms, whatever this is. He stated he tried Rhythmbox and was not too happy, but the allegedly so deficient un*x docs readily list many more, while the Ubuntu software center lets you try them out with one click.

Anyway, the logical conclusion seems to be to use xmms2, docs for which appear in the third google hit (for me) when searching for xmms. Which in turn, as I wrote, would take him to the solution in the first hit when searching for mp3 at xmms2.org. Again, how is this evidence for generally bad docs?

Re:"Why is documentation for *nix always so bad? (1)

Knuckles (8964) | about a year and a half ago | (#41124483)

Also, "my criteria is for x to work like ancient app y" is not so workable. Sounds like Microsoft's convoluted standard's document for Office Open XML regarding backward compatibility. "You have to emulate bug x of Word 2, but we can't tell you exactly how that worked". Someone might have helped him if he had given specific requirements.

FreeBSD (0)

Anonymous Coward | about a year and a half ago | (#41123619)

I think there is no single good answer to this and most people will give their personal preference. I'm recommending FreeBSD because it can easily be tweaked and it also has a very good handbook http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ (this is to answer your question about lack of documentation). FreeNAS actually is built on top of FreeBSD.

Partly easy, partly not... (2)

fuzzyfuzzyfungus (1223518) | about a year and a half ago | (#41123625)

Booting in under 30 seconds is going to be a bit of a trick for anything servery. Even just putzing around in the BIOS can eat up most of that time (potentially some minutes if there is a lot of memory being self-tested, or if the system has a bunch of hairy option ROMs, as the SCSI/SAS/RAID disk controllers commonly found in servers generally do...). If you really want fast, you just need to suck it up and get hot-swappable storage: even SATA supports that (well, some chipsets do, your mileage may vary, talk to your vendor and kidnap the vendor's children to ensure you get a straight answer, no warranty express or implied, etc.) and SAS damn well better, and supports SATA drives. That way, it doesn't matter how long the server takes to boot, you can just swap the disks in and either leave it running or set the BIOS wakeup schedule to have it start booting ten minutes before you expect to need it.

Slightly classier would be using /dev/disk/by-label or by-UUID to assign a unique mountpoint for every drive sled that might come in from the field(ie. allowing you to easily tell which field unit the drive came from).
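
For instance, a couple of fstab entries keyed on label (the labels, paths and filesystem type are assumptions):

    # /etc/fstab -- one mountpoint per field unit; nofail keeps boot from hanging on
    # absent sleds, and pass 0 skips the boot-time fsck
    LABEL=unit01  /srv/field/unit01  ext4  defaults,nofail  0 0
    LABEL=unit02  /srv/field/unit02  ext4  defaults,nofail  0 0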

If the files from each site are assured to have unique names, you could present them in a single mount directory with unionFS; but you probably don't want to find out what happens if every site spits out identically named FOO.log files, and (unless there is a painfully crippled tool somewhere else in the chain) having a directory per mountpoint shouldn't be terribly serious business.

ZFS Filesystem will help (4, Insightful)

Anonymous Coward | about a year and a half ago | (#41123645)

500G in a 24h period sounds like it will be highly compressible data. I would recommend FreeBSD or Ubuntu with ZFS Native Stable installed. ZFS will allow you to create a very nice tree with each folder set to a custom compression level if necessary. (Don't use dedup) You can put one SSD in as a cache drive to accelerate the shared folders speed. I imagine there would be an issue with restoring the data to magnetic while people are trying to read off the SMB share. An SSD cache or SSD ZIL drive for ZFS can help a lot with that.
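
Concretely, per-folder compression is just per-dataset properties; the pool and dataset names below are placeholders:

    # each subtree can carry its own compression setting
    zfs create tank/sensors
    zfs set compression=on tank/sensors            # lightweight default (lzjb)
    zfs create tank/sensors/raw
    zfs set compression=gzip-9 tank/sensors/raw    # heavier compression for the bulky raw dumps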

Some nagging questions though.
How long are you intending on storing this data? How many sensors are collecting data? Because even with 12 drive bay slots, assuming cheap SATA of 3TB apiece (36TB total storage with no redundancy), let's say 5 sensors, that's 2.5TB a day of data collection, and assuming good compression of 3x, 833GB a day. You will fill up that storage in just 43 days.

I think this project needs to be re-thought. Either you need a much bigger storage array, or data needs to be discarded very quickly. If the data will be discarded quickly, then you really need to think about more disk arrays so you can use ZFS to partition the data in such a way that each SMB share can be on its own set of drives so as to not head thrash and interfere with someone else who is "discarding" or reading data.

Re:ZFS Filesystem will help (1)

burni2 (1643061) | about a year and a half ago | (#41123679)

Agreed, someone please mod this up.

I would add: LTO-5 drive, for backup

Pogoplug (1)

Tsiangkun (746511) | about a year and a half ago | (#41123665)

You are in over your head. Buy a pogoplug and some usb2 hubs. Connect your drives to the hubs and they appear as unified file system on your clients. Or, if you need better performance accessing the data, call an expert.

You could also just use symbolic links. (3, Insightful)

DamnStupidElf (649844) | about a year and a half ago | (#41123695)

Unless you're talking about millions of individual files on each drive, it should be relatively quick to mount each hard drive and set up symbolic links in one shared directory to the files on each of the mounted drives. Just make sure Samba has "follow symlinks" set to yes and the Windows clients will just see normal files in the shared directory.
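
Roughly, with /mnt/disk* and /srv/share as placeholder paths (note that following links that point outside the share path typically also needs "wide links = yes" and "unix extensions = no"):

    # link every file from every mounted drive into one flat shared directory
    mkdir -p /srv/share
    for d in /mnt/disk*; do
        ln -s "$d"/* /srv/share/
    done

    # smb.conf fragment:
    # [share]
    #     path = /srv/share
    #     follow symlinks = yes
    #     wide links = yes
    #     unix extensions = no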

This whole project is a joke/fishing trip (1)

Anonymous Coward | about a year and a half ago | (#41123701)

this whole project is a joke, either:
A) you don't know what you're looking for (reasonable but silly)
or B) you don't know how to collect data on what you're looking for (pointless).

Process your sensor data at the sensor. There is no reason that anyone needs to take more than 10-50 MB of data per sensor device per session.

If it's signal detection, use a smart filter to capture the event, or an FFT to capture the frequencies.
If it's a measurement, use buffered slope detection and only capture the change.

If you do need to move 500GB per day per sensor, just install fiber to the sensors and stream it back to a localized collection server.
This saves countless sneakernet headaches, compression issues and the sort. For the collection server, buy it! There are great products out there that can
take your 500GB of poorly compressed sensor data and make it 500MB of indexed intelligence (totally avoiding the obvious buzz word use here, sorry /.).

Silly workplace security policies (1)

cowboy76Spain (815442) | about a year and a half ago | (#41123725)

Aka: I do not want the insecurity of losing my workplace if my boss happens to learn on Slashdot how clueless I am.

Seriously... could you send us the resumé that you sent to get that job?

Arch Linux (0)

Anonymous Coward | about a year and a half ago | (#41123755)

Arch, with conservative repo selection, is the best server OS I've ever used.

Use Linux and Call it good (3, Interesting)

adosch (1397357) | about a year and a half ago | (#41123761)

Why is documentation for *nix always so bad?""

For starters, I'm really tired of this /. *NIX is-too-hard ranting all the time on 'Ask Slashdot' posts. Don't be a n00b douche; if you don't get it, then spend some time and get it. Don't blame the documentation; dig in and figure out something for yourself for once. Sometimes you Nintendo-and-Mt-Dew generation make me want to throw up.

As for your solution, do not go with some installable appliance-type distro like FreeNAS; yes it's *BSD under the hood, but you're at the mercy of what that 'focused' distro is going to provide for you. Case in point: since you're undecided, go with a full-blown distro so you have some flexibility to grow and augment the mission and purpose of this server you're hosting data on.

Since you're clearly a n00b when it comes to picking out a *NIX solution, go with anything Linux at this point, and set up the NAS services yourself (e.g. Samba/SMB, NFS, etc.) In turn, you'll be able to get better community support helping you out, you'll have more flexible OS configuration and growth, and you'll probably learn something to boot.

Also, you don't need to do union filesystem. Simple udev rules and auto mounting them under your top-level structure you're sharing out with your NAS services will do you just fine.
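
A hedged sketch of that udev route; the label convention, mount path and helper script below are assumptions, and some udev setups are picky about long-running mounts launched from RUN:

    # /etc/udev/rules.d/99-sensor-sled.rules
    ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_LABEL}=="sensor-*", RUN+="/usr/local/bin/mount-sled.sh %E{ID_FS_LABEL}"

    # /usr/local/bin/mount-sled.sh (hypothetical helper the rule calls)
    #!/bin/sh
    label="$1"
    mkdir -p "/srv/share/$label"
    mount "/dev/disk/by-label/$label" "/srv/share/$label"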

Re:Use Linux and Call it good (-1)

Anonymous Coward | about a year and a half ago | (#41123819)

You like to suck mens' dicks, right? You like taking it in the ass? I thought so. You sound like a total fag.

Re:Use Linux and Call it good (0)

Anonymous Coward | about a year and a half ago | (#41124013)

Dude. I think who sounds like a "fag" in any case would be....yes, you.

Re:Use Linux and Call it good (0)

Anonymous Coward | about a year and a half ago | (#41124191)

I used to think linux documentation was bad, but then I realized why: they assume you actually know something because they are written for a general audience. I learned linux through 3 major components. The first was experimentation (what if I try changing just this one setting?), using the man pages in conjunction with googling unknown terms (WTH do they mean by an 'ordered' journal?) and asking dedicated communities (linuxquestions.org for general things, distro site for specific distro questions and program websites for specific programs).

Is this a troll? (0)

Anonymous Coward | about a year and a half ago | (#41123765)

Firstly, the troll bit: "Why is documentation for *nix always so bad?" Scuse me? Unix systems have the BEST documentation of ANY current OSes. Why do you say the documentation is poor?
Secondly: If you have to ask slashdot for this you should not be doing this. Get someone who knows what they are doing to do it for you.

Hadoop? (0)

Anonymous Coward | about a year and a half ago | (#41123785)

Have you read about Hadoop? I'm not altogether sure it fits what you're doing precisely, but depending on how the data will be used and your fault tolerance characteristics, it might be a good fit.

waaaay over head (4, Insightful)

itzdandy (183397) | about a year and a half ago | (#41123807)

What is the point of 30 second boot on a file server? If this is on the list of 'requirements', then the 'plan' is 1/4 baked. 1/2 baked for buying hardware without a plan, then 1/2 again for not having a clue.

unioning filesystem? what is the use scenario? how about automounting the drives on hot-plug and sharing the /mnt directory?

Now, 500GB/day in 12 drive sleds... so 6TB a day? Do the workers get a fresh drive each day, or is the data only available for a few hours before it gets sent back out, or are they rotated? I suspect that mounting these drives for sharing really isn't what is necessary, more like pulling contents to 'local' storage. Then, why talk about unioning at all, just put the contents of each drive in a separate folder.

Is the data 100% new each day? Are you really storing 6TB a day from a sensor network? 120TB+ a month?

Are you really transporting 500GB of data by hand to local storage and expecting the disks to last? Reading or writing 500GB isn't a problem, but constant power cycling and then physically moving/shaking the drives around each day to transport them is going to put the MTBF of these drives in months, not years.

dumb

Re:waaaay over head (1)

gweihir (88907) | about a year and a half ago | (#41124005)

I agree. "30 seconds boot time" is a very special and were un-server-like requirement. It is hard to achieve and typically not needed. Hence it points to a messed-up requirements analysis (or some stupid manager/professor having put that in there without any clue what it implies). This requirement alone may break everything else or make it massively less capable of doing its job.

What I don't get is why people without advanced skills feel they have any business setting up very special and advanced system configurations. IT is hard, requires real knowledge and skills (you can't talk it over with the system, it will not work if you mess up), and anything non-standard makes it that much harder and hence requires very good justification.

Re:waaaay over head (0)

Anonymous Coward | about a year and a half ago | (#41124309)

They generally get stuck with it. Most people would rather fight like hell with something they don't understand and risk losing their job when it all goes wrong than tell their boss they aren't qualified to do it and definitely lose their job. Doesn't make them bad people; it's the boss who should have more of a clue about what he's asking and what skillset / person he needs to use to get it done.

Having worked in capability requirements for some years, 30 seconds definitely sounds like someone pulled it out of their ass. Why not 35? Or 25? The OP says in another reply that it's a quirky project and the servers only get turned on for short periods. Still, 30 seconds sounds pulled-out-of-the-ass. It's a reasonable target to achieve on a very simple desktop or laptop with a single drive, no funky boot ROMs etc., but not with that much storage. Does it count from power on or from when the BIOS hands off to the operating system? If the former, how much time do you actually have to boot once the BIOS has finished its business? Fundamentally you may be attempting something impossible, but why would a 5 minute (or 10 minute) boot time not be sufficient? You only have to do it once a day, right, so put all the drives in, switch on and go for a coffee. Or is there some actual requirement in terms of data turnaround - you've got so long to bring them in from the field, get the data off and get them out again - and this has somehow been turned into a 30 second boot requirement?

Anyhow, without the ability to either change the boot time requirement or buy at least some new hardware, I think the task may be impossible. Without understanding more details of the *whole* project (is it classified or something, you seem reluctant to give details) I wouldn't like to suggest any particular solution. Unless you have funds to buy new hard drives as the initial batch fail I'd be nervous about transporting all that data by hard drive each day - and if you have funds to replace the hard drives you may well find it better value for money to think through your requirements properly and spend your money on a different set of hardware.

Unifed filesystem is a crutch (1)

gweihir (88907) | about a year and a half ago | (#41123859)

Use systems of symbolic links.

Also, why "30 seconds boot time"? This strikes me as a bizarre and unnecessary requirement. Are you sure you have done careful requirements analysis and engineering?

As to the "bad" documentation: Many things are similar or the same on many *nix systems. This is not Windows where MS feels the need to change everything every few years. On *nix you are reasonably expected to know.

Just use "mount -o bind" (0)

Anonymous Coward | about a year and a half ago | (#41123879)

It seems to me that the easiest way is to mount all of the sensor HDDs to a single directory (using ordinary "mount") on the OS and then share that directory over the network.

Even easier would be to write a shell script to handle the mounting & sharing so that all you have to do is connect the drives and then execute that script.
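
A minimal sketch of such a script, using bind mounts to pull already-mounted drives into the exported directory (all paths are placeholders):

    #!/bin/sh
    # bind each mounted sled into the directory Samba exports;
    # the new subdirectories show up in the share without restarting Samba
    EXPORT=/srv/export
    mkdir -p "$EXPORT"
    for d in /mnt/sled*; do
        mkdir -p "$EXPORT/$(basename "$d")"
        mount -o bind "$d" "$EXPORT/$(basename "$d")"
    done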

cognitive dissonance... (1)

Anonymous Coward | about a year and a half ago | (#41123907)

"When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds."

"There's also requirement that the server has to boot in 30 seconds or less off a mechanical hard drive."

So... it takes minutes to hours to get your drives to the server, but then suddenly it's an emergency to get the server booted. That makes no sense to me. Please explain.

Suggest you change your approach! (0)

Anonymous Coward | about a year and a half ago | (#41123961)

You say "a couple of square miles of undeveloped land". You also seems to say that you have around 12 sensor stations/ portable hard drives that are going to be swapped in and out each day. For reasons stated by others this is asking for loosing data. I would think you can't afford to do that.

Why not set up a wi-fi network using wifi links such as the Ubiquiti NanoStation? You can easily cover a couple of square miles with that sort of equipment, unless you are up in some mountains or in terrain where you are not even close to obtaining line of sight between stations. Your local hardware store might be able to help with aluminum mast tubes. Aim to have the NanoStations as far away as possible from any nearby objects, including trees.

Then use rsync https://en.wikipedia.org/wiki/Rsync to continuously copy your valuable sensor data to your central location equipped with a safe and daily backed up RAID system. With 6 TB of data per day you are in for some data management issues unless you can somehow weed out irrelevant data on the go. And another plus is that you can work on new sensor data all day.

Please do my work for me (1)

Anonymous Coward | about a year and a half ago | (#41123973)

Because I suck at it and I'm too lazy to learn how to do it myself.

external as an option (0)

Anonymous Coward | about a year and a half ago | (#41124023)

Real servers, even file servers, don't get booted often. And when they do, they should take as much time as they need to check the things they need to check.

Suggest you consider a hot-swap option, such as external drives on USB3.

Mount the external drives into some folder using some kind of naming scheme that makes sense. And/Or copy the data into a nice redundant file system, like RAID6 or RAIDZ.

You will want to speak with a professional storage provider about this, as you are in the special needs category.

OP doesn't have a clue (0)

Anonymous Coward | about a year and a half ago | (#41124039)

But he thinks he has. Disastrous.

Hey. Could we get a bit of a better screening for what Ask Slashdot stories get to FP??? This sucks big time.

Blah (0)

Anonymous Coward | about a year and a half ago | (#41124049)

As many have noted, your system is junk if it boots in 30 seconds, as server grade hardware typically contains a series of initialization steps and validations that blow away 30 seconds. Initializing 12 disks will take equally as long on a cold boot. Anyhow, stop cold booting and learn to use mount.

The unified file system is rather generically phrased and doesn't describe the task adequately enough. Do the files on the disks contain static names? If so, symlink farm from your mount points.

If the files on the hard disks contain random gibberish for names then simply index the files using a scripting language and create those symlinks. There are a number of ways to do this without having to execute the script by hand. The less interesting method is to create a cron job that would clean up stale links and create new ones. Slightly more interesting would be to use the hotplug facility to create the index. Magically, the latter method, which depends on hotplug, would work well with avoiding the cold boot altogether.

I'm afraid this problem wouldn't even take an afternoon to solve.

Typically, the "bad documentation" experience is more accurately defined as "I don't know what I'm looking for." Really, once I have a few key words for a subject it's rather easy to find documentation for a particular area.

OP here (5, Informative)

Anonymous Coward | about a year and a half ago | (#41124155)

Ok, lots of folks asking similar questions. In order to keep the submission word count down I left out a lot of info. I *thought* most of it would be obvious, but I guess not.

Notes, in no particular order:

- The server was sourced from a now-defunct project with a similar setup. It's a custom box with a non-standard design. We don't have authorization to buy more hardware. That's not a big deal because what we have already *should* be perfectly fine.

- People keep harping on the 30 seconds thing.
The system is already configured to spin up all the drives simultaneously (yes the PSU can handle that) and get through the bios all in a few seconds. I *know* you can configure most any distro to be fast, the question is how much fuss it takes to get it that way. Honestly I threw that in there as an aside, not thinking this would blow up into some huge debate. All I'm looking for are pointers along the lines of "yeah distro FOO is bloated by default, but it's not as bad as it looks because you can just use the BAR utility to turn most of that off". We have a handful of systems running winXP and linux already that boot in under 30, this isn't a big deal.

- The drives in question have nearly identical directory structures but globally-unique file names. We want to merge the trees because that's easier for people to deal with than dozens of near-identical trees. There are plenty of packages that can do this; I'm looking for a distro where I can set one up with minimal fuss (i.e. apt-get or equivalent, as opposed to manual code editing and recompiling). See the rough sketch at the end of these notes for the sort of thing I mean.

- The share doesn't have to be Samba; it just needs to be easily accessible from Windows and Mac machines without installing extra software on them.

- No, I'm not an idiot or a derpy student. I'm a sysadmin with 20 years' experience (I'm aware that doesn't necessarily prove anything). I'm leaving out a lot of detail because most of it is stupid office bureaucracy and politics I can't do anything about. I'm not one of those people who intentionally makes things more complicated than they need to be as some form of job security. I believe in doing things the "right" way so those who come after me have a chance at keeping the system running. I'm trying to stick to standards where possible, as opposed to creating a monster of homegrown shell scripts.
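
To be concrete, the sort of thing I'm after is roughly this (mhddfs is just one example of an apt-gettable package on Debian/Ubuntu; the mount points are made up and this is untested on our box):

    sudo apt-get install mhddfs

    mkdir -p /srv/merged
    mhddfs /mnt/sled1,/mnt/sled2,/mnt/sled3 /srv/merged -o allow_other

    # then point the Samba (or other) share at /srv/merged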

Re:OP here (0)

Anonymous Coward | about a year and a half ago | (#41124373)

sudo apt-get install Debian

You're welcome.

Not gonna happen. (5, Insightful)

Anonymous Coward | about a year and a half ago | (#41124453)

You have to be able to identify the disks being mounted. Since these are hot swappable, they will not be automatically identifiable.

Also note that not all disks spin up at the same speed. Disks made for desktops are not reliable either, though they do tend to spin up faster; server disks might take five seconds or more before they come ready. You also seem to have forgotten that even with all disks spun up, each must be read (one at a time) before it can be mounted.

Hot-swap disks are not mounted automatically unless they are known ahead of time, which means they need suitable identification (filesystem labels or UUIDs, for example).
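
If the drives get labelled when they are formatted, identification stops being a guessing game. A minimal sketch (label and device names invented for the example):

    # see what the kernel currently knows about
    blkid
    ls -l /dev/disk/by-label/ /dev/disk/by-uuid/

    # mount by label instead of by /dev/sdX, which changes as sleds are swapped
    mount /dev/disk/by-label/sensor07 /mnt/sled07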

UnionFS is not what you want; that isn't what it was designed for. Unionfs has only one branch that can be written to, the top one in the list. Modifying anything on the other disks forces a copy up to the top branch, and deletes only ever happen on the top branch.
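
If you end up on a union mount anyway (aufs here, which needs an aufs-enabled kernel; the paths are invented), the single-writable-branch behaviour looks roughly like this:

    # only the first branch is writable; the sled disks are never modified.
    # deletes are recorded as whiteouts on the rw branch, not on the ro disks.
    mount -t aufs -o br=/srv/scratch=rw:/mnt/sled1=ro:/mnt/sled2=ro none /srv/merged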

Some of what you describe is called HSM (hierarchical storage management), and requires a multi-level archive where some volumes may be online, others offline, and yet others in between. Boots are NOT fast, mostly due to the need to validate the archive first.

Back to the unreliability of things - if even one disk has a problem, your union filesystem will freeze, and not nicely either. The first access to a file that is inaccessible will cause a lock on the directory. That lock shuts all users out of that directory (they go into an infinite wait). Eventually the locks accumulate to include the parent directory, which then locks all leaf directories under it. This propagates to the top level, at which point the entire system freezes, along with all the clients. This freezing behaviour is one of the things that an HSM handles MUCH better: a detected media error causes the access to abort, and that releases the associated locks. If the union filesystem detects the error, then the entire filesystem goes down the tubes, not just one file on one disk.

Another problem is going to be processing the data - I/O rates through a union filesystem are still not great. Even though UnionFS is pretty good at it, expect the I/O rate to be 10% to 20% below maximum. Client I/O has to go through a network connection anyway, so that may make it bearable, but trying to process multiple 300 GB data sets in one day is not likely to happen.

Another issue you have ignored is the original format of the data. You imply that the filesystem on the server will just "mount the disk" and use the filesystem as created/used by the sensor. This is not likely to happen - trying to do so invites multiple failures, and it also means no users of the filesystem while it is being mounted. You would do better to have a server disk farm that you copy the data to before processing. That way you get to handle the failures without affecting anyone who may be processing data, AND you don't have to stop everyone working just to reboot. You will also find that local copy rates will be more than double what the server's clients can read over the network anyway.
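
The copy-in step is cheap to script. A minimal sketch, assuming rsync and made-up paths:

    # pull a returned sled into the server's own array, then release the sled
    dest=/srv/data/$(date +%F)/sled07
    mkdir -p "$dest"
    rsync -a --partial /mnt/sled07/ "$dest"/
    umount /mnt/sled07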

As others have mentioned, using the Gluster file system to accumulate the data allows multiple systems to contribute to one global, uniform filesystem - but it does not allow for plugging disks with predefined formats in and out. It has very high data throughput though (due to the distributed nature of the filesystem), and would let many systems copy data into the filesystem without interference.

As for experience - I've managed filesystems with up to about 400TB in the past. Errors are NOT fun as they can take several days to recover from.

fast boot isn't unreasonable (0)

Anonymous Coward | about a year and a half ago | (#41124383)

If you actually have to boot fast, stay away from any distro that isn't friendly about bare-bones installs. Yes, that means Ubuntu and Red Hat, among many others. You seem to have a very specific use model for this machine, so run only what you want and don't let the distro bully you into running more.

That said, as many other people have commented, cold booting seems pretty pointless. Unless you are really stuck with PATA drives, you will almost certainly be using a hot-swap-friendly bus (SATA, SAS, USB, FireWire). If you must power down the server, you can still suspend or hibernate.

Unionfs should be supported by pretty much everything. FreeNAS and other special-purpose distros will probably make it really easy if this is the sort of use model envisioned by the maintainers, or more difficult if it's not. So if the docs don't make it look easy, I'd suggest avoiding those and sticking with an environment where it's easy to roll your own features.

Personally, I would probably use Debian with systemd (I like it a lot more than Upstart) or FreeBSD for something like this. Slackware and Gentoo are also nicely minimalistic. If boot time matters, seriously, use an SSD for the OS and keep the spinning disks just for data. If you keep the services light, you should have no trouble getting any of those to boot in 10 seconds (after the BIOS does its stuff).
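
On the systemd side, finding and trimming the fat is straightforward; for instance (the service names below are only examples of things a file server rarely needs):

    # where the boot time actually goes
    systemd-analyze
    systemd-analyze blame

    # switch off anything this box doesn't need
    systemctl disable cups.service bluetooth.service avahi-daemon.service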

2.5" notebook hard drives (also sold as 2.5" externals), will probably hold up better than desktop or even enterprise drives, if you're going to bounce them around every day. As people mentioned, don't expect them to last forever. I would suspect you will mostly be writing sequentially to the drives, that should help you get a bit more life out of them. Run the drives' self tests frequently and toss them as soon as you start seeing problems crop up. 1-1.5TB 2.5" external drives are reasonably cheap at this point.

First thing I thought of... (1)

bjwest (14070) | about a year and a half ago | (#41124389)

The first thing I thought of was losing one of the drives during all this moving around. Seems the protection of the data would be of the utmost priority here. Keeping this in mind, I'd go with a RAID 5 or 10 [thegeekstuff.com] setup. This eliminates having the data spread across different "drives", so to speak, and it would appear to the system as one single drive. It would increase the drive count, but losing a drive, either physically (oops, dropped that one in the puddle) or electronically (oops, this drive crashed because we keep swapping it every day), would be a non-issue, or at least a non-tragic one. I'm sure you have a swappable tray system now for the number of drives you need; you may need to add a tray or two for this setup. Just make sure you keep the drives in the correct order, or swap out the whole drive unit.
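
For reference, building that sort of array with mdadm is only a couple of commands (device names are examples; check them against your own sleds first):

    # four-disk RAID 5 (use --level=10 and four or more disks for RAID 10)
    mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.ext4 /dev/md0
    mount /dev/md0 /srv/data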

As for the original question: I don't think there's really a "best" distro for this; they'll all pretty much do the above out of the box and almost automagically. What you need to look for is whichever distro is easiest to use in this case, i.e. which one the users can run with the least support from you. Unless you're the one who will be swapping out the drives on a daily basis, use whatever you're most comfortable with.

Not everything needs to be developed in house... (1)

Bork (115412) | about a year and a half ago | (#41124503)

One of the things I have learned in my career is that I do not know everything. I do the things I can do well, I understand the business, and when I need something more I call in people who have the skills to help. If you get the right consultant involved with the project, they will bring the knowledge necessary to do the job right.

One of the biggest mistakes you can make is to think that you have to know everything, and that it's some kind of fault to say, “I do not know right now, but let me look into it and come back with an answer.” Do not limit yourself to only the knowledge and skills of the in-house staff; tap into other sources to bring in new knowledge and skills that can help you solve problems.

My biggest resource was a list of people I could call in to help me with an issue, or who could help locate someone able to come up with an answer. It's not a fault to say you need to bring in some help on a project.

---
Excuse the grammar, I've been awake a few too many hours and I'm not thinking too clearly right now.
