
Ask Slashdot: How Reliable are Enormous Filesystems in Linux?

Cliff posted more than 15 years ago | from the huge-tracts-of...bytes! dept.


Josh Beck submitted this interesting question: "Hello. We're currently using a Mylex PG card and a pile of disks to run a 120 GB RAID5 under Linux. After some minor firmware issues with the Mylex (which their tech support acknowledged and fixed right away!), we've got a very stable filesystem with a good amount of storage. My question, though, is how far will Linux and e2fs go before something breaks? Is anyone currently using e2fs and Linux to run a 500+ GB filesystem?" Josh continues... "I have plenty of faith in Linux (over half our servers are Linux, most of the rest are FreeBSD), but am concerned that few people have likely attempted to use such a large FS under Linux...the fact that our 120 GB FS takes something like 3 minutes to mount is a bit curious as well, but hey, how often do you reboot a Linux box?"

145 comments

no idea, but.. (0)

Anonymous Coward | more than 15 years ago | (#2038736)

.. this could become a fine 'success story'
if it works OK. Anyway, I hope you keep
running it as normal (i.e. keeping Linux), even if it's a major production system, because it's great feedback to the developers, and it may even iron out a few bugs if you encounter any. (Giving back to the community, instead of going with something more tested: if nobody drives Linux to the limit, who will have any references to look at when they choose systems?)

_other_ FS'es? (0)

Anonymous Coward | more than 15 years ago | (#2038737)

Have you considered another FS, maybe?
There's UFS support and a lot of others. Of course
the Linux ports might not be as stable as they
are in UNIX(tm) or BSD, but you get the point.

Netware (0)

Anonymous Coward | more than 15 years ago | (#2038738)

Hey, Netware's not Linux, but there's no reason for Netware to crash that often if it's done right. You have a different problem, bud, not Netware.

Journaling filesystems (0)

Anonymous Coward | more than 15 years ago | (#2038739)

If I remember correctly, there is development being done on journaling filesystems for Linux. Journaling filesystems take a fraction of the time to fsck.

As a previous poster noted, fsck'ing such large filesystems is a pain. In a Solaris environment, Sun recommends journaling once you reach a 40 GB filesystem. AIX only has journaling filesystems.

This is one of the big reasons not to use large filesystems in Linux.

Just out of curiosity: Could someone "crash" a large filesystem and then time the fsck ? I mean, someone out there must have a couple of 18GB disks to play with :-)
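You don't even need an 18GB disk to try the timing experiment: a file-backed ext2 image works without touching real disks. A small-scale sketch (it assumes e2fsprogs is installed; the image size here is arbitrary and far smaller than anything interesting, so scale count= up as disk space allows):

```shell
# Build a small ext2 image in a scratch file and time a forced, read-only check.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null
mke2fs -q -F "$img"        # -F: target is a plain file, not a block device
time e2fsck -f -n "$img"   # -f: force a full check, -n: don't modify anything
```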

Backup and Restore? (0)

Anonymous Coward | more than 15 years ago | (#2038740)

I'm curious: are you planning to back up that partition, and how would you restore it? RAID doesn't mean you don't need backups, and a partition that size is pure MURDER!

If you are planning on, or are currently backing it up, how long does that take, and have you done any research into how long it takes to restore various amounts of data?

ask Remy Card (0)

Anonymous Coward | more than 15 years ago | (#2038741)

He has built ext2fs... card@csi.uvsq.fr

2T (0)

Anonymous Coward | more than 15 years ago | (#2038742)

I read somewhere that the Linux filesystem was good up to two terabytes.

Backup and Restore? (0)

Anonymous Coward | more than 15 years ago | (#2038743)

We've got a 200GB RAID for our Origin 2000, and we also have a DLT tape robot .. it's a cabinet with a couple of drives, a transport arm, and a rack of tapes. It makes a full backup every weekend (when the RAID isn't being used much) and partial backups every day. Very convenient, if you can afford it.

Penguin Computing has (0)

Anonymous Coward | more than 15 years ago | (#2038744)

Call Sam Ockman at Penguin Computing [penguincomputing.com]. He's built some serious boxes for us the likes of which you won't likely see on their web site. They've been much better to work with than VA Research has been for us (then again, we only bought one order from VAR and ever since have been buying hardware from the other guys).

file system and AFS (0)

Anonymous Coward | more than 15 years ago | (#2038745)

Hi,
Why not look into AFS? Transarc makes a commercial
AFS server for Linux. I have not used it on Linux yet, just on AIX machines. We have 4+ TB on AIX machines where I'm at, but I'm sure a Linux AFS server would have no problem at such large sizes.

Try XFS.... (0)

Anonymous Coward | more than 15 years ago | (#2038746)

I'd personally go ahead and decode the mysteries of the XFS filesystem (SGI's), and use it! It's rock stable, fully journaled, and even tells you when shit happens (yeah :-) and fixes it! Well... maybe ext2 does too... but...

Just on the brink of installing one (0)

Anonymous Coward | more than 15 years ago | (#2038747)

Hey,

Anybody got some experience with DAT tape changers (HP ??48al) on Linux?

uwe.klein@cruise.de

Andrew File System (0)

Anonymous Coward | more than 15 years ago | (#2038748)

There are a bunch of universities in the US that share the Andrew File System. It's an FS that was made at Carnegie Mellon and is maintained by the universities that use it. I'm sure that you can mount AFS drives on a Linux box, but I'm not sure about using them as servers. I'm pretty sure the universities use Solaris for the servers.

Re:How Reliable are Enormous Filesystems in Linux? (0)

Anonymous Coward | more than 15 years ago | (#2038749)

You can also use tune2fs if you've already created the filesystem - of course, you need it unmounted to change it (can't remember whether it'll work if it's just mounted read-only).

Do *NOT* reduce the "reserved" disk space! (0)

Anonymous Coward | more than 15 years ago | (#2038750)

[ ... ] don't forget to reduce % of filesystem reserved for root (5% by default, that would be overkill for 120Gb)...

On the contrary, reducing the reserved space would be a very *BAD* thing to do. Yes, part of the reason for the reserved space is so that root always has some space to play with, but the much more important reason is so that the filesystem doesn't get fragmented.

Reserving 5% of the disk means that there will almost always be a "free block" nearby. This percentage doesn't change as the disk size changes. At the 5% value, you will typically not need to go further than 20 blocks to find a free one (5% = 1/20).

For a disk like this, I would think you would actually want to *increase* the reserved space to keep the number of seeks down. The BSD Fast File System papers showed that you needed at least 10% to keep the system fast. I don't think any similar studies have been done for ext2fs, but I would suspect the result is similar.
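The 5% = 1/20 arithmetic above generalizes directly: if a fraction f of blocks is free, you expect to scan about 1/f blocks before hitting a free one. A toy calculation, not a measurement:

```shell
# Expected scan distance to a free block for a given free percentage:
# 5% free => roughly 1 in 20 blocks is free, 10% => 1 in 10.
for pct in 5 10; do
    echo "${pct}% free => a free block roughly every $((100 / pct)) blocks"
done
```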

IDE Filesystems' (Un)Reliability (0)

Anonymous Coward | more than 15 years ago | (#2038751)


It seems Linux with a supported RAID card and SCSI disks can support 500+GB filesystems reliably and even efficiently if you have the journaling patches. But how about the reliability of more mundane medium-sized IDE filesystems, eg in the 5-20GB range? I guess most people here are using IDE disks rather than SCSI disks.

Would anyone else care to run the IDE disk test below and share their results here, giving your IDE/motherboard specs? On my machine this test intermittently generates file read errors when reading large files from IDE disks.

1. Create a large 10MB test file on an IDE partition.

  • dd if=/dev/urandom of=bigfile count=20k

2. Do lots of checksums using sum or md5sum.

  • sum bigfile
    sum bigfile
    sum bigfile

    repeat 20-30 times
Q: Do you get the same checksum each time?

I get different answers intermittently. If I do a byte-wise comparison between two reads of the same file, it reveals that bytes are occasionally misread with bit 5 flipped randomly, i.e. the values are +/- 32 from the correct value.
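A byte-wise comparison like that can be done with cmp -l, which prints the offset and both byte values (in octal) for every mismatch; a bit-5 flip shows up as two values differing by octal 040 (decimal 32). A contrived illustration with hand-made files rather than real misreads:

```shell
# 'B' is octal 0102, 'b' is octal 0142: they differ only in bit 5 (+32).
d=$(mktemp -d)
printf 'ABC' > "$d/good"
printf 'AbC' > "$d/bad"
cmp -l "$d/good" "$d/bad" || true   # lists: offset 2, old value 102, new value 142
```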

My setup is a Supermicro P6SLA motherboard with Intel 82371AB PIIX4 IDE interface and Quantum Fireball 6-16GB IDE disks with 12-13 MB/second transfer rates. DMA but not UDMA is on, but this makes no difference apart from speed to my results. If the hardware had a fault, I should see the errors under Win95 but I don't.

The IDE driver for all current kernels 2.0.x and 2.2pre/2.1.x kernels has a bug relating to DMA status signalling (see Mark Lord's reply in kernel mail list 15 Dec 1998).

My experience of linux and large file systems (0)

Anonymous Coward | more than 15 years ago | (#2038752)

Hi,

I don't run any monster filesystems, i.e. 100 GB+; the biggest filesystem we have at our site is some 70 GB, although we have a couple of volumes in that size range.

However, we had the need for a big and *really* cheap fileserver. Being a Linux fan, I took an old P166 that was lying around, bought four 12 GB Quantum Fireballs, installed Linux, and created a single filesystem spanning the disks using raidtools RAID0. All counted, the usable disk left was some 43 GB.

I know that raidtools is currently beta software. But it works quite well. It is easy to set up and quite fast.

The server stability is OK; it crashes now and then, maybe once a month. Don't know if it's a hardware or a software problem. I haven't had the time to look into it.

The ext2 filesystem seems to be the more stable part of the raidtools/ext2 combination we use.

The server is backed up using our Legato Networker backup server, which works very well.

My previous experience with large filesystems is all on Sun servers using much more expensive hardware and management tools (DiskSuite/Veritas). Those servers are rock-solid, fast and expensive.

To summarize, the server gets the job done at an unbeatable price/performance ratio, though it could be better.

I am looking forward to seeing what comes out of the logical volume project.

Don't forget LVM! (0)

Anonymous Coward | more than 15 years ago | (#2038753)

Hi,
This is more important, IMO. When does Linux get LVM?

LVM + JFS = Satori..

Backup and Restore? (0)

Anonymous Coward | more than 15 years ago | (#2038754)

Doesn't IBM have an ADSM client for Linux? A company called EMASS makes an IBM 3494 compatible tape library that will hold around ~6.9 terabytes for about half the price and half the footprint of a genuine IBM 3494 tape library. That's more storage than you're likely to need, but you can start adding extensions if you need more. Of course, you still need a non-Linux machine to run the ADSM server software, but look for that to change.

Macs have had this for a while.. (0)

Anonymous Coward | more than 15 years ago | (#2038755)

Macs have had a b-tree based file system for quite a while. I've always thought it wasn't well suited for multi-user systems because you have to lock most of the filesystem when maintaining the b-tree.

split it up, if you can (0)

Anonymous Coward | more than 15 years ago | (#2038756)

This is *nix, you can split up partitions and use symlinks to redirect a directory onto a different partition. I see this all the time on many large ftp archives. This has advantages in terms of reliability and ability to check the filesystem on individual partitions without bringing everything offline. You'll definitely want to use subdirectories if you have lots of files, because e2fs doesn't handle directories with large numbers of files very well (it slows to a crawl).
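The split-and-symlink trick looks like this; the run below is a toy in a scratch directory (on a real system the "disk2" path would be the mount point of a separate partition):

```shell
# Fake a second partition's mount point with a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/disk2/pub" "$root/ftp"
ln -s "$root/disk2/pub" "$root/ftp/pub"   # ftp/pub now lives on "disk2"
echo hello > "$root/ftp/pub/README"
cat "$root/disk2/pub/README"              # same file, reachable via either path
```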

AFS Not the answer!! (0)

Anonymous Coward | more than 15 years ago | (#2038757)

AFS is a distributed filesystem. It allows the seamless combination of multiple servers into one filesystem. A big RAID on an AFS server would still need to have an ext2 (or similar) partition(s).

AFS is used a lot at universities because there is no need to know the machine name to get to a networked resource. It is all mapped to directory names in a cell (e.g. athena.mit.edu) by AFS volume servers.

Transarc just announced commercial support for Linux, but I do not know if that includes the server portions of AFS; probably just the clients.

async? (0)

Anonymous Coward | more than 15 years ago | (#2038758)

You haven't mounted this mother async, have you?

no problems here (0)

Anonymous Coward | more than 15 years ago | (#2038759)

K6-333, MVP3 chipset, 6.4GB IDE, 128MB RAM, RedHat 5.2, 2.2pre7

40 loops, no errors.

lates.

Netware (0)

Anonymous Coward | more than 15 years ago | (#2038760)

Another thing to keep in mind when dealing with Netware is that it absolutely hates IDE drives; SCSI is the way to go with Netware. And if you're going to have large volumes, you ought to check out NSS.
You could power off a server in the middle of a major database update and the NSS volume would mount up instantly without any problems. You'd probably have some database integrity problems, but at least the filesystem is intact.

Definition of Journaled Filesystem (0)

Anonymous Coward | more than 15 years ago | (#2038761)

If you use a real RAID controller in a real server, such as a Compaq ProLiant with a Smart Array 2 controller, the cache on the RAID controller is protected by a battery pack. So in the event that the server loses power during writes, it can finish writing its cache when the power comes back on, before the filesystem is ever mounted.

Backup and Restore? (0)

Anonymous Coward | more than 15 years ago | (#2038762)

no, see, what you need is an Ampex DST 812 [ampex.com] tape library. 12.8 TB capacity, 80 MB/sec sustained data rate. nice!

Tweedie, Red Hat, Journaled File System (0)

Anonymous Coward | more than 15 years ago | (#2038763)

Red Hat has Stephen Tweedie working on a fully journaled filesystem for Linux. I haven't heard a release date. Maybe someone who knows Tweedie could ask him.

Don't forget LVM! (0)

Anonymous Coward | more than 15 years ago | (#2038764)

LVM is here.

http://linux.msede.com/lvm/

Alpha software but I've tried it and it actually works!

Files over 4 GBs (0)

Anonymous Coward | more than 15 years ago | (#2038765)

Actually, the P2 core has a 36-bit address bus. I'm not sure how you're supposed to address this from software, though, since 32-bit is pretty ingrained in the MMU structures.

Novell: GroupWise gone third party? (0)

Anonymous Coward | more than 15 years ago | (#2038766)

Excuse me, but I have run into some badly written NLMs full of memory leaks and some really cleanly written NLMs. Whether an NLM is cleanly written seems to have nothing to do with whether it is third party. I believe the following NLMs I have had experience with are still considered "true red" (from Novell). In fact, if you aren't running ANY third party backup software then you probably have yet to have a reliable backup.

Servers running GroupWise 5 based NLMs (and no third party):

POA (Post Office Agent):
Average time until auto-abend recovery: less than 24 hours
Average time until complete server halt: less than 2 weeks

SMTP (GroupWise to Internet gateway):
Average time until complete server halt: less than 3 days

WebAccess Gateway + Novonyx:
Average time until complete server halt: less than 12 hours

On the other hand, we have had servers running ADSM v3 (beware of ADSM v2) and Open File Manager that have remained up for months.

It would be nice if Novell took a hint from IBM/Lotus and started providing some of their GroupWise server & client services for Linux. Getting those GroupWise (third party?!) NLMs off Novell (and not onto NT) would definitely help Novell's image, at least where I work.

Files over 4 GBs (0)

Anonymous Coward | more than 15 years ago | (#2038767)

That's ridiculous. No commercial 32-bit UNIX has such a limitation. In this respect Linux is just lame; there's no two ways about it.

I don't buy the efficiency reasoning either. As you point out it wouldn't be POSIX to change the library call types (seek/fseek/tell/ftell et al) so new calls are required (seek64 or whatever). Hence applications using the old entrypoints have no efficiency hit. Besides, we're talking about file access here, even a fairly large amount of CPU overhead will be swamped by the actual time it takes to do the I/O operation.

I'm working on a product that requires large files and as a result Linux is completely ruled out. I'd love to port to Linux, but it's just not possible. So we can go Alpha as you suggest, or we can just screw Linux and use a non-crippled OS.

Re: Just on the brink of installing one (0)

Anonymous Coward | more than 15 years ago | (#2038769)

The one tool you might wanna check into is "Fake" (search for it on www.freshmeat.net).

It basically does an ARP scan of the other machine's IP, and if that IP doesn't respond (ARP/ping) it's considered dead, and it SPOOFS its IP to that IP, thus taking over all its services.

I've used this tool in a few live situations, and I loved it; it's the ultimate redundancy, and security..

IDE Filesystems' (Un)Reliability (0)

Anonymous Coward | more than 15 years ago | (#2038770)

It works just fine on my Asus TXP4 w/P200, 128MB DIMM, 8.4GB UDMA IBM Deskstar. I actually used a dump from an old (and full) 135MB drive which I had sitting around. It does take almost 25 seconds to complete, though.

Netware (0)

Anonymous Coward | more than 15 years ago | (#2038771)

It shouldn't take anywhere near 5 minutes to mount 9Gb volume. That's a fairly small volume when working with Netware.

I'm using a 17.2 GB volume and the mounts take less than 5 seconds, if that long. That is assuming that the filesystem isn't damaged and that you don't have to run vrepair (the fsck equivalent), which on my volume requires less than 2 minutes.

With mount times like that you've got one of several problems: 1) lack of memory on server (I'm using 128Mb and that's the minimum I would run with, Novell has a formula to calculate the amount of memory necessary to adequately use a volume), 2) Server hasn't got adequate processing power (doubtful), 3) inadequate I/O subsystem.

Macs have had this for a while.. (0)

Anonymous Coward | more than 15 years ago | (#2038772)

Although the Mac's use of b-trees in its filesystem might not be the best, b-trees in general are very good for multiuser situations. They have a high degree of locality when updating on-disk structures. Certainly a b-tree structure requires at most the same degree of locking as simpler structures and, usually, considerably less.

Files over 4 GBs (0)

Anonymous Coward | more than 15 years ago | (#2038773)

SCO Open Server 5 has a limitation of 2 Gig...

Info of e2fs (0)

Anonymous Coward | more than 15 years ago | (#2038774)

Where is the best place to get in-depth info
on e2fs and how it works? Implementation,
max size, etc... I have not been able to
find any HOWTOs or FAQs on the subject.

IDE Filesystems' (Un)Reliability (works great) (0)

Anonymous Coward | more than 15 years ago | (#2038775)

Maxtor EIDE 8Gb drive, P166, Intel Triton MB - works like a charm, no problems whatsoever.

Perhaps the original poster should run an fsck on his/her drive.

- Steve (steve@badcheese.com)

Files over 4 GBs (1)

Anonymous Coward | more than 15 years ago | (#2038805)


When will we (do we) have support for files larger than 4 gigabytes ?

Files over 4 GBs (1)

Anonymous Coward | more than 15 years ago | (#2038806)

There are a few hacks that allow support for files over 4G. You won't see files over 4G supported natively until we move to a 64-bit OS (at which time we will then have support for files up to 18 Exabytes - that is, 18,000,000,000,000,000,000 bytes).
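Both limits fall straight out of the offset width, and 64-bit shell arithmetic can show them directly (note that a *signed* 64-bit offset actually tops out at 2^63-1, about 9.2 exabytes; the 18 EB figure is the unsigned 2^64):

```shell
echo $((1 << 32))            # 4294967296: the 4 GB ceiling of a 32-bit offset
echo $(( (1 << 63) - 1 ))    # 9223372036854775807: max for a signed 64-bit offset
```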

When do we get a fully journaled file system? ... (1)

Anonymous Coward | more than 15 years ago | (#2038807)

My experience with journalled file systems on commercial UNIX systems has been overall negative. They may require less (or no) fsck'ing, but you pay a heavy price for that in terms of performance.

The only way journalling can work reasonably well is if you have battery-backed RAM to hold the journal.

3.5 MB/s is really slow. (1)

Wakko Warner (324) | more than 15 years ago | (#2038808)

I get just under 9 MB/s in linux with my Seagate Barracuda (it's a narrow drive, but narrow versus wide doesn't matter when you only have a few devices.)

At any rate, 3 MB/s seems too slow, by a factor of 3 or so.

- A.P.
--


"One World, One Web, One Program" - Microsoft Promotional Ad

My God.. (1)

Wakko Warner (324) | more than 15 years ago | (#2038809)

For a minute there I thought you were talking about the tape drives. Why anyone would want even one, let alone eighteen, of those drives connected to their computer had me baffled for a moment.

- A.P. (Yes, I've had several bad experiences with Exabytes.)
--


"One World, One Web, One Program" - Microsoft Promotional Ad

SW Striped 100GB + FS at VA.... (1)

KMSelf (361) | more than 15 years ago | (#2038810)

I can't tell you how well it worked, but I was watching VA set up at least one 100 GB+ FS for a customer who insisted on SW striping several nonsymmetrical disks together to form a 100 GB partition. It's doable, but not recommended (VA was recommending several saner alternatives but the customer wouldn't buy it). This is pretty much the best way to guarantee yourself problems down the road -- HW RAID 5 or RAID 1/0 is a much better alternative from a reliability standpoint. If anyone at VA is interested in commenting on successes/failures with very large filesystems....

There are a number of well-known websites which utilize Linux, including Deja News [dejanews.com]. Not sure what kind of partition sizes they're using, but it would be fun to know.

FWIW, you can modify the reserved % parameter using tune2fs rather than mke2fs and save scads of time. You can also force an fsck (man fsck) to time the operation if you want.

Logging Filesystem (1)

Jordy (440) | more than 15 years ago | (#2038811)

Ext2fs would take months to fsck that thing. :)

There is a log-structured filesystem for linux called "dtfs" available at their home page. [tuwien.ac.at] The author tells me he will be shooting for inclusion in 2.3.0 and that the bulk of it is working just fine.

Nothing much (1)

Codifex Maximus (639) | more than 15 years ago | (#2038814)

The important thing is the filesystem itself. The journal, as I understand it, is just a record of changes to be made. Rollbacks are possible for multiple levels of updates.
If the glitch occurs prior to journal recording then there is nothing to fix.
If the power outage or problem occurs after the journal entry has been made but prior to the commencement of writing then the changes can be rolled forward or back - posted or rejected.
If the problem occurs after journaling but while writing is in progress then the changes can be rolled back and then possibly reposted.
If the problem occurs after journaling and after writing but prior to reconciling the journal then the changes can be rolled back or the journal updated to match the filesystem.

Journaling is good for systems that require very very high reliability - such as banking systems. There is obvious overhead involved in journaling.

An optional, journaling filesystem for Linux would be a nice addition - hey, NTFS for Linux isn't far from being read/write is it?

VA Sells RAID machines up to 1.1TB (1)

gavinhall (33) | more than 15 years ago | (#2038815)

Posted by GregK:

We are selling 1.1 TB (that's terabytes) machines currently, using the DAC960 and external drive enclosures. You can check out our systems at the following URL:

http://www.varesearch.com/products/vs4100.html

They are quite reliable, mostly due to the fact that the author of the DAC960 Linux driver (Leonard Zubkoff) works for us.

EXT2 and Raid (1)

gavinhall (33) | more than 15 years ago | (#2038817)

Posted by Jim @ ImageStream Internet Solutions:

The company I work for has been selling Linux-based servers running the Linux software RAID. It works great in the 2.1 kernel; the largest RAID we built in software was 150 gig. We did this to back up a client's old RAID system. One of our clients has a server running a 100 gig RAID (Linux software) that is moving huge databases daily without fail, and it has been up over 6 months running 24/7 crunching data.

Some issues... (1)

felicity (870) | more than 15 years ago | (#2038818)

It seems that for Linux ext2 partitions it was discussed that there is some loss for big filesystems (a % of space is lost for superblock copies and such)

You want to turn on "sparse superblocks" support when doing a mke2fs; it reduces the number of duplicate superblocks. The catch is that it is really only supported in the late 2.1.x/2.2.x kernels; 2.0.x will balk like a dying chicken.
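Turning the option on at mke2fs time can be sketched like this, on a throwaway file-backed image so nothing real gets formatted (assumes e2fsprogs with the -O feature syntax is installed):

```shell
# Throwaway 4 MB file-backed image; nothing real is touched.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=4096 2>/dev/null
mke2fs -q -F -O sparse_super "$img"   # keep only sparse backup superblocks
dumpe2fs -h "$img" 2>/dev/null | grep -i 'features'
```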

Files over 4 GBs (1)

rlk (1089) | more than 15 years ago | (#2038819)

Sorry, but I don't buy that. Solaris and AIX have no problem with big files on 32-bit platforms. 32-bit platforms are normally limited to 32-bit address spaces (modulo games such as segments and some of the other truly horrible DOS and Windows hacks), but there's no compelling reason why file size and physical memory should be constrained by processor word length.

Backup and Restore? (1)

cph (1317) | more than 15 years ago | (#2038820)

Wrong answer! I guess your instructor is being paid too much.

See IBM's web page, which shows how to download an unsupported ADSM client for Linux:

http://www.storage.ibm.com/software/adsm/adsercli.htm

reducing reserve space will reduce performance (1)

wayne (1579) | more than 15 years ago | (#2038821)

Also the reserved blocks percentage (man mke2fs). On large partitions you can waste a lot of space if you keep the default 5% reserved blocks percentage.

Reducing the amount of reserved space may save you some space, but it can cost you a *lot* of time. Having 5% reserved space will mean that there will almost always be a "free block" within about 20 blocks from the end of the file.

Unless you want your expensive raid system to spend lots of time seeking, you should keep the 5% min free value, or even increase it to 10% like the BSD folks use. You certainly don't want to constantly run the filesystem close to being full most of the time.
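For reference, the reserve can be changed after the fact with tune2fs, no reformat needed. A sketch on a scratch file image (on a live system the argument would be the block device, ideally unmounted):

```shell
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=4096 2>/dev/null
mke2fs -q -F "$img"                        # defaults to a 5% reserve
tune2fs -m 10 "$img"                       # raise the reserve to 10%
tune2fs -l "$img" | grep 'Reserved block count'
```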

IDE Filesystems' (Un)Reliability (1)

wayne (1579) | more than 15 years ago | (#2038822)

1. Create a large 10MB test file on an IDE partition.

2. Do lots of checksums using sum or md5sum.

Q: Do you get the same checksum each time?


I once had a problem like this where I would get a corrupt byte from the disk about once every 50-100MB of data read. It happened on two different disks, I tried three different IDE controllers, I swapped RAM, I ran RAM tests, I made sure my bus speeds were within spec. One byte every 50-100MB might not sound like much, but it was enough to crash my system once or twice a week.


It turned out that I needed to set my RAM speed in the BIOS to the slowest setting, and everything worked. The RAM test programs didn't detect anything, I think because the errors only occurred if I was doing heavy disk access at the same time I was doing very CPU-intensive things.


Moral of the story: PC hardware is crap, and Unix tends to push the hardware much further than other systems.


Set your BIOS settings down to the very slowest settings and see if the problem goes away. Try swapping components, and try a *good* memory tester (Linux has one called mem86 or something).


Good luck

ext3? (1)

jd (1658) | more than 15 years ago | (#2038823)

I believe ext3 is "work in progress", though feel free to contradict me if I'm wrong on that.

From what I understand, ext3 would be better suited to giant partitions.

Bad memory? Overclocked system? (1)

shani (1674) | more than 15 years ago | (#2038824)

Dude, it sounds like you have either some bad memory or are pushing your processor a wee bit too hard. Are you using parity memory? Odds are not, which is fine, but you can get strange results like this. Maybe you should borrow some SIMMs from a friend and see if the problem persists.

Some issues... (1)

krynos (1706) | more than 15 years ago | (#2038825)

It seems that for Linux ext2 partitions it was discussed that there is some loss for big filesystems (a % of space is lost for superblock copies and such); don't forget to reduce the % of the filesystem reserved for root (5% by default, which would be overkill for 120 GB)...
If you think 3 minutes is long to mount a 120 GB filesystem, you should have seen a Netware server with 13 GB; that took at least 5-15 minutes to mount the filesystem...

Netware (1)

krynos (1706) | more than 15 years ago | (#2038826)

The main problem with Netware (before 5) is the lack of memory protection: an illegal read or write crashes the server (there is an option to ignore this; only use it when developing NLMs).
A well-configured Netware server with updates is very stable. As for third-party NLMs, if they are well written, they shouldn't crash the server. (I'm also an NLM developer, and a Unix programmer. One of my NLMs hasn't crashed, and the server uptime is about 2 months; it's used fairly often, mainly converting print jobs from Epson FX-80 format to another printer format. NLMs are a pain to develop. Developing (and testing) most of the NLM under another OS, like Linux, then doing the last few lines under Netware is the best way, with of course a set of routines for easy porting between Unix, Netware and Win32 with the same code.)

What about reiserfs? (1)

red_dragon (1761) | more than 15 years ago | (#2038827)

There is work in progress to develop a balanced-tree-based filesystem for Linux, which is currently in its second beta. The paper and source files are located at http://idiom.com/~beverly/reiserfs.html [idiom.com] . It is supposedly faster than ext2, and might be better suited for gigantic partitions, although I cannot attest to that, as I have no experience with it. Does anyone here know anything about this?

No prob here (1)

red_dragon (1761) | more than 15 years ago | (#2038828)

Tried it, and got the same checksum every time. The setup here is an AST Premmia GX P/100 in single-processor mode (82430LX/NX Mercury/Neptune chipset, PCI and EISA buses), 32MB non-parity RAM, kernel 2.0.35, a Western Digital WDAC1600 disk, and a *shudder* CMD 646 rev 1 controller. Well, actually, the disk mentioned above is connected to the second channel, which uses a different controller that gives me problems when I configure it to do IRQ unmasking and 32-bit transfers (the CMD646 does that fine here). I recommend trying to tune your disk setup with hdparm and running the test again.

Journaling FS (1)

ChrisRijk (1818) | more than 15 years ago | (#2038829)

Sometimes called a 'log' filesystem, though they aren't the same. AFAIK the difference is that a journaling filesystem just writes the meta-data super-safely (so you don't need to fsck), while a log-structured filesystem writes all data out as a 'log': it just appends to where it previously wrote. This does mean you need a background garbage collection process.

Doing writes in this way makes writes go MUCH faster. I read a review by one journalist (no pun intended) who didn't believe Sun's claims that it made long sequential writes go 3x faster or more... It did. Unfortunately, Sun hasn't (yet) put full journaled FS support into standard Solaris, though there is an option to turn on "UFS logging"; it can even be enabled on the fly. Still, deleting files and creating lots of small ones goes about 5-10x faster when you turn logging on.

When do we get a fully journaled file system? ... (1)

Q*bert (2134) | more than 15 years ago | (#2038830)

A journaling (not "journaled") filesystem is one that keeps track of all the writes it's going to make on a special part of the disk. That way, if you lose power with the disk still spinning, the FS can read its record of "pending transactions" and make the needed changes immediately when you boot again. Journaling thus eliminates the need for fsck'ing. Cool, huh?

Backup and Restore? (1)

Q*bert (2134) | more than 15 years ago | (#2038831)

Piece o' cake: Use DLTs. I work at Indiana University (an awesome CS school, by the way; I just finished my bachelor's here ;) ), which just purchased what's called a "tape silo". It's a huge enclosure with shelf upon shelf of DLT-sized cubbyholes and a robot that moves among them in two dimensions (up/down and left/right) inserting and removing tapes. They plan to obtain around 15 TB of storage, with 1 TB of spinning disk to keep the whole thing moving. For a university of 30,000+ students and tons of research on the primary campus, that's not unreasonable!

The moral of this story: DLTs are a perfectly feasible backup medium. You can get 17GB on one tape.

Definition of Journaled Filesystem (2)

Chris Tyler (2180) | more than 15 years ago | (#2038832)

I know this sounds ignorant, but what is a journaled file system?

A journalled file system writes all of the proposed changes to control structures (superblock, directories, inodes) into a journalling area before making those writes to the actual filesystem, then removes them from the journal after they have been committed to disk. Thus if the system goes down, you can get the disk into a sane state by replaying/executing the intention journal instead of checking every structure; thus an fsck can take seconds instead of minutes (or hours).

For example, if you're going to unlink the last link to a file (aka delete the file), that involves an update to the directory, inode, and free list. If you're on a non-journalled system and update the directory only, you have a file with no link (see /lost+found); if you update the directory and inode only, you have blocks missing from your free list. Both of these require scanning the whole disk in order to fix; but a journalled system would just update the directory, inode, and free list from the journal and then it would be sane.

Problems with journalled filesystems include conflicts with caching systems (e.g., DPT controllers, RAID subsystems with cache) where the intention journal is not committed to physical disk before the writes to the filesystem commence.

IDE Filesystems' (Un)Reliability (1)

Rendus (2430) | more than 15 years ago | (#2038833)

Cyrix 6x86 PR233MX with 64MB of RAM, 2.5 GB EIDE hard drive (Fujitsu I think, don't feel like opening the case), Amptron 8600 motherboard. Linux 2.2.0pre4 kernel. No problems here; the checksum was the same each time.

fsck time? (1)

Fandango (2618) | more than 15 years ago | (#2038834)

My god, how long does it take to fsck such a beast? Unfortunately, I haven't looked into the journalled filesystem that's supposedly available for Linux (I think it's commercial), but a journalled filesystem is exactly what you'll need for this. Even with a UPS, hour-long fsck times are not my cup of tea.

-Jake

anyone know about the BeOS FS? Name the book. (1)

Fandango (2618) | more than 15 years ago | (#2038835)

A quick amazon.com search would have revealed the title:

Practical File System Design with the Be File System

by Dominic Giampaolo

From this book, you could literally write your own compatible implementation of BFS for Linux. The question is: would BeOS compatibility be worth missing the opportunity to create a new filesystem tuned for what Linux is used for? The nice thing about dbg's book is that he covers the reasoning behind every decision that he made when developing BFS. Clearly, some of these decisions are closely tied to what BeOS is being targeted for (a single-user power desktop for media professionals), rather than what Linux is most often used for (a multi-user Internet server).

-Jake

goodbye to ext2? huh? (1)

Sinner (3398) | more than 15 years ago | (#2038839)

I like ext2fs. It makes every other filesystem I've ever used look shoddy and slow. And I've read that it's benchmarked as the fastest filesystem on Intel. The only conceivable reason to replace it would be if someone came up with a new filesystem for Linux that was even faster and had lower overhead. I won't be holding my breath.

Of course, people with big-ass disks and high uptime requirements need that journalling crap. And they'll have it. So don't be dissin' ext2!

OK ... (1)

Bwah (3970) | more than 15 years ago | (#2038840)

... Please excuse the stupid question here, BUT:
How does a filesystem of this type allow for a 3x sequential write speed improvement?

I understand what the journaling part is describing, but don't understand how this would be that much faster. Especially under a really heavily loaded server.

/dev

anyone know about the BeOS FS? (1)

cyberassasin (4943) | more than 15 years ago | (#2038841)

I read that it can handle volumes of over 1 terabyte... does support for this exist? I think it is a 64-bit journaling FS... seems pretty nice, but I haven't played with BeOS in quite some time... any help out there?



Misc problems (can be tuned around) (1)

Eivind Eklund (5161) | more than 15 years ago | (#2038842)

Note that I'm a FreeBSD person; all of this from theory, as I've not run Linux for anything serious for a couple of years.

If you run this using the default parameters and get an unplanned shutdown (crash, power outage, whatever), you are likely to get minor file corruption. To get correct behaviour, you should mount the filesystem in sync mode and rely on the underlying RAID setup to handle write caching if you need it (as this removes one failure layer).

You will also want to modify e2fsck to avoid silent data corruption. e2fsck will (or would, the last time I was in a discussion with the author on these issues) handle a block that is shared between two files by duplicating the block.
This silently corrupts at least one file. You will probably want to change it to delete both files, possibly making that an interactive question. (Deleting is the default action on the *BSD fsck, BTW).

Eivind.

Netware (1)

law (5166) | more than 15 years ago | (#2038844)

I have a 38 gig partition on Netware and it takes about 6 minutes... RAID 5, Mylex 960, 32 megs of RAM.

Netware 5 (1)

Bryan Batchelder (6115) | more than 15 years ago | (#2038845)

Heh, I know of a Netware server that mounts a 50GB volume in the blink of an eye :-) Seriously, its less than a second.

Files over 4 GBs (1)

LinuxGeek (6139) | more than 15 years ago | (#2038846)

Dang, the ftp site with the patch gives a 'can't set guest privileges'. Anyone have an alternate site for the >4GB patch? It isn't at linuxmama...

When do we get a fully journaled file system? ... (1)

doomy (7461) | more than 15 years ago | (#2038850)

Ah.. that would be nice.. wouldn't it? Goodbye to ext2.


--

Journaling OS! (1)

tilly (7530) | more than 15 years ago | (#2038851)

Not only are disk-writes faster on journaled file-systems, there are also such things as journaled operating systems.

That is, if you turn the power off and turn it on, the entire OS comes back on to a state within a few minutes of where it was. One example that looks interesting is EROS [upenn.edu].

I have not seen this one in operation, but there are theoretical arguments for their speed claims, and (as they say) it is theoretically impossible for *any* OS based on access lists (such as Unix) to achieve the same level of security that a capability based system can. (Note, I said "can", not "does".)

Regards,
Ben Tilly

Netware (1)

mindedc (7819) | more than 15 years ago | (#2038852)

If your Netware server crashes that much, you have a problem. It can easily be hardware, an errant NLM, or running your backup during file compression.

I routinely see netware servers that have uptimes of 400-600 days.. record is 900 days so far (took a polaroid of that one).

If you want some help with your system, I would be happy to help you with your problem for free. You can contact me at dminderh@baynetworks.com if you'd like.

The new file system in Netware 5 will mount & vrepair 1.1 TB in 15 seconds (that's the largest I have seen... I'm sure it will do more).

And your mount time isn't that bad. Chrysler has a 500 GB volume that takes 22 hours to mount :)

Global File System (1)

kpreslan (8125) | more than 15 years ago | (#2038853)

In our testing of GFS, we have created ~108 GB filesystems (12 9GB disks software-striped together). The only limit on file system size is 1TB (which all Linux filesystems share).

GFS is a 64-bit filesystem. It supports files up to 2^64 bytes in size (on the alpha).
It is much faster than ext2 for moving around big files.

GFS will support journaling by the fall.

http://gfs.lcse.umn.edu


fsck time? can't be worse than NT! (1)

Cato (8296) | more than 15 years ago | (#2038854)

Since you were running NTFS on that size of file system, why do a chkdsk? NTFS is a journalled filesystem, so there is really no need - and of course journalling is designed to avoid long fsck's.

NT may have its faults, but NTFS is not bad in this respect - Linux does not yet have a widely used journalling filesystem that I'm aware of.

Disk Space is the key (1)

kellman (8394) | more than 15 years ago | (#2038855)

Out of disk space is the fastest way I've seen to crash a Netware server. At one place I used to work, the default mailbox location was on the system volume. Once it was full, BOOM! the server was dead.

Just on the brink of installing one (1)

Ramana (9031) | more than 15 years ago | (#2038856)

We are about to install a central machine that runs NFS, sendmail, DNS, NIS, httpd for internal use, and gnats for around 60 users. Here is the plan: two identical machines with 512M RAM and 9.0G disks with the OS installed. One machine would run as the NFS server and the other would run all the services: sendmail, DNS, NIS, etc. The NFS server is connected to a disk array with 7 18.0G disks and a backup tape autochanger. I want to leave one of the disks as a hot spare. I would like to write scripts such that if one machine fails, the other can take over by just running a script.

It is the RAID part that is not clear to me. The last RAID I checked was Veritas on Solaris, which was a major pain in the neck to manage. I don't know if managing RAID on Linux is any simpler. I am inclined to wait until RAID becomes a standard part of Redhat. Until then, I would rather depend on the tape backups than on Linux RAID support.

I am curious to hear any experiences from people managing large file systems, 100G+.

BTW, I still haven't figured out how to use our Exabyte autochanger effectively with GPLed backup software. Exabyte tech support wasn't very useful.

Ramana

Exabyte autoloaders under Linux. (1)

John Barnette (9673) | more than 15 years ago | (#2038857)

While there aren't any effective GPLed solutions for using Exabyte (or any other SCSI/Medium Changer unit, for that matter) libraries and autoloaders under Linux, I've been playing around with 'em quite a bit lately, with some success.

Drop me a note: johnbar@exabyte.com

When the system fails during a journal write.. (1)

Bobski (9886) | more than 15 years ago | (#2038859)

If the system fails during a write to the journal, I believe that the whole operation is regarded as failed, and what's known in database circles as a ROLLBACK occurs... That is, since the disk operation (e.g. unlink) couldn't be completed, it's not done at all, and thus not left half-done and messy.

Files over 4 GBs (1)

Jerenk (10262) | more than 15 years ago | (#2038860)

IIRC, isn't it NOT the OS, but rather the FS that must be 64-bit? I know NTFS is 64-bit and can handle files over 4gig (I've seen it). And, we all know that NT isn't 64 bit (yet). How it does it I am not sure - need an NTFS reference manual OR the MS source code. Fat chance of that...

I kinda have to suggest this (shrug), but why couldn't we get the NTFS driver bulletproofed (r&w)?? Other than the anti-MS reason, NTFS isn't a bad FS (and is proven) and there is already substantial work done with it... It'd be great for that "Hey, NT admins, come to Linux?"

But, then again, if people like Tweedie from RH are working on designing ext3, why bother with NTFS?? ;-) Who knows how far along they are?

L8r,
Justin

Tried 200 Gig once. (1)

TBC (11250) | more than 15 years ago | (#2038863)

Big file systems are great when they work, but FSCK is a nightmare. We have 24 9.1 Gig Drives on a three channel DPT controller. We tried it as a single volume, but weird things happened with the DPT firmware.

We originally used this as a Usenet news server. We tried 24 separate volumes to have the maximum number of spindles, but Linux has a limit of 16 SCSI drives in the 2.0 kernels. We ended up creating 12 2-drive stripe sets (no redundancy). We then created 6 partitions: 5 that were 2 gigs in length, and one with the remainder. We used a patch to allow the partitions to be handled as 2 gig files. This was very fast, and had no fsck issues as there were no file systems. If a few articles were mangled because of a crash, it was no great loss.

We ended up outsourcing our usenet service, and had this server to reuse. We created 3 volumes of 7 drives each, along with 3 hot spares. (One hot spare in each external drive chassis) Each volume is ~50 Gigs in size. One thing we have found is that if we HAVE to fsck the whole thing, (150 Gigs) you need about 4 hours. The PCI bus just doesn't have the bandwidth to check huge volumes in a reasonable time. We end up checking "/" and mounting it rw. We then mount the rest of the volumes "ro". We can then restart basic services (mail, web) and continue the fsck on the read-only volumes.

It's a balance you have to strike. If you really need that large of a file system, understand the time to restart. For us, just a basic reboot takes 12 minutes. With FSCK, it's ~4-5 hours of time to babysit. If you don't need that much space, look at setting up several individual file servers. It will help spread the load.

Just on the brink of installing one (1)

TBC (11250) | more than 15 years ago | (#2038864)

For robotic changers, one of the best solutions I've seen is Arkeia by Knox Software. www.knox-software.com. It's commercial, and expensive, but worth the money if you need large-scale backup.

When do we get a fully journaled file system? ... (1)

Ween (13381) | more than 15 years ago | (#2038867)

I know this sounds ignorant, but what is a journaled file system? An explanation or link would be appreciated.


talking windows users is where I draw the line ..

fsck time? can't be worse than NT! (1)

dirty (13560) | more than 15 years ago | (#2038868)

Actually IIRC journalling filesystems still require fscks. The process is just supposed to take a very short time. Besides, if you weren't supposed to run chkdsk on ntfs, why would they give it to you? And if ntfs is a journalling fs, why did it take 3 days? MS is obviously doing something (else) wrong.

RAID perfectly reliable? (1)

CopiceC (13944) | more than 15 years ago | (#2038869)

Gee, too many people here act like RAID means totally reliable. It doesn't. RAID controllers go wrong; power supplies go wrong; many RAID controllers don't handle power outage properly (there is no such thing as an "uninterruptable" power supply) as they have no battery support for data in transit; RAID controllers with battery support don't properly detect when the batteries have died, and won't provide any real support when their big day comes; hosts go wrong and scramble their disks; and even the best operating systems can go a bit pear shaped some days. There is also the good old "some idiot just typed a dumb command and wiped thousands of files" issue.

I'm not saying RAID is a waste of time. It improves reliability a great deal, and the better designs make things go faster. They aren't perfect, though.

Backing up a monster partition is a pain in the neck. If you have a monster database you have little choice, but smaller partitions make life easier.

IDE Filesystems' (Un)Reliability (1)

takis (14451) | more than 15 years ago | (#2038872)

I tried it and got the same checksum every time (tried both checksums 4 times).
I'm using an AMD K6-2 on an Asus P5A-B motherboard (Aladdin/ALi chipset) with a Quantum Fireball ST3.2A (UDMA 2). Don't know the transfer rates.

Greetz, Takis

Re:How Reliable are Enormous Filesystems in Linux? (1)

ossie (14803) | more than 15 years ago | (#2038874)

I am interested in this as well. I am currently in the process of setting up a 180GB fileserver. I am using dual redundant CMD RAID controllers and ten 18GB UltraWide SCSI drives. The RAID provides a mechanism to create partitions that show up to the OS (Linux) as individual drives. This is done by giving each partition in the RAID set its own LUN. The biggest RAID partition I have made is 80GB. I am booting off the RAID set as well (no hard drive in the server box). I have had no problems so far. I also noticed that it takes quite some time to mount the larger partitions. One thing you might want to experiment with is varying the bytes/inode and also the reserved blocks percentage (man mke2fs). On large partitions you can waste a lot of space if you keep the default 5% reserved blocks percentage.
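To put rough numbers on that tuning advice, here is a back-of-the-envelope calculation. It assumes the common ext2 defaults of the era (5% reserved blocks, one 128-byte inode per 4096 bytes of disk); check man mke2fs for your version's actual defaults, since the -m and -i options are what change them.

```python
# Rough space overheads on a large ext2 partition, using assumed defaults:
# 5% reserved blocks (mke2fs -m) and one 128-byte inode per 4096 bytes
# of disk (mke2fs -i).

def reserved_bytes(fs_bytes, reserved_pct=5):
    """Space held back for root by the reserved-blocks percentage."""
    return fs_bytes * reserved_pct // 100

def inode_table_bytes(fs_bytes, bytes_per_inode=4096, inode_size=128):
    """Space consumed by the inode table itself."""
    return (fs_bytes // bytes_per_inode) * inode_size

GB = 1 << 30
fs = 120 * GB  # the 120 GB RAID5 from the question
print(reserved_bytes(fs) // GB)             # 6 -> ~6 GB lost to the 5% reserve
print(inode_table_bytes(fs) // (1 << 20))   # 3840 -> inode table size in MB
```

Dropping the reserve to 1% on a 120 GB partition frees about 4.8 GB, and raising bytes-per-inode shrinks the inode table (and, with it, some of the mount and fsck time) proportionally, at the cost of fewer total inodes.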

Tape Changers Under Linux (1)

pwb (14817) | more than 15 years ago | (#2038875)

I'm interested in info on using dat changers under linux. Is anyone doing this? If so what changers are supported?

Tape Changers Under Linux (1)

pwb (14817) | more than 15 years ago | (#2038876)

The machine I'm sitting at has an APS Technologies changer attached to it, of unknown model. The tape changer says "DATLoader600", but that is not an APS name.

Here is a link to the APS website. [apstech.com]

Are Adaptec RAID cards supported yet? (1)

RoaminCatholic (14841) | more than 15 years ago | (#2038877)

All this talk about RAID makes me yearn to run Linux off our cool new Adaptec ARO-1130SA and AAA-130SA RAID controllers. However, to the best of my knowledge there are no drivers available yet. Has anyone else had luck getting one of these critters to run under Linux? If so, how'd you do it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Erik Norvelle

Please try the IDE test above! (1)

Rick Deckard (14845) | more than 15 years ago | (#2038878)

Linux 2.1.132
Intel p255MMX (Yeah, it's overclocked, bus is running at 75Mhz.)
128Mb edo, bios default for 60ns in my award bios.
(simms are 4x 32Mb, 2x TI and 2x Panasonic)
Mobo: Asus T2P4 Cache: 512Kb
HDD's:
1.0 Gb samsung pio-4
2.5 Gb bigfoot pio-4
4.3 Gb bigfoot pio-4

On all the discs the outcome was the same; I "summed" each disc 30 times.

I also tried it on my nameserver,
Linux 2.0.36 + egcs patch
AMD 386DX40
Motherboard = unknown
8Mb "topless" (8x1Mb)
420 Mb Seagate
BIOS MEM Setting: as conservative as you can get
I tried it 20 times here; also no difference in the sum.

Weird shit happenin' in yer machine...
Try other IDE cables; I had problems with that in the past. My hdd's used to spin down (a bit), click, and then get up to normal speed again. Bad connectors caused the hdds to reset once in a while, which cost me some write and read errors, including some bad blocks! (My system tried to read/write while the heads were resetting, hence the "click" sound.)

Anyone here who has/had this problem too?

Mounting large volumes with Netware... (1)

z80 (103328) | more than 15 years ago | (#2038879)

.. in older versions of Netware, i.e. v3.x or v4.x, it really takes ages. But take a look at Netware 5 and the new file system... it rox!

/z80

Just on the brink of installing one (1)

Cam (110570) | more than 15 years ago | (#2038880)

At my work we have been toying with an 18GB RAID0 partition under Linux. We would like to perhaps stick even more disk on it, however I don't know of a good GPL backup package. Does anyone have any pointers here? I don't think that a simple dump or tar will cut it.

Cheers,

Cam.

Cameron_Hart@design.wnp.ac.nz