
Choosing the Right Cluster System

Cliff posted more than 14 years ago | from the choices-choices-choices dept.

Linux 106

ckotso asks: "So I've read here and there about Linux clusters, and I am ready to set about creating one with some help from the educational institute I am working for. So far I've found out about Beowulf, SCI and MOSIX. I really hope I can get some help on this, since NT is gradually making its way into the University, and I hate to see this. I want to give this place a cheap and robust alternative; I simply have to change their minds!" Interested? There's more information inside.

"My questions are:

  1. Have I missed any other serious competitor in the cluster field?
  2. What are the pros and cons of these systems?
  3. Has anyone tried them all and written any report as to how they compete?
Thanks!"


What are you gonna use it for? (3)

Nicolas MONNET (4727) | more than 14 years ago | (#1490762)

Your question is a bit like: "So I keep reading about those 'automotive' thingies, and I wonder which one to buy: a 747, a Suzuki motorbike or a cruise liner?"

--

What do you want to use it for? (2)

Bryan Andersen (16514) | more than 14 years ago | (#1490763)

I think we need to start out with this very important question: what do you want to use it for? How you want to use it determines the style of cluster you need. Each style has its pluses and minuses. Some are better for batch processing; others are better at handling large amounts of interactive work. What do you need?

Choose Mongolian! (0)

Anonymous Coward | more than 14 years ago | (#1490764)

I've found that the Mongolian Cluster's fsck is the best and fastest around! :-)

Depends on the purpose (1)

Anonymous Coward | more than 14 years ago | (#1490765)

If you want a highly parallel application engine, Beowulf is for you. Otherwise check out the linux-ha pages at http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html [unc.edu].

Linux is the OS for parallel clusters, but for High Availability you might do better just to buy redundant hardware and run a single Linux box for now (since the main reason you need NT clusters is 'cause you have to reboot the box for installs, memory problems and general BSOD problems). If you want to run a clustered database server or something for scalability, look to Sun right now. Maybe with the 2.4 release we'll be able to make some inroads there, but right now Linux has some architectural issues with sharing disk systems.

flamebait (0)

Anonymous Coward | more than 14 years ago | (#1490766)

Sounds like you want Windows clustering services.

Free with NT advanced Server, 32 $500 boxes later and you have a zero Admin web/application farm.(sic)

Oh, and I'm told it's easy to find support personnel who know how to install Service Packs.

A few ideas... (5)

Noryungi (70322) | more than 14 years ago | (#1490767)

OK, here is my take on your question. Watch out, though, as I am not a Beowulf expert.

Here is some information you may consider before starting your own cluster:
  • Beowulf clusters have to be useful for the kind of scientific projects your university undertakes. Large science (physics, astronomy) projects, usually coded in Fortran and involving lots of calculations that can be computed in parallel, are ideal applications for them. Other applications may be a lot less interesting. A Beowulf cluster, despite its power, is not always the perfect solution.
  • If your University is short on cash, you may want to investigate the "Stone Soup" [ornl.gov] cluster -- recycled old Pentiums and 486s can find a second lease on life in a Beowulf cluster. Pros: cheap. Cons: requires a lot of labor and patience, and is less powerful than a Beowulf cluster using up-to-date CPUs and network connections.
  • To be truly effective, Beowulf clusters require at least a couple of very powerful servers and very advanced network hardware -- be sure to factor this into the total cost.
  • Beowulf clusters are not for the faint of heart. They require quite a lot of skill, as far as network configuration, machine configuration and traffic optimization are concerned. It's not surprising the first Beowulfs were born at NASA -- it did require rocket scientists to make them work! =) Once they are up and running, though, their performance is close to or better than that of dedicated supercomputers -- for a small fraction of the price.
  • Another good side of Beowulf is the fail-safe possibilities and growth capacity of such a machine. If a "node" goes down, the machine does not crash, and the node's share of the task(s) can be assigned by the main server to another machine. If you need a more powerful machine, simply add a dozen new PCs to your mix and watch those MIPS/Gigaflops go up!
  • Finally, never forget the one argument that wins them all: price, price, price, price! Linux is free, Intel PCs are dirt cheap, all you need is a lot of space and a dedicated team to make it work. Oh, and lots of network cards & cables... =)
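The failover idea above can be shown as a toy sketch. Node names and the round-robin reassignment policy here are purely illustrative, not the behavior of any particular Beowulf scheduler:

```python
# Toy sketch: tasks owned by a dead node are handed round-robin to the
# survivors. Node names and the reassignment policy are illustrative only.
nodes = {"node1": True, "node2": True, "node3": False}   # node3 is down
assignments = {"node1": [0, 1], "node2": [2, 3], "node3": [4, 5]}

alive = [name for name, up in nodes.items() if up]
for name, up in nodes.items():
    if not up:
        for i, task in enumerate(assignments.pop(name)):
            assignments[alive[i % len(alive)]].append(task)

print(assignments)  # no work is lost; node3's tasks move to node1/node2
```

The point is only that the main server needs a record of who owns what, so the dead node's share can be redistributed rather than lost.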

So, some positive factors, some negative ones. If you want to convince your University, always remind them that they can always count on the support of other universities and research centres the world over that are using this technology right now.

Good luck!

Depends on your needs. (5)

SEWilco (27983) | more than 14 years ago | (#1490768)

That's quite an assortment. What you want depends on your needs and on the characteristics of the choices. As for NT, the availability of source for many of these things will be nice for research activities.
  • Beowulf [nasa.gov] is one of a family of parallel programming API tools. Programs must use the API to accomplish parallel programming.
  • SCI [nicewww.cern.ch] is fast hardware with support for distributed shared memory, messaging, and data transfers. Again, if you don't use the API then no gain.
  • DIPC [cei.net] is distributed System V IPC. Programs which use the IPC API can be converted to DIPC easily, such as just by adding the DIPC flag to the IPC call.
  • MOSIX [jhu.edu] is the most general-purpose. Processes are scattered across a cluster automatically without having to modify the programs. No API needed other than usual Unix-level process use. Allows parallel execution of any program, although full use requires a parallel program design.

Sun HPC ClusterTools is a recent addition (1)

Anonymous Coward | more than 14 years ago | (#1490769)

I'm not associated with Sun, and would rather not have another discussion on whether the SCSL is a Good Thing or Evil Incarnate, but Sun recently announced the availability of their high-performance clustering toolkit -- which has been used by many, many machines in the Top 500 -- as source under the SCSL. Pulling that announcement out of my mailbox, it looks like:

QUOTE ON

Sun is pleased to announce that on November 15th Sun HPC ClusterTools[tm] software was made available through Sun Community Source Licensing (SCSL).

We appreciate the interest you have already expressed in this offering and are excited about the opportunities that the Community Source Licensing model presents. Since our initial June announcement we have completed the work needed to make HPC ClusterTools available through SCSL, and we are very encouraged by the large number of individuals who have registered their interest at our Web site. We look forward to development by the community and wish to extend our thanks to all who will be joining this community effort.

You can access HPC ClusterTools SCSL source code via the Web at:
http://www.sun.com/hpc/communitysource [sun.com]

This site contains a licensing overview, FAQs, technical information, download information, support information, and related links. It provides an excellent introduction to the Sun HPC ClusterTools SCSL product and should make it easy for you to download, build, and start developing with the product.

Again, we thank you for your interest and look forward to your active participation in the ongoing development of the community.

QUOTE OFF

All that stated, the most useful info is actually on the Technical Description page [sun.com] or in the previous ClusterTools 3.0 documentation [sun.com].

Beowulf (1)

Anonymous Coward | more than 14 years ago | (#1490770)

Sorry for the anonymous post, but I just haven't logged in for a while. MOSIX only does resource management (balancing). The code to get work distributed over a MOSIX cluster is very, very simple (a basic fork would do the trick). I have set up a MOSIX cluster in an afternoon and got my processes distributed automatically. There are some limitations, but all in all it's very nice. If you want to do a quick demo, try the MOSIX solution. If you have a lot of expertise, Beowulf seems to be the highest-performance solution, but it has the drawback of needing expertise and high-bandwidth connections between nodes, which translate into cost. If you have a lab with a few computers, set up the MOSIX cluster and play with it; it should not take too long. Beowulf I would only really investigate if the problem domain cannot be subdivided into large chunks. Regards, jacobus@oddesy.co.za
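The "basic fork would do the trick" point can be sketched in plain Python (Unix-only, with illustrative numbers). The key observation is that no cluster API appears in the code: under MOSIX, each forked child is an ordinary process the kernel is free to migrate to another node.

```python
# Fork four ordinary worker processes; MOSIX could migrate any of them.
# Each child sends its partial sum back to the parent over a pipe.
import os

def work(chunk):
    return sum(chunk)

data = list(range(100))
chunks = [data[i::4] for i in range(4)]   # four interleaved slices

results = []
for chunk in chunks:
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: compute and report
        os.close(r)
        os.write(w, str(work(chunk)).encode())
        os._exit(0)
    os.close(w)                           # parent: collect the result
    results.append(int(os.read(r, 64)))
    os.waitpid(pid, 0)

total = sum(results)
print(total)  # 4950, i.e. sum(range(100)) -- nothing was lost
```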

Depends on what you want (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490771)

If you are looking for the kind of clustering that Windoze NT does, then you want something like TurboCluster Server from TurboLinux; it provides clustering for high availability and high throughput for web servers. TurboLinux [turbolinux.com]

If you need more general load balancing clustering for enterprise applications, look at Linas Vepstas's Linux Enterprise Computing pages at http://linas.org/linux/ [linas.org] , he has a section on clustering on that page.

If you need supercomputer numbercrunching or render-farm type clustering, then the Beowulf approach is what you want. Linas' pages also have a section on Beowulf type clustering.

How about an Apple cluster (3)

Mononoke (88668) | more than 14 years ago | (#1490772)

The Appleseed Mac cluster at UCLA [ucla.edu]

Click here [ucla.edu] to go directly to the project abstract (more details, less graphics.)


--

Re:flamebait (errrm...) (0)

Anonymous Coward | more than 14 years ago | (#1490773)

32 $500 boxes later...

You forgot...

  1. Each of the 32 machines requires a Windows NT Enterprise Edition license ($4500). Each of those requires extra connection licenses (since you only get about 25 connections in the standard pack)...
  2. Windows NT clustering only allows for failover/load distribution clustering over two machines.

You need to define what you need from your cluster (4)

Eivind Eklund (5161) | more than 14 years ago | (#1490774)

First: You need to define what you want out of your cluster -- what kind of applications it is going to run, what sort of environment you want for them, how large a cluster you want to build, whether you want to do 'free cycle stealing', and whether you want high availability. A 'cluster' is much too vague a term for it to be possible to give much advice, or even further references, based on just that.

Second: SCI is orthogonal to the other two technologies -- it is a special hardware network technology (Scalable Coherent Interface), originally made to support distributed shared memory. You may be thinking of the software Dolphin Interconnect Solutions [dolphinics.com] provide with their SCI solutions, but as far as I know, that doesn't directly enter into the same space either. Their web pages certainly do not indicate that it does, and my discussions with (one of?) their Linux developer(s) implied that it contained somewhat more (lock managers etc.), but not in the same space. A technology that competes with SCI, though proprietary, is Myrinet [myri.com]. This has a longer history than SCI, and has been less plagued with problems than SCI (though SCI is supposedly quite stable now).

Third: There are a bunch of other technologies (some cross-platform, some single-platform) that compete in making it easy to build clusters. MOSIX and Beowulf are just two of them. If you give more details of what you want to achieve, I'll dig out references from my collection (made to support the development of FreeBSD-specific clustering improvements, so some types of references may be lacking, but I'll probably be able to come with at least some points to start for any wanted cluster workload.)

Eivind.

Re:Why is this question on Slashdot? (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490775)

On the other hand, administration/management type people who don't know what a Beowulf cluster does or how it is used are often involved in the decision making process for computers at universities/companies. Pro-Windows zealots of that type try to throw NT at everything too. And as we all know, Windows NT is not a drop in replacement for UNIX/Linux for every task either.

What is your problem with someone trying to do a little research into what alternatives are out there? What have the Windows zealots got to hide? If they can't ask questions like this in a forum like Slashdot, where can they ask?

As for making real-world decisions in the future, it is best to have the information on which to make a good decision. You seem to be implying that people should just roll over and go for the 'easy answer'.

Be careful of implementation strategies.. (2)

Matt Bridges (97198) | more than 14 years ago | (#1490776)

One VERY important thing to consider is how the clustering technology is implemented in a given solution. For example, MOSIX does not require any rewrites to your code, except maybe making the program fork off more often so that the processes can become more distributed. Parallel Virtual Machine (PVM), which is one of the most popular methods of implementing Beowulf clusters, is however a code library that must be integrated into each program you want to run on the cluster. So, depending on your (or your users') programming knowledge, you should be careful about what clustering architecture you use.
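To make the contrast concrete, here is the master/worker message-passing pattern that PVM programs are written around, sketched with Python's standard library standing in for the PVM calls. Threads and queues play the roles that pvm_spawn/pvm_send/pvm_recv would play; this illustrates the pattern, not PVM's actual API:

```python
# Master/worker message passing in miniature. In a real PVM program,
# the queue operations would be pvm_send()/pvm_recv() library calls.
import queue
import threading

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:                  # sentinel: no more work
            break
        results.put(item * item)          # "compute" and send back

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()
for n in range(10):                       # master hands out work
    tasks.put(n)
for _ in workers:
    tasks.put(None)                       # one sentinel per worker
for w in workers:
    w.join()

total = sum(results.get() for _ in range(10))
print(total)  # 285, the sum of the squares 0..9
```

Every send and receive is an explicit library call -- which is exactly why existing programs must be modified before they gain anything from this style of cluster.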

Re:Why is this question on Slashdot? (1)

caldroun (52920) | more than 14 years ago | (#1490777)

Admitting that you don't know is the acceptance of knowledge.

Clustering (1)

Hard_Code (49548) | more than 14 years ago | (#1490778)

Well, I can't find the article now, but didn't we have an interview a while ago with a guy (whose name I forget) who was doing some cool Linux clustering... /real/ pull-the-plug-and-it-fails-over clustering? I remember it was a bit different from Beowulf, and it was really cool.

VMS (2)

Anonymous Coward | more than 14 years ago | (#1490779)

Do you want great clustering? Use VMS; nothing beats it for clustering. The UI sucks and it's a pain to admin, but the clustering gives you tons of power. Most clustering uses custom software, and you can just as easily write that custom software for VMS as you could for UNIX.

Re:A few ideas... (4)

Chalst (57653) | more than 14 years ago | (#1490780)

Good post, just one main thing to add. In a cluster system, what you can do is very much constrained by the way you glue the individual nodes together. The 100 Mbit/s throughput of a fast ethernet connection may sound as if it gives you all the connectivity you need, but if a machine sends each 100-bit packet to a different machine, it will slow down to a snail's pace, as it is not very fast at this kind of switching task.

Good routing software can make up for this, as can careful forethought about the network geometry. An ATM network is the best of all worlds, but very expensive... actually, whatever happened to all those claims that ATM routers would become as cheap as water? A last point: look at the Parallel Processing HOWTO [redhat.com].
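The slowdown the parent post describes is easy to see with back-of-the-envelope arithmetic (the per-packet overhead figure below is illustrative, not a measured value):

```python
# For tiny packets, fixed per-packet cost dwarfs serialization time.
link_bps = 100e6                 # fast-ethernet wire speed
payload_bits = 100               # the 100-bit packet from the post
per_packet_overhead_s = 100e-6   # ~100 us switching/software cost (assumed)

serialization_s = payload_bits / link_bps          # 1 microsecond on the wire
effective_bps = payload_bits / (serialization_s + per_packet_overhead_s)
print(int(effective_bps))        # roughly 1 Mbit/s: about 1% of wire speed
```

Under these assumptions the link delivers around one percent of its nominal bandwidth, which is why message size and network geometry matter so much for cluster workloads.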

Shared Disks in Linux with GFS (1)

Anonymous Coward | more than 14 years ago | (#1490781)

Linux is better than Solaris at sharing disks because you can use GFS: www.globalfilesystem.org

Matt O'Keefe

U. of Minnesota

In Search of Clusters (3)

Zootlewurdle (84970) | more than 14 years ago | (#1490782)

I would wholeheartedly recommend that anybody interested in clustering read Greg Pfister's "In Search of Clusters", published by Prentice Hall -- ISBN 0-13-899709-8. It is the seminal work in this area.

Other good resources include:
- IEEE Task Force on Cluster Computing
http://www.dgs.monash.edu.au/~rajkumar/tfcc/index.html
- Linux-HA http://linux-ha.org/
- some general links
http://www.tu-chemnitz.de/informatik/RA/cchp/index.html

There are more clustering products out there than you can shake a stick at, and everybody seems to have a different take on what they mean by a cluster.

Does anyone have any information on what the Linux Cluster Cabal are up to?

Probably the best thought out cluster solutions are OpenVMS Clusters and UnixWare NonStop Clusters.

Zootlewurdle

Re:Depends on your needs. (2)

Col. Klink (retired) (11632) | more than 14 years ago | (#1490783)

> Beowulf is one of a family of parallel programming API tools. Programs must use the API to accomplish parallel programming.

Not quite. A Beowulf class supercomputer is a high-performance network of workstations built from commodity hardware running on a free operating system like Linux.

Beowulf is *not* a parallel programming API and, in fact, there are several common APIs currently used on Beowulfs: the old standard PVM (Parallel Virtual Machine), the up-and-coming MPI (Message-Passing Interface), and the less common AFAPI (Aggregate Function Application Program Interface).

Re:VMS (1)

Anonymous Coward | more than 14 years ago | (#1490784)

Do you want great clustering? Use VMS; nothing beats it for clustering. The UI sucks and it's a pain to admin, but the clustering gives you tons of power. Most clustering uses custom software, and you can just as easily write that custom software for VMS as you could for UNIX.

You are correct that nothing beats OpenVMS for true clustering [digital.com] . VMS was first with clustering, and it is still far ahead of the pack thanks to Galaxy technology [digital.com] . However, I have to disagree with a few of your points.

As far as UI is concerned, it has the same interfaces as *nix - X-Window and CLI, and the DCL CLI is far easier and more intuitive than any *nix shell. Also, I find OpenVMS systems far easier to manage than *nix, due both to its well thought-out design and to the power of clustering. Managing 200 clustered systems is no more difficult than managing two thanks to clustering.

I do concede that this is all rather subjective. It all depends upon what you are used to, and there aren't many people out there who are experienced at OpenVMS system management. I do maintain however that the learning curve for OpenVMS is not nearly as steep as that for *nix.

Of course, with OpenVMS you lose one of the factors the original poster was looking for - low cost. OpenVMS base licences are pricey to begin with, and cluster licences are stratospheric. That's why only the big boys like banks, credit unions, stock exchanges, and semiconductor fabs use OpenVMS. (Most microprocessor fab lines, including Intel's, are controlled by OpenVMS systems.) These are businesses where downtime costs serious money.

I also have a serious beef with people referring to Beowulf and its ilk as "clusters." They are not. Such technologies are examples of massively parallel distributed computing systems, not clusters. For crying out loud, they don't even have a distributed lock manager, or an equivalent to MSCP disk serving (NFS doesn't even remotely come close). And don't even get me started on the 2-node failover crap that Microsoft is calling NT Clustering...

Sincerely,
an OpenVMS System Manager who is severely irritated by all of these pretenders to the throne...

OpenVMS FAQ [digital.com]

Re:How about an Apple cluster (1)

HiredMan (5546) | more than 14 years ago | (#1490785)


On the Apple front I'd also look to the upcoming OS X - due early next year.

While Apple hasn't stated whether the new OS will support it, there have been several "rumors site" reports of OS X supporting NeXT's autoclustering features over existing networks -- perhaps even autoclustering over the Internet. (Depending on data needs, etc., of course...)

If this interests you, I'd look at old docs on NeXT's clustering features, and if they sound appealing, look to see if they're coming in the first release of OS X. As with all clustering issues, YMMV...

=tkk

Try VA? (2)

Anonymous Coward | more than 14 years ago | (#1490786)

I'm kinda surprised I didn't see this as the First Post, but what about giving a call to VA?

They do clusters, and they'll slap together an entire system for you.

Mind you, I haven't tried their systems (umm, clusters are expensive ya know), but it seems to me that going with VA would get you pretty damn close to a turnkey solution.

Give 'em a call. It couldn't hurt, anyway.

Linux clustering is not cheap! (1)

Anonymous Coward | more than 14 years ago | (#1490787)

Sure, you can get Intel boxes for a dime a dozen... sure, Linux is free. Sure, Windows costs money to buy. BUT they both require something that costs a lot: service.

Beowulf clusters are potential security risks if not properly administered and kept patched up to date. Installation and maintenance require *good* sysadmins. What's the average salary of a sysadmin? Makes hardware costs look like peanuts.

I have to believe that your "educational institution" is looking at some serious numerical problem (I certainly doubt that it is for database management...). Look at what you currently have in house for expertise. Then do some homework on MPI and PVM, both standards/frameworks for doing parallel computations. One of those two will fit your needs. There are ports to almost every known OS (haven't seen one for the Amiga, sorry :), and they can use hetero-/homogeneous CPUs, etc.

Coupla caveats (2)

costas (38724) | more than 14 years ago | (#1490788)

First of all, as some other posts said, don't go off building a Beowulf (that name has been overused IMHO) unless you know what you are building it for: clusters are built around the software you're gonna run on them, not the other way around.

If you plan to port MPP applications from a Cray or an Origin 2K, a Beowulf with an MPI port will most likely do what you want. If you are interested in an HA cluster, then we're really not talking Beowulfs; take a look at TurboLinux's TurboCluster distro.

If you want to throw lotsa CPU power at a problem that's not already MPP'd, a port to MOSIX might be worth your while, but investigate cautiously: MOSIX does a good job (I am told) of process migration, but it doesn't migrate sockets yet, so it may effectively double your network traffic -- this may not be a problem if your interprocess comm is minimal, or it might be a show-stopper. Do consider a port to MPI in this case: MPI is an industry standard and it works almost as well on a Cray as it does on a Beowulf.

Network communication is not as big a deal as it used to be: besides SCI, there's Myrinet (with OSS drivers and software too), Gigabit ethernet (also OSS drivers from some companies) and they all more or less work with Linux. Or you can go with the original Beowulf solution and bond Ethernet channels (i.e. make 2 NICs look and feel like 1 to the OS, almost doubling your capacity). It all depends on your application's inter-process communication requirements.

If you do decide on a Beowulf, heed these words: be careful of SMP machines, at least this early in the game. Linux SMP is deficient at best -- hopefully 2.4.x will solve it, but I wouldn't hold my breath. If you decide on an SMP machine, stay away from Xeons, as the extra cost will be useless right now -- because of SMP problems with Linux, you might as well have a regular Pentium in there, or even some Celerons (hey, it'll buy better networking equipment ;-). Also, do not plan to rely on NFS: Linux NFS is spotty when stressed by high-bandwidth processes.

I guess the best advice would be: Don't go spending all your NSF money right away. Get 2-3 machines with some Fast Ethernet, set the thing up, port your software, make sure it works as well as you expected it to, THEN go spend the big $$$ on SCI, more nodes, etc. The biggest advantage of Beowulfs is *Freedom* as in flexibility ;-)...

As the old maps said: "Beware: Monsters Here" ;-)...


engineers never lie; we just approximate the truth.

Beowulf Is Still Stuck on 2.0.x (1)

randombit (87792) | more than 14 years ago | (#1490789)

One thing to keep in mind is that, last I checked, Beowulf clusters can still only be used on 2.0.x kernels, which implies reduced performance, security, and stability compared to a system such as MOSIX, which will run on 2.2.x (or future 2.4.x) kernels. If this has changed, please let me know, but it was accurate as of September, and I haven't seen any new announcements on freshmeat about it, so I'm assuming it hasn't changed.

I admin a small (8 node) cluster, and so far my options are stay at Redhat 5.2/2.0.36 until Beowulf is ported to 2.2 (with all the problems of finding glibc 2.0.x RPMs now, etc), or install Redhat 6.1 and then reinstall 2.0.36. Neither of these seems particularly optimal.

Re:beowolf!!! (0)

Anonymous Coward | more than 14 years ago | (#1490790)

Please be more respectful to our god and savior Linus... ohhh crap... I just jizzed down my pants from saying the name... crap...

Re:Beowulf Is Still Stuck on 2.0.x (1)

ericski (20503) | more than 14 years ago | (#1490791)

Um, I'm not sure what you're talking about. My "cluster" consists of two 2.2 nodes. I d/l'd and compiled PVM 3.4 and it seems to work fine. Our school's 8 node, 16 CPU cluster is only on a 2.0 kernel because of a network driver, not because of any of the other software. IIRC, the plan is upgrade next semester.

Is Billy G your name? (0)

Anonymous Coward | more than 14 years ago | (#1490792)

Do you yahoo?

Beowulf is not stuck on 2.0.x (2)

Mr Donkey (83304) | more than 14 years ago | (#1490793)

No, Beowulf works just fine with 2.2.x kernels.

Lobos2 at NIH is a Beowulf cluster with 100 compute nodes that were recently updated to 2.2.13.

from my understanding... (0)

Anonymous Coward | more than 14 years ago | (#1490794)

(mainly from my moderation limit set to -1), I believe there's only one way to do things, and that's via a beowulf cluster. but only if you're an 3l337 h4x0r and you get f1r57 p057 on \.

$4500 OS on a $500 machine?!!! (0)

Anonymous Coward | more than 14 years ago | (#1490795)

yah, sure bill. I'd rather have 9 more machines with a FREE os.

Three words: NT doesn't scale (0)

Anonymous Coward | more than 14 years ago | (#1490796)

Almost no one tries to build NT clusters. One cluster vendor reported that only about 10% of their clients specify NT. The reason is simple - you are simply throwing your money away, as NT doesn't scale. That alone should be sufficient to make the case against NT clustering.

Condor (1)

foop (30304) | more than 14 years ago | (#1490797)

Also notable is Condor (similar to Mosix).

http://www.cs.wisc.edu/condor/

I suspect you just want lots of general purpose compute power available to many users.

Re:flamebait (a correction) (1)

rancor (18260) | more than 14 years ago | (#1490798)

One of the other posts already pointed out that NT Enterprise Edition only supports 2-node fail-over. However, I believe that [he] is talking about the Windows load balancing service, which used to be "Convoy Cluster Server". This is a pretty slick little piece of software, but it's not 100% transparent.
  1. I believe that it supports up to 16 nodes in a cluster.
  2. It distributes load by sending the next request to whichever machine in the cluster is least loaded.
  3. You can't use things like ASP sessions, because they don't work in a distributed environment.
Jim
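The "least loaded machine gets the next request" policy reduces to a one-line selection. This sketch is purely illustrative of the idea and has nothing to do with the actual WLBS implementation; node names and counts are made up:

```python
# "Least loaded" dispatch: pick the node with the fewest active
# connections, then account for the request we just handed it.
def pick_server(connections):
    return min(connections, key=connections.get)

connections = {"node1": 12, "node2": 3, "node3": 7}
chosen = pick_server(connections)
connections[chosen] += 1
print(chosen)  # node2 -- it had the fewest active connections
```

The transparency caveat in point 3 follows directly: since any node may get the next request, per-node state like an ASP session is invisible to the others.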

Re:Beowulf Is Still Stuck on 2.0.x (0)

Anonymous Coward | more than 14 years ago | (#1490799)

Uh, no.

It all depends on your application. PVM and MPI both work fine with 2.2.

If you're using some whack-ass network interconnect that doesn't have 2.2 support, then maybe you can say "my Beowulf is still stuck on 2.0.x", but surely you can't make a blanket statement like "Beowulf is still stuck on 2.0.x".

It's also important to note that, contrary to somewhat popular belief, people have ported the channel bonding patch (one of the few worthwhile kernel patches that apply to Beowulf class computing) to 2.2 as well.

Really, I'm tired of people saying "Beowulf won't work with this" (like Beowulf is the name of a software package you install -- rpm -ivh beowulf-2.2-i386.rpm) because in reality all they're doing is running a bunch of Linux boxes with pvm or mpi on them. It's not that hard to compile pvm/mpi.

clusters... (1)

ray j (55756) | more than 14 years ago | (#1490800)

Why don't you go to the website for VA Linux which is building clusters. Maybe you can learn something there.

security on a cluster (1)

Psychofreak (17440) | more than 14 years ago | (#1490801)

There is no reason a cluster needs to be connected to the rest of the world. It would actually be advantageous to have the cluster isolated from the world, to prevent it from talking to the internet/intranet and using its time in a meaningless manner because a packet got misrouted. This can be accomplished by pulling the plug to the world, or by having the plug to the world run through a single high-security firewall, that is, if outside access is required.

Now the problem is physical security, which for all but the determined, is trivial: Lock the door!

Re:VMS (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490802)

DCL CLI is far easier and more intuitive than any *nix shell

Woof. I totally disagree on that one. I always hated DCL and found navigating the VMS file systems and directory structure maddening. The VMS file versioning was also inconvenient. I really hated VMS's error messages, which were always nearly indecipherable and full of %%%% signs. Blech.

Of course, with OpenVMS you lose one of the factors the original poster was looking for - low cost.

That is an understatement. Commercial UNIX looks like a bargain compared to OpenVMS. Not to mention the cost of the support and maintenance contracts and the cost of other software for OpenVMS.

Re:Coupla caveats (1)

Psychofreak (17440) | more than 14 years ago | (#1490803)

SMP isn't too bad in the 2.2.x kernel. I am running it on my ppro system, 2 processors, 128MB ram. It scales very well with 2 processors, especially compared to the 2.0.x kernel. I almost didn't recognize my machine when I upgraded!

I will agree that Celerons are a better way to go. I have built a few Celeron computers for less than $500 for friends (their money) that kick the pants off of everything I own. My system, a dual PPro 200/256k with 128MB RAM, cost just over $1000. My friend's system, a Celeron 400 with 64MB RAM, is actually slightly faster in Linux. Bus speeds are the same, so it's the chip, especially given the deficiency in RAM on her machine.

Now look at another friend of mine. His machine is a Pentium II 400, on the 100MHz bus, with 128MB RAM. This machine performs negligibly faster than the Celeron machine.

My apologies, I don't have numbers for you at this time. I probably won't be able to get numbers in a reasonable amount of time either.

Re:Condor (1)

MisterBad (40316) | more than 14 years ago | (#1490804)

This looks good, but it's not Open Source.

So, screw it.

Re:Depends on the purpose (2)

maney (121165) | more than 14 years ago | (#1490805)

I would definitely agree with this. The Sun Enterprise Cluster software is the way to go if you are looking for 2 to 4 node clusters for HA application failover or for something like a 2 node parallel server for a database. I am fairly certain that it will run on Solaris x86 too, though if you really need clustering (and the inherent concept of "no single points of failure"), you probably should be looking at the more dependable, and redundant, Sparc platforms.

However, if you are looking for a very large parallel processing "machine" reminiscent of the old Sequent machines for distributed parallel applications, it seems to me that Beowulf would be a better solution.

It really does depend on what you are looking for.

-Solaris/Sun Cluster Administrator

Re:Three words: NT doesn't scale (1)

SpamapS (70953) | more than 14 years ago | (#1490806)

Almost no one tries to build NT clusters. One cluster vendor reported that only about 10% of their clients specify NT. The reason is simple - you are simply throwing your money away, as NT doesn't scale. That alone should be sufficient to make the case against NT clustering.

I'm as anti-NT as the next Slashdotter, but let's not go as low as MS does. That comment is pure FUD. NT will definitely "scale". The Mindcraft benchmarks, with all their bias, effectively showed that.

The problem is that while the first box that crashed is still re-booting, the fail-over box crashes as well. Even their HA clusters are only 99% available... pathetic.

Re:Why is this question on Slashdot? (0)

Anonymous Coward | more than 14 years ago | (#1490807)

Admitting that you don't know is the acceptance of knowledge.

True, but from the tone of the original question it really doesn't appear that this person should be responsible for making the decision. He's acting like the Slashdot poster child "they want to use NT but I know Linux is better".

And why hasn't anyone mentioned PVM? It works to cluster/parallelize heterogeneous networks (that's mixed Unix/Win32 for the less educated among you). http://www.epm.ornl.gov/pvm/pvm_home.html [ornl.gov]

Re:Beowulf Is Still Stuck on 2.0.x (1)

randombit (87792) | more than 14 years ago | (#1490808)

OK, thanks for the replies. I had seen this on NASA's Beowulf Project web page somewhere, so I assumed that was that. I'll check it out.

Re:Beowulf Is Still Stuck on 2.0.x (0)

Anonymous Coward | more than 14 years ago | (#1490809)

He said Beowulf, you said you're using PVM. Did you read the post?

It will be a cold day in hell before a positive Microsoft or a negative Linux article appears on Slashdot.

Re:Three words: NT doesn't scale (0)

Anonymous Coward | more than 14 years ago | (#1490810)

Bullshit. Do you even know what scaling is or means? Or are you simply repeating the drivel you've read on Slashdot?

http://www.unisys.com/events/comdex99/presentations/uis-ms.asp [unisys.com] "Aberdeen finds that Unisys' new Windows 2000 solution clearly and convincingly demonstrates Microsoft Windows 2000's ability to handle the scalability, availability, manageability, and security needs of most - if not all - of today's most demanding enterprise-data-center-type IS environments." In addition to this NT4 has been handling some of the largest web sites on the Internet for quite some time, can Unix do the same thing? Of course, but saying that NT doesn't scale is just showing your ignorance.

Links! (0)

Anonymous Coward | more than 14 years ago | (#1490811)

Since they weren't mentioned, I thought I'd mention them here: Mosix [huji.ac.il] , Beowulf [beowulf.org] , and I couldn't find SCI.

Beowulf Questions (1)

Anonymous Coward | more than 14 years ago | (#1490812)

We [xtreme-machines.com] build turn-key Beowulf systems for a living. You may want to consult:

Beowulf.org [beowulf.org]

Beowulf FAQ [dnaco.net]

Beowulf Underground [beowulf-underground.org]

Beowulf Quick Start [xtreme-machines.com]

In addition, you may want to contact us directly about helping you convince your management that turn-key and supportable Linux Beowulf Clusters are available and do useful things. Take a look at our success stories [xtreme-machines.com] .

Doug Eadline,
deadline@plogic.com
Paralogic, Inc. [plogic.com]

Novell does Clusters! (0)

Anonymous Coward | more than 14 years ago | (#1490813)

Novell has a very nice clustering solution for NetWare 5. Check out http://www.novell.com/products/clusters/ [novell.com]

Re:Coupla caveats (0)

Anonymous Coward | more than 14 years ago | (#1490814)

costas is correct about MOSIX. It doesn't migrate sockets yet. I'm 95% sure about this and 100% sure you don't want to double your bandwidth.

Re:Novell does Clusters! (0)

Anonymous Coward | more than 14 years ago | (#1490815)

Novell is also very expensive.

Beowulf Administration? (0)

Anonymous Coward | more than 14 years ago | (#1490816)

Check this out, someone actually is offering a course on Beowulf Administration [xtreme-machines.com] .

what about... (0)

Anonymous Coward | more than 14 years ago | (#1490817)

G4 clusters.... =)
what's that? clusters with supercomputers? count me in!

Novell Cluster (1)

Ropati (111673) | more than 14 years ago | (#1490818)

As all the other posts note, the term cluster is too vague. If your plant needs clustering for High Availability and you're using Netware for file and print sharing, then you might want to check out Novell's Netware Clustering Services.

The Novell cluster works via NDS. Novell has designed a new directory object called a cluster. Servers in the cluster are mapped via NDS. The cluster object runs on any one of the servers in the cluster.

Data is shared via a Fibre Channel SAN. The cluster knows who has which LUNs mounted, what applications are running, and where they are in cache.

A LAN link is required for heartbeats and there is a special white board in the shared storage to monitor server stats.

If one of the servers goes down, then the cluster migrates the applications that were running on that server to the other servers in the system. The whole process takes less than a minute. NCS also has some elaborate algorithms to decide whether a server is down, and applies a poison pill to ensure processes are not duplicated.

It is not for the faint of heart to set up but once it is running, the HA is great. If, via the Novell management software, you take the server down, the applications migrate without a hitch. Maintenance on the machines (new NIC cards, drives, or processors) can be accomplished without any downtime. Backups still require server processing but they can be done without impinging on the LAN. The heartbeat also adds very little overhead to the LAN.

MOSIX vs. Beowulf (0)

Anonymous Coward | more than 14 years ago | (#1490819)

MOSIX is implemented in the Linux kernel. It uses the most up-to-date information about the availability of cluster-wide resources, e.g. CPU loads or free memory. Thus if you prefer to use PVM/MPI, then after the initial assignment of the processes to machines (by PVM/MPI), MOSIX will migrate processes to the best available machines during execution. Beowulf uses PVM/MPI for initial (static) assignment of processes, but is unable to take advantage of new resources, e.g. a faster machine, that might become available during the execution of your processes. Bottom line: if you run MPP mode (one process per node) and all your machines are the same speed, then you'll get the same performance with MOSIX or Beowulf. However, if you have a time-sharing system, or machines with different speeds (or different amounts of memory), MOSIX will take this into account and thus will outperform Beowulf.

Re:Why is this question on Slashdot? (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490820)

they want to use NT but I know Linux is better

Perhaps it is more like "They want to use NT, but I don't want to get stuck supporting it". The other question is, should pointy-haired bosses who won't be the ones using the machines be the ones making the decisions on platforms? I think the guy asking the question is a sysadmin. From my experience, he should certainly have some input on what platform is chosen, since it will directly affect his job. Contrary to how Microsoft markets it, in my experience, NT requires not only more service and administration because it is less reliable, but it is less convenient and more complex to administer, especially as the number of servers increases. GUI based admin tools really become more of a hindrance than an advantage when you need to perform the same functions on dozens or hundreds of machines, and you start wishing for a nice, fast command line interface and/or scripting alternatives.

a ready cluster (1)

chguy (96364) | more than 14 years ago | (#1490821)

Please go to see www.turbo-linux.com [turbo-linux.com] . They have set up many systems. They have the lead in Japan.

Shared Memory vs. Message-passing (3)

jclip (113040) | more than 14 years ago | (#1490822)

(background: I've done a lot of C coding using MPI on large Beowulf clusters at Caltech. I've implemented the same codes on Sun and SGI shared memory machines as well as DEC and RS/6000 clusters.)

If you're looking to cluster for high performance, you need to decide which HPC paradigm you're going to go with and choose your clustering based on that.

Distributed shared memory (DSM) is great on the programmer side. You've got a big steaming chunk of memory shared among processors, and a bunch of parallel threads/processes (depending on OS) acting on that chunk. DSM makes a lot of sense for database servers, and is the prevalent HPC solution among the big server companies (Sun, SGI, etc.) since multithreaded code runs dandy without any modifications. MOSIX implements DSM.

The downside: vast memory bandwidth required for sharing and high overhead. In an educational environment (and IMHO), DSM is a Very Bad Thing, since programming DSM teaches you nothing about actually using parallelism--it's just like working in any other multithreaded environment.

Students are better served by learning on a message-passing system, which is what Beowulf clusters are. You have a bunch of computers and a way to make them talk to one another (PVM or MPI)--"now implement some algorithms!" MP machines are [given equal-quality implementations, a big given] generally faster and more scalable than DSM machines, as well as being more "pure". Optimizing DSM programs is much easier if you have MP experience.

Downside: MP is a pain to program and even more of a pain to debug. But students could use more suffering, right? Language support is a little iffier for MP, too, with Fortran and C being prevalent.

Here is a little guide to Beowulf systems. (2)

exa (27197) | more than 14 years ago | (#1490823)

You should be first reading the Beowulf FAQ. (There are links in prev. posts), but my take for the /. readers here.

First of all, distributed/parallel computing is not an area in which there is consensus on hardware/software systems. There is, though, an accelerating trend towards building supercomputers comprised of clusters of commodity components. The fastest computer on Earth, which last I checked was ASCI Red (and I hear it's gotten an upgrade!), is built as such. That is, vector computers and ultra-expensive parallel computers are being phased out in favor of cluster systems.

A cluster is basically a number of computers connected through a decent interconnection network. On each compute node a traditional OS runs. On top of the OS, a software interface that implements either a message-passing environment or a shared-memory environment sits. In some cases, it may be desirable to modify the OS itself for a better single-system image (such as MOSIX patches). Thus, a cluster is nothing but a number of connected machines. With proper software, it might be possible to run huge clusters on the internet (Check the Globus project!)

In the design of this system, a couple of parameters must obviously be determined.
1) Number of compute nodes
2) Processor/Memory configuration of each compute node.
3) The interconnection network:
a) Network interface of each compute node
b) The switch that is used to connect each machine.

An incorrect estimation of these parameters may give rise to a very sub-optimal hardware configuration. That is, the network must be fast enough to account for the messages being sent, the memory must be JUST large enough to support the granularity of processing, etc. I advise you to read some introductory text on parallel programming before embarking on a cluster effort.

Notice that there is no SINGLE piece of software that will magically parallelize and distribute your applications gracefully. You will find that explicitly parallel or distributed applications will run much more efficiently. While you can get a global process implementation, or even shared-mem implementations with some software, the real "speedup" is going to be observed for explicitly parallel programs, for instance linear algebra libraries designed to run on message passing architectures.

Finally, let me give you a list of the components we're in the process of acquiring.

32 PII-450, 128 MByte compute nodes with 3COM Fast Etherlink 100Base-TX
1 master node, a plain PIII-450, Gigabit ethernet and another NIC to connect to internet, some megs of disk
1 devel workstation, plain PIII-450...
3COM SuperStack II 3900 36-port 100Base-TX, 1 1000Base-SX managed switch
32-port (hopefully) multi-port serial board (you use this for diagnostics)

The software will simply be the stable Debian release; not much software will run on the compute nodes. Of course the lam package will be resident, since it is a pretty good MPI implementation. The server will be uplinked to the switch with 1000Base-SX so that it can act as a synchronizing source (you know, Beowulf master node, file server, etc.). The multi-port serial board provides a shell over a serial cable to each compute node, which gives you a good chance of repair when a node goes down.

Keep clustering,


Re:VMS (1)

BandSaw (104086) | more than 14 years ago | (#1490824)

I always hated DCL and found navigating the VMS file systems and directory structure
maddening.

I don't mean to flame you, and my reply is based on 7 years of working at DEC in a non-software engineering job. Plus, I took computer programming when the computer lab at MIT used a VAX 11/780 :^)

I found DCL to be very consistent in the format of all commands. If you want to see a value or setting, you use SHOW; if you want to set one, you use SET. If you want to find the file foo in all of your subdirectories, you say DIR [...]foo.txt. If you want to move to a specific directory, you use SET DEF [whereiamnow.whereiwanttobe]

The documentation is excellent, even if it spans something like thirteen 3-inch-thick 3-ring binders.

The VMS file versioning was also inconvenient

Why? whenever you save a file, its version number goes up by 1. So when you say SEDT LOGIN.COM you by default open the most recent version of LOGIN.COM, and when you save your changes a new file is written called LOGIN.COM;2

You don't need to worry about the version # unless you want to look at your old versions (all are saved by default).

Oh, and if for some reason you unplug your MicroVAX accidentally while editing the file (the only way a non-sysadmin user can make it crash), you can recover from the journal file which is created by default, with the command EDT/RECOVER LOGIN.COM; it has captured all the keystrokes of your editing session, and will redo them at warp speed on the screen while you watch.



I really hated VMS's error messages, which were always nearly
indecipherable and full of %%%% signs. Blech.

I found most of these self-explanatory, but they can be looked up in the documentation

Sometimes I wonder, with a bemused grin: What if Ken Olsen had said "The future of computing is cheap, cheap, cheap hardware with a free operating system. Design a cheap vax for home users. Bypass and ignore the DEC standards which add millions of dollars of costs to our hardware. Don't send this product to the Maynard testing lab for 10g operation shock and vibration, don't worry about the noise levels, and don't test it at 50 deg. C. Oh, and put a version of VMS on it which is included free with the Hardware"

Oh I know! I'd be living in the Seychelles on the proceeds of my DEC stock ;^)

Re:Linux clustering is not cheap! (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490825)

Sure you can get Intel boxes for a dime a dozen...

Hardware costs are the same or slightly lower for Linux, because Linux has lower hardware requirements.

sure linux is free.

Linux development tools are also much cheaper than those for Windows, and this is a direct issue because the kind of apps that are run on a Beowulf type machine are typically homegrown.

Sure Windows costs money to buy.

Windows costs a lot more than you might think by the time you add in all the add-on software and development tools you would need to build a cluster and development environment. We are talking tens of thousands of dollars difference for a few dozen nodes.

BUT they both require something that costs a lot: service.

And they both require it. That is true of any type of computer system. Nothing I've seen would tell me that there is any reason to expect that Windows would cost less for service, support and administration than Linux, in fact from what I've seen, and despite Microsoft's marketing, it is the opposite.

Beowulf clusters are potential security risks if not properly administered and kept patched up to date.

The same thing is true of Windows boxes. The same thing is true of any type of system. Actually most Beowulf boxes are hidden away behind firewalls and not accessible to every Joe Random student and outsider, so you are overstating the relative security risk compared to any other computer system.

Installation and maintenance require *good* sysadmins. What's the average salary of a sysadmin?

It's not insignificant, but it is cheaper around most major universities (especially since they have the advantages of indentured servants... err... coop and graduate students and generally depressed markets for tech staff). And from what I've seen in the university environments, it is a lot easier to find people with skills in UNIX/Linux administration than it is to find MCSEs. Also in my experience UNIX/Linux not only requires less administration work because it is more reliable, it is easier for a smaller number of admins to administer a larger number of *nix boxes than Windows boxes.

Look at what you currently have in house for expertise.

It sounds like this guy is one of the in-house sysadmins. Most universities have more in-house expertise in UNIX/Linux for large scale implementations, which is one of the big reasons that all of the big research orgs are using Linux for their Beowulf clusters.

Check out Appleseed. (2)

-cman- (94138) | more than 14 years ago | (#1490826)

Just throwing out a new AppleSeed thread because I think it deserves it. I wrote an article on the clustering project at the UCLA Physics department nearly a year ago. They had achieved spectacular results using 300MHz beige G3s and 100BaseT. Very simple setup using off-the-shelf hardware. I do not know if they have looked into using FireWire or Gigabit Ethernet and/or the new G4s yet. But I would expect that the performance from an 8- or 16-box cluster of G4s with Gigabit Ethernet would pretty much blow away a Beowulf cluster in both the performance and price categories. As for NT... don't make me laugh. Doctor Viktor Decyk is the project coordinator and would be glad to speak to you, I am sure. The project website has been posted in a previous thread.

Connor W. Anderson
IT Manager
Department of Radiology
University of Chicago

You missed one KAOS/PAPERS (2)

Grey (14278) | more than 14 years ago | (#1490827)

This is different from a Beowulf cluster in that you have two networks going: one low-latency/low-bandwidth and one high-latency/high-bandwidth. This is what Avalon [lanl.gov] did, though with an out-of-the-box parallel port network, rather than a parallel port network optimized for message-passing parallel processing.

Depending on what you need the cluster for, you should adapt the network to your clustering technology.

KAOS [uky.edu]
PAPERS [purdue.edu]

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490828)

DCL CLI is far easier and more intuitive than any *nix shell
Woof. I totally disagree on that one. I always hated DCL and found navigating the VMS file systems and directory structure maddening. The VMS file versioning was also inconvenient. I really hated VMS's error messages, which were always nearly indecipherable and full of %%%% signs. Blech.

I agree that file system navigation is cumbersome under DCL, but you are dead wrong about file versioning. I regard this as the most useful OS feature ever invented. I truly miss it when using other operating systems.

Error messages are also very descriptive. After getting an error, simply type HELP/MESSAGE to receive a full, plain English explanation of the error and what action to take to correct it (OpenVMS v6.2 or later, I believe).

And there is only one % in any DCL error message:
$ dir/fill
%DCL-W-IVQUAL, unrecognized qualifier - check validity, spelling, and placement
\FILL\

Of course, with OpenVMS you lose one of the factors the original poster was looking for - low cost.
That is an understatement. Commercial UNIX looks like a bargain compared to OpenVMS. Not to mention the cost of the support and maintenance contracts and the cost of other software for OpenVMS.

Absolutely no argument here. DEC, and now Compaq, completely price OpenVMS out of small and medium-sized shops (the sole exception being the free Hobbyist License).

ALINKA releases "instant beowulf" software GPL'd (2)

Brinx (29583) | more than 14 years ago | (#1490829)

My questions are:
1) Have I missed any other serious competitor in the cluster field?

You sure want to try the newly released GPL program ALINKA LCM [alinka.com] to do the management and configuration of Linux beowulf-type clusters.
Once installed, the software can automatically set up a beowulf cluster from the network (with or without using the hard disks) within 2 minutes. With this software, it is dead easy to build an "instant beowulf" cluster...
The current version is 1.1.3 and can be considered a beta release, although some sites use ALINKA LCM v1.1.3 in production. If you wish to know more about ALINKA LCM, you can read the on-line documentation here [alinka.com] . ALINKA has provided software tools for commodity clusters running Linux since August 1999. Customers of ALINKA include the French CEA (Atomic Energy Commission) and public research laboratories.
The ALINKA company provides commercial support for ALINKA LCM and also sells a GUI for ALINKA LCM, called ALINKA RAISIN [alinka.com] , running within a web browser.
You can check http://www.alinka.com [alinka.com] for more information on this new killer software !

Re:Coupla caveats (1)

costas (38724) | more than 14 years ago | (#1490830)

Maybe I should clarify what I meant by bad SMP performance: of course, an SMP machine will be faster than a single-CPU machine, but: a) you're not gonna get anywhere near (n-1)*100% (n = number of processors) speed increase. b) Not all utilities, helper progs, etc. know about or take into consideration SMP. I could live with these problems...

But the real kicker is this: no CPU affinity. I.e., the kernel doesn't stick a job to one CPU and let it crunch away there; instead, it dynamically load-balances all jobs around all available CPUs. This may not be too bad on a single-user desktop machine, but for a Beowulf cluster it is horrendous because you lose all the advantages of the CPU cache. Especially if, like yours truly, you are working with Xeons. They might as well be Pentiums, or for that matter Celerons.

Of course, you can always wait for 2.4.x or just use single-CPU machines, if you have the space, air conditioning and patience ;-)...

engineers never lie; we just approximate the truth.

Now all i wanna try is.. (0)

yannz (84387) | more than 14 years ago | (#1490831)

quake on a really beefy cluster :D
or maybe pingus?


Imagine the droolpits.

Re:Three words: NT doesn't scale (0)

Anonymous Coward | more than 14 years ago | (#1490832)

Um, before you guys flame this guy...

DUH, Wolfpack (NT's clustering) is only a failover solution, if one machine dies, the other takes over. It does not do processing on multiple machines, or do parallel processing.

What he meant was that NT doesn't scale well with multiple machines, not with one machine with SMP.

IMHO if you want a REAL cluster, go with Tru64 or VMS. Beowulf is good for MPP, and most other clusters are just there for failover and not for scalability.

Re:VMS (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490833)

but you are dead wrong about file versioning. I regard this as the most useful OS feature ever invented. I truly miss it when using other operating systems.

I don't like versioning that happens so automatically. Having version control available when I want it is a good thing, having it forced on me all the time can be annoying. It is probably a matter of personal preference, but I just didn't like the way that it was implemented in VMS. Something like RCS/CVS/PVCS is more what I am comfortable with. To add insult to injury, it tended to cause directories not only to fill up with extraneous garbage that required time to clean out, it also tended to screw people over on disk quotas if they were negligent in keeping things tidy.

routing? (2)

halbritt (30189) | more than 14 years ago | (#1490834)

ATM is not necessarily the best connectivity solution for this particular application, nor is routing. ATM is a cell-based OSI layer 2 technology that breaks traffic down into 53-byte cells. On an OC-3 (155Mbps) one can incur quite a bit of overhead for LAN-based traffic, so you won't necessarily see your full 155Mbps of traffic. ATM works well for native ATM devices that require real QoS and are able to manage the setup and tear-down of the various types of circuits that are available in an ATM cloud. IP in all its forms has to be adapted to ATM in one of a couple of ways, LAN emulation being one of the most popular. Setting up permanent point-to-point PVCs is another way to do it.

One of the qualitative differences between clustering and MP is that in a clustering environment one has to be able to write applications that can be made parallel and are capable of taking advantage of the massive amounts of CPU time available while not suffering from the relatively small amounts of memory bandwidth available. Most ACs don't understand this, so we get comments like "I want to run quake on a Beowulf". It follows that increasing the amount of bandwidth between machines will make the clustering environment less restrictive from a memory bandwidth point-of-view. One never wants to "route" in a clustered environment. Devices that make forwarding decisions at OSI Layer 3 are all inherently slower than devices that make forwarding decisions at OSI Layer 2. There are L3 switches that forward packets at wire speed, but these are expensive and pointless to use in this type of environment, as it's not needed. Basically, one would want to put their cluster into a single subnet (and vlan) in a completely switched environment and endeavour to minimize broadcast traffic. At a minimum I would recommend a completely switched 100Mbps environment for a low-cost cluster.

It should be noted, though, that not all 100BaseTX switches are non-blocking. I wouldn't consider using anything that isn't. If one requires additional bandwidth for a particular type of application there are a couple of other options. Gigabit Ethernet will provide approximately 3 times the bandwidth of Fast Ethernet in a Linux machine, mainly being limited by the throughput of the stack. One also may want to consider HIPPI if the need is there. HIPPI is very, very expensive, and to the best of my knowledge only available from a handful of vendors, one of those being Essential/ODS (my previous employer). I believe that there is a Linux driver for the Essential/ODS HIPPI NIC, though I'm not certain what the throughput is. HIPPI is being used by the big boys (Sandia National Labs, NASA Ames, Lawrence Livermore), mostly in SGI environments. Beyond HIPPI, there is something called GSN (Gigabyte System Network), a 6.4Gbps environment being adopted as the next level of bandwidth by both ODS and SGI. ODS filled the first order for GSN switches sometime in January of 1999, I believe. I'm not even sure there is a NIC available for the type of hardware that's supported by Linux. For info on HIPPI and GSN check out ODS' web site [ods.com] . I would recommend HIPPI, then Gigabit Ethernet, for a high-performance cluster. For gigabit or fast ethernet, consider the Lanblazer from ODS and the CajunSwitch 550 (the same switch; one is OEMed from the other). In addition, there are products from Extreme Networks, Fore Systems (Berkeley Networks Gig-E stuff), HP, and I'm sure there are a few others. Most of the stuff from the top 3 (Cisco, Nortel, 3Com) is not non-blocking; one should do the research before making a purchase.

64 bit woes (1)

trancemonkey (106873) | more than 14 years ago | (#1490835)

So another problem is that most of these clusters are 32 bit. I am constantly banging my head against that fact when I try to port codes from big iron to clusters. Makes me wanna kill myself sometimes...trying to figure out how to get a multi-thousand line 64-bit fortran code to run on a 32-bit machine with half-assed compilers.

Re:A few ideas... (0)

Anonymous Coward | more than 14 years ago | (#1490836)

Last year we set up a 16-node hypercube. We had no powerful servers (yeah, disk caching) and no expensive networking hardware. Each machine had 5 NIC cards in it. Who needs that extra IDE IRQ anyway? 4 100BaseT PCI and 1 ISA 10BaseT to the outside world. Each machine cost about $350 apiece. We got very good speedup out of this configuration. But like everyone else said, it really depends on what ya want to do with it.

Re:VMS (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490837)

Plus, I took computer programming when the computer lab at MIT used a VAX 11/780 :^)

When I started college they used a 5 node cluster (four 11/780's and an 11/785) for most of the undergraduate class work at the university I attended. The Com Sci department had their own VAX 11/780 running 4.2 BSD (later upgraded to 4.3), and several other departments had their own VAXen running BSD. I later went to work for one of those departments.

The documentation is excellent, even if it spans something like 13 3" thick 3ring binders

Much to the dismay of the DEC field service rep, when I worked at a university for a department where we ran BSD, we bought a new VAX, and promptly file-13'd the VMS doc and recycled all those bright orange 3-ring binders for two whole sets of BSD docs.

Why? whenever you save a file, its version number goes up by 1.

I don't like versioning that doesn't happen when I want it to, to the files I want to keep under version control. I prefer an approach like RCS/CVS/PVCS.

You don't need to worry about the version # unless you want to look at your old versions (all are saved by default).

It was the 'all are saved by default' that many people ran afoul of, as it chewed up all their disk quota and required constant vigilance to keep things tidy.

Oh, and if for some reason you unplug your microvax accidentaly while editing the file, (the only way a non-sysadmin user can make it crash) you can recover from the journal file which is created, by default, with the command EDT/RECOVER LOGIN.COM, which has captured all the keystrokes of you editing session, and will redo them at warp speed on the screen while you watch.

EDT... ack, that gives me the heebie-jeebies. Never did like that or TPU (the other display editor). Of course it could be worse, if you didn't get on a DEC terminal (a lot of the labs had ADM3a's, ADM5's or TVI910's), you were stuck with using SOS (the horror!). On the other hand if you were using one of the BSD machines, you could run vi just fine even on the ADM's which were dumb as rocks. BTW, vi also has a recover mode.

What if Ken Olsen had said

Instead he said "UNIX is snake oil", and history played out the way it did. Ah well, DEC built pretty nice hardware in those days, but most of us where I worked/went to school were more happy running BSD than VMS, especially because DEC's licensing and maintenance/support contracts for VMS made running VMS cost prohibitive. Today I can do everything and more on my cheapo home Linux box than I could do back then on $100K VAXen.

Re:64 bit woes (1)

jclip (113040) | more than 14 years ago | (#1490838)

gcc 2.8.1 actually has okay 64-bit support under Linux, in my experience. Not my #1-most-favoritest platform for such things, but usable at least in the C-sphere.

An Alpha cluster is 64-bit-happy, or maybe an SGI cluster (though SGI's 64-bit support was horrid last I tried).

Re:Depends on your needs. (1)

bendawg (72695) | more than 14 years ago | (#1490839)

SCI is actually a CC-NUMA system.
You should be able to use any SMP-enabled (threaded) program as long as you recompile it.
That's the beauty of CC-NUMA, no rewriting necessary.

Solaris 8 (1)

bendawg (72695) | more than 14 years ago | (#1490840)

The next Suns are going to be cluster machines. Solaris 8 will support a single operating system running over an internal cluster.

Re:A few ideas... (0)

Anonymous Coward | more than 14 years ago | (#1490841)

A cluster of old computers seems cheaper, but will consume a lot more power (and generate a lot more heat) than a cluster with the same MIPS rating running on new computers. Keep this in mind if you will be paying the electricity bill.

G4s in clusters... (2)

Troy Baer (1395) | more than 14 years ago | (#1490842)

But, I would expect that the performance from an 8 or 16 box cluster of G4's with Gigabit Ethernet would pretty much blow away a Beowulf cluster in both the performance and price categories.

I seriously doubt that. To use the AltiVec part of the G4 (which is what gives its absurdly high peak performance), you need to be either hand-writing PPC/AltiVec assembly code or using a vectorizing PPC/AltiVec compiler, and I have not heard of *any* of the latter. Also, the memory system on the G4 isn't much (if any) better than that on a standard Pentium III, which frankly sucks (~300MB/s). A Beowulf cluster comprised of Alphas with a Myrinet network will likely wipe the floor with a similarly sized G4 cluster with Gigabit Ethernet, and will cost about as much -- large GigE switches are expensive.

8 DS10 1Us (@$3k) + 8 Myrinet cards (@$1.4k) + 1 16-port Myrinet switch (@$4k) = $39.2k

8 G4s (@$2.5k) + 8 Gigabit Ethernet cards (@$0.7k) + 1 8-port Gigabit Ethernet switch (@$15k) = $40.6k

--Troy

Re:Novell does Clusters! (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490843)

Novell is also very expensive.

Compared to Linux, perhaps, but not that much worse than NT...

Re:What are you gonna use it for? (0)

Anonymous Coward | more than 14 years ago | (#1490844)

Exactly!

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490845)

$ set file/version_limit=N filename

Where N is the number of revisions that you want to keep around. Of course this does not deal with the 32,767 version limit problem, but one can write scripts to deal with that easily enough.

VMS is king when it comes to clustering, hands down: load balancing and fault tolerance all built in. Compaq would do very well to release OpenVMS into the public domain.

Re:Three words: NT doesn't scale (1)

DJerman (12424) | more than 14 years ago | (#1490846)

Um... yeah. NT doesn't scale by default, as in processes must take advantage of the NT Cluster Services API -and- do other funky things to scale. Off the top of my head, the only things that will run faster on two machines are Oracle and IIS (feel free to add to the list, these are what I know). IIS will run on up to two NT servers; Oracle will run on more if you can get the shared-disk hardware configured :-). Exchange won't scale :-). Before MS killed Alpha-NT, Oracle and Digital had (I believe) a 6-way scaling demonstration cluster. I'm not aware of any more-than-2-way NT clusters in production, but what do I know.

Setting up an NT cluster is a non-trivial exercise (we did it for our HA services, but we backed down as the clustering sw caused as many crashes as it saved us from) (yes, we used an MCSE). It's about as hard as setting up any other clustered system :-) But monkeys like me can "run" it with a little expert guidance.

ObUnix: Of course, 8 way Unix and VMS clusters are old hat. Linux is still developing into the HA and scalability market, but it has roots to grow on.

Re:VMS (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490847)

A partial solution (of course not one they told anyone about at the U), but I still prefer a system that I can check files in only when I want. Having it auto-delete versions means that if I want to save the nth version and not have it purged I have to copy it to some other filename. The auto file versioning also doesn't do much to help with multiple developer versioning and synchronization that checkin/checkout systems deal with. If you want to keep a large number of versions around, a system that stores deltas instead of whole files is much more disk space efficient (not that big a deal these days, but back in the 80's when I used VMS it was a huge deal).

Re:Why is this question on Slashdot? (1)

ckotso (121347) | more than 14 years ago | (#1490848)

First of all, forgive my not-so-good english but it is not my mother language (I live outside the US & UK).
So: It is true that I am not going to make any decision. I am a computer administrator, no darn executive. What I _CAN_ do though, is present my superiors with facts. This is what I am trying to find and why I asked slashdot users on the first place. As for what I know, yes I do know Linux is better than NT. I have a long background of Unix (or Unix-like, for that matter) based computers administration, and I consider it critical that this University's users have access to a high availability super computer. If this makes me a poster child, I accept this attribution. What I also accept is that my question was not clear as to the needs. What we need is a fail-proof number-cruncher. We need a system that can carry long computations (which could mean a beowulf cluster) and also be highly available (which I am not yet sure what it means). We have already setup a high-availability SUN cluster with 2 SUN250, but that will act as a web & mail server only, plus its cost is somewhat big. What I want is to built a system for the users to log in, and run their simulations & stuff. This should definitely be up on a 24x7 basis.
Now as to PVM, with which I do have experience: it cannot be considered a cluster solution, because it is not an integrated solution. PVM is just a parallel programming API, like MPI. PVM can be used with most cluster systems (Beowulf, for example), but that's all. What we need is more than parallel computing, period.

The Unisys stuff is a joke (0)

Anonymous Coward | more than 14 years ago | (#1490849)

Ahh, that's your BEST example of NT scalability? ROFL! Congrats to Unisys - they are now where Sun was, oh, about 8 years ago in terms of SMP. What utter FUD. What they've proven is that if you throw enough monkeys at a problem, they can come up with a solution. The best part, though, was the crapola about "security". LOL. The one thing Microsoft has proven is that Windows is the best backdoor into a computer that was ever written. It's a pity no one at Unisys had the brains to design a fast system. It might actually perform well if they had used Linux.

LOL (0)

Anonymous Coward | more than 14 years ago | (#1490850)

So what you're saying is that maybe, just maybe, NT will scale if you put a lot of work into it. Up to 2 processors. Maybe a few more. Beowulfs typically run 16-32+ nodes. Putting one together is fairly straightforward.

Re:Coupla caveats (1)

Psychofreak (17440) | more than 14 years ago | (#1490851)

I see! Well, I think I am getting ~(n-1)*96% with the 2.2.x (couldn't bring myself to use 95, too M$ for me at the moment). With the 2.0.x kernel, I was getting probably ~(n-1)*80% Much MUCH better than NT with about ~(n-1)*30%. (n=2)

I think my performance does in fact vary depending on what I do. GIMP is SLOW by comparison, but compiling is rather fast. This is compared to the celeron machine, and is more a feel than numbers.

I can understand that if the kernel load-balances by the instant, then you will have no benefit from cache. That would explain GIMP and other graphical stuff being slow. I can also see how having one processor complete a task instead of splitting the task can allow a large cache to be used better. Heck, a small cache would be used better too!

I will have to do some more research to see how Linux really handles this. I am sure that there is some awareness in the development community if this is the case. Also I doubt that an SMP machine can achieve (n-1)*100%. The best I think can be expected, and would be useful, is (n-1)*98% (love them RPG 98%-max values ;-)), which would be impressive I think.

If, as you say, dynamic load balancing can hurt a process, I am sure that some rules can be added to improve the load balancing to better utilize the cache, although I believe there is a lot done with this already. With an application to look at processor load in X, it would show a lot more being done on processor0 and less on processor1. Rarely did the load look the same. Problem here is my Linux box got fragged (hard drive failure), and I reinstalled a different distribution and I don't remember the utilities I was using. I was running SuSE 6.0 and before that Slackware 3.5, and currently Slackware 4.0.

Again I admit a failure of my memory and cannot give specifics to back up what I say.

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490852)

Very well put. I work with TruCluster, the UNIX daughter of VAXcluster. The DLM is amazing, and with the new CCFS (Cluster Common Filesystem) and, finally, scriptable application failover (i.e., I can move the entire memory space of an application over the shared memory bus =), TruCluster really shows you what you can do. VAXcluster is still the best, and relatively inexpensive (the only real alternatives are the massively overpriced and, frankly, underperforming Himalayas). One of the big pluses of VAXclusters is the ability to put a great deal of physical distance between nodes -- all you need is some sort of FDDI circuit connecting them.

On the note of Beowulf, SCI, and MOSIX: please, guys. MOSIX is the closest to anything cluster-like, but it succeeds simply through massive computing power, and relies on a flawed system (Ethernet) for failover. Beowulf is just a distributed computing system wearing a clustering hat.

Frankly, too many people don't realize that Linux isn't everything -- and clustering isn't its game yet. I'd like to see it move up through the ranks and keep building, but frankly, it just doesn't have the power to play with the big boys.

Re:64 bit woes (1)

Bryan Andersen (16514) | more than 14 years ago | (#1490853)

It's hard to squeeze a 100GByte address space into 2GByte chunks...

In my opinion, if you go with 32-bit CPUs in your cluster you get what you deserve. Go directly to 64-bit CPUs; do not pass GO, do not try to save $200. In the long run it isn't worth it.

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490854)

I don't like versioning that happens so automatically. Having version control available when I want it is a good thing, having it forced on me all the time can be annoying. It is probably a matter of personal preference, but I just didn't like the way that it was implemented in VMS. Something like RCS/CVS/PVCS is more what I am comfortable with. To add insult to injury, it tended to cause directories not only to fill up with extraneous garbage that required time to clean out, it also tended to screw people over on disk quotas if they were negligent in keeping things tidy.

And the problem here is? PURGE gets rid of all but the latest version of all files in a directory.
SET FILE/VERSION=1 *.* turns off versioning for all existing files in the current directory.

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490855)

A partial solution (of course not one they told anyone about at the U), but I still prefer a system that I can check files in only when I want. Having it auto-delete versions means that if I want to save the nth version and not have it purged I have to copy it to some other filename. The auto file versioning also doesn't do much to help with multiple developer versioning and synchronization that checkin/checkout systems deal with. If you want to keep a large number of versions around, a system that stores deltas instead of whole files is much more disk space efficient (not that big a deal these days, but back in the 80's when I used VMS it was a huge deal).

You appear to be confused about the whole purpose of VMS' file versioning. It is NOT intended as a revision control system. Several such packages are freely available for OpenVMS.

Re:VMS (2)

SoftwareJanitor (15983) | more than 14 years ago | (#1490856)

You appear to be confused about the whole purpose of VMS' file versioning.

I'm not confused, I just have no use for VMS's file versioning. As I said, I haven't had the misfortune of using VMS since the 80's and frankly, it isn't likely I'll ever be subjected to using VMS again (since Compaq is pretty much burying it), so it really doesn't matter.

Uh - you've proven that NT doesn't scale, dude. (0)

Anonymous Coward | more than 14 years ago | (#1490857)

This is your best argument? How sad. I've been working with parallel processing since you were in diapers. Ok - Unisys has proven that if you throw a lot of engineers at a solution, you can make NT scale. You could do the same for DOS, too. NT doesn't scale. Period. Windows 2000 *might*. But it's not out yet, last time I looked. And Microsoft is going to have to go through the same learning curve on this as everyone else. And it will probably take them 2-3 years to work all of the issues out, that always crop up with a new O.S. release. In the meantime, you can expect to see lots of bugs, crashes and the usual headaches with a major O.S. release. Which is why the Press is reporting that IT managers aren't putting their mission critical stuff on these types of systems. At least not the ones who know better.

Re:Depends on what you want (TurboLinux) (0)

Anonymous Coward | more than 14 years ago | (#1490858)

I think that a lot of people don't realize that TurboLinux "Cluster Edition" is just the Linux Virtual Server project: www.linuxvirtualserver.org. Great stuff, BTW, for big webservers.

Re:VMS (0)

Anonymous Coward | more than 14 years ago | (#1490859)

$ PURGE/KEEP=N
Where "N" is the number of versions you want to keep.

Re:G4s in clusters... (1)

Karrots (14012) | more than 14 years ago | (#1490860)

CodeWarrior Pro 5 supports AltiVec.

Re:Shared Disks in Linux with GFS (1)

spinkham (56603) | more than 14 years ago | (#1490861)

Oh gawd...
You can't be serious..
Solaris on SPARC has a huge advantage over Linux: Fibre Channel.
Their SPARCstorage Array, and the 5200 and 5400 storage arrays, can be attached to as many machines as need be with a Fibre Channel hub, and have 1Gbit transfer rates. Sorry, but at this point Sun storage is just darn better than what we can use for Linux, though it is quite expensive.