
Learning High-Availability Server-Side Development?

kdawson posted more than 5 years ago | from the servers-not-breaking-a-sweat dept.

Databases 207

fmoidu writes "I am a developer for a mid-size company, and I work primarily on internal applications. The users of our apps are business professionals who are forced to use them, so they are more tolerant of access times being a second or two slower than they could be. Our apps' total potential user base is about 60,000 people, although we normally experience only 60-90 concurrent users during peak usage. The type of work being done is generally straightforward reads or updates that typically hit two or three DB tables per transaction. So this isn't a complicated site and the usage is pretty low. The types of problems we address are typically related to maintainability and dealing with fickle users. From what I have read in industry papers and from conversations with friends, the apps I have worked on just don't address scaling issues. Our maximum load during typical usage is far below the maximum potential load of the system, so we never spend time considering what would happen when there is an extreme load on the system. What papers or projects are available for an engineer who wants to learn to work in a high-availability environment but isn't in one?"


207 comments

2 words (2, Informative)

andr0meda (167375) | more than 5 years ago | (#20330639)

Re:2 words (3, Interesting)

teknopurge (199509) | more than 5 years ago | (#20330859)

I just finished reading that paper and was left with the impression that I had just wasted 10 minutes. I could not find a single insightful part of their algorithm - and in fact can enumerate several 'prior art' occurrences from my CPSC 102 class during my undergrad - all were lab assignments.

I did, however, find this sentence disturbing:

However, given that there is only a single master, its failure is unlikely; therefore our current implementation aborts the MapReduce computation if the master fails.
Huh? So, because there is only one master it is unlikely to fail? This job takes hours to run. This is similar to saying that if you have one web server, it is unlikely to fail. I can't help but think this is a logical fallacy. I don't care how simple or complicated a job is - a single-point-of-failure is a single-point-of-failure.

Re:2 words (2, Funny)

teknopurge (199509) | more than 5 years ago | (#20330889)

I can feel the grammar Nazis stalking me even now...

Re:2 words (1)

fimbulvetr (598306) | more than 5 years ago | (#20331005)

While the condescending attitude might make you feel better about yourself, it seems that they took this "lab assignment" and honed it into a system to make themselves a few bucks.

Oh, and they also use it all the time on one of the world's largest data warehouses.

They use it to make money, so no criticism allowed (0, Flamebait)

FatSean (18753) | more than 5 years ago | (#20332137)

Is that your point? 'Cause Microsoft is making a metric assload of money off of Windows, and Apple is making money off of OSX too.

I think I smell a Google fan-boy.

Re:They use it to make money, so no criticism allo (2, Interesting)

fimbulvetr (598306) | more than 5 years ago | (#20332725)

The point was that he seemed to consider it so academic and so "well known" that he could just dismiss it without considering it.

Google seems to have taken this elementary technique and turned it into something that can kick the crap out of an over-engineered solution under the right circumstances. I've read the paper, and assuming this is really used how they say it is, I can say that it does a fantastic job on both performance AND HA, based on my personal experiences with gmail, google, groups, adwords, maps, analytics, etc.

Fanboy? Maybe, depending on your definition. Impressed? Hell yes.

Re:2 words (2, Informative)

PitaBred (632671) | more than 5 years ago | (#20331907)

A single point of failure is better than multiple points of failure, though, where any one failing would stop things dead in the water (think how a RAID0 array is less reliable than a single drive by itself, statistically). I'd hope that anyone working at Google would realize that, and would have meant it that way, rather than the meaning you took from it :)

But who knows... you could be right. I'm just playing devil's advocate.

Re:2 words (0)

Anonymous Coward | more than 5 years ago | (#20332353)

A single point of failure is better than multiple points of failure, though, where any one failing would stop things dead in the water (think how a RAID0 array is less reliable than a single drive by itself, statistically).

Huh?

Statistically the odds are the same for either drive failing as for a single drive failing. This is not a combined event, the variables are independent.

Re:2 words (1)

Brian Gordon (987471) | more than 5 years ago | (#20332937)

If you have a 200GB file spread across 10 hard drives in a RAID0, then there's 10 points of failure. Any one of the ten hard drives could fail and you've just lost your file. On the other hand, if you have your file taking up the whole of a single hard drive, then the only way you'll lose that file is if that particular drive fails. If a drive fails you stand to lose more data if they're on a RAID0 compared with a bunch of independent HDDs.
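
A quick numerical check of this (a sketch assuming independent drive failures with identical rates; drives in one chassis are often correlated in practice, which only makes the stripe look worse):

#include <cmath>
#include <cstdio>

int main() {
    const double p = 0.03;  // assumed annual failure rate of one drive
    for (int n : {1, 2, 5, 10}) {
        // RAID0 loses the whole array if *any* of its n drives fails
        double p_array = 1.0 - std::pow(1.0 - p, n);
        std::printf("%2d drives: P(lose data) = %.3f\n", n, p_array);
    }
}

With p = 3% per drive, a 10-drive stripe loses data about 26% of the time per year - roughly n*p for small p - versus 3% for the lone drive.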

Re:2 words (0)

Anonymous Coward | more than 5 years ago | (#20333527)

If the odds of a single drive failing are 1 in 1000, the odds of that drive failing in a RAID are also 1 in 1000.

This is a common statistical fallacy; increasing the number of trials does not increase the probability of an outcome unless you're doing combination. In the case of the RAID, the odds of any drive failing are the same as a lone drive failing.

In this case, you lose your file 1 in 1000 times regardless of the number of points of failure (because one drive failing kills the RAID0).

No fallacy.... (4, Insightful)

encoderer (1060616) | more than 5 years ago | (#20333479)

Huh? So, because there is only one master it is unlikely to fail?

Yes. If you take that sentence in context, the answer is "Yes." Compared to the likelihood that one of the thousands of worker-machines will fail during any given job, it IS unlikely that the single Master will fail. Moreover, while any given job may take hours to run, it also seems that many take just moments. Furthermore, just because a job may take hours to run doesn't mean it's CRITICAL that it be completed in hours. And, at times when a job IS critical, that scenario is addressed in the preceding sentence: It is easy for a caller to make the master write periodic checkpoints that the caller can use to restart a job on a different cluster on the off-chance that a Master fails.

If a job is NOT critical, the master fails, the caller determines the failure by checking for the abort-condition, and then restarts the job on a new cluster.

It's not a logical fallacy, nor is it a bad design.

For the benefit of anyone reading through, here is the paragraph in question. It follows a detailed section on how the MapReduce library copes with failures in the worker machines.

It is easy to make the master write periodic checkpoints of the master data structures described above. If the master task dies, a new copy can be started from the last checkpointed state. However, given that there is only a single master, its failure is unlikely; therefore our current implementation aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire.
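
The client-side story there is just a retry loop. A minimal sketch - every name below is hypothetical, illustrating the logic only, not the actual MapReduce client API:

#include <stdexcept>

// Hypothetical types and calls, for illustration only.
enum class JobStatus { Done, MasterFailed };
struct JobSpec { /* input files, map/reduce functions, ... */ };
struct Result  { /* output location, counters, ... */ };

JobStatus run_mapreduce(const JobSpec&);  // aborts the whole job if the master dies
Result    collect_result(const JobSpec&);

Result run_with_retry(const JobSpec& spec, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (run_mapreduce(spec) == JobStatus::Done)
            return collect_result(spec);
        // Master died mid-run: just resubmit, possibly on a different cluster.
    }
    throw std::runtime_error("MapReduce job failed after retries");
}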

Re:2 words (0)

stonecypher (118140) | more than 5 years ago | (#20330997)

People who think map reduce is the same thing as scalability have no idea what scalability is.

Re:2 words (2, Insightful)

andr0meda (167375) | more than 5 years ago | (#20331037)


Well, it's easy to say something isn't 'A', and then not spend a word on what IS 'A'.

If I'm so wrong on scalability maybe you can explain it here to me. Thanks.

Re:2 words (5, Informative)

stonecypher (118140) | more than 5 years ago | (#20332441)

This is a bit like saying "auto mechanics is a matter of turning a wrench," and when someone points out that there's a lot more to it, saying "well then maybe you could teach me to be a mechanic in your reply." If scalability were an issue simple enough to explain in a slashdot post, people wouldn't have trouble with it. Scalability isn't a problem; it's a family of problems. Suggesting that a single algorithm from a single library magically waves away the issues involved in a heavily parallel server is simply a demonstration that you aren't aware of what goes into scalable servers.

The Macy's Door Problem is a great example of a Scalability 101 problem that map_reduce has no way to address. In the early 30s, when most department stores were making big, flashy front entrances to their stores with big glass walls and paths for 12 groups of people at a time, doormen, signage, the whole lot, Macy's elected to take a different approach. They set up a small door with a sign above it. The idea was simple: if there was just the one door, it would be a hassle to get in and out of the store; thus, it would always look like there was a crowd struggling to get in - as if the store was just so popular that they couldn't keep up with customer foot traffic. The idea worked famously well.

In server design, we use that as a metaphor for near-redline usage. There's a problem that's common in naïve server design, where the server will perform just fine right up to 99%. Then, there'll be a tiny usage spike, and it'll hit 101% very briefly. However, the act of queueing and dequeueing withheld users is more expensive than processing a user, meaning that even though the usage drops back to 99%, by the time those 2% overqueue have been processed, a new 3% overqueue has formed, and performance progressively drops through the floor on a load the application ought to be able to handle. I should point out that Apache has this problem, and that until six years ago, so did the Linux TCP stack. It's a much more common scalability flaw than most people expect.
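
A toy simulation of that death spiral; the one assumption is that parking and unparking a withheld request costs more than simply serving it (queue_tax below):

#include <cstdio>

int main() {
    const double capacity  = 100.0;  // work units the server can finish per tick
    const double queue_tax = 1.5;    // assumed extra cost to park/unpark one request
    double backlog = 0.0;

    for (int tick = 0; tick < 12; ++tick) {
        double load = (tick == 3) ? 101.0 : 99.0;  // one brief spike to 101%
        // Each queued request costs (1 + queue_tax) to drain, not 1.
        double demand = load + backlog * (1.0 + queue_tax);
        backlog = (demand > capacity) ? demand - capacity : 0.0;
        std::printf("tick %2d: offered %5.1f%%  backlog %8.1f\n", tick, load, backlog);
    }
}

Offered load is back at 99% from tick 4 onward, yet the backlog grows without bound - the progressive collapse described above.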

Now, that's just one issue in scalability; there are dozens of others. However, map_reduce has literally nothing to say to that problem. Do I need to rattle off others too, or maybe is that good enough? I mean, we have the exponential growth of client interconnections (Metcalfe's Law, which is easily solved with a hub process); we have making sure that the work of processing clients grows linearly (that is, O(1) per client, as opposed to O(lg n) or worse), which means no std::map, no std::set, no std::list, only pre-sized std::vector and very careful use of hash tables; we have packet fragmentation throttling; we have making sure that you process all clients in order, to prevent response-time clustering (like when you load an apache site and it sits there for five seconds, so you hit reload and it comes up instantly); all sorts of stuff. Most scalability issues are hard to explain, but maybe that brief list will give you the idea that scalability is a whole lot bigger of an issue than some silly little google library.

Talk to someone who's tried to write an IRC server. Those things hit lots of scalability problems very early on. That community knows the basics very, very well.

Re:2 words (0)

Anonymous Coward | more than 5 years ago | (#20333037)

ircd programmers > google

lol

Give it up! (0)

Anonymous Coward | more than 5 years ago | (#20330659)

"Give it up, give it up," said he, and turned away with a great sweep, like someone who wants to be alone with his laughter.

Well... (3, Informative)

Stanistani (808333) | more than 5 years ago | (#20330685)

You could start by reading this book [oreillynet.com] for a practical approach:

Zawodny is pretty good...

Re:Well... (3, Informative)

tanguyr (468371) | more than 5 years ago | (#20331043)

I also recommend the book Building Scalable Web Sites [oreilly.com], also from O'Reilly. Loads of good ideas on clustering, performance monitoring, even some ideas on scaling the development process itself. Scalability and high availability are not the same thing, but much of the material covered in this book is relevant to both. /t

Re:Well... (1)

tholomyes (610627) | more than 5 years ago | (#20331267)

I just picked "Building Scalable Web Sites" up four or five weeks ago and I'll second that recommendation; the book is really well written and actually a fairly quick read, a rarity even among O'Reilly books. It covers a lot of ground comprehensively, and is organized in a way that makes sense.

The unfortunate thing about databases (3, Insightful)

PhrostyMcByte (589271) | more than 5 years ago | (#20331119)

Is that most of them have poor native APIs when it comes to scalability. Some of them have something like

handle = query("SELECT...");
/*do something*/
result = wait(handle);

But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.

Microsoft's .NET framework is actually a great example of doing the right thing - it has these types of async methods all over the place. But then you have to deal with cross-platform issues and problems inherent with a GC.

It's not that much different for web frameworks either. None that I've tried (RoR, PHP, ASP.NET) have support for async responding - they all expect you to block execution should you want to query a db/file/etc. and just launch boatloads of threads to deal with concurrent users. I guess right now with hardware being cheaper it is easier to support rapid development and scale an app out to multiple servers.

Re:The unfortunate thing about databases (4, Informative)

Khazunga (176423) | more than 5 years ago | (#20331667)

Most databases have async APIs. Postgresql and mysql have them in the C client libraries. Most web development languages, though, do not expose this feature in the language API, and for good reason. Async calls can, in rare cases, be useful for maximizing the throughput of the server. Unfortunately, they're more difficult to program, and much more difficult to test.

High scale web applications have thousands of simultaneous clients, so the server will never run out of stuff to do. Async calls have zero gain in terms of server throughput (requests/s). They may reduce a single request's execution time, but the gain does not compensate for the added complexity.

Re:The unfortunate thing about databases (1)

PhrostyMcByte (589271) | more than 5 years ago | (#20332997)

The only async APIs they have are like the example I gave before. These are sub-optimal!

It's true that with a single server handling 10, 100, 200 RPS, the stupid threaded model will likely not make a big difference in _throughput_. It will make a MASSIVE difference in CPU/RAM usage though, and let you easily scale up to 10000 RPS on commodity hardware using just a single thread. Some people like to maximize their hardware usage.
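
The single-thread model being described here is the classic readiness loop; a stripped-down epoll sketch (Linux-specific; error handling and the actual accept/read logic are elided):

#include <sys/epoll.h>

void handle_io(int fd);  // application logic: accept/read/write, must never block

void event_loop(int listen_fd) {
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event ready[64];
    for (;;) {
        int n = epoll_wait(ep, ready, 64, -1);  // sleep until sockets are ready
        for (int i = 0; i < n; ++i)
            handle_io(ready[i].data.fd);        // short, non-blocking work only
    }
}

One thread, many sockets: per-connection memory is a few bytes of bookkeeping instead of a full thread stack.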

And async is certainly not much more difficult - it's a new way of thinking, sure, something new to learn. But it's not really that difficult! Compare the two:

/* threaded */
int res = send(sock, buf, size, 0);
if (res != -1) {
    /* do something */
} else {
    /* handle error */
}

/* async - note: a send() variant taking a completion callback is hypothetical */
send(sock, buf, size, 0, on_finish, ctx);

void on_finish(int res, void *ctx) {
    if (res != -1) {
        /* do something */
    } else {
        /* handle error */
    }
}

Re:The unfortunate thing about databases (1)

nuzak (959558) | more than 5 years ago | (#20333423)

I would be very surprised to see a database that didn't offer per-row callback functions in its call level API -- even SQLite has them. I won't hazard any guesses about MySQL, googling for the subject turned up too much PHP noise to be conclusive.

I actually find async programming with a good API to be easier, because everything's an event, and you don't have to design the flow of control of everything else around constantly returning to poll for results, or deal with the locking and race conditions if you do it threaded. Mind you, that's "with a good API", and those are few and far between.

Re:The unfortunate thing about databases (1)

M. Baranczak (726671) | more than 5 years ago | (#20331707)

Interesting.

Do you think that non-blocking IO really offers enough performance gains to compensate for the resulting spaghetti code? This isn't a rhetorical question, I'm really curious.

Re:The unfortunate thing about databases (1)

Jhan (542783) | more than 5 years ago | (#20332661)

Why the spaghetti?

The troublesome blocking IO code, in C++-ish pseudocode:

result1 = doIo(foo);  // Blocking IO, wait 5s
...                   // Maybe do other things
result2 = doIo(bar);  // Blocking IO, wait 5s
...                   // Maybe do other things
// Results 1 and 2 are first used here
print("Result was " + result1 + " & " + result2);
// Min 10s to get here

So, add this object

class Unblocker {
    operationHandle operationInProgress;
    Unblocker(parameter) {
        operationInProgress = sendIo(parameter);  // Non-blocking version of doIo above, as provided by API
    }
    Result result() {
        return waitIo(operationInProgress);  // API call to block until IO completes and return result, as provided by API
    }
}

...and the result is a very similar program:

Unblocker unresult1(foo);  // doIo call has been unblocked into sendIo call
...                        // Maybe do other things
Unblocker unresult2(bar);  // same
...                        // Maybe do other things
// Results 1 and 2 are first used here
print("Result was " + unresult1.result() + " & " + unresult2.result());
// Min 5s to get here

50% speed-up! Not shabby!
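
For what it's worth, this hand-rolled Unblocker is exactly the "future" pattern; modern C++ ships it as std::async/std::future. The same program, with a sleep standing in for the 5s blocking call:

#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>

std::string doIo(const std::string& what) {
    std::this_thread::sleep_for(std::chrono::seconds(5));  // stand-in for blocking IO
    return "result(" + what + ")";
}

int main() {
    auto f1 = std::async(std::launch::async, doIo, "foo");  // starts immediately
    auto f2 = std::async(std::launch::async, doIo, "bar");  // runs in parallel
    // ... maybe do other things ...
    std::cout << "Result was " << f1.get() << " & " << f2.get() << "\n";
    // ~5s total instead of ~10s
}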

Re:The unfortunate thing about databases (0)

Anonymous Coward | more than 5 years ago | (#20332867)

I believe the pattern/technique you are referring to is often called "IOU" or "futures" and can, indeed, be very useful when doing asynchronous programming. There really is no need for "spaghetti" code (as you rightly point out). And, with a decent library in place, most anyone can use async calls safely.

Rogue Wave has a nice, well thought out, API for C++ which you might find of interest: http://www.roguewave.com/support/docs/leif/sourcepro/html/protocolsug/2-4.html [roguewave.com]

This technique can be used not only with multi-threading, but also with any async-based API (such as OS and DB calls). I mention this mainly due to the fact that sometimes people forget that techniques used in one domain of programming often are applicable in others with slight adjustment. Of course, that's not always the case :-). But here, it seems to be so.

Re:The unfortunate thing about databases (1)

bidule (173941) | more than 5 years ago | (#20333109)

Is that most of them have poor native APIs when it comes to scalability. Some of them have something like

handle = query("SELECT...");
/*do something*/
result = wait(handle);

But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.
I don't understand what is wrong with that.

Are you unhappy about the /*do something*/ part, because you'd want the handle released early to minimize its impact on the process?

Are you unhappy about the wait() call, because you don't want to code and manage your own threads to handle the call asynchronously?

Re:The unfortunate thing about databases (1)

PhrostyMcByte (589271) | more than 5 years ago | (#20333461)

see this comment [slashdot.org] for more along the lines of what I'm talking about.

I'm unhappy about the wait() call because it doesn't lend itself to fully async coding - if you've got nothing to do in that context, you're stuck blocking the thread when it could be doing other things. So now you have to waste CPU on context switches and waste RAM on state for a new thread.

A good callback-based API doesn't have these deficiencies. You just call a function to dequeue completion callbacks, from however many threads you'd like. You're never stuck blocking when there is work to be done, and you don't waste time polling needlessly.
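
A bare-bones sketch of the completion-queue shape being described - an assumed interface, not any particular database library's API:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Completed operations post their callback here; any number of worker
// threads drain it, so nobody ever blocks waiting on one specific query.
class CompletionQueue {
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void post(std::function<void()> cb) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(cb)); }
        cv_.notify_one();
    }
    // Dequeue one completion callback and run it.
    void run_one() {
        std::function<void()> cb;
        {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&]{ return !q_.empty(); });
            cb = std::move(q_.front());
            q_.pop();
        }
        cb();  // run outside the lock
    }
};

The driver calls post() when a query completes; worker threads sit in run_one(). No polling, and no thread parked per outstanding query.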

Re:The unfortunate thing about databases (1)

Jhan (542783) | more than 5 years ago | (#20333335)

Is that most of them have poor native APIs when it comes to scalability. Some of them have something like

handle = query("SELECT...");
/*do something*/
result = wait(handle);

But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.

Since you seem to be talking, perhaps circumspectly, about Java:

final Foo objectToNotify = theFoo;
new Thread(new Runnable() {
    public void run() {
        handle = query("SELECT...");
        /* do something */
        result = wait(handle);
        objectToNotify.notify(result);  // pseudo: hand the result back
    }
}).start();

The same can be achieved in any language with threads. Are you asking for callbacks in a pure C environment?

Re:The unfortunate thing about databases (0)

Anonymous Coward | more than 5 years ago | (#20333499)

Sorry but I fail to see how asynchronous calls on a client to fetch results have anything to do with more scalability on the server. How does doing that change the actual load on the server? You say the server needs to keep more threads alive when the clients block, but I'd think that you would at least need to start the same number of threads in the asynchronous method of responding to a client. I always imagine a solution for more scalability would be in a middle layer (tier if you will ;-), which caches the results of common queries. But even then IMHO ultimately scaling is the task of the DBMS and a middle tier is a solution to a problem on the server.

scalability (0)

Anonymous Coward | more than 5 years ago | (#20330693)

So you are having scalability issues because of a poor design - "a second or two slower" - wow, how do you get away with performance that poor? Anything over 1.5 seconds and I get major complaints here.

How does high availability come into it? And high availability isn't exactly difficult - you just need budget.

Re:scalability (1)

romango (632756) | more than 5 years ago | (#20330891)

The load you describe is incredibly low. You must have very large DB tables with improper indexes to get that poor of a response.

look at SaaS development (1)

CBravo (35450) | more than 5 years ago | (#20330711)

Generally, SaaS (software as a service) providers have to scale their apps. The development issues they have are more or less solved. Look it up on Google... ('saas scalability problem').

Here goes... (2, Informative)

Panaflex (13191) | more than 5 years ago | (#20330785)

Ok, first up:
1. Check all your SQL and run it through whatever profiler you have. Move things into views or functions if possible.
2. CHECK YOUR INDEXES!! If you have SQL statements running that slow, the likely cause is not having proper indexes for the statements. Either make an index or change your SQL.
3. Consider using caching. For whatever platform you're on there's bound to be decent caching.

That's just the beginning... but those are the likely causes of most of your problems. We could go on for a month about optimizing, but in the end if you just stuck with what you have and checked your design for bottlenecks you could get by just fine.

Re:Here goes... (2)

bugg_tb (581786) | more than 5 years ago | (#20330909)

I don't think the poster was asking about his system and how to optimize it; I think he's interested in learning HA/failover techniques that don't yet need to be implemented in his database.
I do vaguely recall an article a while back about MySpace, IIRC, and how much trouble they had expanding as soon as the boom took off. It wasn't very practical but gave a nice insight into how a large load on servers can cause interesting challenges.
Tom

Re:Here goes... (1)

Panaflex (13191) | more than 5 years ago | (#20333175)

Understood - it's just that oftentimes people overlook the basic issues. He could get by without having to spend much time & effort - just a little basic knowledge of databases could go a long way. He mentioned 1-2 second queries and I suspect he's just missing the basics.

Re:Here goes... (2, Informative)

rdavidson3 (844790) | more than 5 years ago | (#20331925)

Just to append to this list:
4. Get your servers clustered; this will help with server load (not really necessary at this time for what you need, but it will position you for the future) and give you redundancy if a server dies. If this is not possible, look at "warm" backups. But then again, ask the business side what their expectation is when a problem happens, and then plan for it.
5. For performance tuning look at the execution plans on the SQL
6. Use transactions whenever possible (BEGIN TRANSACTION / COMMIT / ROLLBACK).
7. If you see deadlocks on tables, try using table hints (NOLOCK) on SELECT statements.
8. Get an experienced DBA to peer-review your setup and code if necessary.

And a change to 1. You can use stored procs as well as views and functions. But moving the SQL code into views / functions will bring performance gains from the server by having the code already compiled and creating a saved execution plan.

Mastering Ajax with JSON on the server side (0, Offtopic)

IndioMan (411484) | more than 5 years ago | (#20330799)

As discussed in the previous article [ibm.com] in this series, JSON is a useful format for Ajax applications because it allows you to convert between JavaScript objects and string values quickly. In this final article of the series, you'll learn how to handle data sent to a server in the JSON format [ibm.com] and how to reply to scripts using the same format.

High availability!=high performance (5, Insightful)

dominux (731134) | more than 5 years ago | (#20330823)

Start by being clear about what you want to achieve. If it is HA then you want to look at clustering, failover, network topology, DR plans etc. If it is HP then look for the bottlenecks in the process; don't waste time shaving nanoseconds off something that wasn't bothering anyone. At infrastructure level you might think about caching some stuff, or putting a reverse proxy in front of a cluster of responding servers. In general disk reads are expensive but easily cached, disk writes are very expensive and normally you don't want to cache them, at least not for very long. Network bandwidth may be fast or slow, latency might be an issue if you have a chatty application.

Re:High availability!=high performance (1)

TheRaven64 (641858) | more than 5 years ago | (#20330933)

Exactly right. I'd like to add that if you want to write really scalable code then use an asynchronous approach as much as possible. Some programming languages and toolkits make this easy, some make it hard, but it's possible in any. If your database server is slow responding to your application server, make sure your app server can do useful work while it's waiting. The same is true of communication between parts of the server.

I'd thoroughly recommend that you learn Erlang, if you haven't already. The language is almost certainly not suited to the kind of task described (it might be, but it's unlikely), however the programming style it encourages can be applied to languages that are. Learning Erlang helps you write scalable code in any language, just as learning Smalltalk helps you write good OO code, irrespective of whether you actually use either language in production.

Re:High availability!=high performance (1)

leuk_he (194174) | more than 5 years ago | (#20331495)

Agreed.

High performance = short response times. In your case you can think about caching more and tuning the system and database access. Maybe you can make the application more scalable, but once you move the database to a different server than the application you first get some extra (network) overhead instead of performance, especially in low-load situations. And more iron/servers means more money.

High availability [wikipedia.org] is about 24x7 operation and no single point of failure. One method for this is clustering [wikipedia.org] (more application/web servers, more replicated databases).

You want to learn about eXtreme load? Simulate it in your testing environment. Testing is important but not simple.

One Of Many Possible Futures (1)

rubicante (696198) | more than 5 years ago | (#20330827)

I think the word scalable gets people into trouble when they program for a future that will never arrive. Instead focus on building elegant applications -- they are easy to maintain, and you know you'll be doing a lot of that.

I don't code for it directly (3, Interesting)

Applekid (993327) | more than 5 years ago | (#20330873)

Our in-house applications don't get built around performance at all (personally I find it disappointing but I don't write the rules... yet). We generally scale outwards: replicated databases, load distribution systems, etc.

Many of the code guidelines we have established are to aid in this. Use transactions, don't lock tables, use stored procedures and views for anything complicated, things like that.

I guess my answer is that we delegate it to the server group or the dba group and let them deal with it. I guess this means the admins there are pretty good at what they're doing. :)

get a just in case server (0)

ILuvRamen (1026668) | more than 5 years ago | (#20330899)

I've heard of companies who offer server networks for websites and corporate server backups in case of a massive flood of traffic. Basically it's just about free cuz you rarely use it but if your website shows up on The Daily Show and you get 1 million visitors, they sense that and host it from the backup on 50 of their servers at once until traffic dies down and bill you for it later.
Same with a corporate network. A bunch of people have to get in their last minute stuff on the last day of the quarter or whatever and your server is going nuts with the traffic so they're there to save you. Just take a day or so and write a "switch" sort of program on your server(s) that detects tons of traffic and contacts the emergency offsite servers that the company has your apps and DBs just sitting on and you use multiple servers of theirs until the traffic dies down. There is a little bit of a higher fee for corporate services but it's still really cheap. It's like a rented server but 99.999% of the time, they don't need to allocate any bandwidth at all to it so it's like 25x cheaper than renting a dozen actual, full time servers.

2 Cents (1)

twoplustwo (1146647) | more than 5 years ago | (#20330901)

Well, HA typically has to do with availability, not performance. However, if you add redundant equipment, e.g. another column of equipment, you can improve both performance and availability. DB scaling issues can be resolved by adding memory and CPUs. Applications can be scaled vertically (add memory, CPUs, etc.) or by cloning onto a redundant column of equipment, e.g. web servers.

Not sure what you want to test. (3, Insightful)

funwithBSD (245349) | more than 5 years ago | (#20330941)

Stress testing? Use LoadRunner or some other tool to simulate users.

If you are using Java on Tomcat, BEA, or Websphere, use a product like PerformaSure to see a call tree of where your Java program is spending its time. It sorts out how long each SQL statement takes too, and shows you what you actually sent. If you have external data sources, like SiteMinder, it will show that too.

If you mean "What happens if we lose a bit of hardware" simulate the whole thing on VMware on a single machine and kill/suspend VMs to see how it reacts.

Most importantly, MAKE SURE YOU MODEL WHAT YOU ARE TESTING. If you are not testing a scaled-up version of what users actually do, you have a bad test.

Another option (1)

stonecypher (118140) | more than 5 years ago | (#20330977)

You'll find that Erlang doesn't even blink at those volumes, and that Erlang's entire reason to exist is scalability/reliability. Granted, it's a little severe to pick up a new language, but the benefits are enormous, and it's one of those boons you can't really understand until you've learned it. It is, however, worth noting that transactions on a Mnesia database in the multiple gigabytes are typically faster than PHP just invoking MySQL in the first place, let alone doing any work with it.

Erlang is difficult to learn from what's on the web; consider starting with Joe's book [amazon.com].

Slightly off topic (2, Insightful)

Gazzonyx (982402) | more than 5 years ago | (#20331117)

I keep hearing about Erlang being the next greatest thing since sliced bread... unfortunately, I don't have time to look into it too much. Could someone give me an 'elevator' pitch on what makes it so great for threading? Is it encapsulation based objects, a thread base class, or what? How does it handle cache coherency on SMP?

Re:Slightly off topic (3, Informative)

stonecypher (118140) | more than 5 years ago | (#20332243)

It's not something you can cram into an elevator pitch; erlang is an entirely different approach to parallelism. If you know how mozart-oz, smalltalk or twisted python work, you've got the basics.

Basically, processes are primitives, there's no shared memory, communication is through message passing, fault tolerance is ridiculously simple to put together, it's soft realtime, and since it was originally designed for network stuff, not only is network stuff trivially simple to write, but the syntax (once you get used to it) is basically a godsend. Throw pattern matching a la Prolog on top of that, dust with massive soft-realtime scalability which makes a joke of well-thought-of major applications (that YAWS vs Apache [www.sics.se] image comes to mind,) a soft-realtime clustered database and processes with 300 bytes of overhead and no CPU overhead when inactive (literally none,) and you have a language with such a tremendously different set of tools that any attempt to explain it without the listener actually trying the language is doomed to fall flat on its face.

In Erlang, you can run millions of processes concurrently without problems. (Linux is proud of tens of thousands, and rightfully so.) Having extra processes that are essentially free has a radical impact on design; things like work loops are no longer necessary, since you just spin off a new process. In many ways it's akin to the unix daemon concept, except at the efficiency level you'd expect from a single compiled application. Every client gets a process. Every application feature gets a process. Every subsystem gets a process. Suddenly, applications become trees of processes pitching data back and forth in messages. Suddenly, if one goes down, its owner just restarts it, and everything is kosher.

It's not the greatest thing since sliced bread; there are a lot of things that Erlang isn't good for. However, what you're asking for is Erlang's original problem domain. This is what Erlang is for. I know, it's a pretty big time investment to pick up a new language. Trust me: you will make back all the time you spend learning the language by writing far shorter, far more obvious code. You can pick up the basics in 20 hours. It's a good gamble.

Developing servers becomes *really* different when you can start thinking of them as swarms.
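
A crude C++ analogue of the "its owner just restarts it" idea - Erlang's supervision is vastly richer (links, monitors, supervision trees), so treat this as flavor only:

#include <chrono>
#include <exception>
#include <functional>
#include <thread>

// Run a child loop forever; if it dies, log and restart it.
void supervise(std::function<void()> child) {
    for (;;) {
        try {
            child();  // runs until it throws ("crashes")
        } catch (const std::exception&) {
            /* log the crash */
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));  // avoid a restart storm
    }
}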

Re:Slightly off topic (0)

Anonymous Coward | more than 5 years ago | (#20332483)

In Erlang, you can run millions of processes concurrently without problems. (Linux is proud of tens of thousands, and rightfully so.)
But that's not a valid comparison - they're not *real* processes.

Re:Slightly off topic (1)

poopdeville (841677) | more than 5 years ago | (#20333567)

Think functional language (which gives you concurrency for free) with very strong OO support (using the Actors model).

Similar Question - Interviews (1, Offtopic)

Greenisus (262784) | more than 5 years ago | (#20330987)

I have a question sort of along the same line. I interviewed for a position at a very large internet company, and one of their primary concerns was very high performance and scalability. I went through the phone interviews and then the in-person interviews, and I actually did quite well, and was even told that I did quite well. However, in the end, I was told that while I did well, they would have liked to see more experience with very large web applications (I've worked at smaller companies). So, how do I go about learning something I think I already know, and from your experience, was that not the real reason I was not accepted?
 
Sorry this is a bit off-topic; I've just been dying to ask the slashdot community and this seems to be the most appropriate forum for the question.

Re:Similar Question - Interviews (1, Informative)

Anonymous Coward | more than 5 years ago | (#20331445)

There is a good chance that this was the reason you were not accepted. I work at a very similar firm to the one you describe, one anyone here would recognize, and we do reject people for the reason you mentioned. Basically, the problem is that unless someone does have experience with very large scale applications, we find that they have a pretty steep learning curve ahead of them. While many candidates think that they know how to build a scalable app, what worked for them on an application that has 100k users totally breaks down when there are 100 million users.

It's very difficult to get that kind of knowledge/experience without having actually done it before. The way I got it was by being hired into a project which was a rewrite of a very large scale system, and I got hired right as that project was starting. This was a great way to make it up the learning curve without too much pain, because I got to hear from the team directly about what decisions they thought were wrong about the previous design, and got to participate in the design discussions for the next generation. The team was very experienced at this and the choices they made (and mistakes/bumps along the way) taught me a lot about how to build such an application.

Languages, Libraries, Abstraction, Audience (2, Interesting)

DonRoberto (1146645) | more than 5 years ago | (#20331033)

From working on both academic and enterprise software designed specifically to scale, these are four things I've noticed are incredibly important to scalability:

Languages - I recently saw a multi-million dollar product fail because of performance problems. A large part of it was that they wanted to build performance-critical enterprise server software, but wrote it mostly in a language that emphasized abstraction over performance, and was designed for portability, not performance. The language, of course, was Java. Before I get flamed about Java, the issue was not Java itself and alone, but part of it was indeed using a language not specifically designed for a key project objective: performance. The abstraction, I would argue, hurt the project more than all the other performance issues associated with bytecode, however. Relevant books on this subject are everywhere.

Libraries - Using other people's code (e.g. search software, DB apps, etc.) will always introduce scalability weaknesses and performance costs in expected and unexpected places. Haphazardly choosing what software to get in bed with can come back to bite you later. It is an occupational hazard, and each database product and framework and even hardware configuration has its own pitfalls. Many IT books on enterprise performance, or even whitepapers and academic papers, can provide more information.

Abstraction - There is no free lunch. When you make things easier to code, you typically incur some performance penalty somewhere. In C++, Java, and most other high level languages, the sheer notion of modularity and abstraction eventually adds so much hidden knowledge and code that developers either lose track of what subtle costs everything is incurring, or are suddenly put in a position where they can't go back and rewrite everything. Sometimes it is better to write a clean, low-level API and limit the abstraction eyecandy or it will come back to bite you. On the other hand, sometimes a poor low-level API is worse than a cleanly abstracted high-level API. In practice, few complex and performance-oriented systems are architected in very high level languages, however. I have seen few books on this subject, and it is pure software engineering; design patterns might help.

Audience - Both clientele and developer audiences make a big difference. Give an idiot a hammer with no instructions... and you get the point. Make sure your developers know what they're doing and what the priorities are, and also design your interfaces and manuals in such a way as to keep scalability in mind. Why have a script perform a hundred macro operations when a well-designed API could provide better performance with a single call? This entails both HCI and project development experience.

Wish I could suggest more books, but there's just too many.

Re:Languages, Libraries, Abstraction, Audience (4, Insightful)

EastCoastSurfer (310758) | more than 5 years ago | (#20331339)

Language - Doesn't matter much if you know how to design a scalable system. Some languages like Erlang force you into a more scalable design, but even then it's still easy to mess up. Unless this multi-million dollar project you're talking about was an embedded system, I would bet the language used was the smallest reason for bad performance. Although it is fun to bash Java whenever you get the chance.

Libraries - Bingo, let's throw out nice blocks of tested and working code because it's always better to write it yourself. You pretty much have to use libraries to get things done anymore. And are you suggesting someone should write their own DB software when building a web app? Um, yeah, see if that web app ever gets done.

Abstractions - While most are leaky at some point, abstractions make it easier for you to focus on the architecture (which is what you should be focusing on anyways when building scalable systems).

I see these types of arguments all the time and they rarely make sense. It's like arguing about C vs. Java over 1ms running time difference when if you changed your algorithm you could make seconds of difference or if you changed your architecture you would make minutes of difference...

Re:Languages, Libraries, Abstraction, Audience (1)

marcosdumay (620877) | more than 5 years ago | (#20333475)

Or the GP was completely wrong... or maybe he just has tighter resources than you.

All the things he said are useful for improving performance, and all of them can lead to errors that will decrease that performance if you are not careful enough. Of course, if your performance hits are due to gross architectural errors, you shouldn't even think about looking into them.

Re:Languages, Libraries, Abstraction, Audience (1)

Reverend528 (585549) | more than 5 years ago | (#20331383)

a language that emphasized abstraction over performance... The language, of course, was Java.

When has abstraction ever been a strength of java? It has one fucking abstraction, and there are programmers out there who say that sun didn't even get that one right.

Re:Languages, Libraries, Abstraction, Audience (1)

plsander (30907) | more than 5 years ago | (#20332195)

This is where profiling and "throw the first one away" come into play.

Use the OO and abstraction language and tools to design the application. Then look at the performance data and optimize those sections that are called frequently.

Abstraction is great for design, maintenance, and upgrade/redesign (your application's requirements never stay static, right?)

HA is not load balancing (1)

PlatinumRiver (846268) | more than 5 years ago | (#20331045)

This is a broad topic, but I would say begin by identifying your single points of failure. You can then research setting up HA solutions for each of those resources. Also, understand the difference between high-availability and load balancing. Just because your database is fault-tolerant, it does not necessarily mean it can scale to cope with increased traffic.

Draw a high level map of your application and all the server/network resources it uses. Take each one of those components and analyze them for load balancing and fault tolerance. Any single component failure should not affect the overall uptime of the application. Part of a high-availability system is having proper monitoring and notification tools in place. It takes a lot to make a high availability environment work and some of it is not engineering related, but business process related. If your servers are in a data center and a database server goes down, yet your notification system sends an email to a database developer who works 9am to 5pm (maybe on vacation) alerting him/her of the issue... You can see how this can lead to problems. Proper health checks, escalation paths, etc. are all part of making your system work.

My $0.02.

 

Use your strengths (1)

Tablizer (95088) | more than 5 years ago | (#20331113)

Often a narrow set of query/report types is the most common and consumes the most resources. Consider making a nightly customized copy of a view (or views) via batch processing that fits that frequent need well, and put it on a separate server. This will speed up not only the common need, but also the other queries, since the load on their server is lightened. In general, also make sure your indexing is designed well. In other words, put indexes where they are needed but don't add unnecessary ones. Study the usage needs carefully.

Availability Isn't Scalability (2, Insightful)

John_Booty (149925) | more than 5 years ago | (#20331145)

From what I have read in industry papers and from conversations with friends, the apps I have worked on just don't address scaling issues. Our maximum load during typical usage is far below the maximum potential load of the system, so we never spend time considering what would happen when there is an extreme load on the system.

Is it just me, or is the question hopelessly confused? He's using the term "availability" but it sounds like he's talking about "scalability."

Availability is basically percentage of uptime. You achieve that with hot spares, mirroring, redundancy, etc. Scalability is the ability to perform well as workloads increase. Some things (adding load-balanced webservers to a webserver farm) address both issues, of course, but they're largely separate issues.
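
To make the distinction concrete: availability composes multiplicatively, which is why the two problems pull different levers. A quick calculation with illustrative numbers:

#include <cstdio>

int main() {
    const double web = 0.999, db = 0.999;  // each tier up 99.9% of the time
    double serial      = web * db;                        // a request needs both tiers
    double mirrored_db = 1.0 - (1.0 - db) * (1.0 - db);   // either DB replica suffices
    std::printf("single web + single db  : %.6f\n", serial);
    std::printf("single web + mirrored db: %.6f\n", web * mirrored_db);
}

Mirroring the database buys availability but does nothing for throughput; adding load-balanced web servers can buy both. Knowing which number you're trying to move is the whole question.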

The first thing this poster needs to do is get a firm handle on exactly WHAT he's trying to accomplish, before he can even think about finding resources to help him do it.

Define your thread's purposes (5, Informative)

srealm (157581) | more than 5 years ago | (#20331189)

I've worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine).

The biggest problem I have seen is people don't know how to properly define their thread's purpose and requirements, and don't know how to decouple tasks that have in-built latency or avoid thread blocking (and locking).

For example, often in a high-performance network app, you will have some kind of multiplexor (or more than one) for your connections, so you don't have a thread per connection. But people often make the mistake of doing too much in the multiplexor's thread. The multiplexor should ideally only exist to be able to pull data off the socket, chop it up into packets that make sense, and hand it off to some kind of thread pool to do actual processing. Anything more and your multiplexor can't get back to retrieving the next bit of data fast enough.

Similarly, when moving data from a multiplexor to a thread pool, you should be a) moving in bulk (lock the queue once, not once per message), and b) using the Least Loaded pattern - where each thread in the pool has its OWN queue, and you move the entire batch of messages to the thread that is least loaded, and next time the multiplexor has another batch, it will move it to a different thread because IT is least loaded. Assuming your processing takes longer than the data takes to be split into packets (IT SHOULD!), then all your threads will still be busy, but there will be no lock contention between them, and only occasional lock contention ONCE when they get a new batch of messages to process.
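
A skeleton of that least-loaded batch handoff - one lock acquisition per batch rather than per message. Packet and the pool are placeholders; the pool is sized once at startup and never resized, since the mutex makes workers immovable:

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <iterator>
#include <mutex>
#include <vector>

struct Packet { /* one framed message */ };

struct Worker {
    std::mutex m;
    std::vector<Packet> queue;            // this worker's private queue
    std::atomic<std::size_t> load{0};     // approximate, readable without the lock
};

// Multiplexor side: hand the whole batch to the least-loaded worker.
void dispatch(std::vector<Worker>& pool, std::vector<Packet>& batch) {
    auto it = std::min_element(pool.begin(), pool.end(),
        [](const Worker& a, const Worker& b) { return a.load < b.load; });
    std::lock_guard<std::mutex> lk(it->m);   // the one lock, taken per batch
    it->queue.insert(it->queue.end(),
                     std::make_move_iterator(batch.begin()),
                     std::make_move_iterator(batch.end()));
    it->load = it->queue.size();
    batch.clear();
}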

Finally, decouple your I/O-bound processes. Make your I/O-bound things (e.g. reporting via socket back to some kind of stats/reporting system) happen in their own thread if they are allowed to block. And make sure your worker threads aren't waiting to give the I/O-bound thread data - in this case, a similar pattern to the above in reverse works well - where each thread PUSHING to the I/O-bound thread has its own queue, and your I/O-bound thread has its own queue, and when it is empty, it just collects the swaps from all the worker queues (or just the next one in a round-robin fashion), so the workers can put data onto those queues at their leisure again, without lock contention with each other.

Never underestimate the value of your memory - if you are doing something like reporting to a stats/reporting server via socket, you should implement some kind of Store and Forward system. This is both for integrity (if your app crashes, you still have the data to send), and so you don't blow your memory. This is also true if you are doing SQL inserts to an off-system database server - spool it out to local disk (local solid-state is even better!) and then just have a thread continually reading from disk and doing the inserts - in a thread not touched by anything else. And make sure your SAF uses *CYCLING FILES* that cycle on max size AND time - you don't want to keep appending to a file that can never be erased - and preferably, make that file a memory mapped file. Similarly, when sending data to your end-users, make sure you can overflow the data to disk so you don't have 3MB of data sitting in memory for a single client who happens to be too slow to take it fast enough.

And last thing, make sure you have architected things in a way that you can simply start up a new instance on another machine, and both machines can work IN TANDEM, allowing you to just throw hardware at the problem once you reach your hardware's limit. I've personally scaled up an app from about 20 machines to over 650 by ensuring the collector could handle multiple collections - and even making sure I could run multiple collectors side-by-side for when the data is too much for one collector to crunch.

I don't know of any papers on this, but this is my experience writing extremely high performance network apps :)

Re:Define your thread's purposes (0)

Anonymous Coward | more than 5 years ago | (#20333303)

How would the least loaded pattern compare to instead allowing threads to perform work-stealing on CAS-based data structures? Or do you instead combine the two approaches?

multiple live servers (0)

Anonymous Coward | more than 5 years ago | (#20331209)

If possible, try to have multiple live servers instead of fail-over. As Scott Courtney of VeriSign (who run the .COM DNS infrastructure) said in a talk recently:

What we prefer is all active equipment. I don't want spares sitting in racks, I don't want spare sites if I can help it. Active to everything. Nothing like production traffic to flush out an issue.


http://video.google.com/videoplay?docid=-5525246919548243924#26m20s [google.com]

Re:multiple live servers (0)

Anonymous Coward | more than 5 years ago | (#20331431)

There are a couple of problems with using all equipment all the time: if load-related failures or timeouts are an issue for your application, then the increased load after a single component failure can itself cause failure; and synchronizing a live view has both a response-time cost and a complexity cost (development time and uncaught bugs).

My suggestion to the article submitter would be to do a google search on the obvious terms ("high availability", "failover", "load balancing") and find out what the possibilities are, then figure out which approach best matches his requirements and resources.

Has anyone actually answered the question? (3, Insightful)

smackenzie (912024) | more than 5 years ago | (#20331235)

I see a lot of recommendations for various technologies, software packages, etc. -- but I don't think this addresses the original question.

What you are asking about, of course, is enterprise-grade software. This typically involves an n-tier solution with massive attention to the following:

- Redundancy.
- Scalability.
- Manageability.
- Flexibility.
- Securability.
- and about ten other "...abilities."

The classic n-tier solution, from top to bottom is:

- Presentation Tier.
- Business Tier.
- Data Tier.

All of these tiers can be made up of internal tiers. (For example, the Data Tier might have a Database and a Data Access / Caching Tier. Or the Presentation Tier can have a Presentation Logic Tier, then the Presentation GUI, etc.)

Anyway, my point is simply that there is a LOT to learn in each tier. I'd recommend hitting up good ol' Amazon with the search term "enterprise software" and buy a handful of well-received books that look interesting to you (and it will require a handful):

http://www.amazon.com/s/ref=nb_ss_gw/002-8545839-8925669?initialSearch=1&url=search-alias%3Daps&field-keywords=enterprise+software+ [amazon.com]

Hope this helps.

Re:Has anyone actually answered the question? (3, Informative)

lycono (173768) | more than 5 years ago | (#20332549)

The list of books in that search that are even remotely related to what the OP was asking is very short. I count 7 total results (of 48) in that list that _might_ be useful. Of which, only 1 actually sounds like it might be what the OP wants:
  • How To Succeed In The Enterprise Software Market (Hardcover): Useless, its about the industry, not about writing the software.
  • Scaling Software Agility: Best Practices for Large Enterprises (The Agile Software Development Series): Useless, it describes how to use agile practices in large enterprise level software teams.
  • Groupware, Workflow and Intranets: Reengineering the Enterprise with Collaborative Software (Paperback): Useless, it just describes how various, existing enterprise level software categories are supposed to work together.
  • Metrics-Driven Enterprise Software Development: Effectively Meeting Evolving Business Needs (Hardcover): Possibly useful, if he wants to know how to use metrics to help write the software. I have a feeling this doesn't give him techniques for writing enterprise software specifically though, which sounds more like what he wants.
  • SAP R/3 Enterprise Software: An Introduction: Useless, it's an SAP manual.
  • Essential Software Architecture (Hardcover): Possibly, can't tell from the description whether there is enough enterprise specific information to be useful.
  • Large-Scale Software Architecture: A Practical Guide using UML: Aha! Something that sounds like what the guy is asking for.
(Not that I'm saying the books are useless in general, I'm just not sure they're what this guy/girl is looking for.)

I think this guy/girl is looking for something along the lines of this comment [slashdot.org] but in "accepted" book format. It doesn't look like the search returns a "handful"....

Re:Has anyone actually answered the question? (1)

smackenzie (912024) | more than 5 years ago | (#20332907)

I can't believe that you didn't include the "Fisher Price Baby Bowling Set" (comes up on the third page)! What were you thinking?

Actually, I'm going to completely agree with you; bad original search term. Amazon usually does better (and I should have checked).

The search term "enterprise architecture" seems to produce better general results.
http://www.amazon.com/s/ref=nb_ss_gw/102-6220372-7109710?initialSearch=1&url=search-alias%3Daps&field-keywords=enterprise+architecture [amazon.com]

Statelessness (3, Interesting)

tweek (18111) | more than 5 years ago | (#20331559)

I don't know if anyone has mentioned it yet, but the key to a web application scaling horizontally is statelessness: it's much easier to throw another server behind the load balancer than to upgrade the capacity of one. I've never been a fan of sticky sessions myself. This requires a different approach to development in terms of memory space and the like, because with a horizontally scaled front tier you can't guarantee that a user's next request will hit the same server as the previous one. It costs a little more overhead, either in replicating session contents across all the application servers or on the database tier, because you persist everything to the database.

At least that's my opinion.
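A minimal sketch of that shared-store approach, assuming state is keyed by a token the client sends back on each request (the ConcurrentHashMap below is a single-JVM stand-in for what would really be a sessions table or shared cache; all names are illustrative):

<ecode>
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Keep no per-user state in any one app server's memory: every request
// carries a token, and the state behind it lives in a store all servers
// share, so the load balancer can route the next request anywhere.
class SharedSessionStore {
    // Stand-in for shared storage; in production this map would be a
    // database table or a distributed cache, NOT per-server memory.
    private static final Map<String, String> STORE = new ConcurrentHashMap<>();

    static String create(String sessionData) {
        String token = UUID.randomUUID().toString();
        STORE.put(token, sessionData); // e.g. INSERT INTO sessions ...
        return token;                  // handed to the client as a cookie
    }

    static String load(String token) {
        return STORE.get(token);       // e.g. SELECT data FROM sessions ...
    }
}
</ecode>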

Re:Statelessness (1)

sapgau (413511) | more than 5 years ago | (#20332913)

Agreed. I've read the same advice elsewhere: design your application for scalability from the start, no matter how small it is.

It's a much bigger headache to convert your webapp after the fact, for example from file-based sessions to sessions persisted in the DB. Of course, be careful not to serialize too much data into cookies either.

my $0.02

HA is an IT thing (1)

Ryan Amos (16972) | more than 5 years ago | (#20331767)

Developers need not worry about HA too much; your IT department should be able to set it up for you rather seamlessly. With things like LVS/Keepalived you can easily implement load balancing and auto-failover for databases, web servers, etc. (you don't even need to code in multiple DB servers; VRRP works wonders for this kind of thing). As long as the application is designed sanely to begin with, HA as typically discussed comes down to minimizing the impact of hardware failure by buying two of everything and making failover happen automatically (human response time is anywhere from 5 to 15 minutes in a best-case scenario, while the worst case for automatic failover is far shorter). Keepalived (http://keepalived.org/) is an excellent solution for something your size: it's basically an LVS frontend with host checking and automatic failover (via VRRP) plus custom host checks (e.g. run a typical SQL query every 3 seconds to verify everything is OK; if it isn't, remove the DB from the pool, do some other housekeeping and rebalance the cluster). It can all run off one IP on the front end, so the app won't notice what happened.
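For illustration, a stripped-down keepalived.conf along those lines; the addresses, ports and timings below are made-up values for a sketch, and a real setup would need tuning:

<ecode>
# Hypothetical keepalived.conf sketch: one virtual IP fails over via VRRP,
# and a TCP health check pulls a dead database out of the pool.

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.0.100          # the one IP the application ever sees
    }
}

virtual_server 192.168.0.100 3306 {
    delay_loop 3               # health-check interval, in seconds
    lb_algo rr
    lb_kind NAT
    protocol TCP

    real_server 192.168.0.11 3306 {
        TCP_CHECK {
            connect_timeout 3  # unreachable DB gets removed from the pool
            connect_port 3306
        }
    }
}
</ecode>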

Re:HA is an IT thing (2, Insightful)

dennypayne (908203) | more than 5 years ago | (#20332659)

The first two sentences here are one of my biggest pet peeves... if application developers don't start becoming more network-aware, and network engineers more application-aware, I think you're dead meat. Hint: there are very few applications these days that aren't accessed over the network. I see so many "silos" like this when I'm consulting: the network guys and the app guys have no idea what the "other side" does. If they actually worked together on these kinds of issues instead of talking past each other, something might actually get done.


So yes, developers absolutely need to worry about HA. It makes a difference whether your app is stateless or not, how the app is health-checked from the load balancer, how chatty the app is on the network, etc., etc.


Denny
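To make that health-check point concrete, here is a sketch of the kind of endpoint a load balancer could poll (the servlet name and the injected DataSource are assumptions of mine, not any standard beyond the basic servlet API):

<ecode>
import java.io.IOException;
import java.sql.Connection;
import java.sql.Statement;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

// Returns 200 only when the app can really reach its database, so the
// load balancer can pull a sick node out of the pool automatically.
public class HealthCheckServlet extends HttpServlet {
    private DataSource dataSource; // assumed to be wired up at startup

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        try (Connection c = dataSource.getConnection();
             Statement s = c.createStatement()) {
            s.executeQuery("SELECT 1"); // cheap proof of a DB round trip
            resp.setStatus(HttpServletResponse.SC_OK);
        } catch (Exception e) {
            // 503 tells the balancer: stop sending traffic here.
            resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}
</ecode>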

You are talking about two things (2, Informative)

ChrisA90278 (905188) | more than 5 years ago | (#20331771)

You are talking about two things: reliability and performance. And there are two ways to measure performance: latency (what one end user sees) and throughput (the number of transactions per unit time). You have to decide which to address.

You can address reliability and throughput by investing a LOT of money in hardware and using things like round-robin load balancing, clusters, mirrored DBMSes, RAID 5 and so on. Then losing a power supply or a disk drive means only degraded performance.

Latency is harder to address. You have to profile and collect good data. You may have to write test tools to measure parts of the system in isolation. You need to account for every millisecond before you can start shaving them off.

Of course, you could start with a quick look for obvious problems: poorly designed SQL databases, a lack of indexes on joined tables, and cgi-bin scripts that require a new process to be started each time they are called.
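On the indexing point, a tiny SQL illustration (the schema is made up): run EXPLAIN on a suspect join, and if the join column is unindexed, the plan will typically show a full scan that a single index fixes.

<ecode>
-- Hypothetical schema: orders joined to customers on customer_id.
EXPLAIN
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42;

-- If the plan shows a full scan of orders, index the join column:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
</ecode>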

Doing this currently (0)

Anonymous Coward | more than 5 years ago | (#20331805)

I just got hired at a company whose code is a mess, and I have to clean it up. The biggest mistakes I see causing this are unnormalized tables, business logic that isn't properly separated (neither in the DB nor in the code), MVC conventions not being followed, redundant code, and code/data bloat (i.e. loading more data and/or code than is needed to perform a given task).

That is a lot to wade through. My best suggestion is to go with a prebuilt MVC framework and do your best to move everything into it. The new Struts 2 is awesome if you know Java. Stay away from Ruby on Rails if you want scalability (even the Ruby on Rails site requires PHP to scale). If you are using PHP, PHPulse is the fastest framework out there but is lacking in documentation; comparable ones like Cake and Zend have loads of documentation but are more bloated and far slower.

Lots of Options (3, Interesting)

curmudgeon99 (1040054) | more than 5 years ago | (#20331897)

First of all, excellent question.

Second: ignore the ass above who said to dump Java. Modern HotSpot JVMs have made Java as fast as or faster than C/C++. That guy is not up to date.

Third: Since this is a web app, are you using an HttpSession/sendRedirect or just a page-to-page RequestDispatcher/forward? As much as it's a pain in the ass, use the RequestDispatcher.

Fourth: see what your queries are really doing by looking at the explain plan.

Fifth: add indexes wherever practical.

Sixth: Use AJAX wherever you can. The response time for an AJAX function is amazing, and it is really not that hard to do Basic AJAX [googlepages.com].

Seventh: Use JProbe to see where your application is spending its time. You should be bound by the database; anything else is not appropriate.

Eighth: Based on your findings with JProbe, make code changes to, perhaps, keep a frequently used object from the database in a class variable (static).

These are several ideas that you could try. The main thing that experience teaches is this: DON'T optimize and change your code UNTIL you have PROOF of where the slow parts are.
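As a sketch of that eighth point (the table and names are invented): cache a rarely changing lookup in a static field so it costs one query at first use instead of one per request. The double-checked locking below is the standard safe-publication idiom for lazy initialization.

<ecode>
import java.util.HashMap;
import java.util.Map;

// Lazily loads a rarely-changing lookup table once, then serves it from
// memory; 'volatile' plus double-checked locking makes the publication safe.
class CountryCodeCache {
    private static volatile Map<String, String> cache; // name -> ISO code

    static Map<String, String> get() {
        Map<String, String> local = cache;
        if (local == null) {
            synchronized (CountryCodeCache.class) {
                if (cache == null) {
                    cache = loadFromDatabase(); // one query, at first use
                }
                local = cache;
            }
        }
        return local;
    }

    private static Map<String, String> loadFromDatabase() {
        // Stand-in for a real JDBC query: SELECT name, iso FROM countries
        Map<String, String> m = new HashMap<>();
        m.put("Canada", "CA");
        m.put("Japan", "JP");
        return m;
    }
}
</ecode>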

Re:Lots of Options (0)

Anonymous Coward | more than 5 years ago | (#20332069)

Second: ignore the ass above who said dump Java. Modern hotspots have made Java as fast or faster than C/C++. The guy is not up to date.

Really?

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gpp&lang2=java [debian.org]
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gcc&lang2=java [debian.org]

Java has certainly made performance gains over the years, but it is most certainly not a replacement for C/C++ if you care about raw performance at all.

Re:Lots of Options (0)

Anonymous Coward | more than 5 years ago | (#20332447)

Java is just terrible; those who think otherwise are just kidding themselves.

You get Performance, Ease of Use, or Cost: pick two (0)

Anonymous Coward | more than 5 years ago | (#20333327)

Anyone who uses "Java" and "performance" in the same sentence as "fast" hasn't been around enough heavy systems to know what the hell they're talking about. You get Performance, Ease of Use, or Cost: pick two. It's an old rule, but it still applies today. The C/C++ folks are laughing their arses off at Java and performance. And anyone old enough to know assembler tries not to cringe too much when the Java or C/C++ folks talk about performance.

Disaster planning (1)

jhines (82154) | more than 5 years ago | (#20332301)

It sounds like you need to do some basic disaster planning. Think in terms of "what if this happens?"

What if you lose your data center? How good is your backup? Is it off-site? Do you have a tested plan for restoring the data and the system, on an interim basis, on someone else's hardware?

Then you can look at more specific things: what happens if I lose this server, this connection, this router, or specific services like DNS, email, etc.?

The big $$$ question depends on how much you have to lose. If you can afford a day of downtime, you don't have to spend as much effort on HA as, say, the NYSE or an airline.

High availability and scalability... (1)

pjr.cc (760528) | more than 5 years ago | (#20332667)

These really are two different things, though they do sometimes cross over; Oracle RAC is a good example of that.

As for where to read about them from a developer's perspective (the actual question, which a lot of the replies seem to have missed): there are TONNES of resources.

But split the question in two. Where can you read about HA?

Start here: http://en.wikipedia.org/wiki/High_availability [wikipedia.org]. There are also many books on the subject (one of the few I happened to like came out of the Sun Blueprints series). The problem with HA is that it's very subjective. You can talk about HA for, say, web applications and just cover session sharing and an intelligent load balancer (ironically, the same thing gives you scalability until you get to the DB), or you can go all the way down to fault-tolerant hardware. Also take a look at the whitepapers that came out of projects such as MOSIX, VAX clusters, Oracle HA (both RAC and Data Guard), IBM WebSphere (there's a lot on the various IBM sites about HA for all their products, and one is bound to be similar to yours in nature) and Sun J2EE. A lot of these go into development aspects as well and offer some fantastic concepts and paradigms to follow. But you really need to define your requirements for HA: zero downtime versus 30 minutes of downtime is a HUGE difference! (And that really is just one scenario among many; as a developer you're usually faced with multiple requirements.)
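To put rough numbers on that difference, the standard "nines" arithmetic (a worked example of my own, assuming an 8,760-hour year): 99.9% availability permits 0.001 × 8,760 ≈ 8.8 hours of downtime per year; 99.99% permits about 53 minutes; 99.999% about 5 minutes. Each extra nine generally costs far more than the previous one, which is why pinning down the requirement up front matters so much.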

Scalability is a different issue, and usually very application- and environment-dependent. Again, http://en.wikipedia.org/wiki/Scalability [wikipedia.org] is a good start, but finding general literature is often very hard because it's so dependent on the situation and the application.

Personally, I've found I learn best from example. Look at things like J2EE application servers (WebSphere, Sun JES, etc.), load balancing, Oracle RAC vs. Data Guard, MySQL NDB vs. replication vs. read-only replica setups, Apache, PHP, Samba, Windows (most things); pick just about any mainstream application and its documentation will almost certainly cover both HA and scalability at a level helpful to a developer.

If you want to get even more complex, take a look at the cooler forms of scalability and HA that involve utility computing (VMware DRS or Egenera, for example). Have a look at their design documents, because they offer even more diverse examples of both subjects at a more abstract layer (i.e. even below the OS and entirely in the hardware).

In both cases, it's hard to go from "we weren't thinking about HA or scalability when we built it" to "it's HA and scalable". HA tends to be a little easier, because clusters can wrap themselves around almost any situation, but scaling such a system usually means "I need bigger, faster and more CPUs, more memory and better disk until I can figure out how to code scaling into it".

Always keep in mind, though, that the law of diminishing returns almost always applies.

Hard stuff (1)

ldapboy (946366) | more than 5 years ago | (#20333147)

It's just hard to do, and I've never seen a good book on the subject. (In fact, I've considered writing one on and off for years, but sadly the $$ I can make as a consultant on performance, scalability and availability far exceeds the likely rewards from publishing a book.)

The best advice is to look at some open source projects that are used in highly scalable applications. The other thing I'd say is that there isn't one true technique; at this point everyone makes up their own solution as they go. Often the application's characteristics drive the scaling architecture, so each application is different.

A second or two? (1)

Bluesman (104513) | more than 5 years ago | (#20333247)

On modern hardware, on an internal network, "a second or two" is an eternity. Instead of worrying about what would happen if all 60,000 people used the app at once (unlikely), I'd find the bottlenecks you have now and fix those.

Prioritize. You already have statistics about typical usage and typical wait and service times. Fix the problem that exists, not the one that doesn't but might someday.
