
Ask Slashdot: Optimizing Apache/MySQL for a Production Environment

Cliff posted more than 15 years ago | from the tweak-it-till-it-bleeds dept.


treilly asks: "In the coming weeks, the startup company I work for will be rolling out a couple of Linux boxes as production webservers running Apache and MySQL. Management was quick to realize the benefits of Linux, but I was recently asked: 'Now that we're rolling out these servers, how do we optimize out-of-the-box Red Hat 6.0 machines as high-performance web and database servers in a hosting environment?'"


143 comments


Re:Optimisation of Apache/db (0)

Anonymous Coward | more than 15 years ago | (#1718610)

separate partition or separate hd?

Re:Optimisation of Apache/db (0)

Anonymous Coward | more than 15 years ago | (#1718611)

separate disks, controllers, partitions wherever possible.

Re:Optimisation of Apache/db (0)

Anonymous Coward | more than 15 years ago | (#1718612)

Hmmm... putting files on different controllers and disks but the same partition. Now that's a neat trick.

Re:Einstein of the year award (0)

Anonymous Coward | more than 15 years ago | (#1718613)

Uh, no dude. That's just somebody being funny. And like most funnicisms, there's a large grain of truth in it.

Re:Einstein of the year award (0)

Anonymous Coward | more than 15 years ago | (#1718614)

Did you check out his home page? He's a Ph.D student who lists "HTML" as a programming language.

Re:Stop being bitter about BSD losing (0)

Anonymous Coward | more than 15 years ago | (#1718615)

Thanks for your information-free post...naturally leading to *this* information-free post... injecting noise into discussions since 1993

Re:Here's what you do (0)

Anonymous Coward | more than 15 years ago | (#1718616)

This is very easy to say. I could just as easily say "Format your harddisk, install NT Server + 4 NICs". What hard data do you have to back this up? Any? Or are you another wannabe hacker blowing smoke?

Diff. boxes... (0)

Anonymous Coward | more than 15 years ago | (#1718617)

This tip isn't specifically about tuning, more about architecture: we recently had to restructure our service to ensure rock-solid reliability. Our best effort at this was to split the web serving from the MySQL engine from the actual DB. We came up with a structure that stuck two web servers on the front end, backed by two MySQL servers that talked to two replicated RAID arrays way on the back end. The whole point of this was to a) allow us to stick a load balancer on the back end, and b) ensure maximum performance for each component. It's not the cheapest way to do it, but it certainly kicks ass from a performance and reliability standpoint...

Caching (0)

Anonymous Coward | more than 15 years ago | (#1718618)

If you anticipate a lot of hits, you might want to precompute/cache most of your database access, if at all possible.

There are a couple of experimental systems around that do this kind of thing. For example prol [uct.ac.za] does aggressive caching (but is not db-backed). I once heard somebody say that Roxen [roxen.com] also does caching, but I have not investigated that.

Re:Here's what you do (0)

Anonymous Coward | more than 15 years ago | (#1718619)

Yeah, right. There's a reason FleaBSD is not as popular as Linux. It's spelled p-e-r-f-o-r-m-a-n-c-e.

high-performance/high-availability apache (0)

Anonymous Coward | more than 15 years ago | (#1718620)

Scaling up Apache is as easy as putting out a web server farm and setting up DNS round-robin entries for each web server. If you then add in the Fake [zipworld.com.au] package, it becomes trivial to create a high-availability environment.

Recompile the mySQL into apache? (0)

Anonymous Coward | more than 15 years ago | (#1718621)

Has anyone tried this?

Re:Einstein of the year award (0)

Anonymous Coward | more than 15 years ago | (#1718622)

The point being, the idiot that mistook an anonymous poster for Rob Malda was a Ph.D candidate... a slap in the face to the notion that "Dr." in front of your name guarantees an intellectual giant. Possibly this hit a little too close to home?

Flamebait? (0)

Anonymous Coward | more than 15 years ago | (#1718623)

I don't see why this comment got marked down... It just said that PHP based sites go together a lot quicker and easier than Perl CGI ones, which is true.

Re:Performace tuning (0)

Anonymous Coward | more than 15 years ago | (#1718624)

I forgot to add this:
One of the most expensive operations is the initial database connection. Use persistent connections wherever possible. (PHP is very nice for this.)
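
For example, in PHP 3 the difference is roughly this (a minimal sketch; host, user and database names are made up):

    <?php
    /* mysql_connect() opens a fresh connection on every request and
       pays the connection setup cost each time; mysql_pconnect()
       reuses a connection held open by this httpd child, if one exists */
    $db = mysql_pconnect("dbhost", "webuser", "secret");
    mysql_select_db("mydb", $db);
    ?>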

Jerry

Optimizing PostgreSQL and Apache.... (0)

Anonymous Coward | more than 15 years ago | (#1718625)

I've been using PostgreSQL and Apache on redhat for a while (works great, good perf and stability) and the most important performance measure was separating the cgi/static server from the database server. I wrote a little library object on the cgi server that made a network connection to the database server (running PostgreSQL and Apache/mod_perl with normal Pg connections) so that the Apache on the db server provided a natural connection pool. Send the SQL statement back, get the results forward, and ta-da--automatic load-balancing across two machines and connection pooling at the same time. I also thought about splitting database tables across two (or more) database servers, and having the cgi server's library object run the two searches in parallel (perl threading, ain't it great) and combine the two incoming results (doing sorting and such). Anyway, there are lots of cool things to be done.

Software RAID? (0)

Anonymous Coward | more than 15 years ago | (#1718626)

What kind of idiot uses Software RAID on a production box where they want performance. Sheesh

Is this ok? (0)

Anonymous Coward | more than 15 years ago | (#1718627)

Have each Apache child fork a cgi process written in C/C++ linked with "libmysqlclient.a" (included in MySQL) to query the DB. Once the cgi is working, debugged and memory leak free, and only if you need the performance, convert your cgi into an Apache module. Stephen schan_ca@rocketmail.com

Install Windows NT with SQL Server 7 (0)

Anonymous Coward | more than 15 years ago | (#1718628)

It's already been proven over and over again that Linux with Apache and MySQL makes for a very poor web server platform. Just look at how poorly slashdot.org stays up. Therefore you should install something proven to be fast such as Windows NT with SQL Server 7. HA!

Re:A favor please. (0)

Anonymous Coward | more than 15 years ago | (#1718629)

yeah right, that's the first thing on every startup's mind... welcome to the commercialization of linux pal, it's downhill from here.

Depending on if your site is read-only or not (0)

Anonymous Coward | more than 15 years ago | (#1718630)

If the site is read-only (static data, like a news site or something), then RAID is a waste of time and money.

IF (1) then read "Web Performance Tuning" by ORA (0)

Anonymous Coward | more than 15 years ago | (#1718631)

This book pretty much says it all and is much more detailed and investigative than the arguments here.

Re:Einstein of the year award (0)

Anonymous Coward | more than 15 years ago | (#1718632)

Logo kicks ass! What are you talking about? I challenge you to a duel of the logo. I am gonna PEN DOWN with you buddy!

Ultimate performance improvements (0)

Anonymous Coward | more than 15 years ago | (#1718633)

The best thing you can do is format the drive, and install NT. Much better performance for dynamically built HTML and database connections. Has been proven by many different benchmarks. For a database, try Oracle 8i.

SlashDot can't keep it up (0)

Anonymous Coward | more than 15 years ago | (#1718634)

SlashDot can keep it up about as long as a high school virgin. It's pretty damn sad. And when it is up, it's slow as shit. Why do people want to use Apache & MYSql again...?

Re:Optimisation of Apache/db (0)

Anonymous Coward | more than 15 years ago | (#1718635)

Okay, it's not a RAID stack, a stack implies that you want your data to come out in LIFO order. A RAID is, like the acronym stands for, an array.

AOLserver for example (now open source) (0)

Anonymous Coward | more than 15 years ago | (#1718636)

AOLserver [aolserver.com] has always had very good database integration. With v.3 now open source this is a very smart choice for database backed web sites. There are drivers for a number of databases (including postgres). Scripting is done with TCL or ADP pages (similar to PHP and ASP).

multiprocessor hardware? (0)

Anonymous Coward | more than 15 years ago | (#1718637)

[warning: questions from a linux newbie] I'm putting together requirements for new web/db hardware and would like to eventually move our production env to linux. How well does linux take advantage of multiprocessor hardware, or support multithreading in applications such as apache and other database packages? How does spawning a process differ from spawning a thread, performance-wise and reliability-wise? [fewer deadlocks vs. process startup overhead... etc?]

why redhat? (0)

Anonymous Coward | more than 15 years ago | (#1718638)

I'd say Red Hat isn't any better for high-performance, reliable servers.
Use Debian for general distributions, or look for a distro that's optimized for your purpose!
- from a debian user

Re:MySQL, ?? (0)

Anonymous Coward | more than 15 years ago | (#1718639)

> generally database interface in PHP3 is one of
> the most idiotic designs I ever seen

PHP does not have any general database interface, thus support for it cannot be bad.

PHPLIB's generic db layer is nice. Get it here [shonline.de]

It couldn't have anything to do with BANDWIDTH (0)

Anonymous Coward | more than 15 years ago | (#1718640)

It must be that crappy Apache web server that most of the Internet is using.

I'm sure the hardware manufacturers & retailers love hearing comments like that.

3 words... (0)

Anonymous Coward | more than 15 years ago | (#1718641)

hotmail dot com

Whiner (0)

Anonymous Coward | more than 15 years ago | (#1718642)

If it was a really good solution, you wouldn't have to complain about Bandwidth. Did you see windows2000test.com complaining about bandwidth? No!

Install NT (0)

Anonymous Coward | more than 15 years ago | (#1718643)

Linux doesn't work on multiprocessor hardware. You should install NT instead.

Re:3 words... (0)

Anonymous Coward | more than 15 years ago | (#1718644)

6 servers for support.microsoft.com, which serves 129000 visitors per day (2.3M page views on 100000 pages). Something tells me they are pretty heavy duty servers, as well. See http://www.microsoft.com/backstage/bkst_cs_supportonline.htm

Call me a whiner if you must. (0)

Anonymous Coward | more than 15 years ago | (#1718645)

But, you're right. They didn't blame it on bandwidth. But then, they do have to leave a marketing opportunity for a subsequent release of Windows.

"Windows 2003 will solve *all* of your stability problems"

That *must* be it (0)

Anonymous Coward | more than 15 years ago | (#1718646)

Probably why the busiest site on the internet (cdrom.com) runs Linux. Doh, wait, it runs FreeBSD. On *one* computer. Not 20 or 10. 1.

Re:That *must* be it (0)

Anonymous Coward | more than 15 years ago | (#1718647)

I actually have to apologize for being too harsh. Linux does have a lot of advantages in compatibility with high end hardware (scsi raid cards and such.)

Re:Well, here's what I know (for what it's worth) (0)

Anonymous Coward | more than 15 years ago | (#1718648)

>using perl and CGI is quite clumsy for this sort
>of thing. I eventually switched to PHP3 because
>everything goes together much faster.

Well, while PHP (or indeed any scripting language) reduces lead time, it makes it easier to make some elementary mistakes, such as not separating code, formatting and content properly, or not using good programming style. I'm not saying it's not possible to do it right, just that it doesn't seem very easy to do it right in PHP.

> world's slowest Web server hardware (the
> database server is a 486dx2-80

Now come on! Until quite recently, our web/db server was a 486dx2-50, but then we gave those nice people at DNUK.com a call...

Oh, and this is anonymous so that I get flamed to hell, but don't get to hear about it at work!

Re:Performance tips for Apache... (0)

Anonymous Coward | more than 15 years ago | (#1718649)

> Memory is pretty damn cheap

Currently SDRAM is getting more and more expensive...

Re:Einstein of the year award (0)

Anonymous Coward | more than 15 years ago | (#1718650)

You can write an emulator of a (memory limited) Turing machine in COBOL and LOGO. This qualifies them as "full" programming languages IMHO.

Perl compiles (0)

Anonymous Coward | more than 15 years ago | (#1718651)

> And before you start some "it's not real programming unless it's
> compiled" rant, tell it to a perl hacker...

Actually perl does a "run-time-compilation" before it executes something...

s/perl hacker/shell scripter/ for a better example.

Re:Optimizations (0)

Anonymous Coward | more than 15 years ago | (#1718652)

> Due to FreeBSD's proven track record for Web/Network performance, stability,
> and security (e.g. Yahoo, wcarchive, and others), it's a natural.

The best examples are missing: www.apache.org runs apache (haha!) on FreeBSD and don't forget the now MS-owned company that sends you so much spam...

Optimisation of Apache/db (1)

Anonymous Coward | more than 15 years ago | (#1718669)

Haven't done it with mySQL, but with Postgres/PHP/Apache I allocate 1Mb of RAM for each httpd child, and then enough left over for the entire db in RAM. I also run a separate server to feed images only, off a RAID stack. Make sure that the apache htdocs is on a separate partition and controller from the db files.

Watch mysqld's nice level. (1)

Anonymous Coward | more than 15 years ago | (#1718670)

My mail server was showing some strange performance problems under high loads which were caused by mysqld (used for authentication) running at nice +5. Apparently safe_mysqld did that behind my back. Under high loads mysqld would be put on the backburner.
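
A quick way to check for and undo this (a sketch; the exact ps output columns and paths vary by distribution):

    # see what nice level mysqld is actually running at
    ps -eo pid,ni,comm | grep mysqld

    # put it back at normal priority (as root)
    renice 0 -p `pidof mysqld`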

A favor please. (1)

Anonymous Coward | more than 15 years ago | (#1718671)

The Apache/MySQL/RH is a common combination that will continue to grow in popularity for small and medium size installations. Judging from the responses that I have read no-one has performed a serious analysis of the performance issues of this trio. Most of the suggestions are useful first-pass ideas but there are likely other specific tuning approaches that can be used.

I ask that you document your development specifically focusing on any novel solutions that you found that increased performance. A faster CPU doesn't count (sheesh). Also, put this info together in a concise readable format and provide it to the Apache site or the Linux Tuning site (forget the URL at this moment.) It's very important that work of this type be formally documented and accessible.

Performance tuning (1)

Anonymous Coward | more than 15 years ago | (#1718672)

You'll probably need to tune your configuration on several levels: hardware, OS, application and SQL/other.
Hardware Tuning:
- Use a caching RAID controller fully populated with cache, configured for RAID level 0+1.
- Use IBM or Seagate 10000 RPM SCSI drives with lots of cache.
- Consider multiple SCSI cards (or channels) to separate the OS + logs, indexes and data files onto separate RAID arrays.
- Also strongly consider using separate web and database servers, so each can be fully optimized for its job.
- Obviously use as much RAM as you can afford (preferably 100 or 133 MHz).
- Use multi-processor computers for the database and web servers.
- Connect the web and database servers using a back end network, separate from the internet connection.

OS Tuning:
I'm not terribly familiar with tuning the Linux OS, but I suspect that there are many resources already available.
In general you'll want to:
- Optimize the block size on your RAID arrays for maximum performance (trial and error using bonnie or the like; see the example below).
- Optimize the amount of memory used for cache.
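
A typical bonnie run for that trial-and-error looks something like this (the scratch directory and file size are just examples; use a test file larger than RAM so the cache doesn't hide the disks):

    bonnie -d /raid/scratch -s 512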

Application Tuning:
- Look at http://www.mysql.org/Manual_chapter/manual_Performance.html#Performance for a list of parameters to tune MySQL with. You'll either need to talk with TCX about using the available optimizations, consult with someone else who does, or spend a lot of time with trial and error, since these optimizations are very dependent on your hardware configuration and what type of work you'll be doing with the databases (i.e. write intensive, read intensive, or both).
- Look at http://www.apache.org/docs/misc/perf-tuning.html for information on tuning apache.
I would first suggest using PHP, but barring that, I would definitely use mod_perl. There are probably a lot of other sources for tuning apache available on the Internet.
A suggestion: tune the number of server children against the number of available processors.


SQL/other Tuning:
Understanding how to properly build tables and indexes is somewhat of an art, but you can really make or break the whole site with proper use of SQL and indexes. I'd either spend some time learning table/index design, and coding sql for performance, or consult someone who knows.

Hope this helps a little bit.

Jerry
jerry@bellnetworks.net

Check out the optimization tips page at apache.org (4)

Anonymous Coward | more than 15 years ago | (#1718673)

At the Apache.org web site there is a guide to optimize Apache's performance.

Also Dan Kegel wrote an interesting web page in response to the whole Mindcraft NT/IIS vs. Apache/Linux fiasco and on that page are several detailed measures to improve Apache's performance under Linux:

Dan Kegel's Mindcraft Redux page [kegel.com]
Apache Week 'zine [apacheweek.com]

...as for my own personal experience with Apache: I learned that when compiling Apache, removing any modules you won't be needing saves plenty of RAM. In httpd.conf you want to set StartServers, MaxClients, and MaxRequestsPerChild so that Apache does not spawn new children too often. The trick: before you start Apache, look at "top" and count the number of processes; then start Apache under normal traffic conditions and count again to see how many httpd children are running. Whatever that number is, add 10, and that should be your StartServers setting. The MaxRequestsPerChild default is 30, but I like to crank it up to 300 or more so that httpd children are not being killed and recreated too often. (The reason for that setting was to avoid possible memory leaks from sucking up all your RAM, which hasn't been a problem with the httpd's I've worked with.)
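
In httpd.conf terms that works out to something like this (values purely illustrative -- derive your own from the process counting described above):

    StartServers 40
    MaxSpareServers 60
    MaxClients 150
    # default is 30; raise it so children aren't killed and respawned constantly
    MaxRequestsPerChild 300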

Here's how I handle several million hits per day.. (5)

Anonymous Coward | more than 15 years ago | (#1718674)

Basically, I just winged it. My site started out with maybe ten thousand hits per day, but quickly (over the course of two years) ramped up to about 5 million hits a day. I just hacked together some Perl scripts, and when I need to make changes, I just try 'em out on the production server. Who needs beta testing? If there are performance problems, I just buy faster hardware. If there are stability problems, people are understanding, after all, I *am* using Linux.

Sincerely,

Rob Malda

Re:General purpose advice (2)

Tom Rothamel (16) | more than 15 years ago | (#1718675)

Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.

Not only that, turn them off. (AllowOverrides None, IIRC) If you simply don't use them but have them enabled anyway, you pay the price WRT all the stat(2) calls the server does looking for them.

This is all IIRC, but I usually have a good memory. Then again, I did just wake up.
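
The directive is actually AllowOverride (no trailing s), set per directory; a minimal sketch for a typical document root (the path is an example):

    <Directory /home/httpd/html>
        Options FollowSymLinks
        AllowOverride None
    </Directory>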

Re:MySQL vs PostgreSQL (2)

Ranger Rick (197) | more than 15 years ago | (#1718676)

Basically, it comes down to: Postgres is much more complete (it has more of the SQL spec implemented -- transactions, etc.), while MySQL is much faster. It all comes down to how you expect to use it. If you are going to be doing complex joins and transactions and such, MySQL probably won't cut it (yet); otherwise, MySQL (most definitely!) makes up in speed what it lacks in features.

There's obviously more to it than that, but I'm not aware of any specific comparisons...

fhttpd (1)

Alex Belits (437) | more than 15 years ago | (#1718677)

---plug alert---plug alert---plug alert---

My fhttpd [fhttpd.org] in combination with MySQL and PHP can be considered, too -- it allows some configuration options and optimizations that Apache doesn't provide -- you can limit the number of connections to the database, use separate userids for sets of scripts, etc. If you want even more performance, a program in C or C++ can be written as its module, and the API [fhttpd.org] is much easier to use than Apache's.

Clarification of LIKE vs. == (2)

rodent (550) | more than 15 years ago | (#1718678)

By LIKE I was referring to full substring matching using LIKE "%foo%". "foo%" will use the index but "%foo%" won't, and on my server running four %...% queries at once would bring it to a crawl. I finally had to go with exporting the table to a flat file after updates and use an awk script to search. It can handle 40 concurrent searches with awk.
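
In SQL terms the difference is roughly this (table and column names are made up; assumes an index on name):

    -- prefix match: MySQL can use the index on name
    SELECT id FROM products WHERE name LIKE 'foo%';
    -- leading wildcard: forces a full table scan
    SELECT id FROM products WHERE name LIKE '%foo%';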

RAM & RAID 1+0 is your friend. (3)

rodent (550) | more than 15 years ago | (#1718679)

Personally, I designed and currently admin a site that gets about 1 mil hits/day. Over a weeks time it averages about 50 queries/second with peaks at 500 queries/sec. The setup is dual p2/400's with 512 megs ECC (soon to be a gig) and the db's on a lvd scsi drive. The db's run a total of 2 gigs. There's typically 50 apache processes running at a time.

As for optimization, definitely check your queries and always use keyed fields and == queries. Doing LIKE queries will kill your performance to the point of being unusable on decently large tables (>100k records). Definitely read the MySQL docs concerning RAM usage and the various switches to optimize its RAM usage. That is extremely important.

As for Apache, avoid .htaccess at all costs and only compile in required modules. Also check the tuning FAQ mentioned above.

Idiot of the year award (2)

Eric Green (627) | more than 15 years ago | (#1718680)

"HTML Programmer", eh? Talk about a skill that will be obsolete in 20 years, when we're all using XML and have WYSIWYG XML editors...

BTW, programmers write programs, not text. So "HTML Programmer" is a misnomer in the first place -- that should be "HTML page creator".

-E

Perfomance tuning and Availability (1)

tuttle (973) | more than 15 years ago | (#1718682)

You should consider availability in your plan. I recommend having a hot spare box. This is quite easy to do for the web server side: just make sure the content on the two servers is synced up, perhaps using rsync (or rsync through ssh for a little more security).
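
For the web content that can be as simple as (paths and host name are examples):

    rsync -az --delete -e ssh /home/httpd/html/ spare:/home/httpd/html/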

On the database side there's a feature of mysqld called --log-update. Call it using mysqld --log-update=/usr/mysql/update_logs/update. This will create a log of everything that changes in your DB which can be reinserted back into the mysql monitor. To go along with this, every time you flush the logs a new update file will be created as update.# -- where # increases for each call. At this point there are quite a few scripts written to insert this log file into another DB -- most of them use Perl DBI.
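
Sketched out, the update-log dance looks roughly like this (paths and db name are made up; exact commands may differ slightly between MySQL versions):

    # on the master: log every change so it can be replayed elsewhere
    safe_mysqld --log-update=/usr/mysql/update_logs/update &

    # rotate to a new numbered log (update.1, update.2, ...)
    mysqladmin flush-logs

    # on the spare: replay a finished log into the same database
    mysql mydb < /usr/mysql/update_logs/update.1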

To increase the performance of your setup there are several options noted in the mysql manual. But none of them will do a whole lot of good if the queries and tables you construct are poorly designed and indexed.

Depending on what scripting language you use there's probably a way to compile it into apache, whether it be mod_perl, pyapache or PHP. I would plan on doing this. A good way to speed up your system after this is to run 2 httpd servers.

For the first server, compile plain apache with mod_rewrite and proxy support; for the second server, compile in your application support. If you put all your applications in one directory you can easily proxy to them with ProxyPass.

ex. proxypass /perl server:88/perl

This way things like images and html will be served by a webserver that only takes up 400-500K instead of one that could take up to 10M-20M depending on how many scripts and libraries are in memory (of course some of that is shared). When your server gets hit hard you'll probably notice you end up with maybe 5-10 times as many regular servers as application servers this way.
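
A fuller sketch of the front-end side (host name and port are examples; assumes mod_proxy is compiled into the front-end server):

    # front-end httpd.conf: pared-down static server, proxies dynamic stuff
    ProxyPass /perl/ http://appserver:88/perl/
    ProxyPassReverse /perl/ http://appserver:88/perl/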

A more advanced thing with this setup is to utilize your backup server. It will take a little work, but you could have apache proxy to a list of application servers that exist in a config file, and have this config file altered based on system availability. At this point though it may just be easier to get a LocalDirector, unless your organization is really strapped for cash.

Re:Einstein of the year award (1)

jd (1658) | more than 15 years ago | (#1718685)

So? I've known people who consider LOGO, or even COBOL a programming language.

Tuning webservers (3)

jd (1658) | more than 15 years ago | (#1718686)

Here's my quick list of things I do, when tuning the webservers I've set up in the past. Note: I offer NO guarantees to the usefulness of this information. For all I know, it'll turn your pet hamster into a frog.

0) If you have LOTS of RAM, compile Apache, MySQL and optionally Squid with EGCS+PGCC at -O6. The extra speed helps.

1) Guesstimate the number of simultaneous connections I'm likely to have.

2) Guesstimate how much of the data is going to be dynamic, and how much static.

3) IF (static > dynamic) THEN install Squid and configure it as an accelerator on the same machine. Give most of the memory over to Squid, and configure a minimal number of httpd servers. You'll only need them for accesses of new data, or data that's expired from the cache.

4) IF (static < dynamic) THEN ...

5) If you've plenty of spare memory after all of this, compile the kernel with EGCS+PGCC at -O6, but check its reliability. It's not really designed for such heavy optimisation, but if it works ok, the speed will come in handy.

NOTE: Ramping up the compiler optimiser flag to -O6 does improve performance, but it also costs memory. If you've the RAM to spare, it is sometimes worth it.
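
For step 3, the accelerator setup in squid.conf is roughly this (Squid 2-era directives, port numbers illustrative; Apache is moved to an internal port and Squid answers on 80):

    http_port 80
    httpd_accel_host 127.0.0.1
    httpd_accel_port 8080
    httpd_accel_with_proxy off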

Re:Here's how I handle several million hits per da (1)

dclatfel (2737) | more than 15 years ago | (#1718687)

Is this for real? Rob Malda posting as an Anonymous Coward? What's up with that Rob?

Re:Performance tips for Apache... (1)

Oestergaard (3005) | more than 15 years ago | (#1718689)

Mount the /home/httpd filesystem with the noatime mount option, then there will be no writes generated from read-accesses, and your ext2fs will effectively work as a ram-disk, if you have enough memory. Only, it's a helluwalot easier to just change stuff where it resides, and know it will be written to disk, instead of back-forth-copying tar images...
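
The noatime bit is just a mount flag in fstab, e.g. (device and mount point are examples):

    /dev/sda5   /home/httpd   ext2   defaults,noatime   1 2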

Second benefit: if you really get low on memory, you're fucked with a 1 Gig RAM disk, whereas the disk-cache will quickly be thrown away and used for whatever memory hog you have running.

ram-disks are good for booting over-modularized kernels _only_.

Re:Software RAID? (1)

Oestergaard (3005) | more than 15 years ago | (#1718690)

Software RAID is usually faster than hardware RAID.

It's also more flexible (think RAID of RAID of network block-devices).

It's also cheaper.

It has features that some hardware controllers don't even have (like background initialization).

What kind of idiot talks about software RAID without knowing jack about it ? :)

Re:Idiot of the year award (1)

drewpt (3975) | more than 15 years ago | (#1718691)

In 20 years from now, there will be something else that we're using. XML won't be it.

Re:Software RAID? (1)

James Manning (4620) | more than 15 years ago | (#1718696)

Ya know, I truly find it hilarious that people believe hardware raid has some huge benefit...

I've been using software raid, both over normal SCSI and hardware raid, in production servers for quite some time... hardware raid, even fast ones like DAC1164P, are going to get smoked in something like RAID5...

Think about it... do you want to do 64k XOR's 32-bits at a time on a single strongarm 233 (at best) or half/full cache lines at a time using SIMD on an SMP?

The best approach so far has been to allow hardware to handle raid0 for simple striping and disk management, and leave the XOR's and large chunks done in your main processors (after all, this is all streaming, so prefetching helps a good bit) if you can afford the cycles.

Refer to linux-raid archives and my performance postings there with any questions

Re:Depending on if your site is read-only or not (1)

James Manning (4620) | more than 15 years ago | (#1718697)

Nothing could be further from correct.

If read-only, raid 1+0 allows striping reads across all physical drives, so you get the performance benefit of raid0 with the mirroring (and drive death survival) of raid1. If you don't care about data redundancy (you might want to care about making sure your site is available), a pure raid0 will still get data off drives faster than a single drive.

Of course, I'm a strong software raid advocate, with a switch to hardware when it's cheaper to offload those cycles to other chips rather than speed up (or increase the number of) main processor(s).

Sites that have a lot of writes, OTOH, have to balance data amount available vs. performance (etc) wrt raid1, 5, or 10.

Re:RAM & RAID 1+0 is your friend. (1)

James Manning (4620) | more than 15 years ago | (#1718698)

Was it an indexed field? If so, the search should have been binary on == and possibly binary on LIKE (not sure though)... Any int fields you search on should definitely be indexed :)

Re:RAID! (2)

James Manning (4620) | more than 15 years ago | (#1718699)

First, PLEASE don't point people to that horrible howto... the real software raid code (and howto) -- which Linus has yet to accept into the mainline kernel -- is available over at:

http://metalab.unc.edu/pub/Linux/kernel.org/pub/linux/daemons/raid/alpha/

and

http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

Second, realize 0+1 (typically 1+0, or RAID 10) only gives you half of total physical space in effective space.... sometimes you can afford that, sometimes you can't... and you still generate the scsi bus loads of the full drive set :)

In the very typical (especially in these situations) case of reading the databases, it's worth agreeing that 1+0 becomes 0+0 (since you can split reads across a raid1, assuming no failed drives)

Last, as a side note to the mysql part, try to use isamchk (if the db server can have any down time) for pre-sorting your database instead of doing the sorting as part of your SQL

All the money in the world (1)

Sixty4Bit (6131) | more than 15 years ago | (#1718700)

We have successfully covered the topic if you are going to try and use fewer dollars. But if money were no object would you consider other technologies besides Apache/PHP/MySQL/Linux?

Such as Netscape/Cold Fusion/Oracle/Sun?

Besides not being able to call on the experience of all of you guys when the going gets tough, what are the other drawbacks besides the obvious (MONEY)?

MySQL is not a solution for me. It lacks many features that Sybase or Oracle provide (can you say TRANSACTIONS?). Netscape and Cold Fusion have better integration of security. Has a benchmark been done on PHP vs Cold Fusion? PHP seems to be able to handle Cold Fusion's role pretty well according to PHP's site.

Is the answer truly a mish-mash of both? Pay for Netscape for the SSL and Oracle for the STUD (I still like Sybase better) of a database that it really is, but go freeware where you can?

Just looking for a couple of good opinions.

a couple ideas (3)

felix (7014) | more than 15 years ago | (#1718701)

So some of the best things you can do have already been mentioned - split out your database from your front end webservers, let the backend have its own machine, and run raid 0+1 on the db server. The frontends won't need the raid since they'll be serving a lot of the static stuff out of cache.

Some other ideas are to split image serving onto its own apache, not necessarily its own box. This apache can be completely pared down to absolute minimum modules, since all it will be doing is serving up static images. It also lets cache be used efficiently, since mostly the common images will be stored, as opposed to common images contending with common text files for cache space if images and content are served from the same apache.

Also, what are you using in apache to create dynamic pages and connect to the db? Use long running processes where possible, which means pick mod_perl, php, fastCGI, servlets, etc... over plain cgi scripts. This will save you lots of cycles and also let you have persistent db connections. Always a very good thing.

Taking the splitting out of machines to the next level, you could also try splitting all of your dynamic content to its own machine, mod_proxied through your front end apaches. This makes the front ends very small since they barely need any modules installed at all. It also gets some extra performance out of your dynamic content apaches. Of course you're running a lot of boxes now. :)

Read this [apache.org] if you're running mod_perl. And read this [mysql.org] to optimize your db.

Top 10 Cheap Failover Strategies (0)

Fudge.Org (7036) | more than 15 years ago | (#1718702)

From the home office in RTP, NC:
  1. if you use the word db in any web application, remember that the web server is merely acting like one very voracious user acting on behalf of lots of web site visitors
  2. use separate hardware for the web server and for the database - resist the temptation to keep them on the same machine for $$$ reasons
  3. along these same lines... when available, add more db servers that have replicated db's and tables (using mysqldump and the mysql client), and put OR statements around your MYSQL_CONNECT on the web pages you write to allow them to fail over to the db's that you have available when your db servers time out or exhaust the number of connections they can field (see the sketch after this list)
  4. render as much static content as possible using pre-processors vs. having it "on the fly".
  5. recompile everything and look at the pages that cover the flags and considerations at compile time
  6. never never allow a user to "build" something that remotely looks like SQL in the URI and passed arguments to a CGI or application... if you let them stick malicious JOIN's into the MYSQL_QUERY your db will likely be choked over time
  7. if you have RAM to use, put as much as possible on the web server, and if you have fast processors give them to the db before you give them to the web server
  8. always put LIMIT into your mysql apps if you can get away with it
  9. write init scripts that will stop and start your db servers if you experience peak loads
  10. always put a "nice" message in the die statement for your sql connection... something other than "cannot connect to database"... make it something more flowery :)
I am sure there are more but that's all I could think of right now. :)
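
A rough sketch of the failover idea in point 3, in PHP (host names and credentials are made up; MYSQL_CONNECT here is just the ordinary mysql_connect() call your pages already make, and the die message covers point 10 too):

    <?php
    /* try the primary db server first, fall back to the replica,
       and die with a friendly message if both are unreachable */
    $db = @mysql_connect("db1", "webuser", "secret")
       or $db = @mysql_connect("db2", "webuser", "secret")
       or die("The database is taking a short nap -- please try again in a minute.");
    mysql_select_db("site", $db);
    ?>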

"You cannot uncook Mushoo pork once is has been cooked" -- wiseman

Linux and Databases (2)

doomy (7461) | more than 15 years ago | (#1718703)

I work in a research lab that does a lot of databases on Linux. We started off with msql and then graduated to mysql. We were initially running redhat with msql and slowly moved to Debian, since we felt it was a more stable server distribution. Also it was more configurable, and we were able to tweak almost anything in the system to its limit. Recently, we moved to Oracle 8i, but we kept our mysql around.

Some of the things you might need to know: if you're going to do some serious databases, I recommend you spend more money on faster hard disks (SCSI preferable, multiple disks -- oracle runs very nicely with the database spanned over 3-4 disks and the program running on another disk; partitions won't do). Have a generous amount of RAM and swap. If you're making this a database box, don't use it for anything else. Even hosting a web server is not a good idea (as far as I'm concerned). Use WebDB if you like and host the database box separately with just the database running as the main application.

Make sure you have a stable kernel. Make sure you have a secure system. Use ipchains to block out anything but local and remove all telnet and other daemons. Security is something a lot of people forget when making large databases.
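
For the ipchains part, a minimal sketch might look like this (the addresses and the MySQL port are examples; adjust to your own network):

    # let the local web servers reach MySQL, drop everyone else
    ipchains -A input -p tcp -s 192.168.1.0/24 -d 0/0 3306 -j ACCEPT
    ipchains -A input -p tcp -d 0/0 3306 -j DENY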

Make sure you make daily, if not hourly, backups (based on how sensitive your data is). RAID is a good way to keep your system running. Also, if your database is web based, you might need to have 2 or 3 boxes set up identically with database queries distributed over all of them.

With Oracle, read everything; they have a lot of tweaks listed in their pdf files and documents that come with the dist. Read all of them. Some tweaks are to the kernel, so pick a good stable kernel and stick to it. Forget about monthly kernel upgrades -- I recommend yearly or every-6-months kernel upgrades. Software wise, if you're doing Oracle 8i, make sure it's a glibc2.1 system (RH6 or debian potato -- we use potato, even though it's unstable, because it lets us tweak the system and gives us the most familiar interface).

On mysql, it might help to read some of the online tweaks. Also it might be a good idea to compile the server yourself, instead of using the one that came with your dist, or compile it and copy it over what came with your distribution. Don't use msql unless there is no other way to do it.


And good luck.
--

Re:RAM & RAID 1+0 is your friend. (2)

gampid (8492) | more than 15 years ago | (#1718704)

Not to be heretical or anything but I was doing some benchmarking on MySQL's LIKE versus == matching on int's. It was actually faster using LIKE. I don't know why but I suspect it's because LIKE uses some sort of binary tree to find the int and the == tries to walk through them. This is not the case when you're using like to match a string or substring, in that case == seemed to work better.

-Evan

Re:mulitprocessor hardware? (1)

kuro5hin (8501) | more than 15 years ago | (#1718705)

Apache was written with reliability as a higher priority than raw speed, hence it's multiprocess rather than multithreaded. Threaded servers will tend to be faster, because there is a definite overhead for starting new server processes rather than starting a new thread, but on the other hand, it's way way harder to write a solid multithreaded server, so if your threaded server has a problem, the likelihood is the failing thread will take out the rest of the server. If an Apache process fails, it'll just quietly die, while the server will continue serving.

As for multiprocessor hardware, Linux works just fine for me. I'm writing this on a dual P3, and my other workstation is a dual PPro. I haven't tried it on boxes with > 2 processors though. For a web server, more processors are unlikely to get you any benefit, however. I'm pretty sure that apache on a single processor will easily saturate your network bandwidth, no matter what it is. Now if you're doing really complex CGI's, like, for example, some kind of real-time stock calculations, that require a lot of processing, then multiple processors might help. But if this is the case, I'd probably advocate hooking up several boxes in parallel (Mosix [huji.ac.il] is designed for this) and farming your CGI's out to idle processors on separate machines. Your Database might also benefit from multiple processors, but (for a properly indexed DB) probably only in extremely liminal cases (very, very large DB's), and if so, you should have it on a separate machine too. In general, spend the extra money on RAM instead of another processor. Your clients will thank you :-)

Re:MySQL, ?? (2)

kuro5hin (8501) | more than 15 years ago | (#1718706)

Depends on what you want your DB to do, really. I can't speak specifically to syBase (I've also heard good things about it) but I know why we use mySQL. It's fast, and very low overhead for queries.

The caveat to this, of course, is that you must know how to set up your database right. I recently had an opportunity to play around with a fairly large db (upwards of 400,000 records) on mySQL. The records represent people, and some of the fields are birth month, birth date, last name and first name. I wanted to select last and first names for people who were born today. So, with no indexes, the query selected about 600 records, and took 11.8 seconds. Yes, that's right, 11.8 seconds. I was floored! Here's me thinking "mySQL's fast! It'll work great!" Well.

So then I went back through and indexed (birth month, birth date), checked that I had done it right with EXPLAIN, and ran the exact same query again. This time it took 0.8 seconds. A total time savings of 11 seconds. I learned an important lesson that day... Always index everything you're going to use as a key! With this in mind, mySQL is indeed damn fast, and low overhead.
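
Roughly what that looks like (table and column names are illustrative; the index columns match the WHERE clause):

    ALTER TABLE people ADD INDEX birthday_idx (birth_month, birth_date);

    EXPLAIN SELECT last_name, first_name
      FROM people
     WHERE birth_month = 10 AND birth_date = 14;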

Now, the other thing I can't really speak to is reliability. mySQL doesn't really support referential integrity, and I guess it's up to you whether you need it or not. I've seen my share of M$-trained database folks who use CASCADE as a cheap crutch to paper over their bad code. Rather than write queries that do what they really want them to do, they just spend the extra overhead to have CASCADEs do it for them. I've also seen times where this was crucial to a db's function. Either way, it's something to consider. I've also never seen mySQL handle failure, or had to rebuild it after one. Whatever you use, your strategy should account for this possibility, in any case.

Does it have to be Apache? (1)

Rozzin (9910) | more than 15 years ago | (#1718710)

What about other (free) HTTP servers?

MySQL, ?? (1)

warmi (13527) | more than 15 years ago | (#1718711)

While it is acceptable for some minor stuff, there is no point in using it once you hit something larger. I wonder why people don't use Sybase - it is free for production, a very nice database that can handle a lot of stuff. Extremely easy to use and program - there is excellent PHP3 support (well, as good as PHP3 offers - generally the database interface in PHP3 is one of the most idiotic designs I've ever seen).

Re:Performance tips for Apache... (2)

fdicostanzo (14394) | more than 15 years ago | (#1718712)

With linux caching, this isn't really necessary. With enough memory, the whole thing will be in memory anyway.

General purpose advice (1)

sammy baby (14909) | more than 15 years ago | (#1718713)

By way of general-purpose web-db tuning advice:
  • There's no way to have a well tuned system here if the db isn't well tuned, especially as the db grows in size. Make sure that all of your queries are as efficient as possible. Check to make sure that queries are against indexed columns whenever possible. Use the "explain" feature of the server to check the complexity of the queries you're passing.
  • Compared to static page delivery, just about any parsed HTML is Evil and Bad for your performance. Limit db lookups to pages that truly need it. IIRC, Slashdot handles this problem by having the front page re-generated by a cron job every so often. Once created by the script, it's just a plain ol' static page. (See the crontab sketch below.)
  • Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.
At this stage, you've probably heard all this advice before, but one repetition of the obvious never hurt anyone. Hope it's helpful.
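
The cron-regenerated front page from the second bullet might look something like this in a crontab (the script path and output location are invented):

    # rebuild the static front page every ten minutes instead of on every hit
    */10 * * * *  /usr/local/bin/build-frontpage.pl > /home/httpd/html/index.html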

Re:General purpose advice (1)

sammy baby (14909) | more than 15 years ago | (#1718714)

> Eliminate the use of directory overrides (via .htaccess) wherever possible. They're usually not worth it.

> Not only that, turn them off. (AllowOverrides None, IIRC)

Er. Yeah. That's what I meant. :)

Re:Major Performance Boost (1)

kinkie (15482) | more than 15 years ago | (#1718715)

Or just change webserver.
Some (I use Roxen Challenger [roxen.com] ) use a single-process approach.
They "compile" and then embed your scripts into the main process, and so you save time because you don't need to fire up the interpreter.
Also, because of the long-lived, single-process approach, you can share the DB connections among your scripts, and most of all cache

Re:Idiot of the year award (1)

GnuGrendel (16068) | more than 15 years ago | (#1718717)

I'm sorry, what do you write your programs in? Do you just type in the hex codes for the binary executable?

If not, you're probably writing it in text. And before you start some "it's not real programming unless it's compiled" rant, tell it to a perl hacker...

Major Performance Boost (4)

wintahmoot (17043) | more than 15 years ago | (#1718718)

I haven't read all of the previous comments, so it may well be that this has been posted before.

Okay, this is how I generally do it. First of all, I suppose that you're using Perl, so these tips are for a Perl/Apache/MySQL environment.

1) Use mod_perl so that your script doesn't need a whole perl interpreter for each separate instance in memory. The performance boost is just incredible...

2) Use Apache::DBI. It will prevent your script from connecting and disconnecting your DB each time it's called and rather use a persistent database connection. Great for performance.
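
In httpd.conf this usually amounts to just loading it before anything that uses DBI (a sketch, assuming mod_perl is already compiled in):

    # load Apache::DBI first so later DBI->connect calls get cached handles
    PerlModule Apache::DBI
    PerlModule Apache::Registry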

There are some other tweaks that you can do. If you're interested, just let me know [mailto] ...

Wintermute

Ready-made solutions (3)

rde (17364) | more than 15 years ago | (#1718719)

There are ready-made solutions out there such as E-smith [e-smith.net] ; you can download a cd image (or even buy the cd), and it'll install the system with extras built in; it's designed to be an 'out-of-the-box' sorta thing.

Oh wow!! Rob actually posted! (1)

Corndog (18960) | more than 15 years ago | (#1718720)

Yeah, sure.. damn stupid retard spammers.

"Only one thing is for sure in the Universe: me"
--Corndog

A separate server for the database? (1)

RyanGWU82 (19872) | more than 15 years ago | (#1718723)

I've got an application with a fairly small number of web pages, a number of which require very simple database access. I expect the database access to be fairly minor (it's just a couple of small sections of the website). We've decided to use Red Hat 6.0, Apache, PHP3 and MySQL.

Is it faster to put Apache and MySQL on separate Linux boxes, connected via 100Base-T? What sort of performance hit would we get if we put it all on one box? What about one box with double the RAM? Thanks in advance for your help.

Ryan

Re:3 words... (1)

platinum (20276) | more than 15 years ago | (#1718724)

And how many NT servers do they need to handle the load?

Optimizations (4)

platinum (20276) | more than 15 years ago | (#1718729)

First of all: IMO, if you have to ask how to optimize your company's equipment in a forum such as this, you need some real help (perhaps of the mental variety). There are a plethora of web sites on optimizing systems. OTOH, I might as well share our experiences.

Our company uses Apache, MySQL, and PHP extensively (and exclusively). You can't beat the price/performance ($0.00 / excellent == great value). Through our research, we settled on the following combination:
  • Web Server: FreeBSD 3.2-STABLE with Apache 1.3.9 / PHP 3.0.9 on a PII-400 w/128 Meg RAM, IBM 4.55G U2W Drive. Due to FreeBSD's proven track record for Web/Network performance, stability, and security (e.g. Yahoo [yahoo.com] , wcarchive [cdrom.com] , and others), it's a natural.
  • SQL Server: Linux 2.2.x with MySQL 3.22.25 on a PII-400 w/256 Meg RAM, IBM 4.55G U2W System Drive and a Mylex AcceleRAID 250 w/4 IBM 4.55G U2W Drives in a RAID-5 configuration. Linux was the obvious choice when considering MySQL performance and driver availability wrt RAID controllers.
Optimization suggestions:
  • Apache: Ensure you have adequate spare servers to handle the connections (StartServers, MaxSpareServers, MaxClients, and MaxRequestsPerChild in the config); nothing sucks more than clients not being able to connect. Also, if you are using embedded script of some sort (PHP, Perl, etc.), use modules compiled into Apache (mod_perl, etc.); this should significantly increase speed and decrease the overhead of reloading the module for each access.
  • MySQL: Tweak the applicable settings as appropriate. We increased (usually doubled, in most cases) the following: Join Buffer, Key Buffer, Max Connections, Max Join Size, Max Sort Length, and Sort Buffer (see the sketch below). If possible, depending on the amount of data, get as much memory in the system as possible. If the OS can keep frequently used data cached, disk access won't be required, which significantly increases the speed of queries, etc. In addition, get rid of that pre-compiled MySQL and compile it yourself. If possible, optimize using egcs/pgcc for your platform. Also, compile mysqld statically; this will increase its memory overhead a bit but can increase its speed by 5 - 10% by not using shared libraries.
  • Storage: For optimum speed, use SCSI (of course). For our data, we require RAID 5 for redundancy. If that is not required, RAID 0 (striping) can be used for increased speed. The optimal way is to use hardware RAID (external RAID or RAID controller). Luckily, Linux has drivers for quite a few different RAID controllers that are available for a reasonable price.
  • Linux: Beware of Redhat's security problems, disable all unnecessary services, et. al. Seek out security-oriented and Linux performance-tuning sites for more suggestions.
  • General: Don't skimp on hardware. A cheap component, be it a drive, network card, motherboard, or whatever, if it fails, will cause unrecoverable downtime. We decided on Intel NL440BX boards (serial console/BIOS support is nice), PII-400's, and IBM SCSI drives in both boxes. If one box were to have a catastrophic failure, the other is able to perform both webserver and SQL server functions if necessary. We can also simply replace a failed component with one pulled from a similarly-configured non-production (test) box, or just swap boxes altogether.
Both Apache [apache.org] and MySQL [mysql.com] have good sections on performance tuning. Do not be afraid to RTFM.
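
One way those MySQL buffers typically get set is on the safe_mysqld command line (values are purely illustrative; check mysqld --help for the exact variable names your version supports):

    safe_mysqld --set-variable key_buffer=64M \
                --set-variable sort_buffer=4M \
                --set-variable join_buffer=4M \
                --set-variable max_connections=200 &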

Any questions/comments can be directed to me. Flames directed to /dev/null.

Re:Optimisation of Apache/db (1)

Cowards Anonymous (24060) | more than 15 years ago | (#1718731)

LVM.

RAID! (1)

amon (24507) | more than 15 years ago | (#1718732)

RAID 0+1 is an absolute must if you want to seriously serve anything, Linux or not, RedHat or anything else.

Nice starting point if you are on a budget:

Software RAID mini-HOWTO [unc.edu]

Also take a look at:

Linux High-Availability HOWTO [metalab.unc.edu]

Testing, testing, testing (1)

ingenthr (34535) | more than 15 years ago | (#1718735)

Some good suggestions out there, but one thing hasn't been mentioned. It's good to test the site with some kind of load generation tool (I think there's one at apache.org) when you're trying out different configurations *before* you go live. Every site is likely to be optimally tuned a bit different.
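
ApacheBench (ab), which ships with Apache, is the usual quick way to do that (URL and numbers are just examples):

    ab -n 5000 -c 50 http://staging.example.com/index.html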

Also overlooked is possibly tuning the filesystem for caching and the like (file descriptors) and networking (maximum connections).

Possibly most of all, when I've seen performance problems, it's been due to how the code was written :).

Re:Flamebait? (1)

yomahz (35486) | more than 15 years ago | (#1718736)

Agreed.. I think the only flamebait posted on that post was the moderator's comment.
--

A mind is a terrible thing to taste.

Performance tips for Apache... (2)

mgreenwood (37032) | more than 15 years ago | (#1718738)

Memory is pretty damn cheap -- I've been running my web server off a ramdisk. Archive your web server in a tar ball then just expand it onto the ram disk... just don't put your db there :-)

Re:All the money in the world (2)

coyote-san (38515) | more than 15 years ago | (#1718739)

Postgres is totally free and supports transactions. It might not have the performance of Oracle, but it doesn't have the cost of Oracle either. :-)

MySQL vs PostgreSQL (1)

Hotboxer (40332) | more than 15 years ago | (#1718740)

MySQL and PostgreSQL seem to be the two main backend db engines discussed here. Does anyone know if any comparisons exist between the two that cover their use (by a commercial organisation)?

Well, here's what I know (for what it's worth) (2)

jguthrie (57467) | more than 15 years ago | (#1718743)

The database-driven websites that I have -- the Houston Northwest Bar Association Website (with an attorney finder) [hnba.org] and The C Bookstore [cbookstore.com] (plug!plug!plug!) -- are based on PostgreSQL [postgresql.org] rather than mySQL, so I don't know how well these lessons apply, but I've learned that PostgreSQL has considerable overhead on each query, so one big query is better than lots of little queries.

Of course, neither of those sites is particularly busy and I'm more proud of the management utilities than the sites themselves, but that's par for this course.

The thing I did learn was that using perl and CGI is quite clumsy for this sort of thing. I eventually switched to PHP3 because everything goes together much faster. I don't know what it does to the performance, but since both sites are being served from the world's slowest Web server hardware (the database server is a 486dx2-80 and the database server has the HNBA website on it but the C Bookstore Web server is the 5x86-120 that I use for most of the four dozen or so domains that I host) and performance is not that big an issue, I'm not all that worried. It'd be nice if it got some hits, though.

Re:Einstein of the year award (1)

mochaone (59034) | more than 15 years ago | (#1718744)

It figures an eponymously named anonymous loser like yourself would have the balls to criticize someone else. I think it's fair to say that you don't post your online information because you haven't accomplished jackshit.

You cynical bastards are quite amusing. It'll be interesting to see how cynical you are 20 years into your dead end careers. I'm sure the "HTML programmer" will be doing quite fine.

Re:3 words... (1)

PrinceOfChaos (62778) | more than 15 years ago | (#1718745)

microsoft dot com
guess what site gets more traffic :))

(www.mediametrix.com says microsoft dot com does :)

Performance with Servlets (1)

pbryant (65660) | more than 15 years ago | (#1718746)

I'm replacing some perl code with Java servlets using mySQL, Apache JServ, XML and XSL.

Apache JServ allows load balancing (basically doing a round robin over each of your servlet engines). I've found performance goes up about 30% for each PC you throw into the mix (I've only been able to test this up to 3 PCs).

FYI: I've found I needed servlet engines running on 2 PC's connected to 1 mySQL database to reach the performance of the perl app which stores its data as | delimited files.

While this may seem pretty poor, using a database means that the scalability (for size of data) is going to be a lot better than the file solution, and the servlet solution uses XSL, which gives us a lot more flexibility over the HTML that we generate (basically each one of our users can have a completely different looking site while running the same app as all the others).

I'll be posting some benchmarks at http://objexcel.com in a few days if anyone is interested.

Peter

Re:Tuning webservers (1)

Joe Schmo (73744) | more than 15 years ago | (#1718748)

Doesn't inlining of code occur at such a high optimization number - mind you I noticed that MySQL does this anyways...so at least this is half the problem - time for Apache - heh heh.

Interesting idea...

Re:Diff. boxes... (1)

nissimK (82160) | more than 15 years ago | (#1718750)

...that talked to two replicated RAID arrays way on the back end...


How did you implement this, may I ask? Particularly, how were the two RAID arrays mirrored, and how did the Web Servers/Database servers do I/O with them?

Cheers,

-NiS

what about Eddie? (1)

nanner (83452) | more than 15 years ago | (#1718752)

Why not use multiple small servers, then load balance them with Eddie? Check out the eddieware project. Cool thing is, it runs on FreeBSD AND Linux and it's open source :)
http://www.eddieware.org [eddieware.org]