What Would You Want to See in Database Benchmarks? 42
David Lang asks: "With the release of MySQL 5.0, PostgreSQL 8.1, and the flap over Oracle purchasing InnoDB, the age old question of performance is coming up again. I've got some boxes that were purchased for a data warehouse project that isn't going to be installed for a month or two, and could probably squeeze some time in to do some benchmarks on the machines. However, the question is: what should be done that's reasonably fair to both MySQL and PostgreSQL? We all know that careful selection of the benchmark can seriously skew the results, and I want to avoid that (in fact I would consider it close to ideal if the results came out that each database won in some tests). I would also not like to spend time generating the benchmarks only to have the losing side accuse me of being unfair. So, for both MySQL and PostgreSQL advocates, what would you like to see in a series of benchmarks?"
"The hardware I have available is as follows:
- 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
- 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
- 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA
For my own interests, I would like to at least cover the following bases: 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type tests (data >> memory); and web prefs test (active data RAM)
What specific benchmarks should be run, and what other things should be tested? Where should I go for assistance on tuning each database, evaluating the benchmark results, and re-tuning them?"
Ask the dev teams for both! (Score:4, Insightful)
What I'd really like to see... (Score:1)
Re:What I'd really like to see... (Score:4, Interesting)
The person publishing the benchmark does not use Oracle.
The Oracle user running the benchmark remains anonymous.
But there are many ways that Oracle (or any other database software) can be made to perform badly in a benchmark that would be no fault of the software. If someone wants to benchmark against Oracle, Oracle wants to make sure they do it correctly, or else not at all. If they didn't have that clause, Microsoft would have dozens of studies and benchmarks saying that Oracle is slower than SQL Server under certain setups, just like those bullshit VeriTest benchmarks they have against crippled setups of Red Hat, Apache, and Samba.
single-user - long query (Score:1)
Most benchmarks only deal with multi-user performance. Most problems I had to solve dealed with managing large datasets with complicated relationships, but only one user that has to access them.
Althought I highly suspect that mysql is not suited for that sort of usage anyway. But I am curious how postgresql would compare there with firebird.
Re:single-user - long query (Score:3, Insightful)
On the contrary, I bet that MySQL w/ MyISAM would be well suited to that task.
Re:single-user - long query (Score:1)
BTW, do you know of any benchmarks that deal with this sort of thing?
Don't bother (Score:4, Insightful)
in fact I would consider it close to ideal if the results came out that each database won in some tests
With an attitude like that, there's no point running benchmarks. The idea is that you run the benchmarks to get an idea of how the databases perform. But it seems you are already rejecting one possible result (that one database performs worse than others in all respects) because you don't consider it "fair".
Well life isn't fair. I'm sure people worked hard on all databases, but that doesn't mean they all have value. Sometimes people try hard and fail. And you want to ignore the numbers that tell you this because you think it's fairer that way? Give me a break, you don't want to run a real benchmark, you want to run something that will tell you what you have already decided upon is the best.
Re:Don't bother (Score:2)
Rather than benchmarking the two different databases, measure how the settings for each database changes. e.g. Plot the performance of some staple queries with different memory pool sizes.
Stop whinning Coward (Score:1)
And your whinning helps because..?
He didn't claim to be a true scientist. He didn't claim that this result is what MySQL developers or PostgreSQL developers will die over.
It's the Thanksgiving weekend, and one of the readers want to do something fun with his spare machines.
In fact, he's trying to run a test scenario that I happened to be interested i
Maybe but I disagree (Score:2)
Part of the issue with benchmarking is that different types of queries can result in wildly different performance ratios between RDBMS's.
For example:
Try joining a really large table in PostgreSQL against
Re:Maybe but I disagree (Score:2)
MySQL should win on some tests (even if it's only the count(*) test that postgres chokes at)
This is an invalid test. MySQL only retrieves an estimate rather than the real number. Or at least this should be stated specifically in the results.
This is simple (Score:4, Insightful)
Compile your application to use each database
Now go and compare which database runs fastest on your application
Anything else just doesn't matter - your application is going to be different than every benchmark, so what you need is to run your application on the database and see what happens.
What I have usually found is that while you can highly tune the database, and have great database benchmarks - most of those are ruined by completely brain dead applications that do very stupid things, ruining any kind of performance the database will give you
Re:This is simple (Score:1)
Even if the application if fully DB-agnostic, who has their schema laying around in two formats?
Re:This is simple (Score:2)
So in other words - speed might not even be important at all, imagine that. So why benchmark then ?
easy, TPC (Score:2, Informative)
Good luck.
Very large database operations (Score:4, Interesting)
(*) Reindexing caused me a ton of grief. I inherited a huge mysql db once that required an emergency reindex. Unfortunately mysql locked the table while it did a full table copy, which took hours.
Re:Very large database operations (Score:2)
Unless queries were just very slow because the indexes were poor.
Sam
Re:Very large database operations (Score:3, Informative)
Re:Very large database operations (Score:2)
Re:postgres has several replication options (Score:2)
Re:postgres has several replication options (Score:2)
1: Slony is about as hard to setup as it is to compile PHP or apache from source and then configure them. Honestly, if you're not capable of configuring slony, you probably shouldn't be administrating servers.
There are a bunch of admin scripts that come with it to setup replication and things like that. Considering it's a pretty new piece of code, it's not bad, mostly lacking in detailed documentation.
2: Notice your replication on MySQL didn't keep you guys up when the db server fried yesterd
Re:postgres has several replication options (Score:2)
Re:postgres has several replication options (Score:2)
Use the App (Score:2)
1) take a snapshot of the database
2) turn on the query log for the server and run it for a day
3) install the snapshot on your test servers
4) play back the query log and see which one goes the fastest.
There's no reason to complicate it by trying to stress certain functions. I'm looking for real application performance, and I have no control over the queries that the programmers are throwing at the box (unless I see it bogging down the server and I go and hit them over the head).
--Ajay
Heavy querying (Score:2)
Never use single CPU (Score:2)
It'll spend too much time task switching. Better to have 2 "less than half" slower CPUs, than 1 really fast CPU.
16G ram,
Could you put the 16G into one of the 2x boxes?
2x36G 15Krpm SCSI 16x400G 7200rpm SATA
data warehouse type tests (data >> memory);
DW does not mean that data
Re:Never use single CPU (Score:1)
1x dual Opteron
It'll spend too much time task switching. Better to have 2 "less than half" slower CPUs, than 1 really fast CPU.
Sounds like he meant dual core.
16G ram,
Could you put the 16G into one of the 2x boxes?
Why not? 4 memory slots of 4GB sticks, per CPU.
Re:Never use single CPU (Score:1)
Re:Never use single CPU (Score:2)
Yeah, good point.
Contact us on the pgsql-performance list (Score:2, Informative)
Performance (Score:2)
Performance you can (almost) always go to a bigger box - if your whole engine goes down and can't recover because of a few bad sectors or insufficient logging or such, you're screwed.
Peter.
It all depends (Score:2)
Are we talking about workgroup size database apps, or
A weird test (Score:2)
MySQL is much faster
Postgres does a lot more
If you can tolerate feature poor you go with MySQL for ease + speed
If you can tolerate slow you go with Postgres
If you need feature rich and fast you go with Oracle
If you don't have dedicated DBAs then you must not care about reliability go with SQLserver
Anyway:
1) MySQL has corruption problems. Good measures (load testing) would be wo
Re:but is the common wisdom correct? (Score:2)