
High Availability Solutions for Databases?

Cliff posted more than 8 years ago | from the in-search-of-redundancy dept.


An anonymous reader asks: "What would be the best high availability solution for databases? I don't have enough money to afford Oracle RAC or any architecture that requires an expensive SAN. What about open source solutions? MySQL Cluster seems to be more of a master/slave setup, and you can lose data when the master dies. What about this Sequoia project, which seems good for PostgreSQL and other databases? Has anyone tried it? What HA solution do you use for your database?"


Whatever else they tell you (2, Insightful)

Anonymous Coward | more than 8 years ago | (#14031773)

Don't buy the RAC hype. I've seen too many poorly performing RAC clusters that Oracle couldn't fix to save their life (and no, they weren't all bad vendor configurations either).

MySQL Cluster != master/slave (5, Informative)

cravey (414235) | more than 8 years ago | (#14031861)

While MySQL supports master/slave replication, MySQL Cluster specifically avoids that entire model. It's an entirely synchronous database storage engine. If you want master/slave, use Postgres. If you want high availability and can handle the lack of a small number of features, MySQL Cluster is the way to go. The only real downside to the architecture required for Cluster is that all of the data is stored in RAM-based tables; transactions are logged to disk every (configurable) time interval. If you're going to try for HA, you might want to RTFM on the available options before you settle on one.
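
(For the curious: both the RAM sizing and that checkpoint interval live in the cluster's config.ini. A minimal sketch, assuming two data nodes and one management node -- the hostnames and sizes are made up, but the parameter names are the real NDB ones:

    [ndbd default]
    NoOfReplicas=2                      # each fragment lives on 2 data nodes
    DataMemory=512M                     # RAM for table data -- this is the hard limit
    IndexMemory=64M                     # RAM for hash indexes
    TimeBetweenGlobalCheckpoints=2000   # ms between redo-log flushes to disk

    [ndb_mgmd]
    HostName=mgm.example.com

    [ndbd]
    HostName=ndb1.example.com

    [ndbd]
    HostName=ndb2.example.com

    [mysqld]                            # slot for the SQL node

Raise DataMemory and you've raised your ceiling; that's essentially the whole sizing story.)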

Re:MySQL Cluster != master/slave (4, Interesting)

afabbro (33948) | more than 8 years ago | (#14031925)

The only real downside to the architecture required for Cluster is that all of the data is stored in RAM-based tables

...for an "only real downside" that's a pretty big one. I mean really, what sort of database is this - 256MB? 500MB? 1GB? Fine for small websites, not fine for large apps. I don't mean to be a "big shop" snob, but this is a ridiculous limitation.

Unfortunately, open source hasn't caught up to the big guys yet in the area of replication.

Re:MySQL Cluster != master/slave (3, Interesting)

cravey (414235) | more than 8 years ago | (#14032064)

Name a DB from a 'big guy' that doesn't require a shared resource. Oracle? Nope. RAC requires a shared storage solution. Lose your single storage device and you lose access to your db. MS-SQL? Don't make me laugh. SAP? No. Postgres? No. Firebird? No. Who's left? MySQL is the ONLY 'shared nothing' (tm) solution available.

Yes, the RAM-only tables suck for large DBs. On the other hand, they're REALLY fast, and they can be easily scaled up on commodity hardware rather than requiring faster and bigger servers and storage. I'm in a medium shop that only requires a few gigs of table space. I'm n+2 on webservers/SQL servers and data storage nodes for under $15k. If I wanted to use Red Hat on amd64 with gobs of RAM per machine, I could probably be n+2 to 64G for under $50k. The only time I will EVER have to take my cluster down is to add more storage space or to replace my switch. That will result in about 10 minutes of downtime. Based on growth, I won't have to do that for another 18 months. If I wanted to spend another $10k, I could avoid having to take down the system in the event of switch failure. I've never had a carrier-grade switch failure on any of my networks, so I'm feeling pretty good about availability.

Yes, it's expensive to get up into the terabyte range with RAM-based tables, but there's also no other way to do it without either spending ridiculous amounts of money or being dependent on a single piece of hardware.

Re:MySQL Cluster != master/slave (1)

Klaruz (734) | more than 8 years ago | (#14032752)

RAC is one way to cluster in Oracle. There are ways to do replication though. The easy ones are one way, but multimaster local/distant site replication is totally doable.

I have yet to find anything oracle can't do. It is a bit of an 8000 pound gorilla though...

Re:MySQL Cluster != master/slave (1)

ragnarok (6947) | more than 8 years ago | (#14032971)

Multimaster replication really sucks for HA. It only pushes transactions every N seconds, so you lose data if your primary master fails. You can theoretically set it up to push continuously but it leaks all kinds of resources.

I really don't think they had HA in mind when they designed multimaster. Actually, I don't know what the hell they were thinking...

Re:MySQL Cluster != master/slave (1)

turbidostato (878842) | more than 8 years ago | (#14039059)

"so you lose data if your primary master fails"

No, you won't.

Two phase commit, remember?

Re:MySQL Cluster != master/slave (1)

ragnarok (6947) | more than 8 years ago | (#14066131)

Oracle multi-master does not use two phase commit.

Re:MySQL Cluster != master/slave (1)

turbidostato (878842) | more than 8 years ago | (#14066519)

So what? The point remains that _under proper configuration_ losing the master node doesn't mean losing data (or even worse, risking non-consolidated data).

Re:MySQL Cluster != master/slave (1)

ragnarok (6947) | more than 8 years ago | (#14068233)

Technically you are correct, you will not lose data, but it won't be available until the master comes back.
I should clarify that I don't mean the real master. The scenario goes something like this:
Client commits a change to Node A. Node A queues the change to be sent to node B.
Node A explodes.
That change never gets to node B until you repair and restart node A.

This is why multi-master is really not a good solution for HA.

Re:MySQL Cluster != master/slave (1)

turbidostato (878842) | more than 8 years ago | (#14077587)

"Client commits a change to Node A. Node A queues the change to be sent to node B.
Node A explodes."

Then the client either won't receive an OK status for its transaction (since it won't be declared committed until Node A receives confirmation from Node B and confirms again that it has received that OK -- we are talking about 2PC now, remember?) and it will retry, or it will receive that OK status from Node B if the system is configured to do so in the case of a lost node. It will then either receive a timeout (if still trying against Node A) or successfully commit via Node B, which is still up (if the system declares a transaction as effectively committed in such a case).

Re:MySQL Cluster != master/slave (1)

ragnarok (6947) | more than 8 years ago | (#14078179)

But Oracle MultiMaster does not use a two phase commit! There is no guarantee that node B has received the data after the client commits. There is a guarantee that the data has been queued and will eventually be propagated to node B.

The problem comes when a client commits to node A, then switches to using node B (because it has detected node A exploding). It expects the data that it committed earlier to be there now, but it isn't.

Re:MySQL Cluster != master/slave (1)

notfancy (113542) | more than 8 years ago | (#14033615)

Oracle? ... MS-SQL? ... SAP? No. Postgres? No. Firebird? No. Who's left?

Sybase?

Re:MySQL Cluster != master/slave (1)

dheltzel (558802) | more than 8 years ago | (#14034401)

You forgot Oracle Advanced Replication (or maybe you just didn't know it existed). It's ideal for running across a (relatively) slow WAN and allows for 100% database uptime, even if an entire site is lost.

It was not trivial to set up back then (probably much better now), but that's what DBAs are for. If you're a highly compensated admin, complexity is not necessarily your enemy.

Re:MySQL Cluster != master/slave (4, Informative)

delta407 (518868) | more than 8 years ago | (#14034487)

Name a DB from a 'big guy' that doesn't require a shared resource. Oracle? Nope. RAC requires a shared storage solution. Lose your single storage device and you lose access to your db.
...
The only time I will EVER have to take my cluster down is to add more storage space or to replace my switch.
Sorry, what? You're bashing Oracle for having a system with a single point of failure, then adding two of your own?

"Shared storage" doesn't mean "not highly available". If you're serious about building a database that cannot go down, you get multiple servers, each with two Ethernet interfaces and two Fibre Channel cards. Get two Fibre Channel switches, and two Ethernet switches. Get two Fibre Channel disk arrays. Hook together as follows.

Both disk arrays have one or more uplinks to both FC switches. Each server has one HBA connected to each switch. Then, use Linux multipathing to provide automatic failover in case either switch dies or either HBA dies, and use Oracle ASM or Linux MD to mirror the data so you're good even if you unplug the shared storage.
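
(A sketch of the storage half, with hypothetical device names -- this assumes dm-multipath is configured so each array's LUN shows up once regardless of which HBA/switch path is alive:

    # verify two paths per LUN across the two fabrics
    multipath -ll

    # mirror one LUN from each array, so losing an entire array is survivable
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/mapper/array1_lun0 /dev/mapper/array2_lun0
)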

Set up the servers to use 802.1q VLAN tagging. (Remember, it's a good idea to keep your inter-node communication separate from client-to-server communication.) Create two Ethernet bridges, using the Linux bridging driver, binding eth0.2/eth1.2 to one and eth0.3/eth1.3 to the other. Take the two switches, set up 802.1q on the ports going to your database servers, and connect them together (preferably using high-speed uplinks, but channel bonding works fine). Enable spanning tree on the switches and on the bridges. Connect both switches to the rest of your network.
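
(Roughly, on each database server -- interface names and VLAN IDs are illustrative, and the stock vconfig/brctl tools are assumed:

    # tagged subinterfaces on both physical NICs
    vconfig add eth0 2 && vconfig add eth1 2   # VLAN 2: inter-node traffic
    vconfig add eth0 3 && vconfig add eth1 3   # VLAN 3: client traffic

    # one bridge per VLAN spanning both NICs, with STP to pick a live path
    brctl addbr br2 && brctl addif br2 eth0.2 && brctl addif br2 eth1.2
    brctl addbr br3 && brctl addif br3 eth0.3 && brctl addif br3 eth1.3
    brctl stp br2 on && brctl stp br3 on
)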

Now, if a switch catches fire, spanning tree on the servers will fail over to the other one automatically. If a network cable gets cut or a network card goes out on one node, that one node will fail over all traffic to the remaining interface, and the inter-switch trunk will keep everything working. Suddenly, you've got a robust network.

So, we've got network and storage covered. What about the database software? Neither MySQL nor Postgres can use this sort of configuration, but Oracle RAC can. Add a dash of TAF, and suddenly, any component -- network switch, database server, SAN switch, disk array -- can fail in the middle of a SELECT and the application won't notice. That is highly available.

Yes, this solution costs more. Our data -- more accurately, the cost of it not being accessible -- justifies the expense. But don't tell me that shared storage is a weakness.

Re:MySQL Cluster != master/slave (1)

msanto (81364) | more than 8 years ago | (#14038920)

I'm not a network engineer, so I may be missing some subtleties of your design, but Oracle RAC does have a single point of failure. Even if you mirror the data between 2 disk arrays, you need a quorum disk to avoid the split-brain problem. In our case, we wanted to cluster Oracle between 2 data centers with duplicate systems in each. The problem is that if networking were completely lost between the 2 DCs (a small chance, but not non-existent) while each Oracle instance kept running, each would think the other had gone down; and if some users could access one DC while others were still on the other, we'd have both Oracle instances with diverging production data. The traditional solution is a quorum disk, which is not mirrored, and whoever has access to that disk is the only production db. Thus the quorum disk is a single point of failure.

Re:MySQL Cluster != master/slave (2, Informative)

delta407 (518868) | more than 8 years ago | (#14039435)

Yep, Oracle RAC requires a quorum disk (CRS calls it a "voting disk"). However, if you use OCFS2 to store the data files (including the voting disk) the solution above cannot produce split-brain. Putting the data files in OCFS2 puts the responsibility of guarding the shared storage in the filesystem, which solves the problem as follows.

Each node in an OCFS2 cluster updates a special file every two seconds and maintains a TCP connection to its neighbors. If this process doesn't work (i.e. writes are failing, are backlogged, or for whatever reason don't hit the disks in a timely fashion), the node will fence itself. Likewise, if the node catches fire, it will naturally stop updating the heartbeat data. Either way, other nodes are aware of who is up and who is down, even if the IP network dies. If a node's access to the storage system dies, OCFS2 won't be able to successfully read back its own heartbeat timestamp, and the node fences itself.

Note that this heartbeat procedure happens on the same block device that the data files are on. There's no way to reach the data files unless a node is in agreement with every other node, thus no way to cause "split-brain".
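
(The cluster membership that backs all of this is declared in /etc/ocfs2/cluster.conf. A two-node sketch, with made-up names and addresses -- the deployment described here would list all eight nodes:

    cluster:
            node_count = 2
            name = racdata

    node:
            ip_port = 7777
            ip_address = 10.0.0.1
            number = 0
            name = node1
            cluster = racdata

    node:
            ip_port = 7777
            ip_address = 10.0.0.2
            number = 1
            name = node2
            cluster = racdata
)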

Say, somehow, a series of failures occurs simultaneously, resulting in Array 1 being connected only to nodes 1-4 and Array 2 being connected only to nodes 5-8. Each node keeps heartbeating, and after a little bit, Node 5 notices that Node 1 hasn't updated its timestamp, but that Node 1 just said (over Ethernet) that it did. OCFS2 notices this immediately, much inter-node communication happens, and nodes 5-8 decide to fence themselves, leaving 1-4 as the only nodes allowed to touch either disk array. Still working as advertised.

Let's examine that failure. First, you'd have to terminate the links between both fibre channel switches. Then, you'd have to break uplink 2 on array 1 and uplink 1 on array 2. Then, you'd have to take out the second HBA on nodes 1-4 and the first HBA on nodes 5-8. Simultaneously. And the system handles this gracefully.

Now, what happens when the network goes out too, you ask? Remember that each node has one connection to both switches. Say we lose a switch -- all the nodes fail over to the other one. Hmm. Say we unplug nodes 1-4 from switch 2 and unplug nodes 5-8 from switch 1 -- both switches are connected together directly, and thanks to VLAN trunking, all the nodes can still reach each other. Darn. Say we also unplug that link. Well, both switches are plugged into the rest of the network, right? Assuming you also set up VLAN trunking there, the switches will be able to reach each other through a third switch. Hmm. What if we unplug those, too?

Congratulations! Split-brain! Nodes 1-4 proceed on Array 1 and Switch 1, and nodes 5-8 proceed on Array 2 and Switch 2. You only had to unplug 22 different cables -- eight Ethernet cables and eight FC cables from the nodes to the switches, two inter-switch links (one Ethernet and one FC), two disk array uplinks (one from each array to each switch), and two Ethernet uplinks (one from each Ethernet switch) to the rest of the network -- simultaneously.

As I said, no single point of failure.

Re:MySQL Cluster != master/slave (2, Informative)

msanto (81364) | more than 8 years ago | (#14040968)

For the OCFS2 special file not to be a single point of failure, I assume it's mirrored between both storage arrays. If you lose all connectivity between data centers, assuming each DC has a storage array and a node, don't you still have both halves thinking they're it, and still have split-brain? That's two simultaneous outages, and if your company is anything like those I've worked with, one outage may suffice.

In my current company, our network group tends to keep things to themselves, and we'd never have known we didn't have redundant fiber if someone hadn't shot out the cable that ran through a bad neighborhood (yes, with a gun) and it became a topic of conversation.

I also worked at another company that had redundant fiber lines that were both taken out by a single backhoe, apparently they were bundled together.

Re:MySQL Cluster != master/slave (1)

cravey (414235) | more than 8 years ago | (#14039623)

Remarkably, that's not how the Oracle reps explained it to me. I went around with them for quite some time trying to figure out a way to get that to happen properly, and they kept telling me that there was no way to protect against a SAN device failure.

As for availability on MySQL, the switch limitation is an OS issue. I chose to run an OS that can't handle multiple interfaces per subnet, though that issue has been fixed in the latest version. As for adding storage space, I believe that fix is scheduled for the next version of MySQL. I won't get into support contract price differences.

Re:MySQL Cluster != master/slave (1)

delta407 (518868) | more than 8 years ago | (#14039717)

Remarkably, that's not how the Oracle reps explained it to me. I went around with them for quite some time trying to figure out a way to get that to happen properly, and they kept telling me that there was no way to protect against a SAN device failure.
When was this? I'm fairly certain that much of what I described won't work on anything older than 10g, and it's really, really painful to attempt this using ASM. This only becomes practical with OCFS2, which (as I recall) is currently supported only by SLES9 SP2, released a few months ago.

As for availability on MySQL, the switch limitation is an OS issue. I chose to run an OS that can't handle multiple interfaces per subnet, though that issue has been fixed in the latest version.
Spanning tree protocol is your friend. It's not generally used to provide redundancy all the way to the host (as few operating systems support it), but it works great under Linux. Just make sure that priorities are set correctly -- you don't want a database server becoming the favorite path between switches.

Don't forget to duplicate end users too (1)

giuntag (833437) | more than 8 years ago | (#14042145)

A couple of points missing from the above setup:
- each and every device must have double power sources, linked to 2 separate power grids, possibly by different energy vendors (we just had the power failing at airport XXX server room last week, right after the cable guys had spent 6 months convincing us to migrate all servers to the 'new' power lines)
- the entire network path from server to end users should be duplicated. What about the super-solid cluster LAN being connected to the end-user LAN via a single PoF (e.g. a firewall)?

Jokes aside, real HA is real hard. There's always a little single point of failure hidden somewhere that you forgot. And getting them all duplicated costs $$$$ and then $$$$ more.

DB2 UDB offers shared nothing and HA (1)

TheCrayfish (73892) | more than 8 years ago | (#14034497)

MySQL is the ONLY available 'shared nothing' (tm) solution available

Not quite. Check out the shared nothing architecture of DB2 Universal DB [ibm.com] . You can get DB2 for AIX, Windows, Linux, and other platforms. UDB offers a High Availability Disaster Recovery [ibm.com] solution.

Re:MySQL Cluster != master/slave (1)

ivan256 (17499) | more than 8 years ago | (#14036569)

there's also no other way to do it without either spending ridiculous amounts of money or being dependent on a single piece of hardware.

That's not true, along with some other things you've mentioned from various marketing materials.

For starters, there's no such thing as "shared nothing" clustering. You have to have a shared resource to have a cluster. Solutions that claim to be "shared nothing" actually share a network. Once you've gone that far, there's no reason not to do replicated network block devices, which some larger vendors do (one sibling comment mentions DB2). It can even be fast if you're clever about where you issue your acknowledgements.

There are third-party cluster middleware products that make this process transparent to the database. Most of them focus on Oracle. You can also fool some shared storage cluster products into thinking you have some big, expensive, high-performance SAN array by pointing them at a Linux DRBD [linux-ha.org]. Anyway, the point is that there are plenty of ways to do so-called shared nothing database clustering without any expensive hardware.

Re:MySQL Cluster != master/slave (1)

doc modulo (568776) | more than 8 years ago | (#14038005)

Mnesia is a database written in Erlang, a functional programming language. Erlang has support for concurrency built into the language and it does concurrency really well.

Mnesia was built to run non-stop forever because it's supposed to automatically run on a clustered server. This gives it fault tolerance. The best kind, because no matter what PC you buy, some day it's going to break. You can reconfigure it while it's running, and although you're better off using Erlang to interface to the database, SQL is available.

An Erlang enthusiast's blog post about Mnesia [3pblog.net].

Re:MySQL Cluster != master/slave (4, Informative)

jorela (84188) | more than 8 years ago | (#14033681)

Actually, we have implemented disk storage.
Hopefully it will make it into version 5.1.

MySQL Cluster 4.1/5.0 supports:
* transactions
* transparent data partitioning and transparent data distribution
* recovery using logging
* (multi) node failure and automatic non blocking hot sync for crashed/stopped nodes
* hot backup

MySQL Cluster 5.1 will support:
* user defined partitioning
* cluster to cluster async. replication (like "ordinary" mysql replication)

The disk implementation supports:
* putting a column in memory or on disk
    (currently index columns have to be in memory)
* all HA features mentioned above.

Re:MySQL Cluster != master/slave (1)

Zontar The Mindless (9002) | more than 8 years ago | (#14036218)

BTW, partitioning in MySQL 5.1 is not limited to NDB Cluster - you can partition tables using any storage engine supported by your 5.1 server [mysql.com] .

Right now this is available only from source (you'll need to build the MySQL 5.1 server using --with-partitioning), but there should be some alpha -Max binaries released soon.

Re:MySQL Cluster != master/slave (4, Informative)

Joff_NZ (309034) | more than 8 years ago | (#14032646)

The other thing to note with MySQL Cluster is that, even in 'stable' releases of MySQL, it is horribly unstable and prone to massive data loss.

We deployed it ourselves, and it worked OK for a while, but things went very very wrong when we tried updating one of the configuration parameters, causing us to inexplicably lose quite a bit of data.

Avoid it. At least for another generation or two, or three.

Re:MySQL Cluster != master/slave (2, Funny)

duffbeer703 (177751) | more than 8 years ago | (#14034766)

MySQL was prone to massive data loss for years... what's the big deal?

Re:MySQL Cluster != master/slave (1)

Kefaa (76147) | more than 8 years ago | (#14047327)

We deployed it ourselves, and it worked OK for a while, but things went very very wrong when we tried updating one of the configuration parameters, causing us to inexplicably lose quite a bit of data.

They changed the database parms on part of a large clustered HA system and it is the system's fault you lost data? I would be very interested in hearing what parm(s) were changed. If they changed a parm on only part of the cluster, then have them "move away from the keyboard, before someone gets hurt." Every major database (Oracle, DB2, SS) warns about doing such things because the side effects are nearly impossible to predict.

Re:MySQL Cluster != master/slave (1)

Joff_NZ (309034) | more than 8 years ago | (#14060195)

The parameter in question was just increasing the maximum number of indexes allowed in the database. Not something which you would expect to destroy everything.

Believe me, it was MySQL's fault.. NDB is seriously seriously flakey

Re:MySQL Cluster != master/slave (1)

bigtrike (904535) | more than 8 years ago | (#14048043)

Try running the latest CVS nightly and see if it fixes your problem. While it may sound like a bad idea, it's what the mailing list people told me to do repeatedly when I was suffering from these issues. This was for non-cluster mysql and years ago. We stopped using mysql due to many issues like this so my experience is somewhat outdated and limited, but it seemed like MySQL coders never released any production quality code.

Re:MySQL Cluster != master/slave (0)

Anonymous Coward | more than 8 years ago | (#14047908)

> While MySQL supports master/slave replication,

For certain definitions of replication. I've used it on about two dozen db servers for several years. I would never call it replication. All it does is copy the log file from the "master" to the "slave." Then it pipes the log to MySQL. If you have any errors, it simply stops. For trivial DB's that do no updates, it works fine. Of course for those type of systems, you don't need replication. For anything nontrivial, it might work for a few minutes before stopping with an error. You then have to take down both databases, copy all of the database files again, and then flush the logs. Then you restart it and hope for the best. You can do the "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n;" if you get desperate, but typically when there's a problem, it only runs one more statement then stops again.

Real replication wouldn't require stopping the master server and copying db files. Nor would it mean replication would just stop at random due to errors. There should also be some way of testing the state of the tables against the master. I've seen times where an update from months ago didn't happen, and you'd never know it unless you do a row-by-row compare.
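
(For anyone who lands here later, the recovery dance described above looks roughly like this -- the statements are stock MySQL, but the log coordinates and hostname are made up:

    mysql> STOP SLAVE;
    mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;  -- skip the offending statement
    mysql> START SLAVE;
    mysql> SHOW SLAVE STATUS\G                     -- hope for Slave_SQL_Running: Yes

    -- or, after stopping the master and recopying its data files to the slave:
    mysql> CHANGE MASTER TO MASTER_HOST='master.example.com',
        ->   MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=4;
    mysql> START SLAVE;
)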

MySQL - Master / Slave is the only option (3, Interesting)

digerata (516939) | more than 8 years ago | (#14031893)

You can use MySQL's in-memory cluster replication, which is pretty cool. You can have quite a few nodes, each serving requests against the same database. However, your database size is limited to the amount of RAM a single node can support. That really limits long-term scalability of the database. What, we just hit 8 GB? What do we do? Sorry, boss, I need another ten grand for an unplanned DB upgrade. Also, if you are used to the atomic transactions InnoDB provides, forget that. The cluster storage engine NDB does not support all of the features that InnoDB does.

We chose to go with a master/slave option, which basically gave us failover within 3 seconds. Any more fine-grained monitoring and the CPU load on the slave gets pretty high. Not ideal, but probably the best option that uses MySQL when you don't want to be tied so much to one platform.

Re:MySQL - Master / Slave is the only option (3, Interesting)

cravey (414235) | more than 8 years ago | (#14032095)

Your database size is NOT limited to the amount of RAM a single node can support unless you're only running 2 nodes. It's possible that your table size may be limited, but I don't believe that is the case. No, the cluster storage engine does not support all the InnoDB features, but InnoDB doesn't support all of the MyISAM features. Does that make MyISAM better?

Re:MySQL - Master / Slave is the only option (1)

Rich0 (548339) | more than 8 years ago | (#14036139)

Uh, atomic transactions are hardly a useless bell-and-whistle item. If you have so many people accessing your DB that you need a cluster, then I can't imagine them not needing transactions, unless they are read-only, and you infrequently do updates.

Transactions are what make DBs such great tools - the system is always consistent. No corruption issues if some process dies halfway through updating a pile of records.

Any system that expects to compete on this scale really needs to support transactions, unless they are in some read-only/nightly-updates niche.

Do your own (1, Funny)

clockwise_music (594832) | more than 8 years ago | (#14031933)

Just use .NET's XML serialisation on a DataSet to write an offline file to a high availability server which has a RAID-5 array. OK... so that's not really high availability, simple, or really sensible, but it would be a bit of fun.

How about you have a few Access .mdb files (created using Jet, it's free) updated asynchronously using a few COM objects in Transaction Server? Well, it wouldn't work but the idea is pretty cool. Well actually it isn't really but you know.

Or just write down all of your data, photocopy it and give the copy to your Mum (that's Mom, to all you yanks.. (why are you called "yanks"?)). And let's face it, your Mum, oops sorry Mom, is always itching for your phone call. Maybe you should call her more often?

It's ok, I think I'm hilarious but know I'm not.

MOM - Motherly Object Model (3, Funny)

ACORN_USER (902686) | more than 8 years ago | (#14033323)

A typical transaction:
Phone: Ring Ring
Son: BEGIN TRANSACTION!
Mum: Oh, hi dear. How have you been? Did you know that ..
Son: *cough* Work, remember?
Mum: Yes, dear.
Son: WRITE_DOWN into notepad ADDRESSES. fname='John'. sname='Smith'
Mum: Wait, I have to look up the next identity number thingie. Oooh, that's a big number, I can't even recite it.
Son: Hush! address='12 Rover Roa..
Mum: Oh, dear. My pencil broke.
Son: Ok, Roll back!
Mum: That's fine for you to say. I don't have any erasers.

Re:MOM - Motherly Object Model (1)

george_w (32279) | more than 8 years ago | (#14036085)

I'd guess a VACUUM call would fit in nicely.

Re:Do your own (1)

Short Circuit (52384) | more than 8 years ago | (#14033865)

(why are you called "yanks"?)

It's debated [wikipedia.org] .

Re:Do your own (1)

giuntag (833437) | more than 8 years ago | (#14042167)

Nice one.

But I had a SQL Server on a Windows cluster that consistently crashed the shared storage's SCSI controller when trying to execute a backup, sending junk down both data and log files.
The fault was probably in the SCSI controller firmware, but I really was left wondering what kind of low-low-level disk commands the SQL backup mechanism used to move its data around... especially considering that, being cluster-aware, the db refused to use any storage device but the shared ones for storing backup data!

Might want to consider Data Guard (1)

gripperzipper (590623) | more than 8 years ago | (#14031999)

If you did go the Oracle route, you might take a look at Data Guard. It may be a cheaper option than RAC. It lets your site run in an active/standby mode. I'm not a DBA, but I think you can configure Data Guard with different levels of reliability; guaranteed synchronization is one of those levels. I'm thinking that it would let you automatically fail over to the standby site if the active site goes down, but you'd have to check the docs to be sure. The nice thing about this route is that you could have a remote disaster recovery site. BTW, "master/slave" is not "politically correct." I have actually heard of people having to reword documentation because of some policy that said those terms couldn't be used.

Re:Might want to consider Data Guard (1)

bestguruever (666273) | more than 8 years ago | (#14032753)

Correct; you can set it to maximum reliability mode, which ensures that the redo logs are written to all sites as the transactions are performed and before they are committed. This can be a big performance killer, as you are adding network round trips.

Re:Might want to consider Data Guard (1)

giuntag (833437) | more than 8 years ago | (#14042177)

There's also a nice product from Quest Software (SharePlex?) that works along the same lines as Oracle Data Guard.
Dunno which one is cheaper, though...

Re:Might want to consider Data Guard (1)

gripperzipper (590623) | more than 8 years ago | (#14060263)

I did not know about that. Thanks... I'll take a look at it. :-)

Re:Might want to consider Data Guard (1)

BlueWomble (36835) | more than 8 years ago | (#14047827)

Data Guard is only available in Oracle Enterprise Edition, which makes it as expensive as RAC ($40,000/CPU).

That said, you can run a standby database and do manual transfer of archivelog files without using Data Guard. In this case you only need Standard Edition ($15,000/CPU), or even Standard Edition One ($5,000/CPU) if you only have a one- or two-CPU machine.
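
(The poor man's standby boils down to a cron job plus cancel-based recovery; a sketch, with made-up paths and hostname:

    # on the primary: ship archived redo logs as they appear
    rsync -a /u01/oradata/arch/ standby:/u01/oradata/arch/

    # on the standby, in SQL*Plus, roll forward through whatever has arrived:
    #   RECOVER STANDBY DATABASE;
    #   CANCEL    -- when it asks for a log that hasn't arrived yet
)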

Standard Edition One on a single CPU is pretty cost-effective.

Oh that's easy... (2, Funny)

ipoverscsi (523760) | more than 8 years ago | (#14032005)

It's called a filing cabinet. It's got full-text search and an easy-to-use index. Although it has 99.9999% availability (it's blocked by crap stacked in front of it the other 0.0001%), it's a bit difficult to make backups without access to the office copier, two toner cartridges, and 20 boxes of paper.

Re:Oh that's easy... (0)

Anonymous Coward | more than 8 years ago | (#14033741)

It's called a filing cabinet. It's got full-text search and an easy-to-use index. Although it has 99.9999% availability (it's blocked by crap stacked in front of it the other 0.0001%), it's a bit difficult to make backups without access to the office copier, two toner cartridges, and 20 boxes of paper.

Does it scale?

Re:Oh that's easy... (1)

Safety Cap (253500) | more than 8 years ago | (#14034528)

Does it scale?
Depends upon the Henway...

(Subeeism [google.com] - ar ar)
Thank you; I'll be here all week. Try the fish!

Re:Oh that's easy... (1)

Shakrai (717556) | more than 8 years ago | (#14034925)

(it's blocked by crap stacked in front of it the other 0.0001%)

Only 0.00001% of the time? What office environment have you been working in? My co-workers tell me that there's a desk and a filing cabinet under all this paperwork -- I still think it's just a rumor.

What are you doing? It's important. (5, Insightful)

anon mouse-cow-aard (443646) | more than 8 years ago | (#14032073)

It's odd that all these people are answering without hearing a thing about your application. How big is the db? How often is it written? How often is it read?

For example, we run a site with data from a thousand-odd different data sources, with each source getting updated every hour or so. We do it by parsing the data into static pages. When we receive a datum, we rebuild the pages that depend on it.

We have another site that runs off an Oracle db. The static-page site runs about 90x faster, and is basically in memory (disk access is nil). Now take into account that we can (and do) replicate the static-page solution with essentially zero load, and we get to a solution that is literally 900x faster.

Now folks are thinking 'oh, the horror!' well... tough! There is no substitute for thinking about your data, and how it flows. A DB is not a given, but a (potentially wrong) answer to a question after you have done some analysis.

Nice "question" (4, Interesting)

photon317 (208409) | more than 8 years ago | (#14032087)


Hello, anonymous Sequoia promoter seeking free advertising. (BTW, you might try picking a product name that normal people can spell without thinking about it.)

Your solution is not database clustering, and should not be advertised as such. It's more along the lines of a database connection proxy which supports multiple simultaneous backends and operates on them in parallel, with some added features to make HA-like solutions relatively easy.

The downside of this style of approach, as opposed to an architecture like Oracle RAC's, is that it doesn't scale up as you add backend nodes (at least not for writes; in any case, for read-only scaling there are simpler solutions for all of the vendors, even the free ones). It must also have limits on how many transactions it can backlog and replay to a temporarily unreachable/down server before that server has to be re-synced from scratch in order to catch back up. (And I have to wonder if there's really any real-world scenario under real transaction load in which the practical net effect wouldn't be a complete resync of a backend server anytime something goes wrong with it -- in which case one could throw out any attempt to backlog transactions for a single failed server and just keep things simple: you fall out of sync, you resync.)

The open source world really needs a RAC-like solution for PostgreSQL and MySQL (I'm a fan of friendly open-source competition, so while my personal preference is PostgreSQL, I hope both projects stay current and popular for many years to come). Unfortunately, there is unlikely to be a generic way to do this; it will probably have to be re-invented for each database project.

I took a brief look around PostgreSQL's guts a while back, and it actually seemed like the architecture they use isn't far off from something RAC-capable to begin with; nobody's quite buttoned together the few pieces here and there to make it happen. Basically, on SMP, multiple co-operating backends already serve parallel requests and synchronize on a shared memory cache. There are patches out there for the Linux kernel to support network-synchronized distributed shared memory. Put two and two together, and what do you get? Something not far off from a first-pass hack at a RAC-like network-distributed database caching system. Most of the other details are easy to solve (start/shutdown, join/leave cluster, tracking of processes across the cluster, etc.), or belong in another problem domain (implementing shared storage filesystems -- hey, we have GFS, Oracle OCFS, etc. available). One of the biggest issues would be multiple nodes all having pg "writer" processes. The first step would probably be to put the writer on one node and fail over the writer functionality when that node dies, to be quickly replaced by a scheme whereby multiple writers can work by synchronizing through a distributed lock manager (there are already DLM modules available for Linux). Then there's the issue of making the current distributed shared-memory patches do the right thing performance-wise for this kind of usage, and so on. It's not easy, but it's not outside the realm of possibility.

Re:Nice "question" (2, Interesting)

cecchet (931266) | more than 8 years ago | (#14036831)

I am working on the Sequoia project. I don't think that this post was made by anyone in our group. I don't necessarily share your view of the advantages of a shared-disk architecture over shared-nothing.

As a SAN has to be shared between all nodes in your cluster, this already limits the availability and scalability of your database to your SAN's capabilities (you are shifting the problem to the disk). With a shared-nothing architecture you are also replicating the disks, and thus distributing the IO workload as well. Note that when you want to synchronize nodes that are not collocated, a SAN does not work anymore.

You should read some papers about C-JDBC (the original name of the Sequoia technology) and you will find that the technology can scale quite well. We were able to achieve a throughput of a million transactions a minute... that should be quite enough for many businesses.

This looks like a religious war between shared-disk and shared-nothing solutions, but a SAN and its admin cost is usually not compatible with someone seeking an open source solution. If you can afford the SAN, why not just use Oracle RAC?

Re:Nice "question" (1)

Nutria (679911) | more than 8 years ago | (#14043315)

Note that when you want to synchronize nodes that are not collocated, then a SAN does not work anymore.

But you can replicate IO across (high-end) SANs up to several hundred km. Totally OS-independent.

We're talking serious money, though...

This looks like a religious war between shared-disk and shared-nothing solutions but a SAN and its admin cost is usually not compatible with someone seeking for an open source solution. If you can afford the SAN, why not just using Oracle RAC?

SAN isn't the only way to do shared-disk clusters. VMS has been doing it for 20 years, and obviously SANs weren't around back then.

There does, though, have to be some h/w support for shared-disk. Cluster Interconnect, anyone????

Re:Nice "question" (1)

LordMyren (15499) | more than 8 years ago | (#14046017)

There does, though, have to be some h/w support for shared-disk. Cluster Interconnect, anyone????

What, like SCSI? FC? ;) It would re-complexify the situation, but you can just hook up a bunch of SCSI controllers to a chain. Typically this is used for a Hot/Cold situation, but some solutions are Hot/Hot.

Re:Nice "question" (1)

Nutria (679911) | more than 8 years ago | (#14052816)

There does, though, have to be some h/w support for shared-disk. Cluster Interconnect, anyone????

What, like SCSI? FC? ;) It would re-complexify the situation, but you can just hook up a bunch of SCSI controllers to a chain. Typically this is used for a Hot/Cold situation, but some solutions are Hot/Hot.


Huh? A chain? That's the least-redundant topology.

HA DB's (1)

Zapman (2662) | more than 8 years ago | (#14032094)

There are several 'master/slave' options. The easiest involves shared storage (2 boxes tied to 1 disk controller): box A goes down, box B notices, imports the disks, mounts them, starts the DB, and you're back. You only lose any transaction that was 'in progress' at failure time.
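
(With Linux-HA heartbeat, that whole takeover sequence is one line in /etc/ha.d/haresources -- node name, floating IP, device, mountpoint, and init script are all illustrative:

    # preferred-node  floating-IP  import+mount the shared disk  start the DB
    dbnode-a IPaddr::10.0.0.50 Filesystem::/dev/sdb1::/var/lib/mysql::ext3 mysql
)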

Any time you add HA to something, you're adding complexity, and usually a fair bit of it. That's a trade off you need to consider (as is the extra price for software/hardware and support for the solution).

MySQL cluster and replication are not the same (4, Informative)

Scott (1049) | more than 8 years ago | (#14032097)

The submitter of this question seems to have confused the two, Cluster and the older replication. Cluster does not in any way rely on a master/slave setup. Think of Cluster as RAID for databases, where you can lose a node (or more, depending on your configuration) before you lose your db. The current drawbacks of cluster are that it is in-memory and doesn't support certain features, such as fulltext indexing. Replication isn't going to cause you to lose data either if your application is designed to handle a situation where the master server (which you kick your writes to) hits the bricks. Have the app go into a read only mode from your slave.
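
(Concretely, "using Cluster" just means creating your tables with the NDB engine; a sketch with a made-up table:

    CREATE TABLE sessions (
        id   INT NOT NULL PRIMARY KEY,
        data VARCHAR(255)
    ) ENGINE=NDBCLUSTER;    -- stored in RAM across the data nodes

    ALTER TABLE sessions ADD FULLTEXT (data);   -- fails: no fulltext on NDB
)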

Neither option is really "beautiful", though Cluster has a lot of promise for the future, especially in 5.1.

Re:MySQL cluster and replication are not the same (1)

Zontar The Mindless (9002) | more than 8 years ago | (#14033467)

Thanks for clearing that up, Scott. You're right (and the submitter is wrong, or at least misinformed) - Cluster replication is *not* of the master/slave variety.

Master/slave replication is tentatively scheduled for MySQL 5.1 Cluster. IOW, you'll be able to replicate with a Cluster as a master or slave, or between two separate Clusters. (It's already working, it's just not yet been merged into the main 5.1 tree.) A disk-based Cluster implementation -- while not 100% guaranteed at this time -- is also a strong possibility for 5.1. Otherwise, it should be in MySQL 5.2.

DRBD (3, Informative)

Bios_Hakr (68586) | more than 8 years ago | (#14032462)

Have you looked into DRBD? It works kinda like RAID1 over a network. It uses 2 computers to store the database. Another computer acts as a heartbeat server. You'll need 3 NICs in the database servers: one for the connection to the network, one (gig-e preferably) for the connection between servers, and one for the connection to the heartbeat server.

http://www.drbd.org/ [drbd.org]
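
A minimal drbd.conf sketch for the two-box mirror -- hostnames, devices, and addresses are made up; protocol C is the fully synchronous mode:

    resource r0 {
      protocol C;                    # don't ack a write until the peer has it
      on dbone {
        device    /dev/drbd0;
        disk      /dev/sda7;         # backing partition for the DB files
        address   192.168.1.1:7788;  # the dedicated gig-e link
        meta-disk internal;
      }
      on dbtwo {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   192.168.1.2:7788;
        meta-disk internal;
      }
    }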

If you are smart, you'll play around with this on a test network or VMWare first. Get it all tweaked out and actually test it by killing a server while in mid-transaction to see if it works for you.

Re:DRBD (1)

LordMyren (15499) | more than 8 years ago | (#14046072)

DRBD is one of the most interesting software projects out there right now. There are two things on my wishlist:

1. Multicast replication to more than one backup
2. Write support from more than one system.

Coupled with GFS, it would be absolutely astounding. Multiple failures happen; a single backup is not always enough.

speaking from experience (3, Insightful)

mrselfdestrukt (149193) | more than 8 years ago | (#14032684)

High availability is NEVER as highly available as on paper...
*sob*

Re:speaking from experience (2, Insightful)

anon mouse-cow-aard (443646) | more than 8 years ago | (#14033728)

In my experience, you're right. But you have to take the long view. You don't just say "let's do an HA project," put it in, and walk away. To get more 9's, you start with something that makes sense, and then look at every failure that happens and fix the cause.

case in point:
We started off with HA and figured out how to go to a cloned configuration: two servers, two RAIDs, no SPoF, right? Then we had some LAN issues that caused traffic storms; there was a bug in the controller logic, so both RAIDs crashed simultaneously. We fixed it by using another brand of RAID for one of the units. Those servers have not crashed since...

If you do the accounting, the biggest cause of lost availability at HA sites is the component sitting 18 inches in front of the keyboard. That's not because people are less skilled than before; it's because we have eliminated all the hardware issues, and the stuff you don't automate is all the stuff that is too complicated to automate, so only human error in making complicated changes remains. So after every downtime there is usually an analyst looking sheepish, but it is usually not his/her fault. The process had some failing in it, and you have to fix the process. It's a lot like what I hear airliner crash investigations are like: find out what happened, fix the process, so that it doesn't happen again.

You hone it over years, and every failure or even glitch is precious. Study it.

Re:speaking from experience (1)

superpulpsicle (533373) | more than 8 years ago | (#14041651)

If you run HA at any production site, the company had better be willing to hire and pay $$$. Any engineer knows that HA means you have a lot to lose. Otherwise, why else would you run it?

Re:speaking from experience (1)

anon mouse-cow-aard (443646) | more than 8 years ago | (#14051538)

High Availability is actually a cost-cutter for people who have to provide a service. We get paged at 4am, told that a node has gone down; we thank them, go back to sleep, and fix it during the day. Hardware support can be Next Business Day instead of 24x7, with no impact on the service given to clients. OTOH, adding redundancy is not -- really not -- just something you slap on top of a single-node application, unless the entire solution has been architected with that in mind.

What often happens is that people build normal applications, and then decide to 'add reliability' by moving to HA. It works, but it is far more complicated than doing it right, and such a solution is almost guaranteed not to scale. It is usually quite clunky, and testing configuration changes is difficult.

I'll just posit some terminology. HA is usually used to refer to failover methodologies (i.e. piranha, linux-ha, (HP) MC/ServiceGuard, (SGI) FailSafe... what have you). Setting up HA is very complicated, and invariably includes some form of SPoF in the form of a shared disk, or the arbitration system itself.

HA is actually not the best way to get maximum availability. What you want to do is try to make the application cloneable: able to run multiple instances that, at best, do not need to talk to each other. Second best is if they talk to each other as if they were peers (no health check, just passing updates around like any other peer)... Only after you have given up on that sort of architecture should you fall back on HA.

Slony + PGPool are one postgresql option. (2, Informative)

Anonymous Coward | more than 8 years ago | (#14033062)

If you want to stay within open source Slony + PgPool [google.com] is a viable option. Slony is a very capable master-slave replication system and PgPool is an easy way of handling failover and load-balancing (of reads).
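
(For flavor: Slony is driven by slonik scripts; a bare-bones sketch that replicates one table, with made-up cluster and connection details:

    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=app host=master user=slony';
    node 2 admin conninfo = 'dbname=app host=slave user=slony';

    init cluster (id = 1, comment = 'master');
    create set (id = 1, origin = 1, comment = 'app tables');
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'public.accounts');

    store node (id = 2, comment = 'slave', event node = 1);
    store path (server = 1, client = 2, conninfo = 'dbname=app host=master user=slony');
    store path (server = 2, client = 1, conninfo = 'dbname=app host=slave user=slony');

    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

PgPool then sits in front, directing writes to the master and balancing reads.)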


For commercial postgresql options, Afilias (who wrote Slony and uses it to power the .NET domain registry [google.com] ) will happily sell you commercial support for less than similar microsoft-or-oracle stuff. I think a company called Command Prompt also has a commercial solution for postgresql, but I haven't tried it.

Provide more information (1)

Chazmyrr (145612) | more than 8 years ago | (#14034277)

You want the "best" HA solution but not something too expensive. How about you give us something more to go on, like how much are you willing to spend, how much downtime you can tolerate in the event of failure, are there space/power constraints, etc. Then people can give you a real recomendation instead of the standard MySQL sucks/is great.

Take a look at Avokia (2, Informative)

Wojtek_at_Avokia (931205) | more than 8 years ago | (#14034646)

Hey, first, full disclosure: I work at Avokia. But we do have an availability solution that is cheaper than RAC (it doesn't require a SAN) and combines the value props of both RAC and Data Guard. We virtualize the data layer, enabling many identical databases to be in production serving your users. And you can put these databases into geographically distant datacenters. So you get a live-live-live... set-up without the need for the manual conflict resolution that others require. Check it out at www.avokia.com -- Wojtek (I'm the product manager).

Radiant Data PeerFS (2, Informative)

darkone (7979) | more than 8 years ago | (#14034782)

http://www.radiantdata.com/ [radiantdata.com]
Radiant Data has a product called PeerFS, which is a replicated filesystem (rw/rw, active/active) that also allows you to hold MySQL databases on it. You run 2 separate MySQL servers pointing to the same data folder, and have them use POSIX locks for writes. The data is physically held on each server and synced across the network.
I am testing it at work ( http://www.concord.org/ [concord.org] ) now for our websites. VERY easy to set up, but it supports MyISAM tables and NOT InnoDB (or the other way around).
This does not solve IP failover, but there are other solutions out there for that. As a bonus, we're holding our PHP web pages and MySQL database files on the replicated PeerFS filesystem.
-Ben

RAIDb (0)

Anonymous Coward | more than 8 years ago | (#14036143)

Redundant Array of Independent Databases; dunno if it went anywhere, but it was a very cool idea.

Don't need SAN for RAC (1)

skinfitz (564041) | more than 8 years ago | (#14036209)

You don't need a SAN to run Oracle RAC - it will work using a shared firewire drive apparently.

RAC is an absolute ******* to install, however; the heavily bugged Oracle Installer is the cause of most of these issues.

Incidentally, for people experiencing blue screens when building a cluster with Windows Server 2003 Enterprise on Dell 2850 servers with an AX100 SAN, Oracle's ASM, and QLA200 fibre channel cards: flash the QLA card firmware.

And before anyone asks, I was going to use Linux, but it was actually cheaper to use Windows rather than fork out for RHEL / SUSE licenses. Yes really.

So use Mandriva (1)

leonbrooks (8043) | more than 8 years ago | (#14041571)

Cheaper than RHEL or SuSE or Win2k3 Server and they have had practice with some massive systems.

Re:So use Mandriva (1)

skinfitz (564041) | more than 8 years ago | (#14044427)

Sounds interesting - but can you get certified drivers for QLA200 cards and a Dell (EMC) AX100 SAN?

Re:Don't need SAN for RAC (0)

Anonymous Coward | more than 8 years ago | (#14053012)

"cluster with Windows Server 2003...Oracle...And before anyone asks, I was going to use Linux, but it was actually cheaper to use Windows rather than fork out for RHEL / SUSE licenses."


Our workaround is that we have a small cluster (2 computers) with fully-paid, supported OSes where we do all our testing, and a larger production cluster running on non-paid Linuxes. Since the machines use pretty much the exact same disk image, we've had no problem with support.

m/Cluster (1)

reaper (10065) | more than 8 years ago | (#14036582)

m/Cluster [continuent.com] is a fine solution for MySQL clustering, and it's almost reasonably priced. Be prepared not to use it on Red Hat EL4 for a bit while they work out problems with RH's kernel tweaks, but it works really well for our large LAMP and e-mail system.

Clustering, Master/Slave, Load- balancing (2, Interesting)

queenb**ch (446380) | more than 8 years ago | (#14038866)

High availability is another one of those marketing buzz words that really doesn't have a good, nailed down definition.

You can achieve this in three basic ways. Each has its own pros and cons. I recommend that you weigh them out and come to a decision you think you can live with.

Clustering - You have a group of servers (physical hardware), each running the same software and working to stay synched up with each other. Clustering comes in two flavors: active/active and active/passive. Active/active clusters share the requests between them. Active/passive clusters wait for a node to fail, and then another node in the cluster assumes the active role.

Master/Slave - This is similar to clustering in that you have a group of servers (physical pieces of hardware), each running copies of the software. The master does not service requests directly and operates only as a central repository, replicating all the data to the slaves. The slaves sync with the master. If one slave fails, there is normally some means for another slave to take over. Note: While this is old and largely deprecated, it is still quite functional and cost-effective under certain circumstances (a minimal wiring for the MySQL flavor is sketched below).

Load-Balancing - There are several physical pieces of hardware which each process transactions all the time. All updates/writes/commits are sent to all the servers, but reads are serviced only by one. Note: Again, this is deprecated, but can still be quite functional in specific circumstances.
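
(For the master/slave flavor, the classic MySQL wiring is two my.cnf fragments plus one statement on the slave -- a sketch, with hostnames and credentials made up:

    # master my.cnf
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf
    [mysqld]
    server-id = 2

    -- then, on the slave:
    CHANGE MASTER TO MASTER_HOST='master.example.com',
      MASTER_USER='repl', MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
    START SLAVE;
)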

My experience has been that typically all that is needed for clustering to work is some shared drive space for the various instances in the cluster to manage their own internal bookkeeping. In all honesty, there are several factors here that would make a big difference in what I would recommend for you. One - how much data are you talking about storing? Two - what are you using it for? If this is driving your web site, a second or two of latency while something fails over won't be noticeable to the end user. If you're transmitting medical records to an EMT team, it might be fatal. Three - why are you looking at HA? What need are you trying to fill?

2 cents,

Queen B.

Links for you to consider -
http://www.linuxlabs.com/clusgres.html [linuxlabs.com]
http://www.openminds.co.uk/high_availability_solutions/databases/postgresql.htm [openminds.co.uk]

PS: Yes, I like Postgres. It supports foreign key relationships out of the box (ummm... *R*DBMS, anyone?). It also doesn't force me to put in unnecessary indexes to use FKs. I will say that MySQL has managed to address some of my previous complaints about not supporting views, stored procedures, or triggers. I haven't had time to test their version 5, so for now I'll be silent, as I have no opinion on their implementation.

not really HA but... (0)

Anonymous Coward | more than 8 years ago | (#14042262)

try this http://www.intersystems.com/ [intersystems.com]

databases (1)

bugfix1982 (921244) | more than 8 years ago | (#14046240)

Extended Systems has a pretty good client-server, high-performance database server called Advantage. I guess they just released a new 8.0 version. It's not open source, but it's affordable and gives you a whole lot of replication and backup features, plus support for various clients. Here's their URL: www.advantagedatabase.com

Re:databases (0)

Anonymous Coward | more than 8 years ago | (#14049369)

Advantage is a POS. Stay far away. The only possible reason to use Advantage is if you have an ancient legacy app that requires the use of .dbf files.

A normal DB server can be configured to use lots of RAM for caching. Advantage can't, which is nice if your DB is running on a pentium 90 with 64 MB of ram, but on a reasonably modern server with 2 GB of RAM, it is sloooooow compared to any modern DB since it has to go to disk every time.

Advantage stores each table in a flat file.

Advantage isn't ACID compliant.

The Advantage people decided not to use TCP when the DB client talks to the DB server. They use UDP, and UDP has no guarantee that your network packets go where they are supposed to. Instead of offloading packet delivery to a specialized network layer in TCP/IP that does a fine job, Advantage wrote their own protocol to try and guarantee delivery. It doesn't always work.

Depends on amount of data and amount of data changing (1)

jbplou (732414) | more than 8 years ago | (#14049424)

If you are dealing with a small db or relatively light transactions, you could set up real-time replication or some other type of rapid change-transfer system, depending upon your DB vendor. But if the change rate to the database is rapid, you will need some sort of shared storage. There are storage solutions for under $10,000 that you can purchase and connect multiple servers to. If your data isn't worth spending that amount, you don't need a cluster.

HA database solution (0)

Anonymous Coward | more than 8 years ago | (#14056494)

I use two 16-way Solaris boxes in different data centers, as a Veritas cluster. The data centers are VLAN'd so that they appear on the same subnet. Out of the back of each Sun are triple fibre channel routes to an EMC Symmetrix disk array running SRDF. Oracle (9i) is active on one box; when the box fails, the cluster starts it up on the other box, complete to the last committed transaction.
Failover has happened once or twice, and we didn't notice for a couple of days that it had happened.

hardware solution (1)

OutOfFocus (802396) | more than 8 years ago | (#14058524)

HA could also mean hardware redundancy. Most home-built computers now have motherboards with built-in RAID-5 support. It's an easy way to provide not just database HA, but system stability as well.

Just string a couple of SATA-II drives together and activate RAID support on a motherboard like DFI's LanParty.

RAID-5 is more robust and has a higher survival rate in the case of hard-drive failures.

HA for databases (0)

Anonymous Coward | more than 8 years ago | (#14076774)

Most people here tend to think in terms of small-scale disasters. Master/slave is unusable for any serious HA database because of latencies and the single point of failure. Several other solutions, like GFS, suffer the same problem. Our system is distributed across 6 centers over the whole country. The big trick is *not* to use fancy solutions; they drive you mad. Split the write as early as possible so data gets to the backup db early.
Identify your need; a lot of money and sweat is wasted on accuracy that nobody needs.
We simply compare the databases of all 6 centers regularly to make sure no write is lost. So the databases are actually independent and do not waste important time checking up on each other.
We redesigned the systems some years ago and could reduce spending by more than 90%. Since simpler software is easier to maintain, we could also strongly reduce the number of 'strange incidents' (= nobody had any idea what happened).
In short: KISS, write early, understand your REAL problem, avoid SPOFs under all circumstances, and forget the advertising from DB people.

 