×

Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

Brian Aker On the Future of Databases

kdawson posted more than 5 years ago | from the everything-you-know-changed dept.

Databases 175

blackbearnh recommends an interview with MySQL Director of Technology Brian Aker that O'Reilly Media is running. Aker talks about the merger of MySQL with Sun, the challenges of designing databases for a SOA world, and what the next decade will bring as far as changes to traditional database architecture. Audio is also available. From the interview: "I think there's two things right now that are pushing the changes... The first thing that's going to push the basic old OLCP transactional database world, which... really hasn't [changed] in some time now — is really a change in the number of cores and the move to solid state disks because a lot of the... concept around database is the idea that you don't have access to enough memory. Your disk is slow, can't do random reads very well, and you maybe have one, maybe eight processors but... you look at some of the upper-end hardware and the mini-core stuff,... and you're almost looking at kind of an array of processing that you're doing; you've got access to so many processors. And well the whole story of trying to optimize... around the problem of random I/O being expensive, well that's not that big of a deal when you actually have solid state disks. So that's one whole area I think that will... cause a rethinking in... the standard Jim Gray relational database design."

cancel ×
This is a preview of your comment

No Comment Title Entered

Anonymous Coward 1 minute ago

No Comment Entered

175 comments

Well (5, Funny)

CrispBH (822439) | more than 5 years ago | (#23645175)

I couldn't... agree... more... I'd say that some... very valid... points... have been... raised.

Re:Well (2, Informative)

Bogtha (906264) | more than 5 years ago | (#23645965)

Come on, he's talking about the future of databases. He was just trying to set the mood by doing his best Kirk impression.

Dear Slashot (5, Funny)

CopaceticOpus (965603) | more than 5 years ago | (#23645217)

Can we please have another loud, circular debate over which database is best? It's the only way your favorite database will ever win.

Thank you.

Re:Dear Slashot (1, Funny)

Anonymous Coward | more than 5 years ago | (#23645575)

MyPostGroracleSQL Server 2008

</thread>

Re:Dear Slashot (1)

CastrTroy (595695) | more than 5 years ago | (#23646157)

Since you brought up the topic let me bite. I was just doing some testing today to see how certain queries we were running would run on MySQL, as compared to SQL server. Just to see if the query was so slow because of the size of the dataset, or because SQL Server is made by MS. Anyway I was in the process of transferring data, and I ran into this weird 4GB MyISAM table size limit. You can run a query and alter the table so that it can store more data, but what kind of DB in the year 2008 doesn't by default, out of the box, support storing more than 4GB of data in a single table. it's really not that much data anymore. Yes I realize that InnoDB doesn't have these crazy limitations, but MyISAM is much faster, and I don't require any of the transactional stuff for this table.

Re:Dear Slashot (5, Interesting)

dave87656 (1179347) | more than 5 years ago | (#23647687)

Okay, I'll bite too ...

We've been running MySQL using MyISAM since 2002. It's delivered acceptable performance until recently as we've expanded our application and the data volumes have increased. Now, we have to reorganize it on a frequent basis (we just backup and restore).

But, we really need to move to a transactional model so I've done some benchmarking between InnoDB and Postgresql. In almost all cases, Postgresql was significantly faster. Our application is very transactional with alot of writes.

And from what I've read, Postgresql scales well to multiprocessors and multiple cores where as MySQL does not. I know Falcon is coming but it was still very Alpha at the time I compared - I couldn't get it to run long enough to perform the tests.

Has anyone else compared Postgres to MySQL/Innodb?

Too small (-1, Offtopic)

bluefoxlucid (723572) | more than 5 years ago | (#23645225)

Solid state disks will never match with hard disks. The price point for a spinning metal platter array will always stay ahead of the price point for SSD. A microdrive even does 8GB on what I assume is 2 sides of 1 platter... this would make a 3.5 inch drive with 8 platters 98 times bigger (surface area wise), 784GB. Of course we can do 2TiB such drives, and 20GB microdrives are feasible... 1.96TiB. 5.25 inch drives (Bigfoot hard drives) clock in at 200.5 times the surface area, 4TiB in this model.

Next year the data density on those spinning disks will go higher, and so will NAND. Let's watch the price point, especially when nanotech gives us shiny new read-write heads and super dense NAND....

Re:Too small (2, Funny)

Tablizer (95088) | more than 5 years ago | (#23645321)

Solid state disks will never match with hard disks.

In 2025, this will be grouped with "640K RAM will always be more than enough." and you'll feel like such a Bill G.
         

Re:Too small (2, Informative)

Anonymous Coward | more than 5 years ago | (#23645985)

Except that Bill Gates never said that. Bluefoxlucid did.

I'm sure he'll feel lots worse. While Gates gets hounded for something he never said, at least he has mountains and mountains of cash to console him.

Re:Too small (1)

bobwrit (1232148) | more than 5 years ago | (#23645339)

You fail to mention the human factor. We maybe so accustomed to hard disks that we will get sick of them and then switch.

Re:Too small (1)

Tablizer (95088) | more than 5 years ago | (#23645401)

You fail to mention the human factor. We maybe so accustomed to hard disks that we will get sick of them and then switch.

Reliability should also be a driving force. The mechanics of hard-drives make them one of the most fragile and problematic parts in a computer. (Although RAID may dampen that somewhat, but double the cost.)
     

Re:Too small (1)

bluefoxlucid (723572) | more than 5 years ago | (#23645647)

RAID 5 against (n) disks. Given non-failure probability (p) of one disk in a given time window, we have sqrt(p^n). So for 3 disks at a point in time where p=0.5 (0 p 1 in all cases, any number will illustrate the effects), 35% chance that the whole array still works. 5 disks, 17%. 2 disks (not doable), 50%.

Point: RAID5 doesn't statistically increase reliability. I know someone who worked for a data recovery firm, lots of people swapped the drives the moment they failed and had a second go while the first hot spare (i.e. was already in there for auto-activation on one disk failure) was rebuilding.

Mirroring is okay but gets expensive. For two storage devices (disks, or RAID5 arrays) p=50%, 1/Sqrt(2) is your failure probability of the whole array. And the two arrays likely don't share identical data layouts; though a specific RAID10 might, who knows.

Re:Too small (2, Insightful)

dgatwood (11270) | more than 5 years ago | (#23645929)

Worse than that. Part of that is the fact that RAID vendors tell people to go with drives from the same manufacturer to optimize performance, when in reality, they should be doing with the most diverse pool of drives possible, dramatically reducing the probability of multiple catastrophic failures at the same time (except through human error, some sort of physical impact, electrical surge, etc.). If a drive has a design defect, it isn't at all uncommon for them all to fail at n+/-k hours for some relatively small value of k. If all your drives are the same model from the same batch, your probability of losing the entire array is suddenly remarkably close to 100%. RAID with the same model of drive dramatically reduces reliability in the name of performance.

Even mirroring is basically useless for reliability if the drives are from the same vendor unless you swap out the mirrored drive daily so that no single backup has been operating more than half as long as the main drive. RAID is a nice idea in theory, but the reality is that with what seems to be a rapid decline in hard drive reliability over the past few years, in practice, performance notwithstanding, RAID just raises your power bill and guarantees you have two drives to ship back to the manufacturer for replacement instead of one.

As an aside, I had two personal hard drives die in the past week (and four within the last year). When you consider that I only have about 5 drives in regular use, that's alarming. Both of these drives were under a year old. One was a Seagate 500 GB drive in continuous operation in a heavily cooled tower (sides off the tower, four fans blowing outside air straight across the drive). Came into the house and it sounded like someone was using a radial arm saw. Cloned off enough data before the drive stopped reading any blocks at all, so my MythTV box is up and running again. That lasted about 9 months. The other was a Seagate 160 GB 5400 RPM laptop drive. Lasted 11 months and suddenly went into click-of-death mode where no data was accessible from the drive. It's going to be a long time and a lot of therapy before I'll ever trust anything important to a Winchester drive again... and Seagate went in a single week from my "high reliability, buy over all other vendors" list to my "not in a million years" list.

Re:Too small (0)

Anonymous Coward | more than 5 years ago | (#23646005)

err.. "they should be doing it with". My bad.

Re:Too small (0)

Anonymous Coward | more than 5 years ago | (#23646381)

Your Myth drive was probably killed by the shit you were recording off fox.

Jus sayin

Re:Too small (0)

Anonymous Coward | more than 5 years ago | (#23646749)

RAID 5 against (n) disks. Given non-failure probability (p) of one disk in a given time window, we have sqrt(p^n). So for 3 disks at a point in time where p=0.5 (0 p 1 in all cases, any number will illustrate the effects), 35% chance that the whole array still works. 5 disks, 17%. 2 disks (not doable), 50%.
Wrong.

Point: RAID5 doesn't statistically increase reliability. I know someone who worked for a data recovery firm, lots of people swapped the drives the moment they failed and had a second go while the first hot spare (i.e. was already in there for auto-activation on one disk failure) was rebuilding.
Selection bias anyone?

Mirroring is okay but gets expensive. For two storage devices (disks, or RAID5 arrays) p=50%, 1/Sqrt(2) is your failure probability of the whole array. And the two arrays likely don't share identical data layouts; though a specific RAID10 might, who knows.
Wrong again, Statistics 101.

Re:Too small (1)

CastrTroy (595695) | more than 5 years ago | (#23646297)

Personally, I've had way less problems with hard disks than I have had with RAM chips, CPUs, video cards, motherboards, network cards. I've only ever had 1 hard drive die. I've had countless of the other devices die. Hard drives have very simple circuits, so it's very unlikely that the chips will overheat, or have any capacitors and such blow. The only thing left to fail is the mechanics. Since they are all sealed in a dust free chamber, the only thing to kill it is heat. Keep you hard disks properly cooled, and you really shouldn't have many problems with them. I really don't understand why people say hard drives are so problematic. Most computer problems I've experienced are due to faulty solid state hardware.

Re:Too small (3, Interesting)

Anonymous Coward | more than 5 years ago | (#23645479)

It's more accurate to say that there will probably always be a tradeoff between slow and fast storage, there will probably always be a tradeoff between permanent and temporary storage, and there will probably always be a tradeoff between expensive and cheap storage.

In 20 years, I do not know what form slow, or cheap, or permanent storage may take. It may not be spinning magnetized platters. But I do know that in 20 years, every well-written database will have algorithms and data structures to deal with slow storage, permanent storage, and cheap storage.

Re:Too small (1)

gfody (514448) | more than 5 years ago | (#23647215)

Volatile, nonvolatile, expensive, cheap are all hardware concepts not applicable to software. Applications in the future will just use 'memory' and assume it's persistent and as fast as possible.

The only reason applications are forced to account for slow storage is because hard disk drives have been our biggest bottleneck for 50 years!

Leaky abstractions (5, Interesting)

yoris (776276) | more than 5 years ago | (#23645239)

Gotta love that link between the hardware limitations and the software concepts that may seem fancy but are essentially only built to get around them. I believe someone once called it "the law of leaky abstractions" - would be interesting to see what the new limitations would be if you start combining solid-state storage with pervasive multiprocessing, i.e. what can you do with a multi-processor multi-sdd server that you can not do with a single-processor single-hard drive server?

I think TFA is pretty right on the money that parallellization and massive use of SSD could cause some pretty fundamental changes in how we approach database optimization - if I were to imagine that rack that I'm staring at being filled with SSD drives and processors instead of with nothing but hard drives... locality of data takes on a whole new meaning if you don't require data to be on the same sector of the HD, but rather want certain sets of data to be stored on storage chips located around the same processor chips to avoid having to overload your busses.

Then again, I haven't been in this game for so long, so maybe I'm overestimating the impact. Oldtimer opinion would be very welcome.

Re:Leaky abstractions (1)

Crispy Critters (226798) | more than 5 years ago | (#23645735)

I don't think that is the way that "leaky abstractions" is generally used. Leaky abstractions occur when you need to understand the underlying technology to predict how something will act.

One example I know is in Python.
>>> a=[[5]]
>>> b=a
>>> a[0][0]=3
>>> b
[[3]]

Should changing one variable also change another? It makes sense if you think of a Python list as linked list of pointers. (It acts that way -- I know nothing about the internals of Python.) The behavior is very mysterious at face value.

Re:Leaky abstractions (1)

tepples (727027) | more than 5 years ago | (#23646291)

Should changing one variable also change another?
If both variables point to the same object, yes. This is the case for Java's ArrayList and Python's list, which share similar semantics, and for Java's HashMap and Python's dict.

Re:Leaky abstractions (1)

ZerothOfTheLaw (1227036) | more than 5 years ago | (#23646507)

Well, thats because python, to save memory and processing time with lists, passes by reference, not by copy. The way to do what you want is this: >>> a=[[5]] >>> b=[a[0][:]] >>> a[0][0]=3 >>> b [[5]] However, if you are going to be passing around lists, and want copies to modify, then do this: b=a[:] to do a one-level copy of the list's contents. Any lists in the list, as above, will also be passed by reference, not copy. If you do have nested lists, then take a look at the copy module.

Re:Leaky abstractions (0)

Anonymous Coward | more than 5 years ago | (#23646651)

if I were to imagine that rack that I'm staring at being filled with SSD drives and processors instead of with nothing but hard drives

Welcome to Slashdot folks!

Get ready, aim, fire! (1, Funny)

Anonymous Coward | more than 5 years ago | (#23645273)

Eagerly awaiting the gratuitous Sun trash talking! Every thread seems to devolve into Sun bashing (especially with regard to the MySQL acquisition). Everyone: Sun did not decide to close source MySQL. If anything they are pushing against it.

This IS news! (5, Funny)

Bluesman (104513) | more than 5 years ago | (#23645305)

MySQL has people who are responsible for *designing* it? I'm shocked, Shocked.

Re:This IS news! (0)

Anonymous Coward | more than 5 years ago | (#23646593)

Well, they don't seem to be well versed in database design:

"random I/O being [in]expensive" "will... cause a rethinking in.. the standard Jim Gray relational database design"

Wtf the relational algebra have to do with random I/O ? That is the absolute opposite: thanks to Jim Gray (a CJ Date, etc) work, relational database design have no dependency on the physical implementation, so SSD is a no problem for relational databases...

Re:This IS news! (2, Informative)

gfody (514448) | more than 5 years ago | (#23647271)

I'm sure he meant the 'implementation of'
Relational algebra has nothing to do with random IO however building a relational database system has everything to do with random IO because it is by and large the worst bottleneck in the system. The best performing RDBMSs are the ones completely designed around avoiding random IO. That's why TFA says a new RDBMS could be created from scratch and blow the existing players out of the water in the new SSD world.

Help with MYSQL! (-1, Offtopic)

JavaManJim (946878) | more than 5 years ago | (#23645397)

I downloaded and tried to run this MYSQL version 5.0.51a. MYSQL is a seriously flawed product in my opinion. Because it uses a web interface. Why not a PC based information box? Requiring a web browser adds to the requirements of this product.

I start the thing. OK DOS box.
I start the administrative console.
Which starts SeaMonkey which is NOT MY defalt web browser.
SeaMonkey reports it cannot get to http://localhost:4848/ [localhost]

Does anyone know how MYSQL might have a decent PC based administrative panel like Microsoft products do?
Does anyone know about aforementioned errors?

I would appreciate any suggestions!
Jim

Re:Help with MYSQL! (0)

Anonymous Coward | more than 5 years ago | (#23645469)

mysql-admin
mysql-query-browser

$ mysql -u -p

Re:Help with MYSQL! (0)

Anonymous Coward | more than 5 years ago | (#23645509)

MYSQL is a seriously flawed product in my opinion.
Download Tortoise SVN, then try to check something out and claim that SVN is seriously flawed because it doesn't work out of the box. Please LEARN until you understand what you are doing before trying to place criticism.

I start the administrative console.
I dont think you started what you think you did.

http://www.mysql.com/products/tools/administrator/ [mysql.com]

Admittedly.... (1, Interesting)

Enderandrew (866215) | more than 5 years ago | (#23645429)

I haven't read the article yet, but that summary terrifies me. I keep hearing how in the modern age we shouldn't think about optimal programming because people have more resources than they need.

Databases need to scale to disgusting large numbers. Memory and disk resources should always be treated as expensive, precious commodities, because they might be plentiful on a simple database on robust hardware, but there are plenty of people out there with massive friggin' databases.

In corporate America, Oracle and MSSQL sadly are king. MySQL has some interesting advantages, one of them is performance over MSSQL, but if they squander that, what will they be left with? And ffrankly, I don't think Sun paid a fortune for MySQL just to piss away opportunities at gaining ground in corporate America.

Re:Admittedly.... (1)

od05 (915556) | more than 5 years ago | (#23645679)

I agree that there hasn't been enough progress in this field, just slow, incremental improvements.

I'm constantly having to compromise with the physical limitations of my database, such as having to pick and choose which attributes to delete, as the tables in my Oracle database reached it's 1000 column limit a long time ago. While these limits have forced me clean up the data and consolidate, it's a major annoyance that I shouldn't have to deal with.

MySql's 4096 column limit is enough to make me want to switch.

Re:Admittedly.... (1)

m0rph3us0 (549631) | more than 5 years ago | (#23645811)

wow.

I imagine a schema like OrderItemId1, OrderItemId2 .. OrderItem500. I have a feeling if you designed your schema, it would be much faster.

Re:Admittedly.... (1)

njcoder (657816) | more than 5 years ago | (#23645883)

I'm constantly having to compromise with the physical limitations of my database, such as having to pick and choose which attributes to delete, as the tables in my Oracle database reached it's 1000 column limit a long time ago.
You do know you can create more than one table right? You don't have to normalize every schema so much :)

Seriously though, I'm curious what type of application would need and actually benefit from such a design.

Re:Admittedly.... (1)

od05 (915556) | more than 5 years ago | (#23646527)

My database stores the behaviors of visitors to develop predicative modeling applications for a mid-sized online retailer. The data collection process is automated, and responses can fill up the tables very fast. I've found that takes a lot of data to accurately run a regression, as the behaviors of Internet users is pretty much random. In my system, each participant is a row, and each response is a column. Many people tend to revisit the site on a regular basis, and it is not uncommon at all for the click-throughs of certain users to exceed the 1000 column limit over a period of time.

I don't mind having to break the data up into chunks when it comes time to import the data into SPSS for analysis, however, I do get annoyed by having to store the data in chunks.

I think there is a high demand for scalability on disgusting proportions, I for one, would benefit from this.

Re:Admittedly.... (0)

Anonymous Coward | more than 5 years ago | (#23646661)

you do realize thats a horrible schema for storing that data. instead, you should have a People table with one entry per person, and a Response table, with one entry per response that has a FK back to the People table.

select * from Response as R where R.people = somebody;

would be how you'd collect all the responses from a single person.

Re:Admittedly.... (2, Informative)

allanw (842185) | more than 5 years ago | (#23646721)

If you find you have to create thousands of columns response_0001, response_0002, ... response_4096, then you should probably realize that there's something wrong with your schema. It's just basic database normalization. (Though I suppose you might have a reason for doing it this way. But it sounds incredibly horrible.)

Re:Admittedly.... (2, Insightful)

njcoder (657816) | more than 5 years ago | (#23646821)

Yeah, like the other responders have mentioned. Sounds like you're doing it horribly wrong.

I don't know much about SPSS, but if you need to present the data in such a flat format, you can always store the data the proper way and create a view when you need to import it into SPSS.

I'm not sure if there is some sort of limit on the number of columns in Oracle views or not.

Re:Admittedly.... (2, Informative)

CastrTroy (595695) | more than 5 years ago | (#23646227)

I'm going along with the other two guys. I can't see what application would need more than 1000 columns in a single table. What really gets my is the MS SQL Server 2000 maximum of 8 KB ( SQL Server 7 was 2 KB) in a single row. Now there's a limitation that's badly designed. Oh, and you can define a table with 15 Varchar(8000) fields, just don't try filling every field. 1000 columns I could do just fine with (SQL Server supports 2048?) but the big killer is that you can't even use 2000 columns, because if you did, you would run out of space in the row, unless the average field size was under 4 bytes.

Re:Admittedly.... (1)

CrazedWalrus (901897) | more than 5 years ago | (#23646329)

?!

1000? That's "One Thousand" ?? As in ten times more than 100?

You might want to take some database design courses if you consider a 1000 column limit to be unreasonable or unworkable. The widest table I'd ever seen was one where the original programmer did a poor job of normalization (that is, he didn't do it), and even that only hit about 150 to 200 columns (times about 60 million rows. Ouch.) Not only is it difficult to maintain wide tables like that, the database will probably perform poorly as well when dealing with that table -- especially if it needs to table scan.

With proper schema design, you should never have to pick and choose which attributes to keep due to table width limitations. Also, I get the feeling that, when you say "consolidate", you mean moving several columns into a single delimited field, which is the makings of another unmaintainable mess.

If this is a database that a real company depends on, you really might want to get someone in who can refactor your data model. It'll be painful at first, but it'll save you some real trouble down the road.

Re:Admittedly.... (2, Interesting)

TheFlamingoKing (603674) | more than 5 years ago | (#23646403)

I wonder if I've been reading Slashdot too long - I can't tell whether this is a troll, a joke, a newbie, or an actual legitimate issue...

Re:Admittedly.... (1)

allanw (842185) | more than 5 years ago | (#23646745)

The guy is absolutely right. Anyone designing a database schema should know this.

Re:Admittedly.... (2)

Samah (729132) | more than 5 years ago | (#23646065)

Just as a reference Re: big applications and databases, the company I work for maintains a transport management system for the largest rail freight provider in Australia. The database for this application (Oracle 10g) currently stands at 1100+ tables (some containing a few million rows) 2400+ packages, and 450+ triggers. The client application is comprised of 2100+ java source files and 500+ interactive Swing forms.
Your statement of "Databases need to scale to disgusting large numbers." is spot on. ;)

Re:Admittedly.... (2)

Enderandrew (866215) | more than 5 years ago | (#23646159)

I used to work for a casino company (the largest in the world at the time, and largely is).

Our slots database didn't just track totals for rewards credits and such, we kept every transaction ever. We can see how long you played at each machine, for how long, when you put money it, when you hit, when you took money out, etc. We know when you want to come in and play, which machines, etc. Consider that for every customer in a casino. Now consider than on a scale of 60 major casinos.

I'm not a DBA, so I was only responsible for resource managing on the servers.

Re:Admittedly.... (2, Funny)

Samah (729132) | more than 5 years ago | (#23647379)

UPDATE SlotMachines SET ChanceToWinPercent = 100;
COMMIT;
try {
    machine.collectMoney();
    player.run();
} catch(SecurityException e) {
    player.resist();
    if(player.isFast())
        casino.exit(0);
    else
        player.yield();
}

Re:Admittedly.... (1)

CastrTroy (595695) | more than 5 years ago | (#23646831)

All software should be tested and optimized for performance. I ran my web app through the profiler today, and found out it was doing tons extra work, which basically accomplished nothing, and which ended up slowing things down quite a bit. It's amazing how much things can be sped up just by running a profiler. All your unit tests may pass, and everthing may appear to be working correctly, but running through a profiler can fix some major bottlenecks quite quickly.

Re:Admittedly.... (1)

hemp (36945) | more than 5 years ago | (#23646919)

Sounds like your web app was badly designed ( or maybe not at all ). You should have not spent any time coding superfluous functions and procedures in the first place.

Re:Admittedly.... (1)

CastrTroy (595695) | more than 5 years ago | (#23647025)

Yeah, well back here in the real world, people have to deal with applications that are 15 years old, and have tons of legacy code, a good portion of which wasn't "designed". It's nice to talk about how things should be in and ideal project, but in real life, things are usually not ideal.

Re:Admittedly.... (1)

Alpha830RulZ (939527) | more than 5 years ago | (#23647693)

I believe Edgar Dijkstra said it well: "Premature optimization is the root of all evil." Profilers exist for a reason, which is that in significant systems, it can be hard to predict where the system and users will spend all their time. It's not economic for a project to optimize everything, all the time. Nor should you - the job is to get good work done. A slow, correct routine is often just fine, for my work at least.

BigTable, HBase and SimpleDB are the future (3, Informative)

atomic777 (860023) | more than 5 years ago | (#23645481)

I recently blogged on this [tomictech.com] , but essentially, as long as your average PHP developer thinks of MySQL as a glorified flat file system to place their serialized PHP objects, an always-available, pay-as-you-go distributed database is going to revolutionize application development in the coming years. For those that want to keep control of their data, HBase [apache.org] is coming along quite nicely.

Can I ask a stupid question... (2, Interesting)

Enderandrew (866215) | more than 5 years ago | (#23645501)

I'm actually reading the article now, and as he is talking about design for a database taking multiple cores into consideration, etc, I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?

Again, I'm sure this is a stupid question, but perhaps someone could clue me in.

Re:Can I ask a stupid question... (0)

Anonymous Coward | more than 5 years ago | (#23645591)

When you can get groups of record (not tables) to be their own lockable object(s), with dedicated processor(s), disk(s), you'll end up with the endgame in functional databases that is only limited by throughput. Just a matter of time.

Re:Can I ask a stupid question... (3, Informative)

Johnno74 (252399) | more than 5 years ago | (#23645743)

Umm I'd say you have it wrong - "Traditional" databases have many different lock granularities, such as Table locks, page locks and row locks. SQL server and Oracle certainly do this.

MySQL only does table locks, which are much simpler and much faster for light workloads, but as I'm sure you can imagine when you have many CPUs trying to update the table at once in the end each thread has to wait their turn to grab the lock and perform their updates sequentially.

In SQL Server, Oracle, or any other "enterprisey" db multiple threads can update the same table at exactly the same time, as long as its not the same row.

Stuff like this is exactly why people who use MS-SQL and oracle look down their nose at people who use MySQL and claim it is capable of playing with the big boys.

Once again, despite what MySQL are saying there is nothing innovative here. All this stuff has existed in the mainstream database engines for many, many years and they are still playing catchup.

Re:Can I ask a stupid question... (3, Informative)

XanC (644172) | more than 5 years ago | (#23645987)

What you say is true for MyISAM tables, but MySQL's InnoDB tables fully support row-level locking. And I believe their BDB tables support page-level locking.

Re:Can I ask a stupid question... (1)

Johnno74 (252399) | more than 5 years ago | (#23646791)

Cheers for the info, I thought it still did table locks only.

Don't get me wrong, I have nothing against MySQL... It has its place, and it does seem to be improving. What does make me grind my teeth is when people say its a serious competitor to MS SQL, or oracle. The typical argument is all the other things that these DBs give you (like proper ACID transactions, distributed transactions, fine-grained security, triggers, functions, etc etc) are just fluff and you don't need them anyway....

Re:Can I ask a stupid question... (1)

XanC (644172) | more than 5 years ago | (#23646977)

MySQL 5 may be worth a look. I'm not saying it's at the level of the other databases, but it does now support I believe all the features you mentioned. And if you set strict SQL mode, a lot of the other gripes about data handling go away as well.

Re:Can I ask a stupid question... (1)

kestasjk (933987) | more than 5 years ago | (#23645809)

I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations
I'm not sure what you mean, fine-grained locking is up to the engine, and MySQL can't really do it.
InnoDB is the classic example; you can get MySQL to lock entire InnoDB (or MyISAM) tables, but if you want to lock a certain row in an InnoDB table MySQL can't help, because it doesn't know about InnoDB.

If you're saying "the MyISAM/MySQL table-based locking won't work in a parallel environment", then you're right, but that's nothing new that relates SSDs and micro-cores.

Fully transactional engines have to be very good at being parallelizable. In databases things had to be "parallel" even before multi-core was around, because being able to interact in more ways with the database at a time means better performance.
You don't need two CPUs to have multiple clients trying to write to the same data without treading on each others' toes.

Databases are inherently scalable and parallelizable, so I think more low-latency memory and more CPUs will have a big effect.
But I could be wrong, I only deal with databases on the other side of the SQL parser.

Re:Can I ask a stupid question... (1)

Enderandrew (866215) | more than 5 years ago | (#23645895)

As far as I understand Interbase (and now Firebird) use a versioning approach as opposed to locking. I wonder if this approach is more beneficial with parallel processing, but you're right in that the concern was already relevant given multiple users in a database at once.

I wonder which one has better performance for multiple operations at once.

Re:Can I ask a stupid question... (2, Informative)

anarxia (651289) | more than 5 years ago | (#23646073)

It is called MVCC. Other databases such as oracle and postgres also use this approach. MVCC has its pros and cons. It allows for higher concurrency, but it might require extra copies of data and that translates to more memory and disk space. On a "weak" server it might actually end up being less concurrent.

Re:Can I ask a stupid question... (1, Interesting)

Anonymous Coward | more than 5 years ago | (#23646285)

InnoDB uses MVCC as well. As storage goes, InnoDB is perfectly serviceable. It's just the rest of the DB engine around it that's out of whack.

Re:Can I ask a stupid question... (1)

merreborn (853723) | more than 5 years ago | (#23646277)

I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?
Both interbase and MySQL's InnoDB engine use the concept of Multi Version Concurrency Control (MVCC) to address the issue of simultaneous access -- although their implemenations differ: InnoDB does make use of row-level write locks. The upcoming Falcon engine is said to do the same with even fewer locks than InnoDB.

All should parallelize fairly well -- it's the table-level locks of MySQL's default MyISAM engine that really kill simultaneous access.

If you... (-1, Redundant)

Anonymous Coward | more than 5 years ago | (#23645515)

If you.... edit an article.... enough.... you can make it.... say just.... about anything.

Cores? (1)

bar-agent (698856) | more than 5 years ago | (#23645555)

In the article, they say the multiple cores will have a big impact on the design of databases. I don't understand this. Databases are already threaded, are they not? Why does multi-core support require different code than multi-thread support?

Re:Cores? (1)

eggfoolr (999317) | more than 5 years ago | (#23645841)

Most DB's are sudo threaded. They run multiple single threaded processes and do have issues scaling. To be fair, it is also the application design that limits scalability, not just the database.

For a very simple example, you could choose to traverse a table sequentially, or you could split it into smaller chunks and perform multiple operations in parallel.

The problem on a single or small number of processors is that if you make your application highly threaded, you will generate a huge amount of context switches.

If you get really threaded then have a look at Sun's T2 processor. That's the future!

Re:Cores? (4, Insightful)

bersl2 (689221) | more than 5 years ago | (#23646301)

"sudo" is that command which grants one user authorization to act as another user.

"pseudo-" is that verbal prefix which means "false".

I'm seeing language devolve in front of my eyes...

Re:Cores? (0)

Anonymous Coward | more than 5 years ago | (#23646435)

> Most DB's are sudo threaded.

OK, name one DB that has to run as root? MySQL, PostgreSQL, Microsoft SQL, and DB2 do not. Back-up your claim.

Re:Cores? (1)

eggfoolr (999317) | more than 5 years ago | (#23646479)

Ummmm, Oracle. In order for it to run DISM on Solaris it needs setuid.

Bloody hell, I'm letting my real identity escape now!

Re:Cores? (1)

MBCook (132727) | more than 5 years ago | (#23646499)

Right. MySQL, at least under Linux, just starts to fall flat on it's face if it's given more than 4 cores. That's the case for 5.0.x, and I think it is still true of 5.1.x (but that's still beta anyway). I seem to remember the situation being better under Solaris, but it's not fixed entirely.

Oracle, DBII, and (probably to a lesser extent) PostgreSQL can handle many more processors.

Just like any other software, multithreading is hard. But where there are times in some programs (say games) where stale data may be OK (i.e. for drawing a frame) or a double update may be tolerable (setting something to be dead twice is probably pretty harmless) when you are running a multi-billion dollar system those kind of errors are absolutely unacceptable. You just have to be far far more careful.

You're right that in some circumstances this isn't that bad. In some cases each processor could almost "own" a table and that might work. But when you have one giant table and a couple of small ones, you just have to share the giant table and lock on individual rows, sections, partitions, whatever.

The furture of databases you say? (0)

Anonymous Coward | more than 5 years ago | (#23645677)

2 words....SQL Server

Oracle is going the way of UNIX, and by UNIX I mean HPUX, Solaris, & AIX. In 10 Years you'll see the trend towards MS SQL Server for all but the top-tier business critical apps. MySQL and friends will likely remain a popular choice for meta-repositories and web tech solutions. The dirty secret is that tons of Oracle 8 & 9 mid-range systems are getting replaced with SQL Server every day as they get technology refreshes. You really can't argue the economics, and Oracle, like legacy big-iron won't change fast enough to turn the tide.

Flame on -

OCP Certified Oracle DBA

Don't have to wait 10 years (1)

VampireByte (447578) | more than 5 years ago | (#23645863)

I think it's happening already. I develop in both Oracle & SQL Server environments and for years organizations I've encountered used Oracle for one purpose and SQL Server for another. In the last two years however, many of those companies are interested in totally replacing their Oracle databases with SQL Server.


As far as MySQL, after doing a lot of development with it over several years during the late dotcom/early bust days, I haven't worked with it lately because there hasn't been much demand for it. It'll be interesting to see if that changes with Sun's involvement.

Re:The furture of databases you say? (3, Interesting)

njcoder (657816) | more than 5 years ago | (#23646071)

If I was migrating away from Oracle, MS SQL Server wouldn't be my first choice. Postgresql would. Given the choice between a free version that is similar to the original vs a product that is very different that I need to pay for it's a no brainer. Also take into consideration that for some database applications you're going to need some serious horsepower. You're limited in the number of procs you can have in a Windows system. Last time I checked, once you get past 8 processors Windows doesn't scale as well. Even linux doesn't do as well as Solaris, AIX or HPUX past a certain number of procs.

Oracle's RAC seems to be a better solution than MSSQL's approach. PostgreSQL (and EnterpriseDB) are working on a more RAC-like approach.

This is a good story about a company that successfully moved from Oracle to Postgresql [arnnet.com.au] . Basically, they had 2 database systems running Oracle, a data warehouse and an OLTP system. They moved their data warehouse over to Postgresql running on Solaris 10, then they used the licenses they no longer need for the data warehouse to boost the computing power of the OLTP system.

Remember kids. (0)

Anonymous Coward | more than 5 years ago | (#23646175)

Remember kids this is the same guy who said that ACID wasn't important and the emulating foreign keys was as good as having support for them in the database. Face facts - he will do / say what ever it takes to get you to use / buy his product. History shows this to be true.

Brian's Blog (1)

apachetoolbox (456499) | more than 5 years ago | (#23646299)

http://tangent.org/ [tangent.org]

Sup Brian!

-Bryan (from apachetoolbox)

Re:Brian's Blog (0)

Anonymous Coward | more than 5 years ago | (#23646943)

Heh.

I lived in Brian's basement for a few weeks when I first moved to Seattle.

Sadly, none of his brainpower settled into his basement and into my brain. o.O

(Even if some of you folks don't like MySQL and/or some of its features, you gotta admit Brian's done more than most of us...)

self-fulfilling predictions? (1)

BigGerman (541312) | more than 5 years ago | (#23646509)

If one's product does not support advanced features or, sometimes, even basic common sense features, no wonder one's predictions call for the world that does not need those features.

Locality is the key (5, Interesting)

Dave500 (107484) | more than 5 years ago | (#23646627)

In my mind as a database engineer for a wall street bank, the biggest change in the near term that we forsee is data locality.

Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.

Why? Latency. Consider your average SOA application which reaches out to 4-5 remote services or dataserver calls to execute its task. Each network/rpc/soap/whatever call has a latency cost of anything between one and at worst several hundred milliseconds. Lets say for example that the total latency for all the calls necessary is 10 milliseconds. 1000/10=100 transactions per thread per second. Oh dear.

The amount of memory an "average" server ships with today is in the 32-64GB range. Next year it will be in the 64-128GB range. The average size of an OLTP database is 60-80GB.

So, the amount of memory available to the application tier will very soon be greater than the size of the database, warehouses excluded. Moore's law is quickly going to give the application tier far more memory than it needs to solve the average business state, exceptions noted.

The final fact in the puzzle is that for transaction processing, read operations outnumber write operations by roughly 20 to 1. (This will of course vary on the system, but that *is* the average.)

This situation is strongly in favor in migrating read only data caches back into the application tier, and only paying for the network hop when writes are done in the interests of safety. (There is a lot of research into how writes can be done safely asynchronously at the moment, but its not ready yet IMHO.)

Challenges exist in terms of efficient data access and manipulation when caches are large, performant garbage collection and upset recovery - but they are all solvable with care.

Its my opinion that in the near future large data caches in the application tier will become the norm. What has to be worked out is the most effective way of accessing, manipulating and administering that data tier and dealing with all the inevitable caveats of asynchronous data flow.

Some (not complete) examples of implementing this:

Relational Caches (there are many more):
http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_for_java.html
http://www.alachisoft.com/ncache/index.html

Object Caches:
http://www.ogf.org/OGF21/materials/970/GigaSpaces_DataGrid_OGF_Oct07.ppt
http://jakarta.apache.org/jcs/

Re:Locality is the key (1)

eekygeeky (777557) | more than 5 years ago | (#23647141)

So the future is giant databases running in RAM, backed up to nearby RAM and only written to disk on schedule or in extremis? sounds... kind of cyberspace?

Re:Locality is the key (1)

Dave500 (107484) | more than 5 years ago | (#23647199)

You are right - it does.

The best methodology on how to efficiently and robustly manipulate large data sets in the application tier is still up for debate. In memory databases are but one potential solution to this requirement - there are many others. The javaspaces/gigaspace grid style solution is another.

Every current approach to this issue at present has is pros and cons - that's the price for being on the bleeding edge I guess.

What cannot be ignored is that we are on the edge of a potential paradigm shift - one powered by the fact that available system memory is about to surpass the size of the average state of a complex system. How this will effect our application design is still very much up for grabs, but be assured it will.

Re:Locality is the key (4, Interesting)

ppanon (16583) | more than 5 years ago | (#23647247)

Interesting ideas, but it would seem that, once your application tier is spread over multiple servers that don't share a memory space, you are going to have significant distributed cache coherency issues. While I can understand the desire to avoid the marshalling overhead involved in database reads and updates, you're also going to have to reinvent the wheel of distributed concurrency control for each application when it's already been solved in a general way in the clustered database.

For instance, from the JCS link you provided:
  JCS is not a transactional distribution mechanism. Transactional distributed caches are not scalable. JCS is a cache not a database. The distribution mechanisms provided by JCS can scale into the tens of servers. In a well-designed service oriented architecture, JCS can be used in a high demand service with numerous nodes. This would not be possible if the distribution mechanism were transactional.

So if you're having to give up transactional integrity to have your distributed cache, I think it's going to have limited applications because it doesn't solve that 1000 transactions per thread problem you indicated. Sure you can work your way around it a little by treating it as extremely optimistic locking to maintain transactional integrity on writes, but it also does limit the accuracy of the cache and for some applications (financial for starters, I would expect) that's going to be an issue.

Re:Locality is the key (2, Interesting)

Dave500 (107484) | more than 5 years ago | (#23647373)

Extremely valid point.

Not to bash Oracle, but the ultimate scalability of their multi-host database partitioning solution (RAC) is indeed limited by the amount of communication the distributed lock manager needs to make to ensure transactional isolation as the number of partitions/hosts increase. (Caveat to Oracle fans - 80% of requirements are beneath this threshold - so I understand Oracle's strategy.) (An alternative solution is the "shared nothing" partitioning approach (example - db2's DPF) but this has its own drawbacks too.)

I don't pretend for a second to know all the answers - indeed I suspect that some of them are yet to be invented/utilized effectively by industry.

My major point is that having distributed application side data caches will soon become very tempting in terms of the latency involved with accessing data. There are admittedly great challenges involved with doing this safely, in a way which is scalable as you point out and providing a productive application interface.

It will be very interesting over the next few years as we collectively work out the best approach to these requirements. Anybody can be wrong - me of all people - but my bet is that most of these problems will be solved. How they will be is the coolest part :) .
   

Everyone knows which database is the best! (1)

csoto (220540) | more than 5 years ago | (#23646837)

PC-File (from good 'ol Jim "Button" Knopf of Buttonware). That and PC-Write (Bob Wallace, Quicksoft) are all everyone ever needed. That and 640K. Where's my box of floppies?

Moores Law vs. Micromechanic Law (1)

Crass Spektakel (4597) | more than 5 years ago | (#23647049)

This is just another case where Moores Law bypasses another Law, in this case the Micromechanical Law.

Microchips are developing faster than Micromechanic so by leaving behind harddrives we might see flashbased storage become bigger and bigger.

But even more interesting, I think we will see SATA-attached flash being a short lived exception. It is much cheaper, faster, more flexible and more direct to attach it directly into address space (doesn't matter if NAND or NOR, the memory controller will take care of that). With that done we will also see DRAM becoming some sort of cache for flash and flash replacing DRAM. And as memory and inner circuit bandwidth will increase even further there really is no reason to safe bandwidth or storage any more - who cares if your 100GB database uses 200GB because you deflate/multi-index it for faster access times? The only limit here is that 64bit address space may be a little narrow but then we just see 128bit architectures ten years earlier.

Ah yes, the future of SQL... well, I think a small SQL-to-DBM interface should do the trick until all applications map the whole db into real address space. There is little use for a overcomplicated interpreting language if you can just put your whole 200GB db into an associative array.

Solid state storage devices are more than disks. (4, Interesting)

Animats (122034) | more than 5 years ago | (#23647135)

Until recently, solid state storage devices have been treated as "disks". But they're not disks. They have orders of magnitude less latency.

For files, this doesn't matter all that much. For databases, it changes everything. Solid state devices need new access mechanisms; I/O based seek/read/write is oriented towards big blocks and long latencies. The optimal access size for solid state storage devices is much smaller, more like the size of a cache line. With smaller, lower latency accesses, you can do more of them, instead of wasting channel bandwidth reading big blocks to get some small index item. It's not RAM, though; these devices usually aren't truly random access.

It starts to make sense to put more lookup-type functions out in the disk, since getting the data into the CPU has become the bottleneck. Search functions in the disk controller went out of fashion decades ago, but it may be time to bring them back. It may make sense to put some of the lower-level database functions in the disk controller, at the level of "query with key, get back record". Cacheing at the disk controller definitely makes sense, and it will be more effective if it's for units smaller than traditional "disk blocks"

This could be the beginning of the end of the UNIX "everything is a stream of bytes" model of data storage. We may see the "database is below the file system" model, which has appeared a few times in mainframes, make it to ordinary servers.

Re:Solid state storage devices are more than disks (0)

Anonymous Coward | more than 5 years ago | (#23647403)

I wouldn't count on it. The main problem is the tension between standardized, interoperable, commodity physical devices and the ever-changing needs of the application software. The byte-stream or block-based I/O models won because they are painfully general purpose!

Instead, consider "the system is a cluster"... your local processing will occur finally when the full OS and application stack can migrate tasks onto the storage-local coprocessors and do message-passing over the network that used to be the "system bus".

Re:Solid state storage devices are more than disks (1)

gfody (514448) | more than 5 years ago | (#23647477)

SSD hasn't changed the whole seek/read/write paradigm because the current batch are tailored to be retrofit to existing HBA/RAID storage systems.
Once the IO drive [fusionio.com] is here we'll have a true random access to nonvolatile storage with latency measured in nanoseconds.

Well... (0)

Anonymous Coward | more than 5 years ago | (#23647303)

Considering SSD drives won't be mainstream for a few years on production servers for critical data storage, that a shift from relational to "something else" won't happen overnight due to inertia and business conservatism, that this "something else" system of storage hasn't been developed yet, wake me up in 10 years and I will look into it then, and I probably still will be an early adopter.
Load More Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Sign up for Slashdot Newsletters
Create a Slashdot Account

Loading...