×

Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

OLTP vs Data Warehouse, Why not or just have both?

Chacham (981) writes | about 11 years ago

Programming 11

This is something that I've wondered about before. I had a small interview with some people regarding some work, and the idea came up again. So, I've been thinking about it.

This is something that I've wondered about before. I had a small interview with some people regarding some work, and the idea came up again. So, I've been thinking about it.

The issue is having two databases. One as an OLTP, and the other as a Data Warehouse. Being I worked with Oracle, and I like a good deal of the Oracle way of doing things, I am going along with Oracle terms. Basically an OLTP is an On-Line Transaction Processing. It is a database that is designed more for input than for output, and most queries are small. A good example would be an ATM machine. In such a case, the database should be normalized, for authoratative answers, and less work on input.

On the hand, a Data Warehourse is made more for output than for input, and most queries are large. A good example would be a data mining firm. In such a case the database should not be normalized, for quick answers, and less work on output.

The problem is, many companies need both. There is always information going in, and every day a number of people want some serious reports. That's when the normalization fights come in, and the line in between OLTP and Data Warehouse blurs beyond recognition.

What I've been wondering, is why not have both? The OLTP could be normalized, and be the authoritative database, and the Data Warehouse could be updated nightly (or weekly). The only loss really is the reports are woking on data from yesterday. But that shouldn't be much of a problem.

11 comments

Data sync can be ugly... (1)

Xerithane (13482) | about 11 years ago | (#5542197)

Just as an FYI, and it's not definite but I may be able to release something under the nerdfarm name that can help you with the data synchronization.

It's a perl system, basically a glorified job scheduler. It needs some more work but it does cascading job schedules with failover. One major patch that I want to do is embed the jobs in a framework that allows shared database connectivity. I'll find out hopefully this week or next about the status of releasing it GPL'd.

Re:Data sync can be ugly... (1)

Chacham (981) | about 11 years ago | (#5543407)

Sounds interesting, but why would I need it? (Enlighten me please. :-)

The way I see it, at midnight a job could fire off a procedure that just runs a bunch of queries to throw report worthy data into the Warehouse. Either INSERT INTO DW.OLTP_Customer SELECT * FROM Customer; Or even, CREATE TABLE DW.OLTP_Customer AS SELECT * FROM Customer; (So the tables don't take up any space that they don't need.) Maybe even exp/imp? The data would be loaded into temp tables, just to get the data over. Then, the Warehouse can run all the queries needed to denormalize and make the data more easily accessible.

So, on Oracle it'd be easy to connect. Just make a Database Link, and add the link name to any qualifiers. Pretty easy.

The one I just heard about, however, is on SQL Server. If I understand it, their databases are more like schemas, and thus wouldn't even need a link (unless on another machinne, in which case ODBC could be used).

Re:Data sync can be ugly... (1)

Xerithane (13482) | about 11 years ago | (#5545178)

Sounds interesting, but why would I need it? (Enlighten me please. :-)


To control the link, it's just a high-capability cron/job scheduling system.

The way I see it, at midnight a job could fire off a procedure that just runs a bunch of queries to throw report worthy data into the Warehouse.

They way I would do it is do a dump of the database from a specific time range, reducing overhead on the transaction system. I'm also seeing this on two different systems, which may not be the case I see now.

Job scheduling with failover commands is quite nice, especially when it has monitoring, for doing data syncs like this.

Re:Data sync can be ugly... (1)

Chacham (981) | about 11 years ago | (#5545465)

reducing overhead on the transaction system.

Good point. But, being there's no real UPDATEs, other than just dumping data, is that really going to be a burden? Nothing in the original DB will be changed, so there's no need for rollback images, and the warehouse would be closed to other transactions. I'd even remove all INDEXes first, so all it does is DELETE the old data.

But an export/import might very well be quicker. Now that I think about it, the real concern would be the space the dump would take. If a database becomes non-small, say 10 GIG, sending it to the other database over a network (if on different systems) would take a bit, without something like a gigabit network.

Job scheduling with failover commands is quite nice, especially when it has monitoring, for doing data syncs like this.

But why can't DBMS_JOB to that? At least in Oracle it can run a procedure that has failover and monitoring. It could easily update a table mentioning its progress, and being inside the DB, I'd assume that it would be at least a little bit faster.

I'm not trying to put you're idea down. I'm just trying to understand it. I appreciate you helping me here.

Anyway, I'm off to a second interview with the guy in charge now. This one's with SQL Server. Does your perl script work there? I don't know if they need/want this idea. But from the little bit I found out, it might just be good for them.

Re:Data sync can be ugly... (1)

Xerithane (13482) | about 11 years ago | (#5545719)

But why can't DBMS_JOB to that? At least in Oracle it can run a procedure that has failover and monitoring. It could easily update a table mentioning its progress, and being inside the DB, I'd assume that it would be at least a little bit faster.

I'm assuming a scenario of a light-weight insert style database (like berkeleyDB or something.) So you should be fine w/ Oracle.

I'm not trying to put you're idea down. I'm just trying to understand it. I appreciate you helping me here.

I know. It's not my idea, just a program I wrote because of someone elses idea. Dumping data between 2 oracle databases and a FilePro database. Boy howdy, that sure was a fun job *groan*. Glad it's over now :)

Anyway, I'm off to a second interview with the guy in charge now. This one's with SQL Server. Does your perl script work there? I don't know if they need/want this idea. But from the little bit I found out, it might just be good for them.

Should work fine, it's just a perl job scheduling system w/ XML support. It may need a few patches to work with a Windows filesystem, but Perl DBI can support SQL Server so you can have it run on a Unix box and connect with SQL Server.

Hope the interview goes well.

Re:Data sync can be ugly... (1)

Chacham (981) | about 11 years ago | (#5546653)

Thanx for the info.

Had the interview. Looks like I won't be doing anything like that. At least not right now. Although, down the road it is a possibility. I'll keep you in mind.

Re:Data sync can be ugly... (1)

Xerithane (13482) | about 11 years ago | (#5546703)

Had the interview. Looks like I won't be doing anything like that. At least not right now. Although, down the road it is a possibility. I'll keep you in mind.

Cool. I'll do a JE if I get the ok for the GPL license.

Re:Data sync can be ugly... (1)

Chacham (981) | about 11 years ago | (#5547017)

OK.

And I must say, you sure do read slashdot often. :-)

Re:Data sync can be ugly... (1)

Xerithane (13482) | about 11 years ago | (#5547116)

And I must say, you sure do read slashdot often. :-)

Today I'm doing some finishing stuff on some code, so mostly my brain is off ensuring I'm not doing anything stupid and my body is trying to keep itself from fidgeting to the point of explosion... hence, slashdot :)

Garbage in, garbage out? (1)

GoRK (10018) | about 11 years ago | (#5542583)

Too much theory; not enough code!

I understand the need for truly enterprise stuff in some applications, but what kind of system is it that needs huge reports like this on up-to-the-minute data (ie not already a day old or something?) You'd have to run a horse track in hong-kong to need something like that, and even still your data set size could probably let you get away with Oracle on big iron.

~GoRK

Re:Garbage in, garbage out? (1)

Chacham (981) | about 11 years ago | (#5543434)

Too much theory; not enough code!

That the way Databases are. Mostly theory and design, and a few things like schemas, triggers, and stored procedures to make them work together. It means the design phase is the longest, and must be pretty much perfect.

but what kind of system is it that needs huge reports like this on up-to-the-minute data

None really. However, the application doesn't pay the bills, the company does. And the suits there generally want it up-to-date, even if for insane reasons. However, if they give in on that one point, they probably would not have to give in on too many others. Because it would allow a second database to be designed to give them everything else they want, and quickly.

Check for New Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Sign up for Slashdot Newsletters
Create a Slashdot Account

Loading...