
How Would You Improve SQL?

Cliff posted more than 8 years ago | from the select-*-except-column99-from-table dept.


theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"


Indexes (2, Interesting)

diamondmagic (877411) | more than 8 years ago | (#13945573)

Right now there is no standard way of maintaining indexes. Most databases have some sort of CREATE INDEX query, but it is by no means standardized.
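For instance (table and column names made up here), the bare-bones form is broadly portable, but anything beyond it is vendor-specific:

-- works almost everywhere
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- vendor extensions diverge immediately, e.g.:
CREATE INDEX idx_orders_customer ON orders (customer_id) USING HASH;   -- MySQL index type
CREATE INDEX idx_active_orders ON orders (customer_id) WHERE active;   -- PostgreSQL partial index
CREATE BITMAP INDEX idx_orders_customer ON orders (customer_id);       -- Oracle bitmap index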

LOL (-1, Flamebait)

Anonymous Coward | more than 8 years ago | (#13945579)

an AOL email address!

Drop it for something relational (0, Flamebait)

leandrod (17766) | more than 8 years ago | (#13945591)

SQL has too many defects -- it isn't relational, it is badly designed. Better dump it for Tutorial D or D4.

Re:Drop it for something relational (1)

Mr2cents (323101) | more than 8 years ago | (#13945696)

In what way isn't it relational?

Re:Drop it for something relational (0)

Anonymous Coward | more than 8 years ago | (#13945947)

This guy comes out with this crap every time SQL is mentioned, constantly parroting his (apparent) hero, Fabian Pascal. Needless to say, he never has any proof of what he says, or anything remotely resembling an argument.

In fact, the only thing FP seems to be whinging about is that SQL doesn't deliberately prevent you from creating a database that isn't purely relational.

Re:Drop it for something relational (5, Insightful)

Anonymous Coward | more than 8 years ago | (#13946081)

In what way isn't it relational?

1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.

2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).

3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!

4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.

5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.

6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.

7. The relational model does not specify a type system, it just requires one. Yet SQL specifies its own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?

8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.

9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.

10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.

11. SQL confuses KEYs (logical) with INDEXes (physical implementation).

12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?) .. I think this is related to lack of table equality.

13. SQL implementations don't have ANY brains whatsoever. They don't know that book_id from table A and book_id from table B are equal in a join, and that you don't need both of them in the query result. They don't "look inside" your CHECKs and foreign keys to deduce information about your database and use that information to optimize queries.

I'm sure if you picked up a basic theory book you'd find plenty of other nitpicks for the syntax, the semantics, and the basic underlying model of SQL.

And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.

If you are interested in learning more, get Date's O'Reilly book "Database in Depth". It's very short, roughly 200 pages, and tells you all you need to know about data management theory.

Bullshit, at least partial (1)

rpresser (610529) | more than 8 years ago | (#13946743)

And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.

This is what I thought of before even reading your second sentence:

SELECT C.* FROM Customer C
  WHERE (SELECT COUNT(DISTINCT O.ProductID)
                  FROM Orders O
                  INNER JOIN Products P1 ON O.ProductID = P1.ProductID
                  WHERE O.CustomerID = C.CustomerID
                      AND P1.ProductLine = 'X')
          = (SELECT COUNT(*) FROM Products P2 WHERE P2.ProductLine = 'X')

And I have never studied relational theory formally; only SQL (and, I admit, Joe Celko's posts).

Re:Drop it for something relational (1)

leandrod (17766) | more than 8 years ago | (#13946893)

In what way isn't it relational?

SQL tables aren't relations, SQL data types are way too limited, NULLs are broken, classes are equated to tables, there are pointers... the ISO SQL standards themselves have long given up on even using relational terms.

Re:Drop it for something relational (1)

i is sqrt neg1 (857446) | more than 8 years ago | (#13945770)

SELECT * FROM SlashDot_Users WHERE UserName = 'leandrod' AND Clue > 0; 0 results returned.

Re:Drop it for something relational (0)

Johnno74 (252399) | more than 8 years ago | (#13946026)

SQL has too many defects -- it isn't relational

Umm, WHAT?

Back AWAY from the crack pipe, buddy.

Your comment makes no sense. How can a language be relational? SQL is a functional language designed to query a relational structure.

Re:Drop it for something relational (2, Informative)

leandrod (17766) | more than 8 years ago | (#13946861)

Your comment makes no sense.

Not for those who haven't learnt data fundamentals.

How can a language be relational?

By defining and manipulating relations.

SQL is a functional language

It is not. Lisp is; so are Scheme and Haskell; SQL is not. It never was, nor was it intended to be.

designed to query a relational structure.

SQL tables are not relations. The very words relation and relational have been dropped from the ISO SQL standards since 1999 at least.

Re:Drop it for something relational (1)

richg74 (650636) | more than 8 years ago | (#13946102)

SQL has too many defects -- it isn't relational, it is badly designed

I agree. One of the great (potential) advantages of relational databases (unlike previous horrors like IMS -- shudder) is that they have a theoretical basis in relational algebra. SQL loses some of that advantage.

Chris Date has written a lot about the deficiencies of SQL (e.g., how joins work). Check out his book Database in Depth, published by O'Reilly.

Hierarchical queries (4, Interesting)

Bogtha (906264) | more than 8 years ago | (#13945641)

Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY"; there is also a PostgreSQL patch [moonbone.ru] available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.
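A rough sketch of the Oracle flavour, with invented table and column names for a nested-comment tree:

SELECT comment_id, parent_id, LEVEL, subject
  FROM comments
 START WITH parent_id IS NULL            -- top-level comments
CONNECT BY PRIOR comment_id = parent_id  -- children follow their parent
 ORDER SIBLINGS BY posted_at;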

No poetry (3, Funny)

Fred_A (10934) | more than 8 years ago | (#13945644)

You can't write poetry in SQL. So it remains an inferior language compared to Perl.

(yes, well, I ran out of ideas)

Check out LINQ... (3, Informative)

0kComputer (872064) | more than 8 years ago | (#13945647)

If you want to get an idea of some cool SQL improvements, check out the LINQ (Language Integrated Query) project for C# 3.0: http://msdn.microsoft.com/netframework/future/linq/ [microsoft.com]. Some cool stuff that I never really thought about.

For example, their select statements go backwards, i.e. from table, select column1, n2, n3, etc. Seems kinda wacky at first, but it makes sense since you really should know what table you're selecting from before you specify the columns.

ex.

public void Linq3() {
        List<Product> products = GetProductList();

        var expensiveInStockProducts =
                from p in products
                where p.UnitsInStock > 0 && p.UnitPrice > 3.00M
                select p;

        Console.WriteLine("In-stock products that cost more than 3.00:");
        foreach (var product in expensiveInStockProducts) {
                Console.WriteLine("{0} is in stock and costs more than 3.00.", product.ProductName);
        }
}

Re:Check out LINQ... (1)

Rayban (13436) | more than 8 years ago | (#13946071)

The only thing worse than SQL itself is SQL embedded within another language.

*shudder*

Re:Check out LINQ... (1)

T-Ranger (10520) | more than 8 years ago | (#13946137)

Except that it's the columns that are relevant, and what you're interested in, whereas the tables, joins, etc. are an implementation detail. I could start talking like Larry Wall and use nifty terms like "end weight"; suffice it to say that with a syntax for fetching data, the data is the important part, and the important part should come first.

Re:Check out LINQ... (1)

Usquebaugh (230216) | more than 8 years ago | (#13946719)

Back in the day Burroughs had a 4gl called LINC, 'Language Interface Network Compiler'. Most devs substituted 'Laugh I Nearly Cried'.

Perhaps MS meant 'Laugh I Never Queried' or 'Laugh I Nearly Quit'

Better NULL handling? (4, Insightful)

joto (134244) | more than 8 years ago | (#13945653)

For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names.

If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database.

Ok, I might not be a database buff. Actually, my experience with SQL is purely academical (although I've worked with object-oriented databases). But if I were to improve SQL, my attempts would be in the direction of making it into a more pure mapping of a relational database, not in adding yet more syntactic sugar.

Re:Better NULL handling? (1)

hawkbug (94280) | more than 8 years ago | (#13945797)

"If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database."

You're kidding, right? I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized. Say you had a questionnaire on a web form. Let's say it had 100 questions. Let's say you had a checkbox next to each. When the user checked the box, the "bit" field you set up for question number 1 gets set to 1. Same applies to 2, it gets set to 1 from 0. You could have 100 columns, one for each question - one row for each user. I see nothing wrong with that. Sure, there are different ways to do it, but there is nothing wrong with doing it that way. Also, think of a tax form - imagine all the questions for each form. Then imagine having a table for each form. Just because it's a large number doesn't mean it's not normalized - some things are just large to begin with, and you don't want to fracture them too much.

Re:Better NULL handling? (2, Interesting)

Otter (3800) | more than 8 years ago | (#13945896)

I don't work with databases, just with enormous flat text and SAS files spit out by them, but -- isn't the proper way to do that with one table with user, question# (1-100), and response (0,1) and another with question# and question?

Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.

Re:Better NULL handling? (2, Interesting)

hawkbug (94280) | more than 8 years ago | (#13945939)

You could easily do it the way you propose - the only thing is, you now have 100 rows where before you had 1 - which I'm not saying is the wrong way... but either way you're going to have 100 columns or 100 rows. Performance wise, I'm not sure which is better or if it matters.

Re:Better NULL handling? (2, Interesting)

boxxa (925862) | more than 8 years ago | (#13946201)

I think it's better performance-wise to have more rows than columns. I develop a lot of PHP/MySQL applications, commercially and OSS, and I find that using multiple small tables with more rows seems to load and run smoother than tables with a large number of columns. Maybe something with the indexing, but I have no actual proof of what works better. On the original topic, I believe that MySQL has its downfalls compared to other database systems but for me personally, I find that I can work around the problems with it to build highly deployable PHP/MySQL web applications very cheaply. Hey, when it's free, I can deal with it.

Re:Better NULL handling? (1)

Otter (3800) | more than 8 years ago | (#13946202)

As someone else has already explained, your approach is using the database as a giant Excel table, not as a database. You could do exactly the same thing with a text file and get your performance advantage that way.

Re:Better NULL handling? (1)

diamondmagic (877411) | more than 8 years ago | (#13945938)

That is what the array column type is for.
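In PostgreSQL, for example (names invented), that looks something like:

CREATE TABLE questionnaire_response (
    user_id  integer PRIMARY KEY,
    answered boolean[]    -- one element per question
);

-- did user 42 tick question 7?
SELECT answered[7] FROM questionnaire_response WHERE user_id = 42;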

Re:Better NULL handling? (5, Interesting)

Johnno74 (252399) | more than 8 years ago | (#13945952)

I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

I've got mod points but I just have to give them up to reply to this....

You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used fields in the other.

And your example of a questionnaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? You have to alter the schema. What happens if some questions are not answered? You'd have to have nulls, and wasted space.

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionnaire) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.

Re:Better NULL handling? (2, Insightful)

hawkbug (94280) | more than 8 years ago | (#13946020)

"You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other."

I completely disagree. Breaking up the questionnaire into separate tables does nothing. I want every question every time, it's all one HTML page with checkboxes and links to the questions. If they were separate tables, I'd just have a lot of joins every time the page loads, which isn't going to help performance. In my example, there are no rarely used rows - all rows are used equally.

"And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space."

No, I never said one row per question - I said one row per person - big difference. Also, the default values for the bit columns would be 0, since the questions are not answered. When they get marked as answered, they are changed to 1. No wasted space and every column is relevant in my example. EVERY time I want EVERY column, no exceptions. Yes, if you added a question you'd add a column. And give it a default value of 0, and then run this simple query "Update Questionaire SET NewQuestion = 0" and then you have no nulls, no wasted space.

Re:Better NULL handling? (4, Insightful)

rho (6063) | more than 8 years ago | (#13946142)

If they were separate tables, I'd just have a lot of joins every time the page loads, which isn't going to help performance.

That is not a SQL or database issue, it's an issue with your scripting language and Web server. This would be cached.

You certainly can keep the table at 100 rows. But at that point, you might ask yourself why you're using a database at all. A flatfile will probably have less overhead, even with file locking issues. Especially considering the simplistic questionnaire you're using as an example--a long line of 0s and 1s would do you.

Re:Better NULL handling? (2, Interesting)

Naikrovek (667) | more than 8 years ago | (#13946513)

If you put 0's in unanswered fields, then you'll have an awful lot of zeros, and since you're using bits, that's a lot of 'no' answers, which isn't what's really there.

Any DBA worth his weight in salt knows that if you ever see a single table with 100 columns then you have a major design issue.

Splitting that up into several tables in the same database will offer a significant performance increase.

Re:Better NULL handling? (1)

bshensky (110723) | more than 8 years ago | (#13946104)

Oh yes you do....the real world says so. Sad, but true.

I have worked with large, multi-column data loads from third-party sources, loading these into a single "staging" table that matches the third-party source, /then/ breaking the data into a set of smaller 1:1 tables in the local app, where we have the most control.

In those instances, there really is no easier way than to create a massive multi-columnar table that matches the source. You then use the native data load utility (like Oracle's SQL*Loader) to get the file into a table. A "staging" table.

I agree with you in principle, but not in practice. Where I am, we've got GM, Ford and Chrysler shoving data-load specs at us that would make a grown man weep - that whole "spec-for-the-least-common-denominator" thing. Solutions for shtuff like this ain't pretty, but ugly problems sometimes call for ugly solutions.

Re:Better NULL handling? (4, Insightful)

Johnno74 (252399) | more than 8 years ago | (#13947182)

Yeah, I know what you mean, I work in the real world too. A lot of the time we have to work with some awful database structures, and we just have to grin and bear it.

Doesn't make it right tho... If you are given a clean sheet to design a system as the grandparent post suggested (questionnaire) and you put the results in one table with a row for each question, then you deserve to work with fucked-up systems for the rest of your life.

Re:Better NULL handling? (1)

Dixie_Flatline (5077) | more than 8 years ago | (#13946444)

That it's possible to work around the problem doesn't mean that the problem is invalid.

Yes, you can come up with a way that you can organize data and tables so that this problem is easier to solve, but there's certainly a way that the data and the language can be made so that you don't have to fiddle with that sort of crap.

And that said, someone may have an instance where, for whatever reason, the data absolutely should be laid out in a table that large. They shouldn't be punished for using the system in that manner if the system itself doesn't restrict them from doing something like that.

Languages and databases are our TOOLS. If we have to work around their deficiencies that much, they're pretty lousy tools.

Re:Better NULL handling? (1)

HawkingMattress (588824) | more than 8 years ago | (#13946730)

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
An answer has to point to a specific question and makes no sense with just an answerid and a userid, so you should have the questionid in the answers table; it's nonsense to associate them in another table in this instance. It takes more space, and requires a useless join. Unless the questions are just choices the user can choose from, like in a poll.

Re:Better NULL handling? (1)

HawkingMattress (588824) | more than 8 years ago | (#13946759)

Hmm, after reading the GP post again he does indeed seem to be referring to static yes/no responses, sorry about that :)

Re:Better NULL handling? (1)

neuroxmurf (314717) | more than 8 years ago | (#13946883)

Without regard to whether or not the grandparent's example is a good one, I have here a database table with over 150 columns (about 200 or so). It's a database of every census tract in the US (one per row) with all of the data about that tract that the census collected, including population in many different categories, area, location, and so forth. This is a perfectly reasonable case of a 100+-column database.

To claim that no such cases exist only proves that you only play with toy databases, instead of working with real ones.

Re:Better NULL handling? (1)

Johnno74 (252399) | more than 8 years ago | (#13947228)

To claim that no such cases exist only proves that you only play with toy databases, instead of working with real ones.


Don't be so pompous. Recently I worked with a database containing all the census results for New Zealand. Ok, it's not as large as the US, but it's still not trivial.

The DBA who built & loaded the database from the raw CSV data we got from the department of stats really knew his shit. The database was structured with a table containing the questions, and a table containing the unit measures (each data point was given for several ranges - for neighbourhoods, for suburbs/towns, for cities and also nationally).

The census results data was in a table that had a question ID, a measure ID, a year, and the figure.
I forget how big the db was in the end, about 1.5 GB or something. The structure made it really easy to write the queries that extracted data for several detail levels (neighbourhood, suburb, city, national) which was how the reporting app worked.
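Something like this shape, with made-up names (I don't remember the real schema):

CREATE TABLE question (
    question_id   INT PRIMARY KEY,
    question_text VARCHAR(500)
);

CREATE TABLE measure (
    measure_id INT PRIMARY KEY,
    area_name  VARCHAR(100),
    area_level VARCHAR(20)    -- neighbourhood / suburb / city / national
);

CREATE TABLE census_result (
    question_id INT REFERENCES question,
    measure_id  INT REFERENCES measure,
    census_year INT,
    figure      NUMERIC,
    PRIMARY KEY (question_id, measure_id, census_year)
);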

Re:Better NULL handling? (1)

period3 (94751) | more than 8 years ago | (#13947104)

An arbitrary restriction on the number of 'columns', or a suggestion "you don't need 100 columns" doesn't make sense. If I have 99 'columns' that are functionally dependent on another 'column', then I need 100 'columns'. Anything else would be unnormalized.

Re:Better NULL handling? (1, Interesting)

Anonymous Coward | more than 8 years ago | (#13946019)

I'm sorry to bitch but that's so so wrong. What happens when your questionnaire has a question added?

Something like:
ALTER TABLE Quiz ADD COLUMN Answer101 BIT NULL

So now you have to deal with all the previously submitted quizzes only having the first 100. Same for how do you delete quizzes? Add another quiz? Now you've got to deal with stuff like [Quiz12].[Question50].

How about this, sorry about errors, dinner's on the stove:

CREATE TABLE Quiz
(
ID INT IDENTITY(1, 1)
-- add any other attributes
)

CREATE TABLE Question
(
ID INT IDENTITY(1, 1),
Text VARCHAR(1000)
)

CREATE TABLE QuizQuestion
(
QuizID INT, --add FK
QuestionID INT, --add FK
QuestionNumber INT
)

CREATE TABLE QuizResponse
(
ID INT IDENTITY(1, 1),
QuizID INT, --add FK here
Submitted VARCHAR(1000)
-- add any other attributes
)

CREATE TABLE Answer
(
QuizResponseID INT, --add FK whatever here
QuestionID INT, --add FK whatever here
Answer VARCHAR(1000) --or BIT, INT, whatever
)

Now you can have as many quizzes, questions, or answers as you need. Yes, it's a little more complicated and odds are I missed a major component of what's needed, but you should see the flexibility that this solution would offer over:

CREATE TABLE Quiz1
(
Answer1 BIT,
Answer2 BIT,
Answer3 BIT, .....
AnswerXX BIT
)

Sorry to ramble, but I've just spent a great deal of time fixing issues exactly like what you describe.

In conclusion, no. Bad. Please don't do any database work.

Re:Better NULL handling? (1)

hawkbug (94280) | more than 8 years ago | (#13946241)

Ok, people are completely misunderstanding what I was saying - I'm not talking about answers or responses on a quiz. I'm talking about a list of the questions, and whether or not they are completed on a main page. Simple as that. Your way is definitely elegant, but my way is quick and to the point - and in my example, the questions don't change - and if it was a one time thing, they would never change. I can't stress enough, the quiz itself is not in the table I gave as an example, JUST the user, and the question numbers - each question number can have a 1 or a 0 as a value. If I were building the actual quiz table, I would do it similar to how you suggested.

Re:Better NULL handling? (2, Informative)

schon (31600) | more than 8 years ago | (#13946977)

people are completely misunderstanding what I was saying

And you're completely misunderstanding what they're saying.

I'm talking about a list of the questions, and whether or not they are completed on a main page. Simple as that.

If you're just displaying a list of questions that's not gonna change, why are you using SQL at all? Why are you not simply using a flat text file?

my way is quick and to the point

And wrong.

Re:Better NULL handling? (3, Insightful)

HawkingMattress (588824) | more than 8 years ago | (#13946926)

The schema you're talking about is wrong, wrong, wrong. You've made a schema based on how you want to display the information, not based on what this information is carrying.
How can you write a query to find which users responded to more than 3 questions, for example? You'd have to test each column individually, and then you'd have to modify the query if you add or remove some columns. Furthermore, you have to change the schema itself each time you add or remove a question.
And finally, you don't use the relational part of the database at all. Your schema supposes that there is a question table, with a questionid. And in your answers table, you have columns named after the ids, right? Now how can the database check, if you delete a question, that there are no answers pointing to it? It can't, because you're using the database as if it was a flat file. So the database can't check data integrity for you, because your schema isn't normalized at all. As Johnno74 says, you need at least a question table, a response table, and a user/question/response association table. If you don't do that, the database has zero advantages over a flat file.

Re:Better NULL handling? (1)

Nuttles1 (578165) | more than 8 years ago | (#13946744)

"Who in their right mind needs a table with 100 columns"

I will tell you one case. In BILLING! As a business rule we have to have a record of exactly what was billed and this can be handled in one of two ways or both. The first way is to write a record that has everything you could possibly need to bill (LOTS OF FIELDS). Lots of redundant data, yes, but the data is easy to get to when needed. The other way is some sort of auditing. We do both. Auditing may seem like a cleaner way to implement this business rule in the database but if you really need to know beyond a shadow of a doubt what a specific piece of billing information was at the time of billing, it is easier to go to the exact record used to bill with. Of course IDs are stored in this record to link to, let's say, the current customer's record, which allows access to current data if needed.

my experience with SQL is purely academical

To this I would first like to say, I loved being in school and I love book learning. BUT, I would like to add that things are not always what they seem in school. First of all, Third Normal Form is not the end-all; sometimes you need to denormalize and that is ok. An instance where 3rd normal form could possibly not be the best solution is with speed: joining a 'large' number of tables in school may be cool, but it may just take too much time when you have 100s or 1000s of users pounding away at a database with tables of 100s of thousands or even millions of records.

Re:Better NULL handling? (1)

tzanger (1575) | more than 8 years ago | (#13947186)

I will tell you one case. In BILLING! As a business rule we have to have a record of exactly what was billed and this can be handled in one of two ways or both. The first way is to write a record that has everything you could possibly need to bill (LOTS OF FIELDS). Lots of redundant data, yes, but the data is easy to get to when needed.

What's wrong with something like Order Number, Quantity, Part Number, Description, Price, Discount? That's 6 columns, 2 of which would be copies from your global part list (Price and Description) but the copies are absolutely necessary since price and descriptions may change as the master list's updated. Your order table becomes huge in terms of rows, but that's not bad, and that's what things like views can help with. All your data's there, you have another order info table with order number, customer info, etc. and another table with shipping info, and another with billing info...
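As a sketch (column names invented), that's just a line-item table:

CREATE TABLE order_line (
    order_number INT,
    line_number  INT,
    quantity     INT,
    part_number  VARCHAR(40),
    description  VARCHAR(200),   -- copied from the part master at billing time
    price        DECIMAL(10,2),  -- likewise frozen at billing time
    discount     DECIMAL(5,2),
    PRIMARY KEY (order_number, line_number)
);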

Your post sounds like you're doing this for a living, so I am certain I'm missing something crucial here. I have, however, done this too for a large manufacturing company. Please help enlighten me as to why a billing table would have hundreds of columns.

ugh... (5, Insightful)

Anonymous Coward | more than 8 years ago | (#13945657)

Did you know there have been people working on a general algebra for data management for, what 40 years now? Did you know, this is basically a SOLVED PROBLEM? Ever heard of "D"? Or Tutorial D? The Third Manifesto?

You know, I totally understand why Fabian Pascal [dbdebunk.com] is always pissed off.

Here's something for you to chew on:

Why do programmers write this:

A + 3

when they want to add 3 to A? Why do we not write some lovely crap like:

OPERATE ON A WITH 3 USING ADDITION

why do we write:

(A + 3) * 2

and not:

OPERATE ON (OPERATE ON A WITH 3 USING ADDITION) WITH 2 USING MULTIPLICATION

Why do we do that?? Because algebraic notation is 1) declarative .. it tells the computer what you want, it doesn't tell it how to do it and in what order, and 2) algebraic notation is *completely general*. You can nest arbitrarily with parentheses, and you can clearly see what's a variable and what's a value and what's an operator. Easy to create, understand, and *optimize*.

Do you agree with me that the verbose syntax clouds your thinking? Keeps you from seeing the underlying operations? Makes it difficult to apply the basic algebraic skills you learned in high school? Makes it difficult for the compiler writer to do the same?

Now I ask you, why do we write:

SELECT * FROM Order

and not

Order

Why do we write:

SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id

and not

Order JOIN OrderItem

And here you are, dwelling on some little detail about projecting columns.. this is an easy one: use an "ALL BUT" operator for example:

RelvarWith100Attributes ALL BUT (Attribute100)

Once you see that relational algebra is just values, variables, and operators nested in any arbitrary way, just like arithmetic, you have opened the door a little more to understanding the fundamental theory of data management and how backwards and primitive "modern" data management is.

And let's not even get into all the crap that SQL gives us like duplicate rows, NULLs, brain-dead table-oriented storage, lack of 100% updateable views, lack of arbitrary constraints, (often) lack of composite types (why the hell do we splat objects into MULTIPLE COLUMNS?? They should be stored in ONE column). SQL also confuses logical and physical layers (keys vs. indexes), and has basically kept the database industry in the dark ages for decades now.

So the answer to your question is pretty simple: I would ditch SQL and use something that looks like relational algebra, which has been understood and documented for a probably longer than you've been *alive*. No offense.

Re:ugh... (1)

B'Trey (111263) | more than 8 years ago | (#13946057)

I don't disagree with a thing you say. However, I have to ask you - what's more likely? That the entire world ditches SQL and moves to a better system? Or that a new, updated version of SQL includes fixes for some of its most heinous sins? I think the answer clearly is the latter, which leads back to the question - what should we do to fix SQL?

Re:ugh... (0)

Anonymous Coward | more than 8 years ago | (#13946285)

Do you still use FORTRAN?

How do you explain XQuery? That new Microsoft LINQ crap?

Look, we work in an industry that changes buzzwords every few months. Do you *really* think it's impossible to change SQL to something else? Or more precisely, to introduce products that don't depend entirely on SQL?

How about all these people that write Object-SQL mappers and claim that the benefit is to let you "change databases" without affecting the app? (Of course, truly relational database wouldn't NEED any complex mapping layer). Won't they be able to switch?

All it takes is one smart guy to create a relational DB and hype it up, and people will flock to it. Look at Ruby on Rails. Any real innovation there? No, just the best practices we've been preaching for years, wrapped up with a ribbon of hype. Now somebody just needs to create a relational DB that uses Ruby syntax and Ruby types and Rails fans will think "wow, how INNOVATIVE". How "POST RELATIONAL"!! Of course next year, they'll flock to something else, but at least the product will *exist*.

To answer the specific question, you find where SQL is not relational, and you fix it! For instance, don't allow duplicates in tables. Store columns in such a way that multi-table joins are just as cheap as single-table queries. Get rid of NULLs and implement missing data using views on underlying null-free tables. Let people create composite types. Create adapters that let people write code and queries in their favorite programming language. Make natural joins the default for joins. Add some brains to SQL implementations so they can derive and deduce properties of your data and make views updateable. Make the syntax simpler (make some of it optional). Add some of the missing relational operators. It ain't rocket science (but it *is* science).

Re:ugh... (1)

Clover_Kicker (20761) | more than 8 years ago | (#13946583)

> Look, we work in an industry that changes buzzwords every few
> months. Do you *really* think it's impossible to change SQL to
> something else?

Looks that way. How long has Codd been prophesying from the wilderness?

Has anyone ever implemented one of these relational calculus languages?

Re:ugh... (1)

AuMatar (183847) | more than 8 years ago | (#13946732)

I know a lot of people who still do use Fortran. I know a lot more who still use C (30 years old) and C++ (20 years old). In fact, the last 2 are the majority of programmers. We work on either Unix (30 years old) or Windows (15 years old).

The thing with the buzzword of the month is- after its month is up, it dies. The stuff that works sticks around for a long, long time.

Re:ugh... (4, Insightful)

rho (6063) | more than 8 years ago | (#13946365)

SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id
and not
Order JOIN OrderItem

Why do you assume Order and OrderItem will be joined on order_id? They don't have to, you know. So you have a "Order JOIN OrderItem ON order_id" format. Except, why do you assume both columns will have the same name? So you now have a "Order JOIN OrderItem ON order_id AND order_id_submitted" format.

And before long, you've got an equally baroque language for describing a query as SQL is now. SQL's point is to have something reasonably human-readable. It's amazingly flexible, and easy for beginners to pick up with simple queries. Your New and Improved query language had better be leaps and bounds ahead, not just simpler to type.

(Especially since probably 90% of queries are written only once, and stored for future use in a script or as a stored procedure. Building a query can be arduous for complex data, but it doesn't have to be typed every time you use it.)

Re:ugh... (1)

oyenstikker (536040) | more than 8 years ago | (#13946443)

I found SQL very hard as a beginner. I learned "connect to mydatabase user oyenstikker using lousypassword". Fine. It sounds like English. So I remembered the thoughts and not the syntax. The 2nd, 3rd, 4th, 7th, 12th, and 18th time, I typed "open mydatabase user oyenstikker password lousypassword", "connect mydatabase as oyenstikker with lousypassword", or something like that. If the language is an abstraction, it is easier to remember the syntax.

Re:ugh... (1)

rho (6063) | more than 8 years ago | (#13946709)

That's some strange SQL you've got there.

Re:ugh... (1)

benwb (96829) | more than 8 years ago | (#13946487)

The relationships between tables should be encoded in the database metadata via foreign key relationships, so in the simple and most common case you should not need to specify column names for joins.
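SQL won't read the foreign keys for you, but it does have shorthands that get close when the column names line up (made-up tables here):

-- joins on every column the two tables happen to share by name
SELECT * FROM orders NATURAL JOIN order_items;

-- joins on an explicitly named common column, written once
SELECT * FROM orders JOIN order_items USING (order_id);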

Re:ugh... (1)

rho (6063) | more than 8 years ago | (#13946687)

Agreed, but you must still take into account queries that have little or nothing to do with the foreign keys. SQL is flexible in that way--SQL's replacement must be flexible in the same way, and at that point, what benefit is SQL's replacement bringing?

Winning the special olympics and debating an AC... (4, Insightful)

RingDev (879105) | more than 8 years ago | (#13946509)

Let's look at something a little more realistic:

SELECT
  Lease.LeaseNum,
  Lease.LesseeNum,
  Invoice.InvoiceNum,
  Invoice.AmountBilled
FROM
  Lease INNER JOIN
  Invoice ON
    Lease.LeaseNum = Invoice.LeaseNum
WHERE
  Lease.LeaseNum = "1234"
ORDER BY
  LeaseNum, InvoiceNum

Okay, that's pretty big to get some basic lease and invoice info. Now how you you write that?

Lease.LeaseNum,
Lease.LesseeNum,
Invoice.InvoiceNum,
Invoice.AmountBilled
Lease JOIN
Invoice ON
  Lease.LeaseNum AND Invoice.LeaseNum
Lease.LeaseNum = "1234"
Lease.LeaseNum
Invoice.InvoiceNum

??? All that's been accomplished is the removal of key words. I'm not seeing any benefit, and I'm seeing the pitfall of it being hard as hell to read.

-Rick

Re:Winning the special olympics and debating an AC (3, Interesting)

Anonymous Coward | more than 8 years ago | (#13946608)

You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):

((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever

Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.

Here's what's happening:

I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.

I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.

I then PROJECTED that relation to get a fifth relation with just the desired columns.

Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.

Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.

Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?

And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!

dot-qualified column names are unique (1)

Julian Morrison (5575) | more than 8 years ago | (#13946902)

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Result, C naming syndrome.

LeaseNumber and Lease.Number are functionally equivalent, they're both unique, however the latter lends itself to convenient simplification in contexts where it wouldn't create ambiguity. Otherwise you end up repeating yourself a lot. LongTableNamePrefixFoo, LongTableNamePrefixBar, LongTableNamePrefixBaz etc.

Re:dot-qualified column names are unique (1, Interesting)

Anonymous Coward | more than 8 years ago | (#13946985)

1) Relation values can be anonymous, just like "3" is anonymous when it isn't assigned to a variable. What's the table name in that case? In other words, what does this mean:

SELECT 1 AS A, 2 AS A, 3 AS A

The fact that SQL allows this is a flaw. You have to constantly think about the names of your columns, even when they are hidden under "SELECT *".

2) Your point would be fine if SQL consistently used qualified table names, but it doesn't. It doesn't *return* them in a qualified state for example.

3) What qualifier should be used in a derived table (a "view").. the name of the view or the original table? If it uses the original table, that violates the requirement that views be indistinguishable from tables. If it uses the view, then the "full name" of the column has changed from what you specified in the query, and you still can't put two columns with the same "unqualified" name in the same view.

4) What name should be used in a self-join?

5. Relational theory just doesn't allow multiple attributes with the same name in a single relation, because there's no way to tell them apart (relation headers are UNORDERED SETS, not arrays).

The syntax of what SQL does isn't as important, it's the broken inconsistent semantics.

Re:ugh... (2, Funny)

lscoughlin (71054) | more than 8 years ago | (#13946773)

That's cute.

Now start inserting and updating.

maybe joining.

yeah -- sql is stupid, sure. But your proposal here really isn't any brighter.

Re:ugh... (1)

tzanger (1575) | more than 8 years ago | (#13947202)

Well, for starters, nobody in their right mind uses SELECT * for any kind of production environment. :-)

sort by... (1)

Atzanteol (99067) | more than 8 years ago | (#13945660)

I'd *love* it if "group by *" (or some such syntax) would just group by all the items in the select in order... It's so bloody tedious to have to re-write each item being selected for the group clause.

Re:sort by... (1)

usrusr (654450) | more than 8 years ago | (#13946162)

you are asking for duplicate elimination as in relational algebra.

easily achieved in SQL by using SELECT DISTINCT instead of SELECT.

i guess what you really mean is "group by everything which is not explicitly in some form of aggregation in the select statement" which is a more special case, ironically one that fits exactly into the "* except x" pattern of the original poster.

Re:sort by... (1)

Atzanteol (99067) | more than 8 years ago | (#13947272)

Only, a "select distinct" won't give me the "count(*)" (which I want).

select fielda, fieldb, fieldc, fieldd, count(*) from foo where status=1 group by fielda, fieldb, fieldc, fieldd;

Just a personal gripe I know, but it seems redundant to tell it what to group by when 99% of the time it's the same order as the fields are selected by...
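Some products at least let you group by ordinal position, which cuts the repetition (MySQL and PostgreSQL accept this; it isn't guaranteed to be portable):

select fielda, fieldb, fieldc, fieldd, count(*)
  from foo
 where status = 1
 group by 1, 2, 3, 4;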

Um.... (1)

djSpinMonkey (816614) | more than 8 years ago | (#13945674)

Wouldn't it not make sense to....

I think a solid grasp of English syntax would be helpful before moving on to SQL.

To respond to his example, though, I doubt there are many situations where excluding a handful of columns would be particularly useful. Any query including that many fields is almost certainly taking place within a program of some sort, which should be able to ignore extra columns pretty trivially. I'm sure you could come up with some tortuously Goldberg-esque situations where such syntax might be nominally handy (I've thought of a couple myself), but I suspect on the whole it wouldn't be worth the extra complication to the parser.

Of course, that's not to say SQL is flawless. I just don't have any brilliant ideas off the top of my head to improve it that haven't already been implemented.

I would start improvement... (-1, Troll)

SB5 (165464) | more than 8 years ago | (#13945686)

I would start improvement by using PHP.

Standard stored procedure/trigger language (1)

Kunta Kinte (323399) | more than 8 years ago | (#13945783)

Using stored procedures or triggers in a database is a sure way to tie yourself to a particular database platform. For applications that have to be cross-platform, I usually try very hard to avoid triggers and stored procedures.

It would be great if there were a common language for stored procedures. The RDBMS may support many languages, but that scripting language would be available on most platforms since it would be part of the SQL standard.

Another feature missing is optimization hints. Oracle uses special SQL comments to hint to the database how a query should be optimized. Other databases have their own syntax and methods. There should be a way to set those hints in standard SQL.
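For reference, an Oracle hint looks roughly like this (table and index names invented); other products use entirely different mechanisms, which is exactly the problem:

SELECT /*+ INDEX(emp emp_name_idx) */ emp_name
  FROM emp
 WHERE emp_name = 'Smith';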

Be practical (1)

Commander Spock (796626) | more than 8 years ago | (#13945849)

If there are 100 columns, but you only want to select 99 of them, just select * and ignore the unwanted column. It'll be faster anyway.

Re:Be practical (4, Informative)

neura (675378) | more than 8 years ago | (#13946290)

Actually, it'd be faster if you listed out every column name. If you're talking about faster to write out the code for, you're obviously not writing a query for a program that's intended to be used much. There's absolutely no reason you should be deploying code containing a query that does "select *" or anything like it. You're making the database do the work of looking up the list of column names every time that query runs. There are much more useful things to spend your caching space on (if you have any).

If you really can't stand to write queries containing the actual column names, you should be using some type of abstraction layer in whatever language you're writing your code in.

If you're not writing code and just making queries by hand to test the results, then you're even further off your rocker. (this also applies in general to the statement you made) Why would you ever NOT want to select that last value out of 100? is it going to keep your output from wrapping? (lol)

Also, those of you saying that you should never have 100 columns in your table, you're certifiable lunatics as well. If you have 100 columns that are used in every record and have very little or no duplication per row, there is no reason you should break this up into multiple tables!!! Then the database has to do joins, which again require more processing power and disk usage. It's also hard to maintain multiple tables when you really have one table after you normalize it.

For those of you that say this isn't normalized... I'm not even really sure how to answer that.... If you have several tables all with a strict 1:1 relationship, they should be in ONE table. Anything else is considered denormalized, not yet normalized. (aside from being just plain BAD!)

For those of you who say you'd never need that many columns in one table (or split across multiple tables), however you'd like to think the world should work, I have an example of just that. My wife does genetic research, primarily statistical analysis of sequence data (in various forms, but that's the easiest way to sum it up). We've had discussions on this particular topic, where she had been told by someone else that she would get better performance in Oracle if she split her one table into several tables, each containing a smaller number of columns.

This is just simply not true. It also is a perfect example of a situation where you would actually need a large number of columns. There were specific bits of data that needed to be looked up quickly (like, 45'ish). You can't store it all in one column (or even just a few) and use regexes to find the bits you're looking for. You also don't want to be doing a lot of joins unless you really need to, you know.. when you actually have data that would fit into some form of normalization. Technically, you CAN do this stuff, but not if you want decent performance. If you didn't want decent performance, you could just leave the data in a text file and shell out a grep command. *sigh*

Anyway, enough ranting, but seriously people... Get a clue. Get some experience with these issues. Don't just pipe up because "hey, I've worked with databases and while I probably don't understand them very well, I don't know anybody else that understands them at all, so I'm kind of an expert!"

Replace it by Lisp (2, Interesting)

RAMMS+EIN (578166) | more than 8 years ago | (#13945908)

I'd replace it with a special-purpose Lisp, and compose queries as s-expressions. Mix and match query elements in a flexible manner, yet never run the risk of injections, because it all happens in a structured way. I've done things like this on a small scale (contact information database), and it works really nicely.

Re:Replace it by Lisp (1)

davez0r (717539) | more than 8 years ago | (#13946664)

i seem to remember reading [paulgraham.com] about how the guys that do orbitz don't use a database, they use a GINORMOUS custom lisp data access shizzle

Replace it by using MUMPS. (0)

Anonymous Coward | more than 8 years ago | (#13946737)

I'd use MUMPS like god intended.

Native Queries / DLING (1)

Carl Rosenberger (449479) | more than 8 years ago | (#13945925)

In case you are using an object-oriented programming language, you may ask yourself if SQL really is a good choice. SQL is not object-oriented, its use is usually neither typesafe nor compile-time checked in OO languages, it can't be refactored from an object-oriented IDE and it does not follow OO principles.

Food for thought:

Native Queries
http://www.db4o.com/about/productinformation/whitepapers/ [db4o.com]

DLINQ
http://msdn.microsoft.com/netframework/future/linq/ [microsoft.com]

SQL is only 1/2 the story (2, Interesting)

klui (457783) | more than 8 years ago | (#13945930)

No standard way to extract/load data. No standard way to get all tables in a database. Basically DDL is entirely separate and each type of database has its own way of doing things. Let's not talk about embedded SQL and optimizing queries (like Oracle hints.. *ugh*).

my request (-1, Flamebait)

max born (739948) | more than 8 years ago | (#13945944)

I found the SQL syntax more difficult and less intuitive to learn than bash. Thus for small databases I often pipe the entire contents into something like awk and grep.

I'd like to see an interface that's easier and more efficient to use.

Having to keep typing select, from, where, etc., seems redundant.

I'd prefer a command line utility with switches like s, f, w, etc.

Dates (3, Interesting)

omibus (116064) | more than 8 years ago | (#13946122)

1. Standard date functions and handling.
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.
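On point 3, the usual workaround today is to push the expression into a derived table so the alias exists by the time the outer WHERE runs (made-up names):

-- what you'd like to write (rejected by most products):
SELECT cola + colb AS my_value FROM t WHERE my_value > 10;

-- the workaround:
SELECT my_value
  FROM (SELECT cola + colb AS my_value FROM t) AS x
 WHERE my_value > 10;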

UPDATE, INSERT look different (2, Informative)

yamla (136560) | more than 8 years ago | (#13946141)

It always bugs me that you write UPDATE and INSERT totally differently. I'd much rather see them essentially the same (update obviously would need a WHERE clause). Not a big deal, I admit, but it makes my life harder.
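The asymmetry in question, with made-up names; MySQL's non-standard INSERT ... SET form is about the closest the two ever get to looking alike:

INSERT INTO person (id, name, email) VALUES (1, 'Ann', 'ann@example.com');

UPDATE person SET name = 'Ann', email = 'ann@example.com' WHERE id = 1;

-- MySQL extension: an INSERT written in the UPDATE style
INSERT INTO person SET id = 1, name = 'Ann', email = 'ann@example.com';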

Fundamentally, though, what we need is much better interfaces to our applications. Having to convert C++ data into some format appropriate for SQL and back again is a pain in the behind. Every existing interface between the two that I've seen is crap. Heck, most of the C++ interfaces don't even let you use the std::string type. What are we, back in 1995? Forget vectors and maps, or the fancy boost multiindexed templates. Anyway, these really aren't problems for SQL to solve. And yes, there are object-oriented databases and the like. But relational databases are still pretty much universal. I just mention it because it is a pain in the behind.

Re:UPDATE, INSERT look different (1)

thebatlab (468898) | more than 8 years ago | (#13947224)

I have to tip my hat to that suggestion. It's never made sense. I don't do any intensive database work but the stuff I do, it's annoying to have to write out the insert/update queries differently in the same function.

Better handling of relationships between tables? (1)

Trepalium (109107) | more than 8 years ago | (#13946148)

I'm no SQL expert, but it has always bothered me that you have to enter your relationships between tables twice -- once in the database construction itself via foreign keys, and again in each and every query that will make use of that relationship. Why can't I write: SELECT prefs.val FROM login,prefs WHERE login.username="myid" AND prefs.type="name" instead of SELECT prefs.val FROM login,prefs WHERE login.username="myid" AND prefs.type="name" AND prefs.userid=login.id (or the equiv if using JOINs instead) if I have the appropriate keys set up? Wouldn't that be an easier query to optimize? Wouldn't that create fewer resulting bugs?

Perhaps I'm ignorant to some important detail of databases, but I feel that it would be much easier to work with them if the server could represent the relationships between the tables in some meaningful way.
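A sketch of the redundancy being described, using hypothetical login/prefs tables: the relationship is declared once in the DDL and then repeated in every query that uses it:

-- declared once when the tables are created
CREATE TABLE prefs (
    userid INTEGER REFERENCES login(id),
    type   VARCHAR(32),
    val    VARCHAR(255)
);

-- ...and spelled out again in each query
SELECT prefs.val
FROM login, prefs
WHERE login.username = 'myid'
  AND prefs.type = 'name'
  AND prefs.userid = login.id;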

Re:Better handling of relationships between tables (1)

mooingyak (720677) | more than 8 years ago | (#13946383)

I think the basic problem with that is: what if you have two tables, say, logins and projects?

logins had an id and a name

projects has, among other things, submitted by and coded by, both of which reference the logins table.

So if you do default joining, which field should it join by?
If you limit the default join to when there's one and only one reference to a given table, then existing code may break once you add a new reference.
And lastly, what if you really want a cartesian join? Granted, I've needed this about twice out of a very large number of queries I've written, but my usage doesn't necessarily reflect everyone else's.
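A sketch of the first problem (table and column names assumed): with two foreign keys into logins, an implicit join has no way to know which relationship you mean:

CREATE TABLE logins (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(64)
);

CREATE TABLE projects (
    id           INTEGER PRIMARY KEY,
    submitted_by INTEGER REFERENCES logins(id),
    coded_by     INTEGER REFERENCES logins(id)
);

-- the join condition has to be spelled out, because either key could be meant
SELECT p.id, l.name
FROM projects p
JOIN logins l ON l.id = p.coded_by;   -- or should it be p.submitted_by?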

Infuriating (3, Insightful)

TTK Ciar (698795) | more than 8 years ago | (#13946176)

Out of all the annoying issues, I've pulled out the most hair over unique id's, and INSERT vs UPDATE.

Most SQL implementations give you some way of assigning unique id's to newly INSERT'ed rows. It would be nice if there were a standardized way, but that's a side issue. Once rows have unique id's, you can identify rows to be updated by id. This is very fast and simple.

Except .. in order to find out what id the DBMS has assigned a row, I usually have to follow my INSERT with a SELECT, to read the id column. Slow and annoying. Sometimes the DBMS I am working with takes a few seconds to perform the INSERT, and ten minutes to perform the SELECT. That takes it beyond an optimization issue, and into a workability issue.

Also, if I do not yet know if a data record has been INSERT'ed, and I need to either UPDATE the existing record or INSERT a new one (say, with just a new timestamp), then I need to either attempt an UPDATE and then fall back on INSERT if the UPDATE fails (ew!) or attempt a SELECT and either INSERT or UPDATE depending on whether it returned any rows (ew!).

If SQL came up with a standardized way to associate unique id's with newly INSERT'ed rows, it would be very, very nice if the id column(s) assigned were returned to the client in the same packet as the message confirming that the INSERT succeeded. Nearly zero additional overhead, neat, fast, and easy.
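Some engines now offer exactly this. A PostgreSQL-style sketch (hypothetical table and column names), where the new id comes back as the result of the INSERT itself:

INSERT INTO files (hostname, filename)
VALUES ('host1', 'foo.txt')
RETURNING id;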

To solve the UPDATE/INSERT issue, I'm less sure. Say, for instance, that I have a daemon which periodically scans the filesystems in a cluster of machines, and it wants to UPDATE the "exists" column of a given row identified by a ( hostname, mountpoint, path, filename) tuple with the current time, if that row already exists, or INSERT a whole new row for that file if it does not exist. Perhaps there could be a "WRITE" command which is just like INSERT but overwrites a row if it already exists? That seems like the wrong solution, too. In the meantime, I play with caches of hashes to unique id's and lose more hair.

-- TTK

Re:Infuriating (0)

Anonymous Coward | more than 8 years ago | (#13946396)

In Oracle Database, you have the MERGE statement (INSERT or UPDATE depending on whether the key already exists).
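A rough sketch of MERGE applied to the filesystem-scanner example above (table and column names are made up):

MERGE INTO files f
USING (SELECT :hostname AS hostname, :mountpoint AS mountpoint,
              :path AS path, :filename AS filename FROM dual) src
ON (f.hostname = src.hostname AND f.mountpoint = src.mountpoint
    AND f.path = src.path AND f.filename = src.filename)
WHEN MATCHED THEN
    UPDATE SET f.last_seen = SYSDATE
WHEN NOT MATCHED THEN
    INSERT (hostname, mountpoint, path, filename, last_seen)
    VALUES (src.hostname, src.mountpoint, src.path, src.filename, SYSDATE);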

Re:Infuriating (2, Informative)

Forbman (794277) | more than 8 years ago | (#13946545)

Well, of course it all depends on the database. For those DBs that use autogenerated field types (e.g., SERIAL in Postgres, an AUTOINC attribute in many others), getting the generated ID for the record you inserted is...problematic at best. Your point is valid.

For other DBs that can use triggers and sequence generators (Oracle, Postgres, Interbase/Firebird), it can be a bit easier, as these DBs have ways to query the sequence generator for its current or next value (as long as you're still in the same transaction scope). In the case of Oracle, getting the next value from a sequence increments the sequence, so as long as you hold onto that value, it's going to be unique. No more silly "select max(id) from mytable" queries after you insert a record...
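For instance, the Oracle pattern looks roughly like this (sequence and table names assumed):

-- grab a unique id up front; this also increments the sequence
SELECT file_id_seq.NEXTVAL FROM dual;

-- then use the value the client got back (shown literally here as 42)
INSERT INTO files (id, filename) VALUES (42, 'foo.txt');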

In both cases, though, it generally requires some way to peek at the record being added, in a state where it's "added" to the table but not yet locked down. Generally the most expedient method is a stored proc/function that contains the actual INSERT statement and returns the autogenerated field value, however you can get it, to the layer that called the proc.

It all depends on your database and your database access layer (e.g., ADO, ODBC, OleDB, direct-to-driver, etc).

MIDAS (database caching layer from Borland 5 Enterprise and later) really does this well. There are other similar products as well.

Re:Infuriating (1)

schon (31600) | more than 8 years ago | (#13947113)

For those DBs that use autogenerated field types (i.e., SERIAL in PostGres, AUTOINC attribute in many others) getting the generated ID for the record you inserted is...problematic at best.

For Postgres, this is patently false. [postgresql.org] You can get the value of a sequence (serial) either before or after your INSERT.
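For example (assuming a serial column id on a table named mytable), either order works:

-- after the INSERT, in the same session
INSERT INTO mytable (name) VALUES ('foo');
SELECT currval('mytable_id_seq');

-- or before it
SELECT nextval('mytable_id_seq');   -- returns, say, 42
INSERT INTO mytable (id, name) VALUES (42, 'foo');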

Yeah, INSERT/UPDATE sucks (3, Insightful)

MobyDisk (75490) | more than 8 years ago | (#13946643)

I have yet to encounter a DBMS that didn't have an efficient, straightforward way to get the ID after an insert, but YMMV. However, the INSERT/UPDATE issue is a fundamental syntactical problem and it really should be fixed. INSERT and UPDATE do almost the same thing, yet have completely different syntax.

INSERT INTO someTable (fld1, fld2) VALUES ('foo', 'bar')
UPDATE someTable SET fld1 = 'foo', fld2 = 'bar'

It is REALLY annoying when you have to write some code that generates a SQL statement because you must code for two completely different syntaxes. Someone replied about the Oracle MERGE which seems like a nice way to go.

Fortunately though, there are lots of good frameworks around SQL that make it so that writing SQL is becoming a thing of the past. I would like to see SQL treated like HTTP - nobody writes HTTP. It's a protocol. Let it be. Just use the tools. I guess it will never go away though...

Insert / Update combination (1)

justanyone (308934) | more than 8 years ago | (#13947000)


Perhaps, instead of replacing SQL, let's just add a clause to the update statement (or insert statement):

insert into tabname (field1, field2) values (1, 2) [ WITH UPDATE [ALL|FIRST|ADD|NONE] ON field1=1 ];

Thus, if a row exists with field1=1, it will update that row with field2=2.
If there are multiple rows where field1=1, either update all of them ('ALL'), update the first one found ('FIRST'), insert one more ('ADD'), or update none of them ('NONE').

Alternately, you could add this syntax to the UPDATE verb:

UPDATE tabname SET field1=1, field2=2 WHERE field3=3 [ WITH [ NO ] INSERT ]

This would do the select for field3=3. If one or more records were found, update per request. If no records were found, insert a record with field1=1, field2=2 (field3 would be NULL since it's not specified in the 'set' subclause, only in the 'where' clause).

This solves your problem and provides two interesting options.

Re:Infuriating (2, Informative)

erotic piebald (449107) | more than 8 years ago | (#13947009)

Maybe only Oracle does this?:

insert into tab (col1, col2, col3, ...) values (v1, v2, v3, ...)
returning expr1, expr2, ...
into var1, var2, ...

or

update tab
set col1=v1, col2=v2, col3=v3, ...
returning expr1, expr2, ...
into var1, var2, ...

Re:Infuriating (1)

Evro (18923) | more than 8 years ago | (#13947290)

In Postgres, getting the "next value" is easy: you just get it before you insert.

SELECT nextval('somesequence') as next_value; - usually takes something on the order of 0.0001 seconds

INSERT INTO mytable (id, something) VALUES (42, 'something'); - normal insert, plugging in the value (say, 42) fetched above

wild card/regexs (1)

i.r.id10t (595143) | more than 8 years ago | (#13946215)

My complaint dates from when I first started learning SQL (MS flavor, the only one offered at school at the time) and got to wildcards: whoever wrote the spec chose % instead of * and so on. To top it off, I use SQL so infrequently (months between writing it, then only a few statements) that this difference bites me in the butt just about every time I try to use a wildcard.
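For reference (table and column names made up), % is SQL's stand-in for the shell's * and _ plays the role of ?:

SELECT name FROM users WHERE name LIKE 'jo%';   -- shell-style: jo*
SELECT name FROM users WHERE name LIKE 'j_n';   -- shell-style: j?n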

Whats the problem? (-1)

profet (263203) | more than 8 years ago | (#13946344)

For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names.

Um...

SELECT * FROM foo WHERE primaryKey != 'bar'

Whats the problem?

Re:Whats the problem? (0)

profet (263203) | more than 8 years ago | (#13946363)

oops... Columns != Rows... my mistake.

Babye karma..

Java is your friend (1, Interesting)

lucm (889690) | more than 8 years ago | (#13946549)

In the world of J2EE you can use container-managed persistence on entity beans (a flavor of EJB). Such objects handle the interaction with the database for you, so you don't have to worry about SQL. They also address transaction issues, so there's no need to fiddle with manual rollbacks.

With CMP, instead of fooling around with SQL queries you deal with business objects. Why bother with "update inventory" or "insert into cashRegister" when you can call methods on objects, like inventoryItem.subtract() or cashRegister.feed()? This provides you with a good layer between the application and the database, so whenever one has to change you don't have to mess with both. All you got to do to have all this magic is to keep some XML configuration up to date, and if you have the big bucks you can even get JBuilder to do it for you!

SQL is sooo yesterday. Stop wasting time on technical issues: add business value to your applications instead. And with JBoss you don't even have to pay for a good EJB container. (Of course you'll need a good CPU and a little more RAM, but hey, magic has a price!).

Re:Java is your friend (1)

Dibblah (645750) | more than 8 years ago | (#13946928)

You don't need a database in any of your examples. Anyone who's authored a proper application with a database behind it SHOULD be thinking in SQL, so that the database can do its job. Otherwise, you're going to be reimplementing stuff that's already available at a lower level. Badly.

Re:Java is your friend (1)

lucm (889690) | more than 8 years ago | (#13947032)

Anyone who's authored a proper application with a database behind it SHOULD be thinking in SQL

That statement is wrong. Data persistence does not have to be bound to the storage technology. Please take time to read about a technology before making such bold statements.

Have a look here [sun.com] .

Re:Java is your friend (1)

mr_rattles (303158) | more than 8 years ago | (#13947390)

How well does this CMP work when you're pulling from a table with 150 million records? What if I want data from this table for each of my 10,000 customers? Do I have to hit the table 10,000 different times or can the container do joins in a smart enough way to get the data efficiently and reliably?

Disallow DELETE or UPDATE without WHERE (1, Interesting)

dtfinch (661405) | more than 8 years ago | (#13946733)

I feel lucky to have not been burned by this, but you can bet a lot of people have. If you really want to delete everything, you should have to type something like "delete from table where true" or "truncate table".
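The failure mode being guarded against, with a hypothetical orders table:

DELETE FROM orders WHERE status = 'cancelled';   -- what was meant
DELETE FROM orders;                               -- what a forgotten WHERE does today
DELETE FROM orders WHERE true;                    -- the explicit spelling proposed above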

The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.

mind reader (3, Funny)

chochos (700687) | more than 8 years ago | (#13946837)

SELECT * FROM whatever the hell my customers have in mind

Join syntax stinks (1)

justanyone (308934) | more than 8 years ago | (#13947049)


I've never been a fan of the join syntax. 'Inner' and 'outer' and 'left' and 'right' and 'full' joins? Let's for gosh sakes use the simple set-theory math terms we all understand.

Instead of:

select A.f1, b.f1, b.f2 from A join B on A.f1 = b.f1

do:
select A.f1, b.f1, b.f2 from intersection join of A, B on A.F1 = B.F1
select A.f1, b.f1, b.f2 from union join of A, B on A.F1 = B.F1
select A.f1, b.f1, b.f2 from union minus intersection join of A, B on A.F1 = B.F1
select A.f1, b.f1, b.f2 from union minus A intersection join of A, B, C on A.F1 = B.F1 = C.F1
select A.f1, b.f1, b.f2 from union minus B intersection join of A, B, C on A.F1 = B.F1 = C.F1
select A.f1, b.f1, b.f2 from union A, B minus B intersection of A, B, C on A.F1 = B.F1 = C.F1
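For comparison, a sketch of how the first three lines roughly map onto today's spellings (keeping the A/B tables from above, and assuming f1 is not nullable in the base tables):

-- rows present in both ("intersection")
SELECT A.f1, B.f1, B.f2 FROM A INNER JOIN B ON A.f1 = B.f1;

-- rows from either side, matched where possible ("union")
SELECT A.f1, B.f1, B.f2 FROM A FULL OUTER JOIN B ON A.f1 = B.f1;

-- rows that matched on only one side ("union minus intersection")
SELECT A.f1, B.f1, B.f2
FROM A FULL OUTER JOIN B ON A.f1 = B.f1
WHERE A.f1 IS NULL OR B.f1 IS NULL;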

Is this a test? (1, Funny)

planetjay (630434) | more than 8 years ago | (#13947099)

USE Database;
SELECT * FROM table WHERE column !="99";

Did I get the job? Or for that matter DID YOU?

Slightly offtopic (1)

EraserMouseMan (847479) | more than 8 years ago | (#13947262)

But I've been getting into MDX lately (data cube technology). I'll tell ya, if you want to do super-fast reporting of hierarchical data, cubes are the way to go. And MDX is a very feature rich cube querying language. It's such a breeze to do a query in MDX that would take all kinds of sub-queries, inner/outer joins and calculated fields to do with SQL. If you're finding yourself wracking your brain with complex SQL for reports you should really check out cubes and MDX. The learning curve will be well worth it!!

Object oriented (1)

countach (534280) | more than 8 years ago | (#13947359)

The biggest thing needed is the ability to return rows of differing column makeup for object oriented purposes. You need to be able to go:

SELECT FROM base-class

and get back all the fields from sub-classes so that you can populate entire objects from them. Real object oriented databases let you do that.

There are extensions (2, Interesting)

pestilence669 (823950) | more than 8 years ago | (#13947530)

Most SQL dialects include some sort of exclusion operator.

SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;

or

SELECT * FROM A EXCEPT SELECT "B" FROM A;

Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.

While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and Cartesian products forces you to think about speed and abstraction.

SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.

Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.