Announcing: Slashdot Deals - Explore geek apps, games, gadgets and more. (what is this?)

Thank you!

We are sorry to see you leave - Beta is different and we value the time you took to try it out. Before you decide to go, please take a look at some value-adds for Beta and learn more about it. Thank you for reading Slashdot, and for making the site better!


GWilliams hasn't commented recently.



Head First SQL

GWilliams GWilliams writes  |  more than 6 years ago

Graeme Williams writes "Head First SQL is an SQL book designed for people who have little or no SQL experience. It has the clarity and effectiveness, as well as the graphical kapow, that you expect from Head First books, but it has too many errors to be recommended. I suppose it's a matter of judgment, but in my opinion the errors, which affect many of the topics in the book, are serious enough to confuse the beginners the book is intended for. Unfortunately, if you know enough SQL for the errors not to matter, you don't need this book.

A defining characteristic of books from Head First is an explosion of fonts on the page. They have a perfectly good explanation, which I happen to accept, related to the way people learn. However, looking at the errors in this book, it's hard to avoid the suspicion that the text in each font has been copy-edited separately. On page 303, there's a statement that "The FOREIGN KEY is a column in a table that references the PRIMARY KEY of another table" (which is false). On page 305, this has turned into, "It doesn't have to be the primary key of the parent table, but it must be unique" – in a different font, but this is correct. On page 357, presenting cross or Cartesian joins, the author confidently asserts that "we will use Cartesian join" in this book, followed on page 358 and 359 by the use of "cross join" in one font and "Cartesian join" in another, including repetition of the phrase, "some result rows (are) removed by a condition in the query".

As appropriate for an introductory book, the examples and explanations in Head First SQL are concrete and specific. But the flip side of this is that the book seems light on theory – everything is presented in terms of specific MySQL SQL. It's easy to see that this choice makes sense in a book for absolute beginners, but it hurts in a few places where the difference between the theory and the language can help.

For example, for a table to be in first normal form, it can't have two records that are exactly the same – that's the theory. It's revealing that C. J. Date's definition of first normal form (The Relational Database Dictionary, page 42) doesn't mention keys at all. The way to express this in SQL is with a constraint. Bleighley is only vaguely correct when she says (page 177) that to be in first normal form, a table must have a primary key – any unique column is enough, and it might be worth explaining why. The waters are further muddied when the following example adds a synthetic key to car_table when it already has a natural key (the VIN!).

At least part of the problem has got to be that "key" is never defined as a concept separate from the definition of a primary key in SQL. MySQL doesn't help, because it uses KEY as a synonym for PRIMARY KEY and UNIQUE as a synonym for UNIQUE KEY.

Another example of the difference between theory and practice is on page 259 where the first table – the result of a SELECT statement without an ORDER BY clause – is described as "There's no real order here" even though the records appear in order of their primary key. What Beighley wants to say is that the theory doesn't require a particular order (and you shouldn't rely on one) even though the database will give you that same order every time.

A standard method in the book is to present a problem and show a number of approaches that work poorly or not at all, before introducing a new concept to really solve the problem. The risk to avoid is that the failures have to be clearly indicated, something that's missed on page 313 when records are duplicated but given different unique synthetic keys. This also nicely illustrates the dangers of synthetic keys, except that's not explained either.

This might also be the problem on page 330, where the example table has a composite key that isn't unique.

Date says (ibid., page 65), "nulls have no place in the relational model". I think it would be odd for an introductory book to leave NULL out altogether, but that raises the problem of what to do with it. NULL is introduced on page 44, but I didn't find the explanation particularly clear, and Beighley goes on to make two kinds of errors: trying to avoid NULL as though as was "slightly bad", and avoiding it by using magic values. On page 310, she has, "If you have a column containing values you don't yet know, you can isolate it and avoid NULL values in your main table". What's lacking is a practical set of rules for when to use NULL, what the risks are, and when not to use it.

The standard example of a self join is table of employee records which include the boss of each employee, except the person at the top. On page 430, this example is introduced using a two-column table mapping clowns to their bosses (no, really!), but for some reason Beighley includes a row for the top clown, Mister Sniffles, who doesn't have a boss. In this case, to avoid a slightly bad NULL, Mister Sniffles is his own boss. This doesn't make any sense at all. Once you've broken the employee-boss relationship into its own two-column mapping table, you don't need a record for Mr Sniffles and his non-existent boss at all.

Page 263 introduces a troop of Girl Sprouts who want to keep track of cookie sales each day for each sprout. The table includes days where a particular sprout sold no cookies using a zero value for sales. It might have been better to leave out those records, but perhaps that would have introduced complications the author wanted to avoid. For example, on page 267, there's a calculation of the average day's sale for each sprout which wouldn't make sense otherwise ("Each girl has seven days of sales"). But on page 269, the stated goal is "to figure out which girl sold cookies on more days than any other" where it seems that we're going to have to deal with those zero values. We proceed to do no such thing. The explanation of the COUNT function explains that "if the value is NULL," it isn't counted" which doesn't help since this table doesn't have any NULLs. We can't really solve the problem as stated since GROUP BY hasn't been introduced – the chapter ends by calculating the number of days on which cookies were sold using:

SELECT COUNT (DISTINCT sale_date) from cookie_sales;
which is correct only as long as there are no days in which none of the sprouts sold any cookies.

On page 230, there's a very odd explanation of how UPDATE is applied to each row of a table. "Your SQL software interprets the statement for each row in the table one at a time; then it goes back and starts over ...". And lower down on the same page, "Then it starts to run through the whole table again a second time, ..." as though the database needs a separate pass through the table for each row.

In introducing subqueries on page 386, the example doesn't distinguish between the tables of the inner and outer query:

SELECT some_column, another_column FROM table WHERE column = (SELECT column from table)
which is correct as far as it goes, but might leave the impression that the two tables have to be the same. On page 394, the term "noncorrelated subquery" is used before it's defined. On page 397, a correlated subquery is shown, followed by the definition of noncorrelated subquery on page 399 and the unfortunate statement that "all of the subqueries you've seen so far are known as uncorrelated subqueries". The explanation of correlated subqueries on page 408 might give the impression that an alias is required for the table in the outer query.

The example used to introduce left outer joins (page 420) uses a join between the foreign_key toy_id in the girls table and the matching primary key toy_id in the toys table, so it's not a great example. The statement, "a NULL value in the results means that a particular toy doesn't belong to any of the girls" is doubly wrong. The foreign key constraint means that a NULL value isn't possible in the results, and if it did appear it would mean that a girl didn't have a toy, not the other way around. The explanation of right outer join (page 426) compares a left outer join and a right outer join with the table names swapped, so that the source and target (and the query result) are the same. This is another place where it looks like the different fonts have been written by different people, since the annotations, in a different font, claim that the source, the girls table, is the left table in both queries, and the target, the toys table, is the right table in both queries.

When you don't like a book, it's possible to throw yourself overboard, and treat every error as an assault on civilization. It's true that not every error in this book is equally serious, and some of the things I've mentioned probably are just typos. But in a book which is intended for people who don't know SQL at all, any error is a possible source of confusion."


GWilliams has no journal entries.

Slashdot Login

Need an Account?

Forgot your password?