Rant: Untrusted Data from the Source

Chacham (981) writes | about a month ago

User Journal 4

While trying to load test data, we found duplicates (based on the unique key) in the provided file. So, the BA (English is not her first language) asked them:

Does the test file present valid business scenarios?

The response:

Test data is never as constrained as production data - we have a lot of [...] users setting up test data every day for a lot of different reasons plus there will be historic test data that has been abandoned after either successful or failed tests - it can never be said that test data is as clean as production data ... but I would expect that comment to apply to most if not all applications

Really?? Test data is not constrained? I understand that test data can be bogus, but unconstrained?? What exactly is the purpose of this test data then? I can supply Lorem Ipsum myself.

Another question:

If combination of [two columns] doesn't provide [main id] uniqueness as it was discussed and stated in use case, what would be additional attribute(s) defining [main id] uniqueness?

A simple question asking how to resolve duplicates when we were not expecting any.

The response:

[The two columns] code combination is unique from a Business perspective - do not re-design your [application] tables
I would however expect you to have exception processing in your load job (as [their application] does for all its inbound feeds) e.g. if you try to load something to a table and it can't load for whatever reason (a duplicate or whatever) you would write it to an exception report

Really?? We need an exception report? If they are the trusted source they are supposed to be, any error in the file should completely reject the file as bad, not just individual records, because any bad data means the entire file is suspect.
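As a rough sketch of what "reject the whole file" could look like, the check below validates the unique key across the entire feed before anything is loaded. Everything named here is hypothetical and for illustration only: the `FileRejected` exception, the `col_a`/`col_b` column names standing in for the two key columns, and the CSV layout are assumptions, not the actual feed format.

```python
import csv
import io
from collections import Counter


class FileRejected(Exception):
    """Raised when a feed file fails validation and must be rejected whole."""


def validate_unique_key(rows, key_columns):
    """Return all rows, or raise FileRejected if any key value repeats."""
    rows = list(rows)
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        # One bad record makes the entire file suspect, so nothing is loaded.
        raise FileRejected(f"{len(duplicates)} duplicate key(s): {duplicates}")
    return rows


# Hypothetical sample feed; col_a and col_b stand in for the two key columns.
SAMPLE = "col_a,col_b,value\n1,X,10\n1,Y,20\n1,X,30\n"


def load_feed(text, key_columns=("col_a", "col_b")):
    """Parse CSV text and hand back rows only if the whole file is clean."""
    rows = csv.DictReader(io.StringIO(text))
    return validate_unique_key(rows, key_columns)
```

With the sample above, `load_feed(SAMPLE)` raises `FileRejected` instead of quietly loading two of the three rows and writing the third to a report.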

In general, i am against writing exception code in the database. (See Tom Kyte's posts on the topic for related concerns.) Exceptions, by definition, are unexpected. Handling an exception means it is expected. Only the calling system should handle unexpected errors because, when an error is unexpected, we do not know what to do with it. It is then up to the calling system to decide what its output will be.
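A minimal sketch of that layering, assuming a plain dict in place of a real table: the lower layers state the constraint and raise, and only the caller decides what a failure means. The names `insert_row`, `load_rows`, `run_feed`, and `DuplicateKeyError` are made up for this example.

```python
class DuplicateKeyError(Exception):
    """Hypothetical error raised when a key is inserted twice."""


def insert_row(table, key, value):
    """Lowest layer: enforces the constraint and raises; it does not recover."""
    if key in table:
        raise DuplicateKeyError(key)
    table[key] = value


def load_rows(rows):
    """Load job: no try/except here, so any failure aborts the whole load."""
    table = {}
    for key, value in rows:
        insert_row(table, key, value)
    return table


def run_feed(rows):
    """Calling system: the one place that decides what a failure means."""
    try:
        return ("loaded", load_rows(rows))
    except DuplicateKeyError as exc:
        return ("rejected", f"duplicate key {exc.args[0]!r}")
```

Here `run_feed([(1, "a"), (1, "b")])` reports a rejection, while the load job itself contains no handling code at all.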

To be fair, they do not expect duplicates, and it might just be an issue with the test data. But the whole attitude of "exception reports" is absurd. In short, the source system's team doesn't care about their own data.

This happened to me before on a different team when we were to receive data from another team. I noted the absurdity of some of the dates (worst offender was a business that started ~400 CE). When i notified their BA, he asked me to fill out a request to have it fixed. IOW, they wanted our team to pay to fix their bad data. That case is embarrassing for me as i lost my cool with their BA. When he asked me what we wanted in our feed, i told him to give whatever he wanted as we would not trust his information (more than we had to).

4 comments

I'm on a project like that (1)

Marxist Hacker 42 (638312) | about a month ago | (#47732549)

Turns out they didn't define their business case properly. I had to add the time field in to achieve a unique logical key. And even then, I still had to rewrite my loader (tossing XML files in C# to call stored procedures) to report exceptions properly so that I could figure out when they lied to me once again.

Data integrity? We don't need no stinkin' data integrity!

Re:I'm on a project like that (1)

Chacham (981) | about a month ago | (#47732985)

I had to add the time field in to achieve a unique logical key.

:(

Re:I'm on a project like that (1)

Marxist Hacker 42 (638312) | about a month ago | (#47733107)

This was not entirely unexpected, given the nature of the data (floating inventory moving through a factory and a warehouse- sometimes it returns to the same location in three dimensional space, but is unique in four dimensions). But it sure would have been nice if they could have given me a scan sequence number as well; a date time field is nice for reporting but should NEVER be part of a logical primary key.

Re:I'm on a project like that (1)

Chacham (981) | about a month ago | (#47733883)

but should NEVER be part of a logical primary key.

I wouldn't say never. The odd case would be if the table held data by the minute or second or the like.

But, i share your adamant stand against it in nearly all cases.
