Data Corruption from Excel Autocorrect

Journal Interrobang's Journal: Data Corruption from Excel Autocorrect 1

Journal by Interrobang on Friday March 27, 2009 @05:18PM

Someone on TECHWR-L posted a link to this paper (under the paradoxical title "The Cupertino Effect"), which is about how Excel's autocorrect feature can corrupt statistical analysis of genetic data if/when Excel "makes the wrong assumption" about an entry based on how it looks:

When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.

As the author points out, this can cause gene names to come back in analyses as "unknown," because "[a] default date conversion feature in Excel ... was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 ... was being converted to '1-DEC.'"

The authors also note that there is a problem with "RIKEN [4] clone identifiers of the form nnnnnnnEnn" being converted to a floating-point number.

The paper also gives some idea of the devastating scale of the problem and its significance for people doing these sorts of analyses: "A non-expert user might well fail to notice that approximately 3% of the identifiers on a microarray with tens of thousands of genes had been converted to an incorrect form, yet the potential for 2,000 identifiers to be transmogrified without notice is a considerable concern."

As far as I know personally and can glean from the paper, the autocorrect and/or conversion feature is nearly impossible to disable completely, and can only be worked around, possibly unsuccessfully 100% of the time. This suggests that perhaps Excel is not the tool of choice for doing these sorts of analyses. (Does the spreadsheet application in OpenOffice work differently?)

Data Corruption from Excel Autocorrect

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 1 Comments Log In/Create an Account

Comments Filter:

I shall check (Score:1)

by Hemi Rodner ( 570284 ) * writes:

I have both Excel and OpenOffice at work, so I'll check tomorrow.
It's a bit silly that they provided a C program to check for corruptions. Providing a VB program (that runs in Excel) would be much more useful for people.
Hugs

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Journal Interrobang's Journal: Data Corruption from Excel Autocorrect 1

Data Corruption from Excel Autocorrect More Login

Data Corruption from Excel Autocorrect

I shall check (Score:1)

Slashdot Top Deals

Slashdot