Slashdot: News for Nerds

Reliability of Computer Memory?

kdawson posted more than 5 years ago | from the check-some dept.

Data Storage 724

olddoc writes "In the days of 512MB systems, I remember reading about cosmic rays causing memory errors and how errors become more frequent with more RAM. Now, home PCs are stuffed with 6GB or 8GB and no one uses ECC memory in them. Recently I had consistent BSODs with Vista64 on a PC with 4GB; I tried memtest86 and it always failed within hours. Yet when I ran 64-bit Ubuntu at 100% load and using all memory, it ran fine for days. I have two questions: 1) Do people trust a memtest86 error to mean a bad memory module or motherboard or CPU? 2) When I check my email on my desktop 16GB PC next year, should I be running ECC memory?"

yeah (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#27384765)

um, sure? . wtf, it said this is ascii art .

Hackers. (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#27385117)

Was the best movie of all time.

Surprise? (4, Funny)

Anonymous Coward | more than 5 years ago | (#27384775)

Recently I had consistent BSODs with Vista64 on a PC with 4GB...

This was a surprise?

Re:Surprise? (5, Informative)

Erik Hensema (12898) | more than 5 years ago | (#27385223)

Yes. Vista is rock solid on solid hardware. Seriously. Vista is as reliable as Linux. Some people wreck their vista installation, some people wreck their Linux installation.

Re:Surprise? (2, Insightful)

Starayo (989319) | more than 5 years ago | (#27385337)

It won't crash often, if at all, it's true, but Vista is way too slow for my taste, and many others'.

Memtest not perfect. (5, Informative)

Galactic Dominator (944134) | more than 5 years ago | (#27384781)

My experience with memtest is that you can trust the results if it says the memory is bad; however, if the memory passes, it could still be bad. Troubleshooting your scenario should involve replacing the DIMMs in question with known good modules while running Windows.

Re:Memtest not perfect. (5, Funny)

0100010001010011 (652467) | more than 5 years ago | (#27384819)

I bet Windows will love you replacing the DIMM's while running.

Re:Memtest not perfect. (4, Funny)

Anthony_Cargile (1336739) | more than 5 years ago | (#27384963)

I bet Windows will love you replacing the DIMM's while running.

Yeah wait until it starts to sleep first, or even better if you catch it while hibernating

Re:Memtest not perfect. (1, Informative)

Anonymous Coward | more than 5 years ago | (#27384821)

It depends. You also have to have the latest stable version. Running an old version of memtest86 on new hardware is liable to produce unpleasant surprises.

Re:Memtest not perfect. (3, Informative)

mrmeval (662166) | more than 5 years ago | (#27384839)

I've yet to see memtest86 find an error even though replacing the ram fixed the problem. This has been on several builds.

Re:Memtest not perfect. (5, Funny)

Antidamage (1506489) | more than 5 years ago | (#27385047)

I've often had it pick up bad ram, usually within the first five minutes. One time, the memory in question had been through a number of unprotected power surges. The motherboard and power supply were dead too.

You can reliably replicate my results by removing the ram, snapping it in half and putting it back in. No need to wait for a power surge to see memtest86 shine.

Re:Memtest not perfect. (5, Interesting)

Anonymous Coward | more than 5 years ago | (#27384849)

Another nice tool is prime95. I've used it when doing memory overclocking and it seemed to find the threshold fairly quickly. Of course your comment still stands - even if a software tool says the memory is good, it might not necessarily be true.

Re:Memtest not perfect. (5, Funny)

Hal_Porter (817932) | more than 5 years ago | (#27385115)

Memtestx86 is bögus. My machine alwayS generated errors when I run the test but it works fOne otherwise ÿ

Re:Memtest not perfect. (4, Informative)

NerveGas (168686) | more than 5 years ago | (#27385215)

+1. I once had a pair of DIMMs which would intermittently throw errors in whichever machine they were placed, but Memtest would never detect anything wrong with them - even if used for weeks.

I called Micron, and they said "Yes, we do see sticks that go bad and Memtest won't detect it." They replaced them for free, the problem went away, and I was happy.

Yawn (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#27384783)

Did kdawson run out of fud and is instead posting non-news? Shouldn't this be in Ask Slashdot? Typical kdawson garbage

tinfoil is the answer (5, Funny)

Anonymous Coward | more than 5 years ago | (#27384785)

wrap your _whole_ computer in tinfoil to deflect those pesky cosmic rays. it also works to keep them out of your head too.

Re:tinfoil is the answer (4, Funny)

platypussrex (594064) | more than 5 years ago | (#27384801)

I have an even better idea. You know how water cooling makes your computer run better? Well my theory is that water cooling would work the same way for the OP. He needs to get a large tank, fill it with ice water, and be sure to keep his head fully submerged while doing all his computer work. I'm sure he'll be amazed at his increased productivity.

Re:tinfoil is the answer (2, Informative)

Anonymous Coward | more than 5 years ago | (#27384893)

That's absolutely true. As Samuel Johnson remarked, "Depend upon it, sir, when a man knows he is to be drowned in ice water, it concentrates his mind wonderfully." Of course, Boswell made a few errors in transcription.

Re:tinfoil is the answer (0)

Anonymous Coward | more than 5 years ago | (#27384805)

You're going to need pretty thick tinfoil to stop cosmic rays.

Works great against CIA thought control though!

Re:tinfoil is the answer (3, Funny)

JWSmythe (446288) | more than 5 years ago | (#27384831)

... or so they've made you believe.

    The tin foil hat works. We can't read your mind. Feel safe wearing the tin foil hat. You've protected yourself against our evil plot to control your mind. :)

metal armour is the answer (4, Funny)

captainpanic (1173915) | more than 5 years ago | (#27385323)

I usually wear medieval armour. Not only does that work as efficiently as tinfoil, it's also very fashionable.

Re:tinfoil is the answer (0)

Anonymous Coward | more than 5 years ago | (#27385333)

... or so they've made you believe.

    The tin foil hat works. We can't read your mind. Feel safe wearing the tin foil hat. You've protected yourself against our evil plot to control your mind. :)

Except you are now wearing the tinfoil hat that your slashdot overlords want you to be wearing

Re:tinfoil is the answer (0)

Anonymous Coward | more than 5 years ago | (#27384929)

It is not only the cosmic rays, but the plastic materials that the memory chips are encased in; these materials have small amounts of radioactive substances that will emit alpha particles occasionally. You unfortunately can't wrap the chips in tinfoil to prevent that...

Re:tinfoil is the answer (0)

Anonymous Coward | more than 5 years ago | (#27384989)

Unfortunately, alpha particles are only able to travel a few cm at most in normal atmosphere and can't even penetrate a sheet of A4 paper. Unless you're gonna put your finger on the chips for an extended period of time, I doubt the alpha particles will affect you.

Re:tinfoil is the answer (1)

Hal_Porter (817932) | more than 5 years ago | (#27385129)

I realise it's traditional not to RTFA here, but in the comment you replied to it said "It is not only the cosmic rays, but the plastic materials that the memory chip are encased in, these materials have small amounts of radioactive substances that will emit alpha particles occasionally".

I.e. the bit flipping alpha particles come from the packaging, not from the outside world.

Re:tinfoil is the answer (1)

ILuvRamen (1026668) | more than 5 years ago | (#27385241)

mine's actually encased in 20 feet of solid lead. Now you might think that's pretty inconvenient but it has front USB running out to the front plate. Putting CDs in it is a pain though. But I refuse to lose any performance to ECC ram!

lead is the answer (0)

Anonymous Coward | more than 5 years ago | (#27385283)

Wrap your computer (or RAM) in 10 cm of lead, and see if the error persists.

If not, buy lead producers' stock to profit off the heavy-metal computer buyers of tomorrow.

Re:tinfoil is the answer (0)

Anonymous Coward | more than 5 years ago | (#27385311)

It would work on a passively cooled computer, but not on your head, because there can't be any openings.

The tinfoil hat actually makes matters worse because it functions as an amplifier. You could wrap your entire body in tinfoil, but air would need to get in somewhere.

They don't make tinfoil any more either, it's aluminum foil, you luddite.

Paranoia? (0)

Anonymous Coward | more than 5 years ago | (#27384799)

I doubt this is any major problem, I've had my main computer up and running for a few months without a reboot and so far I haven't had any malfunction or program crash.

Re:Paranoia? (4, Interesting)

dgatwood (11270) | more than 5 years ago | (#27385029)

The probability of a cosmic ray at precisely the right angle and speed to cause a single bit error and cause an app to crash is somewhere on the same order as your chances of getting hit by a car, getting struck by lightning, getting torn apart by rabid wolves, and having sex in the back of a red 1948 Buick convertible at a drive-in movie theater on Tuesday night, Feb. 29th under a blue moon... all at the same time.... Sure, given enough bits, it's bound to happen sooner or later, but it isn't something I'd worry about. :-)

The probability of RAM just plain being defective---failing to operate correctly due to bugs in handling of certain low power states, having actual bad bits, having insufficient decoupling capacitance to work correctly in the presence of power supply rail noise, etc---is probably several hundred thousand orders of magnitude greater (probably on the order of a one in several thousand chance of a given part being bad versus happening to a given part a few times before the heat death of the universe).

Memory test failures (other than mapping errors) are pretty much always caused by hardware failing. If running memtest86 in Linux works correctly for days, this probably means one of three things:

  • A. Linux is detecting the bad part and is mapping out the RAM in question.
  • B. The Linux VM system doesn't move things around RAM as much as Windows. Thus, random chunks of code don't end up there, and the few that do are in rarely used parts of background daemons or unused kernel modules so you don't notice the problem.
  • C. Linux power management isn't as rough on the RAM or CPU as Windows. Dodgy RAM/CPUs are most likely to fail when you take them through power state changes like putting the machine to sleep or switching the CPU into or out of an idle state. If Linux is making power state changes less frequently, is not using some of the lowest power states, is not stepping clock speeds, is not dropping the RAM refresh rate in sleep mode, etc., then you are less likely to see memory corruption. Similarly, power state changes can increase the rate of crashes due to a defective CPU or memory controller (northbridge).

I couldn't tell you which of these is the case without swapping out parts, of course. You should definitely take the time to replace whatever is bad even if it seems to be "working" in Linux. In the worst case, you have a few bad bits of RAM, they're somewhere in the middle of your disk cache in Linux, and you are slowly and silently corrupting data periodically on its way out to disk.... You definitely need to figure out what's wrong with the hardware and why it is only failing in Windows, and it sounds like the only way to do that is to swap out parts, boot into Windows, and see if the problem is still reproducible in under a couple of days, repeating with different part swaps until the problem goes away. Don't forget to try a different power supply.

Re:Paranoia? (3, Insightful)

Workaphobia (931620) | more than 5 years ago | (#27385053)

several hundred thousand orders of magnitude

We've crossed beyond the realm of the astronomical and into something else entirely. Surely you meant several orders of magnitude, aka, hundreds of thousands of times? Let's keep things on this side of the googol.

That's not that much.... (1)

raehl (609729) | more than 5 years ago | (#27385201)

That's only a little more than a few orders of magnitude of orders of magnitude.

Re:Paranoia? (1)

dgatwood (11270) | more than 5 years ago | (#27385251)

The word you're looking for here is "hyperbole". :-)

Re:Paranoia? (5, Funny)

Anonymous Coward | more than 5 years ago | (#27385135)

and having sex in the back of a red 1948 Buick convertible at a drive-in movie theater on Tuesday night, Feb. 29th under a blue moon... all at the same time....

Mom?

Re:Paranoia? (0)

Anonymous Coward | more than 5 years ago | (#27385163)

If running memtest86 in Linux works correctly for days...

This quote, and the speculation about how Linux might be handling the memory situation, show that the parent is just, well, guessing, so I wonder who marked his reply informative.
First of all, memtest86 is a standalone program (it runs without/instead of any OS). Second, the grandparent said that it does detect the problem.
All in all, it seems that Linux and Windows have different memory layouts and Linux doesn't put as much stress/usage on that particular troubled memory region.

Just my 0.02 UAH.

Re:Paranoia? (1)

Nikker (749551) | more than 5 years ago | (#27385207)

I have to agree with you that Linux must be mapping out the bad areas. I have an old P3 500 with 128MB RAM; running Linux (Slackware) it ran fine, but I installed XP on it and got a BSOD "Hardware Parity Error". It's likely memtest is not lying to you and Ubuntu is just making the best out of a bad situation.

Re:Paranoia? (1, Interesting)

Jamie's Nightmare (1410247) | more than 5 years ago | (#27385233)

This is a load of crap and your Pro-Linux bias stinks up to high heaven.

If running memtest86 in Linux works correctly for days, this probably means one of three things:

First of all, you don't run Memtest86 under Windows, Linux, or any other operating system. Why? Because you can't test memory that is in use by any other program. This already tells us that you probably haven't used Memtest86 recently enough to remember you would run this from a bootable CD or Floppy. It's downhill from here.

A. Linux is detecting the bad part and is mapping out the RAM in question.

No. Linux doesn't do this. Can you imagine the extra overhead of double checking every single read and write to RAM? Jesus Christ.

B. The Linux VM system doesn't move things around RAM as much as Windows.

Nice, baseless troll argument.

C. Linux power management isn't as rough on the RAM or CPU as Windows.

Isn't as rough? Because half the time it doesn't work as intended? So now a negative becomes a plus? Give us a break.

Re:Paranoia? (1, Informative)

Anonymous Coward | more than 5 years ago | (#27385271)

I disagree - cosmic rays do happen all the time, and do interact with silicon. I was at a demo of a low-light level camera, and every few seconds there'd be a cosmic ray artefact on the monitor.

(Sensible) People do use ECC RAM (1, Informative)

Anonymous Coward | more than 5 years ago | (#27384807)

It's the lowest of the low end of the market that doesn't use ECC, or at least parity RAM. For anything where reliability and veracity are important, you simply must use ECC.

Re:(Sensible) People do use ECC RAM (2, Informative)

Mr Z (6791) | more than 5 years ago | (#27384943)

With today's wide buses, parity RAM is ECC RAM. It's worth paying the extra couple dollars.

Several years back I experienced disk corruption that seemed to be due to a bitflip that had happened in RAM and got committed to disk. That machine didn't have ECC RAM. I went to ECC for everything after that. That was back in the 128MB days, and no I don't overclock.

(Well, not aggressively. My machine is overclocked by about 1%.)

Re:(Sensible) People do use ECC RAM (1)

MadnessASAP (1052274) | more than 5 years ago | (#27384979)

Although I agree that any critical application should be using ECC RAM, the vast majority of desktops do not. The vast majority of RAM out there, from the very low end to the very high end, comes in non-ECC versions; a memory fault every few days doesn't justify the cost and loss of performance to me, or to most people.

Error response (4, Informative)

greg1104 (461138) | more than 5 years ago | (#27384829)

If a system gives memtest86 errors, I break it down and swap components until it doesn't. The test patterns it uses can find subtle errors you're unlikely to run into with any application-based testing, even when run for a few days. Any failures it reports should be taken seriously. Also: you should pay attention to the memory speed value it reports; that's a surprisingly effective simple benchmark for figuring out whether you've set up your RAM optimally. For the last system I built, I ended up purchasing 4 different sets of RAM, and there was about a 30% delta between how well the best and worst performed on the memtest86 results, which correlated extremely well with other benchmarks I ran too.

At the same time, I've had memory that memtest86 said was fine, but the system itself still crashed under a heavy Linux-based test. I consider both a full memtest86 test and a moderate workload Linux test to be necessary before I consider a new system to have baseline usable reliability.

There are a few separate problems here that are worthwhile to distinguish among. A significant amount of RAM doesn't work reliably when tested fully. Once you've culled those out, only using the good stuff, some of that will degrade over time to where it will no longer pass a repeat of the initial tests; I recently had a perfectly good set of RAM degrade to useless in only 3 months here. After you take out those two problematic sources for bad RAM, is the remainder likely enough to have problems that it's worth upgrading to ECC RAM? I don't think it is for my home systems, because I'm OK with initial and periodic culling to kick out borderline modules. And things like power reliability cause me more downtime than RAM issues do. If you don't know how or have the time to do that sort of thing yourself though, you could easily be better off buying more redundant RAM.
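The kind of pattern testing described above can be sketched as a simplified moving-inversions pass over a buffer (a toy illustration only, and not memtest86's actual implementation; the real tool runs bare-metal over physical addresses with many more patterns and address strides):

```python
# Toy moving-inversions memory test: write a pattern over the whole
# buffer, read it back, then repeat with the bit-inverted pattern.
# A stuck or flipped bit shows up as a (index, expected, got) tuple.
def moving_inversions(buf_words, pattern=0xAAAAAAAA):
    errors = []
    mask = 0xFFFFFFFF
    for p in (pattern, ~pattern & mask):
        for i in range(len(buf_words)):        # write pass
            buf_words[i] = p
        for i, word in enumerate(buf_words):   # read/verify pass
            if word != p:
                errors.append((i, p, word))
    return errors

buf = [0] * 1024                 # stand-in for a region of RAM
print(moving_inversions(buf))    # → [] on (simulated) good memory
```

Alternating a pattern with its complement is what lets this style of test catch bits stuck at either 0 or 1, which a single fixed pattern would miss.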

Re:Error response (2, Interesting)

gabebear (251933) | more than 5 years ago | (#27384899)

Anyone else have RAM modules degrade over time? I've never seen this.

I always buy faster modules than I'm actually using, and I usually test the system with memtest at a higher frequency than what it's going to run at. My last build overclocked to [2.7GHz CPU, 1066 FSB, 1066/CL7 DDR3] with memtest still reporting no errors; I run it at [2.1GHz, 800, 800/CL7] (a.k.a. stock speeds).

Re:Error response (2, Informative)

nmos (25822) | more than 5 years ago | (#27384951)

Anyone else have RAM modules degrade over time? I've never seen this.

I've seen a few known good modules fail later on but it's pretty rare. I'd say I've seen fewer than 5 in 15 years. Most times when a previously good module suddenly appears bad there's something else going on such as a failing power supply etc.

Re:Error response (0)

Anonymous Coward | more than 5 years ago | (#27384953)

If you OC or run high performance RAM you know it runs hot. If you power your machine up and down a lot I would bet there's a good chance simple physical stress will cause some errors eventually.

Re:Error response (2, Funny)

Anthony_Cargile (1336739) | more than 5 years ago | (#27385017)

Anyone else have RAM modules degrade over time? I've never seen this.

I don't know if this is from degraded RAM, or rats pissing on the motherboard, but an olde IBM PC running DOS (upgraded to 3?) started having little blips on-screen and other strange characters appear in the output of programs and the shell itself, and in addition to this it would randomly lock up occasionally displaying a stack error.

I know the floppy is alright, because it boots fine without any of these symptoms occurring from other machines it boots from. The video-cardish component appears fine to the naked eye, but that does not explain the random stack errors and unexplained lockups. I've always wondered what the hell was wrong with this thing (and a certain someone won't stop nagging me to throw it away already), but it could very well be degraded RAM. Can't boot up memtest86 because of the CPU (i386), but the symptoms all seem to point to bad RAM.

Re:Error response (1)

Ralish (775196) | more than 5 years ago | (#27385235)

I know the floppy is alright, because it boots fine without any of these symptoms occurring from other machines it boots from.

Assuming you mean this box is booting off a floppy disk, how can you say the above? Have you discounted faulty drive electronics in the floppy drive itself?

The video cardish component appears fine to the naked eye

Your method for testing the reliability of your video card is by staring at it? What were you expecting to conclude? If you can see visible damage to any modern electrical component, excluding things like fans, then it's unlikely to work at all, period. You can't possibly determine more technical and subtle faults by just staring at a PCB.

If you want to establish anything conclusively, you're going to have to test components thoroughly in software, or if that's not possible, swap them out with known good components and see if they fix the problem.

I imagine your issue is going to be finding compatible parts for what I assume is a reasonably ancient machine.

Re:Error response (0)

Anonymous Coward | more than 5 years ago | (#27385221)

I had a stick of RAM go bad. It was ECC, but I had ECC turned off due to some theory that it would be clocked faster without ECC on. All of a sudden, Firefox would kill the system dead. Repeatably. Lived with it for a while, but eventually, just to see what would happen, ran memtest. Bad RAM. Turned on ECC and called the manufacturer. They replaced it, and with ECC on until I got the new ram, everything was fine.

Re:Error response (1)

pegdhcp (1158827) | more than 5 years ago | (#27385243)

Umm, I saw some old "supposedly" dead Cisco 25xx memories return to life after some months in a paper tray. This is not a direct answer to your question, obviously. But, yes, RAM modules change their state while stored away...

Yeah I had this too (-1, Flamebait)

mofag (709856) | more than 5 years ago | (#27384833)

...it was due to having too puny a power supply. Go spend some money on a decent power supply then test your memory again and if it fails on memtest, return it to the manufacturer you fucking moron and stop wasting our time. This isn't a forum for you to ask us questions instead of switching your brain on. What a fuckwit!

Answers (5, Interesting)

jawtheshark (198669) | more than 5 years ago | (#27384835)

1) Yes

2) No

Now to be serious. Home PCs do not yet come with 6GB or 8GB. Most new home PCs still seem to have between 1GB and 4GB, where the 4GB variety is rare because most home PCs still come with a 32-bit operating system. 3GB seems to be the sweet spot for higher-end home PCs. Your home PC will most likely not have 16GB next year. Your workstation at work, perhaps, but even then, perhaps.

At the risk of sounding like "640KByte is enough for everyone", I have to ask why you think you need 16GB to check your email next year. I'm typing this on a 6-year-old computer, I'm running quite a few applications at the same time, and I know a second user is logged in. Current memory usage: 764MB RAM. As a general rule, I know that Windows XP runs fine on 512MB RAM and is comfortable with 1GB RAM. The same is true for GNU/Linux running Gnome.

Now, at work with Eclipse loaded, a couple of application servers, a database and a few VMs... Yeah, there indeed you get memory-starved quickly. You have to keep in mind that such a usage pattern is not that of a typical office worker. I can imagine that a heavy Photoshop user would want every bit of RAM he can get, too. The Word-wielding office worker? I don't think so.

Now, I can't speak for Vista. I heard it runs well on 2GB systems, but I can't say. I got a new work laptop last week and briefly booted into Vista. It felt extremely sluggish, and my machine does have 4GB of RAM. Anyway, I didn't bother, put Debian Lenny/amd64 on it, and didn't look back.

In my view, you have quite a twisted sense of reality regarding the computers people actually use.

Oh, and frankly... If cosmic rays were a big issue by now with huge memories, don't you think that more people would be complaining? I can't say why Ubuntu/amd64 ran fine on your machine. Perhaps GNU/Linux has built-in error correction and marks bad RAM as "bad".

Re:Answers (0)

Anonymous Coward | more than 5 years ago | (#27385059)

Don't know about you, but home PCs where I live ARE coming with 6 or 8GB of memory.

I even had one secretary ask me why the computer tells her she only has 3.5GB when she bought 8GB.

It's like the whole Ghz thing or the hard disk capacity thing, if people can buy a bigger number they do and assume it's better.

Maybe 64bit will catch on now :)

Re:Answers (4, Informative)

bdsesq (515351) | more than 5 years ago | (#27385303)

... 3GB seems to be the sweet spot for higher-end-home-pcs.

3GB is not so much a "sweet spot" as it is a limitation based on a 32-bit OS.
You can address 4GB max using 32 bits. Now take out the address space needed for your video card and any other cards you may put on the bus, and you are looking at a 3GB max for usable memory.
So instead of "sweet spot" you really mean "the maximum that can be used by 32-bit Windows XP", the most commonly used OS today.
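The arithmetic behind that limit fits in a few lines (the 1 GiB carve-out below is an illustrative assumption; the exact amount reserved for video apertures and PCI devices varies by chipset and installed cards):

```python
# A 32-bit OS can address 2^32 bytes; memory-mapped I/O (video
# aperture, PCI devices) is carved out of that same space, so the
# RAM above the carve-out is simply invisible to the OS.
addr_bits = 32
total_space = 2 ** addr_bits          # 4 GiB of addressable space
mmio_reserved = 1 * 1024 ** 3         # ~1 GiB for MMIO (varies by board)
usable_ram = total_space - mmio_reserved
print(usable_ram / 1024 ** 3)         # → 3.0 (GiB)
```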

Memtester (0)

Anonymous Coward | more than 5 years ago | (#27384847)

I have found that memtester (http://pyropus.ca/software/memtester/ [pyropus.ca]), which runs as a user-level process under Linux, does an excellent job of finding bad RAM. I had two instances of memory modules that passed memtest86+ but failed memtester.
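The approach behind a user-level tester like memtester can be sketched as follows (a minimal illustration, not memtester's code; the real tool also mlock()s the buffer so the kernel can't page it out, and cycles through many more patterns):

```python
def user_level_ram_test(size_mb=64, passes=2):
    """Write and verify alternating byte patterns over a heap buffer.
    Returns the number of mismatched bytes found (0 = no errors seen)."""
    buf = bytearray(size_mb * 1024 * 1024)
    mismatches = 0
    for p in (0x55, 0xAA) * passes:        # alternate complementary patterns
        pattern = bytes([p]) * len(buf)
        buf[:] = pattern                   # write pass
        if bytes(buf) != pattern:          # verify pass
            mismatches += sum(a != b for a, b in zip(buf, pattern))
    return mismatches
```

Because it runs inside an OS, a tester like this can only ever see the pages the kernel happens to give it, which is one reason a stick can pass a user-level test (or an OS can seem stable) while the bad region sits untouched elsewhere.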

ECC All the way (1)

dokebi (624663) | more than 5 years ago | (#27384859)

All of my computers that run for days on end without rebooting have ECC ram in them (Home server, workstation at work). Others must be rebooted every now and then.

Are there laptops that use ECC RAM? I wish I could buy some.

Cost benefit analysis (2, Insightful)

Logic Worshipper (1518487) | more than 5 years ago | (#27384865)

Is ECC memory worth the money in a machine you use to check your E-mail? Can't you just reboot and/or replace the memory if errors occur?

I could see it happening when the cost of ECC memory is no higher than normal memory and using ECC memory has no or minimal impact on performance; until then, I don't expect to start seeing it in desktop machines.

If you want ECC memory on your desktop, feel free to build your own machine with a motherboard that supports ECC memory. Some high end desktops do support ECC memory already.

Re:Cost benefit analysis (2, Interesting)

a09bdb811a (1453409) | more than 5 years ago | (#27384967)

Is ECC memory worth the money in a machine you use to check your E-mail?

Unbuffered ECC is only a few $ more than unbuffered non-ECC. It's only 9 chips per side instead of 8, after all. The performance impact is marginal.

I see no reason not to use ECC except that Intel doesn't want you to. It seems they want to keep ECC as a 'server' feature (as if your desktop at home isn't 'serving' you your data). So all their consumer chipsets don't support it, and the i7's memory controller doesn't either. AMD doesn't play that game with their chips, but it seems only ASUS actually implements the ECC support on most of their boards.

Re:Cost benefit analysis (1)

Logic Worshipper (1518487) | more than 5 years ago | (#27385005)

It seems like the day I'm talking about may not be that far away. We'll probably start seeing ECC memory in desktops sooner rather than later.

Re:Cost benefit analysis (0)

Anonymous Coward | more than 5 years ago | (#27385219)

Another thing to take into consideration is that the incidence of cosmic rays increases with altitude, so if you live in a high-altitude area like Denver, for instance, ECC would probably be a good idea.

I'm guessing there's a massive amount of ECC going on in all the CPUs/RAM that NASA uses in space.

The truth (4, Insightful)

mcrbids (148650) | more than 5 years ago | (#27384873)

My first computer was an 80286 with 1MB of RAM. That RAM was all parity memory: cheaper than ECC, but still good enough to positively identify a genuine bit flip with great accuracy. My 80386SX had parity RAM; so did my 486DX4 120. I ran a computer shop for some years, so I went through at least a dozen machines ranging from the 386 era through the Pentium II era, at which point I sold the shop and settled on an AMD K6-2 450. And right about the time that the Pentium was giving way to the Pentium II, non-parity memory started to take hold.

What protection did parity memory provide, anyway? Not much, really. It would detect with 99.99...? % accuracy when a memory bit had flipped, but provided no answer as to which one. The result was that if parity failed, you'd see a generic "MEMORY FAILURE" message and the system would instantly lock up.

I saw this message perhaps three times - it didn't really help much. I had other problems, but when I've had problems with memory, it's usually been due to mismatched sticks, or sticks that are strangely incompatible with a specific motherboard, etc. none of which caused a parity error. So, if it matters, spend the money and get ECC RAM to eliminate the small risk of parity error. If it doesn't, don't bother, at least not now.
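The parity scheme described above can be sketched in a few lines (an illustration: one even-parity bit per byte; a single flip is detected but not located, which is why the classic response was a fatal "MEMORY FAILURE" halt rather than a repair):

```python
def parity_bit(byte):
    # Even parity: the stored bit makes the total count of 1s
    # (data bits + parity bit) even.
    return bin(byte).count("1") & 1

def check(byte, stored_parity):
    # True if the byte passes the parity check. A single flipped bit
    # is detected, but nothing says WHICH bit flipped, so the only
    # possible response is to halt, not to correct.
    return parity_bit(byte) == stored_parity

b = 0b10110100
p = parity_bit(b)
assert check(b, p)                    # intact byte passes
assert not check(b ^ 0b00001000, p)   # one flipped bit is detected
assert check(b ^ 0b00001100, p)       # two flips slip through undetected
```

The last assertion is parity's other weakness: any even number of flips in the same byte cancels out and goes unnoticed.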

Note: having more memory increases your error rate assuming a constant rate of error (per megabyte) in the memory. However, if the error rate drops as technology advances, adding more memory does not necessarily result in a higher system error rate. And based on what I've seen, this most definitely seems to be the case.
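The note above is plain expected-value arithmetic; a sketch with invented per-megabit FIT rates (failures per 10^9 hours; the numbers below are purely illustrative, not measured):

```python
# Expected soft errors scale linearly with capacity at a fixed
# per-bit rate, so a falling per-bit rate can more than offset
# growing capacity. FIT values here are made up for illustration.
def expected_errors_per_year(megabytes, fit_per_megabit):
    megabits = megabytes * 8
    hours_per_year = 24 * 365
    return megabits * fit_per_megabit * hours_per_year / 1e9

old = expected_errors_per_year(512, 1000)   # small RAM, high per-bit rate
new = expected_errors_per_year(8192, 50)    # 16x the RAM, 20x lower rate
assert new < old    # more memory, yet fewer expected errors per year
```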

Remember this blog article about the end of RAID 5 in 2009? [zdnet.com] Come on... are you really going to think that Western Digital is going to be OK with near 100% failure of their drives in a RAID 5 array? They'll do whatever it takes to keep it working because they have to - if the error rate became anywhere near that high, their good name would be trashed because some other company (Seagate, Hitachi, etc) would do the research and pwn3rz the marketplace.

Re:The truth (5, Interesting)

Mr Z (6791) | more than 5 years ago | (#27385013)

Note: having more memory increases your error rate assuming a constant rate of error (per megabyte) in the memory. However, if the error rate drops as technology advances, adding more memory does not necessarily result in a higher system error rate. And based on what I've seen, this most definitely seems to be the case.

Actually, error rates per bit are increasing, because bits are getting smaller and fewer electrons are holding the value for your bit. An alpha particle whizzing through your RAM will take out several bits if it hits the memory array at the right angle. Previously, the bits were so large that there was a good chance the bit wouldn't flip. Now they're small enough that multiple bits might flip.

This is why I run my systems with ECC memory and background scrubbing enabled. Scrubbing is where the system actively picks up lines and proactively fixes bit-flips as a background activity. I've actually had a bitflip translate into persistent corruption on the hard drive. I don't want that again.

FWIW, I work in the embedded space architecting chips with large amounts of on-chip RAM. These chips go into various infrastructure pieces, such as cell phone towers. These days we can't sell such a part without ECC, and customers are always wanting more. We actually characterize our chip's RAM's bit-flip behavior by actively trying to cause bit-flips in a radiation-filled environment. Serious business.

Now, other errors that parity/ECC used to catch, such as signal integrity issues from mismatched components or devices pushed beyond their margins... Yeah, I can see improved technology helping that.

Re:The truth (2, Informative)

tagno25 (1518033) | more than 5 years ago | (#27385147)

An alpha particle whizzing through your RAM will take out several bits if it hits the memory array at the right angle.

Did you ever figure out how alpha particles, which cannot travel through paper, can travel through RAM?

Re:The truth (1)

cerberusss (660701) | more than 5 years ago | (#27385169)

We actually characterize our chip's RAM's bit-flip behavior by actively trying to cause bit-flips in a radiation-filled environment.

For people interested, here is a picture of such a test: radtest1.jpg [vankuik.nl] . What you see here is the electronics placed under a glass dome. The dome is evacuated. The arm is holding a metal dish which has an open bottom. In the dish is a californium particle source [wikipedia.org] .

I work for a semi-government space research organization as a software engineer and as part of my work, create test scripts to test our electronics with custom ASICs [wikipedia.org] for radiation hardness.

Re:The truth (0, Troll)

itwerx (165526) | more than 5 years ago | (#27385109)

"think that Western Digital is going to be OK with near 100% failure of their drives in a RAID 5 array?"

Given that WD's 5-year failure rate is ~20%, yes, they do appear to be okay with something like that.
And when you compare that with Seagate's ~2% failure rate in the same period I suspect that such pwn3rz-ation of the marketplace has already happened.

Re:The truth (1)

Mista2 (1093071) | more than 5 years ago | (#27385285)

For desktops, I probably wouldn't worry about the cost of ECC RAM, but for a server requiring 99% or higher uptime, or lots of memory activity like a virtual server host, I would specify nothing but ECC RAM, and if the server is not able to vMotion guests off for downtime, I would also investigate RAID RAM.

RAID(?) for RAM (5, Interesting)

Xyde (415798) | more than 5 years ago | (#27384875)

With memory becoming so plentiful these days (I haven't seen many home PCs with 6 or 8GB, granted, but we're getting there) it seems that a single error on a large-capacity chip is getting more and more trivial. Isn't it a waste to throw away a whole DIMM? Why isn't it possible to "remap" this known-bad address, or allocate some amount of RAM for parity the way software like PAR2 works? Hard drive manufacturers already remap bad blocks on new drives. Also, it seems to me that, being a solid-state device, small failures in RAM aren't necessarily indicative of a failing component the way bad sectors on a hard drive are. Am I missing something really obvious here, or is it really just easier/cheaper to throw it away?

Re:RAID(?) for RAM (1)

1s44c (552956) | more than 5 years ago | (#27384939)

it seems that a single error on a large capacity chip is getting more and more trivial. Isn't it a waste to throw away a whole DIMM? Why isn't it possible to "remap" this known-bad address, or allocate some amount of RAM for parity the way software like PAR2 works?

The poster is talking about how spooky solar radiation or background radioactivity flips a random bit every so often. Remapping would not help at all, as you don't know you have a problem until it's too late, and it won't happen in the same place again.

The only RAID level that would really help is RAID 2, but it does not do anything that ECC RAM does not already do, so it doesn't seem too useful either.

The only way to reduce the cosmic ray problem is ECC RAM, which is why high-end servers insist on it. The question people should be asking is: 'is the extra uptime worth the extra money?'

Re:RAID(?) for RAM (2, Interesting)

Rufus211 (221883) | more than 5 years ago | (#27384995)

You just described ECC scrubbing [wikipedia.org] and Chipkill [wikipedia.org] . The technology's been around for a while, but it costs >$0 to implement, so most people don't bother. As with most RAS [wikipedia.org] features, most people don't know anything about them, and so would rather pay $50 less than have a strange feature that could end up saving them hours of downtime. At the same time, if you actually know what these features are and you need them, you're probably going to be willing to shell out the money to pay for them.

Re:RAID(?) for RAM (1, Informative)

Anonymous Coward | more than 5 years ago | (#27385015)

Actually there exists at least one way to do it on Linux(TM), namely the "BadRAM Linux(TM) kernel patch":

http://rick.vanrein.org/linux/badram/index.html

Re:RAID(?) for RAM (1)

Mr Z (6791) | more than 5 years ago | (#27385051)

Actually, it's not uncommon [google.com] for large RAM arrays to have "row repair" and "column repair". The RAM array has more rows and columns than are required to provide the rated capacity. During manufacturing testing, they remap some of these to work around defects and increase yield. So, if you're still seeing faults after the production tests have mapped away the obvious faults, I think you're signing yourself up for a bit of pain.

As I recall memtest86 would output a report of the failing locations that you could give to the Linux kernel, telling it what locations to use and to avoid. [vanrein.org]

Seems like a colossal waste of time to me. If you're not concerned about performance, then it's a question of how much your time's worth. You can get 2GB for $23 [newegg.com] and probably less if you spent more than 5 seconds looking like I did. If you spend more than a couple hours futzing with your flaky system to remap all your bad RAM, even if your time is only worth minimum wage, you quickly cross the "worth it" threshold.
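For anyone who does want to go down that road: BadRAM patterns are (address, mask) pairs, where the mask picks out which address bits must match. A hypothetical little helper (the function name and the page-granularity choice are mine, not from the patch) to turn one failing byte address into a pattern covering its 4 KiB page:

```python
# Hypothetical helper for the BadRAM boot parameter. A pattern is an
# (addr, mask) pair: a physical address is treated as bad when
# (address & mask) == addr. Masking off the low 12 bits marks the
# whole 4 KiB page containing the fault.
PAGE_MASK = 0xFFFFF000

def badram_pattern(fault_addr):
    """Map one failing byte address to a page-granular BadRAM pattern."""
    return (fault_addr & PAGE_MASK, PAGE_MASK)

addr, mask = badram_pattern(0x12345678)
print(f"badram=0x{addr:08X},0x{mask:08X}")
```

The printed string is what you'd append to the kernel command line, per the BadRAM docs linked above.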

Joking aside... (5, Informative)

BabaChazz (917957) | more than 5 years ago | (#27384887)

First, it was not cosmic rays; memory was tested in a lead vault and showed the same error rate. Turns out to have been alpha particles emitted by the epoxy / ceramic that the memory chips were encapsulated in.

That said: quite clearly, given your experience, Vista and Ubuntu load the memory subsystem quite differently. It is possible that Vista, with its all-over-the-map program flow, is missing cache a lot more often and so is hitting DRAM harder; I don't have the background to really know. I believe that Memtest86, in order to put the most strain on memory and thus test it under the most pessimal conditions, tries to access memory in patterns that likewise hit the physical memory hardest. But what I have found is that some OSs, apparently including Ubuntu, will run on memory that is marginal, memory that Memtest86 picks up as bad.

As for ECC in memory... The problem is that ECC carries a heavy performance hit on write. If you only want to write 1 byte, you still have to read in the whole QWord, change the byte, and write it back to get the ECC to recalculate correctly. It is because of that performance hit that ECC was deprecated. The problem goes away to a large extent if your cache is write-back rather than write-through; though there will be still a significant number of cases where you have to write a set of bytes that has not yet been read into cache and does not comprise a whole ECC word.

That said, it is still used on servers...

But I don't expect it will reappear on desktops any time soon. Apparently they have managed to control the alpha radiation to a great extent, and so the actual radiation-caused errors are now occurring at a much lower rate, significantly lower than software-induced BSODs.
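To make the check-bit idea concrete, here's a toy Hamming(7,4) single-error-correcting code. Real ECC DIMMs use a wider SECDED code over 64-bit words, but the principle, and the reason the whole word is involved in every write, is the same: every check bit is a function of data bits spread across the word.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8                       # index 0 unused
    # data bits go in the non-power-of-two positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # each check bit covers every position whose index has that bit set,
    # so changing any data bit means recomputing check bits
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code

def hamming74_decode(code):
    """Correct up to one flipped bit, then extract the 4 data bits."""
    syndrome = sum(p for p in (1, 2, 4)
                   if sum(code[i] for i in range(1, 8) if i & p) % 2)
    if syndrome:                         # syndrome is the bad bit's position
        code[syndrome] ^= 1
    bits = (code[3], code[5], code[6], code[7])
    return sum(b << i for i, b in enumerate(bits))

word = hamming74_encode(0b1011)
word[6] ^= 1                             # simulate a single-bit upset
assert hamming74_decode(word) == 0b1011  # corrected transparently
```

The hardware does the equivalent in parallel logic on every access, which is why the read-modify-write cost lands on sub-word writes.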

Re:Joking aside... (2, Informative)

Ron Bennett (14590) | more than 5 years ago | (#27384965)

It is possible that Vista, with its all-over-the-map program flow, is missing cache a lot more often and so is hitting DRAM harder...

Perhaps that's another "feature" of Windows - no need for Memtest86 ... just leave Windows running for a few days with some applications running ... and if nothing crashes, the RAM is probably good.

Re:Joking aside... (-1, Redundant)

Anonymous Coward | more than 5 years ago | (#27385257)

Very clever...

Re:Joking aside... (5, Insightful)

bertok (226922) | more than 5 years ago | (#27384999)

As for ECC in memory... The problem is that ECC carries a heavy performance hit on write. If you only want to write 1 byte, you still have to read in the whole QWord, change the byte, and write it back to get the ECC to recalculate correctly. It is because of that performance hit that ECC was deprecated. The problem goes away to a large extent if your cache is write-back rather than write-through; though there will be still a significant number of cases where you have to write a set of bytes that has not yet been read into cache and does not comprise a whole ECC word.

AFAIK, on modern computer systems all memory is always written in chunks larger than a byte. I seriously doubt there's any system out there that can perform single-bit writes either in the instruction set, or physically down the bus. ECC is most certainly not "depreciated" -- all standard server memory is always ECC, I've certainly never seen anything else in practice from any major vendor.

The real issue is that ECC costs a little bit more than standard memory, including additional traces and logic in the motherboard and memory controller. The differential cost of the memory is some fixed percentage (it needs extra storage for the check bits), but the additional cost in the motherboard is some tiny fixed $ amount. Apparently for most desktop motherboards and memory controllers those few $ extra are far too much, so consumers don't really have a choice. Even if you want to pay the premium for ECC memory, you can't plug it into your desktop, because virtually none of them support it. This results in a situation where the "next step up" is a server-class system, which is usually at least 2x the cost of the equivalent-speed desktop part for reasons unrelated to the memory controller. Also, because no desktop manufacturers are buying ECC memory in bulk, it's a "rare" part, so instead of being, say, 20% more expensive, it's 150% more expensive.

I've asked around for ECC motherboards before, and the answer I got was: "ECC memory is too expensive for end-users, it's an 'enterprise' part, that's why we don't support it." - Of course, it's an expensive 'enterprise' part BECAUSE the desktop manufacturers don't support it. If they did, it'd be only 20% more expensive. This is the kind of circular marketing logic that makes my brain hurt.
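On the "fixed percentage" point: a standard ECC DIMM stores 72 bits for every 64 bits of data, so the extra storage works out like this (simple arithmetic, not vendor pricing):

```python
# A 72-bit ECC word carries 64 data bits plus 8 check bits, so the
# extra DRAM on an ECC DIMM is a fixed fraction of the capacity:
data_bits, check_bits = 64, 8
overhead = check_bits / data_bits
print(f"storage overhead: {overhead:.1%}")   # 12.5%
```

That 12.5% is the floor set by physics and the code; anything above it is market segmentation.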

Re:Joking aside... (1)

Mr Z (6791) | more than 5 years ago | (#27385155)

First, it was not cosmic rays; memory was tested in a lead vault and showed the same error rate. Turns out to have been alpha particles emitted by the epoxy / ceramic that the memory chips were encapsulated in.

That was true in 1977 when bits were huge. It's not so true in 2009, when they are a few orders of magnitude smaller.

As for ECC in memory... The problem is that ECC carries a heavy performance hit on write. If you only want to write 1 byte, you still have to read in the whole QWord, change the byte, and write it back to get the ECC to recalculate correctly. It is because of that performance hit that ECC was deprecated. The problem goes away to a large extent if your cache is write-back rather than write-through; though there will be still a significant number of cases where you have to write a set of bytes that has not yet been read into cache and does not comprise a whole ECC word.

PC caches have been write-back for a long time (all the way back to the original Pentium!). And, they write-allocate, meaning that a byte write from the CPU could bring the whole line into the cache to merge with the write. (I believe this was optional in the Pentium era, but commonplace not long after.)

Re:Joking aside... (1)

Skuld-Chan (302449) | more than 5 years ago | (#27385211)

That said, it is still used on servers...

It's actually used in desktops that have more than one physical CPU (not counting multi-core single CPUs) as well. My desktop has an Intel 5400 chipset and requires ECC memory - it has a lot of interesting requirements, including active cooling.

Depends (5, Interesting)

gweihir (88907) | more than 5 years ago | (#27384901)

My experience with a server that recorded about 15TB of data is something like 6 bit-errors per year that could not be traced to any source. This was a server with ECC RAM, so the problem likely occurred in busses, network cards, and the like, not in RAM.

For non-ECC memory, I would strongly suggest running memtest86+ for at least a day before using the system, and if it gives you errors, replace the memory. I had one very persistent bit-error in a PC in a cluster that actually required 2 days of memtest86+ to show up once, but did occur about once per hour for some computations. I also had one other bit-error that memtest86+ did not find, but the Linux command-line memory tester found after about 12 hours.

The problem here is that different testing/usage patterns result in different occurrence probabilities for weak bits, i.e. bits that only sometimes fail. Any failure in memtest86+ or any other RAM tester indicates a serious problem. The absence of errors in a RAM test does not indicate the memory is necessarily fine.

That said, I do not believe memory errors have become more common on a per-computer basis. RAM has become larger, but also more reliable. Of course, people participating in the stupidity called "overclocking" will see a lot more memory errors and other errors as well. But a well-designed system with quality hardware and a thorough initial test should typically not have memory issues.

However, there is "quality" hardware that gets it wrong. My ASUS board sets the timing for 2 and 4 memory modules to the values for 1 module. This resulted in stable 1- and 2-module operation, but got flaky with 4 modules. Finally I moved to ECC memory before I figured out that I had to manually set the correct timings. (No BIOS upgrade available that fixed this...) This board has "professional" in its name, but apparently "professional" does not include use of generic (Kingston, no less) memory modules. Other people have memory issues with this board as well that they could not fix this way; sometimes a design is just bad, and even reputed manufacturers do not spend a lot of effort fixing issues. I can only advise you to do a thorough forum search before buying a specific mainboard.
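On the point about testing patterns mattering: here's what a classic walking-ones pass looks like, sketched in user-space Python purely for illustration. A real tester like memtest86+ runs from bare metal on physical addresses; under an OS you're mostly exercising caches and whatever pages the allocator hands you.

```python
def walking_ones_test(buf: bytearray):
    """Fill the buffer with a single-bit pattern, read it back, and
    record any mismatches; repeat for each of the 8 bit positions.
    A real memory tester does this against physical addresses."""
    errors = []
    for bit in range(8):
        pattern = 1 << bit
        for i in range(len(buf)):
            buf[i] = pattern
        for i, value in enumerate(buf):
            if value != pattern:
                errors.append((i, bit))
    return errors

assert walking_ones_test(bytearray(4096)) == []   # healthy "memory"
```

Different patterns (walking zeros, address-in-address, random) stress different failure modes, which is exactly why one tester can miss a weak bit that another, or a particular OS workload, trips over.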

 

my ASUS mobo (1)

YesIAmAScript (886271) | more than 5 years ago | (#27385137)

My ASUS mobo (A8N-SLI) would reduce the memory timings if I put 4 memory modules in automatically. I hated that so I used the BIOS to undo it. I ran MemTest to make sure it was okay.

Oddly, the only RAM I've ever really had problems with was some bad-ass Corsair memory I bought for my 800 FSB P4 early on. The timings in the SPD would prevent the system from booting, even if it was the only RAM in the system. I overrode this in the BIOS (on one of the rare occasions that it booted) and it was okay unless I cleared CMOS. After a few months I removed that RAM to send it back and put in the cheapest PC3200 RAM I could find at Fry's. That fixed it, and by altering the settings in the CMOS over time, I could overclock this RAM to the same speed as the Corsair stuff. And it would work. And if I cleared CMOS it would just slow back down instead of failing.

To Corsair's credit, they replaced my RAM, although the replacement was in the same series, it did not have the same timings as the original RAM. But at least it worked.

It's funny, if you look up DDR SDRAM on Wikipedia, it has pictures of essentially the Corsair RAM I used. The version number is the same as the later RAM I got that worked; the earlier stuff was v1.1 or something but otherwise looked the same. The C2 in the name is supposed to mean it's CAS latency 2 RAM, but as I mentioned, the replacement RAM I received was not actually as fast; it was CL3.

Reza Lockwood (0)

Anonymous Coward | more than 5 years ago | (#27384907)

my experience has been to never use computer memory around reza lockwood, she walks around and the whole building shakes (she's about five hundy or so) and simms and dimms get unseated. then the document that you are working on disappears. if you are like me you hate it when this happens

Windows, Linux, and motherboards (0)

Anonymous Coward | more than 5 years ago | (#27384911)

One thing I've noticed is that Linux and Windows have much different access patterns for memory.

So one OS may show problems while the other is running just fine. So... there could likely BE a problem, but it just does not show up as often in one OS or another.

These days 'memory' problems are not always caused by the RAM itself. A lot of boards based on Nvidia's 680i chipset, for example, exhibited a problem when using all 4 slots on the board. They would run fine on 2 slots with most combinations of DIMMs... but would start having issues when you used all 4.

Anyhow, I would be less worried about cosmic rays and more about the general configuration of your PC.

Ram, CPU (especially with the memory controller on chip these days), Motherboard, etc... all can create stability problems in regards to memory.

(As a side comment, both a friend and I were bitten by the 680i problem back when the boards had come out. Bummed us out to the point of refusing to use another nvidia northbridge.)

If it was really a cosmic ray (4, Funny)

circletimessquare (444983) | more than 5 years ago | (#27384913)

Then it would proba%ly alter not just one byte, b%t a chain of them. The cha%n of modified bytes would be stru%g out, in a regular patter%. Now if only there were so%e way to read memory in%a chain of bytes, as if it w%re a string, to visu%lize the cosmic ray mod%fication. hmmm...

Re:If it was really a cosmic ray (1)

taucross (1330311) | more than 5 years ago | (#27385319)

Ineetnlgisrty eougnh the cmeutpor in yuor haed can raed a fiplped bit jsut fnie.

Settings matter too (5, Informative)

Max Romantschuk (132276) | more than 5 years ago | (#27384915)

Not all memory is created equal. If Memtest detects errors, the memory can be bad, or you can simply be running it at the wrong settings. Usually there are both "normal" and "performance" settings for memory on higher-end motherboards, or sometimes you can tweak all sorts of cycle-level stuff manually (CAS latency etc.).

Try running your memory with the most conservative settings before you assume it's bad.

Workaround bad memory howto (linux only) (4, Informative)

gQuigs (913879) | more than 5 years ago | (#27384925)

Depending on where it fails (if it fails in the same spot), you can relatively easily work around it and not throw out the remaining good portion of the stick. I wrote a howto..

http://gquigs.blogspot.com/2009/01/bad-memory-howto.html [blogspot.com]

I've been running on Option 3 for quite some time now. No, it's not as good as ECC, but it doesn't cost you anything.

Re:Workaround bad memory howto (linux only) (4, Informative)

m_pll (527654) | more than 5 years ago | (#27385239)

On Vista you can do the same thing using bcdedit:

bcdedit /set badmemorylist 0x12345 0x23456

Parameters are page frame numbers.
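Since those parameters are page frame numbers rather than the byte addresses memtest86 reports, the addresses need shifting down by the page size first. A small sketch (assuming standard 4 KiB x86 pages; `addr_to_pfn` is my name, not a bcdedit term):

```python
PAGE_SHIFT = 12   # 4 KiB pages

def addr_to_pfn(byte_addr):
    """Byte address -> page frame number for `badmemorylist`."""
    return byte_addr >> PAGE_SHIFT

# a fault at byte address 0x12345678 lands in page frame 0x12345
assert addr_to_pfn(0x12345678) == 0x12345
```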

It is a bit tricky (0, Offtopic)

MatchMate (1466095) | more than 5 years ago | (#27384971)

I am not sure why I keep getting free computer offer from dell? any one else getting such an offer?

Trust Memtest86 (4, Informative)

nmos (25822) | more than 5 years ago | (#27384993)

1) Do people trust a memtest86 error to mean a bad memory module or motherboard or CPU?

Well, I'd add some other possibilities, such as:

Bad power supply.
Memory isn't seated properly in its socket.
Incorrect timings set in the BIOS.
Memory is incompatible with your motherboard.
Etc.

But yeah, if memtest86 says there's a problem then there really is something wrong.

Best practice (2, Informative)

swehack (975617) | more than 5 years ago | (#27385009)

is to swap the memory modules around to find out which one is causing the problem, if it's not the motherboard. Also, I don't see how memory tests running inside an OS can be effective; I'd much rather boot off a smaller system on a DVD, USB stick, or floppy to run a memory test. Dell servers have those Dell Diagnostics CDs with a very small memory footprint, precisely in order to run diagnostics on memory. But even they're not perfect, so you often have to take memory out and see if you can reproduce errors.

Chances are you have bad RAM (1)

syousef (465911) | more than 5 years ago | (#27385031)

Memtest86 is the usual test tool for a couple of reasons (and only one of those is price).

Chances are very good you have a problem. Definitely worth checking it out.

1) Re-run the test and see if the error is in the same place. If it is, you can pretty much guarantee the RAM is bad at that position.
2) Swap the memory out and try again. You're best to do this while you still can under warranty.

Bottom line is you're not paranoid and you probably do have a problem. You can either deal with it up front or live with a compromised system that eventually bites you on the backside.

getting this trollish vibe (1)

Eil (82413) | more than 5 years ago | (#27385035)

1) Do people trust a memtest86 error to mean a bad memory module or motherboard or CPU?

Please, please tell me this is an early April fool's joke. If not, dear submitter, I hope that you're either very tired or very drunk right now because you literally just asked:

"Windows is crashing randomly and the program that I ran to test the memory is reporting errors. Does that mean the memory in my computer is bad?"

Recently I had consistent BSODs with Vista64 on a PC with 4GB; I tried memtest86 and it always failed within hours. Yet when I ran 64bit Ubuntu at 100% load and using all memory, it ran fine for days.

You should have also tried running a hacked version of OS X to serve as a tie-breaker.

Completely Stupid... (1)

Nom du Keyboard (633989) | more than 5 years ago | (#27385069)

It has been completely stupid to dispense with parity and ECC memory in PCs. Apple was the first to go to 8-bit memory bytes long ago (and their machines still cost more!), and now it seems everyone below the server level is happy playing without a net. Even GPU cards, if used for highly parallel FP calculations, should have the ability to detect when a memory error has happened and signal the application to handle it. Completely stupid, and beyond completely stupid, that we now trust our calculations to a system that can't even determine whether it has made an error!

Use ECC in anything you care about (2, Insightful)

LukeCrawford (918758) | more than 5 years ago | (#27385081)

really, it's not that much more expensive. Search newegg for unbuffered ecc, if you are using a desktop class system that can't handle registered ram.

You wouldn't put data you care about on a hard drive without raid, would you?

Was it cosmic rays, or...? (2, Informative)

Nom du Keyboard (633989) | more than 5 years ago | (#27385095)

Was it cosmic rays, or Alpha particle decay from impure materials that was going to do in our memory soon? IIRC it was the latter.

OK (2, Insightful)

Runefox (905204) | more than 5 years ago | (#27385127)

1) Do people trust a memtest86 error to mean a bad memory module or motherboard or CPU?

Yes. I do, anyway; I've never had it report a false-positive, and it's always been one of the three (and even if it was cosmic rays, it wouldn't consistently come up bad, then, would it?). Then again, it could also mean that you could be using RAM requiring a higher voltage than what your motherboard is giving it. If it's brand-name RAM, you should look up the model number and see what voltage the RAM requires. Things like Crucial Ballistix and Corsair Dominator usually require around 2.1v.

2) When I check my email on my desktop 16GB PC next year, should I be running ECC memory?

Depends. If you're doing really important stuff then sure. ECC memory is quite a boon in that case. If you're just using your desktop for word processing and web browsing, it's a waste of money.

Memory test: the forgotten option (1)

gedeco (696368) | more than 5 years ago | (#27385139)

Most failures will appear when a PC is heavily stressed.
Combine the memtest86 test with some continuously running programs that use memory, hard disk, CPU, and network.

If a system survives this test for a whole day, it's in perfect shape.

Bad what? (0)

Anonymous Coward | more than 5 years ago | (#27385165)

I have the same configuration, 4GB DDR2-6400, and Vista/Win7 would either run for days without a crash, or have one day where it would crash every 30 minutes. No BSOD, just a blank screen as if the video card had stopped working (HDCP DVI monitor too). When I pressed the reset button, Windows would come back but there would be blue lines all over the place; turning the monitor off and then back on would fix that.

But it wasn't until I disconnected one of the front chassis fans that this all (apparently) went away. I disconnected it because it kept rattling. That made me think either the +5V rail on the power supply was overloaded (all 4 fans were in parallel, not including the CPU, GPU and PSU fans themselves), or it could have been the proximity to the hard drives, as the disconnected fan was in the array in front of the hard drives. Cooling or vibration? I don't know, but if it doesn't black-screen on me for another two weeks, I'm going to call that the culprit.

But as for the OP: only really rubbish memory is going to be picked up as bad by memtest86 and such. If you run commercial software like pccheck/thetroubleshooter you'll often find that certain hardware repeatedly fails but works anyway. This is because the software is much more of a "stress test" that forces the hardware to register a fail state under conditions that may never be met by the operating system.

In the day of $500 computers, it's quite naive to assume you'd get good equipment without building it yourself; however, unless you spend $2000 on hardware (all quality parts) you won't get decent parts... you may as well buy a Mac Pro.

If you aren't going for ECC memory, you should be buying whatever is the maximum-performance memory for your mainboard, and stay away from rubbish boards made by Biostar. Want to know if a motherboard manufacturer is any good? See if the manufacturer has updated the BIOS of the previous-generation board more than 3 years after it was manufactured. You will usually find the rubbish boards made by Biostar and MSI in eMachines/Gateway systems and virtually all low-end desktops and servers.

Are all Biostar and MSI boards bad? Yes, in my experience. These are the brands that traditionally used the third-party north and south bridge chips that don't quite work.

I'd pick Asus or Gigabyte over any other brand, based on experience. These are the only two companies that produce over-engineered desktop boards that don't fail. I'm not including primary server boards, but the Dell and Tyan boards used in my datacenter haven't failed once, while all the Biostars have. They're the ones that don't have or support ECC memory. Go figure.

It's not funny dudes (1)

jsse (254124) | more than 5 years ago | (#27385189)

While lots of people are making fun of your seemingly paranoid concern about the destructive, deadly cosmic rays, I'm here to support you. We get hit by cosmic rays every second; our skulls are just not thick enough to resist all those penetrations, which is why we lose memory from time to time. Have you ever found yourself forgetting something that happened just an hour ago? That's why. Wearing a tinfoil hat is the only safeguard against unexpected memory degradation.

Besides cosmic rays, other forms of harmful radiation should not be neglected. The radiation emitted by computer processors and monitors can also cause deformation of your unborn children, so you should buy an anti-radiation suit [paipai.com] for your pregnant loved ones. Remember, the harmful effect is irreversible; you don't want to take the risk.

Last but not least, reports have shown a high correlation between impotence and prolonged computer use. I've had friends get their balls fried by sitting near a computer with a 2.4GHz CPU, because the frequency is exactly the same as your microwave oven's. We just can't tell how many poor dudes have had their balls disabled this way. Sad.

You can't be too careful. At the end of days, when the streets are stuffed with mindless, impotent zombies, guess who has the last laugh.

Try slowing your memory down... (1)

supersat (639745) | more than 5 years ago | (#27385197)

I've seen lots of RAM errors as the speed of memory has increased, especially with the AMD64 Hammer chips. What it usually boils down to is someone not manufacturing their components such that they truly meet their spec.

If you slow down your memory and the errors go away, it's not cosmic rays. AFAIK, cosmic rays will flip bits regardless of how fast the RAM is being run.

Cosmic Rays and the Oort cloud (1)

upuv (1201447) | more than 5 years ago | (#27385213)

I'm old enough to remember that rubbish in the press as well. But it started before 512MB; I remember asking a clerk if my $450 4MB SIMM would degrade in potentially high-radiation environments. Like the crap that comes from my microwave.

For a cosmic ray to have enough energy to flip a bit of memory would be fairly impressive. It has to hit the right spot and be the right energy to make a relatively large device think it's got a fresh signal. Not too likely.

I'd be more worried about that same cosmic ray causing a DNA error and giving me cancer.

Now of course, if you were designing interstellar probes, you'd have a definite concern on your hands. Once out past the Oort cloud (OK, farther than that) you no longer have the magnetic shield of our sun. Now we are in the cosmic ray bath. This is where the odds of a bit flip start to get high. Now let's add the fact that your little probe is going to be out there a LONG time. I can bet a few bucks that, yeah, you are going to suffer from a bit flip or 7. :)

---

Oh, Vista 64-bit is a nightmare. 3 machines of mine have had it. ALL had major issues. Back to XP 32-bit and ZERO issues; all of them just sing along now. My home server and a few minor devices run Ubuntu; NEVER had an issue.

Am I looking forward to Windows 7? Nope. It means Win XP will really die and MS won't patch it. (Prediction: Win 7 will suck as badly as Vista when people figure out it doesn't actually work on an EeePC after they install crap.) (Second prediction: the much-touted touch-screen interface additions in Win 7 are actually really annoying to use.)

----

Back to the Oort cloud. Why are you sending a PC out there again?

I haven't seen faulty memory in a while. (1)

AbRASiON (589899) | more than 5 years ago | (#27385237)

The stuff is so cheap now, and it only costs a tiny bit more to buy the brand-name stuff, so you're fine.

When it was AUD$800 for 64MB of RAM, the cheap '$500 stuff!!' was an option; sadly it was the wrong one, but all we could afford back then.
If anything needs more quality control, it's either hard disks or high-end gaming video cards, which literally seem to burn out between 3 and 24 months nowadays :/
