Memory RAM Diagnostic tool?
CyberDyneSystems
16th of June 2009 (Tue), 15:14
Skynet is sick..
It looks like either a memory problem or CPU.
The system is a dual Opteron nForce Pro 2200 board (Supermicro H8DCE) running a pair of dual-core Opteron 275s
http://www.supermicro.com/Aplus/motherboard/Opteron/nForce/H8DCE-HTe.cfm
8 GB of DDR 400 ECC ( 8x 1GB sticks)
I am getting a system error pop up, fairly regularly. (though not this afternoon for some reason)
It has caused Firefox to crash, and the system to hard re-boot a few times.
I've looked up the error code and gotten mixed info from MS etc..
Some seem to indicate CPU trouble, some memory.
I'm leaning towards memory..
It turns out the pop-up is being generated by the system's ECC.
If I turn off ECC, the error code is no longer generated, and I get no pop-up.. (of course this only means it's not being reported, the problem is still there)
My Question:
Any recommendations for a memory test program that will be able to tell me which of the 8 sticks of RAM is dying?
With the problem being intermittent, I can't very well use trial and error to remove 1 of 8 sticks one by one.
Thanks.
overclicker
16th of June 2009 (Tue), 15:31
Memtest86+
http://www.memtest.org/
CyberDyneSystems
16th of June 2009 (Tue), 16:47
Ahh thanks that was exactly what I was looking for!
I'll let you know how it works out!
Moppie
16th of June 2009 (Tue), 19:59
Might be time for a skynet upgrade?
Dual i7 perhaps...........
MaxxuM
17th of June 2009 (Wed), 01:03
Might be time for a skynet upgrade?
Dual i7 perhaps...........
Ahhh, don't mess around. Just go get a Quad Dunnington (4 x 6 = 24 True Cores) system. Here's an example: Link (http://www.youtube.com/watch?v=tysVO6Am9Dw&feature=related).
joove
17th of June 2009 (Wed), 01:12
Like was said before, Memtest86, or the + variant. I find that the best way to run it is to burn an "Ultimate Boot CD" http://www.ultimatebootcd.com/ and boot from that.
While I have used these to diagnose memory in the past, they cannot tell you (as far as I know) which stick is bad. Simplest thing is to start with one stick and keep adding them. The tests should be run for a few hours as per recommendation (although if you are lucky you will see failing tests in less than 10 minutes).
Could also be a disk issue as the VM page reads could fail if your disk is going bad. If SMART is clean, memory is more likely to be the culprit.
The Ultimate Boot CD has tools for disk checks, CPU stress tests, memory tests, and several others, all available without worrying about Windows being in the picture.
good luck!
Duncan Frenz
17th of June 2009 (Wed), 01:20
To complicate things further, you might want to entertain the thought of a bad memory slot/controller.:confused:
CyberDyneSystems
18th of June 2009 (Thu), 00:58
Well of course it's been running fine with no errors these past few days, but I have my boot disk burned and ready to go..
The first run came up empty,.. but... ???
When I have time I'll try it out, even if things seem fine.
CyberDyneSystems
21st of June 2009 (Sun), 11:56
Well, I ran memtest after the errors returned, and got tons of errors.
It tells me that those errors are in a certain range, but I can't find a way to map that range to the memory slot it comes from..
If I had 2 or 4 sticks of RAM I'd be more inclined to use trial and error based on removing sticks, but man, with 8 sticks in 8 slots, and the need to use at least 2 at all times, this is going to be one hell of a process of elimination! I could be at this for weeks..
GregSteer
21st of June 2009 (Sun), 12:47
You can tell by the memory range indicated.
Each stick is 1 GB - first stick is up to 1024 MB, next stick is 1024 MB-2048 MB etc until you hit 8192 MB.
AFAIK memtest86+ runs sequentially through the RAM slots so you should be able to locate the duff one (or more) - leave that one and its matching partner in and retest for errors.
Personally I would take the time to test each pair individually overnight - that "should" show up any errors and confirm all RAM safe or duff.
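The range-to-stick arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the eight 1 GB sticks are mapped back to back in slot order with no interleaving, which may not hold on a given board, and the function names are made up for the example.

```python
STICK_SIZE_MB = 1024   # each stick is 1 GB, as in the thread
STICK_COUNT = 8        # eight sticks installed


def stick_for_address(address_mb):
    """Return the 1-based stick number holding an address given in MB.

    Assumes sticks are mapped back to back in order (an assumption,
    not something memtest86+ guarantees).
    """
    total_mb = STICK_SIZE_MB * STICK_COUNT
    if not 0 <= address_mb < total_mb:
        raise ValueError(f"address {address_mb} MB is outside 0..{total_mb} MB")
    return int(address_mb // STICK_SIZE_MB) + 1


def sticks_for_range(low_mb, high_mb):
    """Return every stick number touched by a failing range in MB."""
    if low_mb > high_mb:
        raise ValueError("low_mb must not exceed high_mb")
    return list(range(stick_for_address(low_mb), stick_for_address(high_mb) + 1))


if __name__ == "__main__":
    # Hypothetical failing range, purely for illustration.
    print(sticks_for_range(2100, 2900))  # [3]
```

A range that crosses a multiple of 1024 MB comes back as two sticks, which is a hint to test both.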
smcclelland
22nd of June 2009 (Mon), 23:03
I'd suggest the following steps when doing memtest:
1. Find the range (the stick/slot) that memtest86 is reporting and pull that stick.
2. Take the stick and rotate it through each slot (1,3,5 on i7s or any slot on DDR2).
3. If the stick throws errors in all slots, it's a bad stick. If it only throws errors in a specific slot, it usually points to a bad or failing DIMM slot.
It's not a fun process but it will certainly limit any headaches for RMA or replacement.
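The stick-versus-slot decision in step 3 can be written down as a tiny helper for keeping notes while rotating the stick. This is a sketch only; the slot labels and the function name are hypothetical, and the results have to be filled in by hand after each memtest run.

```python
def diagnose(errors_by_slot):
    """Classify a suspect stick from per-slot memtest results.

    errors_by_slot maps a slot label to True if memtest reported errors
    while the suspect stick sat in that slot, False otherwise.
    """
    if len(errors_by_slot) < 2:
        raise ValueError("test the stick in at least two slots first")
    failing = [slot for slot, had_errors in errors_by_slot.items() if had_errors]
    if len(failing) == len(errors_by_slot):
        return "stick"          # errors follow the stick into every slot
    if not failing:
        return "inconclusive"   # no errors seen anywhere, keep testing
    return "slot"               # errors only show up in particular slots


if __name__ == "__main__":
    # Hypothetical results after moving one stick through three slots.
    print(diagnose({"slot 1": True, "slot 3": True, "slot 5": True}))
    print(diagnose({"slot 1": False, "slot 3": True, "slot 5": False}))
```

With an intermittent fault, an "inconclusive" result after a short run does not clear the stick; longer runs are needed before trusting it.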
CyberDyneSystems
24th of June 2009 (Wed), 20:56
You can tell by the memory range indicated.
Each stick is 1 GB - first stick is up to 1024 MB, next stick is 1024 MB-2048 MB etc until you hit 8192 MB.
AFAIK memtest86+ runs sequentially through the RAM slots so you should be able to locate the duff one (or more) - leave that one and its matching partner in and retest for errors.
Personally I would take the time to test each pair individually overnight - that "should" show up any errors and confirm all RAM safe or duff.
I took a guess at which slot it was based on my mobo manual.. The range reported by the test was indeed limited to likely one stick (between 5700MB and 6200MB), so this was the 2nd to last stick of 8.
I pulled out the 7th and 8th stick for now. I've not re-run the test yet, but so far this is the first time in days I've had the PC on without the error messages.. (or crashes!)
Being the 7th stick, it also makes sense that it was not a constant issue; the intermittent aspect may simply have come down to how often I was hitting that 7th GB of RAM..
Now I only need to confirm I am no longer getting errors, then do a little swapping to see if it's the stick or the slot.
Thanks folks!
GregSteer
26th of June 2009 (Fri), 03:43
Excellent news, a stable system is a happy system.
For the nosey and uninformed amongst us what is Skynet/what is she doing? (excluding trying to take over the world!)
In2Photos
26th of June 2009 (Fri), 08:52
Excellent news, a stable system is a happy system.
For the nosey and uninformed amongst us what is Skynet/what is she doing? (excluding trying to take over the world!)
http://photography-on-the.net/forum/showpost.php?p=6838994&postcount=36
CyberDyneSystems
26th of June 2009 (Fri), 20:00
:lol: Yes, when SkyNet is not using all its CPU power to plot the end of mankind, it spares a few cycles to help me edit my photos :-)
Bad news, I got another error message.
I removed two more sticks, now I'm down to 4GB installed in the first two slots of each of the two CPUs' banks.
Could be I messed up which slot was which, but I think it's more likely that I have a larger moving problem..
Opterons have the memory controller on board, so I have two memory controllers, one for each four slots of RAM.
If I get another error message now, it likely means one of the CPU based memory controllers is on its way out, or a mobo issue..
Fingers crossed that it's one of the two sticks I just removed.
Based on the addresses Memtest spat out, it HAS to be one of the four total I've removed. The only question I had was which bank was counted first.. with 2 GB removed from the end of both banks, it does not matter now.. it had to be one of those 4.
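For anyone wanting to double-check the "which bank is counted first" reasoning, here is a rough Python sketch that lists which bank and slot position a failing range would land on under each ordering. The bank labels "CPU1" and "CPU2", the back-to-back 1 GB mapping, and the function name are all assumptions for illustration; dual-channel pairing or memory interleaving settings can change the real layout.

```python
STICK_MB = 1024          # 1 GB per stick
STICKS_PER_BANK = 4      # four slots behind each CPU's memory controller


def candidates(low_mb, high_mb, bank_order):
    """Return (bank, position) pairs touched by a failing range in MB.

    bank_order names the banks in the order they are assumed to be
    mapped; position is the 1-based slot position within its bank.
    """
    total_mb = STICK_MB * STICKS_PER_BANK * len(bank_order)
    if not 0 <= low_mb <= high_mb < total_mb:
        raise ValueError("range must lie inside the installed memory")
    first = low_mb // STICK_MB
    last = high_mb // STICK_MB
    result = []
    for index in range(first, last + 1):
        bank = bank_order[index // STICKS_PER_BANK]
        position = index % STICKS_PER_BANK + 1
        result.append((bank, position))
    return result


if __name__ == "__main__":
    # The range reported earlier in the thread, tried with both orderings.
    for order in (("CPU1", "CPU2"), ("CPU2", "CPU1")):
        print(order, candidates(5700, 6200, order))
```

Under these assumptions the 5700 MB to 6200 MB range crosses the 6144 MB boundary, so it touches positions 2 and 3 of whichever bank is mapped second. That would make a position 2 stick worth keeping on the suspect list as well.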
Moppie
26th of June 2009 (Fri), 20:06
Dual i7.........
dual i7.........
dual i7............
dual i7.......
Can you hear it calling you?
strmrdr
27th of June 2009 (Sat), 06:26
Check the motherboard capacitors.
It is the most common cause for moving memory errors that seem to jump from module to module.
A flaky power supply can also cause it but it is less likely.
http://www.badcaps.net/pages.php?vid=5
GregSteer
28th of June 2009 (Sun), 05:10
Ack CyberDyne - I had that problem with a duff AMD X2 a year or two ago, took a while to diagnose the cpu problem and ended up with a board/cpu/ram upgrade as I couldn't be bothered waiting for warranty replacement parts.
You could try running CPU tests with different known working RAM. There are three on the Ultimate Boot CD mentioned previously by another poster, and no errors should show on these tests. But that still doesn't prove the motherboard innocent either, so do the capacitor check as strmrdr mentioned. Any signs of popping (leakage from the top of the capacitors or severely domed tops [they should all be flat]) would indicate a failure.
sas8888
28th of June 2009 (Sun), 13:57
Have been looking for something like that thanks