submitted 8 months ago by [email protected] to c/[email protected]

recently there has been this problem that has been getting more frequent: my computer just randomly freezes up/blackscreens and then fails to post when i do a hard restart. this doesn't resolve itself until i open it up and play musical chairs with the ram for a bit.

shit that i have tried:

  1. swapped the ram around to different slots. sometimes it works, sometimes it doesn't
  2. cleaned out the case
  3. wd40'd the ram pins (helped with the posting but seems to have increased crash frequency, not enough data to tell for sure)

no idea where to begin with this one, can't tell if it's a motherboard or a ram issue or something else entirely. the sticks are of differing sizes and manufacture so that may also be an issue. would give specs but the thing just died on me in the middle of posting this and i can't boot in just yet. motherboard is a supermicro x9 something server board.

[email protected] 13 points 8 months ago

Don't put WD-40 on the pins. I'd start by pulling out the sticks and cleaning the pins off with a q-tip and iso alcohol. Probably a good idea to clean out the slots now too.

Get Memtest64 and run it with both sticks. If it fails, try it with each one by itself. If a stick doesn't pass the test, you should be able to get a new one under warranty. Just start an RMA request and say it failed Memtest64.

If it's not your ram, then it's probably a poorly seated CPU. Remove the cooler, clean the old paste off, apply fresh paste, and carefully put the cooler back on without over-tightening it or tightening one side more than the other.

[email protected] 7 points 8 months ago

cleaning the pins off with a q-tip and iso alcohol

i tried this at the beginning, things didn't noticeably improve so i took it to a local shop and they gave me the wd40 treatment. will try again

probably a poorly seated CPU

inshallah please let this be it

[email protected] 11 points 8 months ago* (last edited 8 months ago)

It is wild to me that they put WD-40 on it. It's a lubricant, not a solvent; it will leave residue behind. Regular WD-40 shouldn't get anywhere near PC components, and the specific stuff they make for cleaning electrical contacts has a bunch of warnings and cautions that would keep me from using it on anything delicate or expensive.

[email protected] 5 points 8 months ago

WD40 isn't a lubricant; it's for "Water Displacement." While as a liquid it can be used as one, it is a poor one. Its whole purpose is to cover a metal part with a hydrophobic layer. It's good at removing water from something like your spark plugs. Maybe they thought water had gotten in and was causing contact issues?

[email protected] 3 points 8 months ago

Seconding this. Get some 90% isopropyl and clean off all that WD-40. Let it fully dry/evaporate. The only thing you should spray on your computer parts is compressed air.

[email protected] 6 points 8 months ago

A few things to try out in addition to other folks' good suggestions:

  • when it happens, after a hard shutdown, unplug the power cable, press the power button to discharge anything remaining, and then plug it back in and start. See if it consistently posts after you do this. This would indicate that a component is breaking itself but resets to a temporarily working state after a proper power cycle.

  • monitor temperatures. Log them to file if possible. Overheating components might explain why workarounds only work sometimes. Maybe some of them just let the components cool down enough.

  • just leave in one stick at a time and see how it goes. You can try to narrow down whether it's a stick or a spot that's broken by trying different slots with 1 stick and different sticks in the same spot.

  • Not posting can look like a few things. Is it possible it's the video card / output breaking?
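For the temperature-logging suggestion, here is a minimal Python sketch, assuming the machine runs Linux (the OP mentions nvidia-smi). It reads the kernel's hwmon sensors under /sys/class/hwmon and appends them to a CSV file, forcing each batch to disk so the readings leading up to a freeze survive the hard reset. Which sensors show up depends on the board and the loaded drivers; the file name, the 5-second interval, and the function names are arbitrary choices for illustration. Keys include the hwmon directory name because a dual-socket board can expose two chips with identical sensor labels. GPU temperature isn't covered here; nvidia-smi can report that separately.

```python
import csv
import os
import time
from datetime import datetime
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def read_temps(root: Path = HWMON_ROOT) -> dict[str, float]:
    """Return {"hwmonN:<chip>/<sensor>": degrees C} for every hwmon temperature input."""
    temps: dict[str, float] = {}
    for chip in sorted(root.glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for inp in sorted(chip.glob("temp*_input")):
            sensor_id = inp.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{sensor_id}_label"
            label = label_file.read_text().strip() if label_file.exists() else sensor_id
            try:
                # hwmon reports temperatures in millidegrees Celsius
                temps[f"{chip.name}:{chip_name}/{label}"] = int(inp.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # some sensors error out when read; skip them
    return temps


def log_forever(path: str, interval_s: float = 5.0) -> None:
    """Append one CSV row per sensor every interval_s seconds until interrupted."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            stamp = datetime.now().isoformat(timespec="seconds")
            for sensor, celsius in read_temps().items():
                writer.writerow([stamp, sensor, f"{celsius:.1f}"])
            f.flush()
            os.fsync(f.fileno())  # push rows to disk so a hard freeze doesn't lose them
            time.sleep(interval_s)
```

Start it with `log_forever("temps.csv")` and, after the next freeze and reboot, look at the last rows before the gap to see whether anything was climbing.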

[email protected] 4 points 8 months ago
  1. i've been doing this when testing each individual stick of ram, there is no real pattern, but some stick/slot combinations are more consistent than others.

  2. will try this when i get the thing to turn on.

  3. see 1

  4. how would i test/fix this? nvidia-smi was fine last i checked. would this have any correlation with the ram issues?

[email protected] 3 points 8 months ago

If you've tested each stick all by itself (no others plugged in) in a few different slots and all of them have this issue, that suggests that it's not the sticks and possibly not the slots either. If it were one of those two, you'd expect to find at least one stable single stick + slot combination, since you'd expect only one thing to break at a time: one stick, or one slot (or a single pair of slots).

For your graphics card, do you also have an integrated one in the CPU? If so, I'd remove your discrete card and see if it's more stable. You'd need to switch your monitor cable to a different receptacle, of course. If that's not an option, I'd come up with ways to "ping" your computer under the assumption that maybe it is posting and working but just not showing you anything. You could set up an ssh server or similar with auto-login, then see whether you can still get in after one of these incidents and a hard reset.
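For the "is it running but not showing anything" check, a rough Python sketch you could run from a second machine: it tries a plain TCP connection to the SSH port and prints a timestamped line whenever the box flips between reachable and unreachable. It assumes an SSH server is enabled on the affected machine and listening on the default port 22; `port_open` and `watch` are hypothetical helper names, and the address in the usage line is a placeholder.

```python
import socket
import time
from datetime import datetime


def port_open(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout subclasses OSError)
        return False


def watch(host: str, port: int = 22, interval_s: float = 10.0) -> None:
    """Print a timestamped line each time host:port changes between UP and DOWN."""
    last = None
    while True:
        up = port_open(host, port)
        if up != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'UP' if up else 'DOWN'}", flush=True)
            last = up
        time.sleep(interval_s)
```

Running `watch("192.168.1.50")` (placeholder address) through a blackscreen and hard reset shows whether the machine comes back up on the network even when the monitor stays dark, which would point at the video path rather than a failed post.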

The inconsistency of the memory issue makes me think it isn't memory (no single stick at a time is stable in any slot, right?). I'd start removing more components to see if any minimal set is stable.

[email protected] 4 points 8 months ago

My cooler failed not long ago, the symptoms there were similar. Computer would freeze/crash and then wouldn't turn on until it cooled off.

[email protected] 4 points 8 months ago

Is it overheating maybe?

[email protected] 4 points 8 months ago

This could be any component, including MB, CPU, GPU, power supply. This could be damage that temporarily fixes itself once the thing cools down again. You'll want to remove as many components as possible, and swap out the rest with alternatives, or swap your components into another computer. Maybe you know someone you can visit to swap stuff out with?

Also, have you tried running memtest86 on the RAM? There's also other diagnostic software for other components.

Just running a stress test like a benchmark might reliably trigger the problem, so you have a reproducible way of triggering the issue instead of just waiting for it to happen.

[email protected] 4 points 8 months ago

try using a single stick maybe? if it crashes, try a different one. if that still crashes, it's most likely not the ram.

[email protected] 3 points 8 months ago

i was on 6 sticks, i think i have narrowed the candidates down to 3 stable sticks, 2 unstable, and 1 definitely busted

problem is the stable sticks only work in certain slots and even then uptime is not great

one of the unstable sticks is brand new, makes me think that it got destroyed by being in one of the bad slots

a big problem is that i have 16 slots for ram and it's a total pain in the ass to test all of them
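With 6 sticks and 16 slots there are 6 × 16 = 96 single-stick combinations, so it can help to log every run and let a script do the bookkeeping. A minimal Python sketch under that assumption: each record is a hypothetical `(stick, slot, passed)` tuple, where "passed" means whatever you decide counts (a clean memtest pass, or some number of hours of uptime). It tallies results per stick and per slot, following the reasoning in the thread: if failures cluster on one slot regardless of stick, suspect the slot or board; if one stick fails everywhere, suspect the stick. It also lists combinations not yet tried. Stick and slot names in the example are made up.

```python
from collections import defaultdict
from itertools import product


def _as_tuples(table):
    """Convert {key: [passes, total]} to a sorted {key: (passes, total)}."""
    return {key: tuple(counts) for key, counts in sorted(table.items())}


def pass_rates(trials):
    """Tally (passes, total) per stick and per slot from (stick, slot, passed) records."""
    by_stick = defaultdict(lambda: [0, 0])
    by_slot = defaultdict(lambda: [0, 0])
    for stick, slot, passed in trials:
        for table, key in ((by_stick, stick), (by_slot, slot)):
            table[key][0] += int(passed)  # successful runs
            table[key][1] += 1            # all runs
    return _as_tuples(by_stick), _as_tuples(by_slot)


def untested(sticks, slots, trials):
    """Return (stick, slot) pairs with no recorded run yet, in stick-major order."""
    seen = {(stick, slot) for stick, slot, _ in trials}
    return [pair for pair in product(sticks, slots) if pair not in seen]


if __name__ == "__main__":
    # Hypothetical log: stick labels and slot names are made up for illustration.
    trials = [
        ("stick1", "slot1", True),
        ("stick1", "slot2", False),
        ("stick2", "slot1", True),
        ("stick2", "slot2", False),
    ]
    by_stick, by_slot = pass_rates(trials)
    print("per stick:", by_stick)
    print("per slot: ", by_slot)
    print("still to try:", untested(["stick1", "stick2", "stick3"], ["slot1", "slot2"], trials))
```

In the made-up log both sticks pass in slot1 and fail in slot2, which is the "follows the slot" pattern.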

[email protected] 3 points 8 months ago

if sticks of ram are only working in certain slots, it's entirely possible the IC that controls the ram is shot.

Recently had this happen on an old dual xeon setup, rendered half of my 192GB of ram unusable and was causing problems exactly like what you’re describing.

Does the mobo show the sticks as inserted upon bootup?
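On Linux, one way to check whether the firmware detected each stick is the SMBIOS memory table that `dmidecode -t memory` prints (it needs root). Each "Memory Device" section has a `Locator` (the slot name) and a `Size`, which reads "No Module Installed" for empty slots. A small Python sketch, assuming that output layout, that maps slot to size; `parse_dmidecode_memory` and `read_slots` are hypothetical helper names, and the slot names in the test data are only examples.

```python
import subprocess


def parse_dmidecode_memory(text: str) -> dict[str, str]:
    """Map each slot's Locator to its reported Size from `dmidecode -t memory` output."""
    slots: dict[str, str] = {}
    for block in text.split("\n\n"):  # dmidecode separates sections with blank lines
        lines = [ln.strip() for ln in block.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip Physical Memory Array and other section types
        fields = {}
        for ln in lines:
            key, sep, value = ln.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Locator" in fields:  # exact key, so "Bank Locator" is not picked up
            slots[fields["Locator"]] = fields.get("Size", "unknown")
    return slots


def read_slots() -> dict[str, str]:
    """Run dmidecode (requires root) and parse its memory section."""
    out = subprocess.run(
        ["dmidecode", "-t", "memory"], capture_output=True, text=True, check=True
    ).stdout
    return parse_dmidecode_memory(out)
```

A stick that is physically installed but listed as "No Module Installed" was not detected by the firmware in that slot.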

[email protected] 3 points 8 months ago

Are you sure it's the ram? I had a bad motherboard that did basically the same thing. I was swapping pci cards around and it would eventually work. Turned out it was the flexing of the motherboard that got it working again.

this post was submitted on 13 Nov 2023
16 points (100.0% liked)