Mag B550 crashing randomly

tourbu12b602a9

Motherboard: Mag B550 Tomahawk MAX wifi 7C91
Processor: Ryzen 7 5800X
Ram: G.SKILL Ripjaws V Series 64GB (2 x 32GB) 288-Pin PC RAM DDR4 3200 (PC4 25600) Desktop Memory Model F4-3200C16D-64GVK
GPU: Zotac Geforce RTX 3070 ti Trinity OC

This is a new PC build. Everything on the computer works well until, after gaming for about 45 minutes to 1 hour, the computer randomly crashes. I have updated all my drivers, updated Windows, and the motherboard is on the latest firmware. I have tried other RAM and done a second clean install of Windows 11. It always results in the same problem.

Event 46, WHEA-Logger
A fatal hardware error has occurred.
Component: Memory
Error Source: Machine Check Exception
The details view of this entry contains further information.

I have tried everything I can think of to fix this, and I am beginning to think I got a bad motherboard.
 
I have tried other ram and done a second clean new install of Windows 11. It always results in the same problem.

That means it's a low-level problem that has to be solved on a hardware or BIOS settings level, nothing you can do in Windows will solve it.

I have tried everything I can think of to fix this, and I am beginning to think I got a bad motherboard.

Please list exactly what you tried. Don't jump to conclusions about the cause. That has to be found out by doing methodical troubleshooting. If you just "think" something is the cause, and you RMA it, you can end up wasting time and resources if it wasn't responsible.

Please also list your PSU model and other components like CPU cooler and drives (SSD etc.). Everything inside your PC. Also, if you can, please show a photo of your system, maybe there's something wrong that can be spotted on there.
 
Thanks for your reply. I appreciate it.
The computer runs great until it has this memory crash, and it only happens when gaming. Otherwise the computer can run for hours. I have the dump files and can send them later if needed.
I updated the BIOS to AMI BIOS 7C91v291 (beta version, 2024-09-09), completed all Windows updates, and completed all MSI Center updates.
(Note: one time when restarting the computer, I got a Windows notification that Windows Defender was preventing a RAM update from happening. I disabled that feature, but could never find the update to run it again.)

I have reseated the RAM sticks and tried out 2 sets that were both new. The first set was
G.SKILL TridentZ RGB Series 64GB (2 x 32GB) 288-Pin PC RAM DDR4 4000 (PC4 32000) Desktop Memory Model F4-4000C18D-64GTZR
but when the problem started I purchased a second set.
I have tested the RAM with Memtest86.
I have the RAM installed in DIMMA2 and DIMMB2.
I have tested each stick on its own in DIMMA2.
The problem persisted with every configuration.

I believe that I have all overclock settings turned off. But I am not too sure about that. It is set to Simple (not advanced settings).

I can send a picture later with my full build in place.
I have a PC Part Picker list here. https://pcpartpicker.com/list/JyRLFZ
PC Build.jpg
 
G.SKILL TridentZ RGB Series 64GB (2 x 32GB) 288-Pin PC RAM DDR4 4000 (PC4 32000) Desktop Memory Model F4-4000C18D-64GTZR
Ram: G.SKILL Ripjaws V Series 64GB (2 x 32GB) 288-Pin PC RAM DDR4 3200 (PC4 25600) Desktop Memory Model F4-3200C16D-64GVK

Well, which one is it then? Of course, anything much above DDR4-3600 is pretty much a waste, since at some point above that you would have to run the memory controller on a divider, which is bad for performance. So even with a DDR4-4000 kit, you'd stay at the sweet spot of DDR4-3600 instead of using the A-XMP DDR4-4000 profile which would ultimately perform worse. Also see point 4) of my RAM thread. Also see point 5), Memtest86 is good for a basic test (seeing if a module is defective, and basic stability), but TestMem5 in Windows will really tell you if your RAM is stable or not.
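
To put rough numbers on the divider point above, here is a small Python sketch. It assumes the usual DDR relationship (the memory clock is half the DDR4 transfer rate) and a hypothetical 1800 MHz ceiling for running the memory controller in sync, chosen only because it matches the DDR4-3600 sweet spot mentioned above. The real ceiling differs from CPU to CPU, and `SYNC_LIMIT_MHZ` and `memory_controller_clock` are illustrative names, not BIOS settings.

```python
# Assumption: the memory controller can run 1:1 with the memory clock only
# up to this limit (1800 MHz corresponds to DDR4-3600). Real CPUs vary.
SYNC_LIMIT_MHZ = 1800


def memory_controller_clock(ddr_rate: int, sync_limit_mhz: int = SYNC_LIMIT_MHZ):
    """Return (memory clock MHz, controller clock MHz, ratio) for a DDR4 rate."""
    mclk = ddr_rate // 2  # DDR moves data twice per clock cycle
    if mclk <= sync_limit_mhz:
        return mclk, mclk, "1:1"
    # Above the limit the controller falls back to half the memory clock.
    return mclk, mclk // 2, "2:1 (divider)"


for rate in (3200, 3600, 4000):
    mclk, uclk, ratio = memory_controller_clock(rate)
    print(f"DDR4-{rate}: memory {mclk} MHz, controller {uclk} MHz, ratio {ratio}")
```

Under these assumptions a DDR4-4000 profile leaves the controller at 1000 MHz, while DDR4-3600 keeps it at 1800 MHz, which is the trade-off described above.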

The computer runs great until it has this memory crash, and it only happens when gaming.

It can point to several things. First, the 3070 Ti: make sure you are using two separate cables from the PSU, not one cable with a daisy-chained second plug on it. Then, seeing how it happens after an hour or so of gaming, it can be that the GPU ends up heating up the RAM to the point where it becomes unstable. If the airflow through the case is not very good, then 300W of heat from the GPU can do that over time. See this video where they complain about horrible thermals on an MSI pre-built, which i believe is using the same case model. The front is definitely a bit more sealed than you'd want with high-power-draw hardware; ideally you want a case with a mesh front for much improved airflow. So we would have to look at the thermals under some gaming load. RAM stability can suffer when it becomes too hot. It might not even show in normal stress tests when the GPU is not running.

First i'd like to see how the system looks, where you put the radiator and all the rest.
 

Thanks for the reply. I am using the G.SKILL Ripjaws V Series 64GB (2 x 32GB) 288-Pin PC RAM DDR4 3200 (PC4 25600) Desktop Memory Model F4-3200C16D-64GVK in the machine currently, for all of these tests.

Here is the case with fan directions. I will check out your links.
1739924739336.png


The computer temperatures don't seem crazy high. Here is the MSI Center monitoring while playing Marvel Rivals for 20 minutes. It has crashed while playing this game many times.
1739924926405.png
 
MSI Center is not good for monitoring, use HWinfo64, much better. You want to see all the sensors, and you want current/min/max/avg values for each. Open "Sensors", then expand all sensors by clicking on the little <--> arrows on the bottom, also expand the columns of the sensors a bit so everything can be read. Make it three big columns of sensors (or four, if the screen resolution is high enough). In the end, it should be a screenshot with all the sensors visible at once, like this:

yes.png


Make sure the power plan in Windows is on "Balanced" and just leave the PC running in idle for a bit with the sensors open, to get some baseline numbers. Then you can play a game for maybe 20-30 minutes or so (before it can crash) and take a screenshot of the HWinfo sensor window.
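
If a sensor log is exported to CSV instead of (or in addition to) a screenshot, the same min/max/avg view can be produced with a few lines of Python. This is only a sketch: the column names and values in `SAMPLE_LOG` are made up for illustration and do not come from this system, and a real log export may use different headers, encodings, or trailing summary rows that would need extra handling.

```python
import csv
import io
from statistics import mean

# Hypothetical log excerpt; a real export has many more columns.
SAMPLE_LOG = """Time,CPU [C],GPU [C]
12:00:00,45.0,40.0
12:10:00,70.0,68.0
12:20:00,74.0,72.0
"""


def summarize_sensors(csv_text: str) -> dict:
    """Compute min/max/avg for every numeric column of a CSV sensor log."""
    columns = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip timestamps and other non-numeric cells
            columns.setdefault(name, []).append(number)
    return {
        name: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for name, vals in columns.items()
    }


for sensor, stats in summarize_sensors(SAMPLE_LOG).items():
    print(f"{sensor}: min {stats['min']}, max {stats['max']}, avg {stats['avg']:.1f}")
```

Comparing the summary of an idle run with the summary of a gaming run gives the same baseline-versus-load comparison as the two screenshots.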

Your setup actually doesn't look too bad at first glance, but we'll have to see about the temperatures. The fan curves may need optimizing too.
 

Thank you again for your help, I really appreciate it. I wonder about the fans, they seem to be all over the place. I also wonder about the GPU, like you said before. Its fans are pointed down, and there is no separate fan to help move the heat away in that direction.
I have set the Windows power plan to Balanced.
Here is the baseline Run,
1739972093178.png


And here is the reading after gaming for 30 minutes. The computer didn't crash.
1739972187750.png
 
Your fan curves (for the system fans) seem non-existent; they are not changing speeds depending on CPU temperature. So right off the bat, that's not what you want. You should set proper fan curves, right in the BIOS. The CPU_FAN header, which i assume the two radiator fans are on, reports somewhat of a fan curve, but even that could go lower in idle. For my system, in idle, i have all fans around 500 RPM or so.
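
For anyone unsure what a fan curve actually does, the idea can be sketched in a few lines of Python: a handful of temperature/duty points, with the fan speed following the CPU temperature between them. The points in `FAN_CURVE` are hypothetical examples, not recommended values for this build, and the straight-line interpolation between points is an assumption; the BIOS may step or smooth between points differently.

```python
# Hypothetical curve points: (CPU temperature in °C, fan duty in %).
FAN_CURVE = [(40, 25), (55, 40), (70, 70), (85, 100)]


def fan_duty(temp_c: float, curve=FAN_CURVE) -> float:
    """Linearly interpolate the fan duty (%) for a temperature on the curve."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])  # below the first point: hold minimum duty
    for (t_lo, d_lo), (t_hi, d_hi) in zip(curve, curve[1:]):
        if temp_c <= t_hi:
            return d_lo + (d_hi - d_lo) * (temp_c - t_lo) / (t_hi - t_lo)
    return float(curve[-1][1])  # above the last point: hold maximum duty


for temp in (30, 55, 62.5, 90):
    print(f"{temp} °C -> {fan_duty(temp):.0f} % duty")
```

A flat line, where every temperature maps to the same duty, is what "no fan curve" looks like in these terms.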

Your RAM modules don't have temperature sensors, but i can see it's only running at the safe first-boot profile of DDR4-2666, not at DDR4-3200. That should actually make things super-stable. So i'm not convinced this is a RAM-related instability, it might only be the symptom, not the cause.

Your temperatures don't look too bad, but of course, this particular game only had the GPU power consumption up to 109W peak, which is a bit more than a third of what it could draw if it was fully loaded. So either that game wasn't very demanding, or you use VSync/G-SYNC (which is not a bad idea at all, of course). As for the cause of it, so far we're none the wiser though.

I would suggest testing things separately in an effort to narrow it down.

For the RAM, test with TestMem5 (1usmusv3 profile), which i link under 5) of my RAM thread.

For the CPU, test with OCCT, https://www.ocbase.com/download

For the GPU, test with FurMark 2, https://www.geeks3d.com/furmark/downloads/

Up to an hour of each, or whatever previously was the longest time until a crash.
 
Your RAM modules don't have temperature sensors, but i can see it's only running at the safe first-boot profile of DDR4-2666, not at DDR4-3200. That should actually make things super-stable.

It should, but not with a DRAM voltage of 1.188V only.
Anything less than 1.20V for DDR4 leads to stability issues.
Especially for 64GB memory ...
 
That should just be a sensor inaccuracy, sometimes it reads a bit higher, on other boards a bit lower, i've seen it a lot of times. Usually the RAM should get 1.2V in the end. He can of course set DRAM Voltage so that it reads 1.2V dead in the BIOS (or enable A-XMP), but i doubt that this will solve the entire issue. Maybe worth a try, especially enabling XMP, because it's so easy to test, so why not.
 
That should just be a sensor inaccuracy

If a sensor inaccuracy is higher than 0.1V that sensor is worthless.
In any case, even at 2666 a voltage of 1.20V only is not enough for 64GB.
Not enough for CPU IMC anyway (if XMP is disabled that's the voltage used on the IMC level).
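
For reference, the numbers in this exchange can be put side by side with a short Python sketch. It only does the arithmetic on the values quoted in the thread (the 1.188V sensor reading, the 1.20V nominal DDR4 voltage, and the 0.1V figure mentioned above); it makes no claim about which voltage is actually sufficient, and `deviation` is just an illustrative helper name.

```python
def deviation(reading_v: float, nominal_v: float):
    """Return the difference in volts and as a percentage of nominal."""
    diff = reading_v - nominal_v
    return diff, 100 * diff / nominal_v


diff, pct = deviation(1.188, 1.200)
print(f"Reading vs nominal: {diff * 1000:+.0f} mV ({pct:+.1f} %)")
print(f"Within 0.1V of nominal: {abs(diff) < 0.1}")
```

So the reported value sits about 12 mV, or 1 %, under the nominal 1.20V.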
 
Thank you for these suggestions, I will look into them. But first I must admit that I made an error in my reporting. I took off the glass side panel for the pictures and didn't replace it, so my above readings are not accurate. Here is a new reading while playing No Man's Sky for about a half hour with my glass panel in place. (This is the game that I play the most and where I get most of my crashes.)
1739979657760.png
 
A tip.
Sometimes Windows gives us very generic information that doesn't really shed any light on the problem. However, combining these errors with the results obtained from a memory dump makes it easier to validate the issue.

I would try to get an SSD and redo the installation without drivers to see what happens. One factor that sometimes no one takes into account is Microsoft Defender's Kernel and Memory Protection. It generates kernel errors when a game or application tries to execute something out of context in memory and kernel areas that shouldn't be executed. CoD Black Ops 6 introduced an anti-cheat that causes problems with Kernel Protection enabled.

Use this application to read the memory dumps that are allocated in your Windows, to determine which file contributed to the error.

BlueScreenView - https://www.nirsoft.net/utils/blue_screen_view.html

I hope this helps.
 
Here is a new reading while playing No Man's Sky for about a half hour with my glass panel in place. (This is the game that I play the most and where I get most of my crashes.)

Ok, but it's not fundamentally different, similar picture with a bit higher temperatures everywhere. Still nothing where i'd say it's critical.

So, i'd get the fan curves properly set, then do some testing, maybe enable XMP first to get that eventuality out of the way. If it still crashes at DDR4-3200 @ 1.35V, then do the rest of the testing.
 
Hello,

Interesting issue, disable REBAR support (Resizable BAR) in your motherboard BIOS and see how it goes.
This fits with the symptoms, long game play causing crashes and the era of the 3070Ti.
 
I have to admit, I am not the most skilled person in the BIOS, I have clicked the XMP settings button, but am not sure how to set Fan curves. I saved and exited with XMP turned on. Let me know what i need to do for the Fan Curves.
1739982548012.png

Here is the new baseline with XMP turned on.
1739982521764.png



I was curious about this earlier in the troubleshooting and have disabled the Windows Defender Memory Protection already. I got specific errors about this and I thought I mentioned that above. I do have the mini dump file from yesterday if that is helpful. I downloaded the Windows dump file viewer, but I would be lying if I said that I completely understand what it is saying.

But the thing that stuck out to me was that the dump file said it was related to:
HYPERVISOR_ERROR (20001)
The hypervisor has encountered a fatal error.
Arguments:
Arg1: 0000000000000026
Arg2: 0000000000000000
Arg3: 00007ffb9b8f9d96
Arg4: 000000ba470ff2c0
 

Attachment: 1739982607127.png
Thank you for assisting with trying to figure out this problem. I disabled REBAR support in the BIOS and the game (No Man's Sky) crashed in about a half hour.

The error in Windows Event Viewer is the same as listed above. The dump file lists:
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffc08c71197028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000bc000800, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000001010135, Low order 32-bits of the MCi_STATUS value.

and later in the dump file:
MODULE_NAME: AuthenticAMD
IMAGE_NAME: AuthenticAMD.sys
STACK_COMMAND: .process /r /p 0xffffc08c93571080; .thread 0xffffc08c92c49080 ; kb
FAILURE_BUCKET_ID: 0x124_0_AuthenticAMD_MEMORY__UNKNOWN_FATAL_IMAGE_AuthenticAMD.sys
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {b0905187-9dbc-d607-4dc5-8630b9eddb7f}

Both dump files state Windows 10, even though my computer is on Windows 11.
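
The two MCi_STATUS halves in the arguments above can be combined and unpacked with a short Python sketch. It assumes the generic x86 machine-check layout (status flags in bits 63 to 57, error code in bits 15 to 0, with memory-hierarchy codes shaped like 0000 0001 RRRR TTLL). Vendor-specific bits are left alone, and the dump arguments do not say which bank reported the error, so the output is a reading of the code fields and not a diagnosis of which part is at fault. `decode_mci_status` and the lookup tables are illustrative names.

```python
# Generic x86 machine-check status flags and their bit positions.
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}
REQUEST = {0: "GEN", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Combine Arg3/Arg4 of bugcheck 0x124 and decode the generic fields."""
    status = (high32 << 32) | low32
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    decoded["error_code"] = code
    if (code >> 8) == 0x01:  # memory-hierarchy form: 0000 0001 RRRR TTLL
        decoded["kind"] = "memory hierarchy"
        decoded["request"] = REQUEST.get((code >> 4) & 0xF, "unknown")
        decoded["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "unknown")
        decoded["level"] = LEVEL.get(code & 0x3, "unknown")
    else:
        decoded["kind"] = "other (not decoded in this sketch)"
    return decoded


# Arg3 and Arg4 from the dump above.
result = decode_mci_status(0xBC000800, 0x01010135)
for key, value in result.items():
    print(f"{key}: {hex(value) if key == 'error_code' else value}")
```

For these arguments the sketch reports a valid, uncorrected error with error code 0x135, which the generic layout reads as a memory-hierarchy error on a data read at cache level L1.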
 
I have to admit, I am not the most skilled person in the BIOS, I have clicked the XMP settings button, but am not sure how to set Fan curves. I saved and exited with XMP turned on. Let me know what i need to do for the Fan Curves.

See my Guide: How to set up a fan curve in the BIOS. It's done on the bottom left in "Hardware Monitor" there, or if you press F7 to set the BIOS to advanced view, it will be on the bottom right.

XMP is active now, and since you still get the crash, regrettably (and somewhat predictably), that wasn't it.
 
Here is the new baseline with XMP turned on.

I did not ask for XMP enabled.
I've asked for a bit of extra "juice" for CPU IMC and memory at 2666.
Try 1.25V for memory, VDDQ and VDDIO (manually set this value for them).
Also, I know you said a second clean new install of Windows 11, but another sfc /scannow won't hurt.
 
Thank you for assisting with trying to figure out this problem. I disabled REBAR support in the BIOS and the game (No Man's Sky) crashed in about a half hour.

Are you able to recall what was going on when / just before the game crashed?

At the moment there are too many variables in play with the CPU and GFX card, both have a form of dynamic overclocking and if not working correctly can cause the game to crash. Both issues also operate in a way that means max power and heat stress testing can all pass with no issues. This is because at a constant load the CPU and GPU will also be at a constant frequency / voltage whereas during a gaming environment both are changing many times a second, the CPU/GPU frequencies and voltages will be all over the place.

So for the purpose of fault finding it's best to control the CPU and GPU.
- CPU: this is easy, you simply fix the frequency to the base clock of 3.8GHz (in the BIOS, set the CPU ratio to x38) and try gaming again.

- GPU: either swap it out, or limit the boost clock using MSI Afterburner.
Once it is installed, set the core clock offset to -500MHz, and make sure you click the Apply button.
 