But since the 1st of this year it's been VERY UNSTABLE and I've been pulling my hair out trying to track down the issue.
Unstable how? Do you get crashes in Unreal Engine-based games that seem to be GPU-related? See
https://www.radgametools.com/oodleintel.htm
I finally bought a new win 11 to reinstall it "clean" and still unstable,
You can do a clean install at any time. Your computer specs are saved at MS. Once you go online and Microsoft recognizes your PC, their server will automatically activate Windows for you. So your system is registered with their servers once it's been activated the first time. When you buy Win11 again, you only register the same system twice, basically handing Microsoft your money without getting anything in return.
Plus, most instability is low-level, meaning on a hardware or BIOS level (usually hardware, if you hadn't changed BIOS settings or updated the BIOS before the instability started). A clean install can't fix that, I'm afraid.
I may swap back the 12700 to confirm. And get intell to swap it out. HOWEVER - do I want another 14900? That lasts about a year? or what else?
Is this a 14900K and 12700K, or the non-K ones?
Anyway, a couple of notes here. If you have a true 13th/14th gen CPU, then it was affected by buggy microcode in older BIOS versions. So on an old BIOS, the CPU slowly gets grilled by spikes of excessive voltage, which eventually causes CPU degradation to the point of instability. If you were oblivious to this issue until now (despite it being featured on every major tech channel etc. for months last year) and haven't updated the BIOS, then yeah, it's very possible that your CPU has been damaged by the voltage spikes by now. Plus, the 14900K, along with the 13900K, is the CPU model most affected by degradation from voltage spikes.
Once there is degradation (basically, damage to parts of the CPU on a molecular level), the CPU becomes unstable at voltages it was originally stable at. And when you have instability at a certain voltage, you have two main ways of regaining stability: Lower the frequency and/or increase the voltage.
So, if we assume for a moment that your CPU suffered degradation, then you have a conflict:
1) To achieve stability, it would need higher voltages, but 2) it is already overwhelming your cooling, and raising the voltages just makes everything much worse.
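To put a rough number on that conflict: to a first approximation, the dynamic power of a CPU scales with voltage squared times frequency. The little sketch below only uses that textbook rule of thumb. It ignores leakage and everything else that happens on a real chip, and the function name is made up for illustration, so treat it as a ballpark, not a measurement.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power under the rule of thumb P ~ V^2 * f.

    Both arguments are ratios against the current operating point,
    e.g. 1.05 means "5% higher than now". Leakage is ignored.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Example: 5% more voltage at unchanged clocks
increase = relative_dynamic_power(1.05) - 1.0
print(f"Roughly {increase:.0%} more dynamic power")
```

So even a small voltage bump costs noticeably more power, and with it more heat, on a CPU that is already hitting its thermal limit.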
If my calculations are correct, 212°F is 100°C (we use °C here on the forum), which means full thermal throttling. Your CPU is desperately trying to save itself from a thermal death by clocking itself down. So not only is the BIOS most likely outdated, but the power limits are also not set appropriately for your cooling capabilities, because the cooling is completely overwhelmed. If you have an Arctic 360mm AIO, it should be good for 250W or so (less, if the fan curves and airflow aren't optimized), but clearly your CPU draws even more power.
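For anyone who wants to double-check the conversion, it is just the standard Fahrenheit-to-Celsius formula. A minimal sketch (the function name is only for illustration):

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


print(fahrenheit_to_celsius(212.0))  # 100.0
```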
Now, I don't wanna make it sound like it's all your fault, far from it. Most of all, Intel messed up, and to a far lesser extent the board makers, by not putting monster CPUs like the 14900K on a short enough leash. Heck, the BIOS will have presented you with a cooler selection screen on first boot (or after a BIOS update), and the "Water cooler" option would've totally maxed out the power limits to 4096W (highest possible value). A 14900K can easily go above 300W, and short peaks of around 400W are not unheard of. Needless to say, even a 360mm AIO is completely out of its element with that amount of heat; you'd need a custom loop.
Then, Intel also messed up big time with buggy microcode (code relating to how the CPU behaves), which the board makers had in their BIOS. So these voltage spikes, even if you had all the BIOS settings dialed in properly, could still wreak havoc if the BIOS wasn't updated to a version that came with fixed microcode.
Normally, to dial in the settings properly, you could go by my guide: "How to set good power limits in the BIOS and reduce the CPU power draw for further improvements". And this will still be necessary, definitely step 1) of the guide. But step 2), we probably can't apply by now, because it's about trying to lower the power draw (by lowering the voltages). This works very well on CPUs that are still fully ok. On CPUs which have suffered degradation, it won't work at all, since they actually need higher voltages to be stable again, which would drive up the power consumption and basically make everything worse.
So step 2), we either skip or we do the opposite, but purely for testing if this helps stability, it's nothing permanent. Because you don't really want to prop up a degraded CPU with even more voltage than the 14900K already wants from factory, you want to ultimately RMA such a CPU. But we should first test what's going on with it.
So my suggestion is as follows:
1) Update to the latest BIOS version for your board, this is long overdue. You don't actually name your board model, but go to its support site, get the latest BIOS, extract it to USB, and update via M-Flash in the BIOS.
2) Go by my guide linked above, but only the first step for now. The second step, well, the new BIOS will already use higher baseline voltages than in older BIOS versions, now that the voltage spikes have been taken care of. This seems to be an attempt by MSI to stabilize somewhat degraded CPUs. So just by doing the BIOS update, and doing the first step to prevent thermal throttling, you might have improved stability.
3) Post the result of a Cinebench run, as described in the guide, showing the HWinfo Sensors screenshot (properly expanded and in °C if possible) after the 10-minute run. For the power limits, try 250W for both; I explain how to set them in the guide. If you find that the temperatures still approach 100°C, then try something lower until you can ideally stay out of the 90°Cs range entirely (high 80°Cs is ok). And then check for stability, it should have somewhat improved already from the higher baseline voltages. That is, if we are really dealing with a degraded CPU already, but from what you said, it could definitely be the case. With those voltage spikes, the CPUs can degrade within a couple of months, depending on use.
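The trial-and-error part of step 3) can be sketched as a simple rule: run Cinebench, look at the peak CPU temperature in HWinfo, and lower the power limit if it reached the 90°Cs. The sketch below is only an illustration of that loop. The 25W step and the 125W floor are arbitrary assumptions on my part, not values from the guide, and you still enter the limits by hand in the BIOS.

```python
def next_power_limit(current_limit_w: int,
                     peak_temp_c: float,
                     target_c: float = 90.0,
                     step_w: int = 25,    # assumed step size, pick your own
                     floor_w: int = 125   # assumed lower bound for this sketch
                     ) -> int:
    """Suggest the power limit to try for the next Cinebench run.

    Keeps the current limit if the peak temperature stayed below the
    target, otherwise lowers it by one step (never below the floor).
    """
    if peak_temp_c < target_c:
        return current_limit_w
    return max(floor_w, current_limit_w - step_w)


# Example: started at 250W and the run peaked at 98°C
print(next_power_limit(250, 98.0))  # 225
```

Repeat the run with the suggested value until the limit stops changing, then move on to the stability check.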