My desktop PC seems to have recently trapped itself in Machine Check Heck, so I come to the Cemetech hivemind for opinions on what to do now.
My desktop PC is based on a Skylake Core i7-6700K CPU, running Windows 10 with 32GB of DDR4-2133 memory and a Radeon Vega 56 GPU. Despite being fairly old at this point, the performance is satisfactory for my gaming and other needs, but yesterday it bluescreened on me with a WHEA_UNCORRECTABLE_ERROR stop code while I was doing some light gaming and videoconferencing. Today it's been doing that with much greater frequency (4ish times today as of this writing?). Since WHEA_UNCORRECTABLE_ERROR usually corresponds to a machine check exception indicating some kind of hardware fault, and the onset seems to have been rapid and not tied to any particular software changes, I'm inclined to think it's time to seriously consider upgrading my 6-year-old CPU to something more modern.
To investigate the bluescreens further, I loaded up a memory dump in WinDbg on another machine (an even-older ThinkPad T430s) and got the following information out:
Code:
Windows 10 Kernel Version 19041 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Edition build lab: 19041.1.amd64fre.vb_release.191206-1406
Machine Name:
Kernel base = 0xfffff801`44800000 PsLoadedModuleList = 0xfffff801`4542a190
Debug session time: Sat Dec 4 16:11:03.329 2021 (UTC - 8:00)
System Uptime: 0 days 0:02:02.986
Loading Kernel Symbols
...............................................................
................................................................
................................................................
.........................................
Loading User Symbols
Loading unloaded module list
.........
For analysis of this file, run !analyze -v
2: kd> !analyze
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon. Try !errrec Address of the WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000010, Error Source Type
Arg2: ffff8c0665cf9028
Arg3: ffff8c064a0a7bcc
Arg4: ffff8c064e12e1a0
Debugging Details:
------------------
BUGCHECK_CODE: 124
BUGCHECK_P1: 10
BUGCHECK_P2: ffff8c0665cf9028
BUGCHECK_P3: ffff8c064a0a7bcc
BUGCHECK_P4: ffff8c064e12e1a0
PROCESS_NAME: System
MODULE_NAME: GenuineIntel
IMAGE_NAME: GenuineIntel.sys
FAILURE_BUCKET_ID: 0x124_16_GenuineIntel__UNKNOWN_IMAGE_GenuineIntel.sys
FAILURE_ID_HASH: {37af9407-4a3e-0b08-acdd-dadffdc34c3c}
Followup: MachineOwner
---------
2: kd> !whea
Error Source Table @ fffff801454daed8
5 Error Sources
Error Source 0 @ ffff8c064a0a7b40
Notify Type : Unknown
Type : 0x10 (Invalid)
Error Count : 1
Record Count : 1
Record Length : 29f8
Error Records : wrapper @ ffff8c064a0a8000 record @ ffff8c064a0a8028
Descriptor : @ ffff8c064a0a7ba0
Length : 3cc
Max Raw Data Length : d2c
Num Records To Preallocate : 1
Max Sections Per Record : 3
Error Source ID : 0
Flags : 00000000
Error Source 1 @ ffff8c065004b920
Notify Type : MCE (INT18)
Type : 0x0 (MCE)
Error Count : 0
Record Count : 8
Record Length : de8
Error Records : wrapper @ ffff8c065005f000 record @ ffff8c065005f028
: wrapper @ ffff8c065005fde8 record @ ffff8c065005fe10
: wrapper @ ffff8c0650060bd0 record @ ffff8c0650060bf8
: wrapper @ ffff8c06500619b8 record @ ffff8c06500619e0
: wrapper @ ffff8c06500627a0 record @ ffff8c06500627c8
: wrapper @ ffff8c0650063588 record @ ffff8c06500635b0
: wrapper @ ffff8c0650064370 record @ ffff8c0650064398
: wrapper @ ffff8c0650065158 record @ ffff8c0650065180
Descriptor : @ ffff8c065004b980
Length : 3cc
Max Raw Data Length : 141
Num Records To Preallocate : 8
Max Sections Per Record : 8
Error Source ID : 1
Flags : 80000000
Error Source 2 @ ffff8c065004a920
WHEA_NOTIFICATION_DESCRIPTOR @ 0xffff8c065004a9b0
Notify Type : CMCI (Local Interrupt)
Type : 0x1 (CMC)
Error Count : 0
Record Count : 8
Record Length : de8
Error Records : wrapper @ ffff8c06500c7000 record @ ffff8c06500c7028
: wrapper @ ffff8c06500c7de8 record @ ffff8c06500c7e10
: wrapper @ ffff8c06500c8bd0 record @ ffff8c06500c8bf8
: wrapper @ ffff8c06500c99b8 record @ ffff8c06500c99e0
: wrapper @ ffff8c06500ca7a0 record @ ffff8c06500ca7c8
: wrapper @ ffff8c06500cb588 record @ ffff8c06500cb5b0
: wrapper @ ffff8c06500cc370 record @ ffff8c06500cc398
: wrapper @ ffff8c06500cd158 record @ ffff8c06500cd180
Descriptor : @ ffff8c065004a980
Length : 3cc
Max Raw Data Length : 141
Num Records To Preallocate : 8
Max Sections Per Record : 8
Error Source ID : 2
Flags : 80000000
Error Source 3 @ ffff8c0650049920
Notify Type : NMI (INT2)
Type : 0x3 (NMI)
Error Count : 0
Record Count : 1
Record Length : 6c0
Error Records : wrapper @ ffff8c06500d4720 record @ ffff8c06500d4748
Descriptor : @ ffff8c0650049980
Length : 3cc
Max Raw Data Length : 100
Num Records To Preallocate : 1
Max Sections Per Record : 3
Error Source ID : 3
Flags : 80000000
Error Source 4 @ ffff8c0650048920
Notify Type : Polled
Type : 0x7 (BOOT)
Error Count : 0
Record Count : 0
Record Length : 0
Error Records :
Descriptor : @ ffff8c0650048980
Length : 3cc
Max Raw Data Length : 1000
Num Records To Preallocate : 1
Max Sections Per Record : 8
Error Source ID : 4
Flags : 80000000
The value of 0x10 for BUGCHECK_P1 indicates an error flagged by a device driver, in this case GenuineIntel.sys. I'm not entirely clear on how to interpret multiple sources in the decoded error source table, but it looks like something triggered an NMI that was then interpreted as a machine check exception. I also tried using !errrec to decode the individual MCE records, but the debugger spat out thousands of records that didn't seem to have anything interesting in them. Since an Intel driver seems to have flagged the MCE, I'm guessing that's a huge pile of information that might be interesting to somebody who knows more about the Intel-specific bits going on, but Windows doesn't ship with the information needed to make sense of it.
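One way to narrow down the error source table is to filter it for sources that actually recorded an error. The Python sketch below does that on a trimmed excerpt of the !whea output quoted above. It is not something WinDbg provides: the function names are made up for illustration, and it assumes that a nonzero Error Count marks the source that fired and that the counts are printed in hexadecimal.

```python
import re

# Trimmed excerpt of the !whea output above (sources 0, 1 and 3 only).
WHEA_OUTPUT = """\
Error Source 0 @ ffff8c064a0a7b40
Notify Type : Unknown
Type : 0x10 (Invalid)
Error Count : 1
Record Count : 1
Error Source 1 @ ffff8c065004b920
Notify Type : MCE (INT18)
Type : 0x0 (MCE)
Error Count : 0
Record Count : 8
Error Source 3 @ ffff8c0650049920
Notify Type : NMI (INT2)
Type : 0x3 (NMI)
Error Count : 0
Record Count : 1
"""


def parse_sources(text):
    """Split !whea-style text into one dict of fields per error source."""
    sources = []
    for line in text.splitlines():
        header = re.match(r"Error Source (\d+) @ ([0-9a-f]+)", line)
        if header:
            # A new "Error Source N @ address" header starts a new entry.
            sources.append({"index": int(header.group(1)),
                            "address": header.group(2)})
        elif sources and " : " in line:
            key, value = line.split(" : ", 1)
            # Keep the first value seen if a field name repeats.
            sources[-1].setdefault(key.strip(), value.strip())
    return sources


def sources_with_errors(text):
    """Return only the sources whose Error Count is nonzero."""
    # Assumption: the count is hexadecimal, like most numbers in the dump.
    return [s for s in parse_sources(text)
            if int(s.get("Error Count", "0"), 16) > 0]


for source in sources_with_errors(WHEA_OUTPUT):
    print(source["index"], source["Type"], source["Notify Type"])
```

On this excerpt, the only source with a nonzero count is Error Source 0, whose type of 0x10 matches BUGCHECK_P1.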
At a guess, I think this is most likely to indicate power delivery problems in my system. If something in the motherboard's power delivery subsystems has deteriorated and just started going out of spec, then rapid onset of symptoms at fairly high frequency seems plausible.
Having established that it seems I have hardware problems (but please do comment if you disagree or have other ideas!), the further question is what I should do to bring the system back into good working order. The options seem to be either:
- Replace the motherboard and hope that fixes it
- Replace the motherboard and CPU, assuming the rest of the system is okay
Replacing the motherboard alone could be a more inexpensive option but may not actually fix my issues, and it might be difficult to get a new board that is compatible with a Skylake CPU. According to Ark, [ZHQ][12]70 chipsets support it, where my current board is based on the Z170 platform.
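In case the regex-style shorthand is unclear, here is a small Python sketch that spells out the chipset names it matches. The helper name expand_chipsets is made up for illustration, and the result only reflects the pattern as written above, not a complete compatibility list from Ark.

```python
import re
from itertools import product

PATTERN = r"[ZHQ][12]70"


def expand_chipsets(letters="ZHQ", generations="12", suffix="70"):
    """List every name of the form <letter><generation>70."""
    return [f"{letter}{gen}{suffix}"
            for gen, letter in product(generations, letters)]


names = expand_chipsets()
# Sanity check: every expanded name should match the shorthand pattern.
assert all(re.fullmatch(PATTERN, name) for name in names)
print(", ".join(names))  # Z170, H170, Q170, Z270, H270, Q270
```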
Browsing PCPartPicker, the few compatible boards that anybody has in stock run more than $200. There are a few compatible boards listed from various AliExpress sellers as well that run down to as low as $130, but that doesn't really seem any better after accounting for potential concerns around shipping time and product quality.
For upgrading the whole lot, there are twoish options:
- AMD Zen 3 (Ryzen 5000)
- Intel Alder Lake ("12th generation Core")
- Alder Lake with DDR5
For AMD, I'm thinking a Ryzen 7 5700G (8C/16T) paired with a B550 motherboard for a total cost of around $660. While I could gain a little CPU performance at slightly higher cost with a 5800X (with a 100 MHz higher boost clock and twice as much L3 cache), I like the idea of having integrated graphics available should I need them at any time.
For Alder Lake, I like the look of the i5-12600K (6+4C/16T) and a Z690 motherboard, which together run just under $800 (with the motherboard accounting for the majority of the cost increase over AMD). If I wanted to go with DDR5 memory, there's only one board matching my needs and that one costs $200 more than the others. Factoring in that I would also need to buy new RAM for it, that seems like far too dear a price for a pretty small improvement over DDR4.
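To put the rough numbers above side by side, here is a back-of-the-envelope Python sketch. The totals are the approximate figures quoted above, with "just under $800" rounded up to 800, so that figure is an upper bound. The ddr5_ram_cost parameter is a hypothetical placeholder, since no DDR5 memory price is given here.

```python
AMD_TOTAL = 660              # "around $660" for 5700G + B550
ALDER_LAKE_DDR4_TOTAL = 800  # "just under $800" for 12600K + Z690
DDR5_BOARD_PREMIUM = 200     # the one DDR5 board costs $200 more


def upgrade_costs(ddr5_ram_cost=0):
    """Estimated total per option in US dollars.

    ddr5_ram_cost is a placeholder; the default of 0 understates the
    DDR5 option because new memory would have to be bought as well.
    """
    ddr5_total = ALDER_LAKE_DDR4_TOTAL + DDR5_BOARD_PREMIUM + ddr5_ram_cost
    return {
        "Zen 3 (5700G + B550)": AMD_TOTAL,
        "Alder Lake (12600K + Z690, DDR4)": ALDER_LAKE_DDR4_TOTAL,
        "Alder Lake (12600K + Z690, DDR5)": ddr5_total,
    }


def premium_over_cheapest(costs):
    """How much more each option costs than the cheapest one."""
    cheapest = min(costs.values())
    return {name: total - cheapest for name, total in costs.items()}


for name, extra in premium_over_cheapest(upgrade_costs()).items():
    print(f"{name}: +${extra}")
```

With these figures the DDR4 Alder Lake build comes out at most about $140 over the AMD one, and the DDR5 build at least $340 over before any memory is added.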
I'm leaning towards Alder Lake right now, since the performance is a little higher than Zen 3 although power consumption and cost are also higher. But what do you think is the better option, gallant readers?