MS-C931 USB-C PD FW update fails (1 -> 502), CX7 PCIe power/hotplug instability

p_noc15c302e3

Platform:
- Model: MSI MS-C931
- OS: Ubuntu 24.04
- Kernel: 6.17.0-1014-nvidia
- fwupd: 1.9.34 (runtime/compile)
- Node: edgexpert-b6b5 (10.0.2.204)

Issue summary:
- USB-C PD controller firmware update from LVFS repeatedly fails.
- ConnectX-7 devices initialize at boot, then are removed after hotplug/cable-removal event.
- Multi-node NIC path is unstable/unusable due to PCIe/power behavior.

fwupd results:
- Embedded Controller update: success (to 10600)
- SoC/UEFI update: success (to 10700)
- USB-C PD FW Controller update: FAIL
- Device ID: c1e32194292eae35c64314ee9b1d9690d6142c76
- GUID: fff25056-e175-45e2-bb1b-23de59689cff
- Target version: 502
- Current version remains: 1
- Error: "failed to run update on reboot: expected 502 and got 1"
- LVFS provides only this single PD release (no intermediate versions).

Kernel evidence (with CX7 cables unplugged):
- mlx5_pcie_event: "Detected insufficient power on the PCIe slot (27W)" on all CX7 functions
- cx7-pcie-hotplug MTKP0001:00: "Cable removal"
- AER correctable physical-layer errors RxErr on root ports:
- 0000:00:00.0
- 0002:00:00.0

PCI behavior:
- Early boot: CX7 endpoints (15b3:1021) are present and mlx5 initializes.
- After hotplug event: lspci no longer shows Mellanox endpoints; only Realtek NIC remains.

Request:
1) Please provide a validated recovery/update procedure for PD controller FW when 1->502 capsule does not apply.
2) Please provide any required BIOS/EC settings and/or offline/vendor updater.
3) Please advise if a specific fwupd version (e.g., 1.9.31 per LVFS test metadata) is required.
4) Please advise corrective action for cx7-pcie-hotplug/AER RxErr/power warning sequence.
 
Hi, thank you for the detailed report. We have reviewed your findings and would like to address each issue directly.

Q1 & Q2 — USB-C PD Controller FW Update Failure (fwupd / LVFS: 1 → 502) + Required fwupd Version
The root cause of this failure is a firmware bundle dependency, not the fwupd version. Updating the PD firmware on its own via fwupdmgr will not succeed, because the PD capsule requires the EC and GOP firmware to be at the corresponding compatible versions first. If these prerequisites are not met, the UEFI capsule update is rejected after reboot and the version check fails with "expected 502 and got 1."
The recommended and validated procedure is a full OS reinstall followed by the OOBE update:
1) Download the latest DGX OS image from the MSI EdgeXpert official support page:
https://ipc.msi.com/product_download/Industrial-Computer-Box-PC/AI-Supercomputer/EdgeXpert-MS-C931
2) Follow the DGX OS Installation Guide (available on the same page) to perform a clean reinstall.
3) After the OS is installed, go through the OOBE update flow, which applies all firmware components (EC, GOP, SoC, and PD Controller) in the correct order and bundle.
4) The latest PD firmware version is 5.07. After completing the OOBE update flow, verify the installed version with:
fwupdmgr get-devices
Important notes during update:
- Keep the unit connected to the official power adapter throughout the entire process.
- Do not connect or disconnect any USB-C devices during firmware update.
- Do not force-reboot or power off during the update sequence.
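
If you want to script the post-update check, the following is a minimal sketch rather than an official tool. The function name pd_current_version is hypothetical, and it assumes the text output of fwupdmgr get-devices labels its fields "Device ID:" and "Current version:", with the device ID appearing before the version inside each device block. Adjust the patterns if your output differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of fwupd): print the "Current version"
# of one device from the text output of `fwupdmgr get-devices`.
# Assumption: fields are labelled "Device ID:" and "Current version:".
pd_current_version() {
    # $1 = device ID to look for; stdin = output of `fwupdmgr get-devices`
    awk -v id="$1" '
        # A new device block starts; remember whether it is the wanted one.
        /Device ID:/ { in_dev = (index($0, id) > 0) }
        # Inside the wanted block, print the last field of the version line.
        in_dev && /Current version:/ { print $NF; exit }
    '
}

# Usage on the unit (device ID taken from the original report):
#   fwupdmgr get-devices | pd_current_version c1e32194292eae35c64314ee9b1d9690d6142c76
```

An empty result means the device ID was not found in the output, which is worth checking before concluding anything about the version.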

Q3 — ConnectX-7 Disappears from PCIe After Cable Removal / Hotplug Event
This is expected and intentional behavior. The cx7-pcie-hotplug driver (MTKP0001) is designed to disable the PCIe link and trigger PCIe device removal when no CX7 cable is connected, in order to reduce power consumption. This is part of the ConnectX-7 Hot-Plug Power Saving feature included in DGX OS GA2 OTA2 (up to ~18W reduction: 19W → 1W idle).
After cable removal, lspci will no longer show Mellanox endpoints — this is correct. When the cable is reconnected, the device will re-appear automatically.
To manually test and verify the hotplug behavior:
# After connecting cable:
/opt/nvidia/dgx-spark-mlnx-hotplug/mtk-hotplug-handler.sh plug-in
lspci | grep Mellanox # Should show ConnectX-7 endpoints
# After removing cable:
/opt/nvidia/dgx-spark-mlnx-hotplug/mtk-hotplug-handler.sh removal
lspci | grep Mellanox # Expected: no output
If the CX7 device does not re-appear after the cable is reconnected, this indicates that DGX OS GA2 OTA2 has not been applied. Please proceed with the OS reinstall described above.
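
For a repeatable check of the plug-in and removal states, here is a small sketch under stated assumptions. The helper name count_cx7 is hypothetical. It counts PCI functions carrying the 15b3:1021 ID quoted in the report, and it assumes the input is lspci -nn output, where IDs are printed in square brackets.

```shell
#!/bin/sh
# Hypothetical helper: count ConnectX-7 PCI functions in `lspci -nn`
# output read from stdin. 15b3:1021 is the vendor:device ID from the report.
count_cx7() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under `set -e`.
    grep -c '\[15b3:1021\]' || true
}

# Usage on the unit:
#   lspci -nn | count_cx7    # cable connected: expect a non-zero count
#   lspci -nn | count_cx7    # cable removed:   expect 0
```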

Q4 — PCIe 27W Insufficient Power Warning + AER RxErr Correctable Errors
These kernel messages are known, benign behavior on all GB10/DGX Spark-based units and have been widely reported in the community. They do not indicate a hardware failure and do not affect normal system operation.
The underlying reason is that when no CX7 cable is connected, the PCIe slot power budget (27W reported by the controller) falls below the CX7 active-link threshold, triggering the warning. The AER RxErr correctable errors on root ports 0000:00:00.0 and 0002:00:00.0 are associated with this same link state transition.
After installing DGX OS GA2 OTA2, the hot-plug power saving feature reduces CX7 idle consumption to ~1W, which resolves or significantly reduces these warnings.
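
To compare how often these messages appear before and after the update, a rough tally like the one below may help. This is only a sketch. The function name cx7_event_summary is hypothetical, and the match strings are taken from the messages quoted in the original report, so they may need adjusting if your kernel words them differently.

```shell
#!/bin/sh
# Hypothetical helper: tally the three kernel message types from the
# report in a kernel log read from stdin.
cx7_event_summary() {
    awk '
        /Detected insufficient power on the PCIe slot/ { power++ }
        /Cable removal/                                { removal++ }
        /RxErr/                                        { rxerr++ }
        # Unset counters print as 0 with %d.
        END { printf "power=%d removal=%d rxerr=%d\n", power, removal, rxerr }
    '
}

# Usage on the unit (either source of kernel messages works):
#   sudo dmesg | cx7_event_summary
#   journalctl -k -b | cx7_event_summary
```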
We strongly recommend performing the full OS reinstall + OOBE update as the single procedure that resolves all four reported issues at once.
Please let us know if you need further assistance after completing the update.
 