MSI EdgeXpert (DGX Spark) - GPU POWER THROTTLING PROBLEM

eternix15d302e6

DEVICE INFORMATION:
-------------------
Manufacturer: MSI
Product Name: MS-C931
Model: DGX Spark GB10 (MSI EdgeXpert Edition)
BIOS Version: 5.36_1.6.0
BIOS Date: 03/16/2026
Serial Number: 706-C9311-02ST2510001067
OS: Ubuntu 24.04.4 LTS
Kernel: 6.17.0-1014-nvidia

================================================================================
PROBLEM DESCRIPTION:
================================================================================

GPU is STUCK at 13W power draw instead of 250-350W (normal performance)

Current Status:
- Power Draw: 13W (should be 250-350W under load)
- Power Limit: N/A (not being reported)
- Temperature: 51°C
- GPU Utilization: 0% (even with llama-server running with 28GB VRAM allocated)

Command that fails:
"sudo nvidia-smi -pl 350"
Error: "Changing power management limit is not supported in current scope"
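
The readings above can be captured in one line per GPU with nvidia-smi's query mode. This is a minimal sketch, assuming the standard `--query-gpu` fields are exposed on this platform (some may come back as `[N/A]`, as the power limit does here). The helper names `summarize_gpu_line` and `query_gpu` are hypothetical.

```shell
# Summarize one CSV line of the form:
#   power.draw, power.limit, temperature.gpu, utilization.gpu
# and flag whether the driver reports a power limit at all.
summarize_gpu_line() {
  printf '%s\n' "$1" | awk -F', *' '{
    status = ($2 ~ /N\/A|Not Supported/) ? "limit-not-reported" : "limit-reported"
    printf "draw=%sW limit=%s temp=%sC util=%s%% %s\n", $1, $2, $3, $4, status
  }'
}

# Live query; requires the NVIDIA driver and is not called automatically here.
query_gpu() {
  nvidia-smi --query-gpu=power.draw,power.limit,temperature.gpu,utilization.gpu \
             --format=csv,noheader,nounits
}
```

On the affected machine, `summarize_gpu_line "$(query_gpu)"` would be expected to report `limit-not-reported`, matching the "N/A" power limit above.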

================================================================================
ROOT CAUSE - MISSING FIRMWARE:
================================================================================

CRITICAL - The following firmware components are NOT INSTALLED:

1. USBPD (USB Power Delivery) - v0.5.22 minimum required
   Status: NOT INSTALLED
   Impact: Controls power delivery to GPU - THIS IS THE MAIN PROBLEM

2. SOCFW (System on Chip Firmware) - v2.152.15 minimum required
   Status: NOT INSTALLED
   Impact: System stability and power management

3. EC (Embedded Controller) - v3.3.2 minimum required
   Status: Version mismatch (current: 10600, expected: 3.3.2)
   Impact: Power management not working correctly

4. TPM - v7.516.1 minimum required
   Status: Version mismatch
   Impact: Security/firmware updates
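
The comparison against the minimum versions listed above can be sketched as follows. It assumes GNU `sort -V` (present on Ubuntu) and that the installed versions have already been read off by hand; the function names `version_at_least` and `check_fw` are hypothetical. A value such as the EC's "10600" is not in dotted form, so the sketch declines to compare it rather than guess how it maps to 3.3.2.

```shell
# Succeeds if dotted version $1 is >= dotted version $2 (relies on GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_fw NAME CURRENT MINIMUM
# CURRENT may be empty (component not installed) or in a non-dotted format.
check_fw() {
  name=$1; current=$2; minimum=$3
  if [ -z "$current" ]; then
    echo "$name: missing (need >= $minimum)"
  elif ! printf '%s' "$current" | grep -Eq '^[0-9]+(\.[0-9]+)+$'; then
    echo "$name: cannot compare '$current' with $minimum (different version format)"
  elif version_at_least "$current" "$minimum"; then
    echo "$name: ok ($current >= $minimum)"
  else
    echo "$name: too old ($current < $minimum)"
  fi
}

# Values taken from the list above.
check_fw USBPD ""    0.5.22     # USBPD: missing (need >= 0.5.22)
check_fw SOCFW ""    2.152.15   # SOCFW: missing (need >= 2.152.15)
check_fw EC    10600 3.3.2      # EC: cannot compare '10600' with 3.3.2 (different version format)
```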

================================================================================
WHY THIS HAPPENS:
================================================================================

The USBPD (Power Delivery) firmware is missing. This firmware controls how
much power the GPU can draw. Without it, the system limits the GPU to only 13W
for safety, preventing it from reaching full performance.

This is a KNOWN ISSUE with MSI EdgeXpert DGX Spark systems.
Reference: NVIDIA Developer Forums - "DGX Spark Performance Degradation"

================================================================================
OTA STATUS:
================================================================================

Current OTA: April 2026 (OTA2604) - LATEST VERSION
Status: 13% "torn" (incomplete)
No OTA updates available

The system shows the latest OTA is installed, but critical firmware components
are still missing. This suggests a firmware update failure or version mismatch.
 
Thank you for the detailed report and system information.

Based on the symptoms described (GPU power locked at ~13W, power limit reported as N/A, and inability to set power limits), this behavior is consistent with a known issue related to incomplete or mismatched firmware components, particularly USBPD, SOCFW, and EC.

In addition, the reported OTA status of "torn" suggests that the firmware update was not fully applied. This can leave the power delivery configuration missing and cause the system to fall back to a conservative power state for protection. At this stage, we recommend verifying whether the OTA update process completed successfully. If it did not, re-applying the system image or firmware update may help ensure all components are properly aligned.

Meanwhile, MSI has already started development and validation of the next BSP release aligned with OTA2604, which includes updated SOCFW, PD firmware, and EC to address this class of issues. The updated BSP is currently under validation and is targeted for release via LVFS around May, pending final verification.
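
Because the updated BSP is expected to arrive through LVFS, the installed firmware and any pending updates can be inspected with fwupd's command-line client. The following is a sketch only, assuming the EdgeXpert firmware components are exposed as fwupd devices on this system, which is not confirmed here. The `run` wrapper and the `fw_check` name are hypothetical, and the script defaults to a dry run that only prints the commands.

```shell
# Set DRY_RUN=0 to actually execute the commands (requires fwupd installed).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

fw_check() {
  run fwupdmgr refresh --force   # re-download LVFS metadata
  run fwupdmgr get-devices       # list devices and their current firmware versions
  run fwupdmgr get-updates       # list updates available for those devices
}
```

Running `fw_check` with `DRY_RUN=0` before and after the new BSP is published should show whether the listed versions for the PD, SoC, and EC firmware change.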

We appreciate your feedback.
 