What causes "Windows DirectML model runs on CPU instead of GPU"?

**Environment:** Windows 10/11 running [ONNX Runtime](/tools/onnx-runtime) with the [DirectML](/tools/directml) provider, typical for AMD/Intel GPUs and integrated graphics where CUDA isn't available.

**Severity: medium.** Inference works, but at a fraction of the expected speed.

- `onnxruntime` (CPU-only) was installed instead of `onnxruntime-directml`.
- `DmlExecutionProvider` isn't first in the providers list; ORT tries providers in the order they are listed.
- The model uses fp16 ops that the GPU's DirectML driver doesn't support, so ORT silently falls back to the CPU for those nodes.
- An old GPU driver predates the DirectML 1.13 features the model uses.
- A mixed environment has both `onnxruntime` and `onnxruntime-directml` installed. Both distributions provide the same `onnxruntime` module, so the one you end up importing may be the CPU-only build.
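The first and last causes above can be checked without importing ONNX Runtime at all. The following is a minimal sketch, assuming a standard pip-managed environment; the helper names (`installed_ort_distributions`, `diagnose`) and the set of distribution names are illustrative and not part of any ONNX Runtime API.

```python
import re
from importlib import metadata

# Distributions that all provide the same importable `onnxruntime` package.
# This set is an assumption for illustration; extend it for your environment.
ORT_DISTRIBUTIONS = {"onnxruntime", "onnxruntime-gpu", "onnxruntime-directml"}


def normalize(name: str) -> str:
    """Normalize a distribution name so 'onnxruntime_directml' matches 'onnxruntime-directml'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_ort_distributions(names=None) -> list[str]:
    """Return the ONNX Runtime distributions that are installed, sorted by name.

    `names` can be passed explicitly (useful for testing); by default the
    current Python environment is scanned.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    found = {normalize(n) for n in names if n} & ORT_DISTRIBUTIONS
    return sorted(found)


def diagnose(found: list[str]) -> str:
    """Turn the list of installed distributions into a one-line verdict."""
    if not found:
        return "missing: no ONNX Runtime distribution is installed"
    if len(found) > 1:
        return "conflict: uninstall all of " + ", ".join(found) + " and reinstall onnxruntime-directml"
    if found[0] != "onnxruntime-directml":
        return "wrong build: " + found[0] + " is installed instead of onnxruntime-directml"
    return "ok: only onnxruntime-directml is installed"


if __name__ == "__main__":
    print(diagnose(installed_ort_distributions()))
```

Run it with the same interpreter that runs inference, since each virtual environment has its own set of installed distributions.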


Q: How do you fix "Windows DirectML model runs on CPU instead of GPU"?

**1. Install the DirectML build (and only the DirectML build):**

```powershell
pip uninstall -y onnxruntime onnxruntime-gpu onnxruntime-directml
pip install onnxruntime-directml
```

**2. Force DmlExecutionProvider first:**

```python
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("DmlExecutionProvider", {"device_id": 0}),
        "CPUExecutionProvider",
    ],
)
print(sess.get_providers())  # should show DmlExecutionProvider first
```

**3. Check the GPU is actually picked up:**

```powershell
# In another window while inference runs
Get-Counter '\GPU Engine(*engtype_3D)\Utilization Percentage'
```

If GPU utilization stays at 0% while inference runs, DML isn't being used.

**4. Update the GPU driver.** DirectML rides on DXGI/D3D12, so it depends on the installed driver. Install the latest stable release through AMD Adrenalin or Intel Arc Control.

**5. Keep unsupported fp16 ops in fp32** if a specific operator is unsupported. One way, assuming you still have the original fp32 model, is to re-run the fp16 conversion and list the unsupported operator types in `op_block_list` so they stay in fp32. The file names and the `LayerNormalization` entry are placeholders:

```python
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16
mixed = convert_float_to_float16(onnx.load("model_fp32.onnx"), op_block_list=["LayerNormalization"])  # op types listed here stay in fp32
onnx.save(mixed, "model_mixed.onnx")
```

**6. Verify the wheel target:**

```powershell
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
# Must include 'DmlExecutionProvider'
```
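The checks in steps 2 and 6 can be folded into session creation so that a CPU fallback fails loudly instead of silently. This is a minimal sketch, not an official ONNX Runtime pattern: `dml_status` and `create_dml_session` are hypothetical helper names, and the only ONNX Runtime calls used are `get_available_providers()`, `InferenceSession(...)`, and `get_providers()` from the steps above.

```python
def dml_status(session_providers: list[str]) -> str:
    """Classify the list returned by InferenceSession.get_providers()."""
    if "DmlExecutionProvider" not in session_providers:
        return "cpu-only"
    if session_providers[0] != "DmlExecutionProvider":
        return "dml-not-first"
    return "dml-active"


def create_dml_session(model_path: str, device_id: int = 0):
    """Create a session on DirectML and raise if it would run on CPU only."""
    # Imported lazily so dml_status() can be used without ONNX Runtime installed.
    import onnxruntime as ort

    if "DmlExecutionProvider" not in ort.get_available_providers():
        raise RuntimeError(
            "DmlExecutionProvider is not available; "
            "check that only onnxruntime-directml is installed"
        )

    sess = ort.InferenceSession(
        model_path,
        providers=[
            ("DmlExecutionProvider", {"device_id": device_id}),
            "CPUExecutionProvider",
        ],
    )

    status = dml_status(sess.get_providers())
    if status != "dml-active":
        raise RuntimeError(f"DirectML is not active for this session: {status}")
    return sess
```

Typical usage would be `sess = create_dml_session("model.onnx")`. Note that this only confirms the provider is registered for the session. Individual unsupported nodes can still be assigned to the CPU, so the GPU utilization check in step 3 remains useful.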

