cuPQC SDK 0.4.1 crashes on DGX Spark (GB10, sm_121) — Does Blackwell support exist?

Hi everyone,

I’m hoping someone here can help me — I’ve been stuck on this for a few days and I’m running out of ideas.

The short version

I’m trying to use cuPQC SDK 0.4.1 on an NVIDIA DGX Spark for a GPU-accelerated PQC project. The library loads fine and CUDA initializes, but the program crashes with double free or corruption the moment the first PQC function is called. After a lot of debugging, I found that the GB10 GPU has compute capability sm_121 — which isn’t in the cuPQC SDK’s supported list (the SDK only targets up to sm_90).

My system

  • Machine: NVIDIA DGX Spark (Rev A.7)

  • GPU: NVIDIA GB10 — nvidia-smi reports compute cap 12.1

  • CPU: ARM aarch64 (Cortex-X925 / A725)

  • Host CUDA: 13.1, Driver 580.95.05

  • cuPQC SDK: 0.4.1 (aarch64)

  • Docker container: nvidia/cuda:12.8.0-devel-ubuntu22.04
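For completeness, the compute capability above can be read directly (assumes a reasonably recent driver’s nvidia-smi, which supports the compute_cap query field):

```shell
# Print the GPU name and compute capability (12.1 corresponds to sm_121 on the GB10)
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```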

What I see

Loaded cuPQC library from: /opt/cupqc-lib/libcupqc_wrapper.so
CUDA initialized
double free or corruption (!prev)
Exited with code 133 (SIGABRT)

This happens on the very first call to cupqc_kem_keypair() — right after cudaSetDevice(0) succeeds. No kernel output, just a crash.

Root cause I identified

The cuPQC SDK 0.4.1 ships its internal libraries (cupqc-pk_static, cupqc-hash_static) precompiled with LTO code for architectures sm_70 through sm_90 only. The GB10 is sm_121, which has no native or PTX code path in the SDK, so when the first kernel launches it either picks incompatible code or fails to find any.
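One way to check the architecture list yourself is to inspect the SDK’s static libraries with cuobjdump (the path below is illustrative; note that LTO-IR entries may not be listed by these flags, only final SASS/PTX):

```shell
# List the compiled SASS entries (one per sm_XX) embedded in the library
cuobjdump --list-elf /path/to/libcupqc-pk_static.a

# List any embedded PTX the driver could JIT-compile for newer GPUs
cuobjdump --list-ptx /path/to/libcupqc-pk_static.a
```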

Related: cuPQC examples fail to compile on Jetson Orin Nano — similar pattern of cuPQC being incompatible with specific platforms.

What I’ve tried

  1. Updated the CUDA base image in the Dockerfile from 12.6.2 → 12.8.0 (the SDK requires 12.8+)

  2. Added PTX fallback to CMakeLists.txt:

--generate-code=arch=compute_90,code=compute_90

This embeds sm_90 PTX in our wrapper so CUDA might JIT it for sm_121. Still waiting to confirm whether this works with the cuPQC LTO internals.

  3. Confirmed the aarch64 SDK matches the machine architecture ✓
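For reference, the CMake side of step 2 can be expressed either as the raw nvcc flag or via CUDA_ARCHITECTURES (the target name here is hypothetical, and the "90-virtual" form needs CMake 3.23+):

```cmake
# Option A: raw nvcc flag — embeds compute_90 PTX only (no SASS)
target_compile_options(cupqc_wrapper PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:--generate-code=arch=compute_90,code=compute_90>)

# Option B: equivalent via target property, CMake 3.23+
set_target_properties(cupqc_wrapper PROPERTIES CUDA_ARCHITECTURES "90-virtual")
```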

My questions for the community / NVIDIA team

  1. Has anyone successfully run cuPQC SDK on a Blackwell GPU (sm_100, sm_121)? What did you do?

  2. Does embedding sm_90 PTX in the wrapper help? Or will the cuPQC LTO libraries still fail to JIT on sm_121?

  3. Is there a newer SDK version being worked on with Blackwell support?

  4. Any other workaround you’d recommend for getting GPU-accelerated PQC running on a DGX Spark?

Thanks in advance — any help is hugely appreciated!

Hello there,

You are trying to execute the examples for 0.4.1, correct?

Can you try editing line 2 of the makefile for the public_key examples, changing arch=native to arch= (i.e., empty), and see if that changes things?

I don’t think you need to build the example to PTX and then JIT.

Thanks!

Hello,

Yes, we were trying to build a custom C++ wrapper for the 0.4.1 PQC SDK to deploy as a Python extension.

You are completely correct! Setting arch= (or explicitly setting arch=sm_90) solved the issue perfectly. Since nvcc in CUDA 12.8 doesn’t officially recognize compute_121 yet, using arch=native on the Blackwell DGX made the compiler fail because it couldn’t find any 121-specific targets. With the architecture flag dropped (letting LTO default to Hopper sm_90), the linker assembled the .cubin successfully, and the Blackwell driver JIT-compiled it flawlessly at runtime via backward compatibility.
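For anyone landing here later, the change boils down to one line in the example makefile (the variable name is illustrative; your makefile layout may differ):

```make
# Before: nvcc in CUDA 12.8 can't resolve "native" to compute_121 on the GB10
# ARCH := -arch=native

# After: target Hopper explicitly; the Blackwell driver runs it at runtime
# via backward compatibility
ARCH := -arch=sm_90
```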

Thanks for your help!

Hi,

Great, I am glad it worked out!