Missing crt/host_defines.h when using pip CUDA headers with NVRTC (cuda_fp16.h)

Dear maintainers, dear community,

I need to access the __half and __half2 types in JIT-compiled CUDA code using NVRTC. Some of our users (see "Using CUDA toolkit from pypi", Issue #443 in getkeops/keops on GitHub) have suggested relying solely on the CUDA toolkit distributed via pip (e.g. nvidia-cuda-runtime-cu12), which seems reasonable in principle.

However, in practice, relying only on the pip-provided headers leads to broken include resolution. For instance, on a fresh Google Colab instance with a GPU, the following minimal test:

!echo '#include <cuda_fp16.h>' | g++ -E \
-I/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/include -

results in:

/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/include/vector_types.h:65:10: fatal error: crt/host_defines.h: No such file or directory
   65 | #include "crt/host_defines.h"
      |          ^~~~~~~~~~~~~~~~~~~~

This originates from:

cuda_fp16.h → vector_types.h → crt/host_defines.h

The issue is that the include directory of the pip-distributed runtime package does not seem to be self-contained. vector_types.h expects to find crt/host_defines.h relative to an include path, but the crt/ subdirectory is missing from that directory (or not laid out as expected), even though related headers (like host_defines.h) may exist elsewhere.
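To check which pip-installed package, if any, ships the missing header, something like the following sketch could be used. It assumes that the NVIDIA wheels unpack into nvidia/<package>/include under the dist-packages directory, as the path in the error message above suggests; find_header is a hypothetical helper name, not part of any NVIDIA tool.

```python
from pathlib import Path


def find_header(site_packages, relative_header):
    """Return every nvidia/*/include directory that contains relative_header.

    Assumption: pip-installed NVIDIA packages live under
    <site_packages>/nvidia/<package>/include.
    """
    root = Path(site_packages) / "nvidia"
    hits = []
    if not root.is_dir():
        return hits
    for include_dir in sorted(root.glob("*/include")):
        if (include_dir / relative_header).is_file():
            hits.append(include_dir)
    return hits


if __name__ == "__main__":
    # Path taken from the Colab error message above; adjust for other setups.
    site = "/usr/local/lib/python3.12/dist-packages"
    for include_dir in find_header(site, "crt/host_defines.h"):
        # Each hit is a candidate extra include path for g++ or NVRTC.
        print(f"-I{include_dir}")
```

If the script prints nothing, no installed nvidia/*/include directory provides crt/host_defines.h, and adding more -I flags from the pip tree alone will not fix the include chain.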

On some systems, this leads to inconsistent behavior: compilation succeeds only because the missing headers are implicitly resolved from a system-wide CUDA installation, which results in a fragile, mixed configuration.
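One way to see whether such mixing happens is to look at the line markers that g++ -E writes into its output, since they record the full path of every header that was actually opened. The sketch below parses those markers; resolved_paths is a hypothetical helper, and it assumes the GCC marker format `# <line> "<file>" <flags>`.

```python
import re

# GCC line markers look like:  # 1 "/path/to/file.h" 1 3
_LINEMARKER = re.compile(r'^#\s+\d+\s+"([^"]+)"')


def resolved_paths(preprocessed, basename):
    """Return the sorted, distinct paths of headers named `basename`
    that appear in the line markers of preprocessor output."""
    found = set()
    for line in preprocessed.splitlines():
        match = _LINEMARKER.match(line)
        if match is None:
            continue
        path = match.group(1)
        # Compare only the final path component against the requested name.
        if path.rsplit("/", 1)[-1] == basename:
            found.add(path)
    return sorted(found)
```

For example, after capturing the output of the g++ -E command above as a string, resolved_paths(output, "host_defines.h") lists where that header came from. A path outside the dist-packages tree, such as one under a system-wide CUDA directory, would indicate the mixed configuration described here.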

This reminds me of the error I get

‘crt/host_config.h’: No such file or directory

when CMake-configuring with CUDA 13.x on Windows. But I don’t know the answer!