Dear maintainers, dear community,
I need to access the __half and __half2 types in JIT-compiled CUDA code using NVRTC. Some of our users (see "Using CUDA toolkit from pypi", issue #443 in getkeops/keops on GitHub) have suggested relying solely on the CUDA toolkit distributed via pip (e.g. nvidia-cuda-runtime-cu12), which seems reasonable in principle.
However, in practice, relying only on the pip-provided headers leads to broken include resolution. For instance, on a fresh Google Colab instance with a GPU, the following minimal test:
!echo '#include <cuda_fp16.h>' | g++ -E \
-I/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/include -
results in:
/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/include/vector_types.h:65:10: fatal error: crt/host_defines.h: No such file or directory
65 | #include "crt/host_defines.h"
| ^~~~~~~~~~~~~~~~~~~~
This originates from:
cuda_fp16.h → vector_types.h → crt/host_defines.h
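To find where such a chain breaks without invoking a compiler, the rough Python sketch below follows only quoted `#include "..."` directives. It looks first in the including file's directory and then in the given include directories, and it returns the first chain that cannot be resolved. It ignores preprocessor conditionals and angle-bracket includes, so it only approximates what g++ or NVRTC actually do. `find_broken_include` is a hypothetical helper and is not part of any CUDA tooling.

```python
import os
import re
from typing import List, Optional

# Matches quoted includes such as: #include "crt/host_defines.h"
QUOTED_INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def _resolve(name: str, current_dir: Optional[str], include_dirs: List[str]) -> Optional[str]:
    """Approximate quoted-include lookup: including file's dir first, then -I dirs."""
    candidates = ([current_dir] if current_dir else []) + include_dirs
    for d in candidates:
        path = os.path.join(d, name)
        if os.path.isfile(path):
            return os.path.normpath(path)
    return None


def find_broken_include(root: str, include_dirs: List[str]) -> Optional[List[str]]:
    """Return the include chain ending in the first unresolvable quoted include,
    or None if every quoted include reachable from `root` resolves.
    Conditionals (#if/#ifdef) are NOT evaluated, so results are approximate."""
    seen = set()

    def visit(name: str, current_dir: Optional[str], chain: List[str]) -> Optional[List[str]]:
        path = _resolve(name, current_dir, include_dirs)
        if path is None:
            return chain + [name]          # this header could not be found
        if path in seen:
            return None                    # already explored (avoids cycles)
        seen.add(path)
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for inc in QUOTED_INCLUDE.findall(text):
            broken = visit(inc, os.path.dirname(path), chain + [name])
            if broken:
                return broken
        return None

    # The root is looked up like `#include <...>`: include dirs only.
    return visit(root, None, [])

# Example (path taken from the Colab test above):
# inc = "/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/include"
# print(find_broken_include("cuda_fp16.h", [inc]))
```

Because conditionals are skipped, the reported chain may pass through a branch the real compiler never takes. Treat it as a hint about which header to look for, not as proof.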
The issue is that the pip-distributed CUDA headers do not seem to provide a complete internal header layout. In particular, the crt/ subdirectory that vector_types.h expects is missing from nvidia/cuda_runtime/include (or is not laid out as expected), even though related headers such as host_defines.h may exist elsewhere.
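To check which pip-installed component, if any, actually ships a given header, the sketch below scans every `nvidia/<component>/include` directory under a site-packages path. It assumes the layout seen in the Colab path above (`.../dist-packages/nvidia/cuda_runtime/include`) holds for other NVIDIA wheels too. Both helper names are hypothetical. `header_providers` looks for an exact relative path such as `crt/host_defines.h`, and `find_by_name` lists every file with a given basename, which shows whether a header exists but in a different subdirectory.

```python
import glob
import os
from typing import Dict, List


def header_providers(site_packages: str, relpath: str) -> Dict[str, str]:
    """Map each nvidia/<component> whose include/ dir contains `relpath`
    (e.g. "crt/host_defines.h") to the full path of that file."""
    found = {}
    # Assumed layout: <site_packages>/nvidia/<component>/include
    pattern = os.path.join(site_packages, "nvidia", "*", "include")
    for inc_dir in sorted(glob.glob(pattern)):
        candidate = os.path.join(inc_dir, relpath)
        if os.path.isfile(candidate):
            component = os.path.basename(os.path.dirname(inc_dir))
            found[component] = candidate
    return found


def find_by_name(site_packages: str, filename: str) -> List[str]:
    """List every file named `filename` anywhere under <site_packages>/nvidia."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(site_packages, "nvidia")):
        if filename in filenames:
            hits.append(os.path.join(dirpath, filename))
    return sorted(hits)

# Example on Colab (site-packages path assumed from the test above):
# sp = "/usr/local/lib/python3.12/dist-packages"
# print(header_providers(sp, "crt/host_defines.h"))
# print(find_by_name(sp, "host_defines.h"))
```

An empty result from `header_providers` together with hits from `find_by_name` would match the situation described here: the file exists, but not at the path the including header expects.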
On some systems this leads to inconsistent behavior: compilation succeeds only because the missing headers are silently resolved from a system-wide CUDA installation. The result is a fragile, mixed configuration, with some headers coming from pip and others from the system toolkit.
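One way to make such a mixed configuration visible is to build the include options from pip-installed directories only and to report any environment variables that could point the compiler at a system toolkit. The sketch below does both. The helper names are hypothetical, and the `site-packages/nvidia/<component>/include` layout is an assumption based on the Colab path above. `CPATH`, `C_INCLUDE_PATH` and `CPLUS_INCLUDE_PATH` are read by GCC, while `CUDA_HOME` and `CUDA_PATH` are only common conventions. Passing explicit `-I` options does not stop a compiler from also consulting its own built-in search paths.

```python
import glob
import os
from typing import Dict, List, Mapping, Optional


def pip_cuda_include_options(site_packages: str) -> List[str]:
    """Return '-I<dir>' options for each nvidia/<component>/include directory
    under `site_packages`, sorted so the search order is deterministic."""
    pattern = os.path.join(site_packages, "nvidia", "*", "include")
    return ["-I" + d for d in sorted(glob.glob(pattern)) if os.path.isdir(d)]


def system_cuda_hints(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Report non-empty environment variables that may let a compiler pick up
    headers from a system-wide CUDA install instead of the pip packages."""
    env = os.environ if environ is None else environ
    keys = ("CUDA_HOME", "CUDA_PATH", "CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH")
    return {k: env[k] for k in keys if env.get(k)}

# Example: the options could be appended to a g++ -E command line, or converted
# to whatever option format your NVRTC binding expects (assumption).
# print(pip_cuda_include_options("/usr/local/lib/python3.12/dist-packages"))
# print(system_cuda_hints())
```

If `system_cuda_hints()` returns an empty dict and the preprocessing test still fails with only these options, that points to the pip packages themselves rather than to interference from a system CUDA installation.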