Nsight Systems causes CuPy to crash on Windows 10 when nvcc is invoked for kernel compilation

To reproduce:

  1. Install CuPy from a wheel:

pip install cupy-cuda11x

  2. Create nsys_dbg.py with the following content:

import cupy as cp
print(cp.nansum(cp.linspace(0,10)))
import time
time.sleep(10)

  3. Run the Python file without Nsight Systems to check that everything works:

python ./nsys_dbg.py

  4. Run the Python file through Nsight Systems:

Expected behaviour:

The Python file prints the same correct answer as it does without Nsight Systems:

250.0
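For reference, the expected value can be checked without a GPU. This is a minimal pure-Python sketch, assuming cp.linspace follows the NumPy default of 50 evenly spaced points with both endpoints included; the helper name linspace below is a local stand-in, not the CuPy function.

```python
import math


def linspace(start, stop, num=50):
    """Return `num` evenly spaced points from start to stop, endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


points = linspace(0, 10)
total = math.fsum(points)  # 50 points with mean 5, so the sum should be about 250

print(len(points), round(total, 6))
```

Any result other than 250.0 from the CuPy script therefore points at the environment rather than at the arithmetic.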

What I actually got:

Traceback (most recent call last):
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 64, in _run_cc
    log = subprocess.check_output(cmd, cwd=cwd, env=env,
  File "C:\Users\berry\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\berry\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['D:\NV_CUDA\v11.7\bin\nvcc.EXE', '-gencode=arch=compute_86,code=sm_86', '--ptx', '-DFIRST_PASS=1', '--std=c++11', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include\cupy\_cuda\cuda-11', '-ID:\NV_CUDA\v11.7\include', '-ftz=true', 'C:\Users\berry\AppData\Local\Temp\tmpp67fo98t\preprocess.cu']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 366, in compile_using_nvcc
    _run_cc(cmd, root_dir, 'nvcc', log_stream)
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 80, in _run_cc
    raise NVCCException(msg)
cupy.cuda.compiler.NVCCException: nvcc command returns non-zero exit status.
command: ['D:\NV_CUDA\v11.7\bin\nvcc.EXE', '-gencode=arch=compute_86,code=sm_86', '--ptx', '-DFIRST_PASS=1', '--std=c++11', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include\cupy\_cuda\cuda-11', '-ID:\NV_CUDA\v11.7\include', '-ftz=true', 'C:\Users\berry\AppData\Local\Temp\tmpp67fo98t\preprocess.cu']
return-code: 2
stdout/stderr:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\qinfer_cuda\examples\nsys_dbg.py", line 2, in <module>
    print(cp.nansum(cp.linspace(0,10)))
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_math\sumprod.py", line 105, in nansum
    return _math._nansum(a, axis, dtype, out, keepdims)
  File "cupy\_core\_routines_math.pyx", line 750, in cupy._core._routines_math._nansum
  File "cupy\_core\_routines_math.pyx", line 754, in cupy._core._routines_math._nansum
  File "cupy\_core\_reduction.pyx", line 568, in cupy._core._reduction._SimpleReductionKernel.__call__
  File "cupy\_core\_reduction.pyx", line 351, in cupy._core._reduction._AbstractReductionKernel._call
  File "cupy\_core\_cub_reduction.pyx", line 700, in cupy._core._cub_reduction._try_to_call_cub_reduction
  File "cupy\_core\_cub_reduction.pyx", line 536, in cupy._core._cub_reduction._launch_cub
  File "cupy\_core\_cub_reduction.pyx", line 471, in cupy._core._cub_reduction._cub_two_pass_launch
  File "cupy\_util.pyx", line 67, in cupy._util.memoize.decorator.ret
  File "cupy\_core\_cub_reduction.pyx", line 243, in cupy._core._cub_reduction._SimpleCubReductionKernel_get_cached_function
  File "cupy\_core\_cub_reduction.pyx", line 228, in cupy._core._cub_reduction._create_cub_reduction_function
  File "cupy\_core\core.pyx", line 2232, in cupy._core.core.compile_with_cache
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 493, in _compile_module_with_cache
    return _compile_with_cache_cuda(
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 536, in _compile_with_cache_cuda
    base = _preprocess('', options, arch, backend)
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 434, in _preprocess
    result = compile_using_nvcc(source, options, arch, 'preprocess.cu',
  File "D:\qinfer_cuda\venv_alt\lib\site-packages\cupy\cuda\compiler.py", line 376, in compile_using_nvcc
    raise cex
cupy.cuda.compiler.CompileException: nvcc command returns non-zero exit status.
command: ['D:\NV_CUDA\v11.7\bin\nvcc.EXE', '-gencode=arch=compute_86,code=sm_86', '--ptx', '-DFIRST_PASS=1', '--std=c++11', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include', '-ID:\qinfer_cuda\venv_alt\lib\site-packages\cupy\_core\include\cupy\_cuda\cuda-11', '-ID:\NV_CUDA\v11.7\include', '-ftz=true', 'C:\Users\berry\AppData\Local\Temp\tmpp67fo98t\preprocess.cu']
return-code: 2
stdout/stderr:

Versions of my SDKs and environment:

OS : Windows-10-10.0.19044-SP0
Python Version : 3.9.13
CuPy Version : 11.2.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.23.3
SciPy Version : None
Cython Build Version : 0.29.32
Cython Runtime Version : None
CUDA Root : D:\NV_CUDA\v11.7
nvcc PATH : D:\NV_CUDA\v11.7\bin\nvcc.EXE
CUDA Build Version : 11070
CUDA Driver Version : 11070
CUDA Runtime Version : 11070
cuBLAS Version : (available)
cuFFT Version : 10701
cuRAND Version : 10210
cuSOLVER Version : (11, 4, 0)
cuSPARSE Version : (available)
NVRTC Version : (11, 7)
Thrust Version : 101500
CUB Build Version : 101500
Jitify Build Version : 4a37de0
cuDNN Build Version : 8500
cuDNN Version : 8500
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : 10600
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA GeForce RTX 3090
Device 0 Compute Capability : 86
Device 0 PCI Bus ID : 0000:01:00.0
Nsight Systems Version : 2022.4.1.21-0db2c85 Windows-x64.
Visual Studio Version : 2019 Community (16.11.19)
VC++ Compiler Version : 19.29.30146
Microsoft (R) Incremental Linker Version: 14.29.30146.0

Workarounds:

Use Linux instead; everything works there.

On Windows, nvcc exit status 2 with no stdout/stderr often indicates a permissions problem with temporary files. Make sure that the TEMP environment variable is set to a writable directory. You can check its value by adding code like:

import os
print(os.environ["TEMP"])
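Printing TEMP only shows the path, not whether files can be created there. The following is a minimal sketch of a writability check using only the standard library; the helper name temp_dir_is_writable and the file prefix are made up for illustration and are not part of CuPy.

```python
import os
import tempfile


def temp_dir_is_writable(path=None):
    """Return True if a file can be created, written, and deleted in `path`.

    Hypothetical helper: defaults to the directory Python's tempfile module uses.
    """
    path = path or tempfile.gettempdir()
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix="cupy_tmp_check_") as f:
            f.write(b"ok")
            f.flush()
        return True
    except OSError:
        return False


# os.environ.get avoids a KeyError if TEMP is not set at all.
print("TEMP =", os.environ.get("TEMP"))
print("tempfile.gettempdir() =", tempfile.gettempdir())
print("writable:", temp_dir_is_writable())
```

Running this both with and without the profiler attached shows whether the temporary directory itself behaves differently in the two cases.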

First of all, thanks for your reply! But I think there’s more to it than that, because everything works fine when the script is not launched from Nsight Systems. I did check the temp folder just to be sure, and the permissions are correct. I can also see the temporary folders and files being created and removed.

Besides returning exit code 2, Python sometimes complains that the temporary folders are still being accessed by other programs. My guess is that Nsight Systems introduces some weird race condition with the subprocess that CuPy spawns for the NVCC backend. The NVRTC backend works fine, and it doesn’t invoke a new process to do the compiling. I hope this additional information clears things up a bit.
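To help separate the subprocess behaviour from CuPy and nvcc, here is a minimal sketch that imitates the pattern visible in the traceback: a child process is started with its working directory set to a fresh temporary directory, and the directory is removed afterwards. It is an assumption that this resembles the failing path closely enough to be useful; the function name run_child_in_temp_dir and the child's file name are hypothetical, and the child is a plain Python interpreter rather than nvcc.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_child_in_temp_dir():
    """Run a child process inside a fresh temp dir.

    Returns (returncode, captured output, whether the child's file was created).
    """
    child_code = (
        "from pathlib import Path; "
        "Path('child_output.txt').write_text('done'); "
        "print('child ok')"
    )
    with tempfile.TemporaryDirectory() as tmp:
        proc = subprocess.run(
            [sys.executable, "-c", child_code],
            cwd=tmp,                   # child works inside the temp dir
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
        )
        created = (Path(tmp) / "child_output.txt").exists()
    # Leaving the with-block deletes the directory. On Windows this step can
    # fail if another process still holds a handle inside it.
    return proc.returncode, proc.stdout.strip(), created


for attempt in range(3):
    print(attempt, run_child_in_temp_dir())
```

If this script also fails or reports a directory still in use only when run under Nsight Systems, the problem is probably not specific to nvcc; if it always succeeds, the interaction is more likely between the profiler and nvcc itself.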

@dofek for Windows target.