Warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)

Hello, I have an RTX 6000 Ada 48GB GPU installed and I am trying to set up a local environment, but TensorFlow is giving me issues. I have installed CUDA 11.8, cuDNN 8.7, and TensorRT, but tensorflow-gpu still could not recognize my GPU. Can you please guide me on how to successfully set up a local environment? I originally installed TensorFlow 2.16, but since it did not work I downgraded to 2.15, which is able to detect my GPU. However, when I run some training, I get an out-of-memory error from the CUDA driver.

@kavindamadhujith Please start a new post, as your issue is more of a setup issue. Note that this forum is support for the developer tool cuda-gdb. Please post your problem under the "CUDA Setup and Installation" forum; you can get support there. Thanks!

Hi, @892516165

Sorry for the late response. We built your source code internally without problems.
We can also see "warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)" printed multiple times, but it does not impact debugging. For example:

warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)
warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1)

[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

CUDA thread hit application kernel entry function breakpoint, 0x00007fffb5313600 in void cusparselt::pruning_kernel<64u, 8u, (cusparselt::(anonymous namespace)::Sparsity)1, true, __half>(__half const*, __half*, long, long, long, long)<<<(1,1,1),(64,8,1)>>> ()
(cuda-gdb)
(cuda-gdb)
(cuda-gdb)
(cuda-gdb) l
62  }
63
64  #define CHECK_CUSPARSE(func)                                           \
65  {                                                                      \
66      cusparseStatus_t status = (func);                                  \
67      if (status != CUSPARSE_STATUS_SUCCESS) {                           \
68          printf("CUSPARSE API failed at line %d with error: %s (%d)\n", \
69                 __LINE__, cusparseGetErrorString(status), status);      \
70          return EXIT_FAILURE;                                           \
71      }                                                                  \
(cuda-gdb) n
Single stepping until exit from function _ZN10cusparselt14pruning_kernelILj64ELj8ELNS_43_GLOBAL__N__47f2a60d_10_pruning_cu_35f6c0908SparsityE1ELb1E6__halfEEvPKT3_PS4_llll,
which has no line number information.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 1, lane 0]
0x00007fffb5313600 in void cusparselt::pruning_kernel<64u, 8u, (cusparselt::(anonymous namespace)::Sparsity)1, true, __half>(__half const*, __half*, long, long, long, long)<<<(1,1,1),(64,8,1)>>> ()
(cuda-gdb)
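For reference, the CHECK_CUSPARSE macro shown in the listing is the usual cuSPARSE status-check pattern. A minimal self-contained version of it looks like this (a sketch; the cusparseCreate/cusparseDestroy calls are only illustrative, any cuSPARSE call can be wrapped the same way):

#include <cstdio>
#include <cstdlib>
#include <cusparse.h>

// Same status-check pattern as in the listing above: print the failing
// source line and the cuSPARSE error string, then bail out of the function.
#define CHECK_CUSPARSE(func)                                           \
{                                                                      \
    cusparseStatus_t status = (func);                                  \
    if (status != CUSPARSE_STATUS_SUCCESS) {                           \
        printf("CUSPARSE API failed at line %d with error: %s (%d)\n", \
               __LINE__, cusparseGetErrorString(status), status);      \
        return EXIT_FAILURE;                                           \
    }                                                                  \
}

int main() {
    cusparseHandle_t handle = nullptr;
    CHECK_CUSPARSE( cusparseCreate(&handle) )   // illustrative call
    CHECK_CUSPARSE( cusparseDestroy(handle) )
    printf("cuSPARSE init/teardown OK\n");
    return EXIT_SUCCESS;
}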

So in your case, is it possible for you to get the latest CUDA 12.5 package, then rebuild and debug with that?

FYI, the compile command we use is "nvcc -lcusparseLt -lcusparse -ldl -gencode arch=compute_80,code=sm_80 -g -G -o test a.cu" (we are using Ampere, so we choose 80 here).
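For reference, 0xd1 is 209, which is CUDA_ERROR_NO_BINARY_FOR_GPU in the driver API, so the -gencode value does need to match your GPU when you rebuild. If you are unsure which value to use on your machine, a small query program can print it (a minimal sketch; deviceQuery from the CUDA samples reports the same information):

#include <cstdio>
#include <cuda_runtime.h>

// Print each visible GPU's compute capability so you know which
// compute_XX/sm_XX pair to pass to nvcc's -gencode option.
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, compute capability %d.%d (use sm_%d%d)\n",
               i, prop.name, prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}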

We also installed cuSPARSELt from here: cuSPARSELt Downloads | NVIDIA Developer

After reinstalling cuSPARSELt, it still fails to run properly. I am wondering whether the issue might be related to environment variables or to WSL (Windows Subsystem for Linux). Here is my .bashrc configuration:

export TVM_LOG_DEBUG="ir/transform.cc=1,relay/ir/transform.cc=1"
export PATH="/home/whh/environ/bsc/bin:$PATH"
export LIBRARY_PATH="/home/whh/environ/bsc/lib:$LIBRARY_PATH"

export TVM_HOME=/home/whh/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
export PYTHONPATH=$TVM_HOME/src:${PYTHONPATH}

#export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}
export PATH=/usr/local/cuda-12.5/bin${PATH:+:${PATH}}
#export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:/usr/local/lib:/home/whh/anaconda3/envs/caffe_test/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64:/usr/local/lib:/home/whh/anaconda3/envs/caffe_test/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
#export CUDA_HOME=/usr/local/cuda-12.1:$CUDA_HOME
export CUDA_HOME=/usr/local/cuda-12.5  # CUDA_HOME should be a single directory, not a PATH-style list
# export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
#export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# cuda gdb on wsl
#export CUDBG_USE_LEGACY_DEBUGGER=1
# export QT_DEBUG_PLUGINS=1
# not sure whether this needs to be configured
# export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:${LD_LIBRARY_PATH}
#export CUDA_TOOLKIT_PATH=/usr/local/cuda-12.1:${CUDA_TOOLKIT_PATH}
export CUDA_TOOLKIT_PATH=/usr/local/cuda-12.5:${CUDA_TOOLKIT_PATH}
#export CUDA_VISIBLE_DEVICES="0"
#export CUDA_DEBUGGER_SOFTWARE_PREEMPTION=1
#export CUDA_LAUNCH_BLOCKING=1
#export CUSPARSELT_DIR=/home/whh/workspace/env/cusparseLt
export CUSPARSELT_DIR=/home/whh/workspace/env/libcusparse_lt-linux-x86_64-0.6.1.0-archive
#export LD_LIBRARY_PATH=${CUSPARSELT_DIR}/lib64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${CUSPARSELT_DIR}/lib:${LD_LIBRARY_PATH}

# LAPACK
# $HOME instead of ~ here: tilde does not expand inside double quotes
export LIBRARY_PATH="$LIBRARY_PATH:$HOME/workspace/env/lapack-3.12.0"
export C_INCLUDE_PATH="$C_INCLUDE_PATH:$HOME/workspace/env/lapack-3.12.0/LAPACKE/include:$HOME/workspace/env/lapack-3.12.0/CBLAS/include"
export LAPACK_LIBRARIES="/home/whh/workspace/env/lapack-3.12.0/liblapack.a"
export BLAS_LIBRARIES="/home/whh/workspace/env/lapack-3.12.0/libcblas.a"

#export NVLOG_CONFIG_FILE=${HOME}/workspace/env/nvlog.config
#export NVLOG_CONFIG_FILE=${HOME}/nvlog.local.config

Moreover, even with the NVLOG_CONFIG_FILE line commented out, cuda-gdb keeps printing the error messages while I use it.
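One way to tell a library-loading problem apart from a cuda-gdb problem is a minimal init check. This is only a sketch, assuming the cusparseLtInit/cusparseLtDestroy entry points of the 0.6 API and nothing else:

#include <cstdio>
#include <cusparseLt.h>

// If LD_LIBRARY_PATH does not reach libcusparseLt.so, this binary fails
// at load time ("cannot open shared object file") before main() runs;
// if the library loads but something else in the environment is broken,
// cusparseLtInit reports a nonzero status instead.
int main() {
    cusparseLtHandle_t handle;
    cusparseStatus_t status = cusparseLtInit(&handle);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        printf("cusparseLtInit failed with status %d\n", (int)status);
        return 1;
    }
    printf("cusparseLtInit OK\n");
    cusparseLtDestroy(&handle);
    return 0;
}

Built with something like "nvcc -I${CUSPARSELT_DIR}/include -L${CUSPARSELT_DIR}/lib -lcusparseLt -o ltcheck ltcheck.cu" (ltcheck.cu is a made-up file name).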

Hi, @892516165

I noticed you set "export CUDBG_USE_LEGACY_DEBUGGER=1" in your .bashrc.
Please remove it and try again.
