JetPack version: 5.1.1
Hardware: AGX Orin
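Symptom:
I deployed the debug setup for capturing GPU core dump information, but when an nvgpu error occurs, the system does not generate an nvcudmp file.
Steps to reproduce: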
sudo usermod -aG debug $USER
ulimit -c unlimited
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1
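Change to the /usr/local/cuda/samples/0_Simple/ directory.
Copy the attachment cudaOpenMP.zip into that directory and extract it in place.
After extracting, change to the /usr/local/cuda/samples/0_Simple/cudaOpenMP directory.
Run sudo ./cudaOpenMP. At this point, dmesg -w shows that the kernel has reported nvgpu-related errors; see the attached dmesg.log.
Run sudo find / -name .nvcu ; no such file has been generated anywhere on the system.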
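For reference, the search step can be sketched with a quoted glob on the file extension. This is only a minimal sketch: `find_gpu_dumps` is a hypothetical helper name, the `.nvcudmp` extension is taken from the file name mentioned in this thread, and the demo runs on a scratch directory with a placeholder file rather than on a real dump.

```shell
#!/bin/sh
# Hypothetical helper: list GPU core dump files under a directory
# (defaults to the current directory when no argument is given).
find_gpu_dumps() {
    # Quote the pattern so the shell hands it to find unexpanded.
    find "${1:-.}" -type f -name '*.nvcudmp' 2>/dev/null
}

# Self-contained demo on a scratch directory with placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/core_example.nvcudmp" "$demo_dir/unrelated.log"
found=$(find_gpu_dumps "$demo_dir")
echo "$found"
rm -rf "$demo_dir"
```

On the device, the same function could be pointed at the directory the app was started from, for example `find_gpu_dumps "$HOME"`.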
cudaOpenMP.zip (336.8 KB)
dmesg.log (111.7 KB)
Hi,
We will first check whether we can reproduce this behavior.
We will get back to you later.
Thanks.
Hi,
Based on your commands, the environment variable is set for the $USER account, but the app is run as root.
Could you copy the app to $HOME and run it without sudo?
Thanks.
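The mismatch described above can be illustrated without a GPU. The sketch below exports the variable in the current shell, then compares a child process that inherits the environment with one started from an emptied environment. `env -i` is used here only as a stand-in for the environment reset that sudo typically applies by default; that equivalence is an assumption, and the actual sudoers policy on the device may differ.

```shell
#!/bin/sh
# Export the variable in the current (non-root) shell, as in the reproduction steps.
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

# A child started normally inherits the exported variable.
inherited=$(sh -c 'echo "${CUDA_ENABLE_COREDUMP_ON_EXCEPTION:-unset}"')

# A child started with an emptied environment does not see it.
# Assumption: this approximates what happens to the variable under sudo.
scrubbed=$(env -i /bin/sh -c 'echo "${CUDA_ENABLE_COREDUMP_ON_EXCEPTION:-unset}"')

echo "normal child:   $inherited"
echo "scrubbed child: $scrubbed"
```

If root is really required, `sudo --preserve-env=CUDA_ENABLE_COREDUMP_ON_EXCEPTION ./cudaOpenMP` may be worth trying, provided the sudoers policy allows it; that variant was not tested in this thread.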
Thanks. After switching the user, the .nvcudmp file is now generated correctly.