Hello everybody,
During my bachelor's thesis, I started developing CUDA software.
Everything worked fine for a long time. Now my program fails with an error after several executions, and I have no clue why.
I allocate pinned memory by using cudaMallocHost and some mapped memory by using cudaHostAlloc.
After several executions, the program starts crashing with a segmentation fault (SIGSEGV), apparently because these memory allocations fail. The error message I get via cudaGetErrorString only says "unknown error".
After some searching on Google, my guess is that the permissions on the nvidia-uvm device file are the problem. Its ownership starts as root:video and ends up as root:root. I suspect the program therefore no longer has permission to allocate mapped or pinned memory.
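To test this guess, the owner, group and mode of the device files could be logged right before and after each run. Below is a minimal sketch in plain C++ with POSIX calls. The helper name describe_node is hypothetical, and the /dev/ prefix used in the note below is an assumption, since the listings only show the file names.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <grp.h>
#include <pwd.h>

#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns "owner:group mode" (mode in octal) for a path,
// or an empty string if stat() fails (e.g. the file does not exist).
std::string describe_node(const std::string& path) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0) {
        return "";
    }

    // Resolve numeric IDs to names; fall back to the number if unknown.
    const struct passwd* pw = getpwuid(st.st_uid);
    const struct group* gr = getgrgid(st.st_gid);
    const std::string owner = pw ? std::string(pw->pw_name) : std::to_string(st.st_uid);
    const std::string group = gr ? std::string(gr->gr_name) : std::to_string(st.st_gid);

    std::ostringstream out;
    out << owner << ':' << group << ' '
        << std::oct << std::setw(4) << std::setfill('0')
        << (st.st_mode & 07777);  // permission bits only
    return out.str();
}
```

Calling describe_node("/dev/nvidia-uvm") (assumed path) at program start and again after the allocations, and printing both results, would show in which run the group changes from video to root.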
I am using openSUSE 42.2 and cannot upgrade to 42.3 for work-related reasons. But I think that should not be the problem, because the program already worked for several runs.
Furthermore, I am using the CUDA 9.0 toolkit. The graphics card I am working on is an NVIDIA Quadro K620.
If you need more information, feel free to ask.
Some outputs:
before starting the program:
crw-rw----+ 1 root video 195, 0 Feb 14 08:10 nvidia0
crw-rw----+ 1 root video 195, 255 Feb 14 08:10 nvidiactl
crw-rw-rw- 1 root video 195, 254 Feb 14 08:10 nvidia-modeset
crw-rw-rw-+ 1 root video 248, 0 Feb 14 08:10 nvidia-uvm
crw-rw-rw- 1 root root 248, 1 Feb 14 08:40 nvidia-uvm-tools
afterwards:
crw-rw----+ 1 root video 195, 0 Feb 14 08:10 nvidia0
crw-rw----+ 1 root video 195, 255 Feb 14 08:10 nvidiactl
crw-rw-rw- 1 root video 195, 254 Feb 14 08:10 nvidia-modeset
crw-rw-rw-+ 1 root root 248, 0 Feb 14 08:10 nvidia-uvm
crw-rw-rw- 1 root root 248, 1 Feb 14 08:40 nvidia-uvm-tools
error:
==20741== NVPROF is profiling process 20741, command: Entity_Update_CUDA -g 1
unknown error
Code passage where the allocations fail:
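// Initialize to nullptr so a failed allocation never leaves a dangling pointer.
entity_cdb_cpu *d_info_entity = nullptr, *entity_information = nullptr;
all_information_cpu *d_info_mapped = nullptr, *mapped_information = nullptr;
// Allow pinned host memory to be mapped into the device address space.
cudaSetDeviceFlags(cudaDeviceMapHost);
cudaError_t err;
// Pinned (page-locked) host memory.
err = cudaMallocHost(reinterpret_cast<void **>(&entity_information), sizeof(entity_cdb_cpu));
std::cout << cudaGetErrorString(err) << std::endl;
// Pinned host memory that is also mapped for device access.
err = cudaHostAlloc(reinterpret_cast<void **>(&mapped_information), sizeof(all_information_cpu),
                    cudaHostAllocMapped);
std::cout << cudaGetErrorString(err) << std::endl;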
Thanks!
Max