I am using 4.2. Your comment is interesting, because it could still be related to a too-large allocation. After investigating, I am able to reproduce the problem, and I would consider it a bug, or at least undocumented behavior on devices where the CPU and GPU share the same physical memory.
The code works, but it is polluted by this log.
The code is a single vector addition. However, the vector is unified: it can work on the CPU or the GPU, totally transparently.
It is very simple, and I love it in my daily programming.
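For context, the pattern looks roughly like this. This is only a minimal sketch with illustrative names (`UnifiedVector`, `add`), not the actual code from the repo: the storage comes from `cudaMallocManaged`, so the same pointer is valid on both sides.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: a vector whose storage is CUDA managed (unified) memory,
// so the same pointer can be dereferenced on the CPU and in a kernel.
struct UnifiedVector {
    float* data = nullptr;
    size_t n = 0;
    explicit UnifiedVector(size_t count) : n(count) {
        cudaMallocManaged(&data, n * sizeof(float)); // one pointer for both sides
    }
    ~UnifiedVector() { cudaFree(data); }
};

__global__ void add(const float* a, const float* b, float* c, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    UnifiedVector a(1024), b(1024), c(1024);
    for (size_t i = 0; i < a.n; ++i) { a.data[i] = 1.f; b.data[i] = 2.f; } // CPU side
    add<<<(1024 + 255) / 256, 256>>>(a.data, b.data, c.data, a.n);        // GPU side
    cudaDeviceSynchronize();
    printf("%f\n", c.data[0]); // expect 3.000000
    return 0;
}
```

The nice part is exactly what I described: no explicit `cudaMemcpy` anywhere, the CPU fills the data and the kernel reads it through the same pointer.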
The constructor/destructor of the vector either calls CUDA's unified malloc (`cudaMallocManaged`) directly or goes through the pool.
The pool allocates one huge piece of memory and hands out blocks from inside it. I need a pool in my real-life application because I have thousands of tiny allocations (~512 KB), and calling CUDA's unified malloc for each tiny allocation takes ages.
Currently the code works for both versions, but I get this log with the pool version, and only on the Jetson board. I did not write the pool; it comes from Hopkins University. The pool has been tested on the following hardware.
Under the environment NVIDIA-SMI 410.93, Driver Version 410.93, CUDA Version 10.0:
- M2000 ^_^
- GTX 1600

And on the Jetson AGX Xavier:
- Cuda compilation tools, release 10.0, V10.0.166
Currently the pool just returns a block from a big allocation. Valgrind and cuda-memcheck do not report anything negative. The log appears when I launch the CUDA kernel. Unfortunately, I cannot step inside it with my tools, as it happens inside the CUDA driver/SDK.
You may have a look: I set up a git repo, free to the public.
I can also open a bug report if you give me the right link.
- app/main.cu: the tiny main and the tiny vector class
- src/pool.cu: the pool manager