I'm using Unified Memory on the TX1 to get better I/O performance. Normally it works fine, and memory operations perform well. But once I request too much memory via cudaMallocManaged, I get errors. When I switched some of those allocations to cudaMalloc, the errors disappeared.
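For context, my allocation looks roughly like this (a simplified sketch; the size and names are placeholders), including the return-code check I use to catch a failed allocation before the kernel runs:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t bytes = 2ull * 1024 * 1024 * 1024;  // placeholder large request
    float *p = nullptr;
    cudaError_t err = cudaMallocManaged(&p, bytes);
    if (err != cudaSuccess) {
        // On failure the pointer is unusable; report and bail out
        // instead of letting it surface later as a kernel error.
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(p);
    return 0;
}
```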
The error is sometimes a kernel launch failure, sometimes a segmentation fault.
How much unified memory can I allocate through the cudaMallocManaged API on a Jetson TX1? Are there tools to monitor memory, or CUDA-related resource limits and their usage?
How can I get more detailed error information in such cases on the TX1? (I tried cuda-gdb, but it doesn't work; it conflicts with another application.)
Do I need cudaDeviceSynchronize to guarantee cache coherence when using Unified Memory? (I launch the CUDA kernel on a stream, and there is a cudaStreamSynchronize after the kernel.)
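The launch pattern I'm using looks roughly like this (a simplified sketch; the kernel and sizes are placeholders). My understanding is that cudaStreamSynchronize on the stream already guarantees the kernel has finished and its writes to managed memory are visible to the CPU, so an extra cudaDeviceSynchronize shouldn't be needed here, but I'd like to confirm:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for my actual workload.
__global__ void kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    kernel<<<(n + 255) / 256, 256, 0, stream>>>(data, n);
    cudaStreamSynchronize(stream);  // kernel done; CPU may now touch data[]
    // ... read data[] on the CPU here ...
    cudaStreamDestroy(stream);
    cudaFree(data);
    return 0;
}
```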