Since the Pascal architecture supports memory coherence, can we say for sure that the TX2 also supports full memory coherency between the ARM CPU and the Pascal GPU?
I think that depends on what you mean by “coherence.” ARM itself is not as coherent as you’d be used to from the Intel/x86 world, and the separate GPU is likely to add further necessary synchronization points. It seems to me that it wouldn’t be possible to have a high-performance GPU share an automatically coherent memory bus/ring with a multi-core general-purpose CPU. So: “signs point to no, but I’ve been wrong before!”
The question is: What do you want to do with this? All the APIs that allow you access to the GPU resources take care of coherency for you.
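If it helps narrow down what kind of coherence is on offer, the CUDA runtime can report what the device supports for managed memory. A small sketch, assuming the TX2's GPU is device 0 (that index is my assumption); it only prints what the driver reports and is built the same way as any other .cu file:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the integrated GPU is device 0
    int managed = 0;       // 1 if the device can allocate managed memory
    int concurrent = 0;    // 1 if CPU and GPU may touch managed memory at the same time

    cudaError_t err = cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "managedMemory query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "concurrentManagedAccess query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("managedMemory=%d concurrentManagedAccess=%d\n", managed, concurrent);
    return 0;
}
```

If concurrentManagedAccess comes back as 0, the CPU should not touch managed allocations while a kernel is running, so a cudaDeviceSynchronize() is needed before reading results on the host.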
Well, thanks. But what I meant was that with Pascal, NVIDIA supported memory coherence. For some reason, I notice that cudaMallocManaged fails on the Jetson TX2. Any reason why this happens?
No idea, sorry. Sounds like a CUDA bug, and/or some limitation in how much you can get/use.
I see you have another post explicitly about the CUDA malloc question, so let’s hope you get better answers there!
Thanks for your question.
We tested the standard unified memory (UM) sample on the TX2 and it works properly.
Could you share the source that hits the error with unified memory?
Please note that you need to compile it with the SM architecture specified.
Otherwise you will get an error when it compiles for the sm_20 architecture, which is not used on the TX2 anyway.
/usr/local/cuda/bin/nvcc -gencode arch=compute_62,code=compute_62 test.cu -o test
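For reference, a minimal test.cu along these lines could be used to reproduce the problem and report the exact error string. This is only a sketch, not the UM sample mentioned above; the kernel name add_one, the CHECK macro, and the array size are made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and bail out of main on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Illustrative kernel: add 1 to every element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;  // arbitrary size chosen for the example
    int *data = NULL;

    // The call reported as failing; CHECK prints why if it does.
    CHECK(cudaMallocManaged(&data, n * sizeof(int)));

    // Initialize on the CPU before any kernel is launched.
    for (int i = 0; i < n; ++i) data[i] = i;

    add_one<<<(n + 255) / 256, 256>>>(data, n);
    CHECK(cudaGetLastError());       // catches launch errors, e.g. no code for this GPU
    CHECK(cudaDeviceSynchronize());  // wait for the GPU before the CPU reads the data

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != i + 1) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFree(data));
    return mismatches != 0;
}
```

Built with the nvcc command above, any failure in the allocation, the launch, or the synchronization shows up with its file, line, and error string, which is the information needed to diagnose it.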