Because if it is, we can store the data in CUDA memory, perform calculations in kernel1, get back data1,
perform calculations in kernel2, get back data2. It will definitely be a speedup.
What are you even asking? Do you have to do a PCIe transfer after every kernel call or can you just leave memory resident on the GPU? Of course you can do the latter…
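To make the "leave it resident" case concrete, here is a minimal sketch. The kernel bodies and the helper name `run_resident_pipeline` are made up for illustration and are not from the original post. It does one upload, two kernel launches on the same device buffer, and one download. Both launches go to the default stream, so kernel2 sees what kernel1 wrote.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical first stage: add 1 to every element.
__global__ void kernel1(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical second stage: double every element.
__global__ void kernel2(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns true if every element ends up as (0 + 1) * 2 = 2.
bool run_resident_pipeline(int n) {
    assert(n > 0);
    std::vector<float> host(n, 0.0f);
    const size_t bytes = n * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);  // one upload

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    kernel1<<<blocks, threads>>>(dev, n);  // result stays on the GPU
    kernel2<<<blocks, threads>>>(dev, n);  // reads kernel1's output in place

    // One download at the end; cudaMemcpy waits for the kernels to finish.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (float v : host) {
        if (v != 2.0f) return false;
    }
    return true;
}
```

Calling `run_resident_pipeline(1000)` from `main` is enough to try it. If you really need data1 on the host as well, add a device-to-host copy between the two launches; the device buffer is still valid afterwards.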
Also, I would add: what I want is for persistent data to remain in device memory, not be overwritten, and be available
in subsequent kernel calls.
Now, what will happen when
another cudaMalloc and cudaFree are made around the next kernel call? In this case I am not sure the first data will remain in device memory.
As far as I know, on Windows 7 DirectX 11 also uses the CUDA device. What will happen if DirectX 11 allocates its own memory on the
device between my kernel calls?
cudaMalloc and cudaFree behave like normal malloc/free here: a fresh allocation is not cleared, so whatever values happened to be in that chunk of physical memory are readable after allocation. If you allocate N bytes, write something to it, free that region, and then immediately allocate N bytes, there’s certainly a good chance that you’ll get the original N byte region back (just like with normal malloc), but this is absolutely not something to depend on (just like normal malloc). If thread 2 allocates N bytes between thread 1’s free and second allocation of N, who knows what will happen?
(also, just like on the CPU, if you malloc/free around every function even when the sizes are the same between functions you’re doing it wrong)
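A rough sketch of the "allocate once, reuse" pattern, under my own assumptions: the `increment` kernel and the `run_iterations` helper are hypothetical names. The buffer is allocated a single time, explicitly zeroed with cudaMemset rather than trusting whatever was there before, and then reused by every launch in the loop.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: add 1 to every element.
__global__ void increment(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Runs `iters` launches on one buffer and returns element 0,
// or -1 if a CUDA call failed.
float run_iterations(int n, int iters) {
    assert(n > 0 && iters >= 0);
    const size_t bytes = n * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0f;  // allocate once

    // Fresh allocations have indeterminate contents, so initialize them.
    // All-zero bytes represent 0.0f.
    cudaMemset(dev, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it) {
        increment<<<blocks, threads>>>(dev, n);  // same buffer every time
    }

    float first = -1.0f;
    cudaError_t err = cudaMemcpy(&first, dev, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);  // free once, after the last use
    return err == cudaSuccess ? first : -1.0f;
}
```

With this, `run_iterations(512, 10)` should come back as 10, since each launch adds 1 to a buffer that started at zero.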
I think some of the confusion here is assuming that the card has some kind of unprotected memory space that could be overwritten by another process using the card. There is virtual memory translation going on in the device, so other GPU contexts cannot see your memory. Calls to cudaMalloc() and cudaFree() by other processes cannot affect your memory space. (Although driver bugs in the past have resulted in system crashes when accessing random memory locations.) Now, if someone else allocates all the remaining memory on the card, then your process won’t be able to allocate any more memory.
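Since the one real failure mode here is running out of device memory, it is worth checking for it instead of assuming the allocation works. A small sketch, with helper names (`query_device_memory`, `try_device_alloc`) that I made up; the runtime calls themselves are cudaMemGetInfo, cudaMalloc, and cudaGetErrorString.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report free and total device memory as seen
// by the current context. Returns false if the query fails.
bool query_device_memory(size_t *free_bytes, size_t *total_bytes) {
    return cudaMemGetInfo(free_bytes, total_bytes) == cudaSuccess;
}

// Hypothetical helper: try to allocate `bytes` on the device.
// Returns nullptr instead of a garbage pointer when the card is full.
void *try_device_alloc(size_t bytes) {
    assert(bytes > 0);
    void *ptr = nullptr;
    cudaError_t err = cudaMalloc(&ptr, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                bytes, cudaGetErrorString(err));
        cudaGetLastError();  // reset the recorded error before continuing
        return nullptr;
    }
    return ptr;
}
```

Note that the free figure is only a snapshot: another process can allocate between the query and your cudaMalloc, so the return value of cudaMalloc is what you should actually act on.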
Calls to cudaMalloc() and cudaFree() by other processes cannot affect your memory space. (Although driver bugs in the past have resulted in system crashes when accessing random memory locations.)
I am sure NVIDIA got it right, but this is software, and my work will be delivered to many different computers with different configurations.
As you can see, it is not possible to test every configuration in which a possible driver bug could crash the system.
Context switches don’t matter, they’re not going to magically explode the data you have. WDDM actually does paging anyway, so if you have a 1GB card and App 1 requests 800MB and then App 2 requests 800MB, WDDM will actually page things in and out.