CUDA contexts

Do CUDA contexts provide any data isolation? For instance, if two applications run concurrently and use different contexts, is it possible for the first one to access the data of the second? If there is protection, how is it achieved?

Generally, a CUDA context can be thought of as being analogous to a process. This statement is directly made in the programming guide description of a context. You may wish to read that linked section.

Like processes, contexts generally have isolation between them. A counter-example would be CUDA IPC.
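
To make that counter-example concrete, here is a minimal sketch of CUDA IPC deliberately sharing an allocation across two processes. The file name and the "export"/import modes are my own choices for illustration; the cudaIpcGetMemHandle / cudaIpcOpenMemHandle / cudaIpcCloseMemHandle calls are the actual runtime API.

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main(int argc, char *argv[]){
  if (argc > 1 && strcmp(argv[1], "export") == 0){
    // exporting process: allocate, fill, and publish an IPC handle
    int *d;
    cudaMalloc(&d, sizeof(int));
    int v = 123;
    cudaMemcpy(d, &v, sizeof(int), cudaMemcpyHostToDevice);
    cudaIpcMemHandle_t h;
    cudaIpcGetMemHandle(&h, d);
    FILE *f = fopen("ipc_handle.bin", "wb");   // pass the handle via ordinary host IPC (a file here)
    fwrite(&h, sizeof(h), 1, f);
    fclose(f);
    printf("handle written; keep this process alive while the importer runs\n");
    getchar();                                 // keep the allocation (and context) alive
    cudaFree(d);
  } else {
    // importing process: open the other process's allocation and read from it
    cudaIpcMemHandle_t h;
    FILE *f = fopen("ipc_handle.bin", "rb");
    if (!f || fread(&h, sizeof(h), 1, f) != 1){ printf("no handle file\n"); return 1; }
    fclose(f);
    int *d = nullptr;
    cudaIpcOpenMemHandle((void **)&d, h, cudaIpcMemLazyEnablePeerAccess);
    int v = 0;
    cudaMemcpy(&v, d, sizeof(int), cudaMemcpyDeviceToHost);
    printf("value read from the exporter's allocation: %d\n", v);   // expect 123
    cudaIpcCloseMemHandle(d);
  }
  return 0;
}

The point is that this sharing only happens because the first process explicitly exports a handle; without that opt-in step there is no supported way for the second process to reach the first one's data.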

I don’t have any details for “how it is achieved”. However, to test the idea, you could take an address from one context and attempt to dereference that address in another context. You will find that you won’t be able to successfully dereference the address. This could be done, for example, with host-based IPC to pass the address from one process to another, or perhaps a more “manual” method.
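
For what it’s worth, here is a sketch of that “manual” method under my own assumptions about how the address gets passed (printed by one process and typed on the other’s command line); none of this code is from the original post. The expectation is that the dereference in the second process fails with an illegal-address error (or, at worst, touches the second process’s own memory), never the first process’s data.

#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <cuda_runtime.h>

__global__ void deref(const int *p, int *out){ *out = *p; }

int main(int argc, char *argv[]){
  if (argc == 1){                       // first process: allocate, fill, and publish the raw address
    int *d;
    cudaMalloc(&d, sizeof(int));
    int v = 42;
    cudaMemcpy(d, &v, sizeof(int), cudaMemcpyHostToDevice);
    printf("device pointer: %p\n", (void *)d);
    sleep(60);                          // keep this context alive while the second process runs
  } else {                              // second process: try to dereference the other context's address
    int *foreign = (int *)strtoull(argv[1], nullptr, 16);
    int *out;
    cudaMalloc(&out, sizeof(int));
    deref<<<1,1>>>(foreign, out);
    cudaError_t err = cudaDeviceSynchronize();
    printf("dereference result: %s\n", cudaGetErrorString(err));  // typically an illegal address error
  }
  return 0;
}

Compile with nvcc, run one copy with no arguments, then run a second copy passing the printed address on the command line.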

What will happen if I create an application that allocates all the GPU memory and sets it to zero? Then inside the kernel, I will print all the memory addresses. From a second application, I will start a kernel. From this kernel, I will print all the GPU memory. Will I print the same values?

How would you do that?

I don’t think so. That is the point of the isolation.

Why not try it? I think your thought experiment here has flaws or nonsense wrapped into it:

  • I create an application that allocates all the GPU memory and sets it to zero.
  • Then inside the kernel, I will print all the memory addresses
  • From a second application, I will start a kernel.

So far, so good. No issues, and your thought experiment so far makes sense.

  • From this kernel, I will print all the GPU memory.

???

How will you do that? You would need a pointer of some sort, right? Where are you going to get that pointer?

Anyway, it really doesn’t matter how you arrive at the pointer. It certainly could not be arrived at through a valid operation like cudaMalloc, because you have already allocated all the memory, so cudaMalloc would presumably return an out of memory error. Therefore your pointer will be arrived at by some bizarre method. Any such pointer would not be valid in the memory space of the 2nd context, and any attempted usage would trigger a protection fault, just as if you dereferenced a null pointer, or accessed out of bounds, etc.

Anyway, try it.

(I am using the words “protection fault” loosely. The GPU has a method to detect illegal address space accesses. It would detect such an access in the context of the 2nd kernel in this scenario. I’m not suggesting “protection fault” means whatever you think it means. I’m suggesting the GPU would detect the illegal access in some unspecified fashion, and would do something catastrophic to the context.)
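
As a rough illustration of “do something catastrophic to the context” (my own example, using a made-up pointer value that I assume is unmapped): once a kernel performs an illegal access, the error is sticky, and even otherwise valid work in that context fails afterwards until the context is torn down (e.g. with cudaDeviceReset).

#include <cstdio>
#include <cuda_runtime.h>

__global__ void bad(int *p){ *p = 1; }

int main(){
  int *ok;
  cudaMalloc(&ok, sizeof(int));              // a perfectly valid allocation in this context
  bad<<<1,1>>>((int *)0xdeadbeef);           // dereference a bogus address inside the kernel
  cudaError_t e1 = cudaDeviceSynchronize();
  printf("after bad kernel:      %s\n", cudaGetErrorString(e1));
  cudaError_t e2 = cudaMemset(ok, 0, sizeof(int));                 // even valid work now fails:
  printf("subsequent valid call: %s\n", cudaGetErrorString(e2));   // the error is sticky for the context
  return 0;
}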

It occurs to me later that perhaps you are talking about the idea of data “leaking” from one application to another: after the first application exits, is there a possibility for a subsequent app to read data “left behind” by the first application? I don’t know if CUDA happens to provide guarantees against this (it may, I’m just not able to cite any at the moment), but my simple test suggests it is at least not typical:

$ cat t2257.cu
#include <iostream>
#include <cstdlib>
const int val = 33;
int main(int argc, char *argv[]){

  unsigned char *p;
  const size_t ds = 31ULL*1048576*1024;   // ~31 GB, nearly the whole 32 GB card
  cudaMalloc(&p, ds);
  if (argc > 1){                          // second run (any argument): read back the fresh,
    unsigned char *h;                     // uninitialized allocation and scan it for val
    h = (unsigned char *)malloc(ds);
    cudaMemcpy(h, p, ds, cudaMemcpyDeviceToHost);
    for (size_t i = 0; i < ds; i++)       // size_t, not int: ds is larger than INT_MAX
      if (h[i] == val) std::cout << "match at index: " << i << std::endl;
    }
  else {                                  // first run (no argument): stamp the whole allocation with val
    cudaMemset(p, val, ds);}
}
$ nvcc -o t2257 t2257.cu
$ compute-sanitizer ./t2257
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors
$ compute-sanitizer ./t2257 1
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors
$

This is on a 32GB V100, CUDA 11.4. Similar results are observed without the use of compute-sanitizer and over a number of runs. So it appears there may be some scrubbing going on.

Thank you very much for your response. So there are multiple TLBs (one per context?) that perform the virtual-to-physical address translation, and when a kernel from a context performs an illegal access, this leads to an error?

I don’t know any of those details. Yes, when a kernel performs an illegal access, it leads to an error.

(OK. If you access an array one byte beyond the end of a valid allocation, and you do not use a tool like compute-sanitizer, you may not see an error. But this really isn’t the case being discussed. And certainly if you access far enough beyond the end of a valid allocation, you will see an error.)
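
A small sketch of that parenthetical, with sizes I picked for illustration: an access one byte past the end of an allocation may sail through a plain run, but compute-sanitizer reports it as an invalid write, while an access far out of bounds produces an error either way.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(unsigned char *p, size_t offset){ p[offset] = 1; }

int main(){
  unsigned char *d;
  cudaMalloc(&d, 256);
  touch<<<1,1>>>(d, 256);                    // one byte past the end: often unreported without a tool
  printf("1 byte past the end: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));
  touch<<<1,1>>>(d, 1ULL << 40);             // far out of bounds: expect an illegal address error
  printf("far out of bounds:   %s\n", cudaGetErrorString(cudaDeviceSynchronize()));
  cudaFree(d);
  return 0;
}

Run it plainly and then under compute-sanitizer to see the difference in what gets reported.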

The GPU is a collection of multiple independent engines that can run different contexts simultaneously. The two most common engines are the GR (3D + 2D + compute) and the Asynchronous Copy Engine. Each engine can run 1 context at a time. Each engine has a separate page table pointer via the context. On a context switch the TLBs are invalidated.
