When a thread has a primary CUDA context, does a child thread it creates automatically inherit that CUDA context?

I have this simple test code:

```python
import torch

# Main thread: creating a CUDA tensor initializes CUDA in this process.
data = torch.randn(10, 10, device="cuda:0")

import threading

def thread_func():
    print("thread_func")
    # PyTorch's flag, queried from the child thread.
    print(f"{torch.cuda.is_initialized()=}")
    def check_cuda_context():
        """Check CUDA driver context status"""
        import ctypes
        cuda = ctypes.CDLL("libcuda.so")
        device = ctypes.c_int()
        # Ask the driver for the device of the context current to *this* thread.
        result = cuda.cuCtxGetDevice(ctypes.byref(device))
        return (True, device.value) if result == 0 else (False, None)

    print("Checking CUDA context...")
    valid, device_id = check_cuda_context()
    print(f"CUDA context is valid: {valid}, device id: {device_id}")

thread = threading.Thread(target=thread_func)
thread.start()
thread.join()
```

It prints:

```
thread_func
torch.cuda.is_initialized()=True
Checking CUDA context...
CUDA context is valid: False, device id: None
```

My main thread creates a CUDA tensor, so it has a CUDA context. PyTorch does not keep thread-local state for `torch.cuda.is_initialized()`; it is process-wide global state. So inside the child thread, PyTorch believes CUDA is already initialized (and that a context exists). However, when I query the driver directly through `libcuda.so`, the child thread actually has no current CUDA context.
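The asymmetry described above (a process-wide "initialized" flag versus a per-thread "current context") can be modelled in a few lines of plain Python. This is a toy sketch, not PyTorch or CUDA code: `init_and_bind`, `is_initialized`, `current_context` and `observe_from_new_thread` are hypothetical names, and `threading.local()` merely stands in for the driver's per-thread context stack.

```python
import threading

# Toy model only: hypothetical names, no CUDA calls.
_initialized = False       # process-wide, like a global "is initialized" flag
_tls = threading.local()   # per-thread storage, standing in for the driver's current context


def init_and_bind(ctx_name: str) -> None:
    """Mark the 'library' initialized and bind a context to the calling thread only."""
    global _initialized
    _initialized = True
    _tls.current = ctx_name


def is_initialized() -> bool:
    return _initialized


def current_context():
    # A thread that never bound anything has no 'current' attribute.
    return getattr(_tls, "current", None)


def observe_from_new_thread():
    """Return (is_initialized, current_context) as seen from a freshly spawned thread."""
    seen = {}

    def worker():
        seen["initialized"] = is_initialized()
        seen["context"] = current_context()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["initialized"], seen["context"]


init_and_bind("ctx0")               # main thread: initialize and bind
print(current_context())            # ctx0
print(observe_from_new_thread())    # (True, None)
```

In this model a thread spawned after initialization sees the shared flag set but its own context slot empty, which is the same pattern as the output of the test code in the question.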

So, my question is:

Is it well-defined behavior in CUDA that new threads do not inherit the primary context?

Hoping to get an answer from an expert like @Robert_Crovella 😄

If your question is about PyTorch, I suggest asking it on a PyTorch forum such as discuss.pytorch.org. There are NVIDIA experts who patrol those forums.

If we restrict ourselves to the behavior of the CUDA driver API, then my expectation is that a context becomes associated with a thread only when that is done explicitly (i.e., there is no implicit mechanism, for any thread, anywhere). This is more or less described in the CUDA programming guide:

A host thread may have only one device context current at a time. When a context is created with cuCtxCreate( ), it is made current to the calling host thread. CUDA functions that operate in a context (most functions that do not involve device enumeration or context management) will return CUDA_ERROR_INVALID_CONTEXT if a valid context is not current to the thread.
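The check in the question collapses every non-zero return code of `cuCtxGetDevice` into `False`. A small variation that keeps the code makes it possible to tell "no context is current to this thread" (`CUDA_ERROR_INVALID_CONTEXT`, which the quoted passage mentions) apart from other failures, such as the driver never having been initialized (`CUDA_ERROR_NOT_INITIALIZED`). This is a sketch: `result_name` and `query_current_device` are hypothetical helpers, the table lists only a few `CUresult` values from `cuda.h`, and `cuda` is assumed to be a `ctypes.CDLL` handle to the driver library, as in the question's code.

```python
import ctypes

# A few CUresult values from cuda.h (only the ones relevant here).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
    201: "CUDA_ERROR_INVALID_CONTEXT",
}


def result_name(code: int) -> str:
    """Map a CUresult code to its symbolic name, falling back to the raw number."""
    return CU_RESULT_NAMES.get(code, f"CUresult {code}")


def query_current_device(cuda):
    """Call cuCtxGetDevice on the calling thread; return (result name, device ordinal or None)."""
    device = ctypes.c_int()
    code = cuda.cuCtxGetDevice(ctypes.byref(device))
    return result_name(code), (device.value if code == 0 else None)
```

Called from the child thread in the question, one would expect `("CUDA_ERROR_INVALID_CONTEXT", None)`. The driver's `cuGetErrorName` could replace the table to name any other code.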

My takeaway:

  • calling cuCtxCreate() is one way to establish a context and make it current to the calling thread
  • it is possible that a thread has no (valid) context associated or “made current”
  • therefore my expectation is that a spawned thread must explicitly make a context current.
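The explicit step in the last point can be sketched with the driver API through `ctypes`, in the same style as the question's check. Assumptions: Linux (`libcuda.so.1` / `libcuda.so`), device ordinal 0, and that binding the device's primary context (the one the CUDA runtime API uses) is what is wanted; `cuCtxCreate()` would instead create a separate context. `load_driver`, `current_device` and `child_binds_primary_context` are hypothetical helpers. The main thread retains the primary context and makes it current, then a child thread checks what is current before and after calling `cuCtxSetCurrent` itself.

```python
import ctypes
import threading

CUDA_SUCCESS = 0


def load_driver():
    """Return the CUDA driver library, or None if it cannot be loaded (Linux names assumed)."""
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            pass
    return None


def current_device(cuda):
    """Device ordinal of the context current to the calling thread, or None if there is none."""
    dev = ctypes.c_int()
    return dev.value if cuda.cuCtxGetDevice(ctypes.byref(dev)) == CUDA_SUCCESS else None


def child_binds_primary_context(ordinal=0):
    """Return what the main and child threads see, or None without a usable driver/GPU."""
    cuda = load_driver()
    if cuda is None or cuda.cuInit(0) != CUDA_SUCCESS:
        return None
    dev = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(dev), ordinal) != CUDA_SUCCESS:
        return None
    ctx = ctypes.c_void_p()
    # Retain the device's primary context (creating it if necessary).
    if cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev) != CUDA_SUCCESS:
        return None
    seen = {}

    def worker():
        seen["child_before"] = current_device(cuda)  # nothing made current on this thread yet
        cuda.cuCtxSetCurrent(ctx)                     # explicit binding for this thread
        seen["child_after"] = current_device(cuda)
        cuda.cuCtxSetCurrent(None)                    # unbind before the thread exits

    try:
        cuda.cuCtxSetCurrent(ctx)                     # main thread binds the context first
        seen["main"] = current_device(cuda)
        t = threading.Thread(target=worker)
        t.start()
        t.join()
        cuda.cuCtxSetCurrent(None)
    finally:
        # Recent cuda.h maps cuDevicePrimaryCtxRelease to the _v2 symbol; fall back if absent.
        release = getattr(cuda, "cuDevicePrimaryCtxRelease_v2", None)
        if release is None:
            release = cuda.cuDevicePrimaryCtxRelease
        release(dev)
    return seen


print(child_binds_primary_context())
```

On a machine with a GPU, the expectation is that `child_before` is `None` even though the main thread has the context current, and that `child_after` matches the main thread's device. A context may be current on several threads at once, so binding it in the child does not unbind it from the main thread. Without a driver or GPU the sketch simply prints `None`.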

Note: a “primary CUDA context” or “primary context” has a very specific meaning in CUDA (the driver-API-accessible context that the CUDA runtime API creates implicitly). I don’t think you are actually referring to that.

If that sort of response is not sufficient, that is, if you are looking for a statement in the CUDA documentation such as “a spawned thread does not automatically inherit a context”, I cannot point you to the chapter and verse where it can be found. In that case, I suggest you file a bug. Instructions for doing so are linked in a sticky post at the top of this sub-forum.

Thanks for your response. Yes, I’m looking for that clarification.

Since there’s no clear statement about this, I’ll just treat the behavior I saw as the standard.