I have this simple test code:
import torch
data = torch.randn(10, 10, device="cuda:0")

import threading

def thread_func():
    print("thread_func")
    print(f"{torch.cuda.is_initialized()=}")

    def check_cuda_context():
        """Check CUDA driver context status"""
        import ctypes
        cuda = ctypes.CDLL("libcuda.so")
        device = ctypes.c_int()
        result = cuda.cuCtxGetDevice(ctypes.byref(device))
        return (True, device.value) if result == 0 else (False, None)

    print("Checking CUDA context...")
    valid, device_id = check_cuda_context()
    print(f"CUDA context is valid: {valid}, device id: {device_id}")

thread = threading.Thread(target=thread_func)
thread.start()
thread.join()
It prints:
thread_func
torch.cuda.is_initialized()=True
Checking CUDA context...
CUDA context is valid: False, device id: None
My primary thread creates a CUDA tensor, so it has a CUDA context. PyTorch does not keep thread-local state for `torch.cuda.is_initialized()`; it is process-level global state. So inside a child thread, PyTorch reports that CUDA is already initialized (which suggests a context exists). However, when I query the driver directly through `libcuda.so` from that thread, there is no current CUDA context.
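To make the distinction concrete, here is a small pure-Python analogy. It does not touch PyTorch or CUDA, and it is not how either of them is implemented; the names `initialized`, `current`, and `"ctx-0"` are made up for illustration. It only shows how a process-wide flag can be true in a new thread while a per-thread "current" slot is still empty.

```python
import threading

# Process-wide flag: every thread sees the same value
# (loosely analogous to torch.cuda.is_initialized()).
initialized = False

# Per-thread slot: each thread gets its own attributes
# (loosely analogous to the driver's "current context" for a thread).
current = threading.local()


def init_in_main():
    """Pretend initialization performed by the main thread."""
    global initialized
    initialized = True
    current.ctx = "ctx-0"  # hypothetical context handle, set for this thread only


def probe(out):
    """Record what a freshly spawned thread observes."""
    out["initialized"] = initialized
    out["ctx"] = getattr(current, "ctx", None)


init_in_main()
seen = {}
worker = threading.Thread(target=probe, args=(seen,))
worker.start()
worker.join()
print(seen)  # {'initialized': True, 'ctx': None}
```

The child thread sees the global flag but not the main thread's per-thread value, which matches the output I posted above.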
So, my question is:
Is it well-defined behavior in CUDA that new threads do not inherit the primary context?
Hope to get an answer from an expert like @Robert_Crovella.
If your question is about PyTorch, I suggest asking it on a PyTorch forum such as discuss.pytorch.org. There are NVIDIA experts who patrol those forums.
If we restrict ourselves to discussing the behavior of the CUDA driver API, then my expectation is that in order for a context to be associated with a thread, it must be done explicitly (i.e. there is no implicit mechanism, for any thread, anywhere). This is more-or-less described in the CUDA programming guide:
A host thread may have only one device context current at a time. When a context is created with cuCtxCreate(), it is made current to the calling host thread. CUDA functions that operate in a context (most functions that do not involve device enumeration or context management) will return CUDA_ERROR_INVALID_CONTEXT if a valid context is not current to the thread.
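As a rough way to see which error the driver actually reports, the check can keep the `CUresult` code instead of collapsing it to a boolean. This is only a sketch: the helper names (`load_driver`, `error_name`, `describe_current_context`) are mine, it assumes a Linux driver library named `libcuda.so.1` or `libcuda.so`, and it returns `None` when no driver can be loaded.

```python
import ctypes


def load_driver():
    """Return a handle to the CUDA driver library, or None if it is unavailable."""
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def error_name(cuda, code):
    """Translate a CUresult code into its symbolic name via cuGetErrorName."""
    text = ctypes.c_char_p()
    if cuda.cuGetErrorName(code, ctypes.byref(text)) != 0 or text.value is None:
        return f"unknown error {code}"
    return text.value.decode()


def describe_current_context():
    """Return (code, name, device) for cuCtxGetDevice on the calling thread."""
    cuda = load_driver()
    if cuda is None:
        return None  # no driver on this machine, nothing to query
    device = ctypes.c_int()
    code = cuda.cuCtxGetDevice(ctypes.byref(device))
    return (code, error_name(cuda, code), device.value if code == 0 else None)


print(describe_current_context())
```

Called from a thread that has no current context in a process where the driver is already initialized, I would expect the name to be CUDA_ERROR_INVALID_CONTEXT, in line with the passage quoted above. In a process that never initialized the driver, a different error (not initialized) is more likely.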
My takeaway:
- calling `cuCtxCreate()` is one method to establish a context and make it current to the thread
- it is possible that a thread has no (valid) context associated or "made current"
- therefore my expectation is that a spawned thread must explicitly make a context current.
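The takeaway above can be sketched with the driver API through `ctypes`. This is a hedged illustration, not a statement of how PyTorch manages contexts: the helper names (`load_driver`, `current_device`, `worker`, `demo`) are hypothetical, it assumes device ordinal 0 and a Linux driver library, and it retains the device's primary context with `cuDevicePrimaryCtxRetain` rather than creating a new one with `cuCtxCreate()`. It returns `None` if no usable driver or GPU is present.

```python
import ctypes
import threading


def load_driver():
    """Return a handle to the CUDA driver library, or None if it is unavailable."""
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def current_device(cuda):
    """Return the device of the calling thread's current context, or None."""
    device = ctypes.c_int()
    return device.value if cuda.cuCtxGetDevice(ctypes.byref(device)) == 0 else None


def worker(cuda, ctx, report):
    """Runs in the spawned thread: query, bind the context explicitly, query again."""
    report["before"] = current_device(cuda)  # fresh thread: expected to be None
    if cuda.cuCtxSetCurrent(ctx) == 0:       # explicitly make ctx current here
        report["after"] = current_device(cuda)


def demo(ordinal=0):
    cuda = load_driver()
    if cuda is None or cuda.cuInit(0) != 0:
        return None  # no usable driver or GPU on this machine
    cuda.cuCtxSetCurrent.argtypes = [ctypes.c_void_p]
    device = ctypes.c_int()
    ctx = ctypes.c_void_p()
    if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
        return None
    # Obtain a handle to the device's primary context (does not make it current).
    if cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), device) != 0:
        return None
    report = {}
    thread = threading.Thread(target=worker, args=(cuda, ctx, report))
    thread.start()
    thread.join()
    cuda.cuDevicePrimaryCtxRelease(device)  # balance the retain above
    return report


print(demo())
```

On a machine with a GPU, the expectation is that `before` is `None` and `after` is the device ordinal, i.e. the spawned thread only has a current context after the explicit `cuCtxSetCurrent` call.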
Note: a "primary CUDA context" or "primary context" has a very specific meaning in CUDA (the driver-API-accessible context that is implicitly created by the CUDA runtime API). I don't think you are actually referring to that.
If that sort of response is not sufficient, i.e. you are looking for a statement in CUDA documentation that "a spawned thread does not automatically inherit a context" or something similar, I cannot quote chapter and verse of documentation where that can be found. In that case I suggest you file a bug. Instructions to do so are linked to a sticky post at the top of this sub-forum.
Thanks for your response. Yes, I'm looking for this clarification.
Since there's no clear statement about this, I'll just treat the behavior I saw as the standard.