RTX A4000 RDMA Usage Issues: Intermittent Locking Issues When Entering Sleep Mode

We repeat the following sequence:

power on → CUDA memory allocation → sleep command → CUDA memory release → sleep entry → wake-up command → power on → CUDA memory allocation → repeat

While this sequence repeats, the cuCtxCreate_v2 function intermittently hangs when the RDMA memory is created again at wake-up after sleep mode. The call that hangs is marked in the code further down.
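To at least detect the hang instead of blocking forever, the blocking call could be run under a watchdog. The following is only a minimal sketch: `callWithTimeout` is a hypothetical helper, not part of RDMACudaControl.cpp or the CUDA API, and it is written against a generic callable so it can be tried without a GPU. It cannot cancel the stuck call, it only reports that the call did not return in time.

```cpp
#include <chrono>
#include <exception>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper: runs `call` on a detached worker thread and waits up
// to `timeout` for it to finish.
// Returns true and stores the result in `out` if the call returned in time.
// Returns false if the call is still running (a suspected hang); the worker
// thread is then left running, because it cannot be cancelled from outside.
template <typename Fn, typename Result>
bool callWithTimeout(Fn call, std::chrono::milliseconds timeout, Result& out)
{
    // The promise is shared so it outlives this function if the worker hangs.
    auto promise = std::make_shared<std::promise<Result>>();
    std::future<Result> future = promise->get_future();

    std::thread([promise, call]() mutable {
        try {
            promise->set_value(call());
        } catch (...) {
            // Forward any exception to the waiting thread.
            promise->set_exception(std::current_exception());
        }
    }).detach();

    if (future.wait_for(timeout) != std::future_status::ready)
        return false;   // still blocked: log it and treat as a hang

    out = future.get(); // rethrows if `call` threw
    return true;
}
```

In real use, `call` would wrap the cuCtxCreate step and return its CUresult. Note that cuCtxCreate makes the new context current on the thread that calls it, so the worker would also have to pop the context before the handle is used elsewhere; that part is not shown here.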

Key problems:
1. cuCtxCreate_v2 intermittently hangs when RDMA memory is created at wake-up time after entering sleep mode.
2. Is there a wake-up (stabilization) time on the A4000 from the GPU memory perspective? Is there a minimum delay that must pass before calling cuCtxCreate_v2 so that the hardware can stabilize during wake-up?
3. On Windows, we have to release the CUDA memory allocation before entering sleep, because the graphics card memory runs on normal power and all data in that memory is lost when the system enters sleep mode. Please confirm whether the system should only enter sleep after the CUDA memory has been freed.
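As a stopgap for question 2, a settle delay could be enforced between the wake-up event and the first CUDA call. This is only a sketch under assumptions: `waitForSettle` is a hypothetical helper, and whether the A4000 needs any settle time at all (and how long) is exactly the open question, so the delay value is a placeholder to be tuned.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: blocks until at least `settle` has elapsed since
// `wakeTime` (the moment the application saw the wake-up event).
// Returns the time that still remained when it was called, truncated to
// milliseconds, or 0 ms if the settle period had already passed.
inline std::chrono::milliseconds waitForSettle(Clock::time_point wakeTime,
                                               std::chrono::milliseconds settle)
{
    const Clock::time_point deadline = wakeTime + settle;
    const Clock::time_point now = Clock::now();
    if (now >= deadline)
        return std::chrono::milliseconds(0);

    const auto remaining =
        std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
    std::this_thread::sleep_until(deadline);
    return remaining;
}
```

The application would record `Clock::now()` when it receives the wake-up notification and call `waitForSettle` with that timestamp just before the context is created. Measuring how the hang rate changes with different delays would also give useful data for this report.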


We are using the RDMACudaControl.cpp provided by NVIDIA.


The problematic call in the code below is marked with the symbol <<<<<<<<<<<<<<<<<.


Memory allocation

if (!checkCUresult(cuDeviceGet(&m_CudaDevices[0], TCC_DEVICE)))
{
    const char* msg = "Could not get cuda devices(TCC Mode)";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}

if (!checkCUresult(cuCtxCreate(&m_CudaContexts[0], CU_CTX_MAP_HOST | CU_CTX_SCHED_SPIN, m_CudaDevices[0]))) // <<<<<<<<<<<<<<<<<<<<<< Stopping at that function
{
    const char* msg = "Could not create cuda contexts";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}

if (!checkCUresult(cuCtxPushCurrent(m_CudaContexts[0])))
{
    const char* msg = "Failed to push cuda context";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Push cuda context");

if (!checkCUresult(cuMemAlloc(&m_DstMem, memSize)))
{
    const char* msg = "Could not allocate device memory on destination";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Allocate device memory on destination (size: %d)", memSize);

unsigned int flag = 1;
// As we are directly copying from the memory, make sure that the operation is synchronous
if (!checkCUresult(cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, m_DstMem)))
{
    const char* msg = "Could not set device memory attributes";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Set device memory attributes (%d)", CU_POINTER_ATTRIBUTE_SYNC_MEMOPS);

if (!checkCUresult(cuCtxPopCurrent(&m_CudaContexts[0])))
{
    const char* msg = "Failed to pop cuda context";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Pop cuda context");
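One detail in the allocation code above may be worth ruling out: every failure between cuCtxPushCurrent and cuCtxPopCurrent throws, so the pop is skipped on those paths and the context stays pushed. Whether this is related to the hang is not known. The sketch below shows one way to keep push and pop paired with an RAII guard. `ScopedPushPop`, `g_depth`, and `allocateOrThrow` are hypothetical names, and plain counters stand in for the CUDA calls so the example can run without a GPU.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: runs `enter` on construction and `leave` on
// destruction, so the "pop" step still runs when the code in between throws.
class ScopedPushPop
{
public:
    ScopedPushPop(const std::function<void()>& enter, std::function<void()> leave)
        : m_leave(std::move(leave))
    {
        enter(); // if this throws, the destructor never runs, so no pop happens
    }

    ~ScopedPushPop() { m_leave(); } // `leave` must not throw

    ScopedPushPop(const ScopedPushPop&) = delete;
    ScopedPushPop& operator=(const ScopedPushPop&) = delete;

private:
    std::function<void()> m_leave;
};

// Stand-in for the context stack depth (assumption: replaces the real
// cuCtxPushCurrent / cuCtxPopCurrent calls for illustration only).
int g_depth = 0;

// Stand-in for the allocation sequence; `fail` simulates a failed step.
void allocateOrThrow(bool fail)
{
    ScopedPushPop guard([] { ++g_depth; }, [] { --g_depth; });
    if (fail)
        throw std::runtime_error("Could not allocate device memory on destination");
    // ... further steps would go here while the context is pushed ...
}
```

In the real code, `enter` would wrap the cuCtxPushCurrent check and `leave` would call cuCtxPopCurrent and only log on failure, since throwing from a destructor is not safe.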


Memory release

if (!checkCUresult(cuCtxPushCurrent(m_CudaContexts[0])))
{
    const char* msg = "Failed to push cuda context";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Push cuda context");

if (!checkCUresult(cuMemFree(m_DstMem)))
{
    const char* msg = "Failed to free device memory";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Free device memory");

if (!checkCUresult(cuCtxPopCurrent(&m_CudaContexts[0])))
{
    const char* msg = "Failed to pop cuda context";
    m_logger.PrintLogMsg(msg);
    throw std::exception(msg);
}
m_logger.PrintLogMsg("Pop cuda context");


Please review the problem in detail.