Creation and Cleanup of CUcontext

frenzi · December 6, 2022, 4:11am

I’m developing a vision application with VPI on the Jetson platform. Utilisation is relatively low, so in theory I should be able to handle multiple streams. However, when I multithread (with an additional VPI context per thread), each thread frequently waits on another for CUDA calls (malloc, copy, stream sync, etc.), resulting in approximately the same throughput. I can’t reduce these calls, as they are all on VPI’s side of things (vpiImageCreateView or vpiImageLockData, even though everything is exclusively flagged as CUDA backend).

Therefore, to prevent all this waiting, I’ve tried giving each thread its own CUDA context. This significantly increases throughput. I presume I should clean the context up when the thread finishes running; however, when I try to use cuCtxDestroy, I get either a segfault or cudaErrorContextIsDestroyed/cudaErrorDeviceUninitialized. I ensure that the vision engine is destroyed before the context is destroyed (it’s in an inner scope), so I’m 99% sure that there’s no CUDA-allocated memory left dangling around, unless VPI has some leftovers (vpiContextDestroy is the last thing I call in my engine’s destructor).

I don’t get any errors when omitting cuCtxDestroy, but I assume I’ve just leaked a context or something, so things will go wrong if I create and destroy many of these engines. Example structure is shown below.

void thread()
{
    CUcontext ctx;
    CHECK_CU_STATUS(cuCtxCreate(&ctx, CU_CTX_MAP_HOST, 0));
    {
        // Inner scope: the engine (and its VPI context) is destroyed
        // before the CUDA context below.
        Engine engine{};
        engine.run();
    }
    cuCtxDestroy(ctx);
}

frenzi · December 18, 2022, 10:30pm

No Ideas???

njuffa · December 18, 2022, 11:15pm

Questions that relate to one of NVIDIA’s integrated embedded platforms usually receive faster and / or better answers in the sub-forums dedicated to those platforms. I would therefore suggest to start here: Jetson & Embedded Systems - NVIDIA Developer Forums

I have never used CUDA’s driver-level API, and I don’t know what VPI is, so I won’t even speculate on an answer.

Robert_Crovella · December 18, 2022, 11:22pm

Sorry, no ideas. I find that people who provide a short, complete test case are more likely to get useful feedback. The 6 or so lines of code that you provided that I cannot compile do not shed any light on it at all, for me.

The profilers can provide API tracing functionality. I guess if I thought I owned a context, meaning I had created it, and then when I went to destroy it, I got an error message saying it’s already destroyed, that would seem weird. At that point I might try to use profiler API tracing to count context destroy calls and see in what parts of my code they are occurring.

But I really have no idea what is happening in your case.

Good luck!

frenzi · December 18, 2022, 11:42pm

Thanks, I guess I could make a minimal working example. I just wanted to show the structure of the context creation and deletion I’m trying, in case there was anything obviously incorrect in the usage. I could make a minimal example and see if I get a reproducible error (rather than the bucket load of proprietary code I can’t share). Maybe I’m missing a cudaFree somewhere for all I know (I’ve quadruple checked resource creation and deletion, maybe VPI is missing a cudaFree…).

Ideally there also wouldn’t be so many cudaMalloc/cudaFree calls with vpiImageCreateView(); that’s somehow the bottleneck of the code, but that’s just me complaining haha (I guess that’s creating GPU-allocated space to store the addresses and strides of the view?)

frenzi · January 3, 2023, 4:52am

engine.hpp (1.2 KB)
engine.cpp (691 Bytes)
main.cpp (111 Bytes)
CMakeLists.txt (513 Bytes)

Minimal working example is attached, which has the following output for me:

Hello, world from main!
Starting Engine
Thread 0 starting
Stopping Engine
Thread 1 starting
Worker 0 Created
Worker 0 Destroyed
Worker 1 Created
Worker 1 Destroyed
Thread 0 ending
Thread 1 ending
Engine Stopped
[WARN ] 2023-01-03 15:50:45 (cudaErrorInvalidResourceHandle)
[ERROR] 2023-01-03 15:50:45 Error destroying cuda device: 0 [non-printable bytes]
[WARN ] 2023-01-03 15:50:45 (cudaErrorContextIsDestroyed)
[WARN ] 2023-01-03 15:50:45 (cudaErrorContextIsDestroyed)
[WARN ] 2023-01-03 15:50:45 (cudaErrorContextIsDestroyed)
[WARN ] 2023-01-03 15:50:45 (cudaErrorContextIsDestroyed)
..... and more

frenzi · January 3, 2023, 6:02am

In my main code, if I use a CUDA context I get segfaults when trying to recreate a worker class for the second time in one thread; after removing it there are no issues. I tried to use compute-sanitizer, but it spits the dummy before it even tries to run my code (and says no issues after running it). This is a barebones install of the latest JetPack on a Xavier NX; I don’t know what else I’m meant to do to run compute-sanitizer correctly (other than compute-sanitizer --flags exe).

========= Internal Sanitizer Error: Failed to initialize mobile debugger interface. Please check that /dev NVIDIA nodes have the correct permissions
========= 
========= Internal Sanitizer Error: Device not supported. Please refer to the "Supported Devices" section of the sanitizer documentation
=========

Robert_Crovella · January 3, 2023, 4:14pm

see here

I don’t have a jetson device to run on, nor a system set up with VPI. So it may be a while before I look at this. You may get more/better response by asking on the relevant jetson forum. If I had to guess, VPI may have some default handling of CUDA contexts, for example if a context is already created and current to the calling thread, then VPI create context may just use that one rather than creating a new one. (that is just a guess). If that were the case, it would explain the behavior. However I didn’t find any documentation to that effect.

FWIW, after a quick perusal of VPI sample codes and examples, I didn’t find any that did this:

cuda context create
VPI context create
VPI context destroy
cuda context destroy

frenzi · January 3, 2023, 9:42pm

I found that without an additional CUDA context, each thread would still wait for the others to finish cudaFree/cudaMalloc when analysing with Nsight Systems. When creating a CUDA context for each thread, they weren’t blocked by each other and benchmark time decreased by 30%.

When poking around to see what triggers these errors, vpiContextCreate and cudaMalloc/cudaFree in my code run fine, but if I add any other VPI API call, such as image, stream, or payload creation (and the respective destroy), I get these errors.

I wasn’t sure if I was using the driver API incorrectly or something. I haven’t used it before, and I couldn’t really find any examples online, so that’s why I posted here. If it’s a VPI problem, I could repost in the Jetson forums.

frenzi · January 17, 2023, 1:21am

Solved here: an extra call to vpiContextSetCurrent is required to bind the VPI context to the thread. The overall structure is as follows:

CUcontext cuCtx;
cuCtxCreate(&cuCtx, CU_CTX_MAP_HOST, 0);

VPIContext vpiCtx;
vpiContextCreate(VPI_BACKEND_CUDA, &vpiCtx);
vpiContextSetCurrent(vpiCtx);

... do work ...

vpiContextDestroy(vpiCtx);
cuCtxDestroy(cuCtx);
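
For anyone wanting the teardown order enforced by scope rather than by hand, here is a minimal sketch of the same structure using a scope guard. It is only an illustration under assumptions: ScopeGuard and run_worker are hypothetical names, not part of CUDA or VPI, and the real API calls are replaced by log entries (with the real calls shown in comments) so that the snippet compiles without the CUDA or VPI headers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs a cleanup callable when it goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {
        assert(cleanup_ && "cleanup callable must not be empty");
    }
    ~ScopeGuard() { cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Hypothetical per-thread worker. Each log entry stands in for the real
// call named in the adjacent comment; nothing here touches the GPU.
void run_worker(std::vector<std::string>& log) {
    // Real code: cuCtxCreate(&cuCtx, CU_CTX_MAP_HOST, 0);
    log.push_back("cuCtxCreate");
    // Real code: cuCtxDestroy(cuCtx);
    ScopeGuard cuGuard([&log] { log.push_back("cuCtxDestroy"); });

    // Real code: vpiContextCreate(VPI_BACKEND_CUDA, &vpiCtx);
    log.push_back("vpiContextCreate");
    // Real code: vpiContextDestroy(vpiCtx);
    ScopeGuard vpiGuard([&log] { log.push_back("vpiContextDestroy"); });

    // Real code: vpiContextSetCurrent(vpiCtx);
    log.push_back("vpiContextSetCurrent");

    // Real code: create VPI streams/images and process frames here.
    log.push_back("do work");

    // Guards run in reverse declaration order on scope exit:
    // vpiGuard first (VPI context), then cuGuard (CUDA context).
}
```

Because C++ destroys locals in reverse order of declaration, the VPI context is always released before the CUDA context it was created under, including on early returns or exceptions. In real code the guards would also want to check the returned CUresult / VPIStatus values.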

