Nvinferserver always allocates memory on GPU ID 0 and ignores gpu_ids configuration

When using nvinferserver on a multi-GPU server, configuring gpu_ids to anything other than [0] still allocates memory on GPU ID 0, in addition to the specified GPU.

• Hardware Platform: dGPU
• DeepStream Version: 6.2-triton docker
• NVIDIA GPU Driver Version: 525.105.17
• Issue Type: bug
• How to reproduce the issue?

Reproduce with the following pipeline on a server with e.g. 4 GPUs in deepstream 6.2-triton docker:

export USE_NEW_NVSTREAMMUX=yes
export VIDEO=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264

gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=2 ! mux.sink_0 nvstreammux name=mux ! nvinferserver config-file-path="./config_triton_grpc_infer.txt" ! fakesink

Notice nvv4l2decoder gpu-id=2 sets decoding to GPU ID 2. Set gpu_ids: [2] in config_triton_grpc_infer.txt to match that.
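For reference, the only change in config_triton_grpc_infer.txt should be the gpu_ids field inside the infer_config block (all other fields left as shipped; layout as in the sample configs):

    infer_config {
      gpu_ids: [2]
      ...
    }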

This results in 144 MB allocated on GPU 2, but an additional 102 MB on GPU 0:


For nvv4l2decoder gpu-id=1 and nvinferserver gpu_ids: [1]


For nvv4l2decoder gpu-id=0 and nvinferserver gpu_ids: [0]

No additional memory is allocated now, since both elements are on GPU ID 0 anyway.


For nvv4l2decoder gpu-id=2 and nvinferserver gpu_ids: [1]

0:00:00.653846585  2030 0x55d37a57e980 WARN           nvinferserver gstnvinferserver.cpp:628:gst_nvinfer_server_submit_input_buffer:<nvinferserver0> error: Memory Compatibility Error:Input surface gpu-id doesn't match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used
0:00:00.653882336  2030 0x55d37a57e980 WARN           nvinferserver gstnvinferserver.cpp:628:gst_nvinfer_server_submit_input_buffer:<nvinferserver0> error: surface-gpu-id=2,nvinferserver0-
[ERROR push 333] push failed [-5]

This is an expected outcome, because the buffers are on GPU 2 while nvinferserver is configured for GPU 1.


To cross-check that nvinferserver is the plugin allocating the additional memory on GPU 0, we omit it from the pipeline:

gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=3 ! mux.sink_0 nvstreammux name=mux ! fakesink

This only allocates memory on GPU ID 3.

• Requirement details

Specifying gpu_ids should not additionally allocate memory on GPU ID 0. Buffers should stay on the specified GPU, and processing should also take place only on that GPU.

If possible, please confirm the issue and provide a workaround for DeepStream 6.2.

Did you modify config_triton_grpc_infer.txt? If yes, please share the configuration file.

The problem is not related to our config. Please use the following DeepStream samples to reproduce the same issue (everything is already shipped in the docker image):

# Start deepstream triton docker, allow all gpus
docker run --gpus all -it -e CUDA_CACHE_DISABLE=0 nvcr.io/nvidia/deepstream:6.2-triton

To start a valid tritonserver follow the sample instructions:

# Init the sample triton model repo
cd /opt/nvidia/deepstream/deepstream/samples
./prepare_ds_triton_model_repo.sh
# Start tritonserver with this sample repo
tritonserver --model-repository=/opt/nvidia/deepstream/deepstream/samples/triton_model_repo

Attach a new bash to the same docker container via docker exec -it CONTAINER-ID /bin/bash and run:

export USE_NEW_NVSTREAMMUX=yes
export VIDEO=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
export MODEL_CONFIG=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/config_infer_plan_engine_primary.txt

gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=0 ! mux.sink_0 nvstreammux name=mux ! nvinferserver config-file-path=$MODEL_CONFIG ! fakesink

This should give you a perfectly working pipeline that runs until the video file ends.

Now, to observe the actual issue, open the sample model config with nano $MODEL_CONFIG and change gpu_ids: [0] to gpu_ids: [2]. Also change the pipeline above to use nvv4l2decoder gpu-id=2.

So the new pipeline to observe the issue:

gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=2 ! mux.sink_0 nvstreammux name=mux ! nvinferserver config-file-path=$MODEL_CONFIG ! fakesink

See nvidia-smi now on the same server while this pipeline runs:

The 102 MB of additional memory on GPU 0 is a critical issue. nvv4l2decoder is set to GPU 2 and nvinferserver is set to GPU 2, yet memory is still allocated on GPU 0. Every single DeepStream pipeline that uses nvinferserver will allocate 102 MB on GPU ID 0, so the VRAM of GPU 0 becomes the overall bottleneck of the system and the other GPUs cannot be fully utilized. We need a fix or workaround before we can use the nvinferserver plugin in production in its current state.

Furthermore, I don’t understand why nvinferserver would need 100 MB of VRAM on GPU 0 in the first place. Copying the buffers from GPU 2 to GPU 0 for preprocessing clearly makes no sense given the statement in the nvinferserver documentation:

[screenshot of the relevant statement from the nvinferserver documentation]

Thanks for sharing. I can reproduce this issue; it is related to nvinferserver, and we are investigating. BTW, the nvinferserver plugin is open source.
This command will use one GPU:
gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=2 ! mux.sink_0 nvstreammux name=mux ! fakesink
This command will use two GPUs:
gst-launch-1.0 filesrc location=$VIDEO ! h264parse ! nvv4l2decoder gpu-id=2 ! mux.sink_0 nvstreammux name=mux ! nvinferserver config-file-path=$MODEL_CONFIG ! fakesink

Thanks for addressing this issue. I wasn’t aware the gst-nvinferserver plugin was open source; that’s great! Where can I find the sources? We may work on a workaround on our own in the meantime.

In the DeepStream SDK: /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinferserver/
and /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinferserver/


We couldn’t locate the issue in the source code of the plugin. Is there an update on Nvidia’s end?

Here is a bug that needs to be fixed; cudaSetDevice needs to be called to set the gpuId:

TrtISBackend::specifyInputDims {
    int gpuId = 0;
    UniqCudaTensorBuf tensor = createGpuTensorBuf(
        dims.dims, layer->dataType, dims.batchSize, name, gpuId, false);
}
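
For reference, a minimal sketch of how such a fix could look at that call site (the accessor for the configured GPU ID is hypothetical and error handling is omitted; the actual nvdsinferserver code may differ):

    int prevDev = 0;
    cudaGetDevice(&prevDev);              // remember the currently active device
    int gpuId = getConfiguredGpuId();     // hypothetical accessor for the first entry of gpu_ids
    cudaSetDevice(gpuId);                 // allocations below now target the configured GPU
    UniqCudaTensorBuf tensor = createGpuTensorBuf(
        dims.dims, layer->dataType, dims.batchSize, name, gpuId, false);
    cudaSetDevice(prevDev);               // restore the previously active device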

Hi @fanzh

We have changed the gpuId to 1 to verify, but the memory is still allocated on GPU 0.

    std::cout << "I WAS HERE" << std::endl;

    int gpuId = 1;
    CONTINUE_CUDA_ERR(
        cudaGetDevice(&gpuId), "CudaDeviceMem failed to get dev-id:%d", gpuId);

We can see the debug output, so the change is in effect. 102 MB of memory is still on GPU 0, though.

We have read the documentation on cudaGetDevice: it returns gpuId 0 because that seems to be the device in use for the current context, so it overwrites our value of 1.
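
That matches the documented behaviour: cudaGetDevice reports whichever device is currently set for the calling host thread, and that defaults to 0 unless cudaSetDevice has been called. A minimal standalone illustration (plain CUDA runtime API, nothing nvinferserver-specific; it needs at least two GPUs to run as shown):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int dev = 1;                              // the GPU we intended to use
        cudaGetDevice(&dev);                      // overwritten with the thread's current device
        printf("current device: %d\n", dev);      // prints 0 on a fresh thread

        cudaSetDevice(1);                         // explicitly select GPU 1 instead
        cudaGetDevice(&dev);
        printf("current device: %d\n", dev);      // now prints 1
        return 0;
    }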

So skipping the check we set GPU ID 1 directly:

    int gpuId = 1;
    // CONTINUE_CUDA_ERR(
    //     cudaGetDevice(&gpuId), "CudaDeviceMem failed to get dev-id:%d", gpuId);

    SharedBatchArray allInputs = std::make_shared<BaseBatchArray>();
    for (const auto& in : shapes) {
        ...
        UniqCudaTensorBuf tensor = createGpuTensorBuf(
            dims.dims, layer->dataType, dims.batchSize, name, gpuId, false);
        RETURN_IF_FAILED(
            tensor, NVDSINFER_CUDA_ERROR, "failed to create GPU tensor buffer.");
        ...
    }

But now we have CUDA contexts on GPUs 0, 1 and 2 for a pipeline that, according to the config, only uses GPU 2.

We have attached a debug message to createGpuTensorBuf, and get the following output:

CREATING GPU BUFFER ON 1
CREATING GPU BUFFER ON 2
CREATING GPU BUFFER ON 2

So apparently no buffers are allocated on GPU 0 there, but the 102 MB of memory is still present.
The buffer on GPU 1 is from the lines of code above; the buffers on GPU 2 seem to be using the config correctly.
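
As an extra sanity check, the CUDA runtime can be asked directly which device a given allocation belongs to (a generic debugging sketch, not part of nvdsinferserver; buf stands for the raw device pointer held by the tensor buffer):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Prints the GPU that a CUDA device allocation resides on.
    void printBufferDevice(const void* buf) {
        cudaPointerAttributes attr{};
        if (cudaPointerGetAttributes(&attr, buf) == cudaSuccess &&
            attr.type == cudaMemoryTypeDevice) {
            printf("buffer resides on GPU %d\n", attr.device);
        }
    }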

@fanzh please advise on how to proceed here, and whether we need to raise this with our direct contacts at Nvidia to prioritize it.
And thanks for the helpful pointers so far.

int gpuId = 1;
cudaSetDevice(gpuId);
Please try this; all GPU usage bugs need to be found and fixed.
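
For anyone patching this locally, one systematic way to cover all such call sites is a small RAII device guard (a sketch under the assumption that each call site knows the configured GPU ID; this helper is not part of the shipped nvdsinferserver sources):

    #include <cuda_runtime.h>

    // Switches to the requested GPU for the lifetime of the object and
    // restores the previously active device on destruction.
    class ScopedCudaDevice {
    public:
        explicit ScopedCudaDevice(int gpuId) {
            cudaGetDevice(&m_prev);
            cudaSetDevice(gpuId);
        }
        ~ScopedCudaDevice() { cudaSetDevice(m_prev); }
    private:
        int m_prev = 0;
    };

    // Usage around an allocation, e.g.:
    //   ScopedCudaDevice guard(configuredGpuId);   // configuredGpuId: hypothetical, from gpu_ids
    //   UniqCudaTensorBuf tensor = createGpuTensorBuf(...);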

There are quite a lot of occurrences of cudaSetDevice calls in the code.

How long would it take to create a patch for this? Is this something we can get support for? Happy to go through the appropriate channels. We need a solution here for multiple customer projects.

We could work with a git patch that we apply for now, it doesn’t have to be included in a release.

workaround.txt (1.2 KB)
Please try this workaround code in DS 6.2. In particular, please rebuild nvdsinferserver and replace the old /opt/nvidia/deepstream/deepstream/lib/libnvds_infer_server.so.

@philipp.schmidt does the code above work? I tested it on a T4 with nvcr.io/nvidia/deepstream:6.2-triton and it works fine.

Hello @fanzh, thanks for the quick help. I will have the opportunity to try in a few hours and let you know asap. Thanks!

Hello @fanzh

I can confirm the patch works; thanks for the solution and the great support.
Attached is the git patch in case somebody wants to apply it directly with git apply.

libnvdsinferserver.patch (1.5 KB)

Will this fix make it into a DS release anytime soon?

Also really great that this is open source and we can just apply a patch. Thumbs up for that.
