Multiple GPU

• Hardware Platform (Jetson / GPU) NVIDIA A2
• DeepStream Version 6.3
• TensorRT Version 8.4.0
• NVIDIA GPU Driver Version (valid for GPU only) 535.129.03

Hello,

I have two NVIDIA A2 GPUs. I am trying to run deepstream_test_3.py with nvinferserver. When I set gpu_ids: [0], it works fine and the probe function can access the objects. But when I set gpu_ids: [1], the probe function reports that no objects were detected.


def tiler_sink_probe(pad, info, u_data):
    # Assumed imports for this excerpt: pyds, Gst (from gi.repository),
    # datetime (from datetime), plus the app-level perf_data helper and the
    # get_tensor_meta / get_age_gender functions.
    frame_number = 0
    num_rects = 0
    number_of_objects_in_batch = 0
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        number_of_objects_in_batch += frame_meta.num_obj_meta
        frame_number = frame_meta.frame_num
        l_obj = frame_meta.obj_meta_list
        num_rects = frame_meta.num_obj_meta

        # Convert the NTP timestamp (nanoseconds) to a formatted datetime string
        timestamp_seconds = frame_meta.ntp_timestamp / 1e9
        dt_object = datetime.utcfromtimestamp(timestamp_seconds)
        formatted_date_time = dt_object.strftime('%Y-%m-%d %H:%M:%S.%f')

        print(f"Number of objects in the frame: {num_rects}")
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            obj_user_meta_list = obj_meta.obj_user_meta_list
            while obj_user_meta_list is not None:
                try:
                    user_meta = pyds.NvDsUserMeta.cast(obj_user_meta_list.data)
                except StopIteration:
                    break
                if (user_meta.base_meta.meta_type
                        == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META):
                    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                    if tensor_meta.unique_id == 3:
                        tensor = get_tensor_meta(26, tensor_meta)
                        age, gender = get_age_gender(tensor)
                        print(f"Age: {age}, Gender: {gender}")
                    else:
                        print('Unknown tensor meta id')
                try:
                    obj_user_meta_list = obj_user_meta_list.next
                except StopIteration:
                    break
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        # if not silent:
        #     print("Frame Number=", frame_number, "Time Stamp=", formatted_date_time, "Cam ID:", frame_meta.source_id)

        # Update frame rate through this probe
        stream_index = "stream{0}".format(frame_meta.pad_index)
        global perf_data
        perf_data.update_fps(stream_index)

        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    # print(f"No. objects in batch: {number_of_objects_in_batch}")
    return Gst.PadProbeReturn.OK


This line print(f"Number of objects in the frame: {num_rects}") always prints 0.

Here are my configs:
config.txt (1.3 KB)
config_triton_infer_primary_peoplenet.txt (1.2 KB)

Note:
I also set the memory type like this:

mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
streammux.set_property('nvbuf-memory-type', mem_type)
nvvidconv.set_property('nvbuf-memory-type', mem_type)
tiler.set_property('nvbuf-memory-type', mem_type)
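For reference, a small helper along these lines could apply the same gpu-id and nvbuf-memory-type to every element that supports them. This is only a sketch: the `configure_gpu` helper is my own, and the hard-coded enum value is an assumption — a real app should use `int(pyds.NVBUF_MEM_CUDA_UNIFIED)` instead.

```python
# Sketch only: pin every element that supports it to one GPU with unified
# memory. The literal 3 for NVBUF_MEM_CUDA_UNIFIED is an assumption; use
# int(pyds.NVBUF_MEM_CUDA_UNIFIED) in real code.
NVBUF_MEM_CUDA_UNIFIED = 3

def configure_gpu(elements, gpu_id, mem_type=NVBUF_MEM_CUDA_UNIFIED):
    """Set gpu-id and nvbuf-memory-type on every element that exposes them."""
    for elem in elements:
        # Not every element has both properties (e.g. some sinks lack
        # nvbuf-memory-type), so probe with find_property() before setting.
        for prop, value in (("gpu-id", gpu_id), ("nvbuf-memory-type", mem_type)):
            if elem.find_property(prop) is not None:
                elem.set_property(prop, value)
```

`find_property` and `set_property` are standard GObject methods on GStreamer elements, so this works on streammux, nvvidconv, tiler, etc. without listing properties per element.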

Thank you for describing your use case; we will analyze the problem as soon as possible.

I modified my code a bit, and now I am getting this error:

Error:Input surface gpu-id doesnt match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used

How can I change the input surface gpu-id and set it to 1? I want to run the models and the pipeline on GPU_1.

Later, I want to run some plugins on gpu_0 and some others on gpu_1.

You should not change this value, but rather unify the memory of all plugins to NVBUF_MEM_CUDA_UNIFIED.

Okay, to narrow this issue down, I modified deepstream_test_3.py to be nvstreammux + nvurisrcbin + nvinferserver + sink and removed everything else.

I couldn’t set the memory of nvinferserver to NVBUF_MEM_CUDA_UNIFIED since it doesn’t have that property. Any suggestions?

The config files of nvinferserver which is my pgie (detector), are attached above. I set gpu_ids: [1] and gpus: [1] in both files.

I’ll attach my code here too.
deepstream_test_3_simple.txt (15.3 KB)

This code runs, but I can’t fetch any objects in the probe function.

This is the command I use to run the pipeline:

python3 deepstream_test_3_simple.py -i file:///videos/cam1.mp4 --no-display --pgie nvinferserver -c /opt/nvidia/deepstream/deepstream-6.3/samples/triton_model_repo/peoplenet/config_triton_infer_primary_peoplenet.txt

About running the docker container, I built an image and created a container out of it. I am mounting some directories but the main command is:

docker run -it --name repro --gpus all tempds

Appreciate your support.

Okay, I tried to unify the memory and set the same gpu-id for all plugins.

I got this error:

Unable to set device in gst_nvstreammux_change_state
Unable to set device in gst_nvstreammux_change_state

And this is the output after exporting the log file.

0:00:00.163911663   226      0x2fb3390 WARN                 bin gstbin.c:2832:reset_state:<Stream-muxer> Failed to switch back down to NULL

Another trial:

The pipeline works fine if I don't set gpu-id in any plugin and change the gpu-id in the config file back (running the whole pipeline on GPU_0).

I used this command

export CUDA_VISIBLE_DEVICES=1

And this used GPU_1 instead of GPU_0, which is great!
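My understanding of why that works (my own summary, not from any doc quoted here): CUDA_VISIBLE_DEVICES filters and reorders the GPUs a process can see, so with CUDA_VISIBLE_DEVICES=1 the process's logical device 0 is physical GPU 1, and the unchanged gpu-id: 0 settings land on GPU_1. A toy sketch of that remapping:

```python
def logical_to_physical(visible_devices_env, logical_id):
    """Map a logical CUDA device index to a physical GPU index under
    CUDA_VISIBLE_DEVICES (None means all GPUs are visible, in order)."""
    if visible_devices_env is None:
        return logical_id
    visible = [int(x) for x in visible_devices_env.split(",")]
    return visible[logical_id]

# With CUDA_VISIBLE_DEVICES=1, the pipeline's gpu-id 0 lands on physical GPU 1:
print(logical_to_physical("1", 0))  # 1
```

The downside is that the whole process only sees the listed GPUs, which is exactly why this can't split one pipeline across both cards.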
But I want to run some plugins on GPU_0 and others on GPU_1 to balance the load.
When I set the gpu-id in the config files to GPU_1 (as I'll attach below), I get this error:

INFO: infer_trtis_backend.cpp:218 TrtISBackend id:3 initialized model: age_gender
0:00:00.998863206  1135      0x987b2c0 WARN           nvinferserver gstnvinferserver.cpp:412:gst_nvinfer_server_logger:<secondary-inference> nvinferserver[UID 3]: Warning from allocateResource() <infer_cuda_context.cpp:554> [UID = 3]: Attention !! Tensor pool size larger than max host tensor pool size: 64 Continuing with user settings
0:00:01.000407529  1135      0x987b2c0 WARN           nvinferserver gstnvinferserver_impl.cpp:360:validatePluginConfig:<primary-inference> warning: Configuration file batch-size reset to: 1
WARNING: infer_proto_utils.cpp:144 auto-update preprocess.network_format to IMAGE_FORMAT_RGB
0:00:01.000654170  1135      0x987b2c0 ERROR          nvinferserver gstnvinferserver.cpp:408:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 1]: Error in createNNBackend() <infer_trtis_context.cpp:223> [UID = 1]: InferTrtISContext failed to set cuda device(1) during creatingNN backend, cuda err_no:101, err_str:cudaErrorInvalidDevice
0:00:01.000670980  1135      0x987b2c0 ERROR          nvinferserver gstnvinferserver.cpp:408:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:79> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.000691600  1135      0x987b2c0 WARN           nvinferserver gstnvinferserver_impl.cpp:592:start:<primary-inference> error: Failed to initialize InferTrtIsContext
0:00:01.000698260  1135      0x987b2c0 WARN           nvinferserver gstnvinferserver_impl.cpp:592:start:<primary-inference> error: Config file path: /opt/nvidia/deepstream/deepstream-6.3/samples/triton_model_repo/peoplenet/config_triton_infer_primary_peoplenet.txt
0:00:01.000715100  1135      0x987b2c0 WARN           nvinferserver gstnvinferserver.cpp:518:gst_nvinfer_server_start:<primary-inference> error: gstnvinferserver_impl start failed
Warning: gst-library-error-quark: Configuration file batch-size reset to: 1 (5): gstnvinferserver_impl.cpp(360): validatePluginConfig (): /GstPipeline:pipeline0/GstNvInferServer:primary-inference
Error: gst-resource-error-quark: Failed to initialize InferTrtIsContext (1): gstnvinferserver_impl.cpp(592): start (): /GstPipeline:pipeline0/GstNvInferServer:primary-inference:
Config file path: /opt/nvidia/deepstream/deepstream-6.3/samples/triton_model_repo/peoplenet/config_triton_infer_primary_peoplenet.txt
Exiting app

These are the configs of the previous trial:
config.txt (493 Bytes)
config_triton_infer_primary_peoplenet.txt (1.2 KB)

@yuweiw Thank you for your support.

There is a bug in this scenario, we are investigating this.

Do you mean that the load-balancing feature itself has an issue that you already knew about, or that my scenario has an issue?

You can refer to our nvinferserver guide.
There may be some bugs when using multiple GPUs with nvinferserver, and we are analyzing this.

I have looked at that guide many times. My configs are actually written based on it and on some NVIDIA examples.

Okay, please let me know when it becomes possible to run some plugins on GPU_0 and the others on GPU_1.

It is not currently available, and we will analyze and solve this issue as soon as possible.

Thank you. Please ping me when it’s solved.

Good luck.

Hi @JoeShz, could you just change "gpus" to 1 in the config.pbtxt file and use an engine generated on GPU1? You can see if that meets your needs.

Hello @yuweiw

So should I set gpu in both config files?

And, how can I generate an engine using a specific GPU?

NO. The config_triton_infer_primary_peoplenet.txt is for nvinferserver; the config.pbtxt is for the Triton server. There is a bug in our code currently. Since it's open source, you can try the patch below to meet your needs.

src/utils/nvdsinferserver/infer_preprocess.cpp
NvDsInferStatus CropSurfaceConverter::resizeBatch(
    SharedBatchBuf& src, SharedBatchBuf& dst) {
    assert(m_ConverStream);
    InferDebug("NetworkPreprocessor id:%d resize batch buffer", uniqueId());
......

-   int devId = src->getBufDesc().devId;
+   int devId = dst->getBufDesc().devId;

......

Thank you @yuweiw for your response!!

Sorry for being late. I’ll try to modify that piece of code and see if it meets my requirements and get back to you.