nvbufsurftransform:cuInit failed : 100 on A4000 DGPU

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

DGPU

• DeepStream Version

6.2

• TensorRT Version

8.5.2

• NVIDIA GPU Driver Version (valid for GPU only)

535

• Issue Type (questions, new requirements, bugs)

We received the following error in the DeepStream 6.2 Docker image:

nvbufsurftransform:cuInit failed : 100

This happened 2 days ago. The system had been running properly until then. Could this be caused by a kernel update on Ubuntu 20.04? The kernel appears to have been auto-updated 4 days ago, from 5.15 to 6.5.

Is that the cause of the error? No other CUDA or NVIDIA driver packages were modified.

We don’t know.

Why does this error usually occur?

The log is quite low level. nvbufsurftransform is used in many modules to handle image scaling, so the direct cause is that image scaling failed, but that alone is not useful for debugging.
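One way to narrow this down is to call cuInit directly, outside of DeepStream. The sketch below assumes the number in the log is the raw CUresult returned by cuInit; in the CUDA driver API, 100 is CUDA_ERROR_NO_DEVICE, meaning no CUDA-capable device was detected. The helper names (`explain`, `try_cuinit`) are made up for illustration, and only a handful of result codes are listed.

```python
import ctypes

# A small subset of CUresult values from the CUDA driver API (cuda.h).
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def explain(code):
    """Map a CUresult integer to its symbolic name (hypothetical helper)."""
    return CU_RESULTS.get(code, f"unrecognized CUresult {code}")


def try_cuinit(libname="libcuda.so.1"):
    """Call cuInit(0) through ctypes.

    Returns the CUresult integer, or None if the driver library
    cannot be loaded at all (for example, no driver in the container).
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


if __name__ == "__main__":
    code = try_cuinit()
    if code is None:
        print("libcuda.so.1 could not be loaded")
    else:
        print(f"cuInit returned {code}: {explain(code)}")
```

Running this inside the container while the pipeline is failing shows whether the CUDA driver itself can see the GPU, independent of nvbufsurftransform.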

Thank you. Do you have any resource you could point us to, to debug this?

It is hard to give any instructions with just one error log.

We figured out the issue. It is related to issue #1730 in the NVIDIA/nvidia-docker GitHub repository, "NOTICE: Containers losing access to GPUs with error: 'Failed to initialize NVML: Unknown Error'".

Whenever we get the nvbufsurftransform error, nvidia-smi inside the container also fails with the NVML error quoted above.
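For anyone hitting the same symptom, a check along these lines can confirm whether the container has lost GPU access. This is only a sketch: it runs nvidia-smi inside the container and looks for the NVML message quoted in the GitHub issue title. The function names and the 10-second timeout are assumptions, not part of any NVIDIA tooling.

```python
import subprocess

# Message prefix quoted in the nvidia-docker issue title.
NVML_MARKER = "Failed to initialize NVML"


def nvml_lost(output):
    """Return True if nvidia-smi output contains the NVML failure message."""
    return NVML_MARKER in output


def run_nvidia_smi(timeout=10):
    """Run nvidia-smi and return its combined stdout and stderr.

    Returns None if the binary is missing, cannot be executed, or hangs.
    """
    try:
        proc = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    output = run_nvidia_smi()
    if output is None:
        print("nvidia-smi could not be run in this environment")
    elif nvml_lost(output):
        print("Container appears to have lost GPU access (NVML init failed)")
    else:
        print("nvidia-smi ran without the NVML initialization error")
```

Running it periodically, or right after the cuInit error appears, makes it easier to tell this container-level problem apart from a fault in the pipeline itself.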
