YOLOv3 implementation error during inference

Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
[+] DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
[+] Linux
QNX
other

Hardware Platform
[+] NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
[+] 1.8.0.10363
other

Host Machine Version
[+] native Ubuntu 18.04
other

Hi team,

I am trying to implement YOLOv3 in DriveWorks 4.0 by modifying the object detection and tracking sample.
I followed the YOLOv3 TensorRT sample provided with TensorRT: I converted the model to ONNX format and then to a .bin file using the TensorRT optimization tool.
However, I am getting an 'illegal memory access' error. Following that, I ran cuda-memcheck; the result is attached.
I did not get any error while creating the TensorRT-optimized model, and the sample is able to initialize the DNN and return the output blob sizes, etc.
What could be the reason for this error?
I am attaching main.cpp, the cuda-memcheck result, and the data conditioner parameters used.

One thing to add: intermittently I was able to run the code and generate the output, but most of the time I got the error stated above.
cuda_mem_check_yolov3.txt (215.0 KB)
main.cpp (40.3 KB)
model32.bin.json (321 Bytes)
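For reference, the output buffers in my modified sample are sized from the blob dimensions the DNN reports, roughly like this (a simplified sketch, not the exact code in the attached main.cpp; the dwDNN_getOutputSize()/dwBlobSize usage and the CHECK_DW_ERROR/CHECK_CUDA_ERROR macros follow the stock DriveWorks sample, so please verify against the DW 4.0 headers):

// Simplified sketch: sizing device/host output buffers from the blob size the
// network reports. A mismatch here (a buffer smaller than
// batchsize * channels * height * width floats) is a typical cause of an
// "illegal memory access" later, during the device-to-host copy.
dwBlobSize outBlob{};
CHECK_DW_ERROR(dwDNN_getOutputSize(&outBlob, 0U /*output blob index*/, m_dnn));

size_t outElements = static_cast<size_t>(outBlob.batchsize) * outBlob.channels
                   * outBlob.height * outBlob.width;

float32_t* d_out = nullptr;
float32_t* h_out = nullptr;
CHECK_CUDA_ERROR(cudaMalloc(&d_out, outElements * sizeof(float32_t)));
CHECK_CUDA_ERROR(cudaMallocHost(&h_out, outElements * sizeof(float32_t)));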

Dear @nithin.m1,
Intermittently I was able to run the code and generate the output, but most of the time I got the error stated above.

So you could successfully load the model and perform inference a couple of times across multiple runs of the same executable (loading the same TensorRT model)?

Yes, but that was earlier. I have not changed the core code since then, other than adding some functions (i.e., I did not touch the input/output initialization/allocation or the inference part). Now I am getting this error every time.

Dear @nithin.m1,
Is the issue now occurring during TensorRT model loading? Were you able to narrow down which line/API/function is hitting the issue?

Hi Siva,

While running the code, right after the inference call (dwDNN_inferRaw) we copy the outputs from device to host, and that is where I get the 'illegal memory access' error. That is why I ran cuda-memcheck and attached the result.
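To illustrate where it fails, the post-inference copy looks roughly like this (a simplified sketch with hypothetical buffer and stream names, not the exact code in main.cpp):

// Simplified sketch of the failing step. dwDNN_inferRaw() launches work
// asynchronously, so an out-of-bounds access inside the network kernels is
// often only reported by the next CUDA call, which here is the
// device-to-host copy of the output blob.
CHECK_DW_ERROR(dwDNN_inferRaw(d_outputs, d_inputs, 1U /*batch size*/, m_dnn));

cudaError_t err = cudaMemcpyAsync(h_output, d_output, outputSizeBytes,
                                  cudaMemcpyDeviceToHost, m_cudaStream);
// Synchronizing immediately after inference helps narrow down whether the
// illegal access comes from the inference kernels or from the copy itself.
if (err == cudaSuccess)
    err = cudaStreamSynchronize(m_cudaStream);
if (err != cudaSuccess)
    std::cerr << "post-inference copy failed: " << cudaGetErrorString(err) << std::endl;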

Dear @nithin.m1,

Commands

/usr/local/driveworks-4.0/tools/dnn/tensorRT_optimization --modelType=onnx --onnxFile=/usr/src/tensorrt/samples/python/yolov3_onnx/yolov3.onnx --out=~/yolov3.bin
./sample_object_detector_tracker_yolov3 --tensorRT_model=/home/sindarapu/yolov3.bin --video=</path/to/DW_data>/samples/sfm/triangulation/video_0.h264 

I see output like below

[22-06-2022 15:06:01] Platform: Detected Generic x86 Platform
[22-06-2022 15:06:01] TimeSource: monotonic epoch time offset is 1655885559259828
[22-06-2022 15:06:01] Platform: number of GPU devices detected 1
[22-06-2022 15:06:01] Platform: currently selected GPU device discrete ID 0
[22-06-2022 15:06:01] Context::mountResourceCandidateDataPath resource FAILED to mount from '/home/sindarapu/DW-4.0-Host/src/dnn/sample_object_detector_tracker_yolov3/data/': VirtualFileSystem: Failed to mount '/home/sindarapu/DW-4.0-Host/src/dnn/sample_object_detector_tracker_yolov3/data/[.pak]'
[22-06-2022 15:06:01] Context::findDataRootInPathWalk data/DATA_ROOT found at: /usr/local/driveworks-4.0/data
[22-06-2022 15:06:01] Context::mountResourceCandidateDataPath resource FAILED to mount from '/usr/local/driveworks-4.0/data/': VirtualFileSystem: Failed to mount '/usr/local/driveworks-4.0/data/[.pak]'
[22-06-2022 15:06:01] Context::findDataRootInPathWalk data/DATA_ROOT found at: /usr/local/driveworks-4.0/data
[22-06-2022 15:06:01] Context::mountResourceCandidateDataPath resource FAILED to mount from '/usr/local/driveworks-4.0/data/': VirtualFileSystem: Failed to mount '/usr/local/driveworks-4.0/data/[.pak]'
[22-06-2022 15:06:01] SDK: No resources(.pak) mounted, some modules will not function properly
[22-06-2022 15:06:01] TimeSource: monotonic epoch time offset is 1655885559259828
[22-06-2022 15:06:01] Initialize DriveWorks SDK v4.0.0
[22-06-2022 15:06:01] Release build with GNU 7.4.0 from no-gitversion-build
[22-06-2022 15:06:01] SensorFactory::createSensor() -> camera.virtual, offscreen=0,profiling=1,tensorRT_model=/home/sindarapu/yolov3.bin,video=/home/sindarapu/DW-4.0-Host/_data/samples/sfm/triangulation/video_0.h264
[22-06-2022 15:06:01] CameraVirtual: defaulting to non SIPL
[22-06-2022 15:06:01] CameraBase: pool size set to 8
[22-06-2022 15:06:01] CameraNVCUVID: no seek table found at /home/sindarapu/DW-4.0-Host/_data/samples/sfm/triangulation/video_0.h264.seek, seeking is not available.
SimpleCamera: Camera image: 1280x800
Camera image with 1280x800 at 30 FPS
[22-06-2022 15:06:02] Loaded engine size: 351 MB
[22-06-2022 15:06:02] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 154 MB, GPU 584 MB
[22-06-2022 15:06:03] [MemUsageChange] Init cuDNN: CPU +430, GPU +148, now: CPU 585, GPU 1084 (MB)
[22-06-2022 15:06:03] [MemUsageChange] Init cuBlas: CPU +58, GPU +44, now: CPU 643, GPU 1128 (MB)
[22-06-2022 15:06:03] Deserialize required 772327 microseconds.
[22-06-2022 15:06:03] [MemUsageSnapshot] deserializeCudaEngine end: CPU 643 MB, GPU 1110 MB
[22-06-2022 15:06:03] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 643, GPU 1120 (MB)
[22-06-2022 15:06:03] [MemUsageChange] Init cuBlas: CPU +0, GPU +8, now: CPU 643, GPU 1128 (MB)
[22-06-2022 15:06:03] DNN: Metadata json file could not be found. Metadata has been filled with default values. Please place <network_filename>.json in the same directory as the network file if custom metadata is needed.
m_largeIdx= 0m_mediumIdx= 1m_smallIdx= 2
[22-06-2022 15:06:03] DataConditioner: Scale transformation has been configured with 1.0000000.
[22-06-2022 15:06:03] DataConditioner: Mean value subtract transformation has been configured with {0.00000, 0.00000, 0.00000}.
[22-06-2022 15:06:03] DataConditioner: Standard deviation has been configured with {1.00000, 1.00000, 1.00000}.
m_detectionRegion.x= 0
bbox_score= 1.50015e-06
bbox_score= 1.19788e-07
bbox_score= 8.95024e-08
bbox_score= 8.56612e-08
bbox_score= 3.45667e-08
bbox_score= 2.07359e-11
bbox_score= 5.34504e-18
bbox_score= 1.32322e-26
bbox_score= 3.1229e-23
bbox_score= 1.39939e-23



I used the main.cpp provided in the description, and I see no memory access issues on my host.

Hi @SivaRamaKrishnaNV ,

Thanks for trying it out. Yes, it also runs fine for me when I convert the ONNX file you used to a .bin file and try that.
Something must have gone wrong when I converted the model to ONNX format, even though, as I mentioned, I just followed this: TensorRT/yolov3_to_onnx.py at main · NVIDIA/TensorRT · GitHub
Let me check on that.

Thank you,
Nithin

Dear @nithin.m1,
Please use the files at /usr/src/tensorrt/samples/python/yolov3_onnx.
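Regenerating the ONNX file from that sample and re-optimizing it would look roughly like this (the yolov3_to_onnx.py invocation and its arguments vary between TensorRT releases, so please check the README in that directory):

cd /usr/src/tensorrt/samples/python/yolov3_onnx
python3 yolov3_to_onnx.py        # writes yolov3.onnx from the Darknet cfg/weights; older releases may require Python 2
/usr/local/driveworks-4.0/tools/dnn/tensorRT_optimization --modelType=onnx --onnxFile=yolov3.onnx --out=~/yolov3.bin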