DNN Inference PX2

Hi everyone,

For a project, I want to build an application for the PX 2 that uses my own neural network.
The network is fed with images from a camera or video.
My problem is that the output of my network is not what I expect.
This is how I run the NN:

dwDNN_infer(dnn_Output, &dnn_Input, dnn);
cudaMemcpy(dnn_Output_H[0].get(), dnn_Output[0], sizeof(float32_t) * output_size[0], cudaMemcpyDeviceToHost);

The values in dnn_Output_H[0] are not at all what I expect as output from the network.
Am I missing something in the dwDNN_infer call? Is dnn_Output_H[0] the right variable to look at, or is it possible that the error is already in my NN?

Happy for any suggestions!

Dear bdevm

Did you review the “DNN Workflow” section (Development Guide -> DNNs -> DNN Workflow) in the DriveWorks documentation?

Dear StenvNV,

Yes, I did look into it.
Am I correct that the results (the output of my network) are stored in dnnOutputHost?

// Enqueue asynchronous copy of the inference results to host memory
cudaMemcpyAsync(dnnOutputHost[output1Index].data(), dnnOutputs[output1Index], sizeof(float32_t) * numElements1, cudaMemcpyDeviceToHost);
cudaMemcpyAsync(dnnOutputHost[output2Index].data(), dnnOutputs[output2Index], sizeof(float32_t) * numElements2, cudaMemcpyDeviceToHost);

If that is the case, something is wrong with my network :/
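One way to tell a network problem from a preprocessing problem is to run the same input through the training framework, dump its output, and compare it element by element with the buffer copied back from the device. Below is a minimal sketch of such a comparison on the host side. The names maxAbsDiff and outputsMatch and the default tolerance are assumptions made for illustration; they are not part of DriveWorks or TensorRT.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference between the
// inference output copied to the host and a reference output of equal size.
float maxAbsDiff(const std::vector<float>& actual,
                 const std::vector<float>& expected)
{
    assert(actual.size() == expected.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        worst = std::max(worst, std::fabs(actual[i] - expected[i]));
    }
    return worst;
}

// Hypothetical helper: true if every element is within `tolerance` of the
// reference. The default tolerance is an arbitrary illustrative value.
bool outputsMatch(const std::vector<float>& actual,
                  const std::vector<float>& expected,
                  float tolerance = 1e-3f)
{
    return maxAbsDiff(actual, expected) <= tolerance;
}
```

If the outputs differ by a large margin for an identical input image, the input preparation (layout, scaling, resize filter) is a likely suspect before the network itself.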

Hi bdevm

We had this exact same problem a while back: the results we got from our TensorFlow models did not match the results from either DriveWorks DNN or from running TensorRT directly via the C++ interface. In our case, with a model that took one input (an image from the camera feed, a Sekonix AR231-RCCB), we ended up writing our own CUDA pipeline to “prepare” the image (resize and crop), but the main thing was the conversion of the image from HWC to CHW format.

Once we used our own CUDA kernel for the CHW conversion and/or OpenCV for resize and crop, matching what was done in training, and converted the image ourselves (with DriveWorks we bypassed dwDataConditioner_prepareData and prepared the input ourselves), we saw matching results in both DriveWorks and TensorRT on the DPX2.

We also noticed that if the image is resized for training, it is important to use the same filter to resize it during inference, although this produces results that are slightly different from TF. We also noticed that OpenCV resize with the exact same filter gives slightly different results on the CPU and the GPU, so we had to stay consistent. Anyway, this resolved our issues, and now we get the same results as TensorFlow in both DriveWorks and plain TensorRT.
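To make the HWC to CHW step concrete, here is a minimal CPU reference sketch of the index mapping. It is not the CUDA kernel described above and not DriveWorks code; hwcToChw and its scale parameter are hypothetical names, and the assumption is an interleaved 8-bit input image that the network expects as planar floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved 8-bit image (HWC: height,
// width, channel) into a planar float buffer (CHW), multiplying each value
// by `scale` (for example 1/255 if the network was trained on [0, 1] inputs).
std::vector<float> hwcToChw(const std::vector<std::uint8_t>& hwc,
                            std::size_t height, std::size_t width,
                            std::size_t channels, float scale = 1.0f)
{
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                // HWC: channels of one pixel are adjacent in memory.
                const std::size_t src = (y * width + x) * channels + c;
                // CHW: each channel is a contiguous height x width plane.
                const std::size_t dst = (c * height + y) * width + x;
                chw[dst] = static_cast<float>(hwc[src]) * scale;
            }
        }
    }
    return chw;
}
```

A GPU version would compute the same src and dst indices per thread. Running a CPU reference like this on a small test image is one way to check a custom kernel before feeding its output to the network.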

I am facing the same problem (https://devtalk.nvidia.com/default/topic/1045548/yolo-tensorrt-model-on-sample_object_detector-with-weird-bounding-boxes/#5305269) when using YOLO v1 as my DNN. I do not think it is an error in the network; it seems to be related to DriveWorks inference. The output of dwDNN_infer or dwDNN_inferSIO looks strange to me. Does anyone know what the output contains?

By the way, dear servanti mentioned a CUDA pipeline. I want to make sure: dwDataConditioner_prepareData should not be used in the code, right? And the original image format is HWC, not CHW? So I have to write my own CUDA pipeline to do the conversion?

Hi imugly1029

As far as our tests showed, dwDataConditioner_prepareData did not do the HWC->CHW conversion, but I do not know the internal code, so maybe someone from NVIDIA can comment on this. Since we wanted to run both through the DriveWorks API and through plain C++/TensorRT without DriveWorks, we wrote our own CUDA kernels to resize, crop, etc. and to convert to CHW. We also developed another path that uses OpenCV for resizing and cropping, but we still do the CHW conversion and the conversion to float ourselves.

We have also compiled TensorFlow for aarch64 and can run inference with Python. In that case NumPy can do the CHW conversion (by transposing the axes; a plain reshape alone would not reorder the pixel data), but we are more interested in C++/TensorRT for performance reasons. TensorFlow was just an “intermediate” step.

Hi servanti!
I really appreciate the reply!
So that means each frame is originally in HWC format and we need to convert it to CHW for inference, right?

By the way, the TensorRT model produced by tensorRT_optimization is correct, right? I mean, the parameters inside the model should be the same as in the original Caffe/TensorFlow model?