Engine plan file incompatible?

Hello!
I was trying to run a sample project from DriveWorks and got this error:
The engine plan file is generated on an incompatible device, expecting compute 5.0got compute 6.1, please rebuild.

So I rebuilt the sample projects with the provided CMake file, but I get the same message. How is this engine plan file generated, and how should I rebuild it?

Dear yoannzh,
Could you please try generating the model with the TensorRT_Optimization tool in DW on the board?

Thanks Siva

Can you elaborate a bit on which model I need to regenerate, and with what command?
I only found two .caffemodel files in the DriveWorks folder.
BTW, this is DriveWorks 2.0, and I am trying to run the sample on the host.

Dear yoannzh,
Could you confirm whether it is the DRIVE AGX platform or DRIVE PX 2? The last release for DRIVE PX 2 includes DriveWorks 1.2.
If it is DRIVE AGX, please check the TensorRT_Optimization tool documentation (file:///usr/local/driveworks-2.0/doc/nvsdk_html/dwx_tensorRT_tool.html). This tool converts your Caffe/ONNX/UFF models to a TensorRT model that can be used inside the DW samples.
Also, please share your host system GPU details. You can run the deviceQuery sample in CUDA (/usr/local/cuda/samples/1_Utilities/deviceQuery).

Hello, Siva

It is DRIVE AGX, and I just want to run some sample code for starters. I don’t have any Caffe/ONNX/UFF model of my own yet. So my question is: in order to run, e.g., sample_dnn_plugin, I am regenerating the model from sample_mnist.caffemodel with tensorRT_optimization, but I am stuck on finding the right argument for --outputBlobs.

Here is the output of deviceQuery:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Quadro M2000M”
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4042 MBytes (4238802944 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1137 MHz (1.14 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
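
For readers who want to pull the relevant number out of a log like the one above, here is a minimal sketch. It assumes the deviceQuery text contains the line `CUDA Capability Major/Minor version number: X.Y` exactly as shown in this post; the function name `compute_capability` is made up for illustration.

```python
import re


def compute_capability(device_query_output):
    """Return (major, minor) parsed from deviceQuery text, or None if absent."""
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_output,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Two lines copied from the log in this post.
sample = (
    "Device 0: Quadro M2000M\n"
    "CUDA Capability Major/Minor version number: 5.0\n"
)
print(compute_capability(sample))  # (5, 0)
```

The value 5.0 is the same "compute 5.0" that appears in the engine plan error message at the top of the thread.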

Dear yoannzh,
First, you need at least a Pascal-based GPU on the host to run all DriveWorks samples (https://devtalk.nvidia.com/default/topic/1050024/development-environment-requirements-/). It seems you have a Maxwell-based GPU card.
To find the outputBlobs value, check the corresponding prototxt file (for mnist it is prob).
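
One way to read the output blob names out of a prototxt is to list every `top:` blob that no layer consumes as a `bottom:`. The sketch below does this with plain regular expressions rather than a real protobuf parser, so treat it as a rough heuristic. The toy network in `toy_prototxt` is hypothetical and is not the contents of sample_mnist.prototxt; the function name is also made up for illustration.

```python
import re


def candidate_output_blobs(prototxt_text):
    """List blobs that are produced (top) but never consumed (bottom)."""
    # Strip '#' comments so commented-out layers are ignored.
    text = re.sub(r"#.*", "", prototxt_text)
    tops = re.findall(r'\btop:\s*"([^"]+)"', text)
    bottoms = set(re.findall(r'\bbottom:\s*"([^"]+)"', text))
    outputs = []
    for name in tops:
        if name not in bottoms and name not in outputs:
            outputs.append(name)
    return outputs


# Hypothetical toy network, only to exercise the function.
toy_prototxt = """
input: "data"
layer { name: "ip1" type: "InnerProduct" bottom: "data" top: "ip1" }
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" }
layer { name: "prob" type: "Softmax" bottom: "ip2" top: "prob" }
"""
print(candidate_output_blobs(toy_prototxt))  # ['prob']
```

A known limitation: if the very last layer is in-place (its top equals its bottom), this heuristic misses it, so the result should be checked against the prototxt by eye.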

OK, I thought Maxwell was the minimum requirement and Pascal was recommended. Thanks!

Hello Siva
I think it might work on Maxwell, since I regenerated the model file for ‘sample_dnn_plugin’ and it worked. But when I regenerated the model file for sample_object_detector_tracker, I got the same problem as in this thread: https://devtalk.nvidia.com/default/topic/1044350/general/yolo-tensorrt-model-for-sample_object_detector/1

So I assume it is not related to GPU capability; I think it is a mismatch between the ‘prototxt’ and the sample code.

However, I wonder how this works with other samples such as ‘sample_drivenet’. From a quick look at the source code, I didn’t find any reference to model files like those in the two samples above. Since these samples also ask for a rebuild to match the compute capability, as mentioned above, how can I locate their model files and regenerate them? Thanks!

Dear yoannzh,
Just for clarification: a Maxwell GPU is the minimum, but we recommend Pascal-based GPUs to avoid issues. We have seen moving from Maxwell to Pascal solve issues for a few customers earlier.
The DriveWorks DNN APIs use an engine file generated with the TensorRT_Optimization tool. The engine files shipped by default might not work on your machine because of an architecture mismatch, so you need to regenerate them with the TensorRT_Optimization tool.
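
To make the architecture mismatch concrete, the sketch below parses the error message quoted at the top of the thread. It assumes, based on the deviceQuery result of 5.0 for this host, that "expecting" is the local GPU and "got" is the device the plan file was built for; that reading is an inference from this thread, not documented behavior. The function names are made up, and the pattern tolerates the missing space in "5.0got".

```python
import re

# Architecture names for the two compute-capability majors seen in this thread.
ARCH_BY_MAJOR = {5: "Maxwell", 6: "Pascal"}

_PATTERN = re.compile(
    r"expecting compute\s*(\d+)\.(\d+)\s*got compute\s*(\d+)\.(\d+)"
)


def parse_plan_mismatch(message):
    """Return ((device_major, device_minor), (plan_major, plan_minor)) or None."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    dev_major, dev_minor, plan_major, plan_minor = map(int, match.groups())
    return (dev_major, dev_minor), (plan_major, plan_minor)


def describe(message):
    """Summarize the mismatch in words (assumed reading, see lead-in)."""
    parsed = parse_plan_mismatch(message)
    if parsed is None:
        return "no compute-capability mismatch found"
    device_cc, plan_cc = parsed
    device_arch = ARCH_BY_MAJOR.get(device_cc[0], "unknown")
    plan_arch = ARCH_BY_MAJOR.get(plan_cc[0], "unknown")
    return (
        f"local GPU is {device_cc[0]}.{device_cc[1]} ({device_arch}); "
        f"plan file was built for {plan_cc[0]}.{plan_cc[1]} ({plan_arch})"
    )


# Error text copied verbatim from the first post.
error = (
    "The engine plan file is generated on an incompatible device, "
    "expecting compute 5.0got compute 6.1, please rebuild."
)
print(describe(error))
```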

Now, could you tell us which network you are trying to regenerate?

Hello Siva
As I mentioned in my last post, I regenerated the model file for ‘sample_dnn_plugin’ (sample_mnist.prototxt & sample_mnist.caffemodel) with the TensorRT_Optimization tool, and that sample now works well.
Now I am trying to get ‘sample_object_detector_tracker’ (predict.prototxt & weights.caffemodel) working the same way. But after I regenerated the model file (‘tensorRT_model.bin’) with the TensorRT_Optimization tool and ran the sample, I got the same error as in this thread: https://devtalk.nvidia.com/default/topic/1044350/general/yolo-tensorrt-model-for-sample_object_detector/1 Interestingly, that user was running the same sample project. After reading through that thread, I think the problem has nothing to do with GPU architecture; it seems to be a mismatch between the dimensions of the model output and what the CUDA sample code expects.
It would be helpful if you could reproduce this issue on your side: just regenerate ‘tensorRT_model.bin’ from ‘predict.prototxt’ & ‘weights.caffemodel’ with the TensorRT_Optimization tool, then run ‘sample_object_detector_tracker’ with the new bin file to see what happens.
Thanks
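
For anyone repeating the regeneration step described in this post, here is a hedged sketch that only assembles the command line and does not run it. Only `--outputBlobs` is named in this thread; the other flag names (`--modelType`, `--prototxt`, `--caffemodel`, `--out`) are assumptions recalled from the tool's documentation and should be verified against dwx_tensorRT_tool.html for your DriveWorks version. The tool path and the helper name are also placeholders.

```python
import shlex


def build_optimization_command(prototxt, caffemodel, output_blobs, out_path,
                               tool="./tensorRT_optimization"):
    """Assemble an argv list for the optimization tool (flag names assumed)."""
    if not output_blobs:
        raise ValueError("at least one output blob name is required")
    return [
        tool,
        "--modelType=caffe",                        # assumed flag name
        "--outputBlobs=" + ",".join(output_blobs),  # flag named in this thread
        "--prototxt=" + prototxt,                   # assumed flag name
        "--caffemodel=" + caffemodel,               # assumed flag name
        "--out=" + out_path,                        # assumed flag name
    ]


# File and blob names taken from the posts in this thread (mnist case).
cmd = build_optimization_command(
    "sample_mnist.prototxt",
    "sample_mnist.caffemodel",
    ["prob"],
    "tensorRT_model.bin",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running the tool on the same GPU that will later load the resulting file is the point of the exercise, since the plan file is tied to the device it was generated on.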

Dear yoannzh,
Could you tell us which predict.prototxt & weights.caffemodel you used with the TensorRT_Optimization tool for sample_object_detector_tracker? If they are from the DW package, could you share the location of those files?

Hello Siva

The problem is solved, which proves Maxwell is… OK?
However, I still have the same problem with other samples (e.g. drivenet). The CNN models used in these samples are packaged inside the DW library and are not exposed to us, so I guess there is no way to solve the issue for them? Or is there?

BR!

Dear eeywrj,
You need a Pascal-based host PC to run all DW samples on the host. Maxwell is not sufficient.
The sample_dnn_plugin sample worked for you because you regenerated the engine file for Maxwell. You need a Pascal-based host to run the other DNN samples.