Hi,
TLDR:
- Used convert-to-uff to convert ssd_inception_v2_coco (sampleUffSSD) and ssd_mobilenet_v1_coco to UFF (x86)
- Tried running both the sample_uff_ssd and sample_uff_ssd_debug binaries. This worked for the inception sample, but I got memory errors when running the mobilenet model. (Jetson TX2)
I am trying to get ssd_mobilenet_v1_coco (from the TensorFlow SSD model zoo) parsed in TensorRT. Comparing it to the UFF SSD example in TensorRT 4.0, these two models require the same plugin support to get them running (at least from what I have found). In fact, the internal operations inside the mobilenet model are supported by TensorRT 4.0: DepthwiseConv2dNative, Conv2D, and the underlying ‘batchnorm’ ops (which are really just Conv2D, Add, and Mul) are all supported. Comparing their pipeline.config files and overall architectures in Tensorboard, they appear to have the same plugin parameters.
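To illustrate the claim above that the frozen graph’s ‘batchnorm’ is just a per-channel mul and add (the primitive ops the parser actually sees), here is a minimal numpy sketch (not from the model itself; shapes and values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                 # activations, 8 channels
gamma, beta = rng.standard_normal(8), rng.standard_normal(8)
mean, var, eps = rng.standard_normal(8), rng.random(8) + 0.1, 1e-3

# Standard batch normalization...
bn = gamma * (x - mean) / np.sqrt(var + eps) + beta

# ...is equivalent to a per-channel scale (Mul) and shift (Add),
# which is why these ops pose no parsing problem for TensorRT.
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale
assert np.allclose(bn, x * scale + shift)
print("batchnorm folds into Mul + Add")
```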
Based on the above, I have reason to believe I should be able to use the mobilenet model in the same way as the inception uffSSD example in TensorRT (convert-to-uff, then run inference). I am able to successfully parse this model into UFF using the same approach as the ssd_inception_v2_coco example. When I run the inception SSD model, it works fine and gives me the correct output. When I try to do the same for mobilenet, I get a core dump right after the memory allocation for enqueuing the concat plugins. The Tensorboard output suggests that there is a one-to-one mapping between the concat layers of mobilenet and inception.
Running cuda-memcheck yields an interesting error for the mobilenet:
========= CUDA-MEMCHECK
========= Invalid global read of size 16
========= at 0x00000150 in void cuScale::scale<float, float, cuScale::Mode, bool=0, int=4, cuScale::FusedActivationType>(float const , cuScale::scale<float, float, cuScale::Mode, bool=0, int=4, cuScale::FusedActivationType>, cuScale::KernelParameters<cuScale::scale<float, float, cuScale::Mode, bool=0, int=4, cuScale::FusedActivationType>>, nvinfer1::cudnn::reduced_divisor, nvinfer1::cudnn, nvinfer1::cudnn)
========= by thread (63,0,0) in block (11,0,0)
========= Address 0xfc89bd9f8 is misaligned
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe770]
========= Host Frame:/usr/local/cuda-9.0/targets/aarch64-linux/lib/libcudart.so.9.0 [0xc984]
This error shows up 320 times in total before the program finally crashes.
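One detail worth noting about the log: the “Invalid global read of size 16” together with the int=4 template parameter suggests the cuScale kernel is doing vectorized float4 loads, which require 16-byte-aligned addresses on CUDA devices. The reported faulting address fails that check, as this quick sanity check shows (the alignment interpretation is my assumption, not confirmed by NVIDIA):

```python
# Address reported by cuda-memcheck in the log above.
addr = 0xFC89BD9F8

# A float4 (4 x 4-byte floats) load must start on a 16-byte boundary.
remainder = addr % 16
print(hex(addr), "offset into 16-byte boundary:", remainder)  # offset is 8, so misaligned
```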