TensorRT 4 - Cuda failure: 74Aborted (core dumped) - for ssd_mobilenet_v1

Hi,

I’ve been experimenting with TensorRT to optimize object detection, using custom models created with Google’s Object Detection API. Following the sampleUffSSD example, I was able to get ssd_inception_v2_coco_2018_01_28 working, and it works very well. Switching to ssd_mobilenet_v1_coco_2018_01_28, however, fails with the error posted below. I have compared the nodes/ops of both graphs, and at least the input and output nodes appear to be consistent. Please let me know if I am doing something wrong, or if you know of a workaround.

Error:

sudo ./sample_uff_ssd
../data/ssd/sample_ssd.uff
Begin parsing model...
End parsing model...
Begin building engine...
End building engine...
 Num batches  2
 Data Size  540000
*** deserializing
Cuda failure: 74Aborted (core dumped)

Note: I have also looked up the corresponding CUDA error code. It is defined as
cudaErrorMisalignedAddress = 74
"The device encountered a load or store instruction on a memory address which is not aligned. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. "

It turns out that changing the batch size from 2 to 1 fixed the problem. The modified code snippet from sampleUffSSD.cpp is given below.

auto fileName = locateFile("sample_ssd.uff");
std::cout << fileName << std::endl;

const int N = 1; // batch size; 1 works for ssd_mobilenet_v1
// const int N = 2; // previous value; fails with Cuda failure: 74 for ssd_mobilenet_v1
auto parser = createUffParser();
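
For anyone reading the log above: the "Data Size 540000" line is consistent with a batch of 2. Here is a minimal sketch of that arithmetic, assuming the input is 3 x 300 x 300 floats per image (my assumption about the SSD input shape; the constant and function names below are made up for illustration and are not taken from sampleUffSSD.cpp).

```cpp
#include <cstddef>

// Assumed SSD input shape (3 channels, 300 x 300 pixels).
// These values are an assumption, not quoted from the sample source.
constexpr std::size_t kInputC = 3;
constexpr std::size_t kInputH = 300;
constexpr std::size_t kInputW = 300;

// Number of float elements in the host input buffer for a given batch size.
constexpr std::size_t inputElements(std::size_t batchSize) {
    return batchSize * kInputC * kInputH * kInputW;
}

// Checked at compile time: N = 2 matches the "Data Size 540000" line in the log.
static_assert(inputElements(2) == 540000, "batch of 2 should give 540000 elements");
// With N = 1 the buffer would hold half as many elements.
static_assert(inputElements(1) == 270000, "batch of 1 should give 270000 elements");
```

This only shows that the buffer size scales with N as expected; it does not explain the misaligned address error.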

I still don't know why a batch size of 2 triggers a misaligned memory address error. If anyone can figure this out, please let me know. I will not accept this as the answer, since the root cause is still unknown.