Hi!
I am trying to run the GoogleNet sample on a Jetson TX2 with a workspace allocation of (16 << 20). Building and serializing the engine works fine, but when I call inference I get an error at the following line:
CHECK(cudaMemcpyAsync(buffers[inputIndex], data, size, cudaMemcpyHostToDevice, stream));
which says:
cuda failure: 11 Aborted (core dump)
Other than this query, I have another question about running YOLO version 2 on the Jetson TX2 using TensorRT. The YOLO network uses batch normalization, and when I run inference I get wrong output from TensorRT, while running the same model in Caffe gives correct output. When I debugged and compared each layer's output between TensorRT and Caffe, I found that the batch normalization layer's output in TensorRT diverged drastically from Caffe's. I know that TensorRT supports LRN (Local Response Normalization), but how can I implement batch normalization?
Thanks