Alexnet with INT8 made with caffe gives CUDA Error : "cudnnEngine.cpp (357) - Cuda Error in execute: 77"

adit_bhrgv · August 29, 2017, 1:16am

I am running ImageNet with Caffe, using 1000 batches and only 2 classes (Dog and Cat).
When I try to run my deploy_imagenet.protoxt and caffe_imagenet.caffemodel with TensorRT, I get the error below.

Does anyone have an idea what the reason for this could be? Please help.

/TensorRT-2.1.2/bin> ./sample_int8 imagenet

INT8 run:4 batches of size 10 starting at 10
cudnnEngine.cpp (357) - Cuda Error in execute: 77

Q: How do I decide how many batches to make for a particular dataset (CIFAR, MNIST, ImageNet, etc.)?
Q: What is the significance of the batches? They are all the same size. What do they contain?
Q: Running sample_int8 mnist shows 400 batches of size 100 each, processing 40000 images. Where are these images?

How are the batches connected to the images used to test inference?

Thanks a lot in advance for your help in clarifying my questions.

putty05.log (68 KB)

adit_bhrgv · August 29, 2017, 6:44pm

Attached cuda-gdb logs and cuda-memcheck logs:

d1230@linse3:~/no_backup/d1230/TensorRT-2.1.2/bin> cuda-gdb sample_int8
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/…
Reading symbols from /net/linse8-sn/no_backup_00/d1230/TensorRT-2.1.2/targets/x86_64-linux-gnu/bin/sample_int8…done.
(cuda-gdb) r imagenet
Starting program: /net/linse8-sn/no_backup_00/d1230/TensorRT-2.1.2/targets/x86_64-linux-gnu/bin/sample_int8 imagenet
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

INT8 run:1 batches of size 64 starting at 0
[New Thread 0x7fffd4fac700 (LWP 11790)]
[New Thread 0x7fffd472a700 (LWP 11791)]
[New Thread 0x7fffd3f29700 (LWP 11793)]
[New Thread 0x7fffd36c2700 (LWP 11794)]
[New Thread 0x7fffd2ec1700 (LWP 11795)]

CUDA Exception: Warp Illegal Address

Program received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 772, block (0,0,0), thread (32,0,0), device 1, sm 0, warp 2, lane 0]
0x0000000004bdf008 in .text.trtwell_scudnn_128x32_relu_interior_nn<<<(1513,3,1),(128,1,1)>>> ()
(cuda-gdb)
(cuda-gdb)

dvbr · October 19, 2017, 11:16am

Hi,

It looks like you probably did not allocate enough host memory for your outputs.

Maybe you have to resize your output buffer and allocate enough memory for each output with cudaMalloc?

Did you take the batch size into account in your cudaMalloc calls?

David
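
To make the batch-size point concrete, here is a minimal sketch of how the byte count for each buffer could be computed before it is passed to cudaMalloc. The names Shape3, volume and bindingBytes are hypothetical helpers, not part of TensorRT or the sample, and the sketch assumes float data in every buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: shape of ONE sample (channels, height, width).
struct Shape3 {
    int c;
    int h;
    int w;
};

// Number of elements in a single sample.
inline std::size_t volume(const Shape3& s) {
    return static_cast<std::size_t>(s.c) * s.h * s.w;
}

// Bytes needed for a buffer that holds `batchSize` samples of float data.
// This is the size that would be handed to cudaMalloc for that buffer.
// Leaving out the batchSize factor allocates room for one sample only,
// so a kernel running a full batch writes past the end of the buffer.
inline std::size_t bindingBytes(int batchSize, const Shape3& s) {
    assert(batchSize > 0);
    return static_cast<std::size_t>(batchSize) * volume(s) * sizeof(float);
}
```

As an illustration only (the shapes are assumptions, since the network definition is not shown in this thread): with a batch size of 10, a 3x227x227 float input needs bindingBytes(10, {3, 227, 227}) = 6183480 bytes, and a 2-class float output needs bindingBytes(10, {2, 1, 1}) = 80 bytes. The batch size used here should be at least as large as the largest batch the engine is run with.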
