sampleINT8 crash

Hi,

I have created a CNN FP32 network using the TensorRT API example. I wanted to convert the CNN network to INT8 and check the performance.
Hence I integrated the network into the sampleINT8 example code, using batch=10, start=50, score=50. When I try to run the GIE, the calibration table is generated, but the app crashes at infer->deserializeCudaEngine.

Could you suggest what the issue might be? The same network was working without any crash in TensorRT in FP32 format.

Thanks

Hi subarukun,

Can you please let me know whether you used a Theano/Keras model or a Caffe model? I am doing the same thing these days. I don't know about this issue right now, but I'll look into it. Can you please let me know your setup? Which GPU are you using? Which dataset are you using (MNIST, CIFAR, etc.)?

I also want to ask: can you see TensorRT optimize your CNN FP32 model into INT8? If yes, how can you see the resulting INT8 quantized model? I want to see the resulting weight, activation, and bias matrices in INT8. Can I do that in TensorRT?

Thanks

Hi,

I was using my own network built with the TensorRT API. The issue was with how I was passing the gieModelStream argument, which was what caused the crash.

We need to pass the argument as &gieModelStream for it to work.
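
For anyone who hits the same crash, here is a minimal sketch of what I mean (TensorRT 2.x-style API; buildToModelStream and the one-layer network are only placeholders, and exact signatures may differ between releases). The important part is that the serialized-model pointer is handed to the helper by address, so the helper can fill it in before deserializeCudaEngine reads from it.

```cpp
#include <iostream>
#include "NvInfer.h"

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder/runtime factories.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO) std::cout << msg << std::endl;
    }
} gLogger;

// Builds a (here trivial) network with the API and hands the serialized
// engine back through the out-parameter. The caller must pass the address
// of its IHostMemory* (i.e. &gieModelStream); otherwise the caller's
// pointer is never filled in and the later deserialization crashes.
void buildToModelStream(IHostMemory** gieModelStream)
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // Stand-in for the real CNN built with the API.
    ITensor* data = network->addInput("data", DataType::kFLOAT, DimsCHW{1, 28, 28});
    IPoolingLayer* pool = network->addPooling(*data, PoolingType::kMAX, DimsHW{2, 2});
    network->markOutput(*pool->getOutput(0));

    builder->setMaxBatchSize(10);
    builder->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    *gieModelStream = engine->serialize();   // fill in the caller's pointer

    network->destroy();
    engine->destroy();
    builder->destroy();
}

int main()
{
    IHostMemory* gieModelStream{nullptr};
    buildToModelStream(&gieModelStream);     // pass the address, not the null value

    IRuntime* infer = createInferRuntime(gLogger);
    ICudaEngine* engine = infer->deserializeCudaEngine(
        gieModelStream->data(), gieModelStream->size(), nullptr);

    // ... engine->createExecutionContext() and run inference here ...

    engine->destroy();
    infer->destroy();
    gieModelStream->destroy();
    return 0;
}
```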

I want to ask: can you see TensorRT optimize your CNN FP32 model into INT8?
Yes, it does optimize the FP32 model to INT8, and we get almost a 75% increase in FPS.

As of now everything is a black box as far as TensorRT is concerned; we can't see the quantized weights. The only thing visible is the calibration table that is generated after calibration is done with the calibration dataset.
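
For context, enabling INT8 is just a couple of builder settings plus a calibrator object; below is a rough sketch in the TensorRT 2.x-style API (the class name, the batch size of 10, the CalibrationTable filename, and the empty getBatch() body are placeholders, and exact interfaces can differ between releases). The calibration table is simply whatever the calibrator writes in writeCalibrationCache(); nothing else about the quantized engine is exposed.

```cpp
#include <fstream>
#include <iterator>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

// Entropy calibrator skeleton: feeds calibration batches to the builder and
// caches the result as the calibration table file mentioned above. Filling
// getBatch() and managing the GPU buffers are left out of this sketch.
class Int8Calibrator : public IInt8EntropyCalibrator
{
public:
    int getBatchSize() const override { return 10; }   // matches batch=10

    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        // Copy the next calibration batch into the device buffer bound to
        // names[0] and return true; return false once the batches run out.
        return false;   // placeholder
    }

    const void* readCalibrationCache(size_t& length) override
    {
        // Reuse an existing table so calibration is skipped on later runs.
        std::ifstream input("CalibrationTable", std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(input),
                      std::istreambuf_iterator<char>());
        length = mCache.size();
        return length ? mCache.data() : nullptr;
    }

    void writeCalibrationCache(const void* cache, size_t length) override
    {
        // This file is the only INT8 artifact that is visible to us.
        std::ofstream output("CalibrationTable", std::ios::binary);
        output.write(reinterpret_cast<const char*>(cache), length);
    }

private:
    std::vector<char> mCache;
};

// Builder-side switches; the per-layer scales and INT8 weights stay
// internal to TensorRT.
void enableInt8(IBuilder* builder, Int8Calibrator& calibrator)
{
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(&calibrator);
}
```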

Thanks

Hi,

Thanks so much for your reply.

I have a few questions:

  1. Have you tried to optimize your FP32 model (trained with Keras/Theano) into FP16? Does it work? I guess for this to work, the sampleMNISTAPI sample in the TensorRT samples folder has to be used. My question is, the weights file it loads is ‘data/mnistapi.wts’; how do I generate this file from my own model built in Keras, whose weights are saved in .h5 format?
     Also, we need to re-create our own Keras/Theano model using the TensorRT API (refer to Section 3.2, Sample 2: SampleMNISTAPI API Usage, in the TensorRT user guide).

Right now, I am trying to optimize my Keras/Theano-trained FP32 model into an FP16 model using TensorRT. For this I will recreate my Keras model with the TensorRT API and make a .wts weights file. Is this approach correct?

  2. Have you seen the 75% increase in FPS with INT8, or a 50% increase with FP16?

  3. Which GPU are you using? It must be one with DP4A support.

  4. Can we not see the quantized weights even for FP16?

  5. Which test dataset do you use for inference, and how is it used?

Thanks

Hi,

I did not try the FP16 mode. The NVIDIA folks are the right people to ask about the settings that need to be made for FP16.
I do not know what the .h5 format contains; if it is some header followed by data, then extract the data and create a .wts file, laying the data out the same way as in mnistapi.wts.
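
If it helps, the older sampleMNISTAPI weight loader parses a simple text layout: the number of weight blobs on the first line, then one line per blob with its name, its element count, and each FP32 value written as a hex word. Below is a rough sketch of a writer for that layout (writeWts is a hypothetical helper, the blob contents are assumed to have already been extracted from the .h5 file, and newer TensorRT releases may expect extra fields, so check the loadWeights() in your samples folder first).

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <ios>
#include <map>
#include <string>
#include <vector>

// Writes FP32 weight blobs in the text layout described above:
//   line 1: number of blobs
//   then per blob: <name> <element count> <each float as a hex uint32>
void writeWts(const std::string& path,
              const std::map<std::string, std::vector<float>>& blobs)
{
    std::ofstream out(path);
    out << blobs.size() << "\n";
    for (const auto& kv : blobs)
    {
        out << kv.first << " " << std::dec << kv.second.size();
        for (float v : kv.second)
        {
            uint32_t bits;
            std::memcpy(&bits, &v, sizeof(bits));   // raw FP32 bit pattern
            out << " " << std::hex << bits;
        }
        out << "\n";
    }
}
```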

I am using the Pascal TitanX.