With INT8, the accuracy of the network_api_pytorch_mnist model drops to less than 10%

Hi all, this is regarding an observation that the PyTorch MNIST model's accuracy drops to less than 10% when INT8 calibration is introduced.

There is an existing sample.py in the directory /usr/src/tensorrt/samples/python/network_api_pytorch_mnist/.
It supports inference with the FP32 and FP16 data types, but not INT8.
I modified this sample.py to add INT8 calibration, referring to the sample.py present in the directory /usr/src/tensorrt/samples/python/int8_caffe_mnist/.

It worked, but the accuracy dropped to less than 10%. Out of 10000 inferences, it reported:

Predicted 916 / 10000 correctly

That means the accuracy is only 9.16%, whereas FP32 and FP16 both gave more than 97% accuracy.
Let me know what could be the reason and how to fix this issue.

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj,
I think there may be an issue with the INT8 calibration cache. Did you generate a new INT8 calibration cache? Also, may I know the JetPack version? Just to confirm, are you using a Jetson Orin Nano board?

Hi SivaRamaKrishna, yes, a new INT8 calibration cache was generated. The JetPack version being used is 4.6.1 and the board is a Jetson Nano.

Thanks and Regards
Nagaraj Trivedi

Hi SivaRamaKrishna, can you please update me on this query? I need this test conducted with INT8 calibration to compare the results with FP32 and FP16.

An early solution to this problem will definitely help me.

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj,
Could you share details on how this new cache was generated? Did you use a dataset of images to generate the cache?

Hi, the cache is generated referring to the code present in the file /usr/src/tensorrt/samples/python/int8_caffe_mnist/sample.py.

The dataset used to generate the cache is from the location
"/usr/src/tensorrt/data/mnist/t10k-images-idx3-ubyte"
The below three lines were added to generate the calibration cache:

calibration_cache = "mnist_calibration1.cache"
test_set = "/usr/src/tensorrt/data/mnist/t10k-images-idx3-ubyte"
calib = MNISTEntropyCalibrator(test_set, cache_file=calibration_cache)
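For reference, MNISTEntropyCalibrator implements TensorRT's IInt8EntropyCalibrator2 interface. Below is a condensed sketch of what it does, simplified from the calibrator.py in int8_caffe_mnist (load_mnist_data is assumed to come from that same file; this is not a verbatim copy):

import os

import pycuda.autoinit  # noqa: F401 -- creates the CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, training_data, cache_file, batch_size=64):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        self.data = load_mnist_data(training_data)  # helper from calibrator.py
        self.batch_size = batch_size
        self.current_index = 0
        # Device buffer large enough for one calibration batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # Returning None tells TensorRT that the calibration data is exhausted.
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None
        batch = self.data[self.current_index : self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # If a cache file already exists, reuse it instead of calibrating again.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)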

I am also attaching the source files so that you can have a look.
The sample.py from the directory /usr/src/tensorrt/samples/python/network_api_pytorch_mnist/
was modified referring to the sample.py from the directory /usr/src/tensorrt/samples/python/int8_caffe_mnist/.
I am attaching the below files in zip format.

  1. sample.py modified from network_api_pytorch_mnist/ directory
  2. The calibration cache file generated is mnist_calibration1.cache
  3. calibrator.py
  4. saved_mnist_model (which is a saved trained model file from model.py)
  5. model.py

The sample.py runs in three modes: FP32, FP16 and INT8. The mode is controlled by the flag INFER_WEIGHT_SIZE; if it is set to "INT8" then INT8 calibration is used. Please refer to this flag in the sample.py.
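Roughly, the switch looks like the sketch below (simplified; INFER_WEIGHT_SIZE is my own flag, not part of the original sample):

import tensorrt as trt

INFER_WEIGHT_SIZE = "INT8"  # one of "FP32", "FP16", "INT8"


def configure_precision(config, calib):
    # FP32 is TensorRT's default precision, so it needs no builder flag.
    if INFER_WEIGHT_SIZE == "FP16":
        config.set_flag(trt.BuilderFlag.FP16)
    elif INFER_WEIGHT_SIZE == "INT8":
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = calib  # e.g. an MNISTEntropyCalibrator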

I am pasting below the inference results I got with FP16 and INT8.

Inference with INT8:
Predicted 970 / 10000 correctly
That means only 970 images out of 10000 were predicted correctly, an accuracy of less than 10%.

Inference with FP16:
Predicted 9759 / 10000 correctly
That means the accuracy is 97.59%.

Also, I have a doubt regarding the weights I am assigning in the method populate_network().
Please verify in the sample.py file whether I am assigning the weights to all the layers properly, particularly when the inference mode is not FP16.
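For reference, the unmodified sample assigns the trained PyTorch weights directly to each TensorRT layer; the sketch below is abridged from populate_network() in network_api_pytorch_mnist/sample.py (first convolution only; exact API calls can vary slightly between TensorRT versions). The same FP32 weights are used in all modes: when the INT8 flag is set, the builder quantizes them during the engine build.

import tensorrt as trt


def populate_network(network, weights):
    # Abridged: the real sample continues with pooling, conv2, fc1, a ReLU
    # and fc2, all following this same pattern.
    input_tensor = network.add_input(name="data", dtype=trt.float32, shape=(1, 28, 28))

    conv1_w = weights["conv1.weight"].numpy()
    conv1_b = weights["conv1.bias"].numpy()
    conv1 = network.add_convolution(
        input=input_tensor, num_output_maps=20, kernel_shape=(5, 5),
        kernel=conv1_w, bias=conv1_b,
    )
    conv1.stride = (1, 1)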

Thanks and Regards

Nagaraj Trivedi
saved_mnist_model.zip (12.7 MB)

Dear @trivedi.nagaraj,
Thank you for sharing the files. I will check and get back to you on this.

Dear @trivedi.nagaraj,
I don't see an MNIST INT8 Python sample on the Jetson Orin Nano with JetPack 5.1.2. Could you confirm the JetPack version and platform?

Hi SivaRamaKrishnan, sorry, I mentioned the wrong board info earlier.

=== Device Information ===
[11/21/2023-05:10:08] [I] Selected Device: Xavier
[11/21/2023-05:10:08] [I] Compute Capability: 7.2

It is a Jetson Xavier (compute capability 7.2) and the JetPack version is:
Package: nvidia-jetpack
Version: 4.6-b199

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishnan, can you please update me on this? It is very much required for my academic study submission.

Thanks and Regards

Hi SivaRamaKrishnan, a gentle reminder. Please update me.

Thanks and Regards

Nagaraj Trivedi

Hi,

How do you get the 9.16% accuracy?

We could get 99% accuracy with the instructions below.
Please give them a try.

Download data

$ cd /usr/src/tensorrt/data/mnist
$ sudo wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
$ sudo wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
$ sudo wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
$ sudo gzip -dk t10k-images-idx3-ubyte.gz
$ sudo gzip -dk t10k-labels-idx1-ubyte.gz
$ sudo gzip -dk train-images-idx3-ubyte.gz

Install dependencies

$ sudo apt install python3-pip
$ sudo apt install libboost-all-dev
$ export CPATH=$CPATH:/usr/local/cuda-11.4/targets/aarch64-linux/include
$ export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.4/targets/aarch64-linux/lib
$ pip3 install pycuda --user
$ pip3 install numpy requests pillow
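Optionally, verify that the Python bindings import cleanly before running the sample (a hypothetical sanity check, not part of the original instructions):

# sanity_check.py -- hypothetical helper, not part of the TensorRT samples
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import tensorrt as trt

print("TensorRT version:", trt.__version__)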

Calibration and verification

$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/
$ git checkout release/8.4
$ cd samples/python/int8_caffe_mnist/
$ python3 sample.py
[12/05/2023-09:05:44] [TRT] [W] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
sample.py:55: DeprecationWarning: Use network created with NetworkDefinitionCreationFlag::EXPLICIT_BATCH flag instead.
  builder.max_batch_size = batch_size
sample.py:56: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = common.GiB(1)
Calibrating batch 0, containing 64 images
...
Validating batch 310
Total Accuracy: 99.1%

Thanks.

Hi, thank you for posting this.

But I was referring to the sample.py from the directory samples/python/network_api_pytorch_mnist/, not the one from samples/python/int8_caffe_mnist/.

With the sample.py from samples/python/int8_caffe_mnist/ the accuracy is > 99%.
But with the sample.py from samples/python/network_api_pytorch_mnist/ it is always less than 10%.

I have modified the sample.py from this directory referring to the sample.py from samples/python/int8_caffe_mnist/.
Let me know how the sample.py from the directory samples/python/network_api_pytorch_mnist/ needs to be modified for inference with INT8 precision.

Thanks and Regards

Nagaraj Trivedi

Hi,

The model architectures are different: one is based on Caffe and the other comes from PyTorch.

Did you generate the calibration cache with the Caffe MNIST model
and then use the same cache with the PyTorch MNIST model?

These two models have some differences in their architectures.
If you apply a cache from a different model architecture, the cache cannot be read correctly.
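One way to see this is to open the cache: it is a small text file of per-tensor activation scales keyed by tensor name, so scales recorded for one network cannot be matched against a different network's tensors. A hypothetical inspection snippet:

# Print the calibration cache; use the filename from your own run,
# e.g. mnist_calibration1.cache.
with open("mnist_calibration1.cache", "rb") as f:
    print(f.read().decode(errors="replace"))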

Thanks.

Hi, I have created a separate calibration file for the PyTorch MNIST model and used it. Let me know if you need more information on it so that I can provide it.

Thanks and Regards

Nagaraj Trivedi

Hi,

So you modified int8_caffe_mnist for the PyTorch MNIST model and can get 99% accuracy,
but when you use the calibration cache with network_api_pytorch_mnist, the accuracy drops to 10%.

Is that correct?

If yes, could you share the network_api_pytorch_mnist source/steps and the cache you tried with us?
Thanks.

Hi, yes, that is the answer to your question.

I will provide that. Apart from this, is it possible for you to try performing INT8 calibration on network_api_pytorch_mnist at your side and run the inference?

Thanks and Regards

Nagaraj Trivedi

Hi,

Here is a sample.

  1. Please copy the calibrator.py file from the int8_caffe_mnist folder to the network_api_pytorch_mnist folder.

  2. Apply the below patch to the sample.py:

diff --git a/samples/python/network_api_pytorch_mnist/sample.py b/samples/python/network_api_pytorch_mnist/sample.py
index e5e95de2..3a5d47f8 100644
--- a/samples/python/network_api_pytorch_mnist/sample.py
+++ b/samples/python/network_api_pytorch_mnist/sample.py
@@ -24,9 +24,12 @@ import numpy as np
 import pycuda.autoinit
 import tensorrt as trt
 
+from calibrator import load_mnist_data, load_mnist_labels, MNISTEntropyCalibrator
+
 sys.path.insert(1, os.path.join(sys.path[0], ".."))
 import common
 
+
 # You can set the logger severity higher to suppress messages (or lower to display more messages).
 TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
 
@@ -102,51 +105,98 @@ def populate_network(network, weights):
     network.mark_output(tensor=fc2.get_output(0))
 
 
-def build_engine(weights):
+def build_int8_engine(weights, calib, batch_size=32):
     # For more information on TRT basics, refer to the introductory samples.
     builder = trt.Builder(TRT_LOGGER)
-    network = builder.create_network(common.EXPLICIT_BATCH)
+    builder.max_batch_size = batch_size
+
+    network = builder.create_network()
     config = builder.create_builder_config()
     runtime = trt.Runtime(TRT_LOGGER)
 
     config.max_workspace_size = common.GiB(1)
+    config.set_flag(trt.BuilderFlag.INT8)
+    config.int8_calibrator = calib
+
     # Populate the network using weights from the PyTorch model.
     populate_network(network, weights)
     # Build and return an engine.
     plan = builder.build_serialized_network(network, config)
+
+    #with open("sample.engine", "wb") as f:
+    #    f.write(plan)
     return runtime.deserialize_cuda_engine(plan)
 
 
-# Loads a random test case from pytorch's DataLoader
-def load_random_test_case(model, pagelocked_buffer):
-    # Select an image at random to be the test case.
-    img, expected_output = model.get_random_testcase()
-    # Copy to the pagelocked input buffer
-    np.copyto(pagelocked_buffer, img)
-    return expected_output
+def check_accuracy(context, batch_size, test_set, test_labels):
+    inputs, outputs, bindings, stream = common.allocate_buffers(context.engine)
+
+    num_correct = 0
+    num_total = 0
+
+    batch_num = 0
+    for start_idx in range(0, test_set.shape[0], batch_size):
+        batch_num += 1
+        if batch_num % 10 == 0:
+            print("Validating batch {:}".format(batch_num))
+        # If the number of images in the test set is not divisible by the batch size, the last batch will be smaller.
+        # This logic is used for handling that case.
+        end_idx = min(start_idx + batch_size, test_set.shape[0])
+        effective_batch_size = end_idx - start_idx
+
+        # Do inference for every batch.
+        inputs[0].host = test_set[start_idx : start_idx + effective_batch_size]
+        [output] = common.do_inference(
+            context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream, batch_size=effective_batch_size
+        )
+
+        # Use argmax to get predictions and then check accuracy
+        preds = np.argmax(output.reshape(batch_size, 10)[0:effective_batch_size], axis=1)
+        labels = test_labels[start_idx : start_idx + effective_batch_size]
+        num_total += effective_batch_size
+        num_correct += np.count_nonzero(np.equal(preds, labels))
+
+    percent_correct = 100 * num_correct / float(num_total)
+    print("Total Accuracy: {:}%".format(percent_correct))
 
 
 def main():
     common.add_help(description="Runs an MNIST network using a PyTorch model")
+
     # Train the PyTorch model
     mnist_model = model.MnistModel()
     mnist_model.learn()
     weights = mnist_model.get_weights()
-    # Do inference with TensorRT.
-    engine = build_engine(weights)
-
-    # Build an engine, allocate buffers and create a stream.
-    # For more information on buffer allocation, refer to the introductory samples.
-    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
-    context = engine.create_execution_context()
-
-    case_num = load_random_test_case(mnist_model, pagelocked_buffer=inputs[0].host)
-    # For more information on performing inference, refer to the introductory samples.
-    # The common.do_inference function will return a list of outputs - we only have one in this case.
-    [output] = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
-    pred = np.argmax(output)
-    print("Test Case: " + str(case_num))
-    print("Prediction: " + str(pred))
+
+    _, data_files = common.find_sample_data(
+        description="Runs a Caffe MNIST network in Int8 mode",
+        subfolder="mnist",
+        find_files=[
+            "t10k-images-idx3-ubyte",
+            "t10k-labels-idx1-ubyte",
+            "train-images-idx3-ubyte",
+        ],
+        err_msg="Please follow the README to download the MNIST dataset",
+    )
+    [test_set, test_labels, train_set] = data_files
+
+    # Now we create a calibrator and give it the location of our calibration data.
+    # We also allow it to cache calibration data for faster engine building.
+    calibration_cache = "mnist_calibration.cache"
+    calib = MNISTEntropyCalibrator(train_set, cache_file=calibration_cache)
+
+    # Inference batch size can be different from calibration batch size.
+    batch_size = 32
+    #with open('sample.engine', 'rb') as f:
+    #    plan = f.read()
+
+    with build_int8_engine(
+        weights, calib, batch_size
+    ) as engine, engine.create_execution_context() as context:
+        # Batch size for inference can be different than batch size used for calibration.
+        check_accuracy(
+            context, batch_size, test_set=load_mnist_data(test_set), test_labels=load_mnist_labels(test_labels)
+        )
 
 
 if __name__ == "__main__":

We can get 97% accuracy with the INT8 calibration.

$ python3 sample.py
Train Epoch: 1 [0/60000 (0%)]   Loss: 2.288751
...
Test set: Average loss: 0.0649, Accuracy: 9804/10000 (98%)
...
Validating batch 310
Total Accuracy: 97.5%

Thanks.

Thank you for your timely response. What about the calibration cache file? I hope it will be generated by this code itself. I will try it and let you know the results.

Thanks and Regards

Nagaraj Trivedi

Hi,

Yes, it did.

Please check the below lines:

+    calibration_cache = "mnist_calibration.cache"
+    calib = MNISTEntropyCalibrator(train_set, cache_file=calibration_cache)

The calibration cache is saved as mnist_calibration.cache.
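One practical note: because read_calibration_cache reuses an existing file, a stale cache is silently picked up by later builds. If the network or the calibration data changes, delete the file first (a hypothetical snippet):

import os

# Remove a stale cache so the next engine build recalibrates from scratch.
if os.path.exists("mnist_calibration.cache"):
    os.remove("mnist_calibration.cache")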
Thanks.