FP16 TRT model outputs NaN values

JetPack 4.6, TensorRT 8.2
Part of my model's output becomes NaN when I convert it with --fp16, while the output is normal with FP32. Any solution?

ONNX model:
pt_nvidia_trt8.2.onnx (3.8 MB)

./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --workspace=4096 --verbose --fp16

./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --workspace=4096 --verbose 

Hi,

We can get the output with trtexec.

$ /usr/src/tensorrt/bin/trtexec --onnx=pt_nvidia_trt8.2.onnx --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --fp16 --dumpOutput

Is the NaN value generated by a custom implementation?
If yes, please check whether the source can handle FP16 mode correctly.

Thanks.

I have checked my TRT inference code; it works as expected with other FP16 models. I also tested with several random inputs, and it still outputs `NaN` values. Any solutions?

Hi,

Would you mind sharing the source so we can check it further?
Thanks.

nvidia_test.cpp (10.8 KB)
Hi,

Since it's company property, I have to remove most of the data-processing part and keep only the basic TRT engine part.

Inference in our project works properly with other FP16 models, just not this one. Would you mind helping me check whether a specific layer inside the model generates NaN values after conversion to a TRT engine? Or would it work on other versions of JetPack/TensorRT?
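One way to narrow down the offending layer yourself is Polygraphy, which ships alongside TensorRT. A sketch (flags vary across Polygraphy versions, so check `polygraphy run -h` first):

```shell
# Compare TRT-FP16 against ONNX Runtime and flag NaN/Inf in the outputs.
polygraphy run pt_nvidia_trt8.2.onnx --trt --fp16 --onnxrt --validate

# Mark every layer as an output to see which one first produces NaN.
polygraphy run pt_nvidia_trt8.2.onnx --trt --fp16 \
    --trt-outputs mark all --onnx-outputs mark all --validate
```

If one layer is the culprit, it can often be pinned to FP32 via the layer precision API while the rest of the network stays FP16.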

Hi,

Could you set the batch size in minShapes, optShapes, and maxShapes to the same value and try again?

For example, set all three shapes to 1x1x512x512.
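Concretely, the earlier trtexec command with all three shape ranges pinned to batch size 1 would look like:

```shell
# Same conversion as before, but min/opt/max all fixed to batch size 1
./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt \
    --minShapes=input:1x1x512x512 --optShapes=input:1x1x512x512 \
    --maxShapes=input:1x1x512x512 --workspace=4096 --verbose --fp16
```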

Thanks.

Would you please elaborate on why I should do this? Even though it works, I still need a batch size of 4…

Hi,

The test is just to help locate the root cause.
Based on the log above, it seems only part of the output becomes NaN,
so we want to check whether batch size = 1 generates the correct result.

We tried to reproduce this issue in our environment as well,
but the source shared on Sep 21 includes another file called testnet.h.
Could you share that file with us as well?

Thanks.

Hi,

Also, could you confirm whether the NaN occurs before or after postProcess()?

...
cudaMemcpyAsync(out, buffers[1], bufferSize[1], cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);

auto results = postProcess(vec_Mat, out, outSize);
...

Thanks.

testnet.h (1.5 KB)

Before the post-process.

Hi,

Thanks for sharing the header.
But some functions are still missing:

$ nvcc nvidia_test.cpp -o test -I/usr/include/opencv4/ -I/usr/src/tensorrt/samples/common/ -I./
nvidia_test.cpp: In function ‘bool readTrtFile(const string&, nvinfer1::ICudaEngine*&)’:
nvidia_test.cpp:106:99: warning: ‘nvinfer1::ICudaEngine* nvinfer1::IRuntime::deserializeCudaEngine(const void*, std::size_t, nvinfer1::IPluginFactory*)’ is deprecated [-Wdeprecated-declarations]
  106 |     engine = trtRuntime->deserializeCudaEngine(cached_engine.data(), cached_engine.size(), nullptr);
      |                                                                                                   ^
In file included from /usr/include/aarch64-linux-gnu/NvInfer.h:17,
                 from testnet.h:6,
                 from nvidia_test.cpp:1:
/usr/include/aarch64-linux-gnu/NvInferRuntime.h:637:43: note: declared here
  637 |     TRT_DEPRECATED nvinfer1::ICudaEngine* deserializeCudaEngine(
      |                                           ^~~~~~~~~~~~~~~~~~~~~
nvidia_test.cpp: In constructor ‘testnet::testnet(const string&)’:
nvidia_test.cpp:121:21: error: ‘readCOCOLabel’ was not declared in this scope
  121 |     detect_labels = readCOCOLabel(labels_file);
      |                     ^~~~~~~~~~~~~
nvidia_test.cpp: At global scope:
nvidia_test.cpp:141:6: error: no declaration matches ‘void testnet::InferenceFolder(const string&)’
  141 | void testnet::InferenceFolder(const std::string &folder_name) {
      |      ^~~~~~~
In file included from nvidia_test.cpp:1:
testnet.h:23:10: note: candidate is: ‘bool testnet::InferenceFolder(const string&)’
   23 |     bool InferenceFolder(const std::string &folder_name);
      |          ^~~~~~~~~~~~~~~
testnet.h:8:7: note: ‘class testnet’ defined here
    8 | class testnet{
      |       ^~~~~~~
nvidia_test.cpp: In member function ‘void testnet::EngineInference(const std::vector<std::__cxx11::basic_string<char> >&, const int&, void**, const std::vector<long int>&, cudaStream_t)’:
nvidia_test.cpp:267:40: error: ‘save_dir’ was not declared in this scope
  267 |                 std::string rst_name = save_dir + tmp;
      |                                        ^~~~~~~~
nvidia_test.cpp: At global scope:
nvidia_test.cpp:279:9: error: no declaration matches ‘cv::Mat testnet::preprocess(cv::Mat)’
  279 | cv::Mat testnet::preprocess(cv::Mat img) {
      |         ^~~~~~~
nvidia_test.cpp:279:9: note: no functions named ‘cv::Mat testnet::preprocess(cv::Mat)’
In file included from nvidia_test.cpp:1:
testnet.h:8:7: note: ‘class testnet’ defined here
    8 | class testnet{
      |       ^~~~~~~
nvidia_test.cpp: In member function ‘std::vector<float> testnet::prepareImage(std::vector<cv::Mat>&)’:
nvidia_test.cpp:284:1: warning: no return statement in function returning non-void [-Wreturn-type]
  284 | }
      | ^
nvidia_test.cpp: In member function ‘std::vector<std::vector<testnet::DetectRes> > testnet::postProcess(const std::vector<cv::Mat>&, float*, const int&)’:
nvidia_test.cpp:289:1: warning: no return statement in function returning non-void [-Wreturn-type]
  289 | }
      | ^

Thanks.

void setReportableSeverity(Logger::Severity severity)
{
    gLogger.setReportableSeverity(severity);
    gLogVerbose.setReportableSeverity(severity);
    gLogInfo.setReportableSeverity(severity);
    gLogWarning.setReportableSeverity(severity);
    gLogError.setReportableSeverity(severity);
    gLogFatal.setReportableSeverity(severity);
}

std::vector<std::string> readFolder(const std::string &image_path)
{
    std::vector<std::string> image_names;
    auto dir = opendir(image_path.c_str());

    if (dir != nullptr)
    {
        struct dirent *entry;
        while ((entry = readdir(dir)) != nullptr)
        {
            // Skip the "." and ".." entries.
            if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            {
                continue;
            }
            image_names.push_back(image_path + "/" + entry->d_name);
        }
        closedir(dir); // opendir() without closedir() leaks the directory handle
    }
    return image_names;
}

std::map<int, std::string> readImageNetLabel(const std::string &fileName)
{
    std::map<int, std::string> imagenet_label;
    std::ifstream file(fileName);
    if (!file.is_open())
    {
        std::cout << "read file error: " << fileName << std::endl;
        return imagenet_label;
    }
    std::string strLine;
    while (getline(file, strLine))
    {
        // Each line is expected to look like: 0: 'label text'
        auto pos1 = strLine.find(":");
        std::string first = strLine.substr(0, pos1);
        auto pos2 = strLine.find_last_of("'");
        std::string second = strLine.substr(pos1 + 3, pos2 - pos1 - 3);
        imagenet_label.insert({atoi(first.c_str()), second});
    }
    file.close();
    return imagenet_label;
}

std::map<int, std::string> readCOCOLabel(const std::string &fileName)
{
    std::map<int, std::string> coco_label;
    std::ifstream file(fileName);
    if (!file.is_open())
    {
        std::cout << "read file error: " << fileName << std::endl;
        return coco_label;
    }
    std::string strLine;
    int index = 0;
    while (getline(file, strLine))
    {
        coco_label.insert({index, strLine});
        index++;
    }
    file.close();
    return coco_label;
}
