FP16 TRT model outputs NaN values

JetPack 4.6, TensorRT 8.2
Part of my model's output becomes NaN when I convert it with --fp16, while the output is normal with FP32. Any solution?

ONNX model:
pt_nvidia_trt8.2.onnx (3.8 MB)

./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --workspace=4096 --verbose --fp16

./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --workspace=4096 --verbose 

Hi,

We can get the output with trtexec.

$ /usr/src/tensorrt/bin/trtexec --onnx=pt_nvidia_trt8.2.onnx --minShapes=input:1x1x512x512 --optShapes=input:4x1x512x512 --maxShapes=input:4x1x512x512 --fp16 --dumpOutput

Is the NaN value generated by a custom implementation?
If yes, please check whether the source can handle FP16 mode correctly.

Thanks.

I have checked my TRT inference code; it works as expected with other FP16 models. I also tested with several random inputs, and it still outputs `NaN` values. Any solutions?

Hi,

Would you mind sharing the source so we can check it further?
Thanks.

nvidia_test.cpp (10.8 KB)
Hi,

Since it's company property, I have to remove most of the data-processing part and keep only the basic TRT engine part.

Inference in our project works properly with other FP16 models, just not this one. Would you mind helping me check whether a specific layer inside the model generates NaN values after conversion to a TRT engine? Or would it work on other versions of JetPack/TensorRT?
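One way to narrow down the offending layer yourself is Polygraphy, which ships alongside TensorRT. A sketch (flags vary across Polygraphy versions, so check `polygraphy run -h` first):

```shell
# Compare TRT-FP16 against ONNX Runtime and flag NaN/Inf in the outputs.
polygraphy run pt_nvidia_trt8.2.onnx --trt --fp16 --onnxrt --validate

# Mark every layer as an output to see which one first produces NaN.
polygraphy run pt_nvidia_trt8.2.onnx --trt --fp16 \
    --trt-outputs mark all --onnx-outputs mark all --validate
```

If one layer is the culprit, it can often be pinned to FP32 via the layer precision API while the rest of the network stays FP16.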

Hi,

Could you set the batch size in minShapes, optShapes, and maxShapes to the same value and try again?

For example, set all three shapes to 1x1x512x512.
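Concretely, the earlier trtexec command with all three shape ranges pinned to batch size 1 would look like:

```shell
# Same conversion as before, but min/opt/max all fixed to batch size 1
./trtexec --onnx=pt_nvidia_trt8.2.onnx --saveEngine=pt_nvidia_trt8.2.trt \
    --minShapes=input:1x1x512x512 --optShapes=input:1x1x512x512 \
    --maxShapes=input:1x1x512x512 --workspace=4096 --verbose --fp16
```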

Thanks.

Would you please elaborate on why I should do this? Even though it works, I still need a batch size of 4…

Hi,

The test is just to help locate the root cause.
Based on the log above, it seems only part of the output becomes NaN,
so we want to check whether batch size = 1 generates the correct result.

We tried to reproduce this issue in our environment as well,
but the source shared on Sep 21 includes another file called testnet.h.
Could you share that file with us as well?

Thanks.

Hi,

Also, could you confirm whether the NaN occurs before or after postProcess()?

...
cudaMemcpyAsync(out, buffers[1], bufferSize[1], cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);

auto results = postProcess(vec_Mat, out, outSize);
...

Thanks.

testnet.h (1.5 KB)

Before the post-process.

Hi,

Thanks for sharing the header.
But some functions are still missing:

$ nvcc nvidia_test.cpp -o test -I/usr/include/opencv4/ -I/usr/src/tensorrt/samples/common/ -I./
nvidia_test.cpp: In function ‘bool readTrtFile(const string&, nvinfer1::ICudaEngine*&)’:
nvidia_test.cpp:106:99: warning: ‘nvinfer1::ICudaEngine* nvinfer1::IRuntime::deserializeCudaEngine(const void*, std::size_t, nvinfer1::IPluginFactory*)’ is deprecated [-Wdeprecated-declarations]
  106 |     engine = trtRuntime->deserializeCudaEngine(cached_engine.data(), cached_engine.size(), nullptr);
      |                                                                                                   ^
In file included from /usr/include/aarch64-linux-gnu/NvInfer.h:17,
                 from testnet.h:6,
                 from nvidia_test.cpp:1:
/usr/include/aarch64-linux-gnu/NvInferRuntime.h:637:43: note: declared here
  637 |     TRT_DEPRECATED nvinfer1::ICudaEngine* deserializeCudaEngine(
      |                                           ^~~~~~~~~~~~~~~~~~~~~
nvidia_test.cpp: In constructor ‘testnet::testnet(const string&)’:
nvidia_test.cpp:121:21: error: ‘readCOCOLabel’ was not declared in this scope
  121 |     detect_labels = readCOCOLabel(labels_file);
      |                     ^~~~~~~~~~~~~
nvidia_test.cpp: At global scope:
nvidia_test.cpp:141:6: error: no declaration matches ‘void testnet::InferenceFolder(const string&)’
  141 | void testnet::InferenceFolder(const std::string &folder_name) {
      |      ^~~~~~~
In file included from nvidia_test.cpp:1:
testnet.h:23:10: note: candidate is: ‘bool testnet::InferenceFolder(const string&)’
   23 |     bool InferenceFolder(const std::string &folder_name);
      |          ^~~~~~~~~~~~~~~
testnet.h:8:7: note: ‘class testnet’ defined here
    8 | class testnet{
      |       ^~~~~~~
nvidia_test.cpp: In member function ‘void testnet::EngineInference(const std::vector<std::__cxx11::basic_string<char> >&, const int&, void**, const std::vector<long int>&, cudaStream_t)’:
nvidia_test.cpp:267:40: error: ‘save_dir’ was not declared in this scope
  267 |                 std::string rst_name = save_dir + tmp;
      |                                        ^~~~~~~~
nvidia_test.cpp: At global scope:
nvidia_test.cpp:279:9: error: no declaration matches ‘cv::Mat testnet::preprocess(cv::Mat)’
  279 | cv::Mat testnet::preprocess(cv::Mat img) {
      |         ^~~~~~~
nvidia_test.cpp:279:9: note: no functions named ‘cv::Mat testnet::preprocess(cv::Mat)’
In file included from nvidia_test.cpp:1:
testnet.h:8:7: note: ‘class testnet’ defined here
    8 | class testnet{
      |       ^~~~~~~
nvidia_test.cpp: In member function ‘std::vector<float> testnet::prepareImage(std::vector<cv::Mat>&)’:
nvidia_test.cpp:284:1: warning: no return statement in function returning non-void [-Wreturn-type]
  284 | }
      | ^
nvidia_test.cpp: In member function ‘std::vector<std::vector<testnet::DetectRes> > testnet::postProcess(const std::vector<cv::Mat>&, float*, const int&)’:
nvidia_test.cpp:289:1: warning: no return statement in function returning non-void [-Wreturn-type]
  289 | }
      | ^

Thanks.

void setReportableSeverity(Logger::Severity severity)
{
    gLogger.setReportableSeverity(severity);
    gLogVerbose.setReportableSeverity(severity);
    gLogInfo.setReportableSeverity(severity);
    gLogWarning.setReportableSeverity(severity);
    gLogError.setReportableSeverity(severity);
    gLogFatal.setReportableSeverity(severity);
}

std::vector<std::string> readFolder(const std::string &image_path)
{
    std::vector<std::string> image_names;
    auto dir = opendir(image_path.c_str());

    if (dir != nullptr)
    {
        struct dirent *entry;
        while ((entry = readdir(dir)) != nullptr)
        {
            // Skip the "." and ".." entries.
            if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            {
                continue;
            }
            image_names.push_back(image_path + "/" + entry->d_name);
        }
        closedir(dir); // opendir() without closedir() leaks the directory handle
    }
    return image_names;
}

std::map<int, std::string> readImageNetLabel(const std::string &fileName)
{
    std::map<int, std::string> imagenet_label;
    std::ifstream file(fileName);
    if (!file.is_open())
    {
        std::cout << "read file error: " << fileName << std::endl;
        return imagenet_label;
    }
    std::string strLine;
    while (getline(file, strLine))
    {
        // Each line is expected to look like: 0: 'label text'
        auto pos1 = strLine.find(":");
        std::string first = strLine.substr(0, pos1);
        auto pos2 = strLine.find_last_of("'");
        std::string second = strLine.substr(pos1 + 3, pos2 - pos1 - 3);
        imagenet_label.insert({atoi(first.c_str()), second});
    }
    file.close();
    return imagenet_label;
}

std::map<int, std::string> readCOCOLabel(const std::string &fileName)
{
    std::map<int, std::string> coco_label;
    std::ifstream file(fileName);
    if (!file.is_open())
    {
        std::cout << "read file error: " << fileName << std::endl;
        return coco_label;
    }
    std::string strLine;
    int index = 0;
    while (getline(file, strLine))
    {
        coco_label.insert({index, strLine});
        index++;
    }
    file.close();
    return coco_label;
}
