I'm completing an image color conversion program on the VIC of a Jetson Xavier NX with VPI under JetPack 5.1.2. I noticed a significant performance drop, and although I have tried many things and read the official samples, the result is the same: the official samples show the same drop, even after I maximized the Jetson clocks and the VIC frequency referring to this. Sadly, there are very few topics about the VIC, even on the official forums.
Here is the output after adding some timing code to the 09-tnr sample that ships with the Jetson board in /opt/nvidia/vpi2/samples/09-tnr. From the output, the rough average total processing time per frame is ~5000 us; however, some frames take far longer, such as 10799 us, 13941 us and 71945 us, which is unacceptable.
Frame: 1
t10 127 t21 15 t32 7 t43 10607 t54 41 Total: 10799
Frame: 2
t10 64 t21 113 t32 10 t43 4969 t54 40 Total: 5197
Frame: 3
t10 22 t21 13 t32 6 t43 6244 t54 45 Total: 6331
Frame: 4
t10 23 t21 13 t32 6 t43 4442 t54 39 Total: 4525
Frame: 5
t10 24 t21 14 t32 7 t43 5670 t54 50 Total: 5767
Frame: 6
t10 21 t21 13 t32 6 t43 5465 t54 46 Total: 5553
Frame: 7
t10 24 t21 14 t32 6 t43 4855 t54 44 Total: 4944
Frame: 8
t10 24 t21 13 t32 6 t43 4660 t54 39 Total: 4744
Frame: 9
t10 22 t21 11 t32 6 t43 4437 t54 37 Total: 4515
Frame: 10
t10 21 t21 45 t32 6 t43 5802 t54 40 Total: 5917
Frame: 11
t10 20 t21 12 t32 6 t43 4160 t54 38 Total: 4238
Frame: 12
t10 20 t21 12 t32 5 t43 4232 t54 37 Total: 4308
Frame: 13
t10 20 t21 12 t32 5 t43 4306 t54 37 Total: 4382
Frame: 14
t10 20 t21 11 t32 5 t43 4227 t54 36 Total: 4302
Frame: 15
t10 20 t21 12 t32 5 t43 4244 t54 37 Total: 4321
Frame: 16
t10 20 t21 12 t32 5 t43 5243 t54 37 Total: 5318
Frame: 17
t10 22 t21 12 t32 6 t43 4157 t54 39 Total: 4236
Frame: 18
t10 20 t21 11 t32 6 t43 4169 t54 36 Total: 4244
Frame: 19
t10 20 t21 11 t32 6 t43 4135 t54 35 Total: 4209
Frame: 20
t10 19 t21 11 t32 5 t43 4098 t54 36 Total: 4171
Frame: 21
t10 19 t21 52 t32 6 t43 4166 t54 35 Total: 4279
Frame: 22
t10 20 t21 12 t32 5 t43 4138 t54 36 Total: 4213
Frame: 23
t10 19 t21 11 t32 5 t43 3805 t54 40 Total: 3882
Frame: 24
t10 21 t21 43 t32 5 t43 4148 t54 37 Total: 4256
Frame: 25
t10 20 t21 12 t32 6 t43 4161 t54 38 Total: 4239
Frame: 26
t10 19 t21 12 t32 5 t43 4060 t54 36 Total: 4134
Frame: 27
t10 19 t21 11 t32 39 t43 4036 t54 36 Total: 4143
Frame: 28
t10 18 t21 11 t32 41 t43 3967 t54 38 Total: 4078
Frame: 29
t10 20 t21 12 t32 5 t43 4393 t54 38 Total: 4470
Frame: 30
t10 21 t21 43 t32 5 t43 4074 t54 37 Total: 4181
Frame: 31
t10 20 t21 11 t32 5 t43 4102 t54 36 Total: 4176
Frame: 32
t10 117 t21 12 t32 6 t43 6205 t54 36 Total: 6378
Frame: 33
t10 19 t21 12 t32 5 t43 4297 t54 36 Total: 4371
Frame: 34
t10 19 t21 12 t32 5 t43 5014 t54 36 Total: 5089
Frame: 35
t10 19 t21 11 t32 38 t43 3972 t54 39 Total: 4081
Frame: 36
t10 20 t21 12 t32 42 t43 4091 t54 36 Total: 4203
Frame: 37
t10 20 t21 11 t32 5 t43 4329 t54 38 Total: 4405
Frame: 38
t10 20 t21 11 t32 5 t43 5842 t54 36 Total: 5916
Frame: 39
t10 20 t21 11 t32 5 t43 4356 t54 36 Total: 4430
Frame: 40
t10 19 t21 11 t32 5 t43 4080 t54 37 Total: 4154
Frame: 41
t10 58 t21 11 t32 5 t43 4270 t54 37 Total: 4383
Frame: 42
t10 20 t21 13 t32 5 t43 4409 t54 37 Total: 4485
Frame: 43
t10 19 t21 12 t32 5 t43 4083 t54 36 Total: 4156
Frame: 44
t10 19 t21 11 t32 5 t43 9300 t54 36 Total: 9373
Frame: 45
t10 19 t21 44 t32 6 t43 4237 t54 36 Total: 4344
Frame: 46
t10 19 t21 12 t32 5 t43 4251 t54 48 Total: 4337
Frame: 47
t10 23 t21 13 t32 5 t43 6739 t54 39 Total: 6821
Frame: 48
t10 19 t21 11 t32 5 t43 4153 t54 41 Total: 4231
Frame: 49
t10 20 t21 12 t32 5 t43 4240 t54 40 Total: 4319
Frame: 50
t10 21 t21 11 t32 5 t43 13862 t54 40 Total: 13941
Frame: 51
t10 19 t21 48 t32 6 t43 4167 t54 37 Total: 4278
Frame: 52
t10 19 t21 11 t32 5 t43 9091 t54 40 Total: 9169
Frame: 53
t10 19 t21 11 t32 5 t43 4527 t54 38 Total: 4602
Frame: 54
t10 61 t21 11 t32 5 t43 3995 t54 37 Total: 4113
Frame: 55
t10 54 t21 12 t32 6 t43 7369 t54 39 Total: 7482
Frame: 56
t10 18 t21 11 t32 5 t43 5430 t54 37 Total: 5504
Frame: 57
t10 20 t21 44 t32 6 t43 4334 t54 36 Total: 4442
Frame: 58
t10 19 t21 21 t32 5 t43 4454 t54 35 Total: 4536
Frame: 59
t10 19 t21 11 t32 5 t43 4448 t54 35 Total: 4521
Frame: 60
t10 19 t21 11 t32 5 t43 4192 t54 39 Total: 4269
Frame: 61
t10 19 t21 12 t32 6 t43 4589 t54 36 Total: 4664
Frame: 62
t10 19 t21 12 t32 5 t43 4199 t54 37 Total: 4273
Frame: 63
t10 20 t21 44 t32 6 t43 4043 t54 36 Total: 4152
Frame: 64
t10 28 t21 12 t32 6 t43 9263 t54 36 Total: 9346
Frame: 65
t10 19 t21 11 t32 5 t43 4211 t54 36 Total: 4285
Frame: 66
t10 19 t21 11 t32 89 t43 4250 t54 39 Total: 4410
Frame: 67
t10 53 t21 13 t32 6 t43 4193 t54 38 Total: 4304
Frame: 68
t10 19 t21 11 t32 5 t43 5260 t54 37 Total: 5335
Frame: 69
t10 20 t21 12 t32 6 t43 4535 t54 37 Total: 4611
Frame: 70
t10 20 t21 12 t32 5 t43 4586 t54 38 Total: 4663
Frame: 71
t10 20 t21 12 t32 5 t43 4437 t54 40 Total: 4515
Frame: 72
t10 21 t21 12 t32 6 t43 7299 t54 106 Total: 7446
Frame: 73
t10 20 t21 12 t32 5 t43 6902 t54 37 Total: 6979
Frame: 74
t10 20 t21 11 t32 5 t43 4317 t54 40 Total: 4395
Frame: 75
t10 19 t21 12 t32 5 t43 4267 t54 36 Total: 4342
Frame: 76
t10 20 t21 12 t32 5 t43 6757 t54 46 Total: 6842
Frame: 77
t10 20 t21 50 t32 6 t43 4204 t54 44 Total: 4325
Frame: 78
t10 27 t21 11 t32 5 t43 4908 t54 42 Total: 4996
Frame: 79
t10 24 t21 11 t32 6 t43 7373 t54 38 Total: 7454
Frame: 80
t10 20 t21 12 t32 46 t43 4133 t54 37 Total: 4249
Frame: 81
t10 20 t21 12 t32 5 t43 4237 t54 36 Total: 4311
Frame: 82
t10 19 t21 12 t32 5 t43 4114 t54 36 Total: 4188
Frame: 83
t10 20 t21 12 t32 5 t43 4368 t54 36 Total: 4444
Frame: 84
t10 626 t21 15 t32 9 t43 5290 t54 37 Total: 5981
Frame: 85
t10 20 t21 50 t32 6 t43 4616 t54 42 Total: 4736
Frame: 86
t10 20 t21 11 t32 5 t43 4339 t54 40 Total: 4418
Frame: 87
t10 21 t21 55 t32 6 t43 7598 t54 44 Total: 7725
Frame: 88
t10 20 t21 12 t32 5 t43 4201 t54 42 Total: 4282
Frame: 89
t10 22 t21 55 t32 8 t43 4113 t54 39 Total: 4239
Frame: 90
t10 21 t21 12 t32 6 t43 4051 t54 42 Total: 4132
Frame: 91
t10 21 t21 12 t32 6 t43 4242 t54 42 Total: 4324
Frame: 92
t10 20 t21 11 t32 5 t43 4264 t54 46 Total: 4348
Frame: 93
t10 21 t21 12 t32 5 t43 7499 t54 46 Total: 7585
Frame: 94
t10 22 t21 12 t32 5 t43 4279 t54 37 Total: 4357
Frame: 95
t10 19 t21 12 t32 6 t43 4202 t54 37 Total: 4278
Frame: 96
t10 20 t21 12 t32 37 t43 4110 t54 42 Total: 4222
Frame: 97
t10 20 t21 12 t32 43 t43 5044 t54 36 Total: 5158
Frame: 98
t10 19 t21 12 t32 5 t43 4127 t54 40 Total: 4205
Frame: 99
t10 21 t21 12 t32 6 t43 9238 t54 47 Total: 9325
Frame: 100
t10 22 t21 12 t32 6 t43 4244 t54 39 Total: 4325
Frame: 101
t10 19 t21 12 t32 41 t43 6407 t54 43 Total: 6524
Frame: 102
t10 20 t21 12 t32 5 t43 4545 t54 128 Total: 4713
Frame: 103
t10 20 t21 12 t32 5 t43 10430 t54 46 Total: 10515
Frame: 104
t10 21 t21 12 t32 46 t43 8403 t54 53 Total: 8537
Frame: 105
t10 25 t21 14 t32 6 t43 4571 t54 46 Total: 4664
Frame: 106
t10 21 t21 12 t32 5 t43 9390 t54 95 Total: 9525
Frame: 107
t10 102 t21 74 t32 83 t43 71452 t54 232 Total: 71945
Frame: 108
t10 23 t21 47 t32 17 t43 8315 t54 64 Total: 8469
Frame: 109
t10 25 t21 12 t32 136 t43 5019 t54 39 Total: 5234
Frame: 110
t10 22 t21 12 t32 5 t43 4619 t54 43 Total: 4704
Frame: 111
t10 20 t21 12 t32 41 t43 5706 t54 38 Total: 5820
Frame: 112
t10 24 t21 16 t32 6 t43 7969 t54 40 Total: 8058
Frame: 113
t10 26 t21 16 t32 7 t43 4620 t54 50 Total: 4720
Frame: 114
t10 22 t21 12 t32 6 t43 3888 t54 36 Total: 3967
Frame: 115
t10 19 t21 11 t32 5 t43 4010 t54 37 Total: 4085
Frame: 116
t10 20 t21 11 t32 5 t43 4309 t54 35 Total: 4383
Frame: 117
t10 20 t21 11 t32 6 t43 4145 t54 37 Total: 4220
Frame: 118
t10 20 t21 17 t32 6 t43 8773 t54 39 Total: 8857
Frame: 119
t10 21 t21 14 t32 38 t43 8118 t54 37 Total: 8230
Frame: 120
t10 52 t21 12 t32 6 t43 4854 t54 39 Total: 4964
Frame: 121
t10 20 t21 11 t32 5 t43 5477 t54 43 Total: 5558
Frame: 122
t10 21 t21 11 t32 5 t43 4785 t54 41 Total: 4865
Frame: 123
t10 20 t21 11 t32 6 t43 3954 t54 39 Total: 4032
Frame: 124
t10 20 t21 12 t32 5 t43 4098 t54 31 Total: 4169
Frame: 125
t10 20 t21 12 t32 6 t43 5507 t54 41 Total: 5587
Frame: 126
t10 20 t21 12 t32 5 t43 4015 t54 40 Total: 4094
Frame: 127
t10 20 t21 44 t32 5 t43 6954 t54 41 Total: 7068
Frame: 128
t10 19 t21 12 t32 5 t43 3881 t54 40 Total: 3961
Frame: 129
t10 19 t21 12 t32 5 t43 4191 t54 42 Total: 4271
Frame: 130
t10 19 t21 12 t32 5 t43 4552 t54 44 Total: 4634
Frame: 131
t10 20 t21 12 t32 5 t43 9793 t54 42 Total: 9874
Frame: 132
t10 19 t21 11 t32 5 t43 3790 t54 36 Total: 3863
Frame: 133
t10 19 t21 11 t32 5 t43 6457 t54 36 Total: 6530
Frame: 134
t10 19 t21 46 t32 5 t43 3884 t54 35 Total: 3991
Frame: 135
t10 52 t21 11 t32 6 t43 3770 t54 35 Total: 3877
Frame: 136
t10 19 t21 11 t32 38 t43 3778 t54 35 Total: 3884
Frame: 137
t10 19 t21 10 t32 5 t43 4069 t54 36 Total: 4141
Frame: 138
t10 19 t21 12 t32 5 t43 6093 t54 38 Total: 6169
Frame: 139
t10 19 t21 11 t32 6 t43 9644 t54 36 Total: 9719
Frame: 140
t10 18 t21 11 t32 5 t43 4150 t54 35 Total: 4221
Frame: 141
t10 19 t21 11 t32 5 t43 3870 t54 35 Total: 3943
Frame: 142
t10 18 t21 19 t32 5 t43 3672 t54 35 Total: 3752
Frame: 143
t10 19 t21 12 t32 6 t43 3776 t54 37 Total: 3852
Frame: 144
t10 28 t21 11 t32 5 t43 3619 t54 36 Total: 3700
Frame: 145
t10 20 t21 12 t32 6 t43 3942 t54 37 Total: 4018
Frame: 146
t10 56 t21 12 t32 5 t43 3625 t54 71 Total: 3771
Frame: 147
t10 18 t21 44 t32 5 t43 3569 t54 33 Total: 3672
Frame: 148
t10 19 t21 11 t32 41 t43 3601 t54 34 Total: 3708
Frame: 149
t10 18 t21 46 t32 6 t43 3652 t54 34 Total: 3757
Frame: 150
t10 19 t21 45 t32 6 t43 3428 t54 82 Total: 3581
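To put numbers on the spikes instead of eyeballing them, the log above can be summarized with a small standalone program. This is only a sketch: parseTotals, Summary and summarize are hypothetical helper names (they are not part of VPI or the sample), and the "slower than 2x the median" threshold is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the number after "Total:" from every timing
// line of a log shaped like the one above. "Frame: N" lines are skipped.
std::vector<long> parseTotals(const std::string &logText)
{
    std::vector<long> totals;
    std::istringstream in(logText);
    std::string line;
    const std::string key = "Total:";
    while (std::getline(in, line))
    {
        const auto pos = line.find(key);
        if (pos == std::string::npos)
            continue;
        // std::stol skips the blank that follows the key.
        totals.push_back(std::stol(line.substr(pos + key.size())));
    }
    return totals;
}

struct Summary
{
    long median = 0;            // upper median when the count is even
    long max = 0;
    std::size_t slowFrames = 0; // frames slower than factor * median
};

// Takes the vector by value because it sorts its own copy.
Summary summarize(std::vector<long> totals, double factor = 2.0)
{
    assert(factor > 0.0);
    Summary s;
    if (totals.empty())
        return s;
    std::sort(totals.begin(), totals.end());
    s.median = totals[totals.size() / 2];
    s.max = totals.back();
    s.slowFrames = static_cast<std::size_t>(
        std::count_if(totals.begin(), totals.end(),
                      [&](long t) { return t > factor * s.median; }));
    return s;
}
```

Feeding it the whole log as one string (for example read from a file) gives the median, the worst frame and the number of outlier frames in one go.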
Here is the modified code. Please note that I added nothing except the timing code that produces the per-frame times above (all values are in microseconds):
int main(int argc, char *argv[])
{
    // OpenCV image that will be wrapped by a VPIImage.
    // Define it here so that it's destroyed *after* wrapper is destroyed
    cv::Mat cvFrame;

    // Declare all VPI objects we'll need here so that we
    // can destroy them at the end.
    VPIStream stream = NULL;
    VPIImage imgPrevious = NULL, imgCurrent = NULL, imgOutput = NULL;
    VPIImage frameBGR = NULL;
    VPIPayload tnr = NULL;

    // main return value
    int retval = 0;

    try
    {
        // =============================
        // Parse command line parameters
        if (argc != 3)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <vic|cuda> <input_video>");
        }

        std::string strBackend = argv[1];
        std::string strInputVideo = argv[2];

        // Now parse the backend
        VPIBackend backend;
        if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "vic")
        {
            backend = VPI_BACKEND_VIC;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend + "' not recognized, it must be either cuda or vic.");
        }

        // ===============================
        // Prepare input and output videos

        // Load the input video
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // Open the output video for writing using input's characteristics
        int w = invid.get(cv::CAP_PROP_FRAME_WIDTH);
        int h = invid.get(cv::CAP_PROP_FRAME_HEIGHT);
        int fourcc = cv::VideoWriter::fourcc('M', 'P', 'E', 'G');
        double fps = invid.get(cv::CAP_PROP_FPS);

        // Create the output video
        //cv::VideoWriter outVideo("denoised_" + strBackend + ".mp4", fourcc, fps, cv::Size(w, h));
        //if (!outVideo.isOpened())
        //{
        //    throw std::runtime_error("Can't create output video");
        //}

        // =================================
        // Allocate all VPI resources needed

        // We'll use the backend passed to run remap algorithm and the CUDA to do image format
        // conversions, therefore we have to force enabling of CUDA backend, along with the
        // desired backend.
        CHECK_STATUS(vpiStreamCreate(VPI_BACKEND_CUDA | backend, &stream));

        CHECK_STATUS(vpiImageCreate(w, h, VPI_IMAGE_FORMAT_NV12_ER, 0, &imgPrevious));
        CHECK_STATUS(vpiImageCreate(w, h, VPI_IMAGE_FORMAT_NV12_ER, 0, &imgCurrent));
        CHECK_STATUS(vpiImageCreate(w, h, VPI_IMAGE_FORMAT_NV12_ER, 0, &imgOutput));

        // Create a Temporal Noise Reduction payload configured to process NV12_ER
        // frames under indoor medium light
        CHECK_STATUS(vpiCreateTemporalNoiseReduction(backend, w, h, VPI_IMAGE_FORMAT_NV12_ER, VPI_TNR_DEFAULT, &tnr));

        // ====================
        // Main processing loop
        int curFrame = 0;
        while (invid.read(cvFrame))
        {
            printf("Frame: %d\n", ++curFrame);

            // frameBGR isn't allocated yet?
            if (frameBGR == NULL)
            {
                // Create a VPIImage that wraps the frame
                CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvFrame, 0, &frameBGR));
            }
            else
            {
                // reuse existing VPIImage wrapper to wrap the new frame.
                CHECK_STATUS(vpiImageSetWrappedOpenCVMat(frameBGR, cvFrame));
            }

            auto t0 = std::chrono::steady_clock::now();

            // First convert it to NV12_ER
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, frameBGR, imgCurrent, NULL));

            auto t1 = std::chrono::steady_clock::now();

            // Apply temporal noise reduction
            // For first frame, we have to pass NULL as previous frame,
            // this will reset internal state.
            VPITNRParams params;
            CHECK_STATUS(vpiInitTemporalNoiseReductionParams(&params));
            params.preset = VPI_TNR_PRESET_INDOOR_MEDIUM_LIGHT;
            params.strength = 1.0f;
            CHECK_STATUS(vpiSubmitTemporalNoiseReduction(stream, 0, tnr, curFrame == 1 ? NULL : imgPrevious, imgCurrent,
                                                         imgOutput, &params));

            auto t2 = std::chrono::steady_clock::now();

            // Convert output back to BGR
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, imgOutput, frameBGR, NULL));

            auto t3 = std::chrono::steady_clock::now();

            CHECK_STATUS(vpiStreamSync(stream));

            auto t4 = std::chrono::steady_clock::now();

            // Now add it to the output video stream
            VPIImageData imgdata;
            CHECK_STATUS(vpiImageLockData(frameBGR, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &imgdata));

            cv::Mat outFrame;
            CHECK_STATUS(vpiImageDataExportOpenCVMat(imgdata, &outFrame));
            //outVideo << outFrame;

            CHECK_STATUS(vpiImageUnlock(frameBGR));

            auto t5 = std::chrono::steady_clock::now();

            auto diff10 = std::chrono::duration_cast<std::chrono::microseconds>(t1-t0).count();
            auto diff21 = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count();
            auto diff32 = std::chrono::duration_cast<std::chrono::microseconds>(t3-t2).count();
            auto diff43 = std::chrono::duration_cast<std::chrono::microseconds>(t4-t3).count();
            auto diff54 = std::chrono::duration_cast<std::chrono::microseconds>(t5-t4).count();
            auto diff50 = std::chrono::duration_cast<std::chrono::microseconds>(t5-t0).count();
            std::cout << "t10 " << diff10 << " t21 " << diff21 << " t32 " << diff32 << " t43 " << diff43 << " t54 " << diff54 << " Total: " << diff50 << std::endl;

            // this iteration's output will be next's previous. Previous, which would be discarded, will be reused
            // to store next frame.
            std::swap(imgPrevious, imgOutput);
        }
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    // =========================
    // Destroy all VPI resources
    vpiStreamDestroy(stream);
    vpiPayloadDestroy(tnr);
    vpiImageDestroy(imgPrevious);
    vpiImageDestroy(imgCurrent);
    vpiImageDestroy(imgOutput);
    vpiImageDestroy(frameBGR);

    return retval;
}
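The six time points and duration_cast lines in the loop above could be folded into a small helper. The sketch below uses only std::chrono; LapTimer is a hypothetical name and not part of VPI or the sample. Judging from the log, t10 to t32 are only tens of microseconds while t43 (the vpiStreamSync wait) carries almost all of the cost, so the lap taken right after the sync is the one that reflects the actual processing time.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records the microseconds elapsed between successive
// lap() calls, so the loop body needs no explicit time_point variables.
class LapTimer
{
public:
    using Clock = std::chrono::steady_clock;

    LapTimer() : start_(Clock::now()), last_(start_) {}

    // Stores the time since the previous lap (or construction) under `name`.
    void lap(const std::string &name)
    {
        assert(!name.empty());
        const auto now = Clock::now();
        laps_.emplace_back(name, micros(now - last_));
        last_ = now;
    }

    // Microseconds from construction to the most recent lap.
    long long total() const { return micros(last_ - start_); }

    const std::vector<std::pair<std::string, long long>> &laps() const { return laps_; }

private:
    static long long micros(Clock::duration d)
    {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    }

    Clock::time_point start_;
    Clock::time_point last_;
    std::vector<std::pair<std::string, long long>> laps_;
};
```

In the loop, one LapTimer would be constructed where t0 is taken, followed by a lap("t10") after the first submit, lap("t21") after the TNR submit, and so on through lap("t54"); laps() and total() then hold the same values the code above prints.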
Please help me out. Thanks very much.
Update 1:
The official 05-benchmark sample also shows this issue once I add some timing code (printing every iteration's processing time in the outer loop instead of only the median), increase the outer-loop count and remove the inner loop.
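The kind of change described for 05-benchmark can be sketched as follows. This is not the sample's code: timeEachIteration is a hypothetical harness, and the work callback stands in for whatever is submitted and synchronized in each outer-loop iteration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical harness: runs `work` once per iteration and keeps every
// duration (in microseconds) instead of reducing them to a single median.
std::vector<long long> timeEachIteration(int iterations, const std::function<void()> &work)
{
    assert(iterations >= 0);
    using Clock = std::chrono::steady_clock;

    std::vector<long long> micros;
    micros.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i)
    {
        const auto begin = Clock::now();
        work(); // should include the blocking sync so the whole job is timed
        const auto end = Clock::now();
        micros.push_back(
            std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count());
    }
    return micros;
}
```

Printing each element of the returned vector exposes individual spikes that a median alone would hide.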