Why time divided by iterations in sampleGoogleNet.cpp of TensorRT 3.0 SDK.

haifengli · December 12, 2017, 4:04am

We just want to profile another tensorrt instance, and clone the Profiler C++ class in sampleGoogleNet.cpp file. It’s report time is strange, so I digged to the document and header file. It seems Profiler class should report time in second instead of millisecond, or just don’t divide time by TIMING_ITERATIONS(1000).

struct Profiler : public IProfiler
{
	typedef std::pair<std::string, float> Record;
	std::vector<Record> mProfile;

	virtual void reportLayerTime(const char* layerName, float ms)
	{
		auto record = std::find_if(mProfile.begin(), mProfile.end(), [&](const Record& r){ return r.first == layerName; });
		if (record == mProfile.end())
			mProfile.push_back(std::make_pair(layerName, ms));
		else
			record->second += ms;
	}

	void printLayerTimes()
	{
		float totalTime = 0;
		for (size_t i = 0; i < mProfile.size(); i++)
		{
			//the mProfile[i].second(according to first) is already in millisecond, why divide it by TIMING_ITERATIONS?
			printf("%-40.40s %4.3f<b>ms</b>\n", mProfile[i].first.c_str(), mProfile[i].second / TIMING_ITERATIONS);
			totalTime += mProfile[i].second;
		}
		printf("Time over all layers: %4.3f\n", totalTime / TIMING_ITERATIONS);
	}

} gProfiler;

This is the reportLayerTime function declaration in NvInfer.h

virtual void nvinfer1::IProfiler::reportLayerTime	(	const char * 	layerName,
float 	ms 
)		
pure virtual
layer time reporting callback

Parameters
layerName	the name of the layer, set when constructing the network definition
ms	the time in milliseconds to execute the layer

What’s wrong?

AastaLLL · December 12, 2017, 8:48am

Hi,

As you said, in

virtual void nvinfer1::IProfiler::reportLayerTime ( const char * layerName,
                                                    float ms
                                                  )

ms is the time in milliseconds.

In timeInference() function, we launch tensorRT TIMING_ITERATIONS=1000 times and accumulate the execution time to totalTime.
When printing, we need to divide the totalTime with TIMING_ITERATIONS=1000 to get the average execution time.

Thanks.

haifengli · December 12, 2017, 8:54am

Hmm, so tensorrt is really fast. I just notice my title is wrong, 1000 times just make me misunderstand. Thanks for you reply.

haifengli · December 13, 2017, 1:57pm

the sampleGoogleNet.cpp use batch of 4, but the Profiler haven’t divide time by batch number, it should faster actually for one input?

AastaLLL · December 14, 2017, 2:26am

Hi,

Profiler measure the execution time of one context->execute() call.
For more significant batch size, the execution time goes longer.
Speed from slow to fast is: Batch=1 x 4 > Batch=4 > Batch=1

By default, we use batch size=4.
You can modify it to batch size=1 or divider the execution time by 4 to get the profiling data you preferred.

Usually, launch engine with batch=4 is faster than launch engine with batch=1 four times.
Depends on if you can collect four inputs and inference them at the same time. (Ex. four stream input)

Thanks.

haifengli · December 14, 2017, 2:34am

How good news, we can profile our program more precise now, the “strange” word goes now, thanks.