PoseNet PreProcessing?

I’m trying to do some debugging on the PoseNet modules in jetson-inference so that I can do a performance comparison between different AI accelerator chipsets, and I’m a little lost once it gets into the CUDA code. I’m trying to understand what happens to the image (a cudaImage of uchar3, if I’m not mistaken) before it goes into the model.

Can someone help me understand what happens to the image in the preprocessing? I see that when poseNet::Process() is called, it passes the image to cudaTensorNormMeanRGB(), which runs in CUDA. That calls launchTensorNormMean(), which in turn calls gpuTensorNormMean(), and I don’t understand what happens there. That function looks like:

// gpuTensorNormMean
template<typename T, bool isBGR>
__global__ void gpuTensorNormMean( T* input, int iWidth, float* output, int oWidth, int oHeight, int stride, float2 scale, float multiplier, float min_value, const float3 mean, const float3 stdDev )
{
	// each thread computes the output pixel (x,y) it is responsible for
	const int x = blockIdx.x * blockDim.x + threadIdx.x;
	const int y = blockIdx.y * blockDim.y + threadIdx.y;

	// threads that fall outside the output image do nothing
	if( x >= oWidth || y >= oHeight )
		return;

	// m is the flattened output index; (dx,dy) is the nearest source pixel
	const int m  = y * oWidth + x;
	const int dx = ((float)x * scale.x);
	const int dy = ((float)y * scale.y);

	const T px = input[ dy * iWidth + dx ];

	// swap channels if the input is BGR, so rgb is always in RGB order
	const float3 rgb = isBGR ? make_float3(px.z, px.y, px.x)
	                         : make_float3(px.x, px.y, px.z);

	// write planar (CHW) output, one normalized plane per channel
	output[stride * 0 + m] = ((rgb.x * multiplier + min_value) - mean.x) / stdDev.x;
	output[stride * 1 + m] = ((rgb.y * multiplier + min_value) - mean.y) / stdDev.y;
	output[stride * 2 + m] = ((rgb.z * multiplier + min_value) - mean.z) / stdDev.z;
}

What are blockIdx and threadIdx? Where do they come from? And how are the mean values chosen when cudaTensorNormMeanRGB() is called?

Basically, I’m trying to write Python code that will take an OpenCV image and prepare it to pass to the same PoseNet model, but I am completely lost on what I need to do to the image.

Thanks in advance.

Through research around the web and some experimentation, I was able to put together generic pre-processing Python code that works with the original ONNX model using onnxruntime. Does this mean the same pre-processing code will also work for the converted TensorRT model, or does the TensorRT conversion require different inputs?

Hi,

blockIdx and threadIdx are CUDA built-in variables. Together with blockDim, each GPU thread uses them to compute which output pixel (x, y) it is responsible for; the grid and block dimensions are set by the kernel launch configuration in launchTensorNormMean().
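
In Python terms, the launch just enumerates every output coordinate. A minimal sketch of how the x coordinates come about (the block size of 8 is only illustrative here, so check launchTensorNormMean() for the actual launch dimensions):

block_dim = 8                                         # illustrative block width, not necessarily what jetson-inference uses
out_width = 224                                       # example output width
grid_dim = (out_width + block_dim - 1) // block_dim   # ceil-divide so the grid covers the whole output

# blockIdx.x * blockDim.x + threadIdx.x enumerates every x in order;
# threads whose x lands beyond oWidth hit the early-return bounds check in the kernel
xs = [bx * block_dim + tx for bx in range(grid_dim) for tx in range(block_dim)]
assert xs == list(range(grid_dim * block_dim))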

The kernel implements the following operation on every pixel:

output = ((input * multiplier + min_value) - mean) / stdDev

It also converts the format from interleaved HWC into planar CHW:
HWC: r1 g1 b1 r2 g2 b2 …
CHW: r1 r2 … g1 g2 … b1 b2 …
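
With numpy, that layout conversion is a single transpose. A tiny sketch (the array here is just an illustration):

import numpy as np

hwc = np.arange(2 * 2 * 3).reshape(2, 2, 3)   # a 2x2 RGB image in interleaved HWC layout
chw = hwc.transpose(2, 0, 1)                  # move the channel axis first: planar CHW layout
print(chw[0].ravel())                         # all R values contiguous, then G, then B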

The exact preprocessing (mean, stdDev, multiplier, and so on) depends on the model you use.
It needs to match the pre-processing that was applied when the model was trained.
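
For your OpenCV use case, here is a minimal numpy sketch of the whole pipeline. The mean/std values and the [0,1] input range below are the common ImageNet conventions used by many torchvision-trained models; they are assumptions on my part, so verify them against the values poseNet::Process() actually passes to cudaTensorNormMeanRGB() for your model:

import cv2
import numpy as np

def preprocess(bgr_image, input_w, input_h):
    # assumed ImageNet statistics; verify against your model's training pipeline
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    # OpenCV images are BGR, so swap to RGB (the kernel's isBGR branch does the same)
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # the kernel samples with nearest-neighbor indexing; INTER_NEAREST is the closest match
    rgb = cv2.resize(rgb, (input_w, input_h), interpolation=cv2.INTER_NEAREST)

    # scale to [0,1] (i.e. multiplier = 1/255, min_value = 0), subtract mean, divide by std
    tensor = (rgb.astype(np.float32) / 255.0 - mean) / std

    # HWC -> CHW, then add a batch dimension: final shape is (1, 3, input_h, input_w)
    return np.expand_dims(tensor.transpose(2, 0, 1), axis=0)

If the model's outputs look wrong, the mean/std values and the input range are the first things to double-check.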

Thanks.
