Building and Deploying a Face Mask Detection Application Using NGC Collections

Originally published at: Building and Deploying a Face Mask Detection Application Using NGC Collections | NVIDIA Developer Blog

AI workflows are complex. Building an AI application is no trivial task, as it takes various stakeholders with domain expertise to develop and deploy the application at scale. Data scientists and developers need easy access to software building blocks, such as models and containers, that are not only secure and highly performant, but which have…

Hello and thanks for the information.

I’m trying to recreate the code in my environment but I cannot find any clue to set the following parameters:

--category-limit $CATEGORY_LIMIT
--tlt-input-dims_width $TLT_INPUT_DIMS_WIDTH
--tlt-input-dims_height $TLT_INPUT_DIMS_HEIGHT \

I would really appreciate some guidance on this, or a pointer to a reference.

Thanks in advance!

Great question! We go into more detail about the input-dims parameters in this blog post, which is worth checking out if you’re interested.


--category-limit: Determines the number of inference categories to display in DeepStream.


--tlt-input-dims_width and --tlt-input-dims_height: Tell TLT what resolution to use. I can’t give you an absolute answer without knowing which camera, feed, input sensor, or dataset you’re using, but you should try setting them to match the width and height of your images or feed.

The full docs are here.

Dear jwitsoe,

I followed the blog above to train the Face Mask model, and it works fine with DeepStream. I then tried writing some study code to run inference on this model after using tlt_convert to convert it to an engine.
To process the output, I tried to follow jetson-inference/detectNet.cpp at master · dusty-nv/jetson-inference · GitHub and treat it like a Caffe model:
// clusterDetections (caffe)
int detectNet::clusterDetections( Detection* detections, uint32_t width, uint32_t height )
{
	// cluster detection bboxes
	float* net_cvg   = mOutputs[OUTPUT_CVG].CPU;
	float* net_rects = mOutputs[OUTPUT_BBOX].CPU;

	const int ow  = DIMS_W(mOutputs[OUTPUT_BBOX].dims);	// number of columns in bbox grid in X dimension
	const int oh  = DIMS_H(mOutputs[OUTPUT_BBOX].dims);	// number of rows in bbox grid in Y dimension
	const int owh = ow * oh;							// total number of bbox in grid
	const int cls = GetNumClasses();					// number of object classes in coverage map

	const float cell_width  = /*width*/ GetInputWidth() / ow;
	const float cell_height = /*height*/ GetInputHeight() / oh;

	const float scale_x = float(width) / float(GetInputWidth());
	const float scale_y = float(height) / float(GetInputHeight());

	LogDebug(LOG_TRT "input width %i height %i\n", (int)DIMS_W(mInputDims), (int)DIMS_H(mInputDims));
	LogDebug(LOG_TRT "cells x %i y %i\n", ow, oh);
	LogDebug(LOG_TRT "cell width %f height %f\n", cell_width, cell_height);
	LogDebug(LOG_TRT "scale x %f y %f\n", scale_x, scale_y);

	// extract and cluster the raw bounding boxes that meet the coverage threshold
	int numDetections = 0;

	for( uint32_t z=0; z < cls; z++ )	// z = current object class
	{
		for( uint32_t y=0; y < oh; y++ )
		{
			for( uint32_t x=0; x < ow; x++)
			{
				const float coverage = net_cvg[z * owh + y * ow + x];

				if( coverage > mCoverageThreshold )
				{
					const float mx = x * cell_width;
					const float my = y * cell_height;

					// note: only bbox channels 0..3 are read, regardless of class z
					const float x1 = (net_rects[0 * owh + y * ow + x] + mx) * scale_x;	// left
					const float y1 = (net_rects[1 * owh + y * ow + x] + my) * scale_y;	// top
					const float x2 = (net_rects[2 * owh + y * ow + x] + mx) * scale_x;	// right
					const float y2 = (net_rects[3 * owh + y * ow + x] + my) * scale_y;	// bottom

					LogDebug(LOG_TRT "rect x=%u y=%u  cvg=%f  %f %f   %f %f \n", x, y, coverage, x1, x2, y1, y2);

					// merge with list, checking for overlaps
					bool detectionMerged = false;

					for( uint32_t n=0; n < numDetections; n++ )
					{
						if( detections[n].ClassID == z && detections[n].Expand(x1, y1, x2, y2) )
						{
							detectionMerged = true;
							break;
						}
					}

					// create new entry if the detection wasn't merged with another detection
					if( !detectionMerged )
					{
						detections[numDetections].Instance   = numDetections;
						detections[numDetections].ClassID    = z;
						detections[numDetections].Confidence = coverage;
						detections[numDetections].Left   = x1;
						detections[numDetections].Top    = y1;
						detections[numDetections].Right  = x2;
						detections[numDetections].Bottom = y2;

						numDetections++;	// count the newly added detection
					}
				}
			}
		}
	}

	return numDetections;
}


But it doesn’t work. The output values are very small.
How should I process the outputs output_bbox/BiasAdd (8, 34, 60) and output_cov/Sigmoid (2, 34, 60)?
Thanks for your support!

@thuy.hoang19 ,
Your question is really about how to run inference with the TensorRT engine.
For the official approach, please see Integrating TAO CV Models with Triton Inference Server (TAO Toolkit 3.0 documentation), then use tao-toolkit-triton-apps/ at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub and tao-toolkit-triton-apps/ at fc7e222c036354498e53a8ed11b5cf7c0a3e5239 · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub.

Also, there are similar topics shared by other customers in the TAO forum.
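
For anyone reading along with the same two tensors, here is a minimal sketch of decoding them by hand. It is not taken from the linked samples. It assumes the commonly described DetectNet_v2 layout: the coverage tensor holds one channel per class, the bbox tensor holds four consecutive channels per class (x1, y1, x2, y2, so 8 channels for 2 classes), and each bbox value is an offset from the grid-cell center divided by a normalization constant. The stride of 16, the normalization of 35.0, the center offset of 0.5, and the names decodeDetectNetV2 and DecodeParams are all assumptions for illustration; check them against your training spec before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One decoded box, in network-input pixel coordinates.
struct Box {
    int   classId;
    float confidence;
    float left, top, right, bottom;
};

// ASSUMED decoding constants (not stated in this thread; verify them).
struct DecodeParams {
    int   stride   = 16;     // input pixels per grid cell (e.g. 960/60, 544/34)
    float bboxNorm = 35.0f;  // scale applied to the raw bbox offsets
    float offset   = 0.5f;   // grid-cell center offset
};

// cov:  [numClasses, gridH, gridW]      e.g. output_cov/Sigmoid  (2, 34, 60)
// bbox: [numClasses * 4, gridH, gridW]  e.g. output_bbox/BiasAdd (8, 34, 60)
std::vector<Box> decodeDetectNetV2(const std::vector<float>& cov,
                                   const std::vector<float>& bbox,
                                   int numClasses, int gridH, int gridW,
                                   float threshold,
                                   const DecodeParams& p = DecodeParams{})
{
    const std::size_t cells = static_cast<std::size_t>(gridH) * gridW;
    assert(cov.size()  == cells * numClasses);
    assert(bbox.size() == cells * numClasses * 4);

    std::vector<Box> boxes;
    for (int c = 0; c < numClasses; ++c) {
        // Each class owns four consecutive bbox channels: x1, y1, x2, y2.
        const float* o = bbox.data() + static_cast<std::size_t>(c) * 4 * cells;
        for (int y = 0; y < gridH; ++y) {
            for (int x = 0; x < gridW; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * gridW + x;
                const float score = cov[c * cells + i];
                if (score < threshold) continue;

                // Grid-cell center in input pixels; offsets extend outward from it.
                const float cx = x * p.stride + p.offset;
                const float cy = y * p.stride + p.offset;
                boxes.push_back({c, score,
                                 cx - o[0 * cells + i] * p.bboxNorm,
                                 cy - o[1 * cells + i] * p.bboxNorm,
                                 cx + o[2 * cells + i] * p.bboxNorm,
                                 cy + o[3 * cells + i] * p.bboxNorm});
            }
        }
    }
    return boxes;
}
```

If these assumptions hold, they would explain the very small raw values (they are normalized offsets, not pixels) and why indexing only channels 0 to 3 mixes up the classes. The boxes returned here are still unclustered and in network-input coordinates, so they would need a clustering or NMS step and scaling to the original frame size.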


@Morganh : Thank you~