Triton Server Crashing When Running CenterNet Keypoint (hourglass_512x512_kpts) on Jetson via Dockerized Triton

I can see that memory usage for these object detection models is close to the capacity of the TX2. With the working RetinaNet model I have been seeing usage of 6.5 GB out of 8.0 GB, and with a second non-TF2 model loaded in Triton (which pushes memory higher even when it is not invoked) I then saw similar inference crashes on the TF2 model, again with no error messages.

When such crashes do occur I see the used memory freed, with usage on the unit dropping to 1.3 GB. Even so, I have occasionally seen problems with inference after restarting the server, as if some resource I'm unaware of is being retained. Are there recommended actions to take after a Triton crash like this to free resources?
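As a sanity check after a restart, something like the following could be used to confirm the server is actually live and that the unit's memory really has been released before re-sending inference. This is just a minimal sketch, assuming the tritonclient Python package and Triton's default HTTP port 8000; the 2 GB free-memory threshold is an arbitrary figure, not a recommendation:

```python
# Sketch: verify Triton is live/ready and Jetson memory has been released
# before resuming inference after a crash and server restart.
import tritonclient.http as httpclient

def mem_available_gb():
    """Read MemAvailable from /proc/meminfo (CPU/GPU share this memory on TX2)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)  # kB -> GB
    return 0.0

client = httpclient.InferenceServerClient(url="localhost:8000")

if not (client.is_server_live() and client.is_server_ready()):
    raise RuntimeError("Triton is not live/ready yet")

# Arbitrary threshold: check the previous run's memory really was freed.
if mem_available_gb() < 2.0:
    raise RuntimeError("Less than 2 GB available; memory may still be held")

print("Server ready and memory released; safe to resume inference")
```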

My understanding is that the Jetson TX2's memory is shared by the CPU and GPU. Up until now I've been leaving the memory settings at their defaults - which means 256 MB of pinned memory and 64 MB of CUDA (GPU) pool memory.
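Those defaults correspond to Triton's `--pinned-memory-pool-byte-size` (256 MB) and `--cuda-memory-pool-byte-size` (64 MB) options, so they can be overridden when the server is launched. Below is a rough sketch of launching the container with smaller pools; the image tag, model repository path, and pool sizes are placeholders, and I haven't verified whether shrinking them actually helps on a shared-memory Jetson:

```python
# Sketch: launch Triton in Docker with explicit (smaller) memory pool sizes.
# Image tag, model repository path, and pool sizes are placeholders only.
import subprocess

PINNED_POOL = 64 * 1024 * 1024   # 64 MB instead of the 256 MB default
CUDA_POOL = 32 * 1024 * 1024     # 32 MB instead of the 64 MB default

subprocess.run([
    "docker", "run", "--rm", "--runtime", "nvidia",
    "-p", "8000:8000", "-p", "8001:8001", "-p", "8002:8002",
    "-v", "/path/to/model_repository:/models",   # placeholder path
    "nvcr.io/nvidia/tritonserver:<xx.yy>-py3",    # placeholder image tag
    "tritonserver",
    "--model-repository=/models",
    f"--pinned-memory-pool-byte-size={PINNED_POOL}",
    f"--cuda-memory-pool-byte-size={CUDA_POOL}",
], check=True)
```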

This post touches on even tighter constraints on the Jetson Nano, although it doesn't offer specific recommendations for the Triton server settings to use vis-à-vis memory limits:

I’ve not experimented with converting these large object detection models, which take UINT8 input matrices, to TensorRT because a) I’m not sure the conversion is supported for this input type, and b) one of the main gains seemed to be in moving to UINT8, which we already have…
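For point (a), before attempting any conversion it is at least straightforward to confirm what the exported serving signature expects. A minimal sketch, assuming a standard TF2 Object Detection API SavedModel export (the path is a placeholder):

```python
# Sketch: inspect a TF2 SavedModel's serving signature to confirm the input
# dtype before deciding on any TensorRT / TF-TRT conversion.
import tensorflow as tf

model = tf.saved_model.load("/path/to/saved_model")   # placeholder path
sig = model.signatures["serving_default"]

# structured_input_signature is (args, kwargs); the kwargs dict holds TensorSpecs
for name, spec in sig.structured_input_signature[1].items():
    print(f"input '{name}': dtype={spec.dtype.name}, shape={spec.shape}")
```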