I am currently working on a transformer project (GitHub - facebookresearch/detr: End-to-End Object Detection with Transformers).
This model is to be deployed on a Jetson AGX Xavier.
So I converted the model to a TensorRT engine, which worked fine (fp32, fp16, best, …).
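For reference, a typical conversion on the Jetson itself looks like the following; the file names are placeholders, and this assumes the model was first exported to ONNX (the post does not say which path was used):

```shell
# Build the TensorRT engine on the target device so it matches the Xavier GPU.
# detr.onnx / detr_fp16.trt are placeholder paths, not the poster's actual files.
/usr/src/tensorrt/bin/trtexec --onnx=detr.onnx --saveEngine=detr_fp16.trt --fp16
# Use --best instead of --fp16 to let trtexec pick the fastest precision per layer.
```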
Now I am trying to use the TRT engine on the embedded device. The problem is that no matter which TRT engine I load, the model does not produce proper results, or I am reading the results incorrectly. The shapes of the outputs are correct; only the content is always 0.
Jetson AGX Xavier
Cuda compilation tools, release 10.2, V10.2.300
your_model.trt (87.9 MB)
test3.py (6.2 KB)
Can't upload bigger models due to the upload size limit.
November 17, 2022, 10:07pm
This looks like a Jetson issue. Please refer to the samples below in case they are useful.
For any further assistance, we will move this post to the Jetson-related forum.
Thank you in advance.
I believe that the TRT engines were built correctly, though I am not 100% sure about that either.
Since I have debugged most of my code and only one particular section is misbehaving, I would like to ask you to look at exactly this section and confirm whether this is the right way to allocate memory and run inference:
# Host-side buffers. NOTE: the shapes below are assumed DETR defaults
# (100 queries, 92 classes); the originals were truncated in the post.
# The dtype matters: np.empty defaults to float64, while the engine
# expects float32, so every nbytes-based allocation and copy would
# otherwise be twice as large as TensorRT expects.
self.input_dimension = np.empty([1, 3, 800, 800], dtype=np.float32)
self.output_boxes_dimension = np.empty([1, 100, 4], dtype=np.float32)
self.output_logits_dimension = np.empty([1, 100, 92], dtype=np.float32)

# Device buffers sized directly from the host arrays. The torch round-trip
# in the original (torch.from_numpy(...).detach().cpu().numpy().nbytes)
# added nothing; nbytes is available on the numpy arrays themselves.
cuda_inputs = cuda.mem_alloc(self.input_dimension.nbytes)
cuda_outputs_boxes = cuda.mem_alloc(self.output_boxes_dimension.nbytes)
cuda_outputs_logits = cuda.mem_alloc(self.output_logits_dimension.nbytes)

# The binding order must match the engine's binding indices
# (verify with engine.get_binding_name(i)), not an order you assume.
bindings = [int(cuda_inputs), int(cuda_outputs_boxes), int(cuda_outputs_logits)]

boxes = self.output_boxes_dimension
logits = self.output_logits_dimension

start_time = time.time()
cuda.memcpy_htod_async(cuda_inputs, np_image, stream)
context.execute_async_v2(bindings, stream.handle, None)
cuda.memcpy_dtoh_async(boxes, cuda_outputs_boxes, stream)
cuda.memcpy_dtoh_async(logits, cuda_outputs_logits, stream)
# Without this synchronize, the async copies may not have finished when the
# host buffers are read, so they still hold their initial (zero) contents.
stream.synchronize()
print("[perception_detr] batch runtime: " + str(time.time() - start_time))
return boxes, logits
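One concrete failure mode worth ruling out: np.empty without an explicit dtype creates float64 arrays, but TensorRT engines expect float32 host buffers, so sizes computed from nbytes disagree by a factor of two. A minimal sketch (the (1, 100, 4) boxes shape is an assumed DETR default, not taken from the attached script):

```python
import numpy as np

# Default dtype is float64 -> 400 elements * 8 bytes = 3200 bytes.
boxes_f64 = np.empty((1, 100, 4))
# Explicit float32 -> 400 elements * 4 bytes = 1600 bytes, matching
# what an FP32/FP16 TensorRT engine actually writes into the buffer.
boxes_f32 = np.empty((1, 100, 4), dtype=np.float32)

print(boxes_f64.dtype, boxes_f64.nbytes)  # float64 3200
print(boxes_f32.dtype, boxes_f32.nbytes)  # float32 1600
```

If mem_alloc and memcpy_dtoh_async were sized from the float64 array while the engine wrote float32 data, the interpreted contents can easily come out wrong or zero.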
Thank you in advance.
Do you still have this issue?
Yes, I even tried a YOLO model; same problem.
Could you share a link to this sample? I will try it.
You can find under
January 25, 2023, 2:30am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.