Description
Hello,
I am currently working on a transformer project (GitHub - facebookresearch/detr: End-to-End Object Detection with Transformers).
The model is to be deployed on a Jetson AGX Xavier.
So I converted the model to a TensorRT engine, which worked fine (fp32, fp16, best, …).
Now I am trying to run the TRT engine on the embedded device. The problem is that no matter which TRT engine I load, the model does not produce proper results, or I am reading the results incorrectly: the output shapes are correct, but the contents are always all zeros.
Environment
nvidia-tensorrt 4.6-b199
tensorrt 8.0.1.6-1+cuda10.2
Jetson AGX Xavier
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
your_model.trt (87.9 MB)
test3.py (6.2 KB)
Can't upload bigger models due to the upload size limitation.
NVES
November 17, 2022, 10:07pm
Hi,
This looks like a Jetson issue. Please refer to the samples below in case they are useful.
For any further assistance, we will move this post to the Jetson-related forum.
Thanks!
I believe the TRT engines were built correctly, but I am not 100% sure about that either.
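For reference, here is how an engine can be sanity-checked by deserializing it and printing its bindings (a minimal sketch using the standard TensorRT Python API; only the file name comes from the attachment above):

import tensorrt as trt

# Deserialize the engine from disk.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("your_model.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every binding: index, name, direction, shape, dtype.
# This is the order the bindings list passed to the context must follow.
for i in range(engine.num_bindings):
    print(i,
          engine.get_binding_name(i),
          "input" if engine.binding_is_input(i) else "output",
          engine.get_binding_shape(i),
          engine.get_binding_dtype(i))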
I have debugged many parts of my code, and only one section is not working as expected. Could you take a rough look at exactly this section and confirm whether this is the right way to allocate memory and run inference?
Memory Allocation:
# Host-side buffers matching the engine's input/output shapes.
# Note: for truly asynchronous copies, page-locked memory
# (cuda.pagelocked_empty) is preferable to plain np.empty.
self.input_dimension = np.empty(
    [self.batch_size, self.channel_size, self.image_size, self.image_size],
    dtype=self.PRECISION)
self.output_boxes_dimension = np.empty(
    [self.batch_size, self.n_predicitons, 4],
    dtype=self.PRECISION)
self.output_logits_dimension = np.empty(
    [self.batch_size, self.n_predicitons, self.n_CLASSES + 1],
    dtype=self.PRECISION)

# Device-side allocations sized directly from the host buffers
# (np.ndarray.nbytes already gives the byte count).
cuda_inputs = cuda.mem_alloc(self.input_dimension.nbytes)
cuda_outputs_boxes = cuda.mem_alloc(self.output_boxes_dimension.nbytes)
cuda_outputs_logits = cuda.mem_alloc(self.output_logits_dimension.nbytes)

# This list must follow the engine's binding order, not an arbitrary one.
bindings = [int(cuda_inputs), int(cuda_outputs_boxes), int(cuda_outputs_logits)]
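For context, here is a minimal sketch of how the binding order, shapes, and dtypes could instead be queried from the engine itself rather than hard-coded (the allocate_buffers helper name is mine; engine is assumed to be a deserialized ICudaEngine):

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    # Walk the bindings in engine order, so the bindings list is
    # guaranteed to line up with the engine's binding indices.
    inputs, outputs, bindings = [], [], []
    for idx in range(engine.num_bindings):
        shape = engine.get_binding_shape(idx)
        dtype = trt.nptype(engine.get_binding_dtype(idx))
        # Page-locked host memory so the async copies do not silently
        # fall back to synchronous behavior.
        host_mem = cuda.pagelocked_empty(trt.volume(shape), dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(idx):
            inputs.append((host_mem, device_mem))
        else:
            outputs.append((host_mem, device_mem))
    return inputs, outputs, bindings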
Prediction:
# Host views of the output buffers; filled by the dtoh copies below.
boxes = self.output_boxes_dimension
logits = self.output_logits_dimension
start_time = time.time()
# np_image must match the input binding's shape and dtype (self.PRECISION).
cuda.memcpy_htod_async(cuda_inputs, np_image, stream)
context.execute_async_v2(bindings, stream.handle, None)
cuda.memcpy_dtoh_async(boxes, cuda_outputs_boxes, stream)
cuda.memcpy_dtoh_async(logits, cuda_outputs_logits, stream)
# Wait for inference and both copies to finish before reading the results.
stream.synchronize()
print("[perception_detr] batch runtime: " + str(time.time() - start_time))
self.cfx.pop()
return boxes, logits
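And a minimal end-to-end sketch of the same prediction step built on the allocate_buffers sketch above (the infer helper and the single-input assumption are illustrative only):

def infer(context, bindings, inputs, outputs, stream, np_image):
    # Stage the image in the page-locked input buffer, then copy to device.
    host_in, dev_in = inputs[0]
    np.copyto(host_in, np_image.ravel())
    cuda.memcpy_htod_async(dev_in, host_in, stream)
    # Enqueue inference and the device-to-host copies on the same stream.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host_out, dev_out in outputs:
        cuda.memcpy_dtoh_async(host_out, dev_out, stream)
    stream.synchronize()
    return [host_out for host_out, _ in outputs]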
Thank you in advance.
Dear @zml-koop ,
Do you still have this issue?
Yes, I even tried a YOLO model; same problem.
Could you share a link to this sample? I will try it.
Dear @zml-koop ,
You can find them under /usr/src/tensorrt/samples/python/.
system
Closed
January 25, 2023, 2:30am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.