I have been trying to get this working on two different machines, one with an RTX 2070 and the other with a Quadro RTX 4000. Both machines have CUDA 10.0 installed and working correctly. When I run:
the factory scene appears as it should. However, when I run:
bazel run packages/cart_delivery/apps:cart_delivery
I eventually get an error that causes the application to crash. The error is:
2020-06-04 14:37:42.870 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@256: CUDA ERROR: an illegal memory access was encountered
Can you please share the stdout for the app? I ran into a similar issue as well, and I’d like to compare the stdout to see if there are matching signatures.
It turned out that I had a CUDA version mismatch: even though I had CUDA 10.2 installed, the path pointed to CUDA 10.0. Cross-checking the CUDA and cuDNN installations might help fix the problem.
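A minimal sketch of that cross-check, assuming a Linux install with toolkits under /usr/local/cuda-* (a common default, but an assumption here) and that `nvcc --version` prints a line of the form `release 10.0, V10.0.130`. The helper names are hypothetical and not part of Isaac SDK.

```python
import glob
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_toolkits(pattern="/usr/local/cuda-*"):
    """List toolkit versions by directory name (assumed default install layout)."""
    return sorted(path.rsplit("-", 1)[1] for path in glob.glob(pattern))


def report():
    """Print which nvcc the PATH resolves to and which toolkits are installed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc is not on PATH")
        return
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    print("nvcc on PATH:", nvcc)
    print("release on PATH:", parse_nvcc_release(out))
    print("toolkits found:", installed_toolkits())


if __name__ == "__main__":
    report()
```

If the release on PATH differs from the toolkit you meant to use, the PATH (and likely LD_LIBRARY_PATH) is pointing at the wrong install.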
Hi sdesai, I have attached the output of the bazel run command and some additional CUDA settings information. As I stated before, I am trying this out on two laptops: an HP OMEN with 64 GB and an RTX 2070, and an HP ZBook with 32 GB and a Quadro RTX 4000. Both are set up the same way and show the same error.
I was able to reproduce the issue @bascoul is facing and am following up to get the issue triaged. Based on both of your responses, the system settings look good.
I will update the thread once I have more info. Apologies for the delay.
I also got similar issues when running the Future Factory scene with the Cart Delivery application. However, I am able to run Dolly Docking in it without a problem.
My System Setup:
Nvidia GTX 1070Ti
Nvidia Driver 450
CUDA V10.0.130
cuDNN 7.6.3
TensorRT 6.0
Here are the errors I got:
2020-06-22 12:13:51.485 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@234: CUDA ERROR: an illegal memory access was encountered
2020-06-22 12:13:51.523 PANIC engine/core/allocator/cuda_malloc_allocator.cpp@22: Could not allocate memory. Error: 77
2020-06-22 12:13:51.524 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: …/rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
2020-06-22 12:13:51.550 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: FAILED_EXECUTION: std::exception
2020-06-22 12:13:51.550 ERROR engine/alice/components/Codelet.cpp@229: Component ‘delivery.detection_pose_estimation.object_detection.tensor_r_t_inference/isaac.ml.TensorRTInference’ of type ‘isaac::ml::TensorRTInference’ reported FAILURE:
2020-06-22 12:13:51.998 ERROR engine/alice/backend/event_manager.cpp@42: Stopping node ‘delivery.detection_pose_estimation.object_detection.tensor_r_t_inference’ because it reached status ‘FAILURE’
2020-06-22 12:13:52.000 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@256: CUDA ERROR: an illegal memory access was encountered
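For comparing stdout between reports, as suggested earlier in the thread, here is a rough sketch that pulls the ERROR and PANIC lines out of a log and finds the ones two logs share. The line layout in the regular expression is inferred only from the logs posted here, and the function names are hypothetical.

```python
import re

# Assumed line layout, based on the logs in this thread:
# "<date> <time> <LEVEL> <source file>@<line>: <message>"
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} "
    r"(?P<level>[A-Z]+) (?P<source>\S+)@(?P<line>\d+): (?P<message>.*)$"
)


def error_signatures(log_text, levels=("ERROR", "PANIC")):
    """Return (level, source file, message) tuples for the selected levels, in order."""
    signatures = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") in levels:
            signatures.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return signatures


def shared_signatures(log_a, log_b):
    """Signatures present in both logs, ignoring the source line number."""
    return sorted(set(error_signatures(log_a)) & set(error_signatures(log_b)))
```

The source line number is dropped on purpose: the same illegal memory access is reported at both @234 and @256 of multi_trace_and_match_gpu.cu.cpp in this thread.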
Hi,
I’m facing a similar error. My dependency versions are similar to those of ilengyel and bascoul. I was learning the Navigation Stack and Costmap Planner GEMS, and when I run:
bazel run //packages/flatsim/apps:flatsim -- --demo demo_1
bazel run //packages/flatsim/apps:flatsim -- --demo demo_2
no errors appear and the apps work as they should. However, when I run:
bazel run //packages/flatsim/apps:flatsim -- --demo demo_5
the application terminates with the following output:
2020-06-29 18:59:53.309 DEBUG packages/ml/TensorRTInference.cpp@174: TRT INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
2020-06-29 18:59:58.765 INFO engine/alice/backend/allocator_backend.cpp@57: Optimized memory CPU allocator.
2020-06-29 18:59:58.765 INFO engine/alice/backend/allocator_backend.cpp@66: Optimized memory CUDA allocator.
2020-06-29 19:00:04.828 DEBUG packages/ml/TensorRTInference.cpp@174: TRT INFO: Detected 1 inputs and 2 output network tensors.
2020-06-29 19:00:04.857 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
2020-06-29 19:00:04.859 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
2020-06-29 19:00:04.859 PANIC packages/ml/TensorRTInference.cpp@246: Failed to build TensorRT engine from external/industrial_dolly_pose_estimation_cnn_model/resnet18_detector_industrial_dolly_fof.etlt.
| Isaac application terminated unexpectedly |
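Since the TensorRT engine build above fails with "out of memory", it may be worth looking at how much GPU memory is free just before launching the app. A small sketch, assuming `nvidia-smi` is on the PATH; the function names are hypothetical, and a low number here would not by itself prove that memory is the root cause.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB rows (one per GPU) into (used, total, free) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total, total - used))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)
```

Calling `query_gpu_memory()` with the simulator already running shows how much memory is left for the TensorRT engine build.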
Version compare script:
python3 engine/build/scripts/version_checker.py
I found that the .config.json and .graph.json files are missing from the /home/USER/isaac/apps/assets/maps directory for corridor_office.png and virtual_factory_1.png. I am not sure whether this is intended or whether it causes the error, since demo_3 and demo_4 use corridor_office.png while demo_5 uses virtual_factory_1.png.
Can you please update your cuDNN to 7.6.5.xx as shown in the comment above and give it a try?
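To confirm which cuDNN is actually installed, one option is to read the version defines from the header. A minimal sketch, assuming cuDNN 7.x, which keeps CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h; the candidate header paths are guesses at common install locations, and the function names are hypothetical.

```python
import os
import re


def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from CUDNN_* defines, or None if any is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def installed_cudnn_version(
    candidates=("/usr/include/cudnn.h", "/usr/local/cuda/include/cudnn.h"),
):
    """Return the version from the first header found (paths are assumptions)."""
    for path in candidates:
        if os.path.isfile(path):
            with open(path) as handle:
                return parse_cudnn_version(handle.read())
    return None
```

If `installed_cudnn_version()` returns something like 7.6.3, the update has not taken effect for the header on that path.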
Regarding the JSON files for the maps, virtual_factory_1.json is a unified JSON file that should contain both the config and graph JSON contents.
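A quick way to check that a unified map file has both parts is sketched below. It assumes the two parts appear as top-level keys named "config" and "graph", which is a guess based on the description above rather than something confirmed in this thread; the function names are hypothetical.

```python
import json


def missing_sections(data, required=("config", "graph")):
    """Return the required top-level keys absent from the parsed JSON object."""
    return [key for key in required if key not in data]


def check_map_file(path):
    """Load a map JSON file and report which assumed sections are missing."""
    with open(path) as handle:
        return missing_sections(json.load(handle))
```

An empty list from `check_map_file` on the path to virtual_factory_1.json would mean both sections are present under those assumed names.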