Cart Delivery in the Factory of the Future

I have been trying to get this working on two different machines, one with an RTX 2070 and the other with a Quadro RTX 4000. Both machines have CUDA 10.0 installed and working correctly. When I run:

./builds/factory_of_the_future.x86_64 --scene Factory01

the factory scene appears as it should. However, when I run:

bazel run packages/cart_delivery/apps:cart_delivery

I eventually get an error that causes the application to crash. The error is:

2020-06-04 14:37:42.870 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@256: CUDA ERROR: an illegal memory access was encountered

Any idea what may be causing this?

Hi ilengyel,

Can you please share the stdout for the app? I ran into a similar issue as well, and I'd like to compare the stdout to see if there are matching signatures.
It turned out that I had a mismatch in CUDA versions: even though I had CUDA 10.2 installed, the path pointed to CUDA 10.0. Cross-checking the CUDA and cuDNN installations might help fix the problem.
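For anyone comparing, here is one quick way to see which CUDA toolkit the system actually resolves (a sketch; it assumes a default /usr/local install, so adjust paths for your machine):

```shell
# Where does the unversioned symlink actually point?
readlink -f /usr/local/cuda

# Which CUDA release does the nvcc on PATH report?
nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p'
```

If the two disagree (e.g. the symlink says 10.0 but nvcc reports 10.2), the app can compile against one toolkit and load libraries from another, which is exactly the kind of mismatch described above.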

Hi sdesai, I have attached the output of the Bazel run command along with some additional CUDA settings information. As I stated before, I am trying this on two laptops: an HP OMEN with 64GB and an RTX 2070, and an HP ZBook with 32GB and a Quadro RTX 4000. Both are set up the same and experiencing the same error.

output_from_bazel_run.txt (83.2 KB)

Hello,
I have a similar error. I have exactly the recommended configuration:

  • nvidia driver 440.82
  • cuda 10.0.130
  • cudnn 7.6.3.30
I attach the log file. Thanks.

error.txt (53.8 KB)

Hi sdesai,

Just checking to see if you had any thoughts on this?

Cheers,
ilengyel

Hi ilengyel/bascoul,

I was able to reproduce the issue @bascoul is facing and am following up to get the issue triaged. Based on both of your responses, the system settings look good.

I will update the thread once I have more info. Apologies for the delay.

Hi,

I also got similar issues when running the Future Factory scene with the Cart Delivery application; however, I am able to run Dolly Docking in it without a problem.

My System Setup:
Nvidia GTX 1070Ti
Nvidia Driver 450
CUDA V10.0.130
cuDNN 7.6.3
TensorRT 6.0

Here are the errors I got:
2020-06-22 12:13:51.485 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@234: CUDA ERROR: an illegal memory access was encountered
2020-06-22 12:13:51.523 PANIC engine/core/allocator/cuda_malloc_allocator.cpp@22: Could not allocate memory. Error: 77
2020-06-22 12:13:51.524 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
2020-06-22 12:13:51.550 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: FAILED_EXECUTION: std::exception
2020-06-22 12:13:51.550 ERROR engine/alice/components/Codelet.cpp@229: Component 'delivery.detection_pose_estimation.object_detection.tensor_r_t_inference/isaac.ml.TensorRTInference' of type 'isaac::ml::TensorRTInference' reported FAILURE:
2020-06-22 12:13:51.998 ERROR engine/alice/backend/event_manager.cpp@42: Stopping node 'delivery.detection_pose_estimation.object_detection.tensor_r_t_inference' because it reached status 'FAILURE'
2020-06-22 12:13:52.000 ERROR packages/navigation/gems/algorithms/cuda/multi_trace_and_match_gpu.cu.cpp@256: CUDA ERROR: an illegal memory access was encountered

Thank you!

Hi @ilengyel, @bascoul

Can you try with cuDNN 7.6.5?

https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.0_20191031/cudnn-10.0-linux-x64-v7.6.5.32.tgz

tar -xzvf cudnn-x.x-linux-x64-v7.x.x.x.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
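After copying the files, one way to confirm the header now reports 7.6.5 (a quick check; it assumes the default /usr/local/cuda install path used above):

```shell
# Should print MAJOR 7, MINOR 6, PATCHLEVEL 5 after the upgrade
grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /usr/local/cuda/include/cudnn.h
```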

For reference you can check the current versions, using the version checker in Isaac. Here is the sample output for my setup:

$ python3 engine/build/scripts/version_checker.py 
---------------------------------------------------------------
|Package             |Recommended Version |Current Version     |
---------------------------------------------------------------
|OS                  |Ubuntu 18.04.2 LTS  |Ubuntu 18.04.4 LTS  |
|Bazel               |2.2.0               |2.2.0               |
|GPU_Driver          |>=418               |440.64              |
|Cuda                |10.x.x              |10.0.326            |
|Cudnn               |7.6.x.x             |7.6.5.32            |
|TensorFlow          |1.15.0              |1.15.0              |
|pycapnp             |>=0.6.3             |0.6.4               |
|librosa             |>=0.6.3             |0.7.2               |
|SoundFile           |>=0.10.2            |0.10.3.post1        |
|Python2             |2.7.x               |2.7.17              |
|Python3             |3.6.x               |3.6.9               |
---------------------------------------------------------------

Still no luck; the same error occurs after an upgrade to cuDNN 7.6.5.

The error seems to be ever-present. error.txt (53.8 KB) version_checker.txt (972 Bytes)

Hi,
I'm facing a similar error. I have similar dependency versions to ilengyel and bascoul. I was learning the Navigation Stack and Costmap Planner GEMs, and when I run:

bazel run //packages/flatsim/apps:flatsim -- --demo demo_1
bazel run //packages/flatsim/apps:flatsim -- --demo demo_2

No errors appeared and the apps worked as they should. However, when I run:

bazel run //packages/flatsim/apps:flatsim -- --demo demo_5

I obtain the following error:


The error also occurs for:

bazel run //packages/flatsim/apps:flatsim -- --demo demo_3
bazel run //packages/flatsim/apps:flatsim -- --demo demo_4

Hoping to find the solution for this issue. Thank you.

TensorRT is failing to build its engine as well.

2020-06-29 18:59:53.309 DEBUG packages/ml/TensorRTInference.cpp@174: TRT INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
2020-06-29 18:59:58.765 INFO engine/alice/backend/allocator_backend.cpp@57: Optimized memory CPU allocator.
2020-06-29 18:59:58.765 INFO engine/alice/backend/allocator_backend.cpp@66: Optimized memory CUDA allocator.
2020-06-29 19:00:04.828 DEBUG packages/ml/TensorRTInference.cpp@174: TRT INFO: Detected 1 inputs and 2 output network tensors.
2020-06-29 19:00:04.857 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
2020-06-29 19:00:04.859 ERROR packages/ml/TensorRTInference.cpp@168: TRT ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
2020-06-29 19:00:04.859 PANIC packages/ml/TensorRTInference.cpp@246: Failed to build TensorRT engine from external/industrial_dolly_pose_estimation_cnn_model/resnet18_detector_industrial_dolly_fof.etlt.

| Isaac application terminated unexpectedly |

Version compare script:

$ python3 engine/build/scripts/version_checker.py
---------------------------------------------------------------
|Package             |Recommended Version |Current Version     |
---------------------------------------------------------------
|OS                  |Ubuntu 18.04.2 LTS  |Ubuntu 18.04.4 LTS  |
|Bazel               |2.2.0               |2.2.0               |
|GPU_Driver          |>=418               |450.36.06           |
|Cuda                |10.x.x              |10.0.130            |
|Cudnn               |7.6.x.x             |7.6.3.30            |
|TensorFlow          |1.15.0              |1.15.0              |
|pycapnp             |>=0.6.3             |0.6.4               |
|librosa             |>=0.6.3             |0.7.2               |
|SoundFile           |>=0.10.2            |0.10.3.post1        |
|Python2             |2.7.x               |2.7.17              |
|Python3             |3.6.x               |3.6.9               |
---------------------------------------------------------------

Something is not linking Isaac with TensorRT.

I found out that the .config.json and .graph.json files are missing in the /home/USER/isaac/apps/assets/maps directory for corridor_office.png and virtual_factory_1.png. I am not sure whether this is intended or whether it is the cause of the error, as demo_3 and demo_4 use corridor_office.png while demo_5 uses virtual_factory_1.png.

I’m able to execute demo_3, demo_4 and demo_5 without error if I change the map to elevator_office_1.png.

@ilengyel @bascoul, Can you try re-installing cudnn from the tar package?

2.3.1. Installing From A Tar File

Before issuing the following commands, you’ll need to replace x.x and v8.x.x.x with your specific CUDA version and cuDNN version and package date.

Procedure

  1. Navigate to your directory containing the cuDNN tar file.
  2. Unzip the cuDNN package.
$ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz
  3. Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

While the issue is still being triaged for root cause, I was able to verify that this workaround works.
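If you try the reinstall, it may also help to refresh the dynamic linker cache afterwards so the newly copied library is the one the runtime actually loads (a sketch, assuming the /usr/local/cuda paths above):

```shell
# Rebuild the shared-library cache so the fresh libcudnn is registered
sudo ldconfig

# Confirm which libcudnn the loader will resolve at runtime
ldconfig -p | grep libcudnn
```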

@kvynlim

Can you please update your cuDNN to 7.6.5.xx as shown in the comment above and give it a try?

Regarding the JSON files for the maps: virtual_factory_1.json is a unified JSON file that should contain both the config and graph JSON contents.