Nsight "Resource temporarily unavailable"


We are running nsys against training_results_v2.1/NVIDIA/benchmarks/resnet/implementations/mxnet-22.04 using the following nsys command line:

NSYSCMD="/usr/local/cuda/bin/nsys profile --trace=cuda,nvtx,osrt,cublas --gpu-metrics-device=all --export=sqlite --cuda-memory-usage=true --force-overwrite true --trace-fork-before-exec true --output /results/se_image_classification_mxnet_${DGXNNODES}x${DGXNGPU}x${BATCHSIZE}_${DATESTAMP}.nsys-rep "

We have no issues when running with 1 GPU, but we get the following when running with 2 GPUs:

Generating '/tmp/nsys-report-f168.qdstrm'
Generating '/tmp/nsys-report-f495.qdstrm'
[1/2] [========================100%] se_image_classification_mxnet_1x2x408_.nsys-rep
Importer error status: Importation failed.
Import Failed with unexpected exception: /build/agent/work/20a3cfcd1c25021d/QuadD/Common/StreamSections/FileStream.cpp(281): Throw in function void QuadDCommon::FileStream::openFile(bool, bool, bool)
Dynamic exception type: boost::wrapexcept<QuadDCommon::CreateFileException>
std::exception::what: CreateFileException
[QuadDCommon::tag_report_file_name*] = "/results/se_image_classification_mxnet_1x2x408_.nsys-rep"
[boost::errinfo_errno_*] = 11, "Resource temporarily unavailable"
[boost::errinfo_file_name_*] = /results/se_image_classification_mxnet_1x2x408_.nsys-rep
Generated:
    /results/se_image_classification_mxnet_1x2x408_.qdstrm
ENDING TIMING RUN AT 2023-02-13 01:54:43 PM
RESULT,IMAGE_CLASSIFICATION,,1765,,2023-02-13 01:25:18 PM

Importer error status: An unknown error occurred.
Dynamic exception type: boost::filesystem::filesystem_error
std::exception::what: boost::filesystem::file_size: No such file or directory: "/results/se_image_classification_mxnet_1x2x408_.nsys-rep"
Generated:
    /results/se_image_classification_mxnet_1x2x408_.qdstrm
ENDING TIMING RUN AT 2023-02-13 01:55:16 PM
RESULT,IMAGE_CLASSIFICATION,,1798,,2023-02-13 01:25:18 PM
+ set -eux
+ cleanup_docker
+ docker container rm -f image_classification
image_classification

Any suggestions would be greatly appreciated.

Oops, this application uses MPI. I’ll give that a whirl.
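A plausible cause, given that the job runs under MPI: each rank launches its own copy of the nsys command above with the identical `--output` path. This matches the two "Generating" lines in the log. Both processes then race to write one `.nsys-rep` file, so one hits errno 11 and the other finds no file. The sketch below is a hedged workaround, not a confirmed fix from this thread. It assumes Open MPI, which sets `OMPI_COMM_WORLD_RANK` in each rank's environment, and relies on nsys replacing `%q{ENV_VAR}` in the `--output` name with that variable's value. `REPORT_BASE` is a hypothetical helper variable.

```shell
# Sketch, assuming Open MPI starts one nsys process per rank
# (i.e. mpirun ... nsys profile ..., not nsys wrapping mpirun).
# REPORT_BASE is a hypothetical helper holding the original report prefix.
REPORT_BASE="/results/se_image_classification_mxnet_${DGXNNODES}x${DGXNGPU}x${BATCHSIZE}_${DATESTAMP}"

# %q{OMPI_COMM_WORLD_RANK} is passed through literally by the shell; nsys
# substitutes it per process, so each rank writes its own report file
# instead of contending for a single .nsys-rep.
NSYSCMD="/usr/local/cuda/bin/nsys profile --trace=cuda,nvtx,osrt,cublas --gpu-metrics-device=all --export=sqlite --cuda-memory-usage=true --force-overwrite true --trace-fork-before-exec true --output ${REPORT_BASE}_rank%q{OMPI_COMM_WORLD_RANK}.nsys-rep"
```

If the ranks are started by Slurm's srun rather than mpirun, `SLURM_PROCID` would be the analogous per-rank variable (also an assumption about the launcher in use).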

Never mind, my dumb mistake.

How did you solve this problem?
I’m running into the same issue.