Nsight "Resource temporarily unavailable"

We are running nsys against training_results_v2.1/NVIDIA/benchmarks/resnet/implementations/mxnet-22.04 with the following command line:

NSYSCMD="/usr/local/cuda/bin/nsys profile --trace=cuda,nvtx,osrt,cublas --gpu-metrics-device=all --export=sqlite --cuda-memory-usage=true --force-overwrite true --trace-fork-before-exec true --output /results/se_image_classification_mxnet_${DGXNNODES}x${DGXNGPU}x${BATCHSIZE}_${DATESTAMP}.nsys-rep "

We have no issues when running with 1 GPU, but with 2 GPUs the report import fails with the following output:

Generating '/tmp/nsys-report-f168.qdstrm'
Generating '/tmp/nsys-report-f495.qdstrm'
[1/2] [========================100%] se_image_classification_mxnet_1x2x408_.nsys-rep
Importer error status: Importation failed.
Import Failed with unexpected exception: /build/agent/work/20a3cfcd1c25021d/QuadD/Common/StreamSections/FileStream.cpp(281): Throw in function void QuadDCommon::FileStream::openFile(bool, bool, bool)
Dynamic exception type: boost::wrapexcept<QuadDCommon::CreateFileException>
std::exception::what: CreateFileException
[QuadDCommon::tag_report_file_name*] = "/results/se_image_classification_mxnet_1x2x408_.nsys-rep"
[boost::errinfo_errno_*] = 11, "Resource temporarily unavailable"
[boost::errinfo_file_name_*] = /results/se_image_classification_mxnet_1x2x408_.nsys-rep
Generated:
    /results/se_image_classification_mxnet_1x2x408_.qdstrm
ENDING TIMING RUN AT 2023-02-13 01:54:43 PM
RESULT,IMAGE_CLASSIFICATION,,1765,,2023-02-13 01:25:18 PM

Importer error status: An unknown error occurred.
Dynamic exception type: boost::filesystem::filesystem_error
std::exception::what: boost::filesystem::file_size: No such file or directory: "/results/se_image_classification_mxnet_1x2x408_.nsys-rep"
Generated:
    /results/se_image_classification_mxnet_1x2x408_.qdstrm
ENDING TIMING RUN AT 2023-02-13 01:55:16 PM
RESULT,IMAGE_CLASSIFICATION,,1798,,2023-02-13 01:25:18 PM
+ set -eux
+ cleanup_docker
+ docker container rm -f image_classification
image_classification

Any suggestions would be greatly appreciated.

Oops, this application uses MPI. I’ll give that a whirl.
T
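
The thread never records the actual fix, so the following is only a guess based on the log above. It shows two .qdstrm files being generated but both imports targeting the same .nsys-rep name, which suggests the two MPI ranks may be colliding on one output file. A minimal sketch of giving each rank its own report, assuming the ranks are launched by Open MPI (which sets OMPI_COMM_WORLD_RANK) and relying on nsys expanding %q{VAR} in the --output name; the `OUT` variable, the `_rank` suffix, and the placeholder defaults are illustrative and not part of the original script:

```shell
# Sketch only: not confirmed as the fix in this thread.
# Assumption: Open MPI launches the ranks and sets OMPI_COMM_WORLD_RANK
# (under Slurm's srun, SLURM_PROCID would be the analogous variable).
# The :-defaults below are placeholders so the snippet runs standalone.
OUT="/results/se_image_classification_mxnet_${DGXNNODES:-1}x${DGXNGPU:-2}x${BATCHSIZE:-408}_${DATESTAMP:-}_rank%q{OMPI_COMM_WORLD_RANK}.nsys-rep"

# %q{...} is left literal by the shell; nsys substitutes it per process,
# so each rank should write to a distinct report file.
NSYSCMD="/usr/local/cuda/bin/nsys profile --trace=cuda,nvtx,osrt,cublas --gpu-metrics-device=all --export=sqlite --cuda-memory-usage=true --force-overwrite true --trace-fork-before-exec true --output ${OUT} "

echo "$NSYSCMD"
```

With this naming, a 2-GPU run would be expected to leave one report per rank in /results instead of two processes trying to create the same file.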

Never mind, my dumb mistake.
T

How did you solve this problem?
I’m suffering the same issue.

I'm having this issue too. How did you solve it?

What are the versions of Nsys on your host and target?