"CreateMutex" operator error on CUDA with caffe2 on TX2

I have run the program on my desktop GPU (1050Ti), but an error occurs when I run it on the TX2.
Here is the error output:
"Cannot create operator of type 'CreateMutex' on the device 'CUDA'"
It occurs at these calls:
workspace.RunNetOnce(train_model.param_init_net)
workspace.CreateNet(train_model.net)

Why does this happen? Any suggestion would be a great help.

Hi,

Do you use caffe2? Or could you share more information about your program?

Thanks.

Sorry for the late reply. Here is my main caffe2 script:
from caffe2.python import brew, core, optimizer
from caffe2.python.modeling.initializers import pFP16Initializer

def create_model_FP16(m, device_opts, dtype, is_test=False):
    with core.DeviceScope(device_opts):
        initializer = pFP16Initializer
        with brew.arg_scope([brew.conv, brew.fc],
                            WeightInitializer=initializer,
                            BiasInitializer=initializer,
                            enable_tensor_core=True):
            conv1 = brew.conv(m, 'data', 'conv1', dim_in=1, dim_out=96, kernel=11, stride=4)
            relu1 = brew.relu(m, conv1, 'conv1')
            norm1 = brew.lrn(m, relu1, 'norm1', size=5, alpha=0.0001, beta=0.75)
            pool1 = brew.max_pool(m, norm1, 'pool1', kernel=3, stride=2)
            conv2 = brew.conv(m, pool1, 'conv2', dim_in=96, dim_out=256, kernel=5)
            relu2 = brew.relu(m, conv2, 'conv2')
            norm2 = brew.lrn(m, relu2, 'norm2', size=5, alpha=0.0001, beta=0.75)
            pool2 = brew.max_pool(m, norm2, 'pool2', kernel=3, stride=2)
            conv3 = brew.conv(m, pool2, 'conv3', dim_in=256, dim_out=384, kernel=3)
            relu3 = brew.relu(m, conv3, 'conv3')
            conv4 = brew.conv(m, relu3, 'conv4', dim_in=384, dim_out=384, kernel=3)
            relu4 = brew.relu(m, conv4, 'conv4')
            conv5 = brew.conv(m, conv4, 'conv5', dim_in=384, dim_out=256, kernel=3)
            relu5 = brew.relu(m, conv5, 'conv5')
            pool5 = brew.max_pool(m, relu5, 'pool5', kernel=3, stride=2)
            fc6 = brew.fc(m, pool5, 'fc6', dim_in=256 * 2 * 2, dim_out=4096)
            relu6 = brew.relu(m, fc6, 'fc6')
            dropout1 = brew.dropout(m, relu6, 'dropout1', ratio=0.5, is_test=0)
            fc7 = brew.fc(m, dropout1, 'fc7', dim_in=4096, dim_out=4096)
            relu7 = brew.relu(m, fc7, 'fc7')
            dropout2 = brew.dropout(m, relu7, 'dropout2', ratio=0.5, is_test=0)
            fc8 = brew.fc(m, dropout2, 'fc8', dim_in=4096, dim_out=1000)
            softmax = brew.softmax(m, fc8, 'softmax')
        m.net.AddExternalOutput(softmax)
        # New Addition
        softmax = m.net.HalfToFloat(softmax, softmax + '_fp32')
        xent = m.LabelCrossEntropy([softmax, "label"], 'xent')
        loss = m.AveragedLoss(xent, "loss")
        brew.accuracy(m, [softmax, "label"], "accuracy")

        m.AddGradientOperators([loss])
        opt = optimizer.build_sgd(m, base_learning_rate=0.01, policy="step", stepsize=1, gamma=0.999)
    return softmax

#########
Here is how I run the function "create_model_FP16":
workspace.FeedBlob("data", data, device_option=device_opts)
workspace.FeedBlob("label", label, device_option=device_opts)

train_model= model_helper.ModelHelper(name="train_net")
softmax = create_model_FP16(train_model, device_opts=device_opts,dtype=dtype,is_test='false')
with core.DeviceScope(device_opts):
    brew.add_weight_decay(train_model, 0.001)  # any effect???

workspace.RunNetOnce(train_model.param_init_net)
workspace.CreateNet(train_model.net)
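
For reference, device_opts and the input blobs are prepared roughly like this earlier in the script (a sketch; the exact batch size and image shape in my real script may differ):

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core

# GPU 0 is the device the error message refers to (device_type: 1).
device_opts = core.DeviceOption(caffe2_pb2.CUDA, 0)

# Dummy FP16 inputs with the same layout as the real data:
# a batch of single-channel images plus integer class labels.
data = np.random.rand(100, 1, 227, 227).astype(np.float16)
label = np.random.randint(0, 1000, size=(100,)).astype(np.int32)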

This code runs fine on my desktop GPU (1050Ti), but on the Nvidia TX2 it fails with the error below:
"Cannot create operator of type 'CreateMutex' on the device 'CUDA'"
Thanks for any help.

Thanks.

We will reproduce this issue and update information to you later.

Hi,

Could you check if there is a built-in example that can reproduce this issue?
We have tested several built-in samples, and they all work correctly.

Thanks.

Thanks for the reply. Could I send you my code by email?

Hi AastaLLL, could you give me your email address? You can run my scripts to reproduce the issue. Thanks for your help.

Hi,

Sorry for the late reply. We can't share our email address here.
Could you upload the script and pass the link via private message?

Thanks.

Hi AastaLLL, I have sent you a private message.

Hi,

Thanks for the feedback.
We have downloaded the script. Will update information to you later.

Hi,

The cause is that the TX2 GPU architecture (sm_62) is missing from caffe2's CMake configuration.
We can run your script after applying this change in caffe2:

diff --git a/cmake/Cuda.cmake b/cmake/Cuda.cmake
index 2425375..54605ef 100644
--- a/cmake/Cuda.cmake
+++ b/cmake/Cuda.cmake
@@ -6,7 +6,7 @@
 # Default is set to cuda 9. If we detect the cuda architectores to be less than
 # 9, we will lower it to the corresponding known archs.
 set(Caffe2_known_gpu_archs "30 35 50 52 60 61 70") # for CUDA 9.x
-set(Caffe2_known_gpu_archs8 "20 21(20) 30 35 50 52 60 61") # for CUDA 8.x
+set(Caffe2_known_gpu_archs8 "20 21(20) 30 35 50 52 60 61 62") # for CUDA 8.x
 set(Caffe2_known_gpu_archs7 "20 21(20) 30 35 50 52") # for CUDA 7.x

Thanks.

Thanks, I will try it soon.

Thanks for the help, but it still doesn't work. I changed Cuda.cmake as follows:

# Known NVIDIA GPU achitectures Caffe2 can be compiled for.
# This list will be used for CUDA_ARCH_NAME = All option
set(Caffe2_known_gpu_archs "20 21(20) 30 35 50 52 60 61 70")  # for CUDA 9.x
set(Caffe2_known_gpu_archs8 "20 21(20) 30 35 50 52 60 61 62") # for CUDA 8.x
set(Caffe2_known_gpu_archs7 "20 21(20) 30 35 50 52")          # for CUDA 7.x

And then I do this:
rm -rf build
./scripts/build_tegra_x1.sh
cd XX/XX
python XXX.py

The error still occurs:

Traceback (most recent call last):
  File "AlexTest2.py", line 242, in <module>
    train(INIT_NET, PREDICT_NET, epochs=3, batch_size=100, device_opts=device_opts, dtype=dtype, is_test=is_test)
  File "AlexTest2.py", line 162, in train
    workspace.RunNetOnce(train_model.param_init_net)
  File "/usr/local/caffe2/python/workspace.py", line 161, in RunNetOnce
    return C.run_net_once(StringifyProto(net))
RuntimeError: [enforce fail at operator.cc:110] op. Cannot create operator of type 'CreateMutex' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: output: "iteration_mutex" name: "" type: "CreateMutex" device_option { device_type: 1 cuda_gpu_id: 0 }
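
From the operator def in the error, the failing op is the "iteration_mutex" CreateMutex that the SGD optimizer adds for its iteration counter. It should be reproducible without my training script at all; a minimal probe along these lines (a sketch using only the standard caffe2 Python API) goes through the same operator-creation path:

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

# Try to create a single CreateMutex op directly on the CUDA device.
# If the operator is not registered for CUDA in this build, this raises
# the same "Cannot create operator of type 'CreateMutex'" error.
op = core.CreateOperator(
    "CreateMutex", [], ["test_mutex"],
    device_option=core.DeviceOption(caffe2_pb2.CUDA, 0),
)
workspace.RunOperatorOnce(op)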

Any other suggestions?

Hi,

Which JetPack do you use?
We tested your script on JetPack 3.1. Could you also try caffe2 on JetPack 3.1?

Thanks.

I also use JetPack 3.1 on my TX2 board. I am confused.

Hi,

Okay. We will check caffe2 with another TX2 board and update information to you later.
Thanks.

Hi,

Could you help us check CUDA functionality first?

/usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh .
cd NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd
make
./vectorAdd
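
If that sample passes, the caffe2 GPU path can also be exercised directly from Python, independent of your model code (a minimal sketch):

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

print("Detected CUDA devices:", workspace.NumCudaDevices())

# Run a single Relu op on the GPU as a smoke test.
device_opts = core.DeviceOption(caffe2_pb2.CUDA, 0)
workspace.FeedBlob("x", np.random.randn(4, 4).astype(np.float32),
                   device_option=device_opts)
workspace.RunOperatorOnce(
    core.CreateOperator("Relu", ["x"], ["y"], device_option=device_opts))
print("Relu on CUDA OK:", workspace.FetchBlob("y").shape)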

Thanks.

Hi, AastaLLL.
I have run the commands you listed. The result is as follows:

nvidia@tegra-ubuntu:/usr/local/cuda-8.0/bin/NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd$ ./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Hi,

Thanks for checking. Your GPU looks good.

We have double-checked caffe2 today, and we can run your script without error.
Here are our steps:

1. Flash the device with JetPack3.1
2. Clone

$ git clone --recursive https://github.com/caffe2/caffe2.git

3. Apply the change in comment #11.
4. Build and run

$ ./scripts/build_tegra_x1.sh
$ export PYTHONPATH=$PYTHONPATH:[caffe2_root]/build
$ sudo pip install future
$ sudo apt-get install python-six
$ python [Your script]
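
Before running the full training script, a tiny net that goes through the same optimizer path (which creates the "iteration_mutex" CreateMutex op) is a quick sanity check that the rebuilt library is picked up correctly. A sketch, with arbitrary layer sizes:

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, model_helper, optimizer, workspace

device_opts = core.DeviceOption(caffe2_pb2.CUDA, 0)
workspace.FeedBlob("data", np.random.rand(8, 16).astype(np.float32),
                   device_option=device_opts)
workspace.FeedBlob("label", np.random.randint(0, 10, size=(8,)).astype(np.int32),
                   device_option=device_opts)

m = model_helper.ModelHelper(name="sanity")
with core.DeviceScope(device_opts):
    fc = brew.fc(m, "data", "fc", dim_in=16, dim_out=10)
    softmax, loss = m.SoftmaxWithLoss([fc, "label"], ["softmax", "loss"])
    m.AddGradientOperators([loss])
    optimizer.build_sgd(m, base_learning_rate=0.01)

# These are the calls that fail in your script.
workspace.RunNetOnce(m.param_init_net)
workspace.CreateNet(m.net)
workspace.RunNet(m.net.Proto().name)
print("Sanity net ran on CUDA")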

By the way, could you try to install caffe2 and run the script with the 'nvidia' account?
Please also let us know your results.

Thanks.