MXNetError: ImageRec need opencv to process

I'm trying to train the MXNet ResNet-50 model from the finetune notebook ("Fine-tune with Pretrained Models" in the MXNet documentation), and I keep encountering the following error on my Jetson Xavier NX:

MXNetError: Traceback (most recent call last):
File "/home/nvidia/mxnet/mxnet/src/io/iter_image_recordio_2.cc", line 260
MXNetError: ImageRec need opencv to process

MXNet doesn't seem to have access to OpenCV, even though I have OpenCV 4.5 installed. I've tried reinstalling MXNet 1.7 several times (following the instructions from Jetson Zoo) without getting this to work.

I even tried modifying the autoinstall_mxnet.sh file before installing, replacing the line
"sudo make -j$(nproc) install && "
with
"sudo make -j$(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 install && "

That didn’t seem to work. Any ideas what is wrong?

Hi,

The package isn't built with OpenCV support.
Could you enable it and build MXNet from source?
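One way to confirm whether an installed MXNet build actually has OpenCV compiled in (independent of whether `import cv2` works in Python) is MXNet's runtime feature query. The sketch below assumes `mxnet.runtime.Features`, which should be available in MXNet 1.5 and later; the helper name `mxnet_build_features` is hypothetical, and the feature list simply mirrors the flags tried in the make command above.

```python
# Sketch: report which build-time features the installed MXNet was compiled with.
# Assumes mxnet.runtime.Features (MXNet >= 1.5); helper names are hypothetical.
WANTED = ("OPENCV", "CUDA", "CUDNN")


def mxnet_build_features(names=WANTED):
    """Map each feature name to True/False, or return None if MXNet won't load."""
    try:
        from mxnet.runtime import Features
    except (ImportError, OSError, RuntimeError):
        # MXNet missing, or libmxnet.so could not be found/loaded
        return None
    features = Features()
    result = {}
    for name in names:
        try:
            result[name] = bool(features.is_enabled(name))
        except RuntimeError:
            # Feature name unknown to this MXNet version: treat as not enabled
            result[name] = False
    return result


if __name__ == "__main__":
    report = mxnet_build_features()
    if report is None:
        print("MXNet is not importable in this environment")
    else:
        for name, enabled in report.items():
            print(f"{name}: {'ON' if enabled else 'OFF'}")
        if not report["OPENCV"]:
            print("No OpenCV: expect 'ImageRec need opencv to process'; rebuild from source")
```

If OPENCV reports OFF, passing OpenCV to the Python environment will not help; the library itself has to be rebuilt with OpenCV enabled.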

Thanks.

I built MXNet from source using the autobuild_mxnet.sh script above, but it gives me an error when I try to use it from the command prompt:

Any ideas?

Hi,

Which branch did you build from?
The ndarray class appears to be present from branch v1.6.x onward:
https://github.com/apache/incubator-mxnet/blob/v1.6.x/python/mxnet/__init__.py#L33
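To answer the branch question without guessing, one can print which MXNet package the interpreter actually imports, its version, and whether `mx.nd` is present. This is a sketch; the helper name `describe_mxnet` is hypothetical, and a stale install earlier on `sys.path` is only one possible reason for a missing `mx.nd`.

```python
# Sketch: show which MXNet package Python actually imports.
# Useful when an older install elsewhere on sys.path shadows a fresh build.
def describe_mxnet():
    """Return a dict describing the importable MXNet package (hypothetical helper)."""
    try:
        import mxnet as mx
    except (ImportError, OSError, RuntimeError) as err:
        return {"installed": False, "error": str(err)}
    return {
        "installed": True,
        "version": getattr(mx, "__version__", "unknown"),
        "location": getattr(mx, "__file__", "unknown"),
        "has_nd": hasattr(mx, "nd"),  # False points to a broken or partial build
    }


if __name__ == "__main__":
    for key, value in describe_mxnet().items():
        print(f"{key}: {value}")
```

The `location` field is the quickest way to tell whether the freshly built wheel or an older package is the one being picked up.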

Thanks.

Following the JEP/autobuild_mxnet.sh script (AastaNV/JEP on GitHub, master branch), I built using:
git clone https://github.com/apache/incubator-mxnet --branch v1.7.x --recursive mxnet

So it's MXNet v1.7.x. Are you saying that v1.7.x won't work with the autobuild_mxnet.sh install? I ask because it seemed to work fine with the autoinstall_mxnet.sh install.

Hi,

The class exists in branch v1.6.x and newer,
so it should also be included in v1.7.x.

We will check this issue internally
and share more information later.

Thanks.

OK, I'll try again with v1.7.x. Should I use the autobuild_mxnet.sh script? Please let me know; I'm really trying to get this working. Thanks.

Hi,

We were able to build MXNet v1.8.0 on JetPack 4.5.1, and mx.nd works without issue.
Could you give it a try?

1. Put autobuild_mxnet.sh and mxnet_v1.8.x.patch in the same folder.

2. Build MXNet from source

$ sudo chmod +x autobuild_mxnet.sh
$ ./autobuild_mxnet.sh Xavier
$ cd mxnet/build/
$ pip3 install mxnet-1.8.0-py3-none-any.whl

Thanks.

I was able to build from source using the files above and run MXNet. I also downloaded MXNet 1.8 with:

gdown "Google Drive - Virus scan warning" -O "mxnet-1.8.0-py3-none-any.whl"

I'm not sure that's the right way, but the test examples work fine now. Thanks.

>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3), mx.gpu())
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[3., 3., 3.],
       [3., 3., 3.]], dtype=float32)

However, I’m still getting the same opencv error from the finetune example:

MXNetError: Traceback (most recent call last):
File "/home/nvidia/mxnet/mxnet/src/io/iter_image_recordio_2.cc", line 260
MXNetError: ImageRec need opencv to process

Do I need to set the OpenCV path somewhere?

Hi,

Could you share the source, model, and steps to run the finetune example
so we can check this directly in our environment?

Thanks.

finetune.ipynb

I've attached the finetune notebook and also provided the MXNet documentation link for it. Thanks.

Hi,

Thanks for sharing.

We are checking this issue internally
and will post an update here once we make progress.

Thanks.

Thank you

Hi,

Is there any update on running the finetune notebook with MXNet?

Hi,

We didn't run into the OpenCV issue;
iter_image_recordio_2.cc seems to work in our environment.

However, we hit a different, CUDA-related issue.
We are checking it internally
and will share more information with you later.

$ python3 finetune.py
[13:00:20] /home/nvidia/topic_175013/mxnet/src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[13:00:20] /home/nvidia/topic_175013/mxnet/src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[13:00:20] /home/nvidia/topic_175013/mxnet/src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./caltech-256-60-train.rec, use 4 threads for decoding..
[13:00:27] /home/nvidia/topic_175013/mxnet/src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./caltech-256-60-val.rec, use 4 threads for decoding..
Traceback (most recent call last):
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1938, in simple_bind
    ctypes.byref(exe_handle)))
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "/home/nvidia/topic_175013/mxnet/src/engine/./../common/cuda_utils.h", line 395
CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: invalid device ordinal

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "finetune.py", line 214, in <module>
    mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
  File "finetune.py", line 196, in fit
    eval_metric='acc')
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/module/base_module.py", line 498, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 280, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 384, in bind_exec
    shared_group))
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 678, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/nvidia/.local/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1944, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (16, 3, 224, 224)
softmax_label: (16,)
Traceback (most recent call last):
  File "/home/nvidia/topic_175013/mxnet/src/engine/./../common/cuda_utils.h", line 395
CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: invalid device ordinal

Thanks.

Hi,

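The error we shared on Jun 15 was caused by an incorrect GPU count.
The Jetson Xavier NX has a single GPU, so any num_gpus value above 1 makes MXNet request a GPU index that doesn't exist, which CUDA reports as "invalid device ordinal".
After changing the num_gpus parameter as below, the script launches without any error: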

num_gpus = 1
$ python3 finetune.py
[13:58:24] /home/nvidia/topic_175013/mxnet/src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[13:58:24] /home/nvidia/topic_175013/mxnet/src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[13:58:25] /home/nvidia/topic_175013/mxnet/src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./caltech-256-60-train.rec, use 4 threads for decoding..
[13:58:33] /home/nvidia/topic_175013/mxnet/src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./caltech-256-60-val.rec, use 4 threads for decoding..
[13:58:38] /home/nvidia/topic_175013/mxnet/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
2021-06-16 13:58:52,901 Epoch[0] Batch [0-10]   Speed: 36.31 samples/sec        accuracy=0.000000
2021-06-16 13:58:57,310 Epoch[0] Batch [10-20]  Speed: 36.31 samples/sec        accuracy=0.025000
2021-06-16 13:59:01,722 Epoch[0] Batch [20-30]  Speed: 36.27 samples/sec        accuracy=0.018750
2021-06-16 13:59:06,120 Epoch[0] Batch [30-40]  Speed: 36.39 samples/sec        accuracy=0.025000
2021-06-16 13:59:10,504 Epoch[0] Batch [40-50]  Speed: 36.50 samples/sec        accuracy=0.012500
2021-06-16 13:59:14,876 Epoch[0] Batch [50-60]  Speed: 36.61 samples/sec        accuracy=0.037500
2021-06-16 13:59:19,251 Epoch[0] Batch [60-70]  Speed: 36.58 samples/sec        accuracy=0.093750
2021-06-16 13:59:23,630 Epoch[0] Batch [70-80]  Speed: 36.56 samples/sec        accuracy=0.075000
2021-06-16 13:59:28,009 Epoch[0] Batch [80-90]  Speed: 36.55 samples/sec        accuracy=0.087500
2021-06-16 13:59:32,387 Epoch[0] Batch [90-100] Speed: 36.55 samples/sec        accuracy=0.081250
2021-06-16 13:59:36,765 Epoch[0] Batch [100-110]        Speed: 36.55 samples/sec        accuracy=0.150000
2021-06-16 13:59:41,151 Epoch[0] Batch [110-120]        Speed: 36.48 samples/sec        accuracy=0.137500
2021-06-16 13:59:45,528 Epoch[0] Batch [120-130]        Speed: 36.56 samples/sec        accuracy=0.218750
2021-06-16 13:59:49,908 Epoch[0] Batch [130-140]        Speed: 36.54 samples/sec        accuracy=0.168750
2021-06-16 13:59:54,283 Epoch[0] Batch [140-150]        Speed: 36.58 samples/sec        accuracy=0.175000
2021-06-16 13:59:58,657 Epoch[0] Batch [150-160]        Speed: 36.59 samples/sec        accuracy=0.212500
2021-06-16 14:00:03,038 Epoch[0] Batch [160-170]        Speed: 36.53 samples/sec        accuracy=0.156250
2021-06-16 14:00:07,422 Epoch[0] Batch [170-180]        Speed: 36.50 samples/sec        accuracy=0.218750
2021-06-16 14:00:11,796 Epoch[0] Batch [180-190]        Speed: 36.58 samples/sec        accuracy=0.250000
...
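Instead of hard-coding `num_gpus = 1`, the count can be clamped to what MXNet reports for the device. This is a sketch assuming `mx.context.num_gpus()` from MXNet 1.x; `safe_num_gpus` and `build_contexts` are hypothetical helpers, not part of the finetune notebook.

```python
# Sketch: never ask MXNet for more GPUs than the device exposes.
# Requesting a GPU index that doesn't exist surfaces at bind time as
# "CUDA: Check failed: ... invalid device ordinal".


def safe_num_gpus(requested, available):
    """Clamp the requested GPU count into the range [0, available]."""
    return max(0, min(requested, available))


def build_contexts(requested):
    """Hypothetical helper: list of MXNet contexts, falling back to CPU."""
    import mxnet as mx  # imported lazily so safe_num_gpus works without MXNet

    n = safe_num_gpus(requested, mx.context.num_gpus())
    return [mx.gpu(i) for i in range(n)] if n > 0 else [mx.cpu()]


if __name__ == "__main__":
    # The Xavier NX has one integrated GPU, so a request for 8 becomes 1.
    print(safe_num_gpus(8, 1))  # 1
```

On the Xavier NX, `build_contexts` with any request of 1 or more should yield a single `mx.gpu(0)` context, which matches the manual `num_gpus = 1` change.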

So the MXNet build from the earlier comment appears to work correctly.
Could you give it a try? We tested it on JetPack 4.5.1.

Thanks.

I removed my original MXNet installation and got the following error. I'm not sure whether I need to install or remove anything else. Where can I download "mxnet-1.8.0-py3-none-any.whl"? There doesn't seem to be a download link anywhere.

I got it working through a workaround: MXNet from the source build now works fine, without the "MXNetError: ImageRec need opencv to process" error.

BTW, I had to downgrade NumPy: apparently NumPy 1.19.5 has an issue that causes the "core dumped" error. Thanks again for your help.
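If the NumPy crash comes back after an upgrade, a small guard can flag the version named here without importing NumPy first. This is a sketch: the helpers are hypothetical, the known-bad set holds only 1.19.5 because that is the only version this thread reports, and the thread doesn't say which version the poster downgraded to.

```python
# Sketch: flag the NumPy version this thread reports as crashing ("core dumped")
# on the Jetson, reading package metadata instead of importing numpy, in case
# the crash happens at import time.
import warnings

KNOWN_BAD_NUMPY = {"1.19.5"}  # only the version named in this thread


def is_known_bad(version):
    """Return True if the NumPy version string is in the known-bad set."""
    return version.strip() in KNOWN_BAD_NUMPY


def installed_numpy_version():
    """Return the installed NumPy version string, or None if it can't be found."""
    try:
        # Python 3.8+
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters (the logs above show Python 3.6): use setuptools
        try:
            import pkg_resources
            return pkg_resources.get_distribution("numpy").version
        except Exception:
            return None
    try:
        return version("numpy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("NumPy is not installed")
    elif is_known_bad(found):
        warnings.warn(f"NumPy {found} was reported to crash on this setup; consider downgrading")
    else:
        print(f"NumPy {found} is not on the known-bad list")
```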

Good to know!
Thanks for sharing the latest status with us :)
