I know many people using these platforms also use Davis King’s dlib software.
dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software.
I have been using it for some time now
on a number of architectures, including x86 and various Arm-based hardware.
Indeed, I have been using it regularly on a number of the Jetson development platforms,
in particular the Jetson Nano, TX2 and Xavier,
with some success and without any problems that I was aware of.
During some recent testing, I was exercising the test suite that Davis provides with dlib,
to test the various units that make up the library.
The test suite performs flawlessly on all the x86 boxes with various NVIDIA hardware on board and CUDA enabled.
However, on all of the Jetson platforms one of the tests fails, and it fails identically on
each machine when cuDNN is installed and enabled.
If you run
./dtest -d -l all --test_dnn
it will fail on all of the Jetson machines using CUDA.
The particular failure shows up with a gradient_error of 4.90299e+28.
Obviously that is wildly out of range!
The test suite seems well designed and implemented.
If you build dlib without CUDA enabled, i.e. with DLIB_USE_CUDA=0, the tests pass.
So the software (CPU) implementation of the DNN works as it is supposed to, and as it does on the x86 boxes,
albeit slower!
Here is the end of the x86 execution using CUDA:
57469 INFO [0] test.dnn: slope_error: 0.000217438
57469 INFO [0] test.dnn: intercept_error: 0.00847244
62949 INFO [0] test.dnn: rs.mean(): 0.0057435
62949 INFO [0] test.dnn: rs.stddev(): 0.00305919
62949 INFO [0] test.dnn: rs.max(): 0.00976033
74753 INFO [0] test.main: Testing Finished
74753 INFO [0] test.main: Total number of individual testing statements executed: 563439
74753 INFO [0] test.main: All tests completed successfully
Here is the end of the Jetson Nano execution using CUDA:
7698 ERROR [0] test.main: Failure message from test:
Error occurred at line 933.
Error occurred in file /h/rfg/w/dlib/dlib/test/dnn.cpp.
Failing expression was max(abs(mat(data_gradient1)-mat(data_gradient2))) < 1e-3.
7698 INFO [0] test.main: Testing Finished
7698 INFO [0] test.main: Total number of individual testing statements executed: 473
7698 WARN [0] test.main: Number of failed tests: 1
7698 WARN [0] test.main: Number of passed tests: 0
Here is the end of the Jetson TX2 execution using CUDA:
8059 ERROR [0] test.main: Failure message from test:
Error occurred at line 933.
Error occurred in file /x/rfg/tx2/w/dlib/dlib/test/dnn.cpp.
Failing expression was max(abs(mat(data_gradient1)-mat(data_gradient2))) < 1e-3.
8060 INFO [0] test.main: Testing Finished
8060 INFO [0] test.main: Total number of individual testing statements executed: 473
8060 WARN [0] test.main: Number of failed tests: 1
8060 WARN [0] test.main: Number of passed tests: 0
And here is the end of the Jetson TX2 execution NOT using CUDA:
212068 INFO [0] test.dnn: slope_error: 9.53674e-05
212068 INFO [0] test.dnn: intercept_error: 0.00631332
220693 INFO [0] test.dnn: rs.mean(): 0.00574357
220693 INFO [0] test.dnn: rs.stddev(): 0.00305946
220693 INFO [0] test.dnn: rs.max(): 0.00976036
469043 INFO [0] test.main: Testing Finished
469043 INFO [0] test.main: Total number of individual testing statements executed: 516379
469043 INFO [0] test.main: All tests completed successfully
So it seems that there might be something amiss with the cuDNN implementation on
the Jetson hardware.
The x86 builds consistently pass the test with CUDA enabled and cuDNN installed.
Likewise, the builds on the Jetson Nano, TX2 and Xavier consistently fail in the same way
with CUDA enabled, which is the default.
Davis’ website and git repository give excellent instructions on building, installing, and testing
the software.
As mentioned earlier, I have been using dlib in my applications without any problems that I was aware of; it is just that the test suite reports these errors.
But it is the DNN test that is failing, and that is something my applications happen to be using!
Regards,
Ross
bald_guys_errors.txt (47.9 KB)
bald_guys_noerrors.txt (63.3 KB)