Darknet compile error with cuDNN 8

garminardev · July 9, 2020, 7:19am

Hi all,

Do you know how to compile YOLOv4 on the NX? I got this error with cuDNN 8:
“CUDNN_CONVOLUTION_FWD_PREFER_FASTEST” was not declared.
Can I downgrade the NX to use cuDNN 7.x?

Thank you~

AastaLLL · July 9, 2020, 8:12am

Hi,

We don’t recommend downgrading the cuDNN package, since this will break a lot of dependencies.
There are some API changes in our new cuDNN v8.0, and we are working on a fix for this.
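
Since the fix depends on the installed cuDNN major version, it can help to confirm that version first. The sketch below reads the CUDNN_MAJOR macro from a header file; cuDNN 8 keeps its version macros in cudnn_version.h. The helper name cudnn_major and the /usr/include path in the usage note are assumptions, not something stated in this thread.

```shell
#!/bin/sh
# Print the value of CUDNN_MAJOR defined in the given header file.
# cudnn_major is a hypothetical helper name. It returns non-zero
# (and prints nothing) if the file is unreadable or the macro is absent.
cudnn_major() {
    header="$1"
    [ -r "$header" ] || return 1
    # Match lines such as: #define CUDNN_MAJOR 8
    value=$(sed -n 's/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}CUDNN_MAJOR[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$header" | head -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For example, `cudnn_major /usr/include/cudnn_version.h` would print the major version if the header lives at that (assumed) location.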

Will keep you updated.

Thanks.

AastaLLL · July 10, 2020, 4:48am

Hi,

Please apply the following patch, which updates darknet for cuDNN v8.

...
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/script/topics/0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
...

Thanks.

garminardev · July 10, 2020, 6:20am

Hi,
It compiles fine but can’t run on the NX: the RAM usage is very high and unstable.

AastaLLL · July 10, 2020, 9:26am

Hi,

Sorry, we didn’t notice this because our test model is tiny-based.

The patch has been updated to limit the memory to 2000000000 bytes (the default value used in darknet).
Please reset the patch and try again.

Thanks.

garminardev · July 13, 2020, 1:51am

Hi AastaLLL,
Thanks for the great support. The new patch works, but the FPS seems extremely low. Is this the same issue as “Darknet slower using Jetpack 4.4 (cuDNN 8.0.0 / CUDA 10.2) than Jetpack 4.3 (cuDNN 7.6.3 / CUDA 10.0)”?

AastaLLL · July 13, 2020, 2:29am

Hi,

Yes.
It’s a known issue that there is some performance drop in our latest cuDNN, and our internal team is actively working on it.
Some information can be found in the 'Known Issues' section of our release notes.

Thanks.

jacky103120038rqj6 · July 13, 2020, 7:36am

Hi AastaLLL,

When I try

$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/script/topics/0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch

I have some problems:
1. Where do I type these commands (which location or folder)?
2. When I type git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch, I get the error message below.

What should I do? I am using an NVIDIA Jetson Nano, not an NX.
Thanks.

nvidia@nvidia-desktop:~/darknet-master$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
15:28:22.413913 git.c:344 trace: built-in: git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
15:28:22.414612 run-command.c:646 trace: run_command: git mailsplit -d4 -o.git/rebase-apply -b -- darknet-master//0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
15:28:22.419888 git.c:344 trace: built-in: git mailsplit -d4 -o.git/rebase-apply -b -- darknet-master//0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
Applying: fix for cudnn_v8 (limited memory to default darknet setting)
error: src/convolutional_layer.c: does not exist in index
Patch failed at 0001 fix for cudnn_v8 (limited memory to default darknet setting)
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

AastaLLL · July 14, 2020, 4:09am

Hi,

Here are the detailed steps to run darknet on JetPack 4.4 GA.

1. Clone source and apply cuDNN patch

$ git clone https://github.com/pjreddie/darknet.git
$ cd darknet/
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/script/topics/0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch

2. Apply OpenCV4 patch

diff --git a/Makefile b/Makefile
index 63e15e6..9a7471d 100644
--- a/Makefile
+++ b/Makefile
@@ -42,8 +40,8 @@ CFLAGS+=$(OPTS)
 ifeq ($(OPENCV), 1) 
 COMMON+= -DOPENCV
 CFLAGS+= -DOPENCV
-LDFLAGS+= `pkg-config --libs opencv` -lstdc++
-COMMON+= `pkg-config --cflags opencv` 
+LDFLAGS+= `pkg-config --libs opencv4` -lstdc++
+COMMON+= `pkg-config --cflags opencv4` 
 endif
 
 ifeq ($(GPU), 1) 
diff --git a/src/image_opencv.cpp b/src/image_opencv.cpp
index 7511280..c11805a 100644
--- a/src/image_opencv.cpp
+++ b/src/image_opencv.cpp
@@ -9,30 +9,34 @@ using namespace cv;
 
 extern "C" {
 
-IplImage *image_to_ipl(image im)
+Mat image_to_mat(image im)
 {
+    assert(im.c == 3 || im.c == 1);
     int x,y,c;
-    IplImage *disp = cvCreateImage(cvSize(im.w,im.h), IPL_DEPTH_8U, im.c);
-    int step = disp->widthStep;
+    image copy = copy_image(im);
+    constrain_image(copy);
+    if(im.c == 3) rgbgr_image(copy);
+    Mat m(im.h, im.w, CV_MAKETYPE(CV_8U, im.c));
     for(y = 0; y < im.h; ++y){
         for(x = 0; x < im.w; ++x){
             for(c= 0; c < im.c; ++c){
-                float val = im.data[c*im.h*im.w + y*im.w + x];
-                disp->imageData[y*step + x*im.c + c] = (unsigned char)(val*255);
+                float val = copy.data[c*im.h*im.w + y*im.w + x];
+                m.data[y*im.w*im.c + x*im.c + c] = (unsigned char)(val*255);
             }
         }
     }
-    return disp;
+    free_image(copy);
+    return m;
 }
 
-image ipl_to_image(IplImage* src)
+image mat_to_image(Mat m)
 {
-    int h = src->height;
-    int w = src->width;
-    int c = src->nChannels;
+    int h = m.rows;
+    int w = m.cols;
+    int c = m.channels();
     image im = make_image(w, h, c);
-    unsigned char *data = (unsigned char *)src->imageData;
-    int step = src->widthStep;
+    unsigned char *data = (unsigned char*)m.data;
+    int step = m.step;
     int i, j, k;
 
     for(i = 0; i < h; ++i){
@@ -42,26 +46,6 @@ image ipl_to_image(IplImage* src)
             }
         }
     }
-    return im;
-}
-
-Mat image_to_mat(image im)
-{
-    image copy = copy_image(im);
-    constrain_image(copy);
-    if(im.c == 3) rgbgr_image(copy);
-
-    IplImage *ipl = image_to_ipl(copy);
-    Mat m = cvarrToMat(ipl, true);
-    cvReleaseImage(&ipl);
-    free_image(copy);
-    return m;
-}
-
-image mat_to_image(Mat m)
-{
-    IplImage ipl = m;
-    image im = ipl_to_image(&ipl);
     rgbgr_image(im);
     return im;
 }
@@ -72,9 +56,9 @@ void *open_video_stream(const char *f, int c, int w, int h, int fps)
     if(f) cap = new VideoCapture(f);
     else cap = new VideoCapture(c);
     if(!cap->isOpened()) return 0;
-    if(w) cap->set(CV_CAP_PROP_FRAME_WIDTH, w);
-    if(h) cap->set(CV_CAP_PROP_FRAME_HEIGHT, w);
-    if(fps) cap->set(CV_CAP_PROP_FPS, w);
+    if(w) cap->set(CAP_PROP_FRAME_WIDTH, w);
+    if(h) cap->set(CAP_PROP_FRAME_HEIGHT, h);
+    if(fps) cap->set(CAP_PROP_FPS, fps);
     return (void *) cap;
 }
 
@@ -123,7 +107,7 @@ void make_window(char *name, int w, int h, int fullscreen)
 {
     namedWindow(name, WINDOW_NORMAL); 
     if (fullscreen) {
-        setWindowProperty(name, CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
+        setWindowProperty(name, WND_PROP_FULLSCREEN, WINDOW_FULLSCREEN);
     } else {
         resizeWindow(name, w, h);
         if(strcmp(name, "Demo") == 0) moveWindow(name, 0, 0);

3. Update Makefile based on your device

GPU=1
CUDNN=1
OPENCV=1

Xavier & XavierNX:

ARCH= -gencode arch=compute_72,code=sm_72 \
      -gencode arch=compute_72,code=[sm_72,compute_72]

TX2:

ARCH= -gencode arch=compute_62,code=sm_62 \
      -gencode arch=compute_62,code=[sm_62,compute_62]

Nano:

ARCH= -gencode arch=compute_53,code=sm_53 \
      -gencode arch=compute_53,code=[sm_53,compute_53]

4. Build and Test

$ make -j8
$ wget https://pjreddie.com/media/files/yolov3-tiny.weights
$ ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights [video]
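
The per-device ARCH settings in step 3 differ only in the compute capability (72, 62, or 53). As a rough sketch, a small shell helper could print the matching lines to paste into the Makefile; jetson_arch and the device names it accepts are hypothetical.

```shell
#!/bin/sh
# Print the Makefile ARCH setting for a Jetson module, using the
# compute capabilities listed in step 3 (72, 62, 53).
# jetson_arch and its device names are hypothetical.
jetson_arch() {
    case "$1" in
        xavier|xavier-nx) cc=72 ;;
        tx2)              cc=62 ;;
        nano)             cc=53 ;;
        *) echo "unknown device: $1" >&2; return 1 ;;
    esac
    # First line ends with a literal backslash (Makefile line continuation).
    printf 'ARCH= -gencode arch=compute_%s,code=sm_%s \\\n' "$cc" "$cc"
    printf '      -gencode arch=compute_%s,code=[sm_%s,compute_%s]\n' "$cc" "$cc" "$cc"
}
```

Running `jetson_arch nano` prints the two ARCH lines shown under "Nano" in step 3.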

Thanks.

jacky103120038rqj6 · July 14, 2020, 5:25am

Hi AastaLLL,

1. Regarding the command

./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights [video]

does this mean that only tiny-yolov3 is currently supported?

2. Here are the details after running
$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch

This is the file “convolutional_layer.c.rej”:

==============================================================================
diff a/src/convolutional_layer.c b/src/convolutional_layer.c (rejected hunks)
@@ -8,6 +8,9 @@
 #include <stdio.h>
 #include <time.h>

+#define PRINT_CUDNN_ALGO 0
+#define MEMORY_LIMIT 2000000000
+
 #ifdef AI2
 #include "xnor_layer.h"
 #endif
@@ -145,6 +148,76 @@ void cudnn_convolutional_setup(layer *l)
     }
     #endif

+    #if CUDNN_MAJOR >= 8
+    int returnedAlgoCount;
+    cudnnConvolutionFwdAlgoPerf_t fw_results[2 * CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
+    cudnnConvolutionBwdDataAlgoPerf_t bd_results[2 * CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT];
+    cudnnConvolutionBwdFilterAlgoPerf_t bf_results[2 * CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT];
+    cudnnFindConvolutionForwardAlgorithm(cudnn_handle(),
+            l->srcTensorDesc,
+            l->weightDesc,
+            l->convDesc,
+            l->dstTensorDesc,
+            CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
+            &returnedAlgoCount,
+            fw_results);
+    for(int algoIndex = 0; algoIndex < returnedAlgoCount; ++algoIndex){
+        #if PRINT_CUDNN_ALGO > 0
+        printf("^^^^ %s for Algo %d: %f time requiring %llu memory\n",
+               cudnnGetErrorString(fw_results[algoIndex].status),
+               fw_results[algoIndex].algo, fw_results[algoIndex].time,
+               (unsigned long long)fw_results[algoIndex].memory);
+        #endif
+        if( fw_results[algoIndex].memory < MEMORY_LIMIT ){
+            l->fw_algo = fw_results[algoIndex].algo;
+            break;
+        }
+    }
+    cudnnFindConvolutionBackwardDataAlgorithm(cudnn_handle(),
+            l->weightDesc,
+            l->ddstTensorDesc,
+            l->convDesc,
+            l->dsrcTensorDesc,
+            CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT,
+            &returnedAlgoCount,
+            bd_results);
+    for(int algoIndex = 0; algoIndex < returnedAlgoCount; ++algoIndex){
+        #if PRINT_CUDNN_ALGO > 0
+        printf("^^^^ %s for Algo %d: %f time requiring %llu memory\n",
+               cudnnGetErrorString(bd_results[algoIndex].status),
+               bd_results[algoIndex].algo, bd_results[algoIndex].time,
+               (unsigned long long)bd_results[algoIndex].memory);
+        #endif
+        if( bd_results[algoIndex].memory < MEMORY_LIMIT ){
+            l->bd_algo = bd_results[algoIndex].algo;
+            break;
+        }
+    }
+    cudnnFindConvolutionBackwardFilterAlgorithm(cudnn_handle(),
+            l->srcTensorDesc,
+            l->ddstTensorDesc,
+            l->convDesc,
+            l->dweightDesc,
+            CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT,
+            &returnedAlgoCount,
+            bf_results);
+    for(int algoIndex = 0; algoIndex < returnedAlgoCount; ++algoIndex){
+        #if PRINT_CUDNN_ALGO > 0
+        printf("^^^^ %s for Algo %d: %f time requiring %llu memory\n",
+               cudnnGetErrorString(bf_results[algoIndex].status),
+               bf_results[algoIndex].algo, bf_results[algoIndex].time,
+               (unsigned long long)bf_results[algoIndex].memory);
+        #endif
+        if( bf_results[algoIndex].memory < MEMORY_LIMIT ){
+            l->bf_algo = bf_results[algoIndex].algo;
+            break;
+        }
+    }
+    #else
     cudnnGetConvolutionForwardAlgorithm(cudnn_handle(),
             l->srcTensorDesc,
             l->weightDesc,
@@ -169,6 +242,7 @@ void cudnn_convolutional_setup(layer *l)
             CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
             2000000000,
             &l->bf_algo);
+    #endif
 }
 #endif
 #endif

==============================================================================

After I made these changes to “convolutional_layer.c” by hand, I still get the error: “CUDNN_CONVOLUTION_FWD_PREFER_FASTEST” was not declared.

Thanks.

foobar.warren · July 14, 2020, 7:05am

Re: Step 2, how do I apply the OpenCV4 patch?

Thank you.

yannic.bartel · July 14, 2020, 8:41am

Copy the contents of step 2 into a file, for example “opencv4patch.diff”. Place this file in the root directory of your darknet checkout. Then run git apply opencv4patch.diff.
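
The steps above can be sketched as a small helper that first checks whether the diff applies cleanly and only then applies it. git apply --check is a standard git option; the helper name apply_diff is hypothetical, and the sketch assumes it is run from the root of the darknet checkout.

```shell
#!/bin/sh
# Apply a diff only if it applies cleanly; otherwise leave the tree untouched.
# apply_diff is a hypothetical helper name. Run it from the directory the
# paths in the diff are relative to (here: the root of the darknet checkout).
apply_diff() {
    patch_file="$1"
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

With the file name from the post above, the call would be `apply_diff opencv4patch.diff`.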

AastaLLL · July 15, 2020, 2:21am

Hi, jacky103120038rqj6

To use another model, just download the corresponding weights file and update the command.
Thanks.
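
As a sketch of what updating the command looks like, the helper below prints the demo command for a given model name. It assumes the cfg file and the weights file share the model's base name (as with yolov3-tiny); demo_command is a hypothetical helper, and it only prints the command without downloading or running anything.

```shell
#!/bin/sh
# Print the darknet demo command for a model, following the pattern of the
# yolov3-tiny command from step 4.
# Assumption: cfg/<model>.cfg and <model>.weights use the same base name.
demo_command() {
    model="$1"   # e.g. yolov3-tiny
    video="$2"   # path to the input video
    printf './darknet detector demo cfg/coco.data cfg/%s.cfg %s.weights %s\n' \
        "$model" "$model" "$video"
}
```

For instance, `demo_command yolov3 test.mp4` prints the command with cfg/yolov3.cfg and yolov3.weights in place of the tiny variants.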

deidnani · July 21, 2020, 9:49pm

Does the patch work with AlexeyAB/darknet (YOLOv4 / Scaled-YOLOv4 / YOLO, the Windows and Linux version of Darknet)? I was thinking of trying to get YOLOv4 to run on the Jetson using cuDNN v8.0. Or is it no longer necessary to apply this patch? For some reason, when I use darknet in my program, with or without cuDNN, I get a segmentation fault.
I get the following error when I apply the patch:
Applying: fix for cudnn_v8 (limited memory to default darknet setting)
error: patch failed: src/convolutional_layer.c:145
error: src/convolutional_layer.c: patch does not apply
Patch failed at 0001 fix for cudnn_v8 (limited memory to default darknet setting)

AastaLLL · August 5, 2020, 4:38am

Hi,

AFAIK, cuDNN v8.0 support was added to AlexeyAB’s source a few weeks ago.
So you don’t need to apply this change manually; it should already work with cuDNN v8.0.

Thanks.

deidnani · August 5, 2020, 4:20pm

Thank you for the reply! It turns out that the error was due to recent fixes in darknet (and not due to cuDNN v8.0).

AastaLLL · August 6, 2020, 6:10am

Hi,

Good to know you found the cause.
Does the sample work on your side now?

Thanks.

deidnani · August 7, 2020, 1:13pm

It does indeed work on my side now!