NPP with non-default streams

Hello,

We have some code using NPP with non-default streams. It worked as expected with CUDA 6.5; in our testing, it works with CUDA <= 7.5 and breaks with CUDA >= 8. We would like to use the newest version, CUDA 9.2. We’re using driver version 396.37 on CentOS 7.

The code sample below exits with SIGABRT and the message:

terminate called after throwing an instance of 'NppStatus'
Aborted

The backtrace shows the program aborts in nppiMinMaxIndxGetBufferHostSize_16u_C1MR. We’d like to understand if we’re abusing the CUDA API, or if this is a bug.
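
For anyone reading the abort message: "terminate called after throwing an instance of 'NppStatus'" means a value of type NppStatus was thrown as a C++ exception and nothing caught it, so std::terminate raised SIGABRT. Below is a minimal sketch of how such a throw could be converted back into a return code for diagnosis. FakeNppStatus, fake_get_buffer_size, and guarded_buffer_size are hypothetical stand-ins (not NPP names) so the sketch compiles without the CUDA toolkit. With NPP one would presumably catch NppStatus around the real call, though whether that works across the library boundary is an assumption we have not verified.

```cpp
#include <cstdio>

// Hypothetical stand-in for NppStatus so this compiles without the CUDA toolkit.
enum FakeNppStatus { FAKE_NPP_SUCCESS = 0, FAKE_NPP_ERROR = -2 };

// Hypothetical stand-in for a library call that throws its status instead of
// returning it, which is what the abort message suggests is happening.
FakeNppStatus fake_get_buffer_size(bool fail, int *bufsz) {
    if (fail)
        throw FAKE_NPP_ERROR;
    *bufsz = 4096;  // arbitrary illustrative value
    return FAKE_NPP_SUCCESS;
}

// Wraps the call so a thrown status becomes an ordinary return value
// instead of reaching std::terminate.
int guarded_buffer_size(bool fail, int *bufsz) {
    try {
        return fake_get_buffer_size(fail, bufsz);
    } catch (FakeNppStatus s) {
        fprintf(stderr, "caught thrown status %d\n", static_cast<int>(s));
        return s;
    }
}
```

Calling guarded_buffer_size(true, &n) returns the thrown status rather than aborting, which at least lets the caller log the value. It does not fix the underlying problem.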

Makefile (modify CUDA_INSTALL_PATH and CUDA7_INSTALL_PATH):

CUDA_INSTALL_PATH = /data/zmarvel/cuda-9.2
CUDA7_INSTALL_PATH = /data/zmarvel/cuda-7.5

NVCC = $(CUDA_INSTALL_PATH)/bin/nvcc
NVCC7 = $(CUDA7_INSTALL_PATH)/bin/nvcc

# Flags for CUDA 9.2
CUDA_LIB_NAMES = -lnppc -lnppist
# Flags for CUDA 7.5
CUDA7_LIB_NAMES = -lnppc -lnppi

CUDA_INC_PATH = $(CUDA_INSTALL_PATH)/include
CUDA_LIB_PATH = $(CUDA_INSTALL_PATH)/lib64

CUDA7_INC_PATH = $(CUDA7_INSTALL_PATH)/include
CUDA7_LIB_PATH = $(CUDA7_INSTALL_PATH)/lib64

NVCCFLAGS := -g -O0 -Xcompiler -fPIC --gpu-architecture=compute_30 \
	--gpu-code=sm_30,compute_30 --ptxas-options=-v -Xcompiler -Wno-enum-compare

all: main7.5 main9.2

main9.2: main.cpp Makefile
	$(NVCC) $(NVCCFLAGS) \
		-I$(CUDA_INC_PATH) -L$(CUDA_LIB_PATH) \
		$(CUDA_LIB_NAMES) $< -o $@

main7.5: main.cpp Makefile
	$(NVCC7) $(NVCCFLAGS) -I$(CUDA7_INC_PATH) -L$(CUDA7_LIB_PATH) \
		$(CUDA7_LIB_NAMES) $< -o $@

clean:
	rm -f main main7.5 main9.2

Code sample (set LD_LIBRARY_PATH appropriately for main7.5 and main9.2):

#include <stdio.h>
#include <unistd.h>

#include "cuda.h"
#include "cuda_runtime.h"
#include "npp.h"

int main(int argc, char *argv[]) {
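// Return -1 from main if a CUDA runtime call fails.
#define CUDA_CHECK(x) do { \
   cudaError_t X = (x); \
   if (X != cudaSuccess) \
     return -1; \
} while (0)

// Return -1 from main if an NPP call reports an error status.
#define NPP_CHECK(x) do { \
   NppStatus X = (x); \
   if (X != NPP_SUCCESS) \
     return -1; \
} while (0)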

   int bufsz;
   NppiSize sz = { 1024, 1024 };
   cudaStream_t stream1, stream2;

   CUDA_CHECK(cudaStreamCreate(&stream1));
   CUDA_CHECK(cudaDeviceSynchronize());
   nppSetStream(stream1);
   CUDA_CHECK(cudaStreamDestroy(stream1));

   // The program will not abort if the following line is removed.
   CUDA_CHECK(cudaDeviceReset());

   CUDA_CHECK(cudaStreamCreate(&stream2));
   CUDA_CHECK(cudaDeviceSynchronize());
   nppSetStream(stream2);

   // Program aborts here
   NPP_CHECK(nppiMinMaxIndxGetBufferHostSize_16u_C1MR(sz, &bufsz));
   printf("bufsz: %d\n", bufsz);

   CUDA_CHECK(cudaStreamDestroy(stream2));

   return 0;
}

Thanks in advance,
Zack

My suggestion would be to file a bug at developer.nvidia.com.

The system there is kind of picky, so start with just a very simple, empty bug; you can add info later. A link to this thread is probably sufficient. If you get a bug filed and give me the bug number, I can fix it up if needed.

Thanks, I have filed bug 2376318.

Thanks, I’ve assigned it to the appropriate engineering team.

The engineering team has looked at it and acknowledged the defect.

Probably the best recommendation right now is to not use NPP with non-default streams. A future CUDA release is expected to address NPP non-default stream issues.

Thanks for your help!

Does CUDA 10 have this fix?

The latest release, CUDA 10.1 Update 1 (10.1.168), includes fixes for several NPP and CUDA stream issues, including this one.

There may still be issues, however:

https://devtalk.nvidia.com/default/topic/1056761/gpu-accelerated-libraries/nppiabsdiff_8u_c3r-incorrect-for-non-blocking-npp-stream/

Bugs are always possible.