NPP with non-default streams

zmarvel · September 10, 2018, 7:59pm

Hello,

We have some code using NPP with non-default streams. It worked as expected with CUDA 6.5, and based on our testing, it works with CUDA <= 7.5 and breaks under CUDA >= 8. We would like to use the newest version, CUDA 9.2. We’re using driver version 396.37 on CentOS 7.

In the following code sample, the program exits with SIGABRT and the message:

terminate called after throwing an instance of 'NppStatus'
Aborted

The backtrace shows the program aborts in nppiMinMaxIndxGetBufferHostSize_16u_C1MR. We’d like to understand if we’re abusing the CUDA API, or if this is a bug.
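
For anyone hitting the same abort: the "terminate called after throwing an instance of 'NppStatus'" message indicates that a value of type NppStatus was thrown as a C++ exception and nothing caught it. Below is a minimal sketch of how one might capture such a thrown status so it can be logged instead of aborting. The names threw_status, DemoStatus, demo_failing_call and demo_ok_call are hypothetical and are not part of NPP or CUDA; the stand-in enum exists only so the sketch compiles without the toolkit.

```cpp
// Hypothetical helper (not part of NPP or CUDA): runs `call` and reports
// whether it threw a value of type Status. If it did, the thrown value is
// copied into *thrown; otherwise *thrown is left untouched.
template <typename Status, typename Call>
bool threw_status(Call call, Status* thrown) {
    try {
        call();
        return false;
    } catch (Status s) {   // matches only exceptions of exactly this type
        *thrown = s;
        return true;
    }
}

// Stand-in for NppStatus (assumption: used here only for illustration).
enum DemoStatus { DEMO_SUCCESS = 0, DEMO_FAILURE = -1 };

// Stand-in for a library call that throws its status instead of returning it.
inline void demo_failing_call() { throw DEMO_FAILURE; }

// Stand-in for a library call that completes normally.
inline void demo_ok_call() {}
```

With the toolkit available, the same helper could presumably wrap the failing call, e.g. threw_status<NppStatus>([&] { nppiMinMaxIndxGetBufferHostSize_16u_C1MR(sz, &bufsz); }, &st), assuming the exception actually propagates out of the library call. This only helps with diagnosis; it does not fix the underlying problem.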

Makefile (modify CUDA_INSTALL_PATH and CUDA7_INSTALL_PATH):

CUDA_INSTALL_PATH = /data/zmarvel/cuda-9.2
CUDA7_INSTALL_PATH = /data/zmarvel/cuda-7.5

NVCC = $(CUDA_INSTALL_PATH)/bin/nvcc
NVCC7 = $(CUDA7_INSTALL_PATH)/bin/nvcc

# Flags for CUDA 9.2
CUDA_LIB_NAMES = -lnppc -lnppist
# Flags for CUDA 7.5
CUDA7_LIB_NAMES = -lnppc -lnppi

CUDA_INC_PATH = $(CUDA_INSTALL_PATH)/include
CUDA_LIB_PATH = $(CUDA_INSTALL_PATH)/lib64

CUDA7_INC_PATH = $(CUDA7_INSTALL_PATH)/include
CUDA7_LIB_PATH = $(CUDA7_INSTALL_PATH)/lib64

NVCCFLAGS := -g -O0 -Xcompiler -fPIC --gpu-architecture=compute_30 \
	--gpu-code=sm_30,compute_30 --ptxas-options=-v -Xcompiler -Wno-enum-compare

all: main7.5 main9.2

main9.2: main.cpp Makefile
	$(NVCC) $(NVCCFLAGS) \
		-I$(CUDA_INC_PATH) -L$(CUDA_LIB_PATH) \
		$(CUDA_LIB_NAMES) $< -o $@

main7.5: main.cpp Makefile
	$(NVCC7) $(NVCCFLAGS) -I$(CUDA7_INC_PATH) -L$(CUDA7_LIB_PATH) \
		$(CUDA7_LIB_NAMES) $< -o $@

clean:
	rm -f main main7.5 main9.2

Code sample (set LD_LIBRARY_PATH appropriately for main7.5 and main9.2):

#include <stdio.h>
#include <unistd.h>

#include "cuda.h"
#include "cuda_runtime.h"
#include "npp.h"

int main(int argc, char *argv[]) {
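// Return -1 from main if a CUDA runtime call fails.
// Runtime API calls return cudaError_t, so compare against cudaSuccess.
#define CUDA_CHECK(x) do { \
   cudaError_t X = (x); \
   if (X != cudaSuccess) \
     return -1; \
} while (0)

// Return -1 from main if an NPP call returns a non-success status.
#define NPP_CHECK(x) do { \
   NppStatus X = (x); \
   if (X != NPP_SUCCESS) \
     return -1; \
} while (0)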

   int bufsz;
   NppiSize sz = { 1024, 1024 };
   cudaStream_t stream1, stream2;

   CUDA_CHECK(cudaStreamCreate(&stream1));
   CUDA_CHECK(cudaDeviceSynchronize());
   nppSetStream(stream1);
   CUDA_CHECK(cudaStreamDestroy(stream1));

   // The program will not abort if the following line is removed.
   CUDA_CHECK(cudaDeviceReset());

   CUDA_CHECK(cudaStreamCreate(&stream2));
   CUDA_CHECK(cudaDeviceSynchronize());
   nppSetStream(stream2);

   // Program aborts here
   NPP_CHECK(nppiMinMaxIndxGetBufferHostSize_16u_C1MR(sz, &bufsz));
   printf("bufsz: %d\n", bufsz);

   CUDA_CHECK(cudaStreamDestroy(stream2));

   return 0;
}

Thanks in advance,
Zack

Robert_Crovella · September 10, 2018, 8:11pm

My suggestion would be to file a bug at developer.nvidia.com.

The system there is kind of picky, so start with a very simple, empty bug; you can add info later. A link to this thread is probably sufficient. If you get a bug filed and give me the bug number, I can fix it up if needed.

zmarvel · September 10, 2018, 8:26pm

Thanks, I have filed bug 2376318.

Robert_Crovella · September 10, 2018, 8:41pm

Thanks, I’ve assigned it to the appropriate engineering team.

Robert_Crovella · September 10, 2018, 9:01pm

The engineering team has looked at it and acknowledged the defect.

Probably the best recommendation right now is to not use NPP with non-default streams. A future CUDA release is expected to address NPP non-default stream issues.

zmarvel · September 11, 2018, 5:07pm

Thanks for your help!

excubiteur · August 8, 2019, 7:08am

Does CUDA 10 have this fix?

Robert_Crovella · August 8, 2019, 1:24pm

The latest CUDA 10.1U1 (10.1.168) includes fixes for several issues with NPP and CUDA streams, including this one.

There may still be issues, however:

https://devtalk.nvidia.com/default/topic/1056761/gpu-accelerated-libraries/nppiabsdiff_8u_c3r-incorrect-for-non-blocking-npp-stream/

Bugs are always possible.
