Hello!
I have a problem when using the CUDA 3.0 beta (cudadriver_3.0-beta1_linux_64_195.17-beta.run) and the driver API together with CUFFT. The host is native Linux, and gcc is:
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.1.3 --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --with-tune=generic --enable-checking=release x86_64-linux-gnu
Thread model: posix
gcc version 4.1.3 20080704 (prerelease) (Ubuntu 4.1.2-27ubuntu1)
The problem is that I cannot destroy (or push) the context after a CUFFT call has been made. cuCtxPopCurrent reports no error, but the cuCtxDestroy and cuCtxPushCurrent calls that follow return CUDA_ERROR_INVALID_VALUE.
Here are the highlights of the program I used to detect this:
// Get handle for device 0
CUdevice cuDevice = 0;
CUDA_CHECK( cuDeviceGet(&cuDevice, 0) );
// Create context
CUcontext cuContext;
CUDA_CHECK( cuCtxCreate(&cuContext, 0, cuDevice) );
// Create module from binary file
CUmodule cuModule;
printf("Loading module PTX!\n");
CUDA_CHECK( cuModuleLoad(&cuModule, "test_drv_kernel.ptx") );
// Get function handle from module
CUfunction vecAdd;
CUDA_CHECK( cuModuleGetFunction(&vecAdd, cuModule, "VecAdd") );
.... MORE CODE IN ACTUAL FILE ...
// OK, then freeze here
CUDA_CHECK( cuCtxPopCurrent(NULL));
printf("Waiting for 1 s %zu %zu\n", sizeof(CUcontext), sizeof(void*));
sleep(1);
CUDA_CHECK( cuCtxPushCurrent( cuContext ) );
.... MORE CODE IN ACTUAL FILE ...
CUFFT_CHECK( cufftPlan1d( &fftkernel, size, CUFFT_C2C, 1 ) );
CUFFT_CHECK( cufftExecC2C(fftkernel, (cufftComplex *)(size_t)d_a,
(cufftComplex *)(size_t)d_a, CUFFT_FORWARD));
CUFFT_CHECK( cufftDestroy(fftkernel) );
CUDA_CHECK( cuMemFree(d_a) );
CUDA_CHECK( cuMemFree(d_b) );
CUDA_CHECK( cuMemFree(d_c) );
CUDA_CHECK( cuCtxPopCurrent(NULL));
CUDA_CHECK( cuCtxDestroy( cuContext ) );
The first call that fails is cuCtxDestroy.
Am I doing something wrong, or is this really a bug?
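The CUDA_CHECK and CUFFT_CHECK wrappers are not shown in the highlights above, so here is a minimal sketch of what such a wrapper might look like, in case it helps anyone reproduce the report. It is plain C with no CUDA headers so that it compiles anywhere. All names in it (status_name, report_status, CHECK_STATUS, fake_ctx_destroy) are hypothetical and are not the macros from my test file. The only values taken from cuda.h are the two CUresult codes 0 (CUDA_SUCCESS) and 1 (CUDA_ERROR_INVALID_VALUE).

```c
#include <stdio.h>

/* Hypothetical helper: maps the two status codes discussed in this post
 * to their CUresult names. Any other code is reported as unrecognized. */
static const char *status_name(int status)
{
    switch (status) {
    case 0:  return "CUDA_SUCCESS";
    case 1:  return "CUDA_ERROR_INVALID_VALUE";
    default: return "unrecognized status";
    }
}

/* Prints the failing expression with file and line, then returns the
 * status unchanged so the caller decides whether to abort or continue. */
static int report_status(int status, const char *expr,
                         const char *file, int line)
{
    if (status != 0) {
        fprintf(stderr, "%s:%d: %s failed with status %d (%s)\n",
                file, line, expr, status, status_name(status));
    }
    return status;
}

/* Assumed shape of a CUDA_CHECK-style macro: evaluates the call once
 * and forwards its result together with the source location. */
#define CHECK_STATUS(call) \
    report_status((int)(call), #call, __FILE__, __LINE__)

/* Stand-in for a driver call that fails the way cuCtxDestroy does in
 * this report; it simply returns 1 (CUDA_ERROR_INVALID_VALUE). */
static int fake_ctx_destroy(void)
{
    return 1;
}
```

With <cuda.h> included, the same macro could presumably wrap a real call such as CHECK_STATUS(cuCtxDestroy(cuContext)), because CUresult converts to int. Returning the status instead of exiting makes it possible to keep going after the first failure and see whether the later push fails as well.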
Valgrind also reports some memory leaks and conditional jumps on uninitialised values inside the libraries:
==12562== Memcheck, a memory error detector
==12562== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==12562== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==12562== Command: ./test_drv.exe
==12562==
Creating context
==12562== Syscall param ioctl(generic) points to uninitialised byte(s)
==12562== at 0x82CD587: ioctl (in /lib/libc-2.10.1.so)
==12562== by 0x4F29EB3: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EF0EBE: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4ED4057: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EACBEA: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EA597D: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4F3D3A6: cuCtxCreate (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4011DB: main (test_drv.c:95)
==12562== Address 0x7fefff6a0 is on thread 1's stack
==12562==
Loading module PTX!
==12562== Conditional jump or move depends on uninitialised value(s)
==12562== at 0x50C50EF: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C4E9B: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C53D1: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C5750: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5089E96: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5090B73: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5086C81: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EBD1DB: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EBDEAD: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EA9C81: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4F3C30B: cuModuleLoad (in /usr/lib/libcuda.so.195.17)
==12562== by 0x401225: main (test_drv.c:100)
==12562==
==12562== Conditional jump or move depends on uninitialised value(s)
==12562== at 0x50C50F4: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C4E9B: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C53D1: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x50C5750: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5089E96: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5090B73: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x5086C81: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EBD1DB: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EBDEAD: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4EA9C81: ??? (in /usr/lib/libcuda.so.195.17)
==12562== by 0x4F3C30B: cuModuleLoad (in /usr/lib/libcuda.so.195.17)
==12562== by 0x401225: main (test_drv.c:100)
==12562==
Allocating memory
Memory allocated and transferred ok!
Doing 40 blocks each containing 128 threads
Waiting for 1 s 8 8
Kernel done!
Total diff was: 0.000000
Testing FFT
Bye bye!
==12562==
==12562== HEAP SUMMARY:
==12562== in use at exit: 16,592 bytes in 12 blocks
==12562== total heap usage: 6,507 allocs, 6,495 frees, 10,715,544 bytes allocated
==12562==
==12562== LEAK SUMMARY:
==12562== definitely lost: 3,376 bytes in 2 blocks
==12562== indirectly lost: 0 bytes in 0 blocks
==12562== possibly lost: 0 bytes in 0 blocks
==12562== still reachable: 13,216 bytes in 10 blocks
==12562== suppressed: 0 bytes in 0 blocks
==12562== Rerun with --leak-check=full to see details of leaked memory
==12562==
==12562== For counts of detected and suppressed errors, rerun with: -v
==12562== Use --track-origins=yes to see where uninitialised values come from
==12562== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4)
sundberg@mediapc:~/eigenor_local/tomosuite/branches/cuda_drv_test$ valgrind --leak-check=full ./test_drv.exe
==12567== Memcheck, a memory error detector
==12567== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==12567== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==12567== Command: ./test_drv.exe
==12567==
Creating context
==12567== Syscall param ioctl(generic) points to uninitialised byte(s)
==12567== at 0x82CD587: ioctl (in /lib/libc-2.10.1.so)
==12567== by 0x4F29EB3: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EF0EBE: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4ED4057: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EACBEA: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EA597D: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4F3D3A6: cuCtxCreate (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4011DB: main (test_drv.c:95)
==12567== Address 0x7fefff6a0 is on thread 1's stack
==12567==
Loading module PTX!
==12567== Conditional jump or move depends on uninitialised value(s)
==12567== at 0x50C50EF: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C4E9B: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C53D1: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C5750: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5089E96: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5090B73: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5086C81: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBD1DB: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBDEAD: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EA9C81: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4F3C30B: cuModuleLoad (in /usr/lib/libcuda.so.195.17)
==12567== by 0x401225: main (test_drv.c:100)
==12567==
==12567== Conditional jump or move depends on uninitialised value(s)
==12567== at 0x50C50F4: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C4E9B: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C53D1: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x50C5750: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5089E96: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5090B73: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x5086C81: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBD1DB: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBDEAD: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EA9C81: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4F3C30B: cuModuleLoad (in /usr/lib/libcuda.so.195.17)
==12567== by 0x401225: main (test_drv.c:100)
==12567==
Allocating memory
Memory allocated and transferred ok!
Doing 40 blocks each containing 128 threads
Waiting for 1 s 8 8
Kernel done!
Total diff was: 0.000000
Testing FFT
Bye bye!
==12567==
==12567== HEAP SUMMARY:
==12567== in use at exit: 16,592 bytes in 12 blocks
==12567== total heap usage: 6,507 allocs, 6,495 frees, 10,715,544 bytes allocated
==12567==
==12567== 32 bytes in 1 blocks are definitely lost in loss record 3 of 12
==12567== at 0x4C25153: malloc (vg_replace_malloc.c:195)
==12567== by 0x4ECB422: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4ECB80C: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EACD34: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EA597D: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4F3D3A6: cuCtxCreate (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4011DB: main (test_drv.c:95)
==12567==
==12567== 3,344 bytes in 1 blocks are definitely lost in loss record 10 of 12
==12567== at 0x4C25153: malloc (vg_replace_malloc.c:195)
==12567== by 0x508625D: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBD14E: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EBDEAD: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4EA9C81: ??? (in /usr/lib/libcuda.so.195.17)
==12567== by 0x4F3C30B: cuModuleLoad (in /usr/lib/libcuda.so.195.17)
==12567== by 0x401225: main (test_drv.c:100)
==12567==
==12567== LEAK SUMMARY:
==12567== definitely lost: 3,376 bytes in 2 blocks
==12567== indirectly lost: 0 bytes in 0 blocks
==12567== possibly lost: 0 bytes in 0 blocks
==12567== still reachable: 13,216 bytes in 10 blocks
==12567== suppressed: 0 bytes in 0 blocks
==12567== Reachable blocks (those to which a pointer was found) are not shown.
==12567== To see them, rerun with: --leak-check=full --show-reachable=yes
==12567==
==12567== For counts of detected and suppressed errors, rerun with: -v
==12567== Use --track-origins=yes to see where uninitialised values come from
==12567== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 4 from 4)