Problems using CUDA context

Hello,

I have some problems using the CUDA context.

I want to allocate memory on the GPU in one DLL, then do my calculations in a second DLL that is executed in a while loop, and finally free the memory in a third DLL that is called once after the while loop has ended.

To make sure that all DLLs can access the allocated data, I tried to use a CUDA context that is created in the first DLL and detached there from the host thread. The second DLL should “catch” this context, do some calculations, and detach it again. Finally, the third DLL should destroy the context.

Since this first idea does not seem to work, I hope someone can help me use these functions correctly or suggest another way to make it work.

Here is a sketch of my first attempt, which does not work:

[codebox]// First DLL: allocate memory, executes one time before a while loop
void KernelCaller(uint32_t *h_ctx, uint32_t *d_R, …)
{
    CUdevice hDev;
    CUcontext hCtx;

    cuDeviceGet(&hDev, 0);
    cuCtxCreate(&hCtx, 0, hDev);
    *h_ctx = (uint32_t)&hCtx;

    float *dR, … ;
    cudaMalloc((void**)&dR, size);
    *d_R = (uint32_t)dR;

    cuCtxPopCurrent(&hCtx); checkCUDAError("pop ctx");
}

// Second DLL: will be executed in a while loop
// (uint32_t d_R is already cast to float* in the wrapper function)
void KernelCaller(float *R, …, uint32_t hctx, float *d_R, …)
{
    CUcontext h_Ctx;
    h_Ctx = (CUcontext)hctx;
    cuCtxPushCurrent(h_Ctx);

    dim3 grid, block;
    block.x = BLOCK_SIZE;
    grid.x = (n / BLOCK_SIZE);

    cudaMemcpy(d_R, R, size, cudaMemcpyHostToDevice);
    Kernel_gpu<<<grid, block>>>(d_R, ... );

    cuCtxPopCurrent(&h_Ctx);
}

// Third DLL: free allocated memory, executes one time after the while loop
// (uint32_t d_R is already cast to float* in the wrapper function)
void KernelCaller(uint32_t hctx, float *d_R, …)
{
    CUcontext h_Ctx;
    h_Ctx = (CUcontext)hctx;

    cudaFree(&d_R);
    cuCtxDestroy(h_Ctx);
}[/codebox]
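
One detail in the scheme above may be worth checking independently of CUDA: the first DLL stores `&hCtx`, which is the address of a local variable rather than the handle itself, and it narrows that value to `uint32_t`, which can truncate a pointer on a 64-bit build. The sketch below is plain C++ with no CUDA calls. `FakeCtx_st`, `FakeContext`, `pack_handle`, `unpack_handle` and `survives_uint32` are hypothetical names used only for illustration; the stand-in type merely mimics an opaque pointer handle such as `CUcontext`. It shows one way to pass a handle value through an integer, and it may not be the only problem in the original code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an opaque handle type such as CUcontext,
// which cuda.h declares as a pointer to an incomplete struct.
struct FakeCtx_st { int id; };
typedef FakeCtx_st *FakeContext;

// Pack the handle VALUE (not the address of the variable that holds it)
// into an integer type that is wide enough for a pointer on this platform.
std::uintptr_t pack_handle(FakeContext ctx) {
    return reinterpret_cast<std::uintptr_t>(ctx);
}

// Recover the handle from the integer produced by pack_handle().
FakeContext unpack_handle(std::uintptr_t raw) {
    return reinterpret_cast<FakeContext>(raw);
}

// Returns true if the handle survives a round trip through uint32_t.
// On a 64-bit build this generally fails for addresses above 4 GiB.
bool survives_uint32(FakeContext ctx) {
    std::uint32_t narrow =
        static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(ctx));
    return reinterpret_cast<FakeContext>(
               static_cast<std::uintptr_t>(narrow)) == ctx;
}
```

With this pattern the wrapper parameters would be `uintptr_t` (or a 64-bit integer) instead of `uint32_t`, and the receiving side would cast the integer back to the handle type before use.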

Hi Gantenbein

I have the same problem! I have been asking here in the NVIDIA forum, but no one replies. Have you solved your problem? Could you give me a hand with my program?

I'm trying to write a simple program to understand how CUDA contexts work.

This is my program:


[codebox]#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */
#include <cuda.h>

#define NUM_THREADS 2

float d1, d2;
float *m1, *m2;
int devnumber = 1;
CUcontext hcuContext = 0;

/* First thread: create the context, allocate and fill device memory */
void *
inizialize (void *)
{
    CUdevice hcuDevice;

    cuDeviceGet (&hcuDevice, devnumber);
    cuCtxCreate (&hcuContext, 0, hcuDevice);

    cudaMalloc ((void **) &m1, sizeof (float));
    cudaMalloc ((void **) &m2, sizeof (float));

    float dd1 = 1.0;
    float dd2 = 2.0;

    cudaMemcpy (m1, &dd1, sizeof (float), cudaMemcpyHostToDevice);
    cudaMemcpy (m2, &dd2, sizeof (float), cudaMemcpyHostToDevice);

    //cudaMemcpy (&d1, m1, sizeof (float), cudaMemcpyDeviceToHost);
    //cudaMemcpy (&d2, m2, sizeof (float), cudaMemcpyDeviceToHost);
    //fprintf (stdout, "%f %f \n", d1, d2);

    cuCtxPopCurrent (&hcuContext);
    cudaThreadSynchronize ();
    pthread_exit (NULL);
}

/* Second thread: attach the context and read the values back */
void *
compute_function (void *)
{
    cuCtxPushCurrent (hcuContext);

    cudaMemcpy (&d1, m1, sizeof (float), cudaMemcpyDeviceToHost);
    cudaMemcpy (&d2, m2, sizeof (float), cudaMemcpyDeviceToHost);
    fprintf (stdout, "%f %f \n", d1, d2);

    cudaThreadSynchronize ();
    pthread_exit (NULL);
}

int
main (int argc, char **argv)
{
    pthread_t threads;

    pthread_create (&threads, NULL, inizialize, NULL);
    if (pthread_join (threads, NULL))
    {
        fprintf (stderr, "error pthread_join\n");
        return EXIT_FAILURE;
    }

    pthread_create (&threads, NULL, compute_function, NULL);
    if (pthread_join (threads, NULL))
    {
        fprintf (stderr, "error pthread_join\n");
        return EXIT_FAILURE;
    }

    cuCtxDestroy (hcuContext);
    return EXIT_SUCCESS;
}[/codebox]


Is it correct to use the context in this way? I need the first thread to allocate the memory on the device and the second thread to print 1.0 and 2.0, but without a CUDA context it doesn't work. With this solution the build fails with the following linker errors:


/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `main':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a5c): undefined reference to `cuCtxDestroy'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `compute_function(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a80): undefined reference to `cuCtxPushCurrent'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `inizialize(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b12): undefined reference to `cuDeviceGet'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b24): undefined reference to `cuCtxCreate'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b90): undefined reference to `cuCtxPopCurrent'


Please give me a hand; I need this to work for my degree thesis.

Let me try to answer this one.

First, a CUDA context is just a pointer to a special structure, as defined in cuda.h.

typedef struct CUctx_st *CUcontext;

The crucial question, which I’m not sure I can answer in detail, is who is responsible for managing the context pool. Intuitively it must be the thread from which you call cuInit(). The context pool is most likely a static variable declared in cuInit().

If that’s the case, you need to call cuInit() in your main() function. I would create and destroy the contexts here too.

If the context pool really is a static variable, then calling cuInit() in a DLL would make the pool visible in that DLL only, and any context pointer would be meaningless outside that DLL.

Remember that each DLL keeps its own copy of its static variables, even though all DLLs loaded by a process share that process's address space. It is also worth searching for how DLLs handle static variables. One useful link:

[url=“static variable vs. DLL”]http://cboard.cprogramming.com/cplusplus-p...ble-vs-dll.html[/url]

Regards,
Mike
