Problems using cuda context

Gantenbein · June 7, 2010, 12:25pm

Hello,

I have some problems using the CUDA context.

I want to allocate memory on the GPU in one DLL, run my calculations in a second DLL that is executed in a while loop, and finally free the memory in a third DLL that is called once after the while loop has ended.

To make sure that all DLLs can access the allocated data, I tried to use a CUDA context that is created in the first DLL and detached there from the host thread. The second DLL should "catch" this context, do some calculations and detach it again. Finally, the third DLL should destroy the context.

It seems that this first idea does not work, so I hope someone can help me use these functions or make it work another way.

Here is a sketch of my first attempt, which does not work:

[codebox]//First DLL: allocate memory, executes one time before a while loop
void KernelCaller(uint32_t *h_ctx,uint32_t *d_R, …)
{
    CUdevice hDev;
    CUcontext hCtx;

    cuDeviceGet(&hDev,0);
    cuCtxCreate(&hCtx,0,hDev);
    *h_ctx=(uint32_t)&hCtx;

    float *dR, … ;
    cudaMalloc((void**)&dR,size);
    …
    *d_R=(uint32_t)dR;
    …
    cuCtxPopCurrent(&hCtx);checkCUDAError("pop ctx");
}

// Second DLL: will be executed in a while loop
void KernelCaller(float *R, …, uint32_t hctx, float *d_R, …) // uint32_t d_R already cast to float* in the wrapper function
{
    CUcontext h_Ctx;
    h_Ctx=(CUcontext)hctx;
    cuCtxPushCurrent(h_Ctx);

    dim3 grid, block;
    block.x=BLOCK_SIZE;
    grid.x=(n/BLOCK_SIZE);

    cudaMemcpy(d_R,R,size,cudaMemcpyHostToDevice);
    Kernel_gpu<<<grid,block>>>(d_R, ... );

    cuCtxPopCurrent(&h_Ctx);
}

// Third DLL: free allocated memory, executes one time after while loop
void KernelCaller(uint32_t hctx,float *d_R, …) // uint32_t d_R already cast to float* in the wrapper function
{
    CUcontext h_Ctx;
    h_Ctx=(CUcontext)hctx;

    cudaFree(&d_R);
    cuCtxDestroy(h_Ctx);
}[/codebox]
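
A side note on the launch configuration in the second DLL: `grid.x=(n/BLOCK_SIZE)` is an integer division, so if `n` is not a multiple of `BLOCK_SIZE` the last partial block is never launched. A minimal sketch of rounding up instead follows. The helper name `blocks_for` is hypothetical, and the sketch assumes the kernel guards its index against `n`, which the post does not show.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks so that blocks * block_size >= n.
// Written as quotient plus remainder check to avoid overflow for large n.
inline unsigned int blocks_for(unsigned int n, unsigned int block_size) {
    assert(block_size > 0);  // a zero block size would divide by zero
    return n / block_size + (n % block_size != 0 ? 1u : 0u);
}
```

With this, `grid.x = blocks_for(n, BLOCK_SIZE);` would cover every element, at the cost of a few idle threads in the last block.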

Jackal7 · August 11, 2010, 10:02am

Hi Gantenbein,

I have the same problem! I'm asking here in the NVIDIA forum but no one replies. Have you solved your problem? Can you give me a hand with my program?

I'm trying to write a simple program to understand how CUDA contexts work.

This is my program:

#include <pthread.h>
#include <stdio.h>
#include <cuda.h>

#define NUM_THREADS 2

float d1, d2;
float * m1, * m2;
int devnumber = 1;
CUcontext hcuContext = 0;

void *
inizialize (void *)
{
    CUdevice hcuDevice;
    cuDeviceGet( &hcuDevice, devnumber );
    cuCtxCreate( &hcuContext, 0, hcuDevice );

    cudaMalloc ((void **) &m1, sizeof (float));
    cudaMalloc ((void **) &m2, sizeof (float));

    float dd1 = 1.0;
    float dd2 = 2.0;
    cudaMemcpy (m1, &dd1, sizeof (float), cudaMemcpyHostToDevice);
    cudaMemcpy (m2, &dd2, sizeof (float), cudaMemcpyHostToDevice);

    //cudaMemcpy (&d1, m1, sizeof (float), cudaMemcpyDeviceToHost);
    //cudaMemcpy (&d2, m2, sizeof (float), cudaMemcpyDeviceToHost);
    //fprintf (stdout, "%f %f \n", d1, d2);

    cuCtxPopCurrent(&hcuContext);
    cudaThreadSynchronize ();
    pthread_exit (NULL);
}

void *
compute_function (void *)
{
    cuCtxPushCurrent( hcuContext );

    cudaMemcpy (&d1, m1, sizeof (float), cudaMemcpyDeviceToHost);
    cudaMemcpy (&d2, m2, sizeof (float), cudaMemcpyDeviceToHost);
    fprintf (stdout, "%f %f \n", d1, d2);

    cudaThreadSynchronize ();
    pthread_exit (NULL);
}

int
main (int argc, char *argv)
{
    pthread_t threads;

    pthread_create (&threads, NULL, inizialize, NULL);
    if (pthread_join (threads, NULL))
    {
        fprintf (stderr, "error pthread_join\n");
        return EXIT_FAILURE;
    }

    pthread_create (&threads, NULL, compute_function, NULL);
    if (pthread_join (threads, NULL))
    {
        fprintf (stderr, "error pthread_join\n");
        return EXIT_FAILURE;
    }

    cuCtxDestroy(hcuContext);
    return EXIT_SUCCESS;
}

Is it correct to use the context in this way? I need the first thread to allocate the memory on the device and the second thread to print 1.0 and 2.0, but without a CUDA context this doesn't work. With this solution the build fails with the following errors:

/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `main':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a5c): undefined reference to `cuCtxDestroy'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `compute_function(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a80): undefined reference to `cuCtxPushCurrent'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `inizialize(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b12): undefined reference to `cuDeviceGet'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b24): undefined reference to `cuCtxCreate'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b90): undefined reference to `cuCtxPopCurrent'

Please give me a hand; I need this to work for my degree thesis.

DarkRoom · August 11, 2010, 12:44pm

Let me try to answer this one.

First, a CUDA context is just a pointer to a special structure, as defined in cuda.h.

typedef struct CUctx_st *CUcontext;
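
Because the handle is a plain pointer, passing it between modules as an integer can only work if the integer holds the handle value itself and is wide enough for a pointer. In the first post, `*h_ctx=(uint32_t)&hCtx;` stores the address of a local variable, which is gone once the function returns, and a `uint32_t` cannot hold a 64-bit pointer. Here is a minimal sketch of the round trip, assuming a stand-in opaque type so it compiles without the CUDA headers. `FakeCtx_st`, `FakeContext`, `pack_handle`, `unpack_handle`, `round_trips` and `demo` are hypothetical names, not part of CUDA.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the opaque driver type (hypothetical, never defined).
// It mirrors: typedef struct CUctx_st *CUcontext;
struct FakeCtx_st;
typedef struct FakeCtx_st *FakeContext;

// The integer type used for transport must be able to hold a pointer.
static_assert(sizeof(std::uintptr_t) >= sizeof(FakeContext),
              "uintptr_t must be pointer-sized");

// Pack the handle VALUE, not the address of the variable that holds it.
inline std::uintptr_t pack_handle(FakeContext ctx) {
    return reinterpret_cast<std::uintptr_t>(ctx);
}

// Recover the handle on the other side of the call boundary.
inline FakeContext unpack_handle(std::uintptr_t raw) {
    return reinterpret_cast<FakeContext>(raw);
}

// True if the handle survives the integer round trip unchanged.
inline bool round_trips(FakeContext ctx) {
    return unpack_handle(pack_handle(ctx)) == ctx;
}

// Small self-check: any object address serves as a fake handle value.
inline void demo() {
    int dummy = 0;
    FakeContext ctx = reinterpret_cast<FakeContext>(&dummy);
    assert(round_trips(ctx));
}
```

This only shows that the handle value can be carried across a call boundary intact. Whether the context it names is usable on the other side is a separate question, which the rest of this reply speculates about.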

The crucial question, which I’m not sure I can answer in detail, is who is responsible for managing the context pool. Intuitively it must be the thread from which you call cuInit(). The context pool is most likely a static variable declared in cuInit().

If that’s the case, you need to call cuInit() in your main() function. I would create and destroy the contexts here too.

Since the context pool is a static variable, if you call cuInit() in a DLL, the context pool will be visible in that DLL only. Therefore, any context pointer will be meaningless outside the DLL.

Remember that each DLL keeps its own copy of its static variables, even though it is loaded into the address space of the calling process. Also google DLLs and static variables. One useful link:

[url=“static variable vs. DLL”]http://cboard.cprogramming.com/cplusplus-p...ble-vs-dll.html[/url]

Regards,
Mike
