I’m trying to make a program work on a multi-GPU setup, but I have a few questions.
Before running the code I need to allocate some memory on each device. I do that in an init function because it’s for OpenGL and I can’t allocate in the loop, since I would run out of memory quite fast.
Is that just
cudaSetDevice(0);
cudaMalloc((void**)&d_Data_0, number * sizeof(float));
cudaSetDevice(1);
cudaMalloc((void**)&d_Data_1, number * sizeof(float));
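For reference, here is a minimal sketch of that init-time allocation with error checking added. It assumes CUDA 4.0 or later, where a single host thread can switch devices with cudaSetDevice; on older toolkits each device needs its own host thread. The names d_Data_0, d_Data_1 and number come from the snippet above; CHECK, init_buffers and free_buffers are hypothetical helpers made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call and make the
// enclosing function return false.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            return false;                                         \
        }                                                         \
    } while (0)

float *d_Data_0 = NULL;  // only valid on device 0
float *d_Data_1 = NULL;  // only valid on device 1

// Allocate one buffer per device, once, before the render loop starts.
bool init_buffers(size_t number)
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        fprintf(stderr, "need 2 devices, found %d\n", count);
        return false;
    }
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void**)&d_Data_0, number * sizeof(float)));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void**)&d_Data_1, number * sizeof(float)));
    return true;
}

// Free each buffer on the device that owns it.
void free_buffers()
{
    cudaSetDevice(0); cudaFree(d_Data_0); d_Data_0 = NULL;
    cudaSetDevice(1); cudaFree(d_Data_1); d_Data_1 = NULL;
}
```

Calling init_buffers once at startup and free_buffers once at shutdown keeps the loop itself free of allocations, which is what avoids the leak.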
I can’t see how any of them would be easy to play with. I would prefer not to declare pointers within a thread, since it loops, and I reuse the pointer instead of allocating a new one, because the former has given me some memory leak problems. My best guess is that the devices can’t use the same device pointer, but I might be wrong here. I can see the simplicity in Ocelot, but I think it has the same problem.
But if they can use the same pointers, as long as they are within a “thread”, then I could see some smart things. Though I don’t have different devices on the system, I can imagine that would give allocation problems with a threaded approach.
Yeah, even with Ocelot, if you allocate memory on one device, pointers to that memory will only be valid in kernels that are launched on that device. It is one of the drawbacks of each card having a separate address space.
If I understand what you’re asking correctly, you’re trying to do something very similar to what I’m doing. Let me take a stab at what I think you want to accomplish.
//Prepare your host data
#pragma omp parallel
{
switch(omp_get_thread_num()) {
case 0:
cudaMemcpy(g_data0, &h_data[0], SIZE/2*sizeof(float), cudaMemcpyHostToDevice);
//Of course, this won't work without the thread information
launch_Kernel(g_data0);
cudaMemcpy(&h_data[0], g_data0, SIZE/2*sizeof(float), cudaMemcpyDeviceToHost);
break;
case 1:
cudaMemcpy(g_data1, &h_data[SIZE/2], SIZE/2*sizeof(float), cudaMemcpyHostToDevice);
launch_Kernel(g_data1);
cudaMemcpy(&h_data[SIZE/2], g_data1, SIZE/2*sizeof(float), cudaMemcpyDeviceToHost);
break;
}
}
Two things to note. In the memory transfers, you are using an offset for h_data so that the end result is a combination of the two arrays. The other thing is that while you have to use different device pointers for each device, you are able to reuse them throughout the loop.
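To make both points concrete, here is a self-contained sketch of the same idea: one OpenMP thread per GPU, each working on its own half of h_data through its own device pointer, and reusing that pointer on every pass of the loop. It is only an illustration under assumptions: scale_kernel is a placeholder kernel, process_on_two_gpus and the iterations parameter are made-up names, size is assumed to be at least 2, and the file has to be built with OpenMP enabled (for example nvcc -Xcompiler -fopenmp with a GCC host compiler).

```cuda
#include <cstdio>
#include <omp.h>
#include <cuda_runtime.h>

// Placeholder kernel (an assumption): doubles every element of its chunk.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes the first half of h_data on GPU 0 and the rest on GPU 1.
// Returns false if fewer than two threads ran or any CUDA call failed.
bool process_on_two_gpus(float *h_data, int size, int iterations)
{
    const int half = size / 2;
    float *d_data[2] = { NULL, NULL };  // one device pointer per GPU
    bool ok[2] = { false, false };

    #pragma omp parallel num_threads(2)
    {
        const int dev = omp_get_thread_num();
        const int offset = dev * half;                  // start of this GPU's chunk
        const int n = (dev == 0) ? half : size - half;  // GPU 1 takes any odd element
        const size_t bytes = n * sizeof(float);

        // Bind this host thread to its device, then allocate once.
        bool good = cudaSetDevice(dev) == cudaSuccess
                 && cudaMalloc((void**)&d_data[dev], bytes) == cudaSuccess;

        // The same device pointer is reused on every pass of the loop.
        for (int it = 0; good && it < iterations; ++it) {
            good = cudaMemcpy(d_data[dev], h_data + offset, bytes,
                              cudaMemcpyHostToDevice) == cudaSuccess;
            if (good) {
                scale_kernel<<<(n + 255) / 256, 256>>>(d_data[dev], n);
                good = cudaMemcpy(h_data + offset, d_data[dev], bytes,
                                  cudaMemcpyDeviceToHost) == cudaSuccess;
            }
        }
        cudaFree(d_data[dev]);
        ok[dev] = good;
    }
    return ok[0] && ok[1];
}
```

Because each thread writes back to h_data + offset, the host array ends up holding the combined result of both devices without any extra merge step.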
Is it correct to use the context in this way? I need the second thread to print 1.0 and 2.0, but it doesn’t work without a CUDA context. With this solution the build fails with the following linker errors:
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `main':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a5c): undefined reference to `cuCtxDestroy'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `compute_function(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10a80): undefined reference to `cuCtxPushCurrent'
/tmp/tmpxft_00006a46_00000000-12_th1.o: In function `inizialize(void*)':
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b12): undefined reference to `cuDeviceGet'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b24): undefined reference to `cuCtxCreate'
tmpxft_00006a46_00000000-1_th1.cudafe1.cpp:(.text+0x10b90): undefined reference to `cuCtxPopCurrent'
Can someone give me a hand? Please, I need this to work for my degree thesis.
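All five missing symbols (cuCtxDestroy, cuCtxPushCurrent, cuDeviceGet, cuCtxCreate, cuCtxPopCurrent) belong to the CUDA driver API, which lives in libcuda. nvcc links the runtime library by default but not the driver library, so adding -lcuda to the link line is the usual fix for this kind of error. Below is a minimal sketch of handing a context from one thread to another with those calls. It is not the poster's code: the file name th1.cu is only inferred from the error output, the thread functions are simplified stand-ins, run_context_handoff is a made-up name, and the three-argument cuCtxCreate is the classic signature, so check the cuda.h of your toolkit.

```cuda
// Build line (file name is an assumption):
//   nvcc th1.cu -o th1 -lcuda -lpthread
#include <cstdio>
#include <pthread.h>
#include <cuda.h>  // driver API: cuInit, cuCtxCreate, ...

static CUcontext g_ctx;  // context handed from the init thread to the worker

// Creates a context on device 0, then detaches it from this thread.
static void *initialize(void *)
{
    CUdevice dev;
    if (cuInit(0) != CUDA_SUCCESS) return NULL;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return NULL;
    if (cuCtxCreate(&g_ctx, 0, dev) != CUDA_SUCCESS) return NULL;
    // cuCtxCreate makes the new context current on this thread;
    // pop it so that another thread is allowed to push it.
    if (cuCtxPopCurrent(&g_ctx) != CUDA_SUCCESS) return NULL;
    return &g_ctx;  // non-NULL signals success
}

// Attaches the shared context, does its work, and detaches again.
static void *compute_function(void *)
{
    if (cuCtxPushCurrent(g_ctx) != CUDA_SUCCESS) return NULL;
    // ... device work that uses memory allocated in this context ...
    CUcontext popped;
    if (cuCtxPopCurrent(&popped) != CUDA_SUCCESS) return NULL;
    return &g_ctx;
}

// Returns true if the context was created in one thread and used in another.
bool run_context_handoff()
{
    pthread_t t;
    void *res = NULL;
    pthread_create(&t, NULL, initialize, NULL);
    pthread_join(t, &res);
    if (res == NULL) return false;
    pthread_create(&t, NULL, compute_function, NULL);
    pthread_join(t, &res);
    cuCtxDestroy(g_ctx);
    return res != NULL;
}
```

The key detail is the pop at the end of initialize: a context that is still current on one thread cannot be pushed onto another, so the creating thread has to let go of it first.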