Suggestions / help: hit a rut

Hi all,

I’ve been working on a summer research project to recreate the stationary phase of chromatography. The idea is to rewrite a VBA program in C++ and then parallelize it with CUDA. The C++ version is now fully functional and working, so I began porting the project to CUDA to run on the GPU (a Tesla C2050).

When planning the project I expected this part to go relatively smoothly. I planned on simply taking the bulk of the program, putting it in a global kernel, and turning the external functions into device calls. The idea was to run multiple instances of the program at once, so instead of getting 1 million results I could get many millions at once to analyze.

However, since I am posting this it should be obvious this didn’t turn out to be as simple as I expected.
I’m having difficulty passing variables between device functions without overwriting the data.
I am also having trouble with my calls to the Mersenne Twister, since they are technically host calls inside a device/global function.
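For context, here is roughly what I think the in-kernel RNG would have to look like if I switch from the host-side Mersenne Twister to the cuRAND device API (just a sketch; the kernel and variable names are made up, and I realize cuRAND's default device generator is XORWOW rather than MT, which may or may not matter for my purposes):

```cuda
#include <curand_kernel.h>

// Each thread keeps its own RNG state, so draws are independent per thread.
__global__ void simulate(unsigned long long seed, float *results)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    curandState state;
    // Same seed, unique sequence number per thread -> independent streams.
    curand_init(seed, tid, 0, &state);

    float r = curand_uniform(&state);  // uniform in (0, 1]
    results[tid] = r;
}
```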

My plan is to launch ~400 instances of the main program and have them run in parallel, with the results (essentially one variable per instance) stored in a global array indexed by blockIdx, to be summed after all instances have run. Since I have this working correctly in C++, I was wondering if I might be approaching the port poorly.
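To be concrete, this is how I picture the result array working (a hypothetical sketch; names are placeholders and error checking is omitted):

```cuda
// One result slot per block; slots are summed on the host after the kernel.
__global__ void run_instance(double *block_results)
{
    // ... one full instance of the simulation runs per block ...
    double result = 0.0;  // placeholder for the instance's final output

    if (threadIdx.x == 0)
        block_results[blockIdx.x] = result;
}

// Host side:
//   double *d_results;
//   cudaMalloc(&d_results, nBlocks * sizeof(double));
//   run_instance<<<nBlocks, 1>>>(d_results);
//   cudaMemcpy(h_results, d_results, nBlocks * sizeof(double),
//              cudaMemcpyDeviceToHost);
//   ...then loop over h_results[0..nBlocks-1] on the CPU to get the sum.
```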

My approach has been to take the original variables and create dev_ versions that are allocated on the GPU, copying the values from the host variables over to the device. I figured I had to do this inside the kernel so that each thread gets its own instance of every variable, avoiding 400+ threads accessing the same memory locations at once. There are probably 40-50 variables throughout the program. From there I planned to pass the values around through parameters, since there doesn’t seem to be a way to make a variable “global to the thread’s scope” as far as I can tell. The problem seems to be in passing the variables around and keeping the pointers/references straight while moving them.
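To show what I mean by wanting something “global to the thread’s scope”: would grouping the per-thread variables into a struct declared locally in the kernel, and passing a pointer to it into the device functions, be the right idea? A hypothetical sketch (these are not my actual variables):

```cuda
// All per-thread state in one struct; a local instance is private to the
// thread that declares it, so no two threads share memory.
struct SimState {
    float position;
    float velocity;
    int   step;
    // ... the other ~40 variables ...
};

__device__ void advance(SimState *s)
{
    // Reads and writes go through the pointer, so every device function
    // sees the same thread-private copy of the state.
    s->position += s->velocity;
    s->step += 1;
}

__global__ void simulate(float *results)
{
    SimState s = {0.0f, 1.0f, 0};  // lives in registers/local memory

    for (int i = 0; i < 100; ++i)
        advance(&s);

    if (threadIdx.x == 0)
        results[blockIdx.x] = s.position;
}
```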

Basically, I am looking for suggestions or tips on the easiest and best way to go about this. Any and all thoughts are more than welcome, and will help far more than banging my head against the wall.

Thank you for the read, I know it’s long.

Is it possible to use memcpy from device to device as a means of copying data from one function to another within the kernel?
Or is device-to-device copying intended for multi-GPU situations?
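To clarify what I’m asking, here is the kind of thing I have in mind (hypothetical names):

```cuda
// Host side: cudaMemcpyDeviceToDevice copies between two buffers on the
// same GPU (copies between GPUs go through cudaMemcpyPeer instead):
//   cudaMemcpy(d_dst, d_src, n * sizeof(float), cudaMemcpyDeviceToDevice);

// Inside a kernel, is a plain assignment loop the right way to move data
// between buffers that two device functions share?
__global__ void copy_within_kernel(float *dst, const float *src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}
```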
