Ok, this is probably in the documentation somewhere, but I either am not understanding what I’m reading, or I just haven’t found it yet:
Is it possible to get a chunk of memory that is private to a single thread, without having to pass it from device function to device function as an argument? That is, my kernel calls other device functions, and I want the equivalent of global variables accessible to all of them, but with a separate instance for each thread. My impression is that global, constant and shared declarations get set up once per kernel call, and I want my memory carved out once per thread. What I fear I’ll have to do is a cudaMalloc from the host with enough space for all my threads, pass the pointer to the kernel, and make each thread figure out which chunk of the memory belongs to it. That seems like a lot of hassle.
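For reference, here is a minimal sketch of the cudaMalloc workaround described above, not code from my real program. All the names (STATE_LEN, accumulate, pool_kernel, run_pool_demo) are made up for illustration, and it assumes each thread needs a fixed number of doubles. The host allocates one pool, and each thread derives its own slice from its global thread index.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical: number of doubles of private state each thread needs.
constexpr int STATE_LEN = 4;

// Device helper that only sees the calling thread's slice.
__device__ void accumulate(double *my_state, double x)
{
    my_state[0] += x;    // running sum
    my_state[1] += 1.0;  // number of calls
}

__global__ void pool_kernel(double *pool, int n_threads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    // Each thread works out which chunk of the pool belongs to it.
    double *my_state = pool + static_cast<size_t>(tid) * STATE_LEN;
    for (int i = 0; i < STATE_LEN; ++i) my_state[i] = 0.0;

    accumulate(my_state, static_cast<double>(tid));
    accumulate(my_state, 1.0);
}

// Host side: allocate room for every thread, launch, copy the pool back.
// Returns an empty vector if a CUDA call fails.
std::vector<double> run_pool_demo(int n_threads)
{
    std::vector<double> host(static_cast<size_t>(n_threads) * STATE_LEN, -1.0);
    size_t bytes = host.size() * sizeof(double);

    double *d_pool = nullptr;
    if (cudaMalloc(&d_pool, bytes) != cudaSuccess) return {};

    int block = 64;
    int grid = (n_threads + block - 1) / block;
    pool_kernel<<<grid, block>>>(d_pool, n_threads);

    cudaError_t err = cudaMemcpy(host.data(), d_pool, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_pool);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling run_pool_demo(64) should give thread t a sum of t + 1 and a call count of 2 in its own slice. The pointer still has to be handed to each device function, which is exactly the hassle I was hoping to avoid.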
Haha, thanks. I’m still confused though. What I want to do is use static variables, but I got the impression from the programming guide that plain old static declarations don’t work. Last night, though, I whipped up a program that uses a static, alters it from both a device function and the kernel, and it ran in emulation mode. What I’m not sure of is whether this is doing what I want it to: creating an instance of the static variable for each thread, and keeping it private to that thread. What do you guys think?
Ok, it looks like static variables work fine so long as I am in emulation mode, but they don’t seem to work once I turn device emulation off. Does this mean I have to pass them around in a struct or something?
I take that back. In my dummy program, statics work just fine. In my real program though, I’m getting errors like:
/Users/TraxusIV/Documents/Programming/Projects/cuLsoda/cuLsoda.cu(2455): error: identifier “jstart” is undefined
for every one of my static variables. They are defined right at the top of the file, before they are ever used, so I don’t get what’s going on here. Anyone have any ideas?
GAH!!! Ok, I just tried putting all my common variables in a struct and passing it around via a struct pointer, and doncha know, nvcc hates it. I seriously have 50 or more variables (some of which are arrays) that need to be shared between the different functions each thread calls. How do I pass these guys around without having to do a massive rewrite?
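To make the struct idea concrete, here is a rough sketch of what I mean, not my actual code. Only jstart is a real name from my program. Common, h, el, init_common, step, solve_kernel and run_struct_demo are hypothetical. It also assumes a toolkit that accepts pointers to a per-thread struct in device functions, which may not hold for the nvcc version I’m fighting with.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical bundle of the "common" variables; only jstart is a real name.
struct Common {
    int    jstart;
    double h;
    double el[4];  // stands in for the array members
};

__device__ void init_common(Common *c, int tid)
{
    c->jstart = 0;
    c->h = 1.0 + tid;
    for (int i = 0; i < 4; ++i) c->el[i] = 0.0;
}

// Any device function can read and write the calling thread's copy.
__device__ void step(Common *c)
{
    c->jstart += 1;
    c->h *= 0.5;
    c->el[0] += c->h;
}

__global__ void solve_kernel(double *out_h, int *out_jstart, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    Common c;  // declared inside the kernel, so every thread gets its own
    init_common(&c, tid);
    step(&c);
    step(&c);

    out_h[tid] = c.h;
    out_jstart[tid] = c.jstart;
}

// Host side: returns each thread's final h, or an empty vector on failure.
std::vector<double> run_struct_demo(int n, std::vector<int> *jstart_out)
{
    double *d_h = nullptr;
    int *d_j = nullptr;
    if (cudaMalloc(&d_h, n * sizeof(double)) != cudaSuccess) return {};
    if (cudaMalloc(&d_j, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_h);
        return {};
    }

    int block = 64;
    solve_kernel<<<(n + block - 1) / block, block>>>(d_h, d_j, n);

    std::vector<double> h(n);
    jstart_out->assign(n, -1);
    cudaError_t e1 = cudaMemcpy(h.data(), d_h, n * sizeof(double),
                                cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(jstart_out->data(), d_j, n * sizeof(int),
                                cudaMemcpyDeviceToHost);
    cudaFree(d_h);
    cudaFree(d_j);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return {};
    return h;
}
```

With this layout each thread halves its own h twice and bumps its own jstart twice, and no other thread sees those values. The catch is that every device function still needs the extra Common * parameter, so this is a mechanical rewrite rather than no rewrite.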