syoon
1
Thanks to a lot of help from the experts here, I have gotten somewhat comfortable with shared memory in CUDA.
Now I am interested in 'prefetching' and registers.
In the following very simplified outline of my code,
I want to prefetch COEFs0 into COEFs1.
{
    extern __shared__ REAL COEFs0[]; /* size 1000 */
    float COEFs1[100];
    for ( islice = 0; islice < 10; islice += 10 ) {
    }
}
Can I use threadIdx.x to index COEFs1, as in COEFs1[threadIdx.x]?
All the examples I have found use fixed integer indices for the registers, like COEFs1[0], COEFs1[2] …
Any comments are welcome and appreciated. Thanks in advance.
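For concreteness, here is a minimal sketch of the prefetch pattern in question. It rests on several assumptions that are not in my real code: REAL is taken to be float, the kernel name prefetchSketch, the constant PER_THREAD, and the all-ones test data are hypothetical. Since a plain array declared inside a kernel is private to each thread, the sketch uses threadIdx.x to pick the slice of the shared array COEFs0, and indexes COEFs1 only with the counter of a fully unrolled loop. Compile-time indices like these are generally what lets the compiler keep such an array in registers; an index only known at run time typically pushes it to local memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef float REAL;      // assumption: REAL is float
#define PER_THREAD 10    // hypothetical: coefficients prefetched per thread

__global__ void prefetchSketch(const REAL *coefs, REAL *out, int ncoef)
{
    extern __shared__ REAL COEFs0[];   // size is set at launch time

    // Stage the coefficients from global into shared memory.
    for (int i = threadIdx.x; i < ncoef; i += blockDim.x)
        COEFs0[i] = coefs[i];
    __syncthreads();

    // Per-thread private array; every thread has its own copy.
    REAL COEFs1[PER_THREAD];
    int base = threadIdx.x * PER_THREAD;   // threadIdx.x selects the shared slice

    // Fully unrolled, so COEFs1 is only ever indexed by constants.
    #pragma unroll
    for (int k = 0; k < PER_THREAD; ++k)
        COEFs1[k] = (base + k < ncoef) ? COEFs0[base + k] : 0.0f;

    // Use the prefetched values; here they are simply summed.
    REAL sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < PER_THREAD; ++k)
        sum += COEFs1[k];

    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

int main()
{
    const int ncoef = 1000, nthreads = 100;
    REAL h_coefs[ncoef], h_out[nthreads];
    for (int i = 0; i < ncoef; ++i) h_coefs[i] = 1.0f;

    REAL *d_coefs, *d_out;
    cudaMalloc(&d_coefs, ncoef * sizeof(REAL));
    cudaMalloc(&d_out, nthreads * sizeof(REAL));
    cudaMemcpy(d_coefs, h_coefs, ncoef * sizeof(REAL), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory for COEFs0.
    prefetchSketch<<<1, nthreads, ncoef * sizeof(REAL)>>>(d_coefs, d_out, ncoef);
    cudaMemcpy(h_out, d_out, nthreads * sizeof(REAL), cudaMemcpyDeviceToHost);

    // Each thread summed PER_THREAD ones.
    int bad = 0;
    for (int t = 0; t < nthreads; ++t)
        if (h_out[t] != (REAL)PER_THREAD) ++bad;
    printf("%s\n", bad == 0 ? "ok" : "mismatch");

    cudaFree(d_coefs);
    cudaFree(d_out);
    return bad;
}
```

Whether COEFs1 really ends up in registers can be checked by compiling with nvcc -Xptxas -v and looking at the reported register and local memory usage.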