Hi,
Can someone help me understand the concept of shared memory in CUDA programming? I need a very simple code example, say for addition, using shared memory.
Thanks for your time
The transpose example in the CUDA SDK (and the accompanying whitepaper) contains just about everything you need to know.
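Since the question asked specifically for addition, here is a minimal sketch of summing an array with shared memory. It is not taken from the SDK. The names `blockSumKernel`, `gpuSum` and `BLOCK_SIZE` are hypothetical, chosen only for illustration. Shared memory is a small on-chip buffer that is visible to all threads of one block. Each thread copies one element into it. After a `__syncthreads()` barrier, the threads can read each other's values, and the block then adds them pairwise in a tree (a standard parallel reduction). The sketch assumes a power-of-two block size and, for simplicity, adds the per-block partial sums on the host.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // hypothetical threads per block; must be a power of two here

// Each block stages BLOCK_SIZE elements in shared memory, sums them with a
// tree reduction, and writes one partial sum per block to blockSums.
__global__ void blockSumKernel(const int* in, int* blockSums, int n)
{
    __shared__ int cache[BLOCK_SIZE];  // one copy per block, shared by its threads

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Copy from global to shared memory; pad with 0 past the end of the array.
    cache[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // wait until every thread in the block has written its element

    // Tree reduction: the number of active threads halves at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();  // finish this level before any thread starts the next
    }

    if (tid == 0)
        blockSums[blockIdx.x] = cache[0];  // thread 0 holds the block's total
}

// Host helper: sums the vector on the GPU, then adds the per-block partial sums on the CPU.
long long gpuSum(const std::vector<int>& h_in)
{
    int n = (int)h_in.size();
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    int *d_in = nullptr, *d_blockSums = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_blockSums, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSumKernel<<<blocks, BLOCK_SIZE>>>(d_in, d_blockSums, n);

    std::vector<int> h_blockSums(blocks);
    cudaMemcpy(h_blockSums.data(), d_blockSums, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);  // also waits for the kernel to finish
    cudaFree(d_in);
    cudaFree(d_blockSums);

    long long total = 0;
    for (int s : h_blockSums) total += s;
    return total;
}

int main()
{
    std::vector<int> data(1000);
    for (int k = 0; k < 1000; ++k) data[k] = k + 1;  // 1 + 2 + ... + 1000
    printf("sum = %lld\n", gpuSum(data));             // prints "sum = 500500"
    return 0;
}
```

This can be built with something like `nvcc sum.cu -o sum`. Real code should check the `cudaError_t` returned by each CUDA call, and it should reject empty input, since launching a kernel with zero blocks is an error. The key shared-memory ideas are the `__shared__` array and the `__syncthreads()` barriers around every read of another thread's data. The transpose example relies on the same ideas, using a shared tile so that global reads and writes are both coalesced.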