Hello all,
I finished reading CUDA by Example recently, and there are a couple of things I need clarification on.
- One of the examples allocates shared memory for the block inside the kernel:
__global__ void histo_kernel( unsigned char *buffer, long size, unsigned int *histo )
{
    __shared__ unsigned int temp[256];
    .
    some other code
    .
}
Why do we define the shared memory there, inside the kernel? Isn’t the shared memory then being allocated multiple times, once for each thread in a particular block?
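For reference, here is a minimal, self-contained sketch of the pattern in question. It is not the book's exact listing: the kernel body, the launch configuration (32 blocks of 256 threads), and the test data are assumptions made for illustration. As far as the CUDA programming model goes, a `__shared__` variable has the lifetime of the block and is visible to all of its threads, which is what the comments below rely on. Note that atomics on shared memory need compute capability 1.2 or higher.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void histo_kernel(const unsigned char *buffer, long size,
                             unsigned int *histo)
{
    // One 256-entry array per block; all threads of the block share it.
    __shared__ unsigned int temp[256];

    // Assumption: blockDim.x == 256, so each thread clears exactly one bin.
    temp[threadIdx.x] = 0;
    __syncthreads();

    // Grid-stride loop: each thread counts its share of the input bytes
    // into the block-local histogram.
    long i = threadIdx.x + (long)blockIdx.x * blockDim.x;
    long stride = (long)blockDim.x * gridDim.x;
    while (i < size) {
        atomicAdd(&temp[buffer[i]], 1u);
        i += stride;
    }
    __syncthreads();

    // Each thread merges one bin of the block-local result into global memory.
    atomicAdd(&histo[threadIdx.x], temp[threadIdx.x]);
}

int main()
{
    const long size = 1 << 20;  // illustrative input size
    unsigned char *h_buf = (unsigned char *)malloc(size);
    for (long i = 0; i < size; ++i)
        h_buf[i] = (unsigned char)(i % 256);

    unsigned char *d_buf;
    unsigned int *d_histo;
    cudaMalloc((void **)&d_buf, size);
    cudaMalloc((void **)&d_histo, 256 * sizeof(unsigned int));
    cudaMemcpy(d_buf, h_buf, size, cudaMemcpyHostToDevice);
    cudaMemset(d_histo, 0, 256 * sizeof(unsigned int));

    histo_kernel<<<32, 256>>>(d_buf, size, d_histo);

    unsigned int h_histo[256];
    cudaMemcpy(h_histo, d_histo, sizeof(h_histo), cudaMemcpyDeviceToHost);

    // Sanity check: the bins should add up to the number of input bytes.
    long total = 0;
    for (int b = 0; b < 256; ++b)
        total += h_histo[b];
    printf("sum of bins = %ld, input size = %ld\n", total, size);

    cudaFree(d_buf);
    cudaFree(d_histo);
    free(h_buf);
    return 0;
}
```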
- The Heat 2D example from chapter 7 takes a really long time for each render: somewhere between 350 ms and 500 ms, depending on the architecture I’m compiling for (1.0, 1.3, 2.0). Is this time normal for this example?
Okay, this next one isn’t a CUDA by Example question, but I’m trying to use cuRAND to generate some random numbers on the device. However, the numbers are not completely random: I get a lot of repeated values. For example, the output looks something like this: {5642, 12314, 8469, 12314, 8964302, 5642, 96341}
I’m using the same major parts of the code that are outlined in CURAND_Library.pdf.
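To make the setup concrete, below is a minimal sketch of device-side generation with cuRAND. It assumes the device API from `curand_kernel.h`, not the host API; the kernel names, the seed value, and the launch sizes are all hypothetical and chosen only for illustration. Each thread is initialized with the same seed but its own sequence number, which is one common way to give every thread a distinct stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical setup kernel: one generator state per thread.
__global__ void setup_kernel(curandState *state, unsigned long long seed)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    // Same seed for all threads, different sequence number per thread,
    // offset 0 within that sequence.
    curand_init(seed, id, 0, &state[id]);
}

// Hypothetical generation kernel: each thread draws one 32-bit value.
__global__ void generate_kernel(curandState *state, unsigned int *out)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curandState local = state[id];   // work on a local copy of the state
    out[id] = curand(&local);        // next 32-bit pseudorandom value
    state[id] = local;               // save the advanced state for later calls
}

int main()
{
    const int blocks = 4, threads = 64;   // illustrative launch configuration
    const int n = blocks * threads;

    curandState *d_state;
    unsigned int *d_out;
    cudaMalloc((void **)&d_state, n * sizeof(curandState));
    cudaMalloc((void **)&d_out, n * sizeof(unsigned int));

    setup_kernel<<<blocks, threads>>>(d_state, 1234ULL);  // arbitrary seed
    generate_kernel<<<blocks, threads>>>(d_state, d_out);

    unsigned int h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Print the first few values for inspection.
    for (int i = 0; i < 8; ++i)
        printf("%u\n", h_out[i]);

    cudaFree(d_state);
    cudaFree(d_out);
    return 0;
}
```

Building this needs the cuRAND headers that ship with the CUDA toolkit. If the states are initialized with identical seed, sequence, and offset values in every thread, the threads would be expected to produce identical values, so that is one thing worth comparing against.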
Thanks for your time