Hi all, I am working on a GPU programming project, but my program fails with a runtime error.
I want a kernel that runs multiple threads, with one temp array that is shared between these threads.
But the values it returns are never what I need.
Could anyone help me solve it?
__shared__ float* temp;
__global__ void Moller(float* tr, float* triangle, float* ray, float* view, int imageW, int imageH, int nbTri)
{
const int ix = blockDim.x * blockIdx.x + threadIdx.x;
const int iy = blockDim.y * blockIdx.y + threadIdx.y;
- Move `__syncthreads()` out of the conditional. The effect of a `__syncthreads()` call that is not reached by all threads of a block is undefined.
- You have only allocated a pointer in shared memory, not an array. Declare temp as `__shared__ float temp[imageH*imageW]`. If imageH and imageW aren't compile-time constants, declare temp as `extern __shared__ float temp[]` and call your kernel as `Moller<<<…, …, imageH*imageW*sizeof(float)>>>(…)`.
- The way you try to use shared memory to find a minimum across all threads of a block (or even of the whole kernel?) does not work, with or without `__syncthreads()`, because the comparisons are not atomic. You probably want to swap the roles of (ix,iy) and j, so that each thread takes the minimum over its own results, rather than every iteration taking a minimum over the results of all threads.
- You don't need an array in shared memory at all, as only one of its elements is in use at any time. I'm not sure, though, whether this is because you shortened the kernel for presentation in the forum. In any case, for any reasonable image size this array probably is not going to fit into shared memory.
- In the initialization and writeback loops every thread in the block does the same work, which is needless duplication.
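To make the points above more concrete, here is a rough sketch, not your actual kernel. `hitDistance`, `nearestHit`, `blockMin` and the two `run...` host helpers are names I made up for illustration, and `hitDistance` is only a placeholder for the real ray/triangle test. The first kernel keeps each pixel's minimum in a register, so it needs no shared array and no barrier. The second shows what a block-wide minimum could look like if you really need one: the shared array is sized by the third launch parameter, and every `__syncthreads()` sits outside the conditionals. It assumes a power-of-two block size.

```cuda
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder for the ray/triangle test of pixel p against
// triangle j. The real kernel would compute the intersection distance here.
__host__ __device__ inline float hitDistance(int p, int j)
{
    return static_cast<float>((p * 7 + j * 13) % 101);
}

// One thread per pixel. The running minimum lives in a register, so no
// shared array, no atomics and no __syncthreads() are needed.
__global__ void nearestHit(float* tr, int imageW, int imageH, int nbTri)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix >= imageW || iy >= imageH) return;  // safe: no barrier follows

    const int p = iy * imageW + ix;
    float best = FLT_MAX;
    for (int j = 0; j < nbTri; ++j)
        best = fminf(best, hitDistance(p, j));
    tr[p] = best;
}

// Block-wide minimum using dynamically sized shared memory.
// Assumes blockDim.x is a power of two.
__global__ void blockMin(const float* in, float* out, int n)
{
    extern __shared__ float tile[];            // size = 3rd launch parameter
    const int tid = threadIdx.x;
    const int i = blockDim.x * blockIdx.x + tid;

    tile[tid] = (i < n) ? in[i] : FLT_MAX;
    __syncthreads();                           // reached by every thread

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            tile[tid] = fminf(tile[tid], tile[tid + s]);
        __syncthreads();                       // outside the conditional
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Host helper: runs nearestHit and returns one minimum per pixel.
std::vector<float> runNearestHit(int imageW, int imageH, int nbTri)
{
    std::vector<float> host(static_cast<size_t>(imageW) * imageH);
    float* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(float));

    const dim3 block(16, 16);
    const dim3 grid((imageW + block.x - 1) / block.x,
                    (imageH + block.y - 1) / block.y);
    nearestHit<<<grid, block>>>(dev, imageW, imageH, nbTri);

    cudaMemcpy(host.data(), dev, host.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

// Host helper: minimum of a non-empty vector via per-block partial minima.
float runBlockMin(const std::vector<float>& v)
{
    const int n = static_cast<int>(v.size());
    const int threads = 256;
    assert(n > 0 && (threads & (threads - 1)) == 0);
    const int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, v.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    blockMin<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return *std::min_element(partial.begin(), partial.end());
}
```

Note that the shared array in `blockMin` holds one float per thread of the block, not one per pixel of the image, which is what keeps it small enough to fit. Error checking on the CUDA calls is left out to keep the sketch short.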
An error occurs: when I change temp into an array, the compiler returns this error:
Error 13 error : Entry function ‘_Z6MollerPfS_S_S_iii’ uses too much shared data (0x4002c bytes + 0x10 bytes system, 0x4000 max) D:\Project for Kit\rayTracing\CUDACOMPILE
Changing it to extern __shared__ gives this error instead:
Error 6 error : local and shared variables cannot have external linkage D:\Project for Kit\rayTracing\cuda_moller.cu 71