Hi all,

I’ve modified a large CUDA program to use textures for a particular array that is constant, but too large to fit in __constant__ memory.

This array is read more or less randomly (the accesses cannot be coalesced, except by luck), so the texture cache should help.

Anyway, I’ve implemented the following in my program, and now the kernel can only run once per program execution without a hiccup. If I launch the kernel inside a CPU loop (which is the normal scenario), it fails with the message: “Cuda error: kernel invocation: the launch timed out and was terminated.”

Anyone have any ideas why that is? I’ve posted relevant parts of the code below.

Texture declaration (at global scope):

[codebox]texture<float,1,cudaReadModeElementType> big_Emesh_t;

[/codebox]

Assigning the array to texture big_Emesh_t:

[codebox]cudaMalloc((void**)&big_Emesh_d, big_Emesh_size*sizeof(float));
cudaMemcpy(big_Emesh_d, big_Emesh, big_Emesh_size*sizeof(float), cudaMemcpyHostToDevice);
cudaBindTexture(0, big_Emesh_t, big_Emesh_d);
checkCUDAError("texture");[/codebox]

Using tex1Dfetch in a binary search algorithm:

[codebox]__device__ unsigned int textureSearch(unsigned int first, unsigned int last, float key, unsigned int loc)
{
    unsigned int return_val = 0;
    while ((first <= last) && (return_val == 0))
    {
        unsigned int mid = (first + last) / 2; // compute mid point.
        if (key > tex1Dfetch(big_Emesh_t, mid))
            first = mid + 1; // repeat search in top half.
        else if (key < tex1Dfetch(big_Emesh_t, mid))
            last = mid - 1; // repeat search in bottom half.
        else
            return_val = mid; // found it. return position
    }
    if (return_val == 0)
        return_val = last + 1; // failed to find key
    return_val -= big_Emesh_offsets_d[loc];
    return return_val;
}[/codebox]
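
One thing that may be worth ruling out, independent of the texture change: in the search above, first, last and mid are unsigned, and 0 doubles as the “not found yet” value. If mid ever reaches 0, last = mid - 1 wraps around to a very large value, and a match at index 0 leaves return_val at 0, so the loop does not end. This is not necessarily the cause of the timeout. Below is a minimal host-side sketch of an alternative that never subtracts from mid, assuming the mesh is sorted in ascending order with no duplicate values. It is plain C++ so it can be checked on the CPU; fetch and lowerBoundSearch are hypothetical names, and fetch only stands in for tex1Dfetch(big_Emesh_t, i).

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for tex1Dfetch(big_Emesh_t, i).
static float fetch(const std::vector<float>& mesh, unsigned int i)
{
    return mesh[i];
}

// Searches the inclusive range [first, last] of an ascending array.
// Returns the index of key if present, otherwise the index where key
// would be inserted (the same value as "last + 1" after a failed
// search in the original loop).
// Assumption: last + 1 does not overflow an unsigned int.
unsigned int lowerBoundSearch(const std::vector<float>& mesh,
                              unsigned int first, unsigned int last, float key)
{
    assert(first <= last);
    unsigned int lo = first;
    unsigned int hi = last + 1; // exclusive upper bound, so no "mid - 1" is needed
    while (lo < hi)
    {
        unsigned int mid = lo + (hi - lo) / 2; // midpoint without overflowing lo + hi
        if (fetch(mesh, mid) < key)
            lo = mid + 1; // key lies strictly above mid
        else
            hi = mid;     // key is at mid or below it
    }
    return lo;
}
```

On the device, the same loop would read tex1Dfetch(big_Emesh_t, mid) in place of fetch(mesh, mid), and the big_Emesh_offsets_d[loc] correction would be applied to the returned index as before.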

Never mind big_Emesh_offsets_d; it just corrects the result to point to the correct location in a separate array.

*EDIT*: I should note that the result from the one kernel run that does succeed is exactly what it should be and matches my ‘pre-texture’ code.

Thanks all!

Adam