Please help if you can, because I’ve checked again and again and can’t see where I’m going wrong. Please check that I’m allocating the arrays properly and accessing them correctly. The spheres render at random positions, as they should, but there are consistently two massive spheres in the center of the screen, even though the radii are all set to a constant 0.5f.
Anyone? It seems that if I comment out the texture that stores the spheres’ materials, the spheres render correctly, all with the same radius, as they should.
Yes, but for some reason, when I set a breakpoint, i is always 0, so I can’t see the radius values of the other spheres. I doubt this would help anyway, because I already know the radii aren’t all what they should be. I’ve checked the point where the radii are set and copied to texture memory, and that seems OK, which leads me to believe the problem has something to do with tex1Dfetch.
That’s because every thread executes in “emulated parallel,” so the threads take turns executing. Just set threadIdx.x == 0 as the breakpoint condition; that should clear it up.
OK, I’ve debugged it now, and it seems that my sphereData and spheresMaterial textures are interleaved. If I do tex1Dfetch([name of texture], 0), the output is fine. If I do tex1Dfetch([name of texture], 1), the first two elements of the float4 sphereData are actually spheresMaterials values and the other two are correct. If I do tex1Dfetch([name of texture], 2), the last two elements of the float4 sphereData are spheresMaterials values and the first two are correct. When fetching from index 3, the values are correct again. What’s going on?! I’ve tried adding an offset to the spheresMaterial texture, but I’m not sure what the offset does, and it didn’t seem to fix anything. Should I be using the same index to fetch from both textures? Please check the code above again if in doubt.
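The pattern described here (fetch 1 starting with two material values, fetch 2 ending with two, fetch 3 clean again) matches what happens if each sphere is stored as six consecutive floats, four of sphere data and two of material, and a float4 texture is bound over that array: each float4 fetch simply returns the next 16 bytes, whichever sphere they belong to. The host-only sketch below models that. PackedSphere is an assumed layout (the Sphere definition isn't shown here), Vec4 stands in for float4, and fetch_as_vec4 is a hypothetical stand-in for tex1Dfetch on the raw buffer.

```cuda
#include <cstddef>
#include <cstring>

// Stand-in for CUDA's float4 so this also compiles as plain host C++.
struct Vec4 { float x, y, z, w; };

// Assumed layout: 4 floats of sphere data followed by 2 floats of material,
// back to back with no padding, so each sphere occupies 24 bytes.
struct PackedSphere {
    float x, y, z, radius;
    float mat0, mat1;
};

// Models tex1Dfetch(tex, i) on a float4 texture bound over `spheres`:
// fetch i returns bytes [16*i, 16*i + 16) of the buffer, ignoring sphere boundaries.
Vec4 fetch_as_vec4(const PackedSphere* spheres, std::size_t i) {
    Vec4 v;
    std::memcpy(&v, reinterpret_cast<const unsigned char*>(spheres) + i * sizeof(Vec4),
                sizeof(Vec4));
    return v;
}
```

Because 24 bytes is not a multiple of 16, the float4 grid only lines up with a sphere boundary every 48 bytes (two spheres, three fetches), which is why fetches 0 and 3 look right while 1 and 2 mix material values into the sphere data.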
It now seems, when I step through the code and debug, that each group of four floats is separated from the next by two zeros. Here is a picture of the texture memory, where X is useful sphereData and each - is a crappy zero:
XXXX--XXXX--XXXX
Obviously tex1Dfetch picks up these zeros, which messes up the positions of my spheres. So my question is: how do I get rid of those zeros, and why are they appearing?
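Pairs of zeros after groups of useful floats are a typical signature of alignment padding. One likely source, assuming the host struct holds a float4 followed by a float2: CUDA declares float4 with 16-byte alignment, so the struct’s 24 bytes of members are rounded up to 32, and an array of such structs copied to the device carries 8 bytes of padding after every material pair. The host-only sketch below models that; AlignedSphere, store_sphere and fetch_as_vec4 are hypothetical names, and alignas(16) plays the role of float4’s alignment.

```cuda
#include <cstddef>
#include <cstring>

// Stand-in for CUDA's float4 so this also compiles as plain host C++.
struct Vec4 { float x, y, z, w; };

// Assumed layout mirroring a struct that holds a float4 followed by a float2.
// alignas(16) reproduces float4's alignment, so sizeof is rounded up to 32.
struct AlignedSphere {
    alignas(16) float data[4];  // x, y, z, radius
    float mat[2];               // material values
};
static_assert(sizeof(AlignedSphere) == 32, "expected 8 bytes of tail padding");

// Writes sphere k into a zero-filled byte image laid out like AlignedSphere[].
// Padding bytes are never written, so they stay 0.
void store_sphere(unsigned char* image, std::size_t k, const float data[4], const float mat[2]) {
    unsigned char* base = image + k * sizeof(AlignedSphere);
    std::memcpy(base + offsetof(AlignedSphere, data), data, sizeof(AlignedSphere::data));
    std::memcpy(base + offsetof(AlignedSphere, mat), mat, sizeof(AlignedSphere::mat));
}

// Models tex1Dfetch on a float4 texture bound over the image:
// fetch i returns bytes [16*i, 16*i + 16).
Vec4 fetch_as_vec4(const unsigned char* image, std::size_t i) {
    Vec4 v;
    std::memcpy(&v, image + i * sizeof(Vec4), sizeof(Vec4));
    return v;
}
```

With this layout every second float4 fetch returns two material values followed by two zeros. Keeping the two members in separate, tightly packed arrays, each bound to its own texture, avoids the padding entirely.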
Any NVIDIA employees around? I’m beginning to think it’s a bug, as there aren’t many examples on the net (I’ve checked extensively), I have no idea what the offset value in cudaBindTexture is for, and the CUDA programming guide is a bit vague on how to reference textures properly. Someone please help :(
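On the offset question: the first argument of cudaBindTexture is an output. The runtime documentation says that texture base addresses must meet a hardware alignment requirement (reported in cudaDeviceProp::textureAlignment), that the byte offset written to *offset must be divided by the texel size and added to the tex1Dfetch index, and that for a pointer returned by cudaMalloc the offset is guaranteed to be 0, so NULL may be passed. The host-side sketch below only models that arithmetic, assuming the texture base is the bound pointer rounded down to the alignment; BindModel, model_bind and corrected_index are hypothetical names, not CUDA API.

```cuda
#include <cstddef>
#include <cstdint>

// Result of a hypothetical model of texture binding (not a CUDA type).
struct BindModel {
    std::uintptr_t textureBase;  // aligned address the texture is assumed to start at
    std::size_t offsetBytes;     // corresponds to what cudaBindTexture writes to *offset
};

// Assumes the texture base is devPtr rounded down to `alignment` bytes.
BindModel model_bind(std::uintptr_t devPtr, std::size_t alignment) {
    std::uintptr_t base = devPtr - devPtr % alignment;
    return BindModel{base, static_cast<std::size_t>(devPtr - base)};
}

// Index to pass to tex1Dfetch so it returns element `idx` of the array at devPtr:
// the byte offset is converted to whole texels and added.
std::size_t corrected_index(std::size_t idx, std::size_t offsetBytes, std::size_t texelSize) {
    return idx + offsetBytes / texelSize;
}
```

For example, under this model a float4 texture bound 16 bytes past a 256-byte boundary reports an offset of 16, so element 0 of that array is fetched at index 1. An offset is therefore only a correction for misaligned pointers, not a way to separate two textures.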
The amount of code you posted is insufficient to make any accurate statements as to the cause of the problem. Nvidia employees will deal with a problem only if it reveals a bug in the CUDA libraries, but that most likely isn’t the case here.
Without the definition of Sphere, and a kernel, there’s not much I can say about your code.
And the declaration of hNumSpheres would help a bit in trying to understand what you’re trying to do.
Ok, here is all the code you ask for… please work your magic, guys!
int* hNumSpheres;
// Only the signature of setupRaytracer was included in the post; its body is not shown.
void setupRaytracer(int pbo_in, int pbo_out, int width, int height, float cameraEye[3], float rotate[3], int rcMoveX, int rcMoveY, int prev_rcMoveX, int prev_rcMoveY, int move);
OK, I wrote the code the way you said, but I’m still getting pairs of zeros in my output from tex1Dfetch, so the sphere placement is still wrong: sometimes the x and y, or the z and radius, values come back as zero. I was still getting interleaved output because of the two sphere data members, so, as you (or someone else) suggested, I separated the data in memory before copying it to the device. The next problem was that, because my spheresMaterials was float2, it was being separated by pairs of zeros, so I had to do this:
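Separating the data amounts to turning one interleaved array into two tightly packed arrays, one per texture, with each array read through a texture of the same element type: float4 for the sphere data and float2 for the materials, both fetched with the sphere index i. Below is a host-only sketch of the split, assuming each sphere was originally stored as six consecutive floats; PackedSphere and split_spheres are illustrative names, and Vec4/Vec2 stand in for float4/float2.

```cuda
#include <cstddef>
#include <vector>

// Stand-ins for CUDA's float4 / float2 so this also compiles as plain host C++.
struct Vec4 { float x, y, z, w; };
struct Vec2 { float x, y; };

// Assumed interleaved host layout: 4 floats of sphere data, then 2 of material.
struct PackedSphere { float x, y, z, radius, mat0, mat1; };

// Splits the interleaved array into two tightly packed arrays. Element i of
// each output belongs to sphere i, so both can later be fetched with index i.
void split_spheres(const std::vector<PackedSphere>& in,
                   std::vector<Vec4>& data, std::vector<Vec2>& mats) {
    data.clear();
    mats.clear();
    data.reserve(in.size());
    mats.reserve(in.size());
    for (const PackedSphere& s : in) {
        data.push_back({s.x, s.y, s.z, s.radius});
        mats.push_back({s.mat0, s.mat1});
    }
}
```

Each vector would then go to its own device buffer with cudaMemcpy and be bound to its own texture. Reading the 8-byte material elements through a float4 texture (or the 16-byte sphere elements through a float2 one) reintroduces the mixing shown earlier.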
Your code snippets still really don’t give us anything to help you debug. Where is the kernel that reads the data? Maybe the problem is there. Where are the texture definitions? Maybe you are declaring a float texture and trying to read it as a float2?
What we really need is a complete and SIMPLE example that we can compile and run ourselves. Only then can we do anything other than guess at the problem.
What you really need to do is to keep removing stuff from your code until you have the bare minimum simple piece of code that demonstrates the problem. Then the problem will become obvious and you will solve it and won’t actually need our help :) This is standard problem-solving methodology: find the root cause, then fix it. Since you seem unwilling to do so, I will be very nice and give you a simple fully working example that demonstrates the proper use of textures.
First and foremost, there is no problem with textures in CUDA. You can and should be able to do what you want to do; there are just one or more bugs in your code.
Here is the example: it can be compiled with a simple nvcc -o test test.cu so you can test it yourself. It runs without any gaps of 0’s in the output.
#include <stdio.h>
#include <stdlib.h>

// 1D texture reference over float4 texels. Texture references are the legacy
// API and were removed in CUDA 12, so build this with an older toolkit.
texture<float4, 1, cudaReadModeElementType> tex;

// Each thread copies one float4 from the texture to the output array.
__global__ void test_read(float4 *d_out)
{
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    d_out[idx] = tex1Dfetch(tex, idx);
}

int main()
{
    int block_size = 32;
    int n_blocks = 4;
    int N = block_size * n_blocks;  // exactly one thread per element, so no bounds check
    float4 *d_in, *d_out, *h_in, *h_out;

    h_in = (float4*)malloc(sizeof(float4)*N);
    h_out = (float4*)malloc(sizeof(float4)*N);
    cudaMalloc((void**)&d_in, sizeof(float4)*N);
    cudaMalloc((void**)&d_out, sizeof(float4)*N);

    // fill out data: element i holds 4i, 4i+1, 4i+2, 4i+3
    for (int i = 0; i < N; i++)
    {
        h_in[i].x = (float)(i*4);
        h_in[i].y = (float)(i*4+1);
        h_in[i].z = (float)(i*4+2);
        h_in[i].w = (float)(i*4+3);
    }

    cudaMemcpy(d_in, h_in, sizeof(float4)*N, cudaMemcpyHostToDevice);
    // Memory from cudaMalloc is suitably aligned, so the returned offset is
    // guaranteed to be 0 and NULL can be passed; the size limits the binding to N texels.
    cudaBindTexture(NULL, tex, d_in, sizeof(float4)*N);
    test_read<<<n_blocks, block_size>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(float4)*N, cudaMemcpyDeviceToHost);
    cudaUnbindTexture(tex);

    // print out data
    for (int i = 0; i < N; i++)
    {
        printf("%f %f %f %f\n", h_out[i].x, h_out[i].y, h_out[i].z, h_out[i].w);
    }

    free(h_in);
    free(h_out);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
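The same approach extends to a float4 texture and a float2 texture read in one kernel. The sketch below uses the texture object API (cudaTextureObject_t, available from CUDA 5.0 on devices of compute capability 3.0 or higher) instead of the texture reference in the example above, because texture references no longer exist in CUDA 12. The points that matter are the same either way: each texture is created with the element type of the array it reads (float4 or float2), and both are fetched with the same index i. The function names are illustrative.

```cuda
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Reads both textures at the SAME index: element i of each array belongs to sphere i.
__global__ void read_two(cudaTextureObject_t dataTex, cudaTextureObject_t matTex,
                         float4* outData, float2* outMat, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        outData[i] = tex1Dfetch<float4>(dataTex, i);
        outMat[i]  = tex1Dfetch<float2>(matTex, i);
    }
}

// Creates a 1D texture object over linear device memory whose texel type is T.
template <typename T>
cudaTextureObject_t make_linear_tex(T* devPtr, int n)
{
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<T>();  // 4 channels for float4, 2 for float2
    res.res.linear.sizeInBytes = sizeof(T) * n;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, NULL);
    return tex;
}

// Fills a float4 array and a float2 array, reads both back through textures,
// and returns true if every value round-trips unchanged and no CUDA call failed.
bool run_two_texture_check(int n)
{
    std::vector<float4> hData(n), outData(n);
    std::vector<float2> hMat(n), outMat(n);
    for (int i = 0; i < n; i++) {
        hData[i] = make_float4(4.0f * i, 4.0f * i + 1, 4.0f * i + 2, 0.5f);
        hMat[i]  = make_float2(100.0f + i, 200.0f + i);
    }

    float4 *dData = nullptr, *dOutData = nullptr;
    float2 *dMat = nullptr, *dOutMat = nullptr;
    cudaMalloc((void**)&dData, sizeof(float4) * n);
    cudaMalloc((void**)&dOutData, sizeof(float4) * n);
    cudaMalloc((void**)&dMat, sizeof(float2) * n);
    cudaMalloc((void**)&dOutMat, sizeof(float2) * n);
    cudaMemcpy(dData, hData.data(), sizeof(float4) * n, cudaMemcpyHostToDevice);
    cudaMemcpy(dMat, hMat.data(), sizeof(float2) * n, cudaMemcpyHostToDevice);

    cudaTextureObject_t dataTex = make_linear_tex(dData, n);
    cudaTextureObject_t matTex = make_linear_tex(dMat, n);

    int block = 32;
    read_two<<<(n + block - 1) / block, block>>>(dataTex, matTex, dOutData, dOutMat, n);
    cudaMemcpy(outData.data(), dOutData, sizeof(float4) * n, cudaMemcpyDeviceToHost);
    cudaMemcpy(outMat.data(), dOutMat, sizeof(float2) * n, cudaMemcpyDeviceToHost);

    // cudaGetLastError reports the last error from any earlier runtime call or launch.
    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; ok && i < n; i++) {
        ok = outData[i].x == hData[i].x && outData[i].y == hData[i].y &&
             outData[i].z == hData[i].z && outData[i].w == hData[i].w &&
             outMat[i].x == hMat[i].x && outMat[i].y == hMat[i].y;
    }

    cudaDestroyTextureObject(dataTex);
    cudaDestroyTextureObject(matTex);
    cudaFree(dData); cudaFree(dOutData); cudaFree(dMat); cudaFree(dOutMat);
    return ok;
}
```

To run it stand-alone, a main() can simply do return run_two_texture_check(128) ? 0 : 1; and compile with nvcc as before.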
Thanks for your help, guys. Sorry about not posting more code, but my code is complex and spread out over about 16 files; it would take all year for me to paste it here and for you to try to run it.
Mister Anderson, I have fixed the problem with reading from one float4 texture, but reading from a second, float2 texture seems to be my problem at the moment. Any chance you could adapt your example to read from a float4 texture and a float2 texture and see what you get? Like mine:
I’m trying to implement a grid spatial subdivision structure with a variable number of cells and a variable number of objects in each cell. The plan is for this structure to then be copied into texture memory, boo hoo :(. More debugging, I think. I would have used global memory if it were easier, but I don’t want to pass my structure to the hundred-odd functions that use it.
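A grid with a variable number of cells and a variable number of objects per cell is usually flattened before it goes to the GPU. One common layout (the compressed sparse row idea) uses two plain int arrays: per-cell start offsets, built with a prefix sum over the per-cell counts, and one packed list of object indices. Each array could then be bound to its own 1D int texture. The host-only sketch below builds that layout; FlatGrid, build_flat_grid and the one-cell-per-object simplification are illustrative assumptions, not the poster’s code.

```cuda
#include <vector>

// Hypothetical flattened uniform grid. cellStart has numCells + 1 entries, and
// the objects of cell c are objectIds[cellStart[c]] .. objectIds[cellStart[c + 1] - 1].
struct FlatGrid {
    std::vector<int> cellStart;
    std::vector<int> objectIds;
};

// objectCell[o] is the cell index (0 .. numCells-1) of object o. For simplicity each
// object lives in one cell; an object spanning several cells would be listed once per cell.
FlatGrid build_flat_grid(const std::vector<int>& objectCell, int numCells) {
    FlatGrid g;
    g.cellStart.assign(numCells + 1, 0);
    // Count objects per cell, shifted by one slot so the prefix sum yields start offsets.
    for (int c : objectCell) g.cellStart[c + 1]++;
    // Running sum over the counts: cellStart[c] becomes the first slot of cell c.
    for (int c = 0; c < numCells; c++) g.cellStart[c + 1] += g.cellStart[c];
    // Scatter each object index into the next free slot of its cell.
    g.objectIds.resize(objectCell.size());
    std::vector<int> next(g.cellStart.begin(), g.cellStart.end() - 1);
    for (int o = 0; o < (int)objectCell.size(); o++)
        g.objectIds[next[objectCell[o]]++] = o;
    return g;
}
```

On the device, the objects of cell c are then visited by looping k from cellStart[c] up to cellStart[c + 1] - 1 and fetching objectIds[k]. Because both arrays are flat and fixed-size once built, they can be copied and bound the same way as the sphere arrays.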