Hello,
I am attempting to use a 3D texture in this example code (attached, with everything that doesn't pertain to the issue removed). I followed a couple of related forum posts, which were very helpful, but they didn't resolve my issue.
When I run my code, I get the following message every time:
— Error from cudaMemcpy3D - invalid argument —
I spent a significant amount of time working through the solutions in those posts, but I must be missing something small, because I keep getting the same error. Could anyone please help me with this?
Any help would be greatly appreciated!
Thanks,
Matt
myCode.cu (2.77 KB)
I suppose I should be a bit more clear. I am fairly confident that the error comes from the following line:
textParams.dstArray = textIntensityData;
(where textIntensityData is the cudaArray returned by my cudaMalloc3DArray() call)
but I'm not sure why this line would be a problem. It seems to match the format laid out in the Programming Guide and the reference manual.
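For context, here is a stripped-down sketch of how the array itself is allocated (names match my attached code; the channel descriptor is the standard one for float data):
[codebox]cudaArray *textIntensityData = 0;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

// allocate the 3D array that the texture will eventually be bound to
cudaError_t err = cudaMalloc3DArray(&textIntensityData, &channelDesc, textExtent);
if ( err != cudaSuccess )
    fprintf(stderr, "cudaMalloc3DArray: %s\n", cudaGetErrorString(err));[/codebox]
I added the error check here just to rule out a bad allocation.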
Matt
I have narrowed down (I think) the issue to the following line:
textParams.srcPtr = make_cudaPitchedPtr((void *)&hostArray[0], NUM_X * sizeof(float), NUM_X, NUM_Y);
(Note: after reading some other posts with similar issues, I changed arg0 from hostArray to &hostArray[0]; that didn't change the error, though.)
Specifically, I believe arg1 (NUM_X * sizeof(float)) is the problem, though I'm not sure why. According to the Memory Management section of the reference manual (http://developer.download.nvidia.com/compute/cuda/3_0-Beta1/toolkit/docs/online/group__CUDART__MEMORY_g47a7d89a9b1361212ac4ac3998670e0d.html#g47a7d89a9b1361212ac4ac3998670e0d), arg1 needs to be the "Pitch of allocated memory in bytes."
My understanding of pitch is that it is essentially the row width in bytes as the device sees it (the allocated width, including any padding), which is why I thought NUM_X multiplied by sizeof(float) was right. Is that an incorrect assumption?
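For reference, here is what I understand make_cudaPitchedPtr to be doing under the hood (it just fills in a cudaPitchedPtr struct; the field names are from the runtime headers):
[codebox]cudaPitchedPtr srcPtr;
srcPtr.ptr   = (void *)&hostArray[0]; // start of the source data
srcPtr.pitch = NUM_X * sizeof(float); // width of one row in bytes (host rows are tightly packed)
srcPtr.xsize = NUM_X;                 // logical width in elements
srcPtr.ysize = NUM_Y;                 // logical height in rows[/codebox]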
It is also possible that there’s something else wrong that I’m just overlooking. Has anyone found anything else that I might be doing wrong?
Thanks,
Matt
So I've done some more digging, using two more sources. The information in them led me to change this line:
[codebox]textExtent = make_cudaExtent(NUM_X * sizeof(float), NUM_Y, NUM_Z);[/codebox]
to this:
[codebox]textExtent = make_cudaExtent(NUM_X, NUM_Y, NUM_Z);[/codebox]
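With that one change, the whole transfer setup now looks like this (a minimal sketch of the relevant part of my attached code):
[codebox]cudaMemcpy3DParms textParams = {0};
textParams.srcPtr   = make_cudaPitchedPtr((void *)&hostArray[0],
                                          NUM_X * sizeof(float), NUM_X, NUM_Y);
textParams.dstArray = textIntensityData;
textParams.extent   = make_cudaExtent(NUM_X, NUM_Y, NUM_Z); // in elements, since the
                                                            // destination is a cudaArray
textParams.kind     = cudaMemcpyHostToDevice;

cudaError_t err = cudaMemcpy3D(&textParams);
if ( err != cudaSuccess )
    fprintf(stderr, "cudaMemcpy3D: %s\n", cudaGetErrorString(err));[/codebox]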
I realize that numerous forum posts claim you need to do it the other way, but that was the only line I changed, and it got me past the earlier error. (My reading of the reference manual is that when a cudaArray participates in the copy, the extent is specified in elements of the array rather than in bytes, which would explain the fix.) However, I now hit a new error when I return from my kernel and call cudaThreadSynchronize():
— Error from ThreadSynchronize - unspecified launch failure —
I know there are a few forum topics about this, so I’ll try to give those a look. In the meantime, does anyone have an idea if this is caused by the kernel or if it’s caused by the commands I presented here?
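For reference, the check that produces that message is just this (sketch):
[codebox]cudaError_t err = cudaThreadSynchronize(); // blocks until the kernel finishes
if ( err != cudaSuccess )
{
    fprintf(stderr, "\n--- Error from ThreadSynchronize - %s ---\n", cudaGetErrorString(err));
    exit(1);
}[/codebox]
Since kernel launches are asynchronous, I gather this is the first point where a failure inside the kernel can actually surface, which is why the error shows up here rather than at the launch itself.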
Thanks,
Matt
After reading the other threads about the error I'm now encountering, I checked my array accesses to make sure nothing was out of bounds. It turns out something was, but after fixing that, I now get this error:
— Error from Kernel - no error —
I check the error status immediately after the kernel call. One thing I tripped over: the result has to be stored before printing, because cudaGetLastError() clears the error status. My first version called it a second time inside the fprintf, which is where the nonsensical "no error" text came from.
[codebox]cudaError_t err = cudaGetLastError(); // store once; a second call would
                                      // reset the status to cudaSuccess
if ( err != cudaSuccess )
{
    fprintf(stderr, "\n--- Error from Kernel - %s ---\n", cudaGetErrorString(err));
    exit(1);
}[/codebox]
I was able to isolate the issue to the single line where I read my 3D texture:
float val = tex3D(textRef, 0, 0, 0);
(I replaced the actual lookup coordinates with zeros to see whether they were the problem; the error still occurs with all zeros, so apparently they weren't.)
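In case the binding matters here, this is roughly how the texture reference is declared (at file scope) and bound; it's a sketch, and my exact filter and address settings are in the attached file:
[codebox]texture<float, 3, cudaReadModeElementType> textRef; // file-scope declaration

// host side, after the cudaMemcpy3D:
textRef.normalized = false;               // unnormalized coordinates
textRef.filterMode = cudaFilterModePoint; // no interpolation
cudaBindTextureToArray(textRef, textIntensityData, channelDesc);[/codebox]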
Any ideas on what could be causing this now?
Thanks,
Matt
EDIT: The "no error" text only showed up in emulation mode; in normal mode it still reports the ThreadSynchronize error described above.
I realized I should probably give my system details: CUDA 2.2, Programming Guide 2.2.1, and a Quadro NVS 295 (compute capability 1.1).
I was able to solve this by removing the header file I had for my kernel code and integrating all of the kernel code into my main .cu file. I feel like I shouldn't have to do this, though. Does anyone know why nvcc wasn't picking up my header file (which is where my texture reference was declared)?
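My best guess (unconfirmed) is that texture references effectively have file scope, so the copy of textRef that the kernel's translation unit picked up from the header was not the same object that my host code bound. Keeping everything in one file sidesteps that, roughly like this (myKernel is a placeholder name):
[codebox]// myCode.cu -- texture reference and kernel now live in the same file
texture<float, 3, cudaReadModeElementType> textRef;

__global__ void myKernel(float *out)
{
    out[0] = tex3D(textRef, 0.0f, 0.0f, 0.0f); // reads the bound 3D array
}

int main()
{
    /* allocate, copy, and bind as in the earlier snippets, then: */
    float *devOut;
    cudaMalloc((void **)&devOut, sizeof(float));
    myKernel<<<1, 1>>>(devOut);
    cudaThreadSynchronize();
    return 0;
}[/codebox]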
Thanks,
Matt