Hi everybody.
I have a program that runs correctly on a GeForce 8800 GT and a GeForce 8800 GTS.
My kernel runs in 400 microseconds.
But when I use a Tesla C870, my program does not run.
It hangs long enough for a timeout to occur.
I don't know why. Is something different on the Tesla C870?
I have had other programs that worked OK on the Tesla C870.
I checked, and the SDK programs run perfectly.
Thanks.
I solved my problem.
My kernel function used 3 __syncthreads() calls. It works perfectly on the GeForce 8800 GT/GTS but does not work on the Tesla C870.
So I split my kernel into 3 smaller kernels, each using one __syncthreads().
Now it works perfectly on both the GeForce 8800 GT/GTS and the Tesla.
I really don't know why.
Hi,
Has anyone ever used 2 __syncthreads() calls inside one kernel on a Tesla C870?
I cannot, and I don't know why.
I checked my program carefully and found nothing wrong.
My program works well on the GeForce 8800 GT and 8800 GTS.
I made a mistake. :D
Using 2 __syncthreads() calls inside a kernel on the Tesla C870 is not the problem I thought it was.
Here is a simple example.
#define ELEMENTS 5

// Host side: launch 1 block of 8 threads.
AddArray<<<1, 8>>>(array1, array2);

__global__ void AddArray(int *array1, int *array2)
{
    __shared__ int sArray[ELEMENTS];
    if (threadIdx.x < ELEMENTS) {
        sArray[threadIdx.x] = array1[threadIdx.x];
        __syncthreads();  // barrier inside the if (): only 5 of the 8 threads reach it
        // do some things
    }
}
This program does not run on the Tesla C870, but it runs on the GeForce 8800 GT/GTS.
Pay attention to the if () clause:
__syncthreads() is called inside it.
The block has 8 threads but only 5 are used, so the other 3 threads skip the if () body and do nothing.
__syncthreads() makes each thread wait until all threads in the block have reached it.
My guess is that on the Tesla this never happens, so the program waits long enough for a timeout to occur.
Now I wonder what the difference is between the Tesla and GeForce hardware.
I would be very happy if someone could explain it to me.
Thanks in advance.
I don't know why nobody else cares about this problem.
I created a test program for this experiment.
All my findings are written in report.txt.
Thank you.
:)
TeslaTest.zip (15.7 KB)
I think the problem may be that you are putting __syncthreads() inside a conditional. Try moving it out of the if () block.
Thank you for sharing, we can all learn from your post.
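To make the suggestion concrete, here is a minimal sketch of the AddArray example with the barrier moved out of the if () block. The CUDA programming guide allows __syncthreads() in conditional code only when the condition evaluates identically for every thread in the block, which is not the case for threadIdx.x < ELEMENTS with 8 threads. The kernel name AddArraySafe, the out parameter, the neighbour read, and the host helper runAddArraySafe are assumptions for illustration; the original post does not say what the elided "do some things" step computes.

```cuda
#include <cuda_runtime.h>

#define ELEMENTS 5   // threads that have real work
#define THREADS  8   // threads launched per block

// Assumed variant of AddArray: the barrier sits outside the conditional.
__global__ void AddArraySafe(const int *array1, const int *array2, int *out)
{
    __shared__ int sArray[ELEMENTS];

    // Only the first ELEMENTS threads write to shared memory.
    if (threadIdx.x < ELEMENTS) {
        sArray[threadIdx.x] = array1[threadIdx.x];
    }

    // Every thread in the block reaches this barrier, including the idle ones.
    __syncthreads();

    // Placeholder for "do some things": read a neighbour's slot so the
    // barrier is actually needed, then add the second array.
    if (threadIdx.x < ELEMENTS) {
        out[threadIdx.x] = sArray[(threadIdx.x + 1) % ELEMENTS] + array2[threadIdx.x];
    }
}

// Hypothetical host helper: returns 0 if the device result matches, 1 otherwise.
int runAddArraySafe()
{
    const int h1[ELEMENTS] = {1, 2, 3, 4, 5};
    const int h2[ELEMENTS] = {10, 20, 30, 40, 50};
    int hOut[ELEMENTS] = {0};
    const size_t bytes = ELEMENTS * sizeof(int);

    int *d1 = nullptr, *d2 = nullptr, *dOut = nullptr;
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(d1, h1, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2, bytes, cudaMemcpyHostToDevice);

    AddArraySafe<<<1, THREADS>>>(d1, d2, dOut);

    // The blocking copy also surfaces errors from the kernel launch.
    cudaError_t err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dOut);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < ELEMENTS; ++i) {
        if (hOut[i] != h1[(i + 1) % ELEMENTS] + h2[i]) return 1;
    }
    return 0;
}
```

Calling runAddArraySafe() from main and checking for a return value of 0 is enough to try it on either card. The work is split into two guarded regions around one unconditional barrier, so the 3 idle threads still arrive at __syncthreads().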
Yes, you are correct.
I did that and it works OK.
My question is why the same program works on the GeForce 8800 GT/GTS but does not work on the Tesla C870.
Thank you. :)
Hi,
For me, even a single __syncthreads() does not work, and that is outside of any conditional. The C870 is very weird!
-Oj
Oh,
My program works if I use __syncthreads() outside the condition.
This problem took me a lot of time.
On my computer, when __syncthreads() is inside a condition, the machine gets into a bad state and cannot run even the NVIDIA SDK sample programs,
so I must restart my computer.
What happened when you used __syncthreads()? Did your computer hang, or did you have the same problem as me?
:)
No, only my program waits for quite a long time, and I can kill it. My kernel looks like this:
__global__ void kernel(args)
{
    if (condition) return;  // return threads that don't have any work to do
    // some code
    __syncthreads();        // here my program keeps waiting
    output = something;
}
-Oj
Can you change your code like this?
__global__ void kernel(args)
{
    if (useThreads < needThreads) {
        // code for active threads
    }
    // some code
    __syncthreads();  // no thread has returned early, so all threads reach this barrier
    output = something;
}
I did this, and it ran correctly on my computer.
:)
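As a rough sketch of that pattern, the kernel below replaces the early return with an "active" flag so that no thread leaves before the barrier. Whether threads that have already returned still count toward __syncthreads() is not something the posts above settle, so this version simply avoids relying on it. The names sumFirstN, BLOCK, and runSumFirstN, and the choice of a block sum as the work, are assumptions for illustration only.

```cuda
#include <cuda_runtime.h>

#define BLOCK 8   // threads per block; a power of two for the reduction

// Sums the first n values of `in` (n <= BLOCK) into *out.
// Threads without input stay alive and contribute 0 instead of returning.
__global__ void sumFirstN(const int *in, int n, int *out)
{
    __shared__ int partial[BLOCK];

    const bool active = threadIdx.x < n;   // replaces "if (condition) return;"
    partial[threadIdx.x] = active ? in[threadIdx.x] : 0;
    __syncthreads();                       // reached by all BLOCK threads

    // Tree reduction. The loop bounds are the same for every thread,
    // so the barrier inside the loop is also reached by all of them.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        *out = partial[0];
    }
}

// Hypothetical host helper: returns the device sum, or -1 on a CUDA error.
int runSumFirstN()
{
    const int n = 5;
    const int hIn[n] = {1, 2, 3, 4, 5};
    int hOut = 0;

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice);

    sumFirstN<<<1, BLOCK>>>(dIn, n, dOut);

    // The blocking copy also surfaces errors from the kernel launch.
    cudaError_t err = cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return (err == cudaSuccess) ? hOut : -1;
}
```

With the inputs above, runSumFirstN() should return 15. The guard moves from control flow (return) to data (the value each thread contributes), which keeps every __syncthreads() on a path that all threads in the block take.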