Problem with Tesla C870: program cannot run

Hi everybody.
I have a program that runs correctly on a GeForce 8800GT and a GeForce 8800GTS.
My kernel runs in 400 microseconds.
But when I use a Tesla C870, the program does not run.
It hangs long enough for a timeout to occur.
I don't know why. Is something different about the Tesla C870?
I have had other programs that worked OK on the Tesla C870.
I checked, and the SDK programs run perfectly.

I solved my problem.
My kernel function used 3 __syncthreads() calls. It works perfectly on the GeForce 8800GT/GTS but does not work on the Tesla C870.
So I split the kernel into 3 smaller kernels, each using one __syncthreads().
Now it works perfectly on both the GeForce 8800GT/GTS and the Tesla.
I really don't know why.

Has anyone ever used 2 __syncthreads() calls inside one kernel on a Tesla C870?
I cannot, and I don't know why.
I checked my program carefully and found nothing wrong.
My program works well on the GeForce 8800GT and 8800GTS.

I made a mistake. :D

Using 2 __syncthreads() calls inside a kernel on the Tesla C870 is not the problem, as I thought before.

A simple example:

#define ELEMENTS 5

AddArray<<<1, 8>>>(array1, array2);

__global__ void AddArray(int *array1, int *array2)
{
    __shared__ int sArray[ELEMENTS];

    if (threadIdx.x < ELEMENTS) {
        sArray[threadIdx.x] = array1[threadIdx.x];
        __syncthreads();   // only 5 of the 8 threads reach this barrier
        //do some things-----------
    }
}

This program does not run on the Tesla C870, but it runs on the GeForce 8800GT/GTS.

Pay attention to the “if ()” clause.

__syncthreads() is called inside the if () clause.

The block has 8 threads in total but only 5 are used, so the other 3 threads skip the if () body and do nothing.

__syncthreads() makes the program wait until all threads in the block have reached it.

On the Tesla that never happens (my guess), so the program waits long enough for a timeout to occur.

Now I wonder: what is the hardware difference between the Tesla and the GeForce?

I would be very happy if someone who knows could explain it to me.

Thanks in advance.

I don’t know why nobody else cares about this problem.
I have created a test program for this experiment,
and all my findings are written up in report.txt, in the attachment (15.7 KB).
Thank you. :)

I think the problem may be that you are putting __syncthreads() inside a conditional. Try moving it out of the if () block.
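
To make the suggestion concrete, here is a minimal sketch of the example with the barrier moved out of the if () block. The CUDA programming guide allows __syncthreads() in conditional code only when the condition evaluates the same way for every thread in the block; otherwise the result is undefined, so it can appear to work on one card and hang on another. The original "do some things" body is not shown, so the neighbour add below is only an assumed stand-in, and the names `active` and `runAddArray` are hypothetical. cudaDeviceSynchronize() comes from newer toolkits than the one in this thread.

```cuda
#include <cuda_runtime.h>

#define ELEMENTS 5   // elements actually processed, as in the example
#define THREADS  8   // threads per block, as in the <<<1, 8>>> launch

__global__ void AddArray(const int *array1, int *array2)
{
    __shared__ int sArray[ELEMENTS];
    const bool active = threadIdx.x < ELEMENTS;   // hypothetical helper flag

    if (active) {
        sArray[threadIdx.x] = array1[threadIdx.x];
    }

    // The barrier sits outside the conditional, so all 8 threads reach it.
    __syncthreads();

    if (active) {
        // Assumed stand-in for "do some things": add the right-hand
        // neighbour's value, which another thread wrote to shared memory.
        array2[threadIdx.x] += sArray[(threadIdx.x + 1) % ELEMENTS];
    }
}

// Hypothetical host helper: runs the kernel once and returns true when the
// device result matches a CPU reference and no CUDA error was reported.
bool runAddArray()
{
    int h1[ELEMENTS] = {1, 2, 3, 4, 5};
    int h2[ELEMENTS] = {10, 20, 30, 40, 50};
    int expected[ELEMENTS];
    for (int i = 0; i < ELEMENTS; ++i) {
        expected[i] = h2[i] + h1[(i + 1) % ELEMENTS];
    }

    int *d1 = nullptr;
    int *d2 = nullptr;
    if (cudaMalloc(&d1, sizeof(h1)) != cudaSuccess) return false;
    if (cudaMalloc(&d2, sizeof(h2)) != cudaSuccess) { cudaFree(d1); return false; }

    bool ok = cudaMemcpy(d1, h1, sizeof(h1), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d2, h2, sizeof(h2), cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        AddArray<<<1, THREADS>>>(d1, d2);
        // Waits for the kernel and reports launch or execution errors.
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(h2, d2, sizeof(h2), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d1);
    cudaFree(d2);

    for (int i = 0; ok && i < ELEMENTS; ++i) {
        ok = (h2[i] == expected[i]);
    }
    return ok;
}
```

Calling runAddArray() from main() and printing the result is enough to try it. The work is still guarded by the same condition as before; only the barrier has moved.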

Thank you for sharing, we can all learn from your post.

Yes, you are correct.

I did that and it works OK.

My question is why the same program works on the GeForce 8800GT/GTS but does not work on the Tesla C870.

Thank you. :)


For me, I cannot use even a single __syncthreads(), and that is outside of any conditional. The C870 is very weird!



My program works if I use __syncthreads() outside of the condition.

This problem took me a lot of time.

On my computer, using __syncthreads() inside a condition leaves the machine in a bad state, and it cannot run even NVIDIA's SDK sample programs.

So I have to restart my computer.

What happened when you used __syncthreads()? Did your computer hang, or was it the same as my problem?


No, my program just waits for quite a long time, and I can kill it. My kernel is like this:

void kernel(args)
{
  if(condition) return; // threads that don't have any work to do return early

  //some code

  __syncthreads();//here my program keeps waiting

  output = something;
}



Can you change the code like this?

void kernel(args)
{
  if (useThreads < needThreads) {
    //code for active threads
  }

  //some code

  __syncthreads();//no thread returns early, so every thread reaches this

  output = something;
}


I did it, and my computer ran correctly.
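
For reference, the same idea (keep every thread alive until the barrier instead of returning early) can be sketched as a complete kernel. This is only an illustration under assumptions: the kernel in the post above is not shown in full, so the block sum here is an assumed workload, and `blockSum`, `hasWork`, `BLOCK`, and `runBlockSum` are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define BLOCK 8   // threads per block (assumed; must be a power of two here)

// Sums the first n elements of `in` (n <= BLOCK) within a single block.
__global__ void blockSum(const int *in, int n, int *out)
{
    __shared__ int partial[BLOCK];

    // Instead of "if (threadIdx.x >= n) return;", threads without work
    // stay in the kernel and contribute a neutral value.
    const bool hasWork = threadIdx.x < n;
    partial[threadIdx.x] = hasWork ? in[threadIdx.x] : 0;
    __syncthreads();   // reached by all BLOCK threads

    // Tree reduction: the work is conditional, the barrier is not.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        *out = partial[0];
    }
}

// Hypothetical host helper: returns the sum computed on the device,
// or -1 if any CUDA call fails.
int runBlockSum()
{
    const int n = 5;
    int hIn[n] = {1, 2, 3, 4, 5};
    int hOut = -1;

    int *dIn = nullptr;
    int *dOut = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess) { cudaFree(dIn); return -1; }

    bool ok = cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<1, BLOCK>>>(dIn, n, dOut);
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return ok ? hOut : -1;
}
```

With the input 1..5, runBlockSum() should return 15. The three threads with no input still take part in every barrier, which is the point of the change.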