Bad performance using VS 2010 + CUDA 4.0

Hi,

I am trying to move our code from Visual Studio 2008 + CUDA 3.0 to Visual Studio 2010 + CUDA 4.0.
Unfortunately, after building the code in the new environment it runs very slowly.
I reduced our code to the sample below.
When built in the old environment (VS 2008 + CUDA 3.0) it completes in about 125 milliseconds.
When built with VS 2010 + CUDA 4.0 it completes in about 2500 milliseconds (20 times slower!), and most of the time the display stops responding while it runs.

Could someone help me and tell me what I did wrong?

Thanks,

Israel

#include <stdio.h>

struct DataStruct
{
    float  m[6];
    float3 data1;
    float3 data2;
};

#define MAX_X_Y 700
#define NUM_DATA_OBJECTS 10000

__global__ void Kernel( DataStruct* data
                      , int*        indices
                      )
{
    // Every thread walks the whole index array and loads the referenced object.
    for (int i = 0; i < NUM_DATA_OBJECTS; ++i)
    {
        int tiptr_i = indices[i];
        if ( tiptr_i == 0 )
        {
            DataStruct tri = data[tiptr_i]; // the loaded struct is not used further in this reduced sample
        }
    }
}

inline int iDivUp(int a, int b)
{
    return (a + b - 1) / b;
}

#define NUM_THREADS_X (8)
#define NUM_THREADS_Y (8)
void RunKernel(void)
{
    DataStruct* m_data    = NULL;
    int*        m_indices = NULL;

    cudaMalloc( (void**)&m_data,    NUM_DATA_OBJECTS*sizeof(DataStruct) );
    cudaMalloc( (void**)&m_indices, NUM_DATA_OBJECTS*sizeof(int) );

    cudaMemset( (void*)m_data,    0, NUM_DATA_OBJECTS*sizeof(DataStruct) );
    cudaMemset( (void*)m_indices, 0, NUM_DATA_OBJECTS*sizeof(int) );

    dim3 dimBlock(NUM_THREADS_X, NUM_THREADS_Y);
    dim3 dimGrid( iDivUp(MAX_X_Y, NUM_THREADS_X), iDivUp(MAX_X_Y, NUM_THREADS_Y) );

    cudaGetLastError(); // reset the error code
    Kernel<<<dimGrid, dimBlock>>>(m_data, m_indices);

    cudaError err = cudaThreadSynchronize();
    if ( err != cudaSuccess )
    {
        const char *errstr = cudaGetErrorString(err);
        printf("Cuda error: (%d) %s.\n", err, errstr);
    }

    // release the device allocations
    cudaFree(m_data);
    cudaFree(m_indices);
}
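
The timing code is not part of the reduced sample. For anyone who wants to reproduce the 125 ms / 2500 ms numbers, a minimal cudaEvent-based measurement around the launch (just one way to do it, not necessarily exactly what we use) would look roughly like this:

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    Kernel<<<dimGrid, dimBlock>>>(m_data, m_indices);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float elapsedMs = 0.0f;
    cudaEventElapsedTime(&elapsedMs, start, stop); // milliseconds between the two events
    printf("Kernel time: %.1f ms\n", elapsedMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);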

Hello Israel,
I have a similar problem to yours.

In my case, when I use VS2008 + CUDA 4.0 + cuda_build_rule 4.0, my code runs badly.
However, if I use cuda_build_rule 3.0 in the same environment, the code shows its original performance.

Have you tried using the CUDA 3.0 build rule?

Hi,

  1. With CUDA 3.0 my code worked OK.

  2. Finally I found the problem - I turned off the option “Generate GPU Debug Information”, and now the CUDA 4.0 code runs as fast as the original code (see the note below).
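
As far as I can tell (an assumption on my part, I did not dig into the build rule file itself), that IDE option just adds nvcc's -G (--device-debug) switch, which turns off most device-code optimizations. Roughly:

    nvcc -G -c kernel.cu    (device debug information: very slow device code)
    nvcc -c kernel.cu       (no -G: optimized device code, original speed)

(kernel.cu here is only a placeholder file name.)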

Israel