global problem

Ultraman · October 23, 2008, 9:31am

This code:
__global__ void test(float *gpu)
{
    for (int i = 0; i < 1000; i++)
    {
        gpu[i] = i; // !!! this costs a lot of time
    }
}
main.cu

float *gpu;
// cudaMalloc returns an error code, not the pointer, so do not assign its result to gpu
cudaMalloc((void **)&gpu, 1000 * sizeof(float));
dim3 grids(1);
dim3 threads(100);

test<<<grids, threads>>>(gpu);

This code takes a long time, more than 100 ms.

What's wrong?
Is writing to global memory slower than writing to CPU memory?

I use this code for a genetic algorithm, so the code must be written like this.

Ailleur · October 23, 2008, 12:58pm

This kernel makes no sense.

You're having every thread write every memory location with the same value,
effectively writing the value 0 to gpu[0] BLOCKDIM*GRIDDIM times.

Read up on CUDA some more; you have not grasped how it has to be used.
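
For illustration, here is a minimal sketch of the usual alternative, in which each thread writes only the one element that matches its own index. The kernel name fill, the helper fill_on_gpu, and the extra parameter n are hypothetical names chosen for this example and are not from the original post. Whether this layout suits the poster's genetic algorithm is not something the thread establishes.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread writes exactly one element, selected by its global index.
__global__ void fill(float *gpu, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the last block may have spare threads
        gpu[i] = i;
}

// Hypothetical host helper: fills n floats on the device and copies them back.
// Returns an empty vector if n is not positive or a CUDA call fails.
std::vector<float> fill_on_gpu(int n)
{
    if (n <= 0)
        return {};

    float *gpu = nullptr;
    if (cudaMalloc((void **)&gpu, n * sizeof(float)) != cudaSuccess)
        return {};

    int threads = 100;                          // block size used in the post
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    fill<<<blocks, threads>>>(gpu, n);

    // cudaMemcpy waits for the kernel on the default stream to finish.
    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), gpu, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(gpu);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

Calling fill_on_gpu(1000) launches 10 blocks of 100 threads, so each of the 1000 elements is written once, instead of 100 threads each looping over all 1000 elements.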
