max thread per block and memory device question

guan · January 8, 2009, 8:45pm

Hi everybody,

I have two questions about improving my code.

First question:

I'm taking my first steps in CUDA and, as a test, I want to launch a kernel using as many threads as my graphics card (9600GT) allows.
To get the number of threads, I use the runtime API in the host code:


cudaDeviceProp Dispositivo;  // defined as a global variable
....

cudaSetDevice(i);

// get the device properties
CUDA_SAFE_CALL(cudaGetDeviceProperties(&Dispositivo, i));

So the field Dispositivo.maxThreadsPerBlock holds the number of threads I can use per block.

From the device information tool in the CUDA SDK, I know that this number is 512.

If I launch the kernel this way, the code doesn't work:

   CUDASearch<<<1,Dispositivo.maxThreadsPerBlock,0>>>(StrCuda,lencad,CudaD,LastCaracter);

If I fix the number of threads per block at 128, everything works fine.

What am I doing wrong?

Second Question:

When I allocate memory on the device with cudaMalloc, where is it stored: in global or shared memory?

Thanks in advance,

GUAN

Jey · January 9, 2009, 12:11am

First, maxThreadsPerBlock is just an upper limit: the maximum total number of threads a block may contain (the per-dimension limits are in maxThreadsDim).

On this device, a block can hold at most 512 threads.

Second, memory allocated with cudaMalloc is in global memory.
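To illustrate the distinction, here is a minimal sketch (not code from this thread): the pointer returned by cudaMalloc refers to global memory, while shared memory is declared inside the kernel with `__shared__`. The names `scaleKernel` and `runScale`, and the block size of 128, are assumptions made up for the example.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel. `globalData` points to global memory obtained from
// cudaMalloc on the host; `tile` lives in on-chip shared memory and is
// visible only to the threads of one block.
__global__ void scaleKernel(float *globalData, int n)
{
    __shared__ float tile[128];              // assumes blockDim.x <= 128
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = globalData[i];   // global -> shared
        globalData[i] = 2.0f * tile[threadIdx.x];  // shared -> global
    }
}

// Hypothetical host helper: doubles n floats in place on the device.
cudaError_t runScale(float *hostData, int n)
{
    if (n <= 0) return cudaSuccess;          // nothing to do

    float *devData = NULL;                   // will point to global memory
    cudaError_t err = cudaMalloc((void **)&devData, n * sizeof(float));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(devData, hostData, n * sizeof(float),
                     cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;   // round up
        scaleKernel<<<blocks, threads>>>(devData, n);
        err = cudaGetLastError();            // launch-time errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostData, devData, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    cudaFree(devData);
    return err;
}
```

Each thread touches only its own slot of `tile`, so no `__syncthreads()` is needed in this sketch; a kernel in which threads read each other's shared data would need one.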

More detailed answers are in the CUDA Programming Guide.

Regards,

Jey.

guan · January 9, 2009, 6:59am

Yes, I know, and I also know that using only 1 block in the grid is not a good approach, but I'm just running a few tests to learn how this works.

The CUDA Programming Guide says that CUDA can run at most 512 threads per block, and my device reports the same limit, so

I don't understand why a call like the following does not work properly. Maybe I'm forgetting to take some condition into account:

KernellCall<<<1,512>>>(argument);
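One way to narrow this down is to ask the runtime why the launch fails. The sketch below is not from the thread: `dummyKernel`, `roughRegisterBound`, and `tryLaunch` are hypothetical names, and it assumes a toolkit recent enough to provide cudaDeviceSynchronize. It checks cudaGetLastError right after the launch and prints the per-kernel limits reported by cudaFuncGetAttributes. A common cause of this symptom, though not confirmed here, is register usage: all threads of a block must fit in one multiprocessor's register file, so a kernel that needs many registers per thread cannot reach the device-wide maxThreadsPerBlock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the body of the poster's CUDASearch
// is not shown in the thread.
__global__ void dummyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Rough upper bound on block size imposed by registers alone.
// Real allocation is rounded to hardware granularity, so this is
// only an estimate; attr.maxThreadsPerBlock below is authoritative.
int roughRegisterBound(int regsPerBlock, int regsPerThread)
{
    if (regsPerThread <= 0) return 0;
    return regsPerBlock / regsPerThread;
}

// Launches dummyKernel with one block of `threads` threads.
// Returns true on success, prints the CUDA error otherwise.
bool tryLaunch(int threads)
{
    if (threads <= 0) return false;

    // Per-kernel resource usage and the block size limit that follows from it.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, dummyKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "attributes: %s\n", cudaGetErrorString(err));
        return false;
    }
    std::printf("registers/thread: %d, max threads/block for this kernel: %d\n",
                attr.numRegs, attr.maxThreadsPerBlock);

    int *devOut = NULL;
    err = cudaMalloc((void **)&devOut, threads * sizeof(int));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return false;
    }

    dummyKernel<<<1, threads>>>(devOut);
    err = cudaGetLastError();              // errors detected at launch time
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // errors raised while running
    cudaFree(devOut);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch with %d threads: %s\n",
                     threads, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Substituting the real kernel for `dummyKernel` and comparing `tryLaunch(512)` with `tryLaunch(128)` should show whether the launch itself is rejected. As a worked example of the register estimate, assuming 8192 registers per block (the regsPerBlock field of cudaDeviceProp gives the actual value), a kernel using 16 registers per thread could just reach 512 threads, while one using 20 would be limited to roughly 409.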
