Need your help with these doubts

I have some doubts; please help me.

  1. Is it good practice to initialize a variable before using it, say to 0? Assume any general program: instead of writing `int *a_d;` I write `int *a_d = 0;`. Does it help the compiler in any way?

  2. What is the difference between `__syncthreads()` and `__threadfence()`?

  3. Why do we call functions executed on the device "kernels"? Kernel means the core part of an OS, so why do we use that word here?

  4. My Visual Studio shows an error if I pass a matrix parameter of any type other than `const float*` to `cublasSgemm()`. What if I want to use `int` data? It won't even accept `float*`. Isn't there any way apart from typecasting?

  5. How do I time cuBLAS calls? I tried the `<time.h>` functions, but they show the wrong time: for a matrix multiplication using cuBLAS, Visual Studio showed less than 8 ms but Fedora showed around 40 ms. Why is that?

Thanks for your time, and sorry if my questions are silly.

Let me answer the first two…

  1. Is it good practice to initialize a variable before using it, say to 0?

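I always see this as a matter of convenience :), instead of starting with some garbage value for a pointer! It doesn't make the compiler generate better code (an optimizer will typically drop the store if the pointer is assigned before it is read), but a pointer that starts at 0 (NULL) can be tested before use and passed safely to `free()` or `cudaFree()`, both of which do nothing with a 0 pointer. An uninitialized pointer holds garbage and can't be checked at all.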
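A minimal host-side C sketch of why the 0 initialization is convenient. `ensure_allocated()` and `release()` are hypothetical helpers written for illustration; the same idea carries over to a device pointer such as `a_d` with `cudaMalloc()`/`cudaFree()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats only if *p is still NULL.
   The NULL test is meaningful only because the caller initialized the
   pointer to 0; an uninitialized pointer holds garbage and cannot be tested. */
int ensure_allocated(float **p, size_t n)
{
    assert(n > 0);
    if (*p == NULL) {
        *p = malloc(n * sizeof **p);
    }
    return *p != NULL;
}

/* Hypothetical helper: free the buffer and reset the pointer to NULL.
   free(NULL) is a no-op, so calling this twice is harmless. */
void release(float **p)
{
    free(*p);
    *p = NULL;
}
```

With `float *buf = 0;`, calling `ensure_allocated(&buf, 16)` twice allocates only once, and after `release(&buf)` the pointer is NULL again, so a second `release(&buf)` does nothing.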

  2. What is the difference between `__syncthreads()` and `__threadfence()`?

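`__syncthreads()`: a barrier for one block. All threads in the block wait until every thread of the block has reached this point of execution, and the global and shared memory writes they made before it are then visible to all threads in the block.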

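`__threadfence()`: This is one of the beautiful ways to make sure that the writes a particular thread made before the fence become visible to other threads (device-wide for global memory) before any writes it makes after the fence. It only orders the calling thread's own memory writes; unlike `__syncthreads()`, it does not make any other thread wait. E.g.: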

    __global__ void example(int *data)
    {
        __shared__ bool thirdDone;

        if (threadIdx.x == 0) {
            thirdDone = false;
        }
        __syncthreads();   // the flag must be initialized before thread 3 may set it

        if (threadIdx.x == 3) {
            data[blockIdx.x] = 100;
            __threadfence();   // the write to data[] becomes visible before any later write by this thread
            thirdDone = true;
        }

        __syncthreads();   // ensures that all threads in the block read the updated value of thirdDone

        if (thirdDone) {
            // do some operation....
        }
    }

oh… and for your 3rd question, here’s my speculation…
Just as the OS kernel is a bridge between applications and hardware, a CUDA kernel is the first point of contact between the code executed on the CPU and the code executed on the GPU (`__global__` functions are launched from host code, except with dynamic parallelism on newer GPUs). Hence, they might have kept this name. But I'm sure there shouldn't be any confusion if you call these functions "CUDA kernels" :)

I think the term "kernel" actually comes from mathematics. The function you write is applied to many data elements in parallel, which is (very) vaguely reminiscent of how the kernel is applied in an integral transform such as a convolution. The Brook language uses the term "kernel" in the same way as CUDA, so I suspect it derives from some older parallel language or theory.
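To make the mathematical reading concrete, here is a plain-C sketch (no GPU needed) in which the "kernel" is a per-element function computing one output of a 3-tap convolution with weights `w`. The loop in `conv3_launch()` stands in for a CUDA launch, which would run each index as its own thread. All names are hypothetical, and treating missing neighbours at the edges as 0 is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element function: one output of a 3-tap convolution.
   In CUDA this body would be the __global__ kernel, with i computed as
   blockIdx.x * blockDim.x + threadIdx.x. Missing neighbours count as 0. */
void conv3_element(const float *in, float *out, size_t n,
                   const float w[3], size_t i)
{
    float left  = (i > 0)     ? in[i - 1] : 0.0f;
    float right = (i + 1 < n) ? in[i + 1] : 0.0f;
    out[i] = w[0] * left + w[1] * in[i] + w[2] * right;
}

/* Serial stand-in for a kernel launch: apply the same function to every index.
   On the GPU each iteration would be an independent thread. */
void conv3_launch(const float *in, float *out, size_t n, const float w[3])
{
    assert(in != out); /* in place, one "thread" could read another's output */
    for (size_t i = 0; i < n; ++i)
        conv3_element(in, out, n, w, i);
}
```

With input `{1, 2, 3, 4}` and weights `{1, 1, 1}`, each output is the sum of an element and its neighbours, giving `{3, 6, 9, 7}`.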

I think I've got answers for the 4th and 5th; please correct me if I am wrong.

  4. When we write a function, it may end up in someone else's library. Declaring a pointer parameter `const` promises the caller that the function will not modify their data.

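While passing the parameter, the compiler converts `float*` to `const float*` automatically. This conversion is standard in both C and C++, so Visual Studio should accept a `float*` as well; if it doesn't, the error probably comes from something else in the call. An `int*` is a different matter: `cublasSgemm()` works on single-precision floats, so `int` data has to be converted into a float array first (casting the pointer would only reinterpret the bits).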

I read this section: "Where it Gets Messy - in Parameter Passing".
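A plain-C sketch of the two cases: a `float*` argument binds to a `const float*` parameter with no cast, while `int` data is converted value by value into a float buffer. `sum_matrix()` and `int_to_float()` are hypothetical names; `sum_matrix()` only mimics the read-only matrix parameters of a routine like `cublasSgemm()` and is not part of the cuBLAS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library routine such as cublasSgemm():
   the const promises that the routine only reads the matrix. */
float sum_matrix(const float *m, size_t count)
{
    float s = 0.0f;
    for (size_t i = 0; i < count; ++i)
        s += m[i];
    return s;
}

/* Copy int data into a float buffer, converting each value.
   Casting an int* to float* would only reinterpret the bits. */
void int_to_float(const int *src, float *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (float)src[i];
}
```

For example, `float a[3] = {1, 2, 3};` can be passed straight to `sum_matrix(a, 3)`, while an `int` array goes through `int_to_float()` first and the resulting float array is what gets passed on.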


  5. I had used the `<time.h>` header and the `clock()` function to measure time. I got different times on different OSes because of the number of ticks per second.

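`clock()` does not count ticks per second; it returns a tick count (processor time used by the program on Linux, wall-clock time since start-up with Visual C++). The difference between two calls therefore has to be divided by the number of ticks per second.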

Now, how many ticks are there in a second? That depends. Old DOS compilers used 18.2, the timer rate of the original IBM PC; if the system does not use that rate, then calculating (finish-start)/18.2 gives a different answer.

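The tick rate is not tied to the CPU clock speed; each implementation fixes it in `<time.h>` as `CLOCKS_PER_SEC` (1000 with Visual C++, 1000000 on Linux with glibc). So rather than defining my own macro with a hard-coded 1000.0, which only matches Visual C++, divide by `CLOCKS_PER_SEC`:

    /* start_h and fin_h hold the values returned by clock() before and after the call */
    printf("\nTime taken : %f seconds", (double)(fin_h - start_h) / CLOCKS_PER_SEC);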

If you need further details, look at the time.h header file and search for CLK_TCK and CLOCKS_PER_SEC, or see this part
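A minimal host-side C sketch of timing with `clock()` and `CLOCKS_PER_SEC`; `busy_work()` is a hypothetical placeholder for the code being timed. Two points worth keeping in mind: `clock()` counts processor time on Linux but wall-clock time with Visual C++, so the same program can report different numbers on the two systems; and cuBLAS and kernel calls return before the GPU finishes, so the host should wait for the device (for example with `cudaDeviceSynchronize()`) before reading the end time. CUDA events (`cudaEventRecord()` and `cudaEventElapsedTime()`) are the usual way to time GPU work directly.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Convert a clock() interval to milliseconds using CLOCKS_PER_SEC
   from <time.h> rather than a hard-coded tick rate. */
double clock_ms(clock_t start, clock_t finish)
{
    assert(finish >= start);
    return 1000.0 * (double)(finish - start) / CLOCKS_PER_SEC;
}

/* Hypothetical placeholder for the work being timed. */
double busy_work(long n)
{
    double acc = 0.0;
    for (long i = 0; i < n; ++i)
        acc += (double)i * 0.5;
    return acc;
}

void time_busy_work(void)
{
    clock_t start = clock();
    volatile double result = busy_work(1000000L);
    /* When timing GPU work, wait for the device here (for example with
       cudaDeviceSynchronize()); otherwise the asynchronous call returns
       before the GPU has finished. */
    clock_t finish = clock();
    (void)result;
    printf("Time taken : %f ms\n", clock_ms(start, finish));
}
```

Because the divisor comes from `<time.h>`, one tick interval of `CLOCKS_PER_SEC` always converts to 1000 ms, whatever value the platform uses.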