What's the problem in my code?

keepdash · August 24, 2016, 3:02am

I wrote a simple CUDA program to illustrate the problem. The code below runs and gives the correct result (65535 65534…), but if I uncomment line 26 (sdata[y] = 5.0f;), the results become wrong (0 1 2…).
What is wrong with setting a value in shared memory this way?
Thank you.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

struct SharedMemory{
	__device__ inline operator float *()
	{
		extern __shared__ int __smem[];
		return (float *)__smem;
	}

	__device__ inline operator const float *() const
	{
		extern __shared__ int __smem[];
		return (float *)__smem;
	}
};

__global__ void kTestFunc(int* arr){
	unsigned int x = threadIdx.x;
	unsigned int y = blockIdx.x;
	unsigned int w = blockDim.x;
	unsigned int offset = w*y + x;

	float* sdata = SharedMemory();
	//sdata[y] = 5.0f;

	arr[offset] = 65535 - arr[offset];
}

int main(void){
	int* hData = new int[65536];
	int* dData;
	cudaMalloc((void**)&dData, sizeof(int) * 65536);
	for (int i = 0; i < 65536; i++)
		hData[i] = i;

	cudaMemcpy(dData, hData, 65536 * sizeof(int), cudaMemcpyHostToDevice);
	kTestFunc << <256, 256 >> >(dData);
	cudaMemcpy(hData, dData, 65536 * sizeof(int), cudaMemcpyDeviceToHost);

	for (int i = 0; i < 10; i++)
		printf("%d ", hData[i]);

	cudaFree(dData);
	delete[] hData;

	return 0;
}

Robert_Crovella · August 24, 2016, 3:43am

When using dynamically allocated shared memory:

extern __shared__ int __smem[];

You need to allocate the necessary size for it using the 3rd kernel launch configuration parameter:

kTestFunc << <256, 256, ???????? >> >(dData);

This is covered in the programming guide, and there are a great many sample codes that demonstrate proper usage of dynamically allocated shared memory.
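As a rough sketch of what that third parameter looks like in practice, here is a reduced version of the posted kernel. It assumes each thread writes one float at index threadIdx.x, so the block needs blockDim.x * sizeof(float) bytes. That indexing, the helper name reverseOnDevice, and the variable smemBytes are illustrative choices of mine, not something taken from the original code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void kTestFunc(int* arr) {
    // Dynamically sized shared memory: its size comes from the launch configuration.
    extern __shared__ float sdata[];
    unsigned int offset = blockDim.x * blockIdx.x + threadIdx.x;

    // Assumption: one float per thread, so the index stays below blockDim.x.
    sdata[threadIdx.x] = 5.0f;

    arr[offset] = 65535 - arr[offset];
}

// Hypothetical host helper: copies n ints to the device, runs the kernel, copies back.
// Assumes n is a multiple of threads.
bool reverseOnDevice(int* hData, int n, int threads) {
    int* dData = nullptr;
    if (cudaMalloc((void**)&dData, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(dData, hData, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    size_t smemBytes = threads * sizeof(float);
    kTestFunc<<<n / threads, threads, smemBytes>>>(dData);

    // This copy waits for the kernel and returns any error the kernel raised.
    cudaError_t err = cudaMemcpy(hData, dData, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    return err == cudaSuccess;
}

int main() {
    const int n = 65536;
    int* hData = new int[n];
    for (int i = 0; i < n; i++) hData[i] = i;

    if (reverseOnDevice(hData, n, 256)) {
        for (int i = 0; i < 10; i++) printf("%d ", hData[i]);
        printf("\n");
    } else {
        printf("CUDA call failed\n");
    }

    delete[] hData;
    return 0;
}
```

If the kernel indexes shared memory differently, the byte count has to cover the largest index it can touch. The value passed at launch is per block, not per grid.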

If you had used proper CUDA error checking and run your code with cuda-memcheck, you would have discovered that errors are being thrown. Doing these basic debug steps before asking for help on a public forum is good practice. If you don’t know what proper CUDA error checking is, please google “proper cuda error checking”, take the first hit, read it, and apply it to your code. If you don’t know what cuda-memcheck is, please google it or refer to the cuda-memcheck documentation available at docs.nvidia.com.

Even if you don’t understand the error output using these basic debug steps (cuda-memcheck and error checking), the error output will be useful for others trying to help you.
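For readers who have not seen the pattern, one common way to do the error checking described above is sketched here. The macro name CUDA_CHECK, the kernel reverseValues, and the helper reverseChecked are hypothetical names made up for this example. The runtime calls it relies on (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are part of the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro (not part of CUDA): abort with file and line on any error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,    \
                    __LINE__, cudaGetErrorString(err_));              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative kernel: replaces each value v with (n - 1) - v.
__global__ void reverseValues(int* arr, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) arr[i] = (n - 1) - arr[i];
}

// Hypothetical host helper: every runtime call and the kernel launch are checked.
void reverseChecked(int* hData, int n) {
    int* dData = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dData, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dData, hData, n * sizeof(int), cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    reverseValues<<<blocks, threads>>>(dData, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(hData, dData, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dData));
}

int main() {
    const int n = 65536;
    int* hData = new int[n];
    for (int i = 0; i < n; i++) hData[i] = i;

    reverseChecked(hData, n);
    for (int i = 0; i < 10; i++) printf("%d ", hData[i]);
    printf("\n");

    delete[] hData;
    return 0;
}
```

With checks like these in place, a kernel that fails at run time should report an error at the synchronize call instead of silently leaving the data unchanged. Running the executable under cuda-memcheck (for example `cuda-memcheck ./a.out`, where the executable name is only a placeholder) gives more detail about the faulting access. Newer toolkits ship compute-sanitizer as its successor.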

keepdash · August 24, 2016, 3:58am

Oh, yes, I forgot to set the size for shared memory… Thank you! Thanks also for the advice on error checking.
