cuMemcpyHtoD CUDA_ERROR_INVALID_VALUE

I have the following code, which I cannot run in Visual Studio under the Nsight profiler. It starts, but after a few seconds it gives the following error:

Running the code directly with the Visual Studio Run button works, so there must be a problem somewhere. Can you help me solve the issue?

Code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#define N 1024
#define size N*N

__global__ void MatrixMultiplcation(float* a, float* b, float* c, int n)
{
	int row = blockIdx.x * blockDim.x + threadIdx.x;
	int column = blockIdx.y * blockDim.y + threadIdx.y;
	float sum = 0.0f;

	if (row < n && column < n)
	{
		for (int i = 0; i < n; i++)
		{
			sum += a[row * n + i] * b[i * n + column];
		}

		c[row * n + column] = sum;
	}
}

int main()
{
	// Performance inspection on console.
	cudaEvent_t start;
	cudaEvent_t end;
	float kernelTime = 0.0f;

	int threadsPerBlocK = 16;
	int blocksPerGrid = (N + threadsPerBlocK - 1) / threadsPerBlocK;

	float* h_a;
	float* h_b;
	float* h_c;

	float* d_a, * d_b, * d_c;

	h_a = (float*)malloc(size * sizeof(float));
	h_b = (float*)malloc(size * sizeof(float));
	h_c = (float*)malloc(size * sizeof(float));

	cudaMalloc(&d_a, size * sizeof(float));
	cudaMalloc(&d_b, size * sizeof(float));
	cudaMalloc(&d_c, size * sizeof(float));

	for (int i = 0; i < size; ++i)
	{
		h_a[i] = 1;
		h_b[i] = 2;
	}
	cudaEventCreate(&start);
	cudaEventCreate(&end);

	printf("Matrices are initialized.");

	cudaMemcpy(&d_a, h_a, size * sizeof(float), cudaMemcpyHostToDevice);
	cudaMemcpy(&d_b, h_b, size * sizeof(float), cudaMemcpyHostToDevice);

	cudaEventRecord(start);
	MatrixMultiplcation<<< blocksPerGrid, threadsPerBlocK >>>(d_a, d_b, d_c, N);

	cudaEventRecord(end);
	cudaDeviceSynchronize();

	cudaEventElapsedTime(&kernelTime, start, end);

	cudaEventDestroy(start);
	cudaEventDestroy(end);

	cudaMemcpy(h_c, d_c, size * sizeof(float), cudaMemcpyDeviceToHost);

	free(h_a);
	free(h_b);
	free(h_c);
	cudaFree(d_a);
	cudaFree(d_b);
	cudaFree(d_c);

	printf("Size is: %d\n", size);
	printf("The Matrix Multiplication Kernel Time: %fms", kernelTime);

	return 0;
}

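You should do proper CUDA error checking. You would then observe the same error when running the code directly from Visual Studio. These two calls pass the address of the host-side pointer variable instead of the device pointer itself: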

cudaMemcpy(&d_a, h_a, size * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(&d_b, h_b, size * sizeof(float), cudaMemcpyHostToDevice);

should be

cudaMemcpy(d_a, h_a, size * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_b, h_b, size * sizeof(float), cudaMemcpyHostToDevice);
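
As a rough illustration of the error checking suggested above, here is a minimal sketch. The helper `checkCuda`, the macro `CUDA_CHECK`, and the kernel `fill` are hypothetical names introduced only for this example; they are not part of the CUDA toolkit or of the original code. The sketch assumes the CUDA runtime API and wraps every call so that a failure is reported at the line where it happens, rather than surfacing later in a profiler.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA toolkit): prints the failing
// expression with its location and returns false if err is not cudaSuccess.
static bool checkCuda(cudaError_t err, const char* expr, const char* file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s:%d: %s failed: %s (%s)\n", file, line, expr,
            cudaGetErrorName(err), cudaGetErrorString(err));
    return false;
}
#define CUDA_CHECK(call) checkCuda((call), #call, __FILE__, __LINE__)

// Trivial illustrative kernel so that the launch check has something to check.
__global__ void fill(float* data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main()
{
    const int n = 1024 * 1024;
    const size_t bytes = n * sizeof(float);
    float* h_a = (float*)malloc(bytes);
    float* d_a = nullptr;
    if (h_a == nullptr) return 1;

    // cudaMalloc takes the ADDRESS of the pointer, because it writes d_a ...
    if (!CUDA_CHECK(cudaMalloc(&d_a, bytes))) return 1;

    fill<<<(n + 255) / 256, 256>>>(d_a, n, 1.0f);
    // A kernel launch returns no status itself: query the launch error,
    // then synchronize to catch errors raised while the kernel runs.
    if (!CUDA_CHECK(cudaGetLastError())) return 1;
    if (!CUDA_CHECK(cudaDeviceSynchronize())) return 1;

    // ... while cudaMemcpy takes the pointer VALUE (d_a, not &d_a).
    if (!CUDA_CHECK(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost))) return 1;

    printf("h_a[0] = %f\n", h_a[0]);
    CUDA_CHECK(cudaFree(d_a));
    free(h_a);
    return 0;
}
```

For brevity the early returns skip cleanup. Applied to the code in the question, wrapping the two `cudaMemcpy(&d_a, ...)` and `cudaMemcpy(&d_b, ...)` calls this way should report the invalid-value error on a normal Visual Studio run as well.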

That fixed the original problem, but now I am getting this error:

Also, is there any way to keep the console output on screen when launching with the Nsight debug button? The window closes right after showing the results.

For instance, the runtime of this kernel is 1.4 ms on a normal VS run and 1100 ms under the Nsight profiler. I think the second one is the proper measurement.

You might want to direct your attention to the part of the error message that starts with “For instructions …” Alternatively:

