Blackwell Integer

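__global__ void addKernel(int* c, const int* a, const int* b)
{
    // Global index of this thread across the whole grid.
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    // Repeat the same int32 add to lengthen the kernel's run time.
    for (int i = 0; i < NUM_ITERATIONS_IN_KERNEL; i++)
    {
        c[index] = a[index] + b[index];
    }
}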

I also tried other, more complex int32 kernels, with the same result.
I also tried compiling for sm_89, sm_100, sm_101, and sm_120. Strangely, sm_89 gave the best result.
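
For anyone who wants to reproduce this, here is a minimal sketch of how a kernel like the one above could be timed with CUDA events. It is not the exact harness used for the numbers discussed here: the value of NUM_ITERATIONS_IN_KERNEL, the helper name timeAddKernel, the input data, and the launch configuration are all assumptions made for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed value; the post does not say what was actually used.
#define NUM_ITERATIONS_IN_KERNEL 1000

__global__ void addKernel(int* c, const int* a, const int* b)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < NUM_ITERATIONS_IN_KERNEL; i++)
    {
        c[index] = a[index] + b[index];
    }
}

// Hypothetical helper: launches addKernel over n elements and returns the
// elapsed kernel time in milliseconds, or -1.0f on any failure.
// The kernel has no bounds check, so n must be a multiple of threadsPerBlock.
float timeAddKernel(int n, int threadsPerBlock)
{
    if (n <= 0 || threadsPerBlock <= 0 || n % threadsPerBlock != 0)
        return -1.0f;

    // Host inputs chosen so the expected output is easy to verify.
    std::vector<int> ha(n), hb(n), hc(n, 0);
    for (int i = 0; i < n; i++) { ha[i] = i; hb[i] = 2 * i; }

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    float ms = -1.0f;
    cudaEvent_t start, stop;
    if (ok && cudaEventCreate(&start) == cudaSuccess)
    {
        if (cudaEventCreate(&stop) == cudaSuccess)
        {
            // The events bracket only the kernel, not the memory copies.
            cudaEventRecord(start);
            addKernel<<<n / threadsPerBlock, threadsPerBlock>>>(dc, da, db);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);

            ok = cudaGetLastError() == cudaSuccess
              && cudaEventElapsedTime(&ms, start, stop) == cudaSuccess
              && cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
            cudaEventDestroy(stop);
        }
        else
        {
            ok = false;
        }
        cudaEventDestroy(start);
    }
    else
    {
        ok = false;
    }

    // Check every element so a failed launch is not mistaken for a fast one.
    for (int i = 0; ok && i < n; i++)
        ok = (hc[i] == ha[i] + hb[i]);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return ok ? ms : -1.0f;
}
```

Calling timeAddKernel(1 << 20, 256) from main and printing the result gives one timing sample. Rebuilding the same file with a different nvcc -arch target (for example -arch=sm_89) and averaging several runs would be one way to compare architectures.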
