I'm working with CUDA 11.2 on a device with compute capability 8.6. I have a kernel that might require more than 48 KB of dynamic shared memory, so I wrote:
int dynamic_smem_size = // ... some calculation logic
if (dynamic_smem_size >= (48 << 10)) {
    cudaError_t result = cudaFuncSetAttribute(
        MyKernel, cudaFuncAttributeMaxDynamicSharedMemorySize,
        dynamic_smem_size);
}
// launch the kernel...
However, launching the kernel returns cudaErrorInvalidValue when dynamic_smem_size is 48964. I then tried:
cudaFuncAttributes attrs;
cudaFuncGetAttributes(&attrs, MyKernel);
LOG(INFO) << attrs.maxDynamicSharedSizeBytes;
The result is 48724. It looks like the default value of cudaFuncAttributeMaxDynamicSharedMemorySize for my kernel is not exactly 48 << 10 (49152) bytes. Why is that?
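To make the numbers concrete, here is a minimal host-side sketch of the arithmetic. It rests on an assumption the post does not confirm: that the default 48 KB limit covers static plus dynamic shared memory together, so the reported default would be 48 KB minus whatever the kernel declares statically. The helper names (`kDefaultSmemLimit`, `ImpliedStaticSmem`, `NeedsOptIn`) are hypothetical and only for illustration.

```cpp
#include <cstddef>

// Assumption: without opt-in, static + dynamic shared memory per block
// must fit within 48 KB in total.
constexpr std::size_t kDefaultSmemLimit = 48u << 10;  // 49152 bytes

// Static shared memory implied by a reported default dynamic limit
// (hypothetical helper, valid only under the assumption above).
constexpr std::size_t ImpliedStaticSmem(std::size_t max_dynamic_bytes) {
  return kDefaultSmemLimit - max_dynamic_bytes;
}

// True when the combined footprint exceeds the default limit, i.e. when
// raising cudaFuncAttributeMaxDynamicSharedMemorySize would be required.
constexpr bool NeedsOptIn(std::size_t static_bytes, std::size_t dynamic_bytes) {
  return static_bytes + dynamic_bytes > kDefaultSmemLimit;
}

// Numbers from the post: the reported default is 48724, the request is 48964.
static_assert(ImpliedStaticSmem(48724) == 428, "428 bytes unaccounted for");
static_assert(48964 < kDefaultSmemLimit, "the >= (48 << 10) check does not fire");
static_assert(NeedsOptIn(428, 48964), "yet the total would exceed 48 KB");
```

If that assumption holds, the 428-byte gap should match `attrs.sharedSizeBytes` (the static shared memory size reported by `cudaFuncGetAttributes`), and comparing the sum of static and dynamic sizes against the limit, rather than the dynamic size alone, would decide when to call `cudaFuncSetAttribute`.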