Hi forum,
I ran into a strange error this morning: the code I have been working on started failing in cudaMalloc after I added one more .cu file to the build. My cudaMalloc call and error check are:
float* device_matA = 0;
cudaError_t err = cudaMalloc((void**)&device_matA, M * K * sizeof(float));
printf("Error: %s\n", cudaGetErrorString(err));
if(device_matA == 0 || device_matBT == 0 || device_matC == 0) {
    printf("couldn't allocate memory\n");
    return matC;
}
and it prints:
Error: unknown error
couldn't allocate memory
The program does not core dump or crash; cudaMalloc just leaves device_matA unchanged at 0. My disk usage manager still shows over 30GB of free space.
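In case it helps to narrow this down, here is a minimal sketch of a stricter check that tests the return code of every call instead of the pointers. It also prints the free device memory, since cudaMalloc draws from GPU memory rather than disk space. The CUDA_CHECK macro, the cuda_check helper, the dimension N, the placeholder sizes, and the shapes of device_matBT and device_matC are assumptions for illustration and are not taken from the project code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original code): print the failing
// expression with its error name and description, and return success/failure.
static bool cuda_check(cudaError_t err, const char* expr, const char* file, int line) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s:%d: %s failed: %s (%s)\n", file, line, expr,
                cudaGetErrorName(err), cudaGetErrorString(err));
        return false;
    }
    return true;
}
#define CUDA_CHECK(call) cuda_check((call), #call, __FILE__, __LINE__)

int main() {
    // Placeholder sizes (assumed); the real M, K, N come from the caller.
    const size_t M = 256, K = 128, N = 64;

    // Query the device first, so a runtime initialization problem is
    // reported here rather than by the first cudaMalloc.
    int device_count = 0;
    if (!CUDA_CHECK(cudaGetDeviceCount(&device_count))) return 1;
    printf("CUDA devices found: %d\n", device_count);

    size_t free_bytes = 0, total_bytes = 0;
    if (!CUDA_CHECK(cudaMemGetInfo(&free_bytes, &total_bytes))) return 1;
    printf("Device memory: %zu bytes free of %zu\n", free_bytes, total_bytes);

    float* device_matA = nullptr;
    float* device_matBT = nullptr;
    float* device_matC = nullptr;

    // Stop at the first failing allocation; the message names the exact call.
    bool ok = CUDA_CHECK(cudaMalloc((void**)&device_matA, M * K * sizeof(float)))
           && CUDA_CHECK(cudaMalloc((void**)&device_matBT, N * K * sizeof(float)))
           && CUDA_CHECK(cudaMalloc((void**)&device_matC, M * N * sizeof(float)));

    // cudaFree on a null pointer performs no operation, so this is safe
    // even if only some of the allocations succeeded.
    cudaFree(device_matA);
    cudaFree(device_matBT);
    cudaFree(device_matC);

    return ok ? 0 : 1;
}
```

If the very first query already fails, the problem is probably in how the runtime initializes in this build rather than in the allocation size.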
Here are some details:
In the project, there are several .cu files that I want to compile and link to a Python API via pybind:
src/
--common.cu
--mat_mul_naive.cu
--mat_mul_half.cu
--mat_mul_simt.cu
In my CMakeLists.txt, I compile all of those .cu files as:
add_library(matrix_mul_lib_kernel STATIC src/common.cu src/matrix_mul_naive.cu src/matrix_mul_half.cu src/matrix_mul_simt.cu)
set_target_properties(matrix_mul_lib_kernel PROPERTIES
    POSITION_INDEPENDENT_CODE ON
    CUDA_VISIBILITY_PRESET "hidden"
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_ARCHITECTURES 87
)
(I will post the full CMakeLists.txt in the reply.)
The cudaMalloc failure occurs when I build as shown above. However, if I remove the last file, mat_mul_simt.cu, from CMakeLists.txt, the code works again and outputs correct results:
add_library(matrix_mul_lib_kernel STATIC src/common.cu src/matrix_mul_naive.cu src/matrix_mul_half.cu)
set_target_properties(matrix_mul_lib_kernel PROPERTIES
    POSITION_INDEPENDENT_CODE ON
    CUDA_VISIBILITY_PRESET "hidden"
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_ARCHITECTURES 87
)
Do you know how I could correctly compile all those functions? Thank you!
Best,
Chengzhe