Hello everyone,
I am new to parallel programming and I am more of a theoretician than a programmer, so please excuse any obvious mistakes!
My goal: I want to call a kernel function inside an OpenMP parallel region.
My problem is that, even when my kernel function kernel_normalization_voxels is empty, and even with the same number of threads, the program sometimes runs to the end and sometimes does not.
Here is my code:
#pragma omp parallel num_threads(num_slices_to_reconstruct)
{
    int current_slice = omp_get_thread_num();
    memset(normalization_voxels, 0, sizeof(float) * current_slice * (*NB_VOXELS) * nb_OS);
    int numero_subset, num_projection;
    for (numero_subset = 0; numero_subset < nb_OS; numero_subset++) {
        for (int i = 0; i < NB_PROJECTIONS / nb_OS; i++) {
            num_projection = Table_OS[numero_subset * NB_PROJECTIONS / nb_OS + i];
            kernel_normalization_voxels<<<nb_cuda_cores, 32>>>(...);
            cudaDeviceSynchronize();
        }
    }
}
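Since the program stops silently, a first step could be to check the status of every launch and every synchronization from inside the parallel region. The following is only a minimal sketch of that idea: the `CUDA_CHECK` macro, the `empty_kernel` stand-in and the fixed thread count are hypothetical and not part of the code above. It assumes the file is built with OpenMP enabled on the host compiler, for example `nvcc -Xcompiler -fopenmp` with GCC.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <omp.h>

// Hypothetical helper (not in the original code): print the failing call,
// the OpenMP thread that issued it, and the CUDA error string.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "thread %d: %s failed at %s:%d: %s\n",   \
                         omp_get_thread_num(), #call, __FILE__, __LINE__, \
                         cudaGetErrorString(err_));                       \
        }                                                                 \
    } while (0)

// Stand-in for the empty kernel described in the post.
__global__ void empty_kernel() {}

int main() {
    const int num_threads = 4;  // placeholder for num_slices_to_reconstruct

    #pragma omp parallel num_threads(num_threads)
    {
        empty_kernel<<<1, 32>>>();
        CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
        CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    }
    return 0;
}
```

If nothing is printed and the program still stops early, the failure is more likely on the host side (for example a crash in one of the CPU threads) than in the kernel launch.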
- normalization_voxels and NB_VOXELS are allocated with cudaMallocManaged;
- nb_OS, nb_cuda_cores, num_slices_to_reconstruct and the table Table_OS are defined on the CPU;
- NB_PROJECTIONS is a #define in a .h file.
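Because normalization_voxels and NB_VOXELS live in managed memory, and each OpenMP thread touches them on the host (the memset and the read of *NB_VOXELS) while other threads may have a kernel in flight, one thing that may be worth ruling out is whether the device allows concurrent host access to managed memory. The sketch below only queries the `cudaDevAttrConcurrentManagedAccess` attribute of the current device. Whether this is related to the behavior described here is an assumption, not something the code above confirms.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Use whichever device is current for this host thread.
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 1 means the host may access managed memory while kernels are running;
    // 0 means it may not.
    int concurrent = 0;
    err = cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device %d: concurrentManagedAccess = %d\n", device, concurrent);
    if (concurrent == 0) {
        std::printf("Host code should not touch managed memory while a kernel is running.\n");
    }
    return 0;
}
```

On a device that reports 0, a host-side memset on managed memory issued by one thread while another thread's kernel is still running would be an invalid access, which could produce an intermittent crash that depends on thread timing.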
Even with an empty kernel kernel_normalization_voxels
__global__ void kernel_projection(…)
{
/* … */
}
my program does not always run through to the end.
Does anyone have an idea where the problem might come from?
Thank you very much for your help!