Hello everyone,
Can someone explain why I don't get any parallel code when I compile a piece of code inside a #pragma acc routine with the -ta=multicore flag (i.e. targeting multicore)?
Hi khrishino,
You should be able to use “routine” when targeting a multicore device, so it’s unclear what’s preventing this in your case. Can you please provide a reproducing example?
Thanks,
Mat
Hello Mat,
Thank you for the reply.
The example code is shown below:
    signatureMethod(paramList){
        if (class1 != NULL && class2 != NULL) {
            g1 = ag1;
            g2 = ag2;
            n1 = ag1->NodeCount();
            n2 = ag2->NodeCount();
            last_candidate_index = 0;
            classes_count = nclass;
            core_len = orig_core_len = 0;
            added_node1 = NULL_NODE;

            core_1 = (node_id*) malloc(sizeof(node_id) * n1);
            core_2 = (node_id*) malloc(sizeof(node_id) * n2);
            core_len_c = (node_id*) malloc(classes_count * sizeof(node_id));
            predecessors = (node_id*) malloc(sizeof(node_id) * n1);
            dir = (node_dir_t*) malloc(sizeof(node_dir_t) * n1);
            order = (node_id*) malloc(sizeof(node_id) * n1);
            class_1 = (uint32_t*) malloc(sizeof(uint32_t) * n1);
            class_2 = (uint32_t*) malloc(sizeof(uint32_t) * n2);

            #pragma acc enter data copyin(this[0:1]) \
                create(g1[0:1], g2[0:1], core_1[0:n1], core_2[0:n2], core_len_c[0:classes_count], \
                       predecessors[0:n1], dir[0:n1], order[0:n1], class_1[0:n1], class_2[0:n2])

            #pragma acc parallel present(core_1[0:n1], core_2[0:n2], core_len_c[0:classes_count], \
                                         predecessors[0:n1], dir[0:n1], order[0:n1], class_1[0:n1], class_2[0:n2]) \
                copyin(orderVec[0:n1], class1[0:n1], class2[0:n2]) \
                vector_length(256)
            {
                #pragma acc loop independent
                for (uint32_t i = 0; i < n1; i++) {
                    core_1[i] = NULL_NODE;
                    dir[i] = NODE_DIR_NONE;
                    predecessors[i] = NULL_NODE;
                    order[i] = orderVec[i];
                    class_1[i] = class1[i];
                }

                #pragma acc loop independent
                for (uint32_t i = 0; i < n2; i++) {
                    core_2[i] = NULL_NODE;
                    class_2[i] = class2[i];
                }

                #pragma acc loop independent
                for (uint32_t i = 0; i < classes_count; i++) {
                    core_len_c[i] = 0;
                }

                ComputeFirstGraphTraversing();
            }

            #pragma acc exit data delete(orderVec[0:n1], class1[0:n1], class2[0:n2])
        }
    }
ComputeFirstGraphTraversing() contains some loops marked with #pragma acc loop independent, and there is also a #pragma acc routine gang directive above the method signature.
When I build the multicore code, there seems to be no parallelism, and -Minfo=accel reports nothing about parallelization of the routine.
Why?
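A stripped-down, self-contained version of this pattern, of the kind requested above, might look like the following sketch. It is only an illustration under assumptions: the names reset_counts and init_state are hypothetical and do not appear in my real code, the 0xFFFFFFFF value merely stands in for NULL_NODE, and the loop inside the routine is marked gang here to match the routine's declared level. Building it with -ta=multicore -Minfo=accel should make it easier to see whether the compiler reports anything for the routine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical gang routine (stand-in for ComputeFirstGraphTraversing).
// The loop inside is marked "gang" so its iterations can be shared
// across gangs when called from a parallel region.
#pragma acc routine gang
void reset_counts(uint32_t *counts, uint32_t n) {
    #pragma acc loop gang
    for (uint32_t i = 0; i < n; i++) {
        counts[i] = 0;
    }
}

// Hypothetical caller: a parallel region with one loop, followed by a
// call to the gang routine made outside of any gang loop.
void init_state(uint32_t *core, uint32_t n, uint32_t *counts, uint32_t nclass) {
    assert(core != nullptr && counts != nullptr);  // host-side sanity check

    #pragma acc parallel copyout(core[0:n], counts[0:nclass])
    {
        #pragma acc loop gang
        for (uint32_t i = 0; i < n; i++) {
            core[i] = 0xFFFFFFFFu;  // assumed stand-in for NULL_NODE
        }
        reset_counts(counts, nclass);
    }
}
```

Without OpenACC enabled the pragmas are ignored and the code runs serially, which gives a baseline to compare the multicore build against.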