No Multicore Code generated: why?

Hello everyone.
Can someone explain to me why no parallel code is generated when I use the -ta=multicore flag (multicore code) to accelerate a piece of code inside a #pragma acc routine?

Hi khrishino,

You should be able to use “routine” when targeting a multicore device, so it’s unclear what’s preventing this in your case. Can you please provide a reproducing example?

Thanks,
Mat

Hello Mat,
Thank you for the reply.
The example code is shown below:

signatureMethod(paramList){
	if (class1 != NULL && class2 != NULL) {
		g1 = ag1;
		g2 = ag2;
		n1 = ag1->NodeCount();
		n2 = ag2->NodeCount();
		last_candidate_index = 0;
		classes_count = nclass;
		core_len = orig_core_len = 0;
		added_node1 = NULL_NODE;

		core_1 = (node_id*) malloc(sizeof(node_id) * n1);
		core_2 = (node_id*) malloc(sizeof(node_id) * n2);
		core_len_c = (node_id*) malloc(classes_count * sizeof(node_id));
		predecessors = (node_id*) malloc(sizeof(node_id) * n1);
		dir = (node_dir_t*) malloc(sizeof(node_dir_t) * n1);
		order = (node_id*) malloc(sizeof(node_id) * n1);
		class_1 = (uint32_t*) malloc(sizeof(uint32_t) * n1);
		class_2 = (uint32_t*) malloc(sizeof(uint32_t) * n2);

#pragma acc enter data copyin(this[0:1]) \
	create(g1[0:1], g2[0:1], core_1[0:n1], core_2[0:n2], core_len_c[0:classes_count], \
			predecessors[0:n1], dir[0:n1], order[0:n1], class_1[0:n1], class_2[0:n2])

#pragma acc parallel present(core_1[0:n1], core_2[0:n2], core_len_c[0:classes_count], \
		predecessors[0:n1], dir[0:n1], order[0:n1], class_1[0:n1], class_2[0:n2]) \
			copyin(orderVec[0:n1], class1[0:n1], class2[0:n2]) \
			vector_length(256)
		{
			#pragma acc loop independent
			for (uint32_t i = 0; i < n1; i++) {
				core_1[i] = NULL_NODE;
				dir[i] = NODE_DIR_NONE;
				predecessors[i] = NULL_NODE;
				order[i] = orderVec[i];
				class_1[i] = class1[i];
			}
			#pragma acc loop independent
			for (uint32_t i = 0; i < n2; i++) {
				core_2[i] = NULL_NODE;
				class_2[i] = class2[i];
			}
			#pragma acc loop independent
			for (uint32_t i = 0; i < classes_count; i++) {
				core_len_c[i] = 0;
			}
			ComputeFirstGraphTraversing();
		}
#pragma acc exit data delete(orderVec[0:n1], class1[0:n1], class2[0:n2])
	}
}

ComputeFirstGraphTraversing() contains several #pragma acc loop independent loops, and its declaration is preceded by #pragma acc routine gang.
When I compile for multicore, there seems to be no parallelism, and -Minfo=accel reports nothing about parallelizing the routine.
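For reference, the structure described above (a #pragma acc routine gang function containing a loop independent loop, called from inside a parallel region) can be reduced to a minimal sketch like the one below. The names fill_squares and run_repro are hypothetical stand-ins, not the real ComputeFirstGraphTraversing(). The sketch assumes a PGI-style OpenACC compiler, compiled with -ta=multicore -Minfo=accel to check whether the routine's loop is reported as parallelized. Without OpenACC enabled the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for ComputeFirstGraphTraversing(): a gang routine
// whose only work is one independent loop.
#pragma acc routine gang
void fill_squares(int* out, int n) {
    #pragma acc loop independent
    for (int i = 0; i < n; i++) {
        out[i] = i * i;  // each iteration writes a distinct element
    }
}

// Hypothetical driver: calls the gang routine inside a parallel region but
// outside any loop (gang-redundant context, where a gang routine may be
// called), then sums the results on the host so they can be checked.
long long run_repro(int n) {
    int* data = static_cast<int*>(std::malloc(sizeof(int) * n));
    assert(data != nullptr);

    #pragma acc parallel copyout(data[0:n])
    {
        fill_squares(data, n);
    }

    long long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
    }
    std::free(data);
    return sum;
}
```

Calling run_repro(4) returns 0 + 1 + 4 + 9 = 14 whether or not the loop was parallelized. The -Minfo=accel output, not the result, shows whether the routine's loop was actually parallelized for multicore.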

Why?