Same worksharing type in nested loops - parallel construct

With the kernels construct, I can specify a “gang vector” loop schedule on both loops of a nested loop:

#pragma acc kernels
#pragma acc loop gang vector
        for( int j = 0; j < n; j++)
#pragma acc loop gang vector
            for( int i = 0; i < m; i++ ) {...}

The compiler then uses a two-dimensional grid and two-dimensional blocks, which is exactly what I want:

         67, #pragma acc loop gang, vector(2) /* blockIdx.y threadIdx.y */
         70, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */

However, if I use the parallel construct instead of kernels, I get an error message and the inner loop schedule is ignored:

PGC-S-0155-Nested loops cannot have the same worksharing type  (file.c: 67)
67, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */

Why do I get this error when the same schedule apparently works nicely (and as expected) with the kernels construct?
How can I get two-dimensional grids and two-dimensional blocks with the parallel construct?
Bye, Sandra

Any news?

Sandra: This is the defined behavior for the parallel construct. Its loop directive is more like the OpenMP loop construct (omp for in C, omp do in Fortran), so nested loops cannot share the same worksharing type. The kernels construct leaves the mapping to the compiler, which essentially allows tiling. For the parallel construct, we’re adding an explicit tile clause for nested loops in the next OpenACC version, which should give you the behavior you want.
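
For illustration, here is a minimal sketch of the two options with the parallel construct. It assumes a compiler that supports the tile clause as specified in OpenACC 2.0 and later. The function names, the flat row-major array layout, and the tile sizes 32 and 4 are made up for this example and not taken from the original code. Without an OpenACC compiler the pragmas are ignored and the loops simply run serially.

```c
#include <stddef.h>

/* Hypothetical kernel: add 1.0f to each element of an n-by-m row-major array.
   Option 1: give each loop a different worksharing type (gang outside,
   vector inside), which the parallel construct accepts. */
void add_one_gang_vector(float *a, int n, int m)
{
    #pragma acc parallel loop gang copy(a[0:n*m])
    for (int j = 0; j < n; j++) {
        #pragma acc loop vector
        for (int i = 0; i < m; i++) {
            a[(size_t)j * m + i] += 1.0f;
        }
    }
}

/* Option 2: one loop directive with a tile clause covering both loops.
   The loops must be tightly nested. As I read the spec, the first tile
   size (32) applies to the innermost loop (i) and the second (4) to j;
   both sizes are arbitrary illustrative choices. */
void add_one_tiled(float *a, int n, int m)
{
    #pragma acc parallel loop tile(32, 4) copy(a[0:n*m])
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            a[(size_t)j * m + i] += 1.0f;
        }
    }
}
```

How the tiles end up mapped to blockIdx/threadIdx is still up to the compiler, so it is worth checking the -Minfo=accel output to see whether you actually get the 2D grid and 2D blocks you are after.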