I wrote a small program to test a few issues I was having with OpenACC and C++ classes. The program creates a C-array called data_ on the heap, fills every element of the array with the value 1.0, copies the class member variables into GPU memory, and then tries to print some values from the C-array using printf() statements in the process_data() function. The code is as follows:
#include <cstddef>
#include <stdio.h>

class DataProcessor {
public:
    DataProcessor();
    ~DataProcessor();
    void process_data();
private:
    unsigned int num_rows_;
    unsigned int num_cols_;
    double *data_;
};

DataProcessor::DataProcessor() {
    unsigned int num_rows_ = 10;
    unsigned int num_cols_ = 10;
    data_ = new double[num_rows_*num_cols_];
    for(std::size_t i = 0; i < num_rows_; ++i) {
        for(std::size_t j = 0; j < num_cols_; ++j) {
            data_[i*num_rows_ + j] = 1.0;
        }
    }
    #pragma acc enter data copyin(this)
    #pragma acc enter data copyin(num_rows_)
    #pragma acc enter data copyin(num_cols_)
    #pragma acc enter data copyin(data_[0:num_rows_*num_cols_])
}

DataProcessor::~DataProcessor() {
    #pragma acc exit data delete(num_rows_)
    #pragma acc exit data delete(num_cols_)
    #pragma acc exit data delete(data_[0:num_rows_*num_cols_])
    #pragma acc exit data delete(this)
    delete[] data_;
}

void DataProcessor::process_data() {
    int num_beams = 1;
    int count = 0;
    #pragma acc data copyin(num_beams) copy(count)
    {
        #pragma acc parallel loop
        for(std::size_t i = 0; i < num_beams; ++i) {
            count++;
            printf("data_[0]: %f \n", data_[0]);
        }
    }
    printf("count = %i \n", count);
}

int main() {
    DataProcessor data_processor;
    data_processor.process_data();
    return 0;
}
The program can be compiled with the following command:
pgc++ -g -acc -ta=tesla -Minfo=accel data_processor.cpp
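Regarding the count variable in process_data(): whatever the exact reason the value is lost here, a bare count++ on a single scalar inside a parallel loop becomes a data race as soon as more than one iteration can run concurrently. The usual OpenACC way to accumulate into a scalar is a reduction clause. The following is a minimal sketch of that idiom, not a confirmed fix for the behavior above; count_iterations is a hypothetical helper that is not part of the original program, and when compiled without -acc the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Hypothetical helper (not in the original program): counts loop iterations
// on the device using an explicit sum reduction instead of count++.
int count_iterations(int num_beams) {
    assert(num_beams >= 0 && "num_beams must be non-negative");
    int count = 0;
    // reduction(+:count) gives each parallel lane its own partial sum and
    // combines the partial sums into count when the loop finishes.
    #pragma acc parallel loop reduction(+:count)
    for (int i = 0; i < num_beams; ++i) {
        count += 1;
    }
    return count;
}
```

Under this sketch, count_iterations(1) should return 1 and count_iterations(1000) should return 1000, with or without OpenACC enabled.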
Looking at the output of the process_data() function when running the program, I’ve noticed a few strange issues:
- Even though the count variable is copied into and out of the OpenACC data region using copy(count), its value is still 0 when it is printed at the end of the function using printf("count = %i \n", count). I expected it to be 1, since num_beams = 1 and the for loop therefore runs only once, incrementing count by 1.
- The num_beams variable is set to 1, copied into the OpenACC data region, and then used as the bound of the subsequent for loop. However, the statement printf("data_[0]: %f \n", data_[0]) prints data_[0]: 1.000000 to the console 10 times instead of once. When I replace for(std::size_t i = 0; i < num_beams; ++i) with for(std::size_t i = 0; i < 1; ++i), data_[0]: 1.000000 is printed only once, as expected. Since num_beams = 1, I would have expected the same output in both cases.
- When the program ends, I sometimes get an error similar to the following:
(null) lives at 0x19c5e70 size 10296506880 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 6.1, threadid=1
host:0x19c5e70 device:0x7f991ecfa600 size:800 presentcount:0+1 line:34 name:(null)
host:0x7ffdd7cda708 device:0x7f991ecfa400 size:4 presentcount:0+1 line:34 name:num_cols_
host:0x7ffdd7cda70c device:0x7f991ecfa200 size:4 presentcount:0+1 line:34 name:num_rows_
allocated block device:0x7f991ecfa200 size:512 thread:1
allocated block device:0x7f991ecfa400 size:512 thread:1
allocated block device:0x7f991ecfa600 size:1024 thread:1
deleted block device:0x7f991ecfa000 size:512 threadid=1
FATAL ERROR: variable in data clause is partially present on the device: name=(unknown)
file:/home/alex/Desktop/OpenACC Tests/data_processor.cpp _ZN13DataProcessorD1Ev line:39
Judging by the last line of the error, it occurs at the last line of the destructor, delete[] data_;. However, since data_ is deleted from the GPU with #pragma acc exit data delete(data_[0:num_rows_*num_cols_]) before delete[] data_ is called, I’m surprised an error occurs at all. I’m also surprised that the error only occurs sometimes, rather than deterministically on every run.
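One detail in the constructor may be relevant to this error (a reading of the code, not a confirmed diagnosis). The lines unsigned int num_rows_ = 10; and unsigned int num_cols_ = 10; declare new local variables that shadow the class members, so the members num_rows_ and num_cols_ are never initialized. The copyin directives in the constructor therefore use the locals (the num_rows_ and num_cols_ entries in the present table have host addresses that look like stack addresses), while the destructor computes data_[0:num_rows_*num_cols_] from the uninitialized members. An arbitrary extent, such as the size 10296506880 in the dump, would be consistent with a "partially present" error that only appears on some runs. The sketch below sets the members through the initializer list instead. The class name DataProcessorSketch and the accessors rows(), cols(), and at() are hypothetical additions for illustration, and dropping the separate copyin(num_rows_)/copyin(num_cols_) assumes that copyin(this) already copies the whole object, members included.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of DataProcessor (renamed to avoid confusion with the
// original): the sizes are set in the member initializer list, so no local
// variables shadow num_rows_ and num_cols_.
class DataProcessorSketch {
public:
    DataProcessorSketch() : num_rows_(10), num_cols_(10), data_(nullptr) {
        data_ = new double[num_rows_ * num_cols_];
        for (std::size_t i = 0; i < num_rows_; ++i) {
            for (std::size_t j = 0; j < num_cols_; ++j) {
                // Row-major indexing uses the column count as the stride.
                data_[i * num_cols_ + j] = 1.0;
            }
        }
        // Copy the object (assumed to include num_rows_ and num_cols_),
        // then the array it points to, in the same order as the original.
        #pragma acc enter data copyin(this)
        #pragma acc enter data copyin(data_[0:num_rows_*num_cols_])
    }

    ~DataProcessorSketch() {
        // The members now hold the same values used for copyin, so this
        // deletes exactly the section that was copied in.
        #pragma acc exit data delete(data_[0:num_rows_*num_cols_])
        #pragma acc exit data delete(this)
        delete[] data_;
    }

    // The class owns a raw buffer, so copying is disabled.
    DataProcessorSketch(const DataProcessorSketch&) = delete;
    DataProcessorSketch& operator=(const DataProcessorSketch&) = delete;

    unsigned int rows() const { return num_rows_; }
    unsigned int cols() const { return num_cols_; }
    double at(std::size_t i, std::size_t j) const {
        assert(i < num_rows_ && j < num_cols_);
        return data_[i * num_cols_ + j];
    }

private:
    unsigned int num_rows_;
    unsigned int num_cols_;
    double *data_;
};
```

Built without -acc, the pragmas are ignored and the class behaves as ordinary C++, which makes it easy to check that rows() and cols() now report 10 rather than whatever happened to be in memory.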
So far I’ve been left scratching my head at the behavior described above, and I’m wondering whether I’m doing something incorrectly that causes these issues. In case it’s useful, I’m running the code on Ubuntu 18.04, and pgc++ --version gives the following compiler information:
pgc++ (aka nvc++) 20.11-0 LLVM 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Any help would be appreciated, thanks.