I’m trying to run inference on a YOLOv2 network with input shape (1, 608, 608, 3) and output shape (1, 19, 19, 5, 6). The network is NHWC-formatted, and there are transpose layers on the input and output to handle this.
I prepare the image buffer as follows, where the device buffer is a void*[2] passed to the function as void* (&deviceBuffer)[2]. Note the input mat is RGB-formatted, not BGR.
cv::cuda::GpuMat resized, flMat;
// resize to the network input size (separate dst; in-place CUDA resize is unsafe)
cv::cuda::resize(irImg, resized, cv::Size(608, 608), 0, 0, cv::INTER_CUBIC);
// convert to float and normalise to [0, 1] in one step
resized.convertTo(flMat, CV_32FC3, 1.0 / 255.0);
float* devicePtr;
cudaMalloc(reinterpret_cast<void**>(&devicePtr), bufferSize[0]);
// wrap the packed staging buffer; copyTo performs the pitched-to-packed copy
cv::cuda::GpuMat deviceMat(flMat.rows, flMat.cols, CV_32FC3, devicePtr);
flMat.copyTo(deviceMat);
cudaError_t err = cudaMemcpy(deviceBuffer[0], devicePtr, bufferSize[0], cudaMemcpyDeviceToDevice);
if (err != cudaSuccess) {
    DLOG(INFO) << "Buffer load failed: " << err;
    cudaFree(devicePtr);
    return false;
}
cudaFree(devicePtr);
I then execute inference using the void*[2] in/out buffers and attempt to decode the output with a 3D loop over the 19, 19, 5 dims.
The output index is calculated as follows, where i is the index into the last (size-6) dimension:
dim.d[1] * (dim.d[2] * (dim.d[3] * r + c) + b) + i
With the loop looking like:
for (int r = 0; r < dataDim.d[1]; r++) {
    for (int c = 0; c < dataDim.d[2]; c++) {
        for (int b = 0; b < dataDim.d[3]; b++) {
            float tp = data[calculateIdx(r, c, b, 4, dataDim)];
            float prob = sigmoid(tp);
            if (prob < 0.5f) { continue; }
The decoded output is completely incorrect: almost all grid cells return a detection probability above the threshold.
Any thoughts would be appreciated.