You could try these examples to see one way of using unified memory:
I don’t know for sure, but my understanding so far is that memory meant to be shared has to be allocated with a special allocator, so the allocation should be done beforehand.
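For what it’s worth, here is a minimal sketch of that idea, assuming plain CUDA managed memory via cudaMallocManaged (the buffer size and names are just placeholders):

#include <cstdio>
#include "cuda_runtime.h"

int main() {
    float* buf = nullptr;
    const size_t bytes = 1024 * sizeof(float);

    /* Managed allocation: the same pointer is valid on host and device */
    cudaError_t err = cudaMallocManaged(&buf, bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    buf[0] = 42.0f;   /* plain CPU write, no cudaMemcpy needed */
    cudaFree(buf);
    return 0;
}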
I don’t know of any way to test whether a buffer has been allocated in unified memory. There may be ways I’m not aware of, but I think the probability of running into an unexpected unified memory buffer in your code is very low, so is it even worth checking?
I’d also suggest the following structure for your example:
//
// SimpleTestGPU.…
Be aware that the CUDA machinery can take a long time to set up the first time, up to a few seconds.
Put the call in a loop and you’ll probably find that the subsequent convolutions are much faster.
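For example, a sketch of that kind of loop (assuming the cudaarithm module; the sizes and the cv::cuda::add call are arbitrary, any GPU operation would show the same effect):

#include <cstdio>
#include "opencv2/core.hpp"
#include "opencv2/cudaarithm.hpp"

int main() {
    for (int i = 0; i < 5; ++i) {
        cv::TickMeter tm;
        tm.start();
        /* the first iteration also pays the one-time CUDA setup cost */
        cv::cuda::GpuMat src(1024, 1024, CV_32F, cv::Scalar(1.0f));
        cv::cuda::GpuMat dst;
        cv::cuda::add(src, cv::Scalar(1.0f), dst);
        cv::cuda::Stream::Null().waitForCompletion();
        tm.stop();
        std::printf("iteration %d: %.3f ms\n", i, tm.getTimeMilli());
    }
    return 0;
}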
[EDIT: Just checked now with this code:
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include "cuda_runtime.h"
#include "opencv2/core.hpp"
#include "opencv2/cudaarithm.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui.hpp"
int main() {
    /* Convolution kernel in unified memory */
    const int kern_h…
In short, first allocate unified memory. You can then use the same address for both CPU and GPU processing, for example reading data into it from the CPU and then transforming it on the GPU.
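A minimal sketch of that flow, assuming cudaMallocManaged for the allocation and the cudaarithm module for the GPU step (all names and sizes are illustrative):

#include <cstdio>
#include "cuda_runtime.h"
#include "opencv2/core.hpp"
#include "opencv2/cudaarithm.hpp"

int main() {
    const int rows = 4, cols = 4;
    float *src = nullptr, *dst = nullptr;

    /* 1) Allocate unified (managed) memory before any CPU/GPU use */
    cudaMallocManaged(&src, rows * cols * sizeof(float));
    cudaMallocManaged(&dst, rows * cols * sizeof(float));

    /* 2) The same pointers can back a cv::Mat (CPU view)... */
    cv::Mat cpuSrc(rows, cols, CV_32F, src);
    cv::Mat cpuDst(rows, cols, CV_32F, dst);
    cpuSrc.setTo(cv::Scalar(1.0f));                     /* read from CPU into it */

    /* 3) ...and a cv::cuda::GpuMat (GPU view), with no explicit copy */
    cv::cuda::GpuMat gpuSrc(rows, cols, CV_32F, src);
    cv::cuda::GpuMat gpuDst(rows, cols, CV_32F, dst);
    cv::cuda::add(gpuSrc, cv::Scalar(10.0), gpuDst);    /* transform it on the GPU */

    /* 4) Synchronize before touching the result from the CPU again */
    cudaDeviceSynchronize();
    std::printf("dst(0,0) = %f\n", cpuDst.at<float>(0, 0));

    cudaFree(src);
    cudaFree(dst);
    return 0;
}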