cudaGraphicsGLRegisterImage memory usage


I am seeing unexpectedly high GPU memory usage in my app when using CUDA/OpenGL interop.

size_t free1, free2, total;
cudaMemGetInfo(&free1, &total);

glTexImage2D(GL_TEXTURE_2D, 0, GL_R16, w, h, 0, GL_RED, GL_UNSIGNED_SHORT, NULL);
//glClearTexImage(m_textureIn, 0, GL_RED, GL_UNSIGNED_SHORT, NULL);
cudaGraphicsGLRegisterImage(&m_cudaResourceIn, m_textureIn, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsWriteDiscard);

cudaMemGetInfo(&free2, &total);
size_t usage = free1 - free2;
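
To see which call accounts for the extra memory, each step could also be measured on its own. The following is only a sketch: `bytesConsumedBy` and `queryFree` are hypothetical names, not part of CUDA or my real code, and `queryFree` stands in for a small wrapper around cudaMemGetInfo. It also guards against the unsigned underflow that `free1 - free2` would hit if free memory happened to grow in between.

```cpp
#include <cstddef>
#include <functional>

// Returns how many bytes of free memory disappeared while `step` ran.
// `queryFree` is an assumed callback that reports the currently free bytes
// (in the real app it would wrap cudaMemGetInfo).
std::size_t bytesConsumedBy(const std::function<std::size_t()>& queryFree,
                            const std::function<void()>& step) {
    const std::size_t before = queryFree();
    step();
    const std::size_t after = queryFree();
    // size_t is unsigned, so clamp to zero instead of wrapping around
    // when free memory increased during the step.
    return before > after ? before - after : 0;
}

// Hypothetical usage (needs a current GL context and the CUDA runtime):
//   auto queryFree = [] { size_t f = 0, t = 0; cudaMemGetInfo(&f, &t); return f; };
//   size_t regBytes = bytesConsumedBy(queryFree, [&] {
//       cudaGraphicsGLRegisterImage(&m_cudaResourceIn, m_textureIn,
//                                   GL_TEXTURE_2D, cudaGraphicsRegisterFlagsWriteDiscard);
//   });
```

Since cudaMemGetInfo reports free memory for the whole device, other processes using the GPU can disturb such a measurement, so the numbers are only indicative.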

IMHO the memory usage should be (and it is, if I comment out the cudaGraphicsGLRegisterImage call):
(w * h * 2) B

But according to my measurement (GeForce GTX 970, Windows 10 x64, latest drivers) it is:
(w * h * 2) * 2 B

So it is twice as large as I expected.
BTW, if I uncomment the glClearTexImage line, the usage is even higher:
(w * h * 2) * 3 B
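
To make the numbers concrete, here is the arithmetic as a small sketch. The helper names and the 4096 x 4096 size are made up for illustration; the only thing taken from the code above is that GL_R16 stores one 16-bit channel, i.e. 2 bytes per texel, and I am assuming a tightly packed texture with no mipmaps or padding.

```cpp
#include <cassert>
#include <cstddef>

// Expected size of a tightly packed 2D texture (no mipmaps, no padding).
// bytesPerTexel is 2 for GL_R16.
std::size_t expectedTextureBytes(std::size_t w, std::size_t h,
                                 std::size_t bytesPerTexel) {
    assert(bytesPerTexel > 0);
    return w * h * bytesPerTexel;
}

// How many texture-sized copies a measured memory delta corresponds to.
double copiesAllocated(std::size_t measuredBytes, std::size_t w, std::size_t h,
                       std::size_t bytesPerTexel) {
    return static_cast<double>(measuredBytes) /
           static_cast<double>(expectedTextureBytes(w, h, bytesPerTexel));
}
```

For a hypothetical 4096 x 4096 texture, expectedTextureBytes(4096, 4096, 2) is 33554432 bytes (32 MiB). The factors I measured would then correspond to 64 MiB after registering the texture and 96 MiB with glClearTexImage enabled.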

I would expect CUDA and OpenGL to operate on the same buffer without any additional memory allocation. This looks like buffer orphaning.
Is there an explanation for this behaviour?