Texturing from Zero-Copy Memory

cudaBindTexture2D fails with "invalid device pointer" when given a device pointer returned by cudaHostGetDevicePointer.

Is it possible to texture from host-mapped (zero-copy) memory? Neither the programming guide nor the reference manual says either way.

Last August I opened a bug for this, #583022. I think it should be fixed in 3.0.


The following items have been modified for this Bug:

  • Bug disposition changed from “Bug - Fixed” to “Bug - Fixed”

  • Bug action changed from “QA - Open - Verify to close” to “QA - Open - Verify to close”

  • Customer Status changed from “Open” to “Open”

  • A new Comment has been added


Bug Information


Customer Bug ID:

NVIDIA Bug ID: 583022

        Date: 8/4/2009 4:41:42 AM

Company/Division: GPU Computing

    Severity: High

    Priority: 1 - High

    Synopsis: Texture over pinned memory fails.

 Description: Hi,

I get “Invalid device pointer” when trying to bind a texture to a device pointer obtained from memory allocated with cudaHostAlloc and the cudaHostAllocMapped flag (a pinned, mapped host pointer).

Is there a way to do this?

Reproducing code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// includes, project
#include <cutil_inline.h>
#include <cuda.h>

// Texture reference (omitted from the original paste)
texture<float2, 1, cudaReadModeElementType> tex_LargeFloat2;

int main(int argc, char** argv)
{
    unsigned int flags = cudaHostAllocMapped;
    float *fData_h = NULL;   // host pointer (declaration omitted from the original paste)
    float *fData_d = NULL;
    unsigned int iSamples = 2001;
    unsigned int iSize = 2 * 189676;
    iSize *= iSamples;       // iSize = 759,083,352 elements

    cudaSetDevice(1);
    cudaSetDeviceFlags(cudaDeviceMapHost);

    printf("Allocating [%u] bytes...\n", (unsigned int)(iSize * sizeof(float)));
    cudaHostAlloc((void **)&fData_h, iSize * sizeof(float), flags);
    cudaThreadSynchronize();

    cudaHostGetDevicePointer((void **)&fData_d, (void *)fData_h, 0);
    cudaThreadSynchronize();

    // Size argument is in bytes; this fails with invalid device pointer [17]
    cudaBindTexture(0, tex_LargeFloat2, fData_d, iSize * sizeof(float));
    cudaThreadSynchronize();

    cudaFreeHost(fData_h);   // free via the host pointer, not the device alias
}

Compilation done with CUDA 2.3 beta:

1>"C:\CUDA\bin\nvcc.exe" -arch sm_13 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 8\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\CUDA\include" -I"../../common/inc" -maxrregcount=32 --compile -o "test.cu.obj" "test.cu"

-------------------- Additional Information ------------------------

Computer Type: PC

Bus Type: AGP

Driver Version: 190.15

Products: other

Latest Comment update from NVIDIA (8/21/2009 10:43:28 AM):

Eyal, a fix for this bug is in place and will be available with a future release.

BTW - Paulius from NVIDIA (I hope I’ve spelled that right :) ) suggested a temporary workaround: keep the data in untextured pinned memory, copy it with a device-to-device memcpy (which should be fast) into a regular device allocation, and bind the texture to that second allocation.
The downside is that it’s a hack: it requires twice the memory for the dataset and is probably a bit slower :)
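The workaround above can be sketched roughly as follows. This is a minimal sketch, not Eyal’s actual code: the texture name follows the repro, N is a placeholder size, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Texture reference bound to a regular device allocation, not the mapped pointer
texture<float, 1, cudaReadModeElementType> tex_LargeFloat2;

int main()
{
    const size_t N = 1 << 20;                // placeholder element count
    const size_t bytes = N * sizeof(float);

    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *fData_h = NULL, *fData_map = NULL, *fData_dev = NULL;
    cudaHostAlloc((void **)&fData_h, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&fData_map, fData_h, 0);

    // Second, ordinary device allocation that the texture CAN bind to
    cudaMalloc((void **)&fData_dev, bytes);

    // Device-to-device copy from the mapped pointer into the plain allocation...
    cudaMemcpy(fData_dev, fData_map, bytes, cudaMemcpyDeviceToDevice);

    // ...and bind the texture to that second allocation instead
    cudaBindTexture(0, tex_LargeFloat2, fData_dev, bytes);

    // ... launch kernels that fetch through tex_LargeFloat2 ...

    cudaUnbindTexture(tex_LargeFloat2);
    cudaFree(fData_dev);
    cudaFreeHost(fData_h);
    return 0;
}
```

As the comment says, this costs a second copy of the dataset in device memory plus the transfer, which is exactly why it is only a stopgap.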

eyal

Good to hear it will be fixed!

We are actually looking at not doing any copies at all: our first kernel would use the texture and fetch directly from mapped host memory.
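Plain (non-textured) reads from mapped host memory already work in a kernel, so the pattern being aimed for is roughly the following, minus the texture fetch. This is a minimal sketch; the kernel, names, and sizes are placeholders, not the actual application code.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel reading zero-copy (mapped host) memory directly; each access
// travels over the PCIe bus rather than hitting device DRAM
__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}

int main()
{
    const int n = 1024;
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *in_h, *out_h, *in_d, *out_d;
    cudaHostAlloc((void **)&in_h,  n * sizeof(float), cudaHostAllocMapped);
    cudaHostAlloc((void **)&out_h, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) in_h[i] = (float)i;

    // Device-side aliases of the pinned host buffers
    cudaHostGetDevicePointer((void **)&in_d,  in_h,  0);
    cudaHostGetDevicePointer((void **)&out_d, out_h, 0);

    scale<<<(n + 255) / 256, 256>>>(in_d, out_d, n);
    cudaThreadSynchronize();   // results are visible in out_h after this

    printf("out_h[1] = %f\n", out_h[1]);
    cudaFreeHost(in_h);
    cudaFreeHost(out_h);
    return 0;
}
```

The open question in this thread is whether the same device-side alias can also be bound to a texture reference, so the first kernel gets texture caching and filtering on top of the zero-copy read.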