cudaBindTexture2D fails with "invalid device pointer" when given a device pointer returned by cudaHostGetDevicePointer.
Is it possible to texture from host-mapped memory? There is no indication either way in the programming guide or the reference manual.
Last August I opened a bug for it, #583022. I think it should be fixed in 3.0.
The following items have been modified for this Bug:
Bug disposition changed from “Bug - Fixed” to “Bug - Fixed”
Bug Information
Customer Bug ID:
NVIDIA Bug ID: 583022
Date: 8/4/2009 4:41:42 AM
Company/Division: GPU Computing
Severity: High
Priority: 1 - High
Synopsis: Texture over pinned memory fails.
Description: Hi,
I get "invalid device pointer" when trying to bind a texture to a device pointer obtained from memory allocated with cudaHostAlloc and the cudaHostAllocMapped flag (mapped pinned memory).
Is there a way to do this?
Reproduction code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
// includes, project
#include <cutil_inline.h>
#include <cuda.h>
texture<float2, 1, cudaReadModeElementType> tex_LargeFloat2; // texture reference bound below

int main(int argc, char** argv)
{
    unsigned int flags = cudaHostAllocMapped;
    float *fData_h = NULL; // host pointer from cudaHostAlloc
    float *fData_d = NULL; // device alias of the mapped allocation
    unsigned int iSamples = 2001;
    unsigned int iSize = 2 * 189676;
    iSize *= iSamples; // iSize = 759,083,352
    cudaSetDevice(1);
    cudaSetDeviceFlags(cudaDeviceMapHost);
    printf("Allocating [%zu] bytes...\n", iSize * sizeof(float));
    cudaHostAlloc((void **)&fData_h, iSize * sizeof(float), flags);
    cudaThreadSynchronize();
    cudaHostGetDevicePointer((void **)&fData_d, (void *)fData_h, 0);
    cudaThreadSynchronize();
    cudaBindTexture(0, tex_LargeFloat2, fData_d, iSize); // This fails with invalid device pointer [17]
    cudaThreadSynchronize();
    cudaFreeHost(fData_h); // free the host allocation, not the device alias
}
Compilation done with CUDA 2.3 beta:
1>"C:\CUDA\bin\nvcc.exe" -arch sm_13 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 8\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\CUDA\include" -I"../../common/inc" -maxrregcount=32 --compile -o "test.cu.obj" "test.cu"
-------------------- Additional Information ------------------------
Computer Type: PC
Bus Type: AGP
Driver Version: 190.15
Products: other
Latest Comment update from NVIDIA (8/21/2009 10:43:28 AM):
Eyal, a fix for this bug is in place and will be available with a future release.
BTW - Paulius from NVIDIA (I hope I've spelled it right :) ) suggested a temporary workaround: use an un-textured pinned-memory buffer, copy the data into a second device allocation with a DeviceToDevice copy (which should be fast), and bind the texture to that second allocation.
The downside is that it's a hack, requires twice the memory for the dataset, and is probably a bit slower :)
eyal
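A minimal sketch of that staging-copy workaround, assuming a 1D float texture; the texture name, helper function, and sizes here are illustrative, not from the original bug report:

```cuda
// Workaround sketch: keep the mapped pinned buffer for host/device traffic,
// but texture from a separate ordinary device allocation filled with a
// device-to-device copy.

texture<float, 1, cudaReadModeElementType> tex_Data; // hypothetical texture reference

void bindViaStagingCopy(float *fMapped_d, size_t numFloats)
{
    float *fStaging_d = NULL;
    size_t bytes = numFloats * sizeof(float);

    // Second device allocation: this is the "twice the dataset" cost noted above.
    cudaMalloc((void **)&fStaging_d, bytes);

    // Copy from the mapped device alias into real device memory.
    cudaMemcpy(fStaging_d, fMapped_d, bytes, cudaMemcpyDeviceToDevice);

    // Binding to an ordinary cudaMalloc allocation succeeds.
    cudaBindTexture(0, tex_Data, fStaging_d, bytes);
}
```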
Good to hear it will be fixed!
We are actually looking at not doing any copies at all: just using the texture in our first kernel and having it read from mapped host memory.
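For reference, once binding to mapped memory works, that zero-copy path might look like the following sketch; the kernel, texture name, and launch configuration are all hypothetical:

```cuda
// Zero-copy sketch: bind the texture directly to the mapped device pointer
// and read it with tex1Dfetch in the first kernel -- no staging copy.

texture<float, 1, cudaReadModeElementType> tex_Mapped; // illustrative name

__global__ void firstKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(tex_Mapped, i); // each fetch travels over the bus on demand
}

void launchZeroCopy(float *fData_h, float *out_d, int n)
{
    float *fData_d = NULL;
    // fData_h must come from cudaHostAlloc(..., cudaHostAllocMapped),
    // with cudaDeviceMapHost set before the context was created.
    cudaHostGetDevicePointer((void **)&fData_d, (void *)fData_h, 0);
    cudaBindTexture(0, tex_Mapped, fData_d, n * sizeof(float));
    firstKernel<<<(n + 255) / 256, 256>>>(out_d, n);
}
```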