<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
      <title>Tagged with gpu-debugging - NVIDIA Developer Forums</title>
      <link>http://forums.developer.nvidia.com/devforum/discussions/tagged/gpu-debugging/feed.rss</link>
      <pubDate>Wed, 16 May 2012 17:32:22 -0400</pubDate>
         <description>Tagged with gpu-debugging - NVIDIA Developer Forums</description>
   <language>en-CA</language>
   <atom:link href="/devforum/discussions/tagged/gpu-debugging/feed.rss" rel="self" type="application/rss+xml" />
   <item>
      <title>ArrayFire + Nsight???</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7996/arrayfire-nsight</link>
      <pubDate>Wed, 09 May 2012 21:08:48 -0400</pubDate>
      <dc:creator>sizheng</dc:creator>
      <guid isPermaLink="false">7996@/devforum/discussions</guid>
      <description><![CDATA[I'm trying ArrayFire, but it seems that I cannot debug ArrayFire code with Nsight.<br /><br />Does anybody know how to?]]></description>
   </item>
      <item>
      <title>Parallel Nsight 2.2 RC2 - No Source Available</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7886/parallel-nsight-2-2-rc2-no-source-available</link>
      <pubDate>Fri, 04 May 2012 16:35:28 -0400</pubDate>
      <dc:creator>wdrozd</dc:creator>
      <guid isPermaLink="false">7886@/devforum/discussions</guid>
      <description><![CDATA[When I select CUDA Debugging with the memory checker enabled, I get a kernel crash and a window in Nsight that says "No Source Available". When I click the link "Browse to Find Source", a message says "The source code cannot be displayed".<br /><br />My application compiles fine, so clearly it can find the source (for both the C code and the CUDA code).<br /><br />Also, I have no problem stopping at a breakpoint set in my kernel prior to the crash (grid launch failure).<br /><br />The call stack says "No active Cuda Kernels".<br /><br />Can you please help me determine how to set up Nsight to find the source?<br /><br />Thanks.]]></description>
   </item>
      <item>
      <title>Matrix multiplication doesn&#039;t work! Output always different...</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7861/matrix-multiplication-doesnt-work-output-always-different-</link>
      <pubDate>Fri, 04 May 2012 07:47:28 -0400</pubDate>
      <dc:creator>Z0K4</dc:creator>
      <guid isPermaLink="false">7861@/devforum/discussions</guid>
      <description><![CDATA[Hello everyone,<br /><br />I recently started playing with CUDA, and I want to write a kernel that multiplies matrices. I searched and found that the SDK has an example for matrix multiplication. The problem is that I get a different output on every run, meaning that the resulting matrix is always different. How is that possible? I was also unable to integrate nvcc with Visual Studio, so I couldn't use the debugger to see what went wrong. Any help is much appreciated! Here is the code:<br /><br /><code>#include&lt;stdio.h&gt;<br />#include&lt;cuda.h&gt;<br />#include&lt;cuda_runtime.h&gt;<br />#include&lt;cuda_runtime_api.h&gt;<br />#include&lt;device_functions.h&gt;<br /><br />static void HandleError(cudaError_t err, const char *file, int line)<br />{<br />    if(err!=cudaSuccess){<br />		printf("%s in %s file at line %s\n", cudaGetErrorString(err), file, line);<br />		exit(EXIT_FAILURE);<br />    }<br />}<br /><br />#define HANDLE_ERROR(err) (HandleError(err, __FILE__, __LINE__))<br /><br />#ifndef _MATRIXMUL_KERNEL_H_<br />#define _MATRIXMUL_KERNEL_H_<br /><br />#define BLOCK_SIZE 4<br />#define TILE_SIZE 4<br /><br />__global__ void matrixMul( int* A, int* B, int* C, int wA, int wB)<br />{<br />	int bx = blockIdx.x;<br />    int by = blockIdx.y;<br /><br />	int tx = threadIdx.x;<br />	int ty = threadIdx.y;<br /><br /><br />	int aBegin = wA * BLOCK_SIZE * by;<br /><br />	int aEnd   = aBegin + wA - 1;<br /><br />	int aStep  = BLOCK_SIZE;<br /><br />	int bBegin = BLOCK_SIZE * bx;<br /><br />	int bStep  = BLOCK_SIZE * wB;<br /><br />	float Csub=0;<br /><br />	for (int a = aBegin, b = bBegin; a &lt;= aEnd; a += aStep, b += bStep) <br />	{<br />		__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];<br /><br />		__shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];<br /><br />		As[ty][tx] = A[a + wA * ty + tx];<br />		Bs[ty][tx] = B[b + wB * ty + tx];<br /><br />		__syncthreads();<br /><br />#pragma unroll<br /><br />		for (int k = 0; k &lt; BLOCK_SIZE; ++k)<br />			Csub += As[ty][k] * Bs[k][tx];<br /><br />		__syncthreads();<br />	}<br /><br />	int c = wB * BLOCK_SIZE * by + BLOCK_SIZE * bx;<br />	C[c + wB * ty + tx] = Csub;<br />}<br /><br />#endif<br /><br />int main()<br />{<br />	int *a=(int*)malloc(BLOCK_SIZE*BLOCK_SIZE*sizeof(int));<br />	int *b=(int*)malloc(BLOCK_SIZE*BLOCK_SIZE*sizeof(int));<br />	int *c=(int*)malloc(BLOCK_SIZE*BLOCK_SIZE*sizeof(int));<br /><br />	int *dev_a, *dev_b, *dev_c;<br /><br />	HANDLE_ERROR(cudaMalloc((void**)&amp;dev_a, BLOCK_SIZE*BLOCK_SIZE*sizeof(int*)));<br />	HANDLE_ERROR(cudaMalloc((void**)&amp;dev_b, BLOCK_SIZE*BLOCK_SIZE*sizeof(int*)));<br />	HANDLE_ERROR(cudaMalloc((void**)&amp;dev_c, BLOCK_SIZE*BLOCK_SIZE*sizeof(int*)));<br /><br />	for(int i=0; i&lt;BLOCK_SIZE*BLOCK_SIZE; i++)<br />	{<br />		a[i]=1;<br />		b[i]=2;<br />	}<br /><br />	HANDLE_ERROR(cudaMemcpy(dev_a, a, BLOCK_SIZE*BLOCK_SIZE*sizeof(int), cudaMemcpyHostToDevice));<br />	HANDLE_ERROR(cudaMemcpy(dev_b, b, BLOCK_SIZE*BLOCK_SIZE*sizeof(int), cudaMemcpyHostToDevice));<br /><br />	matrixMul&lt;&lt;&lt;BLOCK_SIZE, BLOCK_SIZE&gt;&gt;&gt;(dev_a, dev_b, dev_c, BLOCK_SIZE, BLOCK_SIZE);<br /><br />	HANDLE_ERROR(cudaMemcpy(c, dev_c, BLOCK_SIZE*BLOCK_SIZE*sizeof(int), cudaMemcpyDeviceToHost));<br /><br />	for(int i=0; i&lt;BLOCK_SIZE*BLOCK_SIZE; i++)<br />	{<br />		if(i%BLOCK_SIZE==0)<br />			printf("\n\n");<br />		printf("%d\t", a[i]);<br />	}<br /><br />	for(int i=0; i&lt;BLOCK_SIZE*BLOCK_SIZE; i++)<br />	{<br />		if(i%BLOCK_SIZE==0)<br />			printf("\n\n");<br />		printf("%d\t", b[i]);<br />	}<br /><br />	for(int i=0; i&lt;BLOCK_SIZE*BLOCK_SIZE; i++)<br />	{<br />		if(i%BLOCK_SIZE==0)<br />			printf("\n\n");<br />		printf("%d\t", c[i]);<br />	}<br /><br />	cudaFree(dev_a);<br />	cudaFree(dev_b);<br />	cudaFree(dev_c);<br /><br />	return 0;<br />}</code><br /><br /><br />Any suggestions?<br /><br />]]></description>
   </item>
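   <!--
   Editor's sketch for the matrix multiplication post above. When a kernel returns different values on every run, one common way to narrow the problem down is to compare the device result against a host-side reference. The block below is a minimal sketch of such a reference. It mirrors the tile and index arithmetic of the posted kernel (aBegin, aStep, bBegin, bStep and the final write to C), but the function name matmul_reference and its parameters are hypothetical and do not appear in the post. It assumes row-major storage and dimensions that are multiples of the tile size.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper (not from the post): CPU reference for C = A * B.
   // A is hA x wA, B is wA x wB, both row-major; all dimensions are assumed
   // to be multiples of `tile`. The loop variables are named after the
   // kernel's blockIdx (bx, by) and threadIdx (tx, ty) values.
   std::vector<int> matmul_reference(const std::vector<int>& A,
                                     const std::vector<int>& B,
                                     int hA, int wA, int wB, int tile)
   {
       std::vector<int> C(static_cast<std::size_t>(hA) * wB, 0);
       for (int by = 0; by < hA / tile; ++by) {
           for (int bx = 0; bx < wB / tile; ++bx) {
               for (int ty = 0; ty < tile; ++ty) {
                   for (int tx = 0; tx < tile; ++tx) {
                       const int aBegin = wA * tile * by;
                       const int aEnd = aBegin + wA - 1;
                       int sum = 0;
                       // Step through the tiles of A (along a row) and of B
                       // (down a column), as the kernel's outer loop does.
                       for (int a = aBegin, b = tile * bx; a <= aEnd;
                            a += tile, b += tile * wB) {
                           for (int k = 0; k < tile; ++k) {
                               // As[ty][k] * Bs[k][tx] without shared memory.
                               sum += A[a + wA * ty + k] * B[b + wB * k + tx];
                           }
                       }
                       C[wB * tile * by + tile * bx + wB * ty + tx] = sum;
                   }
               }
           }
       }
       return C;
   }
   ```

   With the inputs from the post (4 x 4 matrices, every element of a set to 1 and of b set to 2), each element of the reference result is 8, so any other value copied back from the device points at the GPU side. One thing that may be worth checking there: the kernel reads threadIdx.y and blockIdx.y, while the launch in the post passes plain integers as <<<BLOCK_SIZE, BLOCK_SIZE>>>, which gives one-dimensional blocks and a one-dimensional grid.
   -->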
      <item>
      <title>Portable pinned memory and multiple GPUs: Performance and stability</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7386/portable-pinned-memory-and-multiple-gpus-performance-and-stability</link>
      <pubDate>Sun, 22 Apr 2012 13:47:59 -0400</pubDate>
      <dc:creator>tbenson</dc:creator>
      <guid isPermaLink="false">7386@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I am having some problems using portable pinned memory to share one pinned buffer between multiple GPUs.  I have two separate issues:<br /><br />	 1) Performance of transfers for the GPUs not corresponding to the allocation context are massively degraded; and<br />	 2) It tends to crash my Linux host and force a reboot.<br /><br />I included the code at the end.  There are several flags at the top of the source file to control behavior, including NDEVICES, NBUFFERS, and USE_PINNED_MEMORY.  NDEVICES is the number of GPUs to use, NBUFFERS is the number of buffers to be allocated, and USE_PINNED_MEMORY determines whether or not the buffers are pinned.  The case that fails is NDEVICES = 2, NBUFFERS = 1, and USE_PINNED_MEMORY = true. If I use as many buffers as devices, then things work with or without pinned memory.  It also works without pinned memory for any number of buffers.  However, with the failing case, I get the following:<br /><br />[host:portable_pinned]$ ./portable <br />id = 0, cudaMemcpy time = 22.32 ms<br />id = 0, val = 3.000000 (should be 3.000000)<br />id = 1, cudaMemcpy time = 5457.76 ms<br />id = 1, val = 6.000000 (should be 6.000000)<br /><br />Message from syslogd@host at Apr 22 13:25:16 ...<br /> kernel:[41786.826763] Stack:<br /><br />Message from syslogd@host at Apr 22 13:25:16 ...<br /> kernel:[41786.828257] Call Trace:<br /><br />Message from syslogd@host at Apr 22 13:25:16 ...<br /> kernel:[41786.845154] Code: f6 62 00 85 c0 74 10 e8 69 e4 65 00 0f 1f 00 eb 06 89 77 6c 89 4f 70 48 83 c5 10 5b c3 41 54 53 48 83 ec 08 48 83 ed 08 41 89 f4 &lt;39&gt; 77 6c 73 17 39 77 70 0f 87 ac 00 00 00 39 77 6c 73 09 39 77 <br /><br />The host at this point is only partially responsive and needs to be rebooted.  The system log is full of errors, but a sampling is attached.  This is using driver version 285.05.33, CUDA 4.1, Fedora 14, and kernel 2.6.35.6-45.  
The GPUs are two Tesla C2050s that reside in a Tesla S2050 compute server.  They are connected to the host via a single PCI-e cable.  This is a single host in a cluster, so updating the driver is not trivial, although I will do so if this is a known bug.<br /><br />In any case, I suspect that the kernel/driver error is just a bug as I have done something similar in the past without this problem.  However, I still had the poor performance in the past.  Above, the PCIe transfer to the CUDA context in which the allocation was not made is over 200x slower than the transfer for the context in which the allocation was made.  Is this normal?  The documentation just says that cudaHostAllocPortable allows pinned memory to be recognized by other contexts, but does not mention the performance implications of accessing the memory.<br /><br />Thanks for any help/comments,<br /><br />Tom<br /><br />The code is below.  The Timing class is just a wrapper that I have for host timing.  I can include it if needed, but already had to rework this email due to character limitations.  
The references can be commented out to compile the code.<br /><br /><code><br />#include &lt;cuda_runtime.h&gt;<br />#include &lt;cassert&gt;<br />#include &lt;cstdio&gt;<br />#include &lt;pthread.h&gt;<br />#include "timing.hpp"<br /><br />namespace<br />{<br />    const size_t BUFSIZE = 32*1024*1024;<br />    const int NDEVICES = 2;<br />    const int NBUFFERS = 1;<br />    const bool USE_PINNED_MEMORY = true;<br />}<br /><br />struct Params<br />{<br />    float *buf;<br />    int id;<br />};<br /><br />__global__ void test_kernel(float *buf, float val) { buf[0] = val; }<br /><br />void *gpu_thread(void *v)<br />{<br />    Params *params = (Params *) v;<br /><br />    cudaSetDevice(params-&gt;id);<br /><br />    float *dev_buf;<br />    assert(cudaMalloc((void **) &amp;dev_buf, sizeof(float)*BUFSIZE) == cudaSuccess);<br /><br />    double start = Timing::ElapsedTimeMs();<br />    assert(cudaMemcpy(dev_buf, params-&gt;buf, sizeof(float)*BUFSIZE, cudaMemcpyHostToDevice) == cudaSuccess);<br />    double elapsed = Timing::ElapsedTimeMs() - start;<br />    printf("id = %d, cudaMemcpy time = %.2f ms\n", params-&gt;id, elapsed);<br /><br />    test_kernel&lt;&lt;&lt;1,1&gt;&gt;&gt;( dev_buf, (params-&gt;id+1) * 3.0f );<br />    assert(cudaThreadSynchronize() == cudaSuccess);<br /><br />    float retval;<br />    assert(cudaMemcpy(&amp;retval, dev_buf, sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess);<br /><br />    printf("id = %d, val = %f (should be %f)\n", params-&gt;id, retval, (params-&gt;id+1)*3.0f);<br /><br />    assert(cudaFree(dev_buf) == cudaSuccess);<br /><br />    return NULL;<br />}<br /><br />int main(int argc, char **argv)<br />{<br />   
 float *pinned[NDEVICES];<br /><br />    assert(NBUFFERS &lt;= NDEVICES);<br /><br />    for (int i = 0; i &lt; NBUFFERS; ++i)<br />    {<br />        assert(cudaSetDevice(i) == cudaSuccess);<br />        if (USE_PINNED_MEMORY)<br />        {<br />            assert(cudaHostAlloc((void **) &amp;pinned[i], sizeof(float)*BUFSIZE, cudaHostAllocPortable) == cudaSuccess);<br />        }<br />        else<br />        {<br />            pinned[i] = new float[BUFSIZE];<br />        }<br />        for (size_t k = 0; k &lt; BUFSIZE; ++k) { pinned[i][k] = 1.0f; }<br />    }<br /><br />    pthread_t tid[NDEVICES];<br />    Params params[NDEVICES];<br /><br />    for (int i = 0; i &lt; NDEVICES; ++i)<br />    {<br />        params[i].id = i;<br />        params[i].buf = pinned[i%NBUFFERS];<br />        assert(pthread_create(tid+i, NULL, gpu_thread, (void *) &amp;params[i]) == 0);<br />    }<br /><br />    for (int i = 0; i &lt; NDEVICES; ++i)<br />    {<br />        assert(pthread_join(tid[i], NULL) == 0);<br />    }<br /><br />    for (int i = 0; i &lt; NBUFFERS; ++i)<br />    {<br />        if (USE_PINNED_MEMORY)<br />        {<br />            assert(cudaFreeHost(pinned[i]) == cudaSuccess);<br />        }<br />        else<br />        {<br />            delete [] pinned[i];<br />        }<br />    }<br /><br />    return 0;<br />}<br /></code>]]></description>
   </item>
      <item>
      <title>Failed launching kernel</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7851/failed-lunching-kernal</link>
      <pubDate>Thu, 03 May 2012 19:52:11 -0400</pubDate>
      <dc:creator>Saouli</dc:creator>
      <guid isPermaLink="false">7851@/devforum/discussions</guid>
      <description><![CDATA[Hi again,<br />After all I have learned about CUDA and how to use it, I am still not good enough at it. I wrote a CUDA kernel for ray casting to render some DICOM files; it's OK.<br />I used shared memory and texture memory.<br /><code><br />void CallCUDAKernel(dim3 gridDim, dim3 blockDim,unsigned int *Outputi, int *Winds,float *Spacing, int *VolDim,float *Boxmin,float *Boxmax,float *UP,float *AT,<br />	                                float *OThreshold,float *Omega,float *angle, float *CamPos)<br /><br />{<br /><br />	RaycastingRender&lt;&lt;&lt;gridDim, blockDim&gt;&gt;&gt;(Outputi, Winds,Spacing, VolDim,Boxmin,Boxmax,UP,AT,<br />	                               OThreshold,Omega, angle,CamPos);<br />}<br /></code><br />Something like that. Before I call my kernel, I allocate all the variables in global device memory using<br />cudaMalloc and cudaMemcpy. Note that some variables have to be copied from a complex host structure to the device.<br /><br />I don't know why, but my kernel stops and gives me this error:<br />cuda kernel : (11) invalid argument]]></description>
   </item>
      <item>
      <title>Getting started, price-worthy hardware?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7891/getting-started-price-worthy-hardware</link>
      <pubDate>Fri, 04 May 2012 18:15:05 -0400</pubDate>
      <dc:creator>AniSkywalker</dc:creator>
      <guid isPermaLink="false">7891@/devforum/discussions</guid>
      <description><![CDATA[Hi!<br /><br /><br /><br />I'm new to both this forum and CUDA, but it is very much in my line of interest. I already know both asm and some GPU programming (floating-point arithmetic etc.) with asm. <br /><br /><br /><br />I want to start with CUDA programming. I'm searching for price-worthy, CUDA 4 compatible hardware. Since I am very new to the subject, I'd like to be directed to hardware choices that give relevant experience when writing CUDA code. That is, if dual GPUs or dual CPUs are beneficial, I'd like to be pointed towards good and price-worthy solutions there. If a single GPU/CPU solution is a good enough place to start and get experience (say 35000-50000 lines of code), then I'd go with it. And if there is some solution that works for now and is upgradeable, I might go with it.<br /><br /><br /><br />Just to be clear, I wouldn't ask here if I wasn't entirely new to this, so please don't mock me if some of my questions are plain stupid. I just don't know better ways to phrase them right now...]]></description>
   </item>
      <item>
      <title>Batch testing with Parallel Nsight</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7806/batch-testing-with-parallel-nsight</link>
      <pubDate>Wed, 02 May 2012 20:20:01 -0400</pubDate>
      <dc:creator>nunosilva800</dc:creator>
      <guid isPermaLink="false">7806@/devforum/discussions</guid>
      <description><![CDATA[Hello.<br />I'm building an OpenGL program that is basically a visualizer, and I would like to test it under various configurations (number of lights, model to load, and textures) to assess performance, scalability, etc.<br /><br />So I would like to know how I can write a script that defines a batch of tests, so that I can leave it running them during the night and analyze the results the next day. <br />I've found the TestRunner.exe program in C:\Program Files (x86)\NVIDIA Parallel Nsight 2.2\Common, but I don't know what parameters to use with it. <br /><br />I've searched through the user guide and the web, but I can't find anything resembling batch testing with Nsight.<br /><br />How can I do it?<br />Thanks.]]></description>
   </item>
      <item>
      <title>Does current cuda-gdb allow single GPU debugging like Nsight 2.2? Will CUDA 5 support it?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7611/does-current-cuda-gdb-allow-single-gpu-debugging-like-nsight-2-2-in-cuda-5-will-support-it</link>
      <pubDate>Thu, 26 Apr 2012 19:31:13 -0400</pubDate>
      <dc:creator>oscarbg</dc:creator>
      <guid isPermaLink="false">7611@/devforum/discussions</guid>
      <description><![CDATA[Nsight 2.2 now supports single-GPU debugging via so-called software preemption. Does cuda-gdb support the same technology on Linux or Mac? If not, will it support it soon? It seems GTC will unveil Nsight for Mac and Linux, so I hope it is added there too, as I think it will use cuda-gdb underneath.]]></description>
   </item>
      <item>
      <title>Which version of the NVAPI should I use with my Quadro FX 570 card?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7876/which-version-of-the-nvapi-should-i-use-with-my-quadro-fx-570-card</link>
      <pubDate>Fri, 04 May 2012 15:19:38 -0400</pubDate>
      <dc:creator>braggo</dc:creator>
      <guid isPermaLink="false">7876@/devforum/discussions</guid>
      <description><![CDATA[Dell desktop, Windows 7<br />Nvidia Quadro FX 570<br /><br />I am currently using the R295 (February 2012) release of the NVAPI, and some of the functions are returning error code -9, NVAPI_INCOMPATIBLE_STRUCT_VERSION.<br /><br />Specifically, NvAPI_DISP_SetDisplayConfig() and NvAPI_DISP_GetDisplayConfig() return this error when I pass in a NV_DISPLAYCONFIG_PATH_INFO struct.<br /><br />Here is my debug output:<br /><br />NvAPI_Initialize(): mStatus = NVAPI_OK<br />STRUCT Versions:<br />NV_DISPLAY_DRIVER_VERSION_VER                  = 65676<br />NV_DISPLAY_PORT_INFO_VER                       = 65580<br />NV_DISPLAYCONFIG_PATH_INFO_VER                 = 131100<br />NV_DISPLAYCONFIG_PATH_ADVANCED_TARGET_INFO_VER = 65664<br />NvAPI_SYS_GetChipSetInfo(&amp;mChipSetInfo): mStatus = NVAPI_OK<br /> -- ChipSet Info --<br />Device Id      = 10720<br />HBdeviceId     = 10720<br />Vendor Name    = Intel<br />struct version = 262376<br />NvAPI_GetInterfaceVersionString(): mStatus = NVAPI_OK<br />NvAPI Version = NVidia Complete Version 1.10<br />NvAPI_SYS_GetDriverAndBranchVersion(): mStatus = NVAPI_OK<br />GPU Driver Version = 27533<br />Branch String = r275_21<br /><br />Is the GPU driver version incompatible with the API version macros?<br /><br />Thank you in advance for your help!]]></description>
   </item>
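   <!--
   Editor's sketch for the NVAPI post above. The *_VER values printed in the debug output look opaque as decimal numbers, but they can be split into a struct size and a version number. The block below is a minimal sketch of that decoding. It assumes the packing used by the MAKE_NVAPI_VERSION macro in nvapi.h, which stores sizeof(struct) in the low 16 bits and the version in the high 16 bits. The names NvStructVersion and decode_nvapi_version are hypothetical and are not part of NVAPI.

   ```cpp
   #include <cassert>
   #include <cstdint>

   // Hypothetical helper type (not part of NVAPI): the two fields that an
   // NVAPI struct version value is assumed to pack together.
   struct NvStructVersion {
       std::uint32_t version;  // high 16 bits: struct version number
       std::uint32_t size;     // low 16 bits: sizeof the struct in bytes
   };

   // Splits a packed value such as NV_DISPLAYCONFIG_PATH_INFO_VER, assuming
   // the layout sizeof(type) | (ver << 16).
   constexpr NvStructVersion decode_nvapi_version(std::uint32_t packed)
   {
       return NvStructVersion{packed >> 16, packed & 0xFFFFu};
   }
   ```

   Applied to the values in the post, 131100 (NV_DISPLAYCONFIG_PATH_INFO_VER) decodes to version 2 with a size of 28 bytes, while 65676, 65580 and 65664 all decode to version 1. If that reading is right, the R295 header asks for version 2 of NV_DISPLAYCONFIG_PATH_INFO, and a driver from the older r275 branch may not recognise it, which would be consistent with NVAPI_INCOMPATIBLE_STRUCT_VERSION.
   -->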
      <item>
      <title>Issue using NSight 2.2 rc2 with VS11</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7796/issue-using-nsight-2-2-rc2-with-vs11</link>
      <pubDate>Wed, 02 May 2012 15:19:27 -0400</pubDate>
      <dc:creator>diver182</dc:creator>
      <guid isPermaLink="false">7796@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I installed VS11 (Ultimate) beta on my Win7 machine,<br />after that the 4.2x SDK and then Parallel NSight 2.2 rc2.<br />The NSight installer claimed to have made modifications to the VS11 installation.<br />But I can neither find the NSight menu on the upper pane nor the templates<br />for creating a cuda 4.2 project.<br /><br />Do I have to adjust anything to make it work or what did I miss?]]></description>
   </item>
      <item>
      <title>HW Debug Support</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7751/hw-debug-support</link>
      <pubDate>Tue, 01 May 2012 19:41:18 -0400</pubDate>
      <dc:creator>Vector</dc:creator>
      <guid isPermaLink="false">7751@/devforum/discussions</guid>
      <description><![CDATA[Which GPUs have the hardware debug support needed for Nsight 2.2's single-GPU debug feature?<br />Thanks<br />]]></description>
   </item>
      <item>
      <title>Timing asynch streams with cudaDeviceSynchronize and two events</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7746/timing-asynch-streams-with-cudadevicesynchronize-and-two-events</link>
      <pubDate>Tue, 01 May 2012 18:20:21 -0400</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">7746@/devforum/discussions</guid>
      <description><![CDATA[Here is what the docs say about cudaEventRecord on stream 0:<br />Records an event. If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed.<br /><br /><code>      cudaStream_t *streamid;<br />      streamid = (cudaStream_t*)malloc(nstreams*sizeof(cudaStream_t));<br />      float elapsed;<br />      cudaEvent_t start, stop;<br />      cudaEventCreate(&amp;start);<br />      cudaEventCreate(&amp;stop);<br />      cudaDeviceSynchronize();<br />      cudaEventRecord(start,0);<br />      /*invoke device kernel*/<br />       for(i=0;i&lt;nstreams;i++){<br />          cudaStreamCreate(&amp;(streamid[i]));<br />          orcu_kernel&lt;&lt;&lt;dimGrid,dimBlock,0,streamid[i]&gt;&gt;&gt;(n,dev_y,dev_x);<br />      }<br />      cudaDeviceSynchronize();<br />      cudaEventRecord(stop,0);<br />      cudaEventSynchronize(stop);<br />      cudaEventElapsedTime(&amp;elapsed,start,stop);<br />      cudaEventDestroy(start);<br />      cudaEventDestroy(stop);<br />      for(i=0;i&lt;nstreams;i++){<br />         cudaStreamDestroy(streamid[i]);<br />      }</code><br /><br />According to the API docs, as long as we record the events on the zero stream it is fine. <br /><br />I am checking whether a more experienced CUDA dev knows if this is a legitimate way of timing multiple streams.<br /><br />Update: <br />Also, according to the literature out there, I don't need cudaDeviceSynchronize() after the kernel call.<br /><br />Thanks all. ]]></description>
   </item>
      <item>
      <title>More info on Nsight plugin for GPU HW debugging of C++ AMP code..</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7616/more-info-on-nsight-plugin-for-gpu-hw-debugging-of-c-amp-code-</link>
      <pubDate>Thu, 26 Apr 2012 19:34:26 -0400</pubDate>
      <dc:creator>oscarbg</dc:creator>
      <guid isPermaLink="false">7616@/devforum/discussions</guid>
      <description><![CDATA[It seems Nsight 2.2 ships with a plugin for GPU hardware debugging of C++ AMP code. I have installed it but have not managed to get GPU debugging working. Which driver does it need (301.32 WDDM 1.1 or 296.17 WDDM 1.2)? Does it support single-GPU debugging, or does it need two GPUs? Does it also support an ATI card as the secondary card driving the display?]]></description>
   </item>
      <item>
      <title>2 GTX265 cards not detected for debugging</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6796/2-gtx265-cards-not-detected-for-debugging</link>
      <pubDate>Mon, 09 Apr 2012 01:11:30 -0400</pubDate>
      <dc:creator>vinaybgavirangaswamy</dc:creator>
      <guid isPermaLink="false">6796@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />Thank you for reading this and taking the time to help me configure my system for debugging.<br /><br />I am trying to do line-by-line debugging on a system with 2 gtx 265 cards. I have listed my system configuration below in case it helps:<br /><br />Motherboard: gigabyte 990fxa-ud3<br />Graphics cards: gtx265 (one from zotech and the other from asus)<br />OS: Windows 7 (64 bit)<br />Graphics driver: devdriver_4.1_winvista-win7_64_286.19_general<br />CUDA toolkit: cudatoolkit_4.1.28_win_64<br />Parallel Nsight: Parallel_Nsight_Win64_2.1.0.12046<br />SLI: disabled<br />WDDM TDR enabled: false<br />WPF hardware acceleration: disabled by running the registry file in the Nsight common folder<br /><br />I am not able to make the other graphics card headless, as I do not have that option in the NVIDIA control panel. I have attached a few pictures of what I see on the control panel screen.<br /><br />Please help me, as I am trying to use this for a course project...<br /><br />Thank you in advance!]]></description>
   </item>
      <item>
      <title>I can&#039;t debug: Connection to Nsight monitor on &quot;mycomputer&quot; failed</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7691/i-cant-debug-connection-to-nsight-monitor-on-mycomputer-failed</link>
      <pubDate>Sun, 29 Apr 2012 11:57:49 -0400</pubDate>
      <dc:creator>morringo</dc:creator>
      <guid isPermaLink="false">7691@/devforum/discussions</guid>
      <description><![CDATA[Hi everyone, I'm new to CUDA programming and I can't get the example on this page to work: <a href="http://www.pgroup.com/lit/articles/insider/v3n2a3.htm">Example</a>.<br /><br />I have a DELL XPS L501X with Nvidia Optimus, TDR Level set to 0, CUDA Toolkit 4.1, Nvidia Parallel Nsight 2.1, and Visual Studio 2010 with SP1. WPF is turned off, Aero is turned off, and I have gone through all the steps that I found on the web.<br /><br />I followed these steps: <a href="http://aresio.blogspot.mx/2012/03/using-nsight-with-gtx-590.html">Using Nsight with a GTX 590</a><br /><br />Any ideas?<br />Thanks.]]></description>
   </item>
      <item>
      <title>The results generated by the code are different every time it is compiled (GTX 590)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7631/the-results-generate-by-code-are-always-different-in-different-compiling-process-gtx-590</link>
      <pubDate>Thu, 26 Apr 2012 23:11:38 -0400</pubDate>
      <dc:creator>lust0yixiong</dc:creator>
      <guid isPermaLink="false">7631@/devforum/discussions</guid>
      <description><![CDATA[The results generated by my code are different every time it is compiled. That is, the results change when I compile the program at different times. I have checked my program many times and can't find the mistake. So I wonder whether I need some special settings for the GTX 590, because it is a dual-core card that I have never used before.<br />]]></description>
   </item>
      <item>
      <title>CUDA debugger locals window shows only first 4 elements of an int array</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7036/cuda-debugger-locals-window-shows-only-first-4-elements-of-an-int-array</link>
      <pubDate>Sat, 14 Apr 2012 14:45:10 -0400</pubDate>
      <dc:creator>tvandervlies</dc:creator>
      <guid isPermaLink="false">7036@/devforum/discussions</guid>
      <description><![CDATA[When I run "addWithCuda" (from the default VS2010 template) with CUDA debugging and put a breakpoint in addKernel, the breakpoint is hit and the locals are shown, but only four elements of the arrays a, b and c are displayed. The real size of the arrays is 5 elements.<br /><br />Even when I change the kernel declaration to:<br />__global__ void addKernel(int c[5], const int a[5], const int b[5])<br />still only 4 elements are shown in the locals window instead of 5.<br /><br />Why isn't the real size of the array used?<br /><br />]]></description>
   </item>
      <item>
      <title>One question about Parallel Nsight graphics debugging locally</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7341/one-question-about-parallel-nsight-graphics-debugging-locally</link>
      <pubDate>Thu, 19 Apr 2012 16:29:46 -0400</pubDate>
      <dc:creator>swxjs</dc:creator>
      <guid isPermaLink="false">7341@/devforum/discussions</guid>
      <description><![CDATA[My program runs in VS2010 without problems, but when I run Parallel Nsight to debug the graphics, I get this warning: "Parallel Nsight Debug Shader Debugging and Pixel History are disabled when running locally."<br /><br />My Parallel Nsight version is 2.1.<br />Graphics driver version is 296.35.<br />My notebook is a Dell Latitude D630.<br />GPU: Quadro NVS 135M<br />OS: Win7 32-bit.]]></description>
   </item>
      <item>
      <title>Breakpoints not being hit in NSight 2.1(and 2.0)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/1701/breakpoints-not-being-hit-in-nsight-2-1and-2-0</link>
      <pubDate>Sun, 20 Nov 2011 14:35:35 -0500</pubDate>
      <dc:creator>dblack</dc:creator>
      <guid isPermaLink="false">1701@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I am currently trying to debug some D3D11 shaders; however, when I set breakpoints in my project they are not triggered. (The variance shadow mapping sample works fine.)<br /><br />There are a couple of things that differ in my main project, and I am not sure whether they should work:<br /><br />* Shaders are pre-built, but they have the correct flags set (e.g. debug, skip opt, prefer flow control). One thing which might cause an issue is that the content isn't built on the host, so the path name supplied when building is not available. Still, Nsight says it has symbols, and when I click the shader it opens a file in the temp dir.<br /><br />* My solution is loaded over a share from the client, then it is copied back using Nsight's built-in synchronization. (The reason for this funny setup is that I only have one machine with D3D11-capable hardware.)<br /><br />FWIW, exactly the same thing happens with 2.0. Frame profiling and timing work fine.<br /><br />Thanks,<br /><br />David]]></description>
   </item>
      <item>
      <title>Single GPU debugging for Nsight RC 2.2</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6946/single-gpu-debugging-for-nsight-rc-2-2</link>
      <pubDate>Thu, 12 Apr 2012 00:10:28 -0400</pubDate>
      <dc:creator>wdrozd</dc:creator>
      <guid isPermaLink="false">6946@/devforum/discussions</guid>
      <description><![CDATA[Hello, I notice that in the release notes there is now support for local debugging on a single GPU. Are there instructions on how to set this up? The Nsight 2.2 user guide seemed to refer to the 2 GPU local debugging setups only.]]></description>
   </item>
      <item>
      <title>Parallel Nsight 2.2 Release Candidate 1 is available!</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6936/parallel-nsight-2-2-release-candidate-1-is-available</link>
      <pubDate>Wed, 11 Apr 2012 18:33:03 -0400</pubDate>
      <dc:creator>Rafael Campana</dc:creator>
      <guid isPermaLink="false">6936@/devforum/discussions</guid>
      <description><![CDATA[The NVIDIA Parallel Nsight development team is proud to announce Release Candidate 1 of Parallel Nsight™ 2.2. This new release brings support for single GPU Debugging for CUDA developers on systems equipped with any GPU that supports hardware GPU debugging. For graphics developers, Nsight now supports DirectX 9 in the Frame Debugger, Frame Profiler, Analysis and Nsight HUD.<br /><br />For more information about the release, and how to download it, please visit: <br /><a href="http://developer.nvidia.com/content/nvidia-parallel-nsight-22-rc1-now-available" target="_blank" rel="nofollow">http://developer.nvidia.com/content/nvidia-parallel-nsight-22-rc1-now-available</a> ]]></description>
   </item>
      <item>
      <title>how do we debug cuda in visual studio???</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6451/how-do-we-debug-cuda-in-visual-studio</link>
      <pubDate>Tue, 27 Mar 2012 13:33:58 -0400</pubDate>
      <dc:creator>vishalthelegend</dc:creator>
      <guid isPermaLink="false">6451@/devforum/discussions</guid>
      <description><![CDATA[How do we debug CUDA code in Visual Studio 8.0 step by step, the way we step through host code with F10? Or what is the procedure for step-by-step execution of CUDA code?<br /><br />A reply is appreciated.]]></description>
   </item>
      <item>
      <title>What is the meaning of a cudaUnknownError returned from cudaGraphicsGLRegisterImage?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6416/what-is-the-meaning-of-a-cudaunknownerror-returned-from-cudagraphicsglregisterimage</link>
      <pubDate>Tue, 27 Mar 2012 04:34:23 -0400</pubDate>
      <dc:creator>mkastrop</dc:creator>
      <guid isPermaLink="false">6416@/devforum/discussions</guid>
      <description><![CDATA[Hi there,<br /><br />I have a valid GL_TEXTURE_3D object that is accessible with id nTextureID. Some facts on it:<br />- internal format set to GL_LUMINANCE8_ALPHA8<br />- data format set to GL_LUMINANCE_ALPHA<br />- data type set to GL_UNSIGNED_BYTE<br /><br />When I run this code I get a cudaUnknownError in return:<br /><code><br />cudaGraphicsResource* pGraphicResource = 0;<br />cudaError_t eError = cudaGraphicsGLRegisterImage(&amp;pGraphicResource, nTextureID, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore); // eError = cudaUnknownError<br /></code><br /><br />As you might infer from the call above, I use the CUDA Runtime API. My graphics device is a single NVIDIA Quadro 600, which should have compute capability 2.1 (--&gt; surface writes are supported).<br /><br />An additional short question: Is it necessary to call cudaSetDevice and cudaGLSetGLDevice at least once in your code? I tried this but ended up with "all CUDA-capable devices are busy or unavailable" on my first CUDA call (e.g. cudaMalloc). As a result I left them out, since I found that the currently used device remains my Quadro 600 with or without these calls.]]></description>
   </item>
      <item>
      <title>nsight debug session of opengl application missing extensions</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6006/nsight-debug-session-of-opengl-application-missing-extensions</link>
      <pubDate>Fri, 16 Mar 2012 07:03:06 -0400</pubDate>
      <dc:creator>phpfreaked9</dc:creator>
      <guid isPermaLink="false">6006@/devforum/discussions</guid>
      <description><![CDATA[I have an OpenGL application that I would like to profile with Nsight. During my debug session I noticed that a lot of extensions are not initialized. For the initialization I use GLEW, and the application works without Nsight attached. Examples of extensions that remain uninitialized are the frame-buffer extensions.<br /><br />I have reinstalled the latest drivers and disabled WDDM TDR. The monitor states it has been properly configured for debugging. However, the issue persists. Could anybody lend me a hand with this?]]></description>
   </item>
      <item>
      <title>Issue reading environment variables on remote machine</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6051/issue-reading-environment-variables-on-remote-machine</link>
      <pubDate>Sat, 17 Mar 2012 14:54:23 -0400</pubDate>
      <dc:creator>diver182</dc:creator>
      <guid isPermaLink="false">6051@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I'm using Parallel Nsight in a remote setup (laptop -&gt; remote machine).<br />Both systems use the same software versions (Cuda Toolkit 4.1/Nsight 2.1/Windows 7)<br />The application requires the use of environment variables to retrieve paths to test data.<br /><br />The following part of a routine causes an assertion failure,<br />which does not appear when executing the application either on the laptop or the remote machine explicitly<br />(meaning without Parallel Nsight, by executing the application directly on either laptop or remote machine):<br /><br /><code><br />char * val;<br />size_t reqSize;<br /><br />// env_var := name of the environment variable to retrieve the value for<br />getenv_s(&amp;reqSize, NULL, 0, env_var); <br /><br />val = (char*)malloc(reqSize * sizeof(char));<br />if (!val)<br />   _RPT0(_CRT_ERROR, "Failed to ...");<br /><br />getenv_s(&amp;reqSize, val, reqSize, env_var);<br />// ...<br /></code><br /><br />The assertion failure states:<br />"File: f:\dd\vctools\crt_bld\self_x86\crt\ src\getenv.c Line:266<br />Expression: (buffer != NULL &amp;&amp; sizeInTChars &gt; 0) || (buffer == NULL &amp;&amp; sizeInTChars == 0)"<br /><br />(Please note that drive f: does not exist on either the laptop or the remote machine).<br /><br />What could be the cause for this behaviour and how would I fix this?<br /><br />Suggestions greatly appreciated. ]]></description>
   </item>
      <item>
      <title>Parallel Nsight Win32 windows xp</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6116/parallel-nsight-win32-windows-xp</link>
      <pubDate>Mon, 19 Mar 2012 18:17:32 -0400</pubDate>
      <dc:creator>gpuhomedev</dc:creator>
      <guid isPermaLink="false">6116@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />Does anybody know where I can find "Parallel Nsight Win32" for Windows XP, or whether it's possible to hack the installer to force-install "Parallel_Nsight_Win32_2.1.0.12046" on XP?<br /><br />Thanks,]]></description>
   </item>
      <item>
      <title>Instrumented driver for profiling on Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5256/instrumented-driver-for-profiling-on-linux</link>
      <pubDate>Wed, 29 Feb 2012 05:58:11 -0500</pubDate>
      <dc:creator>thorfdbg</dc:creator>
      <guid isPermaLink="false">5256@/devforum/discussions</guid>
      <description><![CDATA[Dear NVidia team,<br /><br />looking currently into OpenCL development, I'm missing a method for profiling my kernel code which runs slower than expected. The nvvp debugger on Linux is of less help than expected as it cannot collect all necessary data for a full analysis [I get an error saying "CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT"]. I believe this might be because I'm not using an instrumented X driver for my 560GT graphics card. However, the latest available driver I could find, NVPerfKit-Linux-x86_64-195.36.31, does not support the Fermi-based chips. Where would I find newer instrumented drivers and/or a fully functional nvvp profiler?]]></description>
   </item>
      <item>
      <title>Enabling Double Precision on Tesla C2050 Using arch=sm_20</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6101/enabling-double-precision-on-tesla-c2050-using-archsm_20</link>
      <pubDate>Mon, 19 Mar 2012 08:53:09 -0400</pubDate>
      <dc:creator>waltee1000</dc:creator>
      <guid isPermaLink="false">6101@/devforum/discussions</guid>
      <description><![CDATA[Hello All,<br />I tried to declare a double in my .cu file and compile the code using -arch=sm_20. The code compiles without any error or warning, but when I run it I get the following message and the run aborts.<br />*** glibc detected *** ./a.out: double free or corruption (out): 0x0000000000a83da0 ***<br />======= Backtrace: =========<br />/lib/libc.so.6[0x7f38fb293928]<br />/lib/libc.so.6(cfree+0x76)[0x7f38fb295a36]<br />./a.out[0x41b744]<br />/lib/libc.so.6(__libc_start_main+0xe6)[0x7f38fb23e1a6]<br />./a.out(__gxx_personality_v0+0xa1)[0x401449]<br />======= Memory map: ========<br /><br />If I replace the double with float, everything works fine: the code compiles and runs to completion with correct answers.<br /><br />I saw on the internet that the only change needed to make a double data type work is to compile the code with the -arch=sm_20 argument. Why is it not working even after setting the argument?<br /><br />Regards,<br /><br />Walter]]></description>
   </item>
      <item>
      <title>A problem with setting up a CUDA card in Windows</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6041/a-problem-with-setting-up-a-cuda-card-in-windows</link>
      <pubDate>Fri, 16 Mar 2012 23:21:19 -0400</pubDate>
      <dc:creator>Echilon</dc:creator>
      <guid isPermaLink="false">6041@/devforum/discussions</guid>
      <description><![CDATA[I have recently become interested in CUDA programming, and decided that I would like to work on it on my desktop. I have originally invested in AMD graphics cards mainly for playing games. Now I have also invested in a GTX 560Ti for CUDA programming.<br /><br />I am currently developing on Windows, and my predicament is that Windows does not recognize the GTX 560Ti. Originally, when I first put in the GTX 560Ti, Windows recognized it, however a program I had installed on the desktop was complaining about my AMD driver, and so I went about fixing it(I had a few issues earlier with the specific graphics card I have and the drivers not playing well together). Once I had fixed the issue, I then turned my attention towards the GTX 560Ti.<br /><br />I placed the GTX 560Ti in the original spot I placed it in and started up the machine. Once booted and logged in, I tried to install WHQL 296.10 and it told me that there was no NVIDIA hardware in the machine. I checked in Windows Device Manager, and sure enough, it did not list the GTX 560Ti. I forced the Device Manager to scan for changes in hardware, but to no avail. I then shut down the machine and tried reseating the card(and checking if I put it in the right slot) multiple times, without success. The question I have is, should I move the GTX 560Ti to the primary PCI-E x16 slot and install the driver, and then put in the original set up, or is there some clever way to get Windows to see that I have the GTX560Ti in the machine?<br /><br />I still plan on using the AMD cards for my graphical heavy lifting and every day use.]]></description>
   </item>
      <item>
      <title>CUDA on GTX 560TI  Launch in Timeout (Watchdog TDR) ... even if I don&#039;t use it as display card ??</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5931/cuda-on-gtx-560ti-launch-in-timeout-watchdog-tdr-even-if-i-dont-use-it-as-display-card-</link>
      <pubDate>Wed, 14 Mar 2012 21:04:16 -0400</pubDate>
      <dc:creator>ericbeaumier</dc:creator>
      <guid isPermaLink="false">5931@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I know about the TDR (watchdog) problem when running long parallel jobs in CUDA on a GTX GPU...<br /><br />But I have bought a second card (ATI 5870) especially for my dual monitors, leaving my GTX 560 Ti free to run only CUDA GPU jobs. No cables at all are connected to my GTX 560 Ti.<br /><br />I use Cudafy.NET now, and when I look at GPUProperties, I see the only available device (GTX 560).<br /><br />When I run, I still get a timeout... What can I set to allow my GTX 560 to run longer than the 2-5 second limit? I have tried TdrDelay and other parameters in the registry, without real success. How can we apply special settings only to my GTX 560 to allow long-running jobs?<br /><br /><br /><a href="http://forums.nvidia.com/index.php?showtopic=176465" target="_blank" rel="nofollow">http://forums.nvidia.com/index.php?showtopic=176465</a><br /><a href="http://www.sevenforums.com/crashes-debugging/51028-help-me-configure-registry-correctly-solve-tdr-issue.html" target="_blank" rel="nofollow">http://www.sevenforums.com/crashes-debugging/51028-help-me-configure-registry-correctly-solve-tdr-issue.html</a><br />]]></description>
   </item>
      <item>
      <title>Nsight: how to set texture name?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3126/nsight-how-to-set-texture-name</link>
      <pubDate>Thu, 05 Jan 2012 21:39:43 -0500</pubDate>
      <dc:creator>steel3d</dc:creator>
      <guid isPermaLink="false">3126@/devforum/discussions</guid>
      <description><![CDATA[I'd like to know how to name my resources such that I can see them in nsight captures.<br /><br />Thanks]]></description>
   </item>
      <item>
      <title>NSight: Crashes</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5566/nsight-crashes</link>
      <pubDate>Wed, 07 Mar 2012 10:59:25 -0500</pubDate>
      <dc:creator>MatthiasK</dc:creator>
      <guid isPermaLink="false">5566@/devforum/discussions</guid>
      <description><![CDATA[Hey,<br /><br />I'm trying to use NSight for graphics profiling &amp; debugging (D3D11). I encountered a lot of issues which make NSight somewhat unusable for me at this time:<br /><br />1) Resizing a swap chain does not work with NSight. It fails with a debug message (triggered by IDXGISwapChain::ResizeBuffers):<br /><br />"DXGI Error: Swapchain cannot be resized unless all outstanding buffer references have been released"<br /><br />Without NSight attached everything works/resizes fine.<br /><br />2) The frame profiling tool immediately crashes Visual Studio when I click on an event, so there is no way to see or investigate any performance data at all. This is really a pity since it's exactly what I'm after.<br /><br />3) Viewing any kind of buffer (e.g. vertex buffer, rgba8 texture raw memory buffer, ...) crashes Visual Studio.<br /><br />4) Frame timings do not work at all. Error: "Timeout waiting for workload results". No crash.<br /><br />5) Minor: The texture array slider sometimes does not work and always snaps back to level 1. No crash.<br /><br />The chosen target is an (intentionally old) GT8800 on Vista-32. The host is running Win7-64 with Visual Studio 2010.<br /><br />Is there anything I can do to circumvent all those problems? Will a new version address them?<br /><br />Thanks, <br />-Matthias]]></description>
   </item>
      <item>
      <title>CUDA Debugging with Parallel NSight on a single GTX285</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5036/cuda-debugging-with-parallel-nsight-on-a-single-gtx285</link>
      <pubDate>Thu, 23 Feb 2012 09:55:46 -0500</pubDate>
      <dc:creator>leenetherton</dc:creator>
      <guid isPermaLink="false">5036@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />Does anyone know if it is possible to use Parallel NSight to debug code running on a single GTX285? I can't seem to find any resources to confirm that 285s are not supported, but then nothing to say that they are either.<br /><br />I know that NSight requires a GPU not attached to the display to run, and the GTX285 only has a single GPU, but is there some trick (maybe using remote desktop) to get it to work?<br /><br />Thanks in advance.]]></description>
   </item>
      <item>
      <title>Linux NVidia debug enable how</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5186/linux-nvidia-debug-enable-how</link>
      <pubDate>Mon, 27 Feb 2012 04:46:11 -0500</pubDate>
      <dc:creator>tiaanwessels</dc:creator>
      <guid isPermaLink="false">5186@/devforum/discussions</guid>
      <description><![CDATA[Is there some way to get the standard Linux NVidia driver to log verbosely to some file in order to assist with OpenGL development debugging ?]]></description>
   </item>
      <item>
      <title>Access Violation with Similar Shaders</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5731/access-violation-with-similar-shaders</link>
      <pubDate>Fri, 09 Mar 2012 20:01:43 -0500</pubDate>
      <dc:creator>Geometrian</dc:creator>
      <guid isPermaLink="false">5731@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I'm writing a shader in GLSL.  However, when compiling very similar fragment shaders, I get an access violation in nvoglv32.dll!68dbc8cc().  The only change is the shader's source, so I suspect an issue with the GLSL compiler.  Both shaders compile without warnings or errors.<br /><br />The full (unmodified) sources of the two fragment shaders are as follows.  The only changes are within the functions trace<em>n</em>.  Note that I am emulating recursion by manually creating these different functions (the recursion depth is known and not deep)<br /><br />This shader compiles and runs normally.  Note that the variable "count" is unused, and so will be edited away:<br /><code><br />varying vec3 vec_normal_obj;<br />varying vec4 vec_vertex_obj;<br /><br />varying vec3   vec_vertex_eye_i;<br />varying vec3   vec_normal_eye_i;<br />varying vec3  vec_tangent_eye_i;<br />varying vec3 vec_binormal_eye_i;<br /><br />uniform sampler2D   tex2D_1;<br />uniform mat4 model_matrix;<br />vec3 vec_normal_eye;<br />vec3 vec_binormal_eye;<br />vec3 vec_tangent_eye;<br />vec3 vec_vertex_eye;<br />vec3 vec_light_eye;<br />uniform mat4 transform_matrix;<br /><br />uniform vec3 bound_neg; uniform vec3 bound_pos; uniform vec3 camera_position;<br />varying vec3 graph_coord;<br />varying vec3 graph_dir;<br /><br />float sum(vec2 vec) { return vec.x+vec.y; }<br />float sum(vec3 vec) { return vec.x+vec.y+vec.z; }<br />float sum(vec4 vec) { return vec.x+vec.y+vec.z+vec.w; }<br />float sum(mat2 mat) { return sum(mat[0])+sum(mat[1]); }<br />float sum(mat3 mat) { return sum(mat[0])+sum(mat[1])+sum(mat[2]); }<br />float sum(mat4 mat) { return sum(mat[0])+sum(mat[1])+sum(mat[2])+sum(mat[3]); }<br /><br />vec3 trace2(vec3 level_pos, vec3 dir) {<br />	return level_pos;<br />}<br />vec3 trace1(vec3 level_pos, vec3 dir) {<br />	level_pos *= 3.0;<br />	int count = 0;<br />	if (level_pos.x&gt;1.0&amp;&amp;level_pos.x&lt;2.0) ++count;<br />	if 
(level_pos.y&gt;1.0&amp;&amp;level_pos.y&lt;2.0) ++count;<br />	if (level_pos.z&gt;1.0&amp;&amp;level_pos.z&lt;2.0) ++count;<br />	level_pos = fract(level_pos);<br />	while (true) {<br />		return trace2(level_pos,dir);<br />	}<br />}<br />vec3 trace0(vec3 level_pos, vec3 dir) {<br />	level_pos *= 3.0;<br />	int count = 0;<br />	if (level_pos.x&gt;1.0&amp;&amp;level_pos.x&lt;2.0) ++count;<br />	if (level_pos.y&gt;1.0&amp;&amp;level_pos.y&lt;2.0) ++count;<br />	if (level_pos.z&gt;1.0&amp;&amp;level_pos.z&lt;2.0) ++count;<br />	level_pos = fract(level_pos);<br />	while (true) {<br />		return trace1(level_pos,dir);<br />	}<br />}<br /><br />void main() {<br />	vec4 color = vec4(0.0,0.0,0.0,1.0);<br /><br />	vec_vertex_eye = normalize(-vec_vertex_eye_i);<br />	vec_normal_eye = normalize(vec_normal_eye_i);<br />	vec_binormal_eye = normalize(vec_binormal_eye_i);<br />	vec_tangent_eye = normalize(vec_tangent_eye_i);<br /><br />	vec4  ambient_color = gl_FrontMaterial.ambient;<br />	vec4  diffuse_color = gl_FrontMaterial.diffuse;<br />	vec4 specular_color = gl_FrontMaterial.specular;<br />	vec4 emission_color = gl_FrontMaterial.emission;<br /><br />	color.rgb = vec3(0.5);<br />	vec3 pos = graph_coord;<br />	vec3 dir = normalize(graph_dir);<br /><br />	vec3 result = trace0(pos,dir);<br />	color.rgb = (sum(result)==0.0) ? vec3(1.0,0.0,0.0) : result;<br /><br />	gl_FragData[0] = color;<br />}<br /></code><br />This shader gives the error described.  
The "count" variable is now used to return.<br /><code>varying vec3 vec_normal_obj;<br />varying vec4 vec_vertex_obj;<br /><br />varying vec3   vec_vertex_eye_i;<br />varying vec3   vec_normal_eye_i;<br />varying vec3  vec_tangent_eye_i;<br />varying vec3 vec_binormal_eye_i;<br /><br />uniform sampler2D   tex2D_1;<br />uniform mat4 model_matrix;<br />vec3 vec_normal_eye;<br />vec3 vec_binormal_eye;<br />vec3 vec_tangent_eye;<br />vec3 vec_vertex_eye;<br />vec3 vec_light_eye;<br />uniform mat4 transform_matrix;<br /><br />uniform vec3 bound_neg; uniform vec3 bound_pos; uniform vec3 camera_position;<br />varying vec3 graph_coord;<br />varying vec3 graph_dir;<br /><br />float sum(vec2 vec) { return vec.x+vec.y; }<br />float sum(vec3 vec) { return vec.x+vec.y+vec.z; }<br />float sum(vec4 vec) { return vec.x+vec.y+vec.z+vec.w; }<br />float sum(mat2 mat) { return sum(mat[0])+sum(mat[1]); }<br />float sum(mat3 mat) { return sum(mat[0])+sum(mat[1])+sum(mat[2]); }<br />float sum(mat4 mat) { return sum(mat[0])+sum(mat[1])+sum(mat[2])+sum(mat[3]); }<br /><br />vec3 trace2(vec3 level_pos, vec3 dir) {<br />	return level_pos;<br />}<br />vec3 trace1(vec3 level_pos, vec3 dir) {<br />	level_pos *= 3.0;<br />	int count = 0;<br />	if (level_pos.x&gt;1.0&amp;&amp;level_pos.x&lt;2.0) ++count;<br />	if (level_pos.y&gt;1.0&amp;&amp;level_pos.y&lt;2.0) ++count;<br />	if (level_pos.z&gt;1.0&amp;&amp;level_pos.z&lt;2.0) ++count;<br />	if (count&gt;2) return vec3(0.0);<br /><br />	level_pos = fract(level_pos);<br />	while (true) {<br />		return trace2(level_pos,dir);<br />	}<br />}<br />vec3 trace0(vec3 level_pos, vec3 dir) {<br />	level_pos *= 3.0;<br />	int count = 0;<br />	if (level_pos.x&gt;1.0&amp;&amp;level_pos.x&lt;2.0) ++count;<br />	if (level_pos.y&gt;1.0&amp;&amp;level_pos.y&lt;2.0) ++count;<br />	if (level_pos.z&gt;1.0&amp;&amp;level_pos.z&lt;2.0) ++count;<br />	if (count&gt;2) return vec3(0.0);<br /><br />	level_pos = fract(level_pos);<br />	while (true) {<br />		return 
trace1(level_pos,dir);<br />	}<br />}<br /><br />void main() {<br />	vec4 color = vec4(0.0,0.0,0.0,1.0);<br /><br />	vec_vertex_eye = normalize(-vec_vertex_eye_i);<br />	vec_normal_eye = normalize(vec_normal_eye_i);<br />	vec_binormal_eye = normalize(vec_binormal_eye_i);<br />	vec_tangent_eye = normalize(vec_tangent_eye_i);<br /><br />	vec4  ambient_color = gl_FrontMaterial.ambient;<br />	vec4  diffuse_color = gl_FrontMaterial.diffuse;<br />	vec4 specular_color = gl_FrontMaterial.specular;<br />	vec4 emission_color = gl_FrontMaterial.emission;<br /><br />	color.rgb = vec3(0.5);<br />	vec3 pos = graph_coord;<br />	vec3 dir = normalize(graph_dir);<br /><br />	vec3 result = trace0(pos,dir);<br />	color.rgb = (sum(result)==0.0) ? vec3(1.0,0.0,0.0) : result;<br /><br />	gl_FragData[0] = color;<br />}<br /></code><br /><br />GPU is GeForce GTX 580M, with driver 295.73.<br /><br />Thanks,<br />Ian]]></description>
   </item>
      <item>
      <title>GeForce GTX 560 Ti Clock-Texture Corruption Issue?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/341/geforce-gtx-560-ti-clock-texture-corruption-issue</link>
      <pubDate>Mon, 05 Sep 2011 13:54:29 -0400</pubDate>
      <dc:creator>keiichi25</dc:creator>
      <guid isPermaLink="false">341@/devforum/discussions</guid>
      <description><![CDATA[I do not know where to ask this, so I will put it in general and hope it will be moved to an appropriate forum. I was wondering if there are any known issues with the 500 series GeForce cards, namely the GeForce GTX 560 Ti OEM series, when the card is set to Adaptive and runs for long periods of time?<br /><br />The reason I am asking is that I happened to notice that with the Alienware-provided OEM GeForce GTX 560 Ti card, under Windows 7, if the card is running longer than 20+ hours, two events invariably happen:<br /><br />1) The driver for the card will experience a Timeout, Driver not responding, but has recovered.<br />2) Any games using the 3D portion of the GPU start to suffer texture corruption, such as textures gaining weird colors or being stretched/torn from models in weird directions.<br /><br />This happens with all drivers, from 270.66 to 280.26. Short of putting the card into Performance, which locks the card to its performance clock speeds (mind you, I did not tweak any of the card's clock speeds from the way the card was given to me), I was not able to run the card past 20 hours for the OpenGL/CUDA-enabled games without some sort of noticeable corruption in the graphics. The desktop, on the other hand, was functioning without problems.<br /><br />I spoke with nVidia tech support, who had me try the various drivers and also looked at some of the programs running; nothing short of putting the card in performance mode seems to have rectified the problem.<br />]]></description>
   </item>
      <item>
      <title>Please help me with my problem.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5431/please-help-me-with-my-problem-</link>
      <pubDate>Mon, 05 Mar 2012 09:09:34 -0500</pubDate>
      <dc:creator>chan16</dc:creator>
      <guid isPermaLink="false">5431@/devforum/discussions</guid>
      <description><![CDATA[1.)display driver nvidia windows kernel mode driver version 295.73 stopped responding and has successfully recovered.<br /><br />2.)the nvidia opengl driver lost connection with the display driver due to exceeding the windows time-out<br /><br /><br /><br />my desktop is.<br /><br />AMD Athlon(tm) II X2 250 Processor 3.0 Ghz<br />2 GB RAM<br />500 GB Hard Drive<br />32-bit Operating System Windows7<br />1 GB Video Card NVIDIA GeForce GT220<br /><br /><br /><br /><br />Hope you can help me, and what to do on this problem.<br />]]></description>
   </item>
      <item>
      <title>[BUG] - Interface Block initializers</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5456/bug-interface-block-initializers</link>
      <pubDate>Mon, 05 Mar 2012 10:57:07 -0500</pubDate>
      <dc:creator>nunosilva800</dc:creator>
      <guid isPermaLink="false">5456@/devforum/discussions</guid>
      <description><![CDATA[I stumbled upon this bug when I tried to initialize this uniform block:<br />Code:<br />#version 330<br />layout (std140) uniform Material {<br />	vec4 diffuse = vec4(0.2, 0.2, 0.2 ,1.0);<br />	vec4 ambient;<br />	vec4 specular;<br />	vec4 emissive;<br />	float shininess = 0.5f;<br />	int texCount = 0;<br />};<br /><br />And the program just crashed when calling glLinkProgram.<br />I'm using OpenGL 3.3 and GLSL 330.<br />According to GLSLangSpec 3.30.6, this is, in fact, not allowed, but it should produce an error, not crash the application.<br /><br />I tested it with NVIDIA driver versions 295.51 and 295.73, 32-bit and 64-bit (Windows).]]></description>
   </item>
      <item>
      <title>NVidia X11 driver resources free on exit</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5251/nvidial-x11-driver-resources-free-on-exit</link>
      <pubDate>Wed, 29 Feb 2012 05:36:11 -0500</pubDate>
      <dc:creator>tiaanwessels</dc:creator>
      <guid isPermaLink="false">5251@/devforum/discussions</guid>
      <description><![CDATA[I can find no concrete guidance on what happens to an application's GPU resources when an application using OpenGL on Linux crashes. Will the driver automatically free allocated textures and display lists that occupied memory on the GPU if the application crashes and had no opportunity to free them, or is there something one needs to do afterwards to get the memory released, e.g. a command-line utility?<br /><br />The reason I'm asking is that sometimes during development, when restarting the app after a crash, the app fails to have its graphics run accelerated. It's as if software rendering occurs. Even a couple of restarts sometimes don't do the trick, and all that helps is to wait a couple of minutes and try again. It's like some maintenance garbage collection in the driver needs to run. I'm using RHEL5 64-bit with a Quadro FX 550 and the latest NVidia drivers.]]></description>
   </item>
      <item>
      <title>glGenLists intermittent problem on Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5181/glgenlists-intermittent-problem-on-linux</link>
      <pubDate>Mon, 27 Feb 2012 03:15:58 -0500</pubDate>
      <dc:creator>tiaanwessels</dc:creator>
      <guid isPermaLink="false">5181@/devforum/discussions</guid>
      <description><![CDATA[I am experiencing a problem which I could not solve after lots of head scratching. glGenLists on Linux RHEL5 64-bit with the latest NVidia driver is sometimes returning 0 without setting any GL error code (glGetError returns no error). I have verified 100% that there is no list nesting occurring by accident and it is not called inside direct mode glBegin/End. I have used gDEBugger and at the time when this happens, the debugger shows not a single trace of any problem even though all type of breakpoints are enabled. Any ideas on how I can try and resolve this ?]]></description>
   </item>
      <item>
      <title>NVIDIA Parallel Nsight 2.1 Release Candidate 2 now available!</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/2171/nvidia-parallel-nsight-2-1-release-candidate-2-now-available</link>
      <pubDate>Mon, 05 Dec 2011 19:16:15 -0500</pubDate>
      <dc:creator>Sebastien Domine</dc:creator>
      <guid isPermaLink="false">2171@/devforum/discussions</guid>
      <description><![CDATA[<p>Dear Parallel Nsight User,</p><p>Building on the NVIDIA Parallel Nsight™ 2.1 Release Candidate 1 release with multiple bug fixes and stability improvements, we are proud to announce the release of <b>NVIDIA Parallel Nsight™ 2.1 Release Candidate 2</b>. This release brings support for the new <b>CUDA Toolkit 4.1</b> Release Candidate 2, which can be downloaded under the CUDA Registered Developer Program (<a href="http://www.developer.nvidia.com/join">www.developer.nvidia.com/join</a>). Parallel Nsight 2.1 adds a number of new features to enhance debugging and profiling capabilities.</p><p>This release requires <b>NVIDIA Display Driver Release 285.86</b>, available on the same download site.</p><ul><li>Traced workloads can now <b>navigate the dependencies and call stack</b> to allow the developer to follow through GPU workloads, corresponding API calls and host code that was the cause of the activity.</li><li><b>CUDA warp watch</b> visualizes variables and expressions across an entire CUDA warp.</li></ul>]]></description>
   </item>
      <item>
      <title>OptiX exception: Insufficient device memory</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5226/optix-exception-insufficient-device-memory</link>
      <pubDate>Tue, 28 Feb 2012 12:13:16 -0500</pubDate>
      <dc:creator>wonderboy2005</dc:creator>
      <guid isPermaLink="false">5226@/devforum/discussions</guid>
      <description><![CDATA[I'm running into an issue that hasn't presented itself prior to OptiX 2.5.  For a large scene with a large amount of data stored on GPU, I'm getting the following exception at runtime:<br />Unknown error (Details: Function "RTresult _rtContextCompile(RTcontext_api*)" caught exception: Insufficient device memory. GPU does not support paging., [1574505])<br />This happens when the context is compiled.  <br /><br />As the exception states, I am probably attempting to use memory that the GPU doesn't have.  However, this was not a problem with previous versions of OptiX.  Compiling the exact same code against OptiX 2.1.1 allows the program to run without issue.  I'm also fairly sure it's not just a matter of OptiX 2.5 using a bit more memory, putting me over the edge.  I have some debugging code which allocates ~1600MB of data for this scene, which I toggle on/off at compile time.  With OptiX 2.1.1, this scene still runs with that additional overhead.  <br /><br />I'm working with two GPUs - a GTX480 and a GTX580.  Both have 1535MB of memory.  <br /><br />Correct me if I'm wrong, but I believe that the entire context must fit in the memory of each device used.  If that is the case, then my debugging code alone should force the use of paging or something similar, and running this scene with my debugging code on using OptiX 2.1.1 must result in paging.  <br /><br />So my question is this:  has something changed with OptiX 2.5 which would result in this behavior?  It seems like paging is what I'm looking for.  Is there any way to use paging with these GPUs?  <br /><br />Thanks in advance.]]></description>
   </item>
      <item>
      <title>the nsight debugger can`t hit break point in kernel, while the program runs correctly at full speed</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5171/the-nsight-debugger-cant-hit-break-point-in-kernel-while-the-program-runs-correctly-at-full-speed</link>
      <pubDate>Sun, 26 Feb 2012 21:04:02 -0500</pubDate>
      <dc:creator>fisher5595</dc:creator>
      <guid isPermaLink="false">5171@/devforum/discussions</guid>
      <description><![CDATA[This problem drives me crazy. I'm using Nsight 2.1 with two GTX 560 Ti cards. When I debug with the VS2010 debugger, the program appears to run normally, and even the final result is correct. But when I use the Nsight CUDA debugger, it cannot even hit the breakpoint in the kernel. I have to delete and add the code line by line to check where the problem happens. <br /><br />I found out that when I delete the host code which uses a class to read a BMP file, the Nsight debugger can hit the breakpoint in the kernel.<br /> <br />I'm quite sure that all the settings are correct, including the Nsight properties, disabling SLI, and the working directory. Otherwise, the Nsight debugger wouldn't have worked correctly when I wrote the basic code. With only the basic code of my project, the Nsight CUDA debugger could hit the breakpoint in the kernel. However, when I added host code that uses a class defined in the host code to read data from image files, the Nsight CUDA debugger behaved abnormally and couldn't hit the breakpoint in the kernel. The CUDA debugger just popped up the two messages at the same time, as shown below in the image. 
Even when i started a analysis activity, it reported some error about no events written to file.<br />Analysis session created.<br />Connection state changed to Connecting.<br />Connection state changed to Connected.<br />Physical devices detected:<br />- GeForce GTX 560 Ti (WDDM)<br />- GeForce GTX 560 Ti (WDDM)<br />Session state changed to Running.<br />Application state changed to Suspended.<br />Capture state changed to Started.<br />Application state changed to Running.<br />Application state changed to None.<br />Capture state changed to Stopped.<br />Session state changed to None.<br />Info : Loading files at c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000<br />Info : Loading c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvact with the NV Activity loader<br />Info : Loaded c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvact in 0.0264042 s<br />Info : Loading c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvreport with the NV Report loader<br />Info : Loaded c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvreport in 0.0023158 s<br />Info : Loading c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvevents with the NV Events loader<br />Error : No events written to file.<br />Info : <br />------------------------<br /><br />Statistics for NV Events<br /><br />------------------------<br /><br />Info : Loaded c:\temp\rndsearch120225_000\rndsearch120225_000_Capture_000\rndsearch120225_000_Capture_000.nvevents in 0.0870916 s<br />Info : Post-processing data...<br />Info : Post-processed data in 0.0002522 s<br />Info : Resolving module information....<br />Info : Resolved module information in 1.38E-05 s.<br />Session state changed to Running.<br />Application state changed to Suspended.<br />Capture state changed to Started.<br 
/>Application state changed to Running.<br />Application state changed to None.<br />Capture state changed to Stopped.<br />Session state changed to None.<br />Info : Loading files at c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000<br />Info : Loading c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000\rndsearch120225_001_Capture_000.nvact with the NV Activity loader<br />Info : Loaded c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000\rndsearch120225_001_Capture_000.nvact in 0.0259195 s<br />Info : Loading c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000\rndsearch120225_001_Capture_000.nvreport with the NV Report loader<br />Info : Loaded c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000\rndsearch120225_001_Capture_000.nvreport in 0.0002246 s<br />Info : Loading c:\temp\rndsearch120225_001\rndsearch120225_001_Capture_000\rndsearch120225_001_Capture_000.nvevents with the NV Events loader<br />Error : No events written to file.<br />Info : <br />------------------------<br /><br />Statistics for NV Events]]></description>
   </item>
      <item>
      <title>Typecasting to custom struct in kernel</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5161/typecasting-to-custom-struct-in-kernel</link>
      <pubDate>Sun, 26 Feb 2012 17:17:30 -0500</pubDate>
      <dc:creator>avion85</dc:creator>
      <guid isPermaLink="false">5161@/devforum/discussions</guid>
      <description><![CDATA[Since this is my first post, greetings to everyone.<br /><br />I've encountered a problem with casting to a custom struct in a kernel; hopefully someone else has been in the same situation.<br /><br />From a .cu file, I'm passing a large array into a kernel as a parameter. I would like to cast it to a struct and access it as an array of structures.<br /><br />pseudo-code:<br /><br />kernels.cu (with nvcc)<br /><code><br />struct myMatrix<br />{<br />	float e[6];<br />};<br />__global__ void myKernel(float *raw, myMatrix *p){<br /> int myID = blockIdx.x * blockDim.x + threadIdx.x;<br /><br /> myMatrix m = p[myID];	  //does not work - "???" in nsight for all values <br /><br /> myMatrix n =((myMatrix *)raw)[myID];     //does not work either - "???"<br /><br /> float a = raw[0];    //works and I get correct single float values, but unstructured<br /><br /> float4 b = ((float4*)raw)[0];  //works and I get correct tuples<br /><br />//what I want: use m (loaded from p[myID] above) like this<br />float something = m.e[3];<br />}<br /></code><br /><br /><br />main.cu (with microsoft c compiler)<br /><code><br />float *p = [large array];<br />myKernel&lt;&lt;&lt;block,thread&gt;&gt;&gt;(p,(myMatrix*)p);<br /></code><br /><br />I am using Parallel Nsight to inspect the values, and what I get is "???" while stepping through the program. I have never had problems when I use built-in types like float4. However, I would of course like to have my own structures working properly.<br />Maybe the problem is in the alignment? If so, to which value do I align? <br /><br />Appreciate the help.<br />Avion<br /><br />PS. Working with Visual Studio, everything is 64-bit.<br /><br />EDIT: added another example that works.]]></description>
   </item>
      <item>
      <title>NSIGHT usage</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5041/nsight-usage</link>
      <pubDate>Thu, 23 Feb 2012 10:15:45 -0500</pubDate>
      <dc:creator>IndrajeetK</dc:creator>
      <guid isPermaLink="false">5041@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I am using Visual Studio 2010 and a GeForce 525 for running CUDA programs.<br /><br />I am using the 4.1 SDK.<br /><br />Whenever I try to run the debugger, I get the error shown in the picture below.<br /><br />What might be the problem?<br /><br />Please reply. Thank you.]]></description>
   </item>
      <item>
      <title>SceniX, OptiX and Entry Points, help!</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4666/scenix-optix-and-entry-points-help</link>
      <pubDate>Mon, 13 Feb 2012 01:26:56 -0500</pubDate>
      <dc:creator>jules123</dc:creator>
      <guid isPermaLink="false">4666@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />In SceniX Viewer, I would like to have a background environment map for the OptiX renderer.<br /><br />I followed RTFx.pdf and, as a test, amended the Miss program in the ao_ray_generation_cuda.rtfx file (a file from another SceniX sample):<br /><br /><br />RT_PROGRAM void path_miss_radiance()<br />{<br />  thePrd.color = make_float4(1.0f, 0.0f, 0.0f, 1.0f);<br />}<br /><br /><br />I ran rtfxc.exe to compile ao_ray_generation_cuda.rtfx to ao_ray_generation_ptx.rtfx. Then I ran bin2c to convert ao_ray_generation_ptx.rtfx to ao_ray_generation_ptx.inc.<br /><br />After that, guesswork set in. I looked at the QtAmbientOcclusion sample for inspiration, which uses a variable of type RTFxSceneAttributeSharedPtr to set the RTFx program. Unfortunately, SceniX Viewer has no such variable.<br /><br /><br />How do I get SceniX Viewer to use my own compiled RTFx OptiX entry point programs?<br /><br />Thanks for any help,<br />Jules]]></description>
   </item>
      <item>
      <title>Could someone please help me compile this CUDA code that works?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4646/could-someone-please-help-me-compile-this-cuda-code-that-works</link>
      <pubDate>Sun, 12 Feb 2012 18:30:07 -0500</pubDate>
      <dc:creator>doodles</dc:creator>
      <guid isPermaLink="false">4646@/devforum/discussions</guid>
      <description><![CDATA[Hey everyone, I am fairly new to GPU programming, and have spent many fruitless hours trying to compile some dimension reduction code that uses CUDA and Matlab. I was wondering if anyone here who already has a system configured with Matlab and CUDA would be willing to quickly build the source for me (a 255k zip file) and send me a compiled executable. I have corresponded with the author of the algorithm, and while he couldn't help me with the compiling, he also expressed an interest in having the compiled version to offer to others on his website, so you would be doing a service to researchers in the machine learning community. Here is a link to the file: <br /><br />where it says "CUDA implementation":<br /><a href="http://homepage.tudelft.nl/19j49/t-SNE.html" target="_blank" rel="nofollow">http://homepage.tudelft.nl/19j49/t-SNE.html</a><br /><br />In case the above link isn't working:<br /><a href="http://db.tt/B4Wq6P6P" target="_blank" rel="nofollow">http://db.tt/B4Wq6P6P</a><br /><br />Thanks. Any help will be greatly appreciated! ]]></description>
   </item>
      <item>
      <title>geforce 8400 gs black_bar artifacts in old games (Dangerous Waters, Falcon Bms4.32)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4706/geforce-8400-gs-black_bar-artifacts-in-old-games-dangerous-waters-falcon-bms4-32</link>
      <pubDate>Mon, 13 Feb 2012 21:35:39 -0500</pubDate>
      <dc:creator>mstram</dc:creator>
      <guid isPermaLink="false">4706@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />In all of the UI pages of Dangerous Waters 1.04 and Falcon Bms 4.32 I get thin horizontal bars (see attached image).<br /><br />They do not appear in the "main sim window" ("3d" fullscreen).<br /><br />I'm running Windows XP SP3 and NVIDIA drivers ver 6.14.12.7061 (270.61), with DirectX 9.0c (4.09.0000.0904), on an HP DC5000 / 1gig / and the GeForce 8400 GS.   I also tried installing older drivers (169.21), but the problem remains.<br /><br />I'm downloading the Perfkit to see if it will show anything, but having not used it before, even if it does show "something", I'm wondering how that will translate to a particular driver version / setting.<br /><br />Does anybody here know which particular DirectX call / technique is being used that causes this? <br /><br />The card works fine with many other graphics-intensive applications (xplane, iRacing, Orbiter, Blender, ClearviewRC, etc.)<br /><br />Mike<br /><br /><img src="http://mstram.webfactional.com/webb/dw/dangWaters.jpg" alt="8400 gs artifacts" /><br /><br />]]></description>
   </item>
      <item>
      <title>Can&#039;t save A8 format texture with PerfHUD 6.7</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4476/cant-save-a8-format-texture-with-perfhud-6-7</link>
      <pubDate>Thu, 09 Feb 2012 03:59:41 -0500</pubDate>
      <dc:creator>ZorbaTHut</dc:creator>
      <guid isPermaLink="false">4476@/devforum/discussions</guid>
      <description><![CDATA[I'm attempting to debug a texture creation process that generates A8 format textures, i.e. alpha channel only. PerfHUD refuses to save these textures or inspect them in any useful way - I can see the proper thumbnail when setting the Frame Debugger's texture viewer to Alpha or ARGB, but zooming in results in a black square and the "save texture" command does nothing. (Saving standard A8R8G8B8 textures works fine.)<br /><br />Suggestions?]]></description>
   </item>
      </channel>
</rss>
