<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
      <title>Tagged with gpu-computing-sdk - NVIDIA Developer Forums</title>
      <link>http://forums.developer.nvidia.com/devforum/discussions/tagged/gpu-computing-sdk/feed.rss</link>
      <pubDate>Wed, 16 May 2012 17:32:11 -0400</pubDate>
      <description>Tagged with gpu-computing-sdk - NVIDIA Developer Forums</description>
   <language>en-CA</language>
   <atom:link href="/devforum/discussions/tagged/gpu-computing-sdk/feed.rss" rel="self" type="application/rss+xml" />
   <item>
      <title>Variables have &quot;no value&quot; in NSight</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8256/variables-have-no-value-in-nsight</link>
      <pubDate>Tue, 15 May 2012 20:32:56 -0400</pubDate>
      <dc:creator>robosmith</dc:creator>
      <guid isPermaLink="false">8256@/devforum/discussions</guid>
      <description><![CDATA[Several variables which are in scope at the breakpoint are shown as "no value at the target location." Sometimes they have a value at the assignment point, but change to "no value" after executing another line. How can I see these values?<br /><br />This is using Dev Kit v4.2 &amp; NSight v2.2 release with VS2008 on GTX 525 with Optimus.]]></description>
   </item>
      <item>
      <title>simpleStreams example in SDK not working</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8101/simplestreams-example-in-sdk-not-working</link>
      <pubDate>Sat, 12 May 2012 03:36:29 -0400</pubDate>
      <dc:creator>madhur13490</dc:creator>
      <guid isPermaLink="false">8101@/devforum/discussions</guid>
      <description><![CDATA[I've installed the CUDA 4.1 GPU Computing SDK and GPU Computing Toolkit. I'm trying to see the performance improvement in the simpleStreams example in the src folder, but there seems to be a problem in the new version: the streamed version consistently takes more time than the non-streamed version. I have not modified the code. It seems there is a bug in the new examples.]]></description>
   </item>
      <item>
      <title>Trouble with processing image in rows</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8121/trouble-with-processing-image-in-rows</link>
      <pubDate>Sun, 13 May 2012 05:29:07 -0400</pubDate>
      <dc:creator>laz007</dc:creator>
      <guid isPermaLink="false">8121@/devforum/discussions</guid>
      <description><![CDATA[Hello!<br />I'm making an image filter that processes the image row by row.<br />I've been trying for two weeks to figure out why it doesn't work when executed in parallel.<br />I use only threads in the Y dimension. Is that a problem?<br /><br /><br /><br />Here is part of the code:<br />BLOCKDIM_Y=16;<br />....<br />dim3 threads(1, BLOCKDIM_Y);<br />dim3 grid(1,  iDivUp(h, BLOCKDIM_Y));<br /><br />my_CUDA_filter&lt;&lt;&lt; grid, threads&gt;&gt;&gt;(sumR, sumG, sumB, mask,h,w, inD, outD, test);<br />...<br /><br />__global__ void my_CUDA_filter_simple222(int* sumR, int* sumG, int* sumB, int mask,int h,int w, u_int8_t *in, u_int8_t *out, int* test){<br />...<br />int iy = blockDim.y * blockIdx.y + threadIdx.y;<br />int ix=0;<br /><br />	if (iy&gt;=m &amp;&amp; iy&lt;(h-m)) {<br /><br />	//for(iy=m; iy&lt;h-m; iy++){<br /><br />	 ...<br />	for(ix=m+1;ix&lt;w-m;ix++){<br />	 ...<br />	 }<br />}<br /><br />The resulting image is messed up...<br />If I use for(iy=m; iy&lt;h-m; iy++){ <br />and run the kernel with a single thread (that is, with no parallelization), everything is OK.<br /><br />Any ideas?<br /><br />]]></description>
   </item>
      <item>
      <title>Linker error with c function in .cu file</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7881/linker-error-with-c-function-in-cu-file</link>
      <pubDate>Fri, 04 May 2012 16:08:40 -0400</pubDate>
      <dc:creator>basementscientist</dc:creator>
      <guid isPermaLink="false">7881@/devforum/discussions</guid>
      <description><![CDATA[I've created a kernel inside a .cu file. Also inside the .cu file is a C++ function that calls the kernel. Everything compiles OK, but at the final link step the C++ function is not visible to the rest of the program. How do I make the function visible?<br /><br />I am using Visual Studio 2010 on Windows 8, and the newest SDK and Toolkit.]]></description>
   </item>
      <item>
      <title>npp problems</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7321/npp-problems</link>
      <pubDate>Thu, 19 Apr 2012 11:05:15 -0400</pubDate>
      <dc:creator>lancewellspring</dc:creator>
      <guid isPermaLink="false">7321@/devforum/discussions</guid>
      <description><![CDATA[I have 2 problems.<br />1) Function call to <code>nppiGetAffineTransform</code> is returning NPP_AFFINE_QUAD_INCORRECT_WARNING.<br /><em>parameter srcRoi is:</em><br />x	0	int<br />y	0	int<br />width	5000	int<br />height	5000	int<br /><em>parameter quad is:</em><br />[0]	0x00000000002af1d0	double [2]<br />	[0]	0.00000000000000000	double<br />	[1]	102.69965808786287	double<br />[1]	0x00000000002af1e0	double [2]<br />	[0]	5128.9289884048958	double<br />	[1]	0.00000000000000000	double<br />[2]	0x00000000002af1f0	double [2]<br />	[0]	5230.9576202374628	double<br />	[1]	5149.2023110261380	double<br />[3]	0x00000000002af200	double [2]<br />	[0]	102.53232406430637	double<br />	[1]	5251.8818716857804	double<br /><br />Does the function expect the points of quad in a specific order?  Right now they are: topleft, topright, botleft, botright.<br /><br />2) Function call to <code>nppiWarpAffine_8u_C3R</code> is returning NPP_STEP_ERROR.<br /><em>parameter pSrc is 75000000 bytes.</em> <br /><em>parameter srcSize is:</em><br />width	5000	int<br />height	5000	int<br /><em>parameter nSrcStep is 15000.</em> <br /><em>parameter srcRoi is:</em><br />x	0	int<br />y	0	int<br />width	5000	int<br />height	5000	int<br /><em>parameter pDst is 82419636 bytes.</em><br /><em>parameter nDstStep is 15693.</em><br /><em>parameter dstRoi is:</em><br />x	0	int<br />y	0	int<br />width	5231	int<br />height	5252	int<br /><em>parameter coeffs is:</em><br /><br />coeffs	0x00000000002af328	double [2][3]<br />[0]	0x00000000002af328	double [3]<br />	[0]	1.0259909958801552	double<br />	[1]	0.020409808328179044	double<br />	[2]	0.00000000000000000	double<br />[1]	0x00000000002af340	double [3]<br />	[0]	-0.020544040425657707	double<br />	[1]	1.0300464714995274	double<br />	[2]	102.69965808786287	double<br /><em>parameter interpolation is NPPI_INTER_CUBIC.</em><br /><br />I dont have any idea what is going wrong here.<br /><br />Any help is greatly appreciated!<br /><br />I'm 
running on a Windows 7 machine, with a Quadro FX 1800M, using Visual Studio 2010.  Running the basic cuda examples works just fine.]]></description>
   </item>
      <item>
      <title>Cuda Kernels Stop Running After Few Iterations</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7381/cuda-kernels-stop-running-after-few-iterations</link>
      <pubDate>Sat, 21 Apr 2012 14:15:31 -0400</pubDate>
      <dc:creator>Eman</dc:creator>
      <guid isPermaLink="false">7381@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I am writing code that calls a number of kernels inside a for loop with 1000 iterations. When I run the program, the kernels stop running after a number of iterations. I tried calling cudaGetLastError(), but it didn't give me any useful information; the output was "Error: unknown error". As I increase the block size and the number of threads, the kernels stop running sooner. For example, with a block size of 8 it stopped at iteration 740, while with a block size of 16 it stopped at iteration 440. The same resources are reused in each iteration, so I really don't understand what the problem is.<br /><br />Any help will be appreciated. <br /><br />Thanks, <br /> ]]></description>
   </item>
      <item>
      <title>cuda measure execution time</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7811/cuda-measure-execution-time</link>
      <pubDate>Thu, 03 May 2012 01:25:57 -0400</pubDate>
      <dc:creator>vlbthambawita</dc:creator>
      <guid isPermaLink="false">7811@/devforum/discussions</guid>
      <description><![CDATA[How do I measure the execution time of a CUDA program? <br />What is wrong with the following code? It always returns negative values as the result. Why?<br /><br />	 cudaEvent_t s1,e1;<br />	float time;<br />	cudaEventCreate(&amp;s1);<br />	cudaEventCreate(&amp;e1);<br />	cudaEventRecord(s1,0);<br /><br /> kernel&lt;&lt;&lt;&gt;&gt;&gt;<br /><br />        cudaEventSynchronize(e1);<br />	cudaEventElapsedTime(&amp;time,s1,e1);<br />      printf("time=%f\n",time);]]></description>
   </item>
      <item>
      <title>linking/make error while compiling SDK on Ubuntu 11.10</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3486/linkingmake-error-while-compiling-sdk-on-ubuntu-11-10</link>
      <pubDate>Sun, 15 Jan 2012 19:22:44 -0500</pubDate>
      <dc:creator>boerd</dc:creator>
      <guid isPermaLink="false">3486@/devforum/discussions</guid>
      <description><![CDATA[I got GLU, glut installed:<br /><br />ldconfig -p | grep -i glu<br />	libglut.so.3 (libc6,x86-64) =&gt; /usr/lib/libglut.so.3<br />	libglut.so (libc6,x86-64) =&gt; /usr/lib/libglut.so<br />	libGLU.so.1 (libc6,x86-64) =&gt; /usr/lib/x86_64-linux-gnu/libGLU.so.1<br />	libGLU.so (libc6,x86-64) =&gt; /usr/lib/x86_64-linux-gnu/libGLU.so<br /><br />I got all the libraries mentioned in the doc installed:<br />sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev <br /><br />Now I get this error while trying to compile the SDK:<br /><br />../../lib/librendercheckgl_x86_64.a(rendercheck_gl.cpp.o): In function `CheckBackBuffer::checkStatus(char const*, int, bool)':<br />rendercheck_gl.cpp:(.text+0x119b): undefined reference to `gluErrorString'<br />collect2: ld returned 1 exit status<br /><br />Used nm to make sure libGLU.so contains this symbol:<br />nm -D /usr/lib/x86_64-linux-gnu/libGLU.so | grep gluErrorString<br />00000000000048b0 T gluErrorString<br />echo $LD_LIBRARY_PATH<br />/usr/lib/x86_64-linux-gnu/]]></description>
   </item>
      <item>
      <title>GPU Direct supported boards</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7571/gpu-direct-supported-boards</link>
      <pubDate>Thu, 26 Apr 2012 03:56:30 -0400</pubDate>
      <dc:creator>reubensant</dc:creator>
      <guid isPermaLink="false">7571@/devforum/discussions</guid>
      <description><![CDATA[Hello<br /><br />I spoke to an NVIDIA representative at NAB about the GPUDirect technology and its support by I/O card manufacturers.<br /><br />We currently use Blackmagic cards without GPUDirect. Support on Blackmagic is coming soon (at least so they say).  <br /><br />Does anyone have experience with other boards? (AJA, Blackmagic Design, Bluefish 444, Deltacast, DVS and Matrox).<br /><br />I'm interested in Bluefish. Has anyone had an opportunity to compare them with Blackmagic?<br /><br />Thanks and Regards<br /><br />Reuben Sant<br />iMedia Ltd. ]]></description>
   </item>
      <item>
      <title>nvcc 4.2; a cicc and gcc preprocessing issue</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7441/nvcc-4-2-a-cicc-and-gcc-preprocessing-issue</link>
      <pubDate>Mon, 23 Apr 2012 16:23:17 -0400</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">7441@/devforum/discussions</guid>
      <description><![CDATA[After upgrading to SDK 4.2 for some reason when I am building my library I now get this error below:<br /><br /><br /><br /><code>#$ cicc  -arch compute_20 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 -g -O0 "/tmp/tmpxft_00002684_00000000-10_vecgpu" "/tmp/tmpxft_00002684_00000000-7_vecgpu.cpp3.i"  -o "/tmp/tmpxft_00002684_00000000-2_vecgpu.ptx"<br />&lt;built-in&gt;(2): error: "__STDC_HOSTED__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(8): error: "__WCHAR_TYPE__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(115): error: "__x86_64" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(116): error: "__x86_64__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(126): error: "__linux__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(128): error: "__unix__" is predefined; attempted redefinition ignored<br /><br />6 errors detected in the compilation of "/tmp/tmpxft_00002684_00000000-7_vecgpu.cpp3.i".<br /># --error 0x1 --</code><br /><br /><br /><br /><br />My gcc version is 4.4 though I've attempted this on 4.3<br />I am not sure why it is getting caught on this. If the redefinition is being ignored, why is it throwing an error and stopping compilation at all? Additionally the NVCC doc still mentions cicc as nvopencc, and in fact nowhere mentions cicc.<br /><br />Has anyone else had this issue? Any tips would be greatly appreciated.]]></description>
   </item>
      <item>
      <title>Why is a GTX680 even slower than a GTX480 when using CUDA?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7181/why-is-a-gtx680-even-slower-than-a-gtx480-when-using-cuda</link>
      <pubDate>Tue, 17 Apr 2012 07:39:41 -0400</pubDate>
      <dc:creator>nepluno</dc:creator>
      <guid isPermaLink="false">7181@/devforum/discussions</guid>
      <description><![CDATA[I've tested several apps in the GPU Computing SDK, such as GrabCutNPP. Surprisingly, I found that the GTX680 is even slower than my old GTX480 (about 0.9x). Why could this happen? In contrast, a 3DMark11 test reported that the GTX680 is 2x faster.<br /><br />The installed driver is 301.10, with CUDA Toolkit 4.26. My OS is Windows 7 SP1. I even compiled the code using compute_30 and sm_30, but the result stayed the same.<br /><br />PS: I couldn't find a developer version driver that supports the GTX680.]]></description>
   </item>
      <item>
      <title>cuda-jobs  GPGPU Engineer Position Medical Imaging - Santa Clara, CA</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7216/cuda-jobs-gpgpu-engineer-position-medical-imaging-santa-clara-ca</link>
      <pubDate>Tue, 17 Apr 2012 19:17:33 -0400</pubDate>
      <dc:creator>Hologic Jobs</dc:creator>
      <guid isPermaLink="false">7216@/devforum/discussions</guid>
      <description><![CDATA[Summary of Duties and Responsibilities <br />•	Port advanced image processing / computer vision algorithms for medical imaging to the CUDA architecture<br />•	Analyze requirements, design and implement software components. Responsible for official builds/releases of image processing algorithms<br />•	Build/own SW tools (image visualization, Neural Network training, truth marking/scoring) used by the scientists<br />•	Trouble-shoot system issues and software bugs <br />•	Evaluate technical options and provide recommendation for solution <br />•	Provide technical support to other groups <br />•	May provide technical leadership on projects or on specific components of projects <br />•	May contribute to the intellectual position of the company through invention and patent applications<br />•	Other duties as assigned<br /><br />Qualifications <br />•	Exceptional working knowledge of the CUDA framework and technology is required<br />•	Parallel Programming experience is required<br />•	Working Knowledge of .NET 2.0, C#, C++ is required<br />•	Full understanding of object-oriented design and architecture is required<br />•	Exceptional communication skills that demonstrate understanding of complex technical details, clarity of thought, and the ability to persuade others<br />•	Pragmatic approach to development that balances the technical approach with business objectives and user needs<br />	<br />Education <br />•	B.S in Computer Science, Engineering or related discipline <br /><br />Experience	<br />•	Experience with CUDA GPGPU development required.<br /><br />Specialized Knowledge<br />•	DICOM knowledge or medical imaging a plus <br /><br />Please send resume to Deanna at Deanna.Tone@hologic.com]]></description>
   </item>
      <item>
      <title>Kernel invocation line in C++ error &quot;no global operator found&quot; (Parallel Nsight)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5711/kernel-invocation-line-in-c-error-no-global-operator-found-parallel-nsight</link>
      <pubDate>Fri, 09 Mar 2012 09:54:24 -0500</pubDate>
      <dc:creator>wdrozd</dc:creator>
      <guid isPermaLink="false">5711@/devforum/discussions</guid>
      <description><![CDATA[For some reason when trying to invoke my kernel like this:<br /><br />EvaluateKernel&lt;&gt;(param_a, param_b, param_c);<br /><br />I get these errors:<br /><br />error C2677: binary '&lt;&lt;' : no global operator found which takes type 'dim3' (or there is no acceptable conversion)<br /><br />error C2297: '&gt;&gt;' : illegal, right operand has type 'float *'<br /><br />BTW param_a is a float*<br /><br />I have declared my Kernel using extern "C" at the beginning of the C++ file, but it seems my code is not recognizing the cuda code? My Cuda code is definitely being built by the NVCC compiler as I receive the Building NVCC (Device) messages which complete (although with a couple of warnings)<br /><br />Thanks for any help you can give.]]></description>
   </item>
      <item>
      <title>Is there an efficient CUDA sorting Algorithm?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6836/is-there-an-efficient-cuda-sorting-algorithm</link>
      <pubDate>Tue, 10 Apr 2012 00:15:59 -0400</pubDate>
      <dc:creator>addio3305</dc:creator>
      <guid isPermaLink="false">6836@/devforum/discussions</guid>
      <description><![CDATA[Hi everyone.<br /><br />I'd like to implement a CUDA sorting algorithm. I found information and examples on the Internet, but I can't find an efficient algorithm because of two problems.<br /><br />First, most of the examples and theses (for example, the merge sort and bitonic merge sort in the code samples) assume the input size is a power of 2, but I want to sort inputs whose size is not a power of 2.<br /><br />Second, I want to sort massive data sets, such as 10 million elements. <br /><br />Is it possible to solve these problems? <br /><br />I'd like to find any references or examples for these problems.<br /><br />Thanks for your help. <br /><br /> ]]></description>
   </item>
      <item>
      <title>Getting GPU connection type with NvAPI in Win XP</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6846/getting-gpu-connection-type-with-nvapi-in-win-xp</link>
      <pubDate>Tue, 10 Apr 2012 09:51:01 -0400</pubDate>
      <dc:creator>licensetobill</dc:creator>
      <guid isPermaLink="false">6846@/devforum/discussions</guid>
      <description><![CDATA[Hi everyone,<br /><br />I'm using NvAPI and need to detect in code how the graphics card is connected to the monitor(s) (e.g. VGA or DVI). From looking at the documentation, the only way I can find to do this is with NvAPI_DISP_GetMonitorCapabilities, passing in a display name, but this function isn't supported in XP.<br /><br />Any ideas?<br /><br />Thanks,<br /><br />Bill. ]]></description>
   </item>
      <item>
      <title>Can this serial program be executed on a GPU to get a faster simulation time?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6761/can-this-serial-program-executed-on-gpu-to-get-faster-simulation-time</link>
      <pubDate>Sat, 07 Apr 2012 10:53:56 -0400</pubDate>
      <dc:creator>sampathreddyv</dc:creator>
      <guid isPermaLink="false">6761@/devforum/discussions</guid>
      <description><![CDATA[To start with i'm very new to cuda and GPU computing.<br />I have a code written in matlab right now. It is shown below. In most of the code the execution of the line depends on results from earlier lines of code. This is taking lot of time to execute. Is it possible to reduce the simulation time by writing cuda program or any other programing if any??<br /><br /><code>clear all<br />clc<br />tic<br />freq=1000000;<br />tmax=1;<br />Kac=0.1555;<br />fi1=0;<br />fi2=0;<br />vari=0;<br />c=3*10^8;<br />dwd=2*pi*2*10^9;<br />lam=632.8*10^-9;<br />wd=2*pi*400;<br />M=250;<br />k=2*pi/lam;<br />wc=2*pi*c/lam;<br />n=tmax*freq+1;<br />count=0;<br />y=zeros([n 1]);<br />pulses=zeros([n 1]);<br />peak=zeros([n 1]);<br />I1=zeros([n 1]);<br />I2=zeros([n 1]);<br />z=zeros([n 1]);<br />phase=zeros([n 1]);<br />E01=zeros([n 1]);<br />E02=zeros([n 1]);<br />L=.28;<br />count100=0;<br />Vhfo=6.5;<br />tem=0;<br />temp1=0;<br />km=0;<br />km1=0;<br />error1=0;<br />Wn=[(240/(5*10^5)) (260/(5*10^5))]; <br />[B A] = butter(2,Wn,'bandpass');<br />s=[];<br />maxcount=zeros([n 1]);<br />cnt100=zeros([n/100 1]);<br />tc=zeros([n/100 1]);<br />ij=0;<br />ip=0;<br />im=0;<br />omega=pi/240;<br />dell=((4*(L/4)^2)/c)*omega;<br />l1=L+dell;<br />l2=L-dell;<br />q=round(L/lam);<br />v1=q*(c/l1);<br />w1=2*pi*v1;<br />v2=q*(c/l2);<br />w2=2*pi*v2;<br />E01(1)=(Kac*Vhfo)*exp(-(4*log(2)*((w1-wc)/dwd)^2));<br />E02(1)=(Kac*Vhfo)*exp(-(4*log(2)*((w2-wc)/dwd)^2));<br />I1(1)=2*E01(1)*E02(1);<br />I2(1)=2*E01(1)*E02(1);<br />for ii=2:n,<br />t=ii/freq;                                              %time<br />Vhfo=error1*10+2;                                       %error feedback<br />L=.2800000784+(1e-7*sin(2*pi*250*t));                   %optical length with modulation in meters<br />dell=(4*((L/4)^2)/c)*omega;                               %variation in length due to rotation<br />l1=L+dell;                                              <br />l2=L-dell;<br 
/>q=round(L/lam);<br />v1=q*(c/l1);<br />w1=2*pi*v1;<br />v2=q*(c/l2);<br />w2=2*pi*v2;<br />E01(ii)=(Kac*Vhfo)*exp(-(4*log(2)*((w1-wc)/dwd)^2));    %beam1<br />E02(ii)=(Kac*Vhfo)*exp(-(4*log(2)*((w2-wc)/dwd)^2));    %beam2<br /><br />I1(ii)=2*(E01(ii)*E02(ii))*sin((w2-w1)*t+M*sin(wd*t));  %intensity beam1<br />I2(ii)=2*(E01(ii)*E02(ii))*cos((w2-w1)*t+M*sin(wd*t));  %intensity beam2<br />f=I1(ii)&gt;0;<br />g=I2(ii)&gt;0;<br />pb=I2(ii-1)&gt;0;<br />pa=I1(ii-1)&gt;0;<br /><br /><br />% phase reversal calculation<br />if pb~=g || pa~=f<br />    y(ii)=xor(f,pb);<br />else<br />    y(ii)=y(ii-1);<br />end<br /><br />%pulses in each phase reversal<br />if y(ii)==y(ii-1)<br />    if f~=pa<br />        pulses(ii)=pulses(ii-1)+1;<br />    else<br />        pulses(ii)=pulses(ii-1);<br />    end<br />else<br />    pulses(ii)=0;<br />end<br /><br /><br />%100micro count<br />km1=km1+1;<br />if f~=pa &amp;&amp; y(ii)==1<br />    count100=count100+1;<br />elseif f~=pa &amp;&amp; y(ii)==0<br />    count100=count100-1;<br />end<br />if km1==100,<br />    ij=ij+1;<br />    tc(ij)=ij;<br />    cnt100(ij)=count100;<br />    count100=0;<br />    km1=0;<br />end<br /><br />%peak and error detection of intensity<br />if I2(ii)&gt;peak(ii-1);<br />    peak(ii)=I2(ii);<br />else<br />    peak(ii)=peak(ii-1)*exp(-.0000004);<br />end<br />error1(ii)=(2-peak(ii));<br /><br />%  bandpass filter <br />[z(ii) s]=filter(B,A,peak(ii),s);<br />r=(1e-7*sin(2*pi*250*t))&gt;0;<br /> if (~(xor(r,(z(ii)&gt;0)))) &amp;&amp; vari~=0<br />     if z(ii)~=z(ii-1)<br />         vari=-vari;<br />     end<br />     phase(ii)=360*vari*40/10^6;<br />        vari=0;<br />    else<br />        phase(ii)=phase(ii-1);<br />  end<br />    if xor(r,z(ii)&gt;0),<br />        vari=vari+1;<br />    end<br />end<br />countd=-sum(cnt100)/4<br />angle=(countd*lam*L/(4*(L/4)^2))*180/pi<br />toc</code>]]></description>
   </item>
      <item>
      <title>No global counters in CUPTI 4.1? Separate counters in CUPTI 4.1?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6426/no-global-counters-in-cupti-4-1-separate-counters-in-cupti-4-1</link>
      <pubDate>Tue, 27 Mar 2012 09:55:29 -0400</pubDate>
      <dc:creator>drcuda</dc:creator>
      <guid isPermaLink="false">6426@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I have developed monitoring software based on CUPTI 4.0. The idea was based on the event_sampling sample from CUDAToolsSDK. It seems that CUPTI 4.0 exposed the same set of counters to each process. Specifically, if process A was using the GPU, process B could detect that the GPU was in use by reading the counters. In that sense, the set of counters was global and visible to all CUPTI clients. <br /><br />Now, in CUPTI 4.1, it seems that each process has its own set of counters, so process A cannot detect any activity on the GPU even if process B executes kernels on it. Is my understanding correct, or am I missing something? <br /><br />I suspect that this might be because the new driver restricts visibility of counters to a single Linux process and does not allow them to be shared across different processes in the system. <br /><br />I have not checked this, but maybe for "global" monitoring of the GPU state I could use the CUPTI Activity API, which is a new feature in CUPTI 4.1.<br /><br />Thanks,]]></description>
   </item>
      <item>
      <title>error MSB3721 when compiling</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6391/error-msb3721-when-compiling</link>
      <pubDate>Mon, 26 Mar 2012 14:09:09 -0400</pubDate>
      <dc:creator>brachistochron</dc:creator>
      <guid isPermaLink="false">6391@/devforum/discussions</guid>
      <description><![CDATA[Hi<br />Here is the compilation error I get when compiling a .cu file<br /><br />(there are some Cyrillic symbols because I have the Russian version of VS2010)<br />&gt;  Compiling CUDA source file c.cu...<br />1&gt;  <br />1&gt;  c:\Users\Андрей\Documents\Visual Studio 2005\Projects\cudatest3\cudatest3&gt;"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\include"  -G0  --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile  -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd " -o "Debug\c.cu.obj" "c:\Users\??????\Documents\Visual Studio 2005\Projects\cudatest3\cudatest3\c.cu" <br />1&gt;c1xx : fatal error C1083: Cannot open source file: c:/Users/??????/Documents/Visual Studio 2005/Projects/cudatest3/cudatest3/c.cu: Invalid argument<br />1&gt;C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 4.1.targets(361,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\include"  -G0  --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile  -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd " -o "Debug\c.cu.obj" "c:\Users\Андрей\Documents\Visual Studio 2005\Projects\cudatest3\cudatest3\c.cu"" exited with code 2.<br /><br />I used this guide to configure VS:<br />http://www.aimantarek.com/2011/01/how-to-make-new-cuda-project-in-vs-2010.html<br />My configuration is i7+GTX560+win7x64<br /><br /><br /><br />Can anybody help me?]]></description>
   </item>
      <item>
      <title>NVIDIA Parallel Nsight 2.1 - debugging GTX 680 in VS2010  (Windows 7)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6316/nvidia-parallel-insight-2-1-debugging-gtx-680-in-vs2010-windows-7</link>
      <pubDate>Sat, 24 Mar 2012 10:07:29 -0400</pubDate>
      <dc:creator>tvandervlies</dc:creator>
      <guid isPermaLink="false">6316@/devforum/discussions</guid>
      <description><![CDATA[I get the following warning when I start CUDA debugging:<br /><br /><strong>Parallel Nsight Debug<br />A CUDA context was created on a GPU that is not currently debuggable. breakpoints will be disabled.<br /><br />Adapter: Geforce GTX 680</strong><br /><br />When I change the CUDA context to the second controller, a GTS 250, and connect the monitor to the GTX 680, debugging works normally. Why is the GTX 680 not debuggable?  <br /><br />]]></description>
   </item>
      <item>
      <title>Crash debug symbols?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6176/crash-debug-symbols</link>
      <pubDate>Wed, 21 Mar 2012 22:50:35 -0400</pubDate>
      <dc:creator>dandrumea</dc:creator>
      <guid isPermaLink="false">6176@/devforum/discussions</guid>
      <description><![CDATA[Hello, thanks for looking to help out.<br /><br />Picked category 'Mobile' as it seems more likely for a cause.. <br /><br />Problem: I crash on CUDA/OpenGL interop on an Optimus mobile machine (intel 3000 + GT540M) with Win7SP1 inside cudaGraphicsGLRegisterBuffer. <br /><br />Report: Using CUDASDK sample simpleGL for reporting (stack below - if anyone could point me to NVIDIA debug symbols that could help?). Currently on CUDA 4.1 with driver 286.16 (see stack), but it always happens(ed), on 4.0 and earlier, with drivers like 285.62, and earlier. Here is the stack of the crash in cudaGraphicsGLRegisterBuffer:<br /><br /> 	KernelBase.dll!_RaiseException@16()  + 0x58 bytes	<br /> 	cudart32_41_28.dll!100387f7() 	<br /> 	[Frames below may be incorrect and/or missing, no symbols loaded for cudart32_41_28.dll]	<br /> 	cudart32_41_28.dll!10011d27() 	<br /> 	cudart32_41_28.dll!10008d45() 	<br /> 	cudart32_41_28.dll!1002ff2f() 	<br /> 	gdi32.dll!7614e8d9() 	<br /> 	ig4icd32.dll!025daf62() 	<br /> 	ig4icd32.dll!025be2ae() 	<br /> 	ig4icd32.dll!0259a511() 	<br /> 	ig4icd32.dll!025c192c() 	<br />&gt;	simpleGL.exe!mainCRTStartup()  Line 189	C<br /> 	kernel32.dll!75f9339a() 	<br /> 	ntdll.dll!77a89ef2() 	<br /> 	ntdll.dll!77a89ec5() 	<br /><br />Other Info: Optimus switching to NVIDIA graphics always fails obviously when "run with graphics processor", etc. is invoked for applications.<br /><br />Any suggestions are most welcome, thank you!<br /><br />Edit: March 22, 2012 3:43pm est - changed question category from 'mobile' to 'gpu computing']]></description>
   </item>
      <item>
      <title>[volumeRender] Why are unequally sized volumes rendered as cubes (i.e., scaled)?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4791/volumerender-why-are-unequally-sized-volumes-rendered-as-cubes-i-e-scaled</link>
      <pubDate>Thu, 16 Feb 2012 09:12:51 -0500</pubDate>
      <dc:creator>ivma</dc:creator>
      <guid isPermaLink="false">4791@/devforum/discussions</guid>
      <description><![CDATA[Hi!<br />I am trying out the volume renderer from the NVIDIA GPU Computing SDK 4.1/4.0 and I was wondering why it renders the Bucky.raw volume (256x256x256) accordingly but it scales unequally sized volumes such as for example the lobster (120x120x34)[1].<br /><br />Here are some resulting images (from bottom 120x120 and from the side where the volume resolution in Z direction is only 34 voxels):<br /><img src="http://img593.imageshack.us/img593/6965/lobsterbottom.png" alt="Bottom view" /><br /><img src="http://img853.imageshack.us/img853/2559/lobsterside.png" alt="Side view" /><br /><br />Does anybody have a clue why that is and possibly how to fix it?<br /><br />PS: I have tried it on a couple of other data sets as well but with the same effect.<br /><br />Greetings,<br />ivma<br /><br />[1] <a href="http://www.cg.tuwien.ac.at/courses/Visualisierung/data/lobster.zip">lobster.zip</a>]]></description>
   </item>
      <item>
      <title>unable to install gpu computing sdk</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3491/unable-to-install-gpu-computing-sdk</link>
      <pubDate>Mon, 16 Jan 2012 00:42:39 -0500</pubDate>
      <dc:creator>dllahr</dc:creator>
      <guid isPermaLink="false">3491@/devforum/discussions</guid>
      <description><![CDATA[Hello<br /><br />I am trying to install the GPU computing sdk.  As soon as I run it I get the error message<br />"Setup has experienced a problem <br />Please do the following <br />-Close any running program<br />-Empty your temporary folder<br />-Check your internet connection<br /><br />Then try to run Setup again<br /><br />Error code:  -6005<br /><br />I tried all 3, including rebooting my laptop several times.  Any suggestions?<br /><br />edit:  system specs<br />dell laptop<br />NVS 4200M <br />i7 CPU<br />win7]]></description>
   </item>
      <item>
      <title>Error in OptiX: create buffer from gl buffer object</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5666/error-in-optix-create-buffer-from-gl-buffer-object</link>
      <pubDate>Thu, 08 Mar 2012 17:57:24 -0500</pubDate>
      <dc:creator>hangdou</dc:creator>
      <guid isPermaLink="false">5666@/devforum/discussions</guid>
      <description><![CDATA[I tried to use rtBufferCreateFromGLBO and rtTextureSamplerCreateFromGLImage, but I keep getting a return value of -1, which does not match any RTResult mentioned in the OptiX API reference. I am using 64-bit Xubuntu with two GTX 550 Ti cards, glew 1.7.0, OptiX 2.5.0 and CUDA 4.0. Below is the piece of code that creates the buffer. Can anyone help? Thanks.<br /><br />	GLuint testBufGL;<br />	glGenBuffers(1, &amp;testBufGL);<br />	glBindBuffer(GL_ARRAY_BUFFER, testBufGL);  	<br />	glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)sizeof(float)*screenWidth*screenHeight, NULL, GL_STATIC_DRAW);<br />  	glBindBuffer(GL_ARRAY_BUFFER, 0);<br />  		<br />	RTbuffer testBuf;<br />	int error = rtBufferCreateFromGLBO( device_0.context-&gt;GetContextID(), RT_BUFFER_INPUT, testBufGL, &amp;testBuf);<br />	printf("\nINVALID_CONTEXT is: %d,the MEMORY_ALLOCATION_FAILED is: %d\n", RT_ERROR_INVALID_CONTEXT, RT_ERROR_MEMORY_ALLOCATION_FAILED);<br />	printf("\nINVALID_VALUE is: %d, the SUCCESS is %d, the error is: %d\n\n", RT_ERROR_INVALID_VALUE, RT_SUCCESS, error);]]></description>
   </item>
      <item>
      <title>(code updated)Optix Error:&quot;rtTextureSamplerCreateFromGLImage&quot; keeps returning RT_ERROR_INVALID_VALUE</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5606/code-updatedoptix-errorrttexturesamplercreatefromglimage-keeps-returning-rt_error_invalid_value</link>
      <pubDate>Wed, 07 Mar 2012 18:58:56 -0500</pubDate>
      <dc:creator>hangdou</dc:creator>
      <guid isPermaLink="false">5606@/devforum/discussions</guid>
      <description><![CDATA[Hello, I am trying to use rtTextureSamplerCreateFromGLImage(context, gl_id, target, sampler), but it keeps returning RT_ERROR_INVALID_VALUE. <br />The context id is valid. The gl_id refers to a texture of "GL_TEXTURE_2D, GL_R32F, GL_RED, GL_FLOAT". I have spent a long time on this but still cannot fix it. Can anyone help? Thanks a lot.]]></description>
   </item>
      <item>
      <title>Different OpenCL results with different drivers</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5681/different-opencl-results-with-different-drivers</link>
      <pubDate>Thu, 08 Mar 2012 18:41:25 -0500</pubDate>
      <dc:creator>Dan Mackley</dc:creator>
      <guid isPermaLink="false">5681@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br />Until today, I have been using the libOpenCL.so that came with the 280 driver to build a C program that does a just-in-time compile of OpenCL code and then runs it and uses the results.  The application runs on multiple, different platforms with possibly different nVidia hardware and driver versions.<br />Today I loaded the 295 driver, and the program gives different results -- different enough that they're unacceptable, numerically.<br />For various reasons, I do not want to keep multiple versions of our application around, built with and keyed to the different nVidia drivers (we have multiple customers with different hardware configs)... I'd prefer to stick with just 32- and 64-bit versions.<br />Is this the way things work -- does the libOpenCL that's used when building a C program need to match the runtime system's libOpenCL?<br />(By the way, both libOpenCL's are numbered as version 1.0.0)]]></description>
   </item>
      <item>
      <title>I want to know the meaning of envreg0~31 in PTX code.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5131/i-want-to-know-the-meaning-of-envreg031-in-ptx-code-</link>
      <pubDate>Sun, 26 Feb 2012 01:31:59 -0500</pubDate>
      <dc:creator>komb</dc:creator>
      <guid isPermaLink="false">5131@/devforum/discussions</guid>
      <description><![CDATA[I am a GPGPU programmer studying OpenCL.<br />I have a GeForce GT520M. <br /><br />I have a question about PTX code.<br />I produced PTX code for matrix multiplication.<br />It uses special registers %envreg0 through %envreg6.<br />I guess that envreg0 and envreg1 are the group IDs for x and y in two dimensions,<br />but I can't find the meaning of the other registers.<br />I can't find any documents on that; the meaning is not described in the PTX spec. <br />Please let me know the meaning of the envreg special registers. Are there any documents? ]]></description>
   </item>
      <item>
      <title>Windows 7, Visual studio C++ 2010, Error on cutil32.dll</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3046/windows-7-visual-studio-c-2010-error-on-cutil32-dll</link>
      <pubDate>Wed, 04 Jan 2012 13:19:05 -0500</pubDate>
      <dc:creator>hassy1977</dc:creator>
      <guid isPermaLink="false">3046@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br />I am installing CUDA on my PC, which runs 64-bit Windows 7.<br />I am going to use Visual Studio C++ 2010 to write programs.<br />I installed everything according to the procedure shown on the following page:<br /><a href="http://forums.nvidia.com/index.php?showtopic=216829">http://forums.nvidia.com/index.php?showtopic=216829</a><br />The sample programs work correctly.<br /><br />I wrote a simple test program.<br />#include <br />#include <br />#include <br /><br />     int main(int argc, char** argv){<br /><br />         CUT_DEVICE_INIT(argc, argv);<br />         CUT_EXIT(argc, argv);<br />         return 0;<br />     }<br /><br />The program appears to compile successfully, but when I run it, it shows the error message<br />   "The program can't start because cutil32.dll is missing from your computer.<br />   Try reinstalling the program to fix this problem."<br /><br />I rebuilt cutil32.dll again and again with cutil_vc2010.sln, but the result is the same.<br /><br />Has anyone else faced the same problem?]]></description>
   </item>
      <item>
      <title>Floating Point Number Errors</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5126/floating-point-number-errors</link>
      <pubDate>Sun, 26 Feb 2012 01:27:01 -0500</pubDate>
      <dc:creator>komb</dc:creator>
      <guid isPermaLink="false">5126@/devforum/discussions</guid>
      <description><![CDATA[I have a question.<br />I installed the SDK and am studying OpenCL. <br />The SDK includes OpenCL examples, for example DCT8X8.<br />It checks the floating point values by comparing the result computed by the GPU with the result computed by the CPU.<br />But the floating point values are not the same. There is a small difference.<br />Why is there a difference?<br /><br />I checked the difference. <br />Every part of the IEEE 754 representation is the same, but the mantissa is slightly different.<br />Please let me know why there is a difference.]]></description>
   </item>
      <item>
      <title>How to specify gpu device using optix</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5006/how-to-specify-gpu-device-using-optix</link>
      <pubDate>Wed, 22 Feb 2012 15:47:59 -0500</pubDate>
      <dc:creator>hangdou</dc:creator>
      <guid isPermaLink="false">5006@/devforum/discussions</guid>
      <description><![CDATA[Hello, I am working on a project which needs me to render two data sets on two GPUs separately using OptiX. In the end I want to display an arbitrary rendering result on the screen. <br />I do not know anything about multi-GPU for OptiX and do not know where to start. Any clue will help. Thanks a lot. <br />Sincerely~]]></description>
   </item>
      <item>
      <title>P2P mem transfer between multiple CPU processes</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4821/p2p-mem-transfer-between-multiple-cpu-processes</link>
      <pubDate>Fri, 17 Feb 2012 16:15:15 -0500</pubDate>
      <dc:creator>tgramicc</dc:creator>
      <guid isPermaLink="false">4821@/devforum/discussions</guid>
      <description><![CDATA[After viewing the webinars on GPU-Direct/UVA and Multi-GPU, I am still confused about whether it is possible to perform a P2P mem copy between 2 GPU's when each GPU context is owned by a different CPU process.  I have looked at the SDK example, threadMigration, and I see how it is possible to perform a P2P copy from different threads within a single process, however I am wondering if it is possible to access a GPU's memory pointer between 2 CPU processes using an IPC shared memory space.  Is this possible, or does the UVA structure make this impossible?  Thanks for your reply.]]></description>
   </item>
      <item>
      <title>How to develop for RTOS(VxWorks, RT Linux, etc.)?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4781/how-to-develop-for-rtosvxworks-rt-linux-etc-</link>
      <pubDate>Thu, 16 Feb 2012 06:59:01 -0500</pubDate>
      <dc:creator>leventk</dc:creator>
      <guid isPermaLink="false">4781@/devforum/discussions</guid>
      <description><![CDATA[Nowadays, an RTOS (VxWorks, RT Linux, etc.) can share a workstation or a PC with a non-deterministic OS (Windows/Linux). This enhancement has directed us to create products on desktop computers as well. These RTOSs are deterministic, which is a must for real-time products.<br /><br />My questions are: <br />Can I develop an application using GPGPU on VxWorks OS?<br />Can I create an application using the NVIDIA API on, for example, Wind River Workbench?<br />If not, is there any plan?<br /><br />Best Regards,<br />Levent]]></description>
   </item>
      <item>
      <title>Display driver vs Developer driver  Version</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4496/display-driver-vs-developer-driver-version</link>
      <pubDate>Thu, 09 Feb 2012 11:03:13 -0500</pubDate>
      <dc:creator>4fermi</dc:creator>
      <guid isPermaLink="false">4496@/devforum/discussions</guid>
      <description><![CDATA[<br />The latest "Display Driver" from <a href="http://www.nvidia.com/object/linux-display-amd64-290.10-driver.html">the products page</a> is <strong>ver 290.10</strong>.  But the latest "Developer Driver" from <a href="http://www.developer.nvidia.com/cuda-toolkit-41#s=bcb">the CUDA Developer Toolkit 4.1 download page</a> is <strong>ver 285.05.33</strong>.<br /><br />Question:  why must CUDA developers use an older version of the driver?  After all, the CUDA product made by the developer will be used by people with the newer driver!<br />]]></description>
   </item>
      <item>
      <title>Tesla multi copy not as fast as expected</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4421/tesla-multi-copy-not-as-fast-as-expected</link>
      <pubDate>Wed, 08 Feb 2012 04:37:54 -0500</pubDate>
      <dc:creator>paulvisschers</dc:creator>
      <guid isPermaLink="false">4421@/devforum/discussions</guid>
      <description><![CDATA[When I run the simpleMultiCopy in the SDK (4.0) on the Tesla C2050 I get the following results:<br /><code>[simpleMultiCopy] starting...<br />[Tesla C2050] has 14 MP(s) x 32 (Cores/MP) = 448 (Cores)<br />&gt; Device name: Tesla C2050<br />&gt; CUDA Capability 2.0 hardware with 14 multi-processors<br />&gt; scale_factor = 1.00<br />&gt; array_size   = 4194304<br /><br /><br />Relevant properties of this CUDA device<br />(X) Can overlap one CPU&lt;&gt;GPU data transfer with GPU kernel execution (device property "deviceOverlap")<br />(X) Can overlap two CPU&lt;&gt;GPU data transfers with GPU kernel execution<br />    (compute capability &gt;= 2.0 AND (Tesla product OR Quadro 4000/5000)<br /><br />Measured timings (throughput):<br /> Memcpy host to device	: 2.725792 ms (6.154988 GB/s)<br /> Memcpy device to host	: 2.723360 ms (6.160484 GB/s)<br /> Kernel			: 0.611264 ms (274.467599 GB/s)<br /><br />Theoretical limits for speedup gained from overlapped data transfers:<br />No overlap at all (transfer-kernel-transfer): 6.060416 ms <br />Compute can overlap with one transfer: 5.449152 ms<br />Compute can overlap with both data transfers: 2.725792 ms<br /><br />Average measured timings over 10 repetitions:<br /> Avg. time when execution fully serialized	: 6.113555 ms<br /> Avg. time when overlapped using 4 streams	: 4.308822 ms<br /> Avg. speedup gained (serialized - overlapped)	: 1.804733 ms<br /><br />Measured throughput:<br /> Fully serialized execution		: 5.488530 GB/s<br /> Overlapped using 4 streams		: 7.787379 GB/s<br />[simpleMultiCopy] test results...<br />PASSED</code><br />This shows that the expected runtime is 2.7 ms, while it actually takes 4.3. What is it exactly that causes this discrepancy?]]></description>
   </item>
      <item>
      <title>Is there a way to access an ID3D11Texture3D in CUDA (read/write)?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4221/is-there-a-way-to-access-an-id3d11texture3d-in-cuda-readwrite</link>
      <pubDate>Wed, 01 Feb 2012 09:02:28 -0500</pubDate>
      <dc:creator>SoulWiz</dc:creator>
      <guid isPermaLink="false">4221@/devforum/discussions</guid>
      <description><![CDATA[I have an ID3D11Texture3D with the following descriptor:<br /><code><br />D3D11_TEXTURE3D_DESC td;<br />ZeroMemory(&amp;td, sizeof(td));<br />td.Width = uiWidth;<br />td.Height = uiHeight;<br />td.Depth = uiDepth;<br />td.MipLevels = 1;<br />td.Format = DXGI_FORMAT_R32_FLOAT;<br />td.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;<br /></code><br /><br />I want to read and write to this texture using CUDA. Is this possible somehow?<br />I have tried the following:<br />cudaGraphicsD3D11RegisterResource &gt;&gt; cudaGraphicsMapResources &gt;&gt; cudaGraphicsSubResourceGetMappedArray<br />to READ: cudaBindTextureToArray &gt;&gt; tex3D<br />to WRITE: cudaMemcpy3D (from linear memory allocated with cudaMalloc3D)<br />but the memory copy fails with cudaErrorInvalidValue:<br /><code><br />cudaMemcpy3DParms oMemcpy3DParms;<br />memset(&amp;oMemcpy3DParms, 0, sizeof(cudaMemcpy3DParms));<br />oMemcpy3DParms.srcPtr = oPitchedPtr;<br />oMemcpy3DParms.srcPos = make_cudaPos(0, 0, 0);<br />oMemcpy3DParms.dstArray = pArray;<br />oMemcpy3DParms.dstPos = make_cudaPos(0, 0, 0);<br />oMemcpy3DParms.extent = oExtent;<br />oMemcpy3DParms.kind = cudaMemcpyDeviceToDevice;<br />cudaError oCudaError = cudaMemcpy3D(&amp;oMemcpy3DParms);<br /></code><br /><br />Any ideas? ... or does it even work?]]></description>
   </item>
      <item>
      <title>GPU computing in a virtual environment</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/1781/gpu-computing-in-a-virtual-environment</link>
      <pubDate>Wed, 23 Nov 2011 15:44:52 -0500</pubDate>
      <dc:creator>bwatson</dc:creator>
      <guid isPermaLink="false">1781@/devforum/discussions</guid>
      <description><![CDATA[Assume I have server-grade hardware running VMWare ESX and hosting 1 or more virtual machines.  If I were to add NVIDIA graphics to the server, would it be possible for a program running inside one of the virtual machines to utilize the GPU for calculations?]]></description>
   </item>
      <item>
      <title>CUDA Toolkit 4.1.15 on openSUSE 12.1?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3651/cuda-toolkit-4-1-15-on-opensuse-12-1</link>
      <pubDate>Wed, 18 Jan 2012 15:55:04 -0500</pubDate>
      <dc:creator>gue22</dc:creator>
      <guid isPermaLink="false">3651@/devforum/discussions</guid>
      <description><![CDATA[Went to great lengths to install openSUSE 12.1 on a physical machine (as opposed to the VMware and Hyper-V VMs I normally use to try all kinds of things) only to find out upon closer inspection that the CUDA Toolkit 4.1.15 downloads are targeted for 11.2.<br /><br />Any chance for success on 12.1 (quite different core from 11.x) or should I set up Yet Another variant with 11.2?<br />Thx<br />G.]]></description>
   </item>
      <item>
      <title>On streams and asynchronous execution</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3496/on-streams-and-asynchronous-execution</link>
      <pubDate>Mon, 16 Jan 2012 03:32:28 -0500</pubDate>
      <dc:creator>lucana</dc:creator>
      <guid isPermaLink="false">3496@/devforum/discussions</guid>
      <description><![CDATA[This question is just to make sure I'm understanding well how CUDA streams work. <br /><br />Imagine I have a for loop like this. I am using only one stream.<br /><br />for (i=0; i &lt; N; i++)<br />{<br />	 run operations on CPU <br />	 copy results of CPU operations to CUDA kernel with cudaMemcpyAsync<br />	 call kernel &lt;&lt;&lt;  , &gt;&gt;&gt;<br />}<br /><br />My understanding is that the kernel for i and the CPU operations for i+1 at the beginning of the loop will execute concurrently, but the kernel won't start for i+1 until the CPU has finished computing results for i+1.<br /><br />Is this right? Or will the operations on CPU and GPU never overlap? Will the kernel start before the proper results have been computed by the CPU? Is it necessary to add some control flags to make sure the operations on the CPU have finished before the kernel starts?<br /><br />This diagram shows what I want to do. In fact it is a pipeline, but I'm still unsure if it is possible with CUDA. <br /><br />----------i = 0 -------------------- i = 1 --------------------------- i = 2<br />(t0) compute results on CPU<br />(t1) copy results to CUDA kernel -- compute results on CPU<br />(t2) execute kernel --------------- copy results to CUDA kernel -- compute results on CPU<br />(t3) ------------------------------ execute kernel --------------- copy results to CUDA kernel <br />(t4)-------------------------------------------------------------- execute kernel<br /><br />Finally, I would like to ask if it makes sense to use CUDA streams when there is a data dependency between streams, with a pipeline like the one shown above. ]]></description>
   </item>
      <item>
      <title>how many can it(GTX 460) create h.264 codec to decode HD(1280x720) at the same time ?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4251/how-many-can-itgtx-460-create-h-264-codec-to-decode-hd1280x720-at-the-same-time-</link>
      <pubDate>Thu, 02 Feb 2012 01:57:00 -0500</pubDate>
      <dc:creator>shlee7708</dc:creator>
      <guid isPermaLink="false">4251@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I want to decode 32 channels of HD 720p H.264 at the same time using CUDA.<br /><br />Is it possible?<br /><br />If it is possible, what kind of NVIDIA GPU should I use?]]></description>
   </item>
      <item>
      <title>Is there a way to access an ID3D11Texture2D with 8 samples in CUDA (read/write)?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4201/is-there-a-way-to-access-an-id3d11texture2d-with-8-samples-in-cuda-readwrite</link>
      <pubDate>Wed, 01 Feb 2012 07:24:34 -0500</pubDate>
      <dc:creator>SoulWiz</dc:creator>
      <guid isPermaLink="false">4201@/devforum/discussions</guid>
      <description><![CDATA[I have an ID3D11Texture2D with the following descriptor:<br /><code><br />D3D11_TEXTURE2D_DESC td;<br />ZeroMemory(&amp;td, sizeof(td));<br />td.Width = m_uiWidth;<br />td.Height = m_uiHeight;<br />td.MipLevels = 1;<br />td.ArraySize = 1;<br />td.Format = DXGI_FORMAT_R32_FLOAT;<br />td.SampleDesc.Count = 8;<br />td.SampleDesc.Quality = 0;<br />td.BindFlags = D3D11_BIND_RENDER_TARGET;</code><br /><br />I want to read and write to this texture (all 8 samples) using CUDA. Is this possible somehow?<br />I have tried the following:<br />cudaGraphicsD3D11RegisterResource &gt;&gt; cudaGraphicsMapResources &gt;&gt; cudaGraphicsSubResourceGetMappedArray<br />to READ: cudaBindTextureToArray &gt;&gt; tex2DLayered<br />to WRITE: cudaMemcpy3D (from linear memory allocated with cudaMalloc3D)<br />but it looks like I cannot access all 8 samples this way.<br /><br />I also tried to have direct read/write access using a surface reference:<br />cudaGraphicsD3D11RegisterResource &gt;&gt; cudaGraphicsMapResources &gt;&gt; cudaGraphicsSubResourceGetMappedArray &gt;&gt; cudaBindSurfaceToArray<br />to READ: surf2Dread<br />to WRITE: surf2Dwrite<br /><br />Any ideas? ... or does it even work?]]></description>
   </item>
      <item>
      <title>Please update the openSUSE packages.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4086/please-update-the-opensuse-packages-</link>
      <pubDate>Sat, 28 Jan 2012 09:51:42 -0500</pubDate>
      <dc:creator>Deanjo</dc:creator>
      <guid isPermaLink="false">4086@/devforum/discussions</guid>
      <description><![CDATA[Can you guys please update the openSUSE packages? openSUSE 11.2's support was discontinued May 12th 2011 and 11.3's support was discontinued January 20th 2012.  12.1 is the current release and all we are asking is for the package to be updated and a bit of equality in support here. Just to give a bit of perspective here, the openSUSE version's support was discontinued around the same time the latest CUDA-supported version of Ubuntu was released.]]></description>
   </item>
      <item>
      <title>Can we use OpenGL for GPU Graphics and Direct Compute for GPGPU at the same time on the same PC?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3811/can-we-use-opengl-for-gpu-graphics-and-direct-compute-for-gpgpu-at-the-same-time-on-the-same-pc</link>
      <pubDate>Mon, 23 Jan 2012 03:00:14 -0500</pubDate>
      <dc:creator>vanhouten777</dc:creator>
      <guid isPermaLink="false">3811@/devforum/discussions</guid>
      <description><![CDATA[Hi Friends,<br /><br /><br />Can we use OpenGL for GPU Graphics and Direct Compute for GPGPU at the same time on the same PC?<br /><br />Regards,<br />vanhouten777.<br /><br />]]></description>
   </item>
      <item>
      <title>New CUDA Toolkit 4.1, Now in Production</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4036/new-cuda-toolkit-4-1-now-in-production</link>
      <pubDate>Thu, 26 Jan 2012 16:24:57 -0500</pubDate>
      <dc:creator>Nadeem Mohammad</dc:creator>
      <guid isPermaLink="false">4036@/devforum/discussions</guid>
      <description><![CDATA[A new production release of CUDA has been posted. This new release makes it faster and easier to accelerate scientific research with GPUs.  Key features include a re-designed Visual Profiler with automated performance analysis, a new LLVM-based compiler that helps your apps run up to 10% faster, and 1000+ new imaging and signal processing functions in the NPP library.  We’ve also added a new tri-diagonal solver, 2x faster SpMV using the ELL hybrid format, and some great improvements to the debugging and performance analysis tools.  Learn more and download from <a href="http://bit.ly/w3H6Z7">CUDAZone</a>]]></description>
   </item>
      <item>
      <title>GPU Accelerated 2D to Stereo 3D Video Conversion</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3906/gpu-accelerated-2d-to-stereo-3d-video-conversion</link>
      <pubDate>Tue, 24 Jan 2012 16:43:20 -0500</pubDate>
      <dc:creator>DryRiver</dc:creator>
      <guid isPermaLink="false">3906@/devforum/discussions</guid>
      <description><![CDATA[Hello All,<br /><br />I have written a pretty good 2D-to-3D video conversion algorithm in C# .NET. (It took a little over 2 years of experimenting to get it right.)<br /><br />I now want to GPU accelerate this 2D-to-3D conversion algorithm. I am hoping for a 10x - 20x speedup from using the GPU to do the pixel crunching instead of the CPU. <br /><br />My requirements are:<br /><br />- The GPU code needs to execute inside a C# .NET Windows Forms Application<br /><br />- I want to use the easiest/most beginner-friendly GPU coding method possible<br /><br />Where should I start with this? CUDA.NET? OpenCL.NET? Brahma (for C#)?<br /><br />Are there any beginner's tutorials for using CUDA/OpenCL inside C# .NET?<br /><br />Are there, specifically, any image processing tutorials/examples for CUDA/OpenCL?<br /><br />Thank you for any feedback. I am a complete CUDA/OpenCL noob and am hoping for expert advice on making my first GPU accelerated project happen.<br /><br />Best Regards,<br /><br />                  DryRiver]]></description>
   </item>
      <item>
      <title>device function pointers</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3921/device-function-pointers</link>
      <pubDate>Wed, 25 Jan 2012 04:12:51 -0500</pubDate>
      <dc:creator>micheletuttafesta</dc:creator>
      <guid isPermaLink="false">3921@/devforum/discussions</guid>
      <description><![CDATA[Dear Sirs,<br />I need a device version of the following<br />host code:<br /><br />double (**func)(double x);<br /><br />double func1(double x)<br />{<br /> return x+1.;<br />}<br /><br />double func2(double x)<br />{<br /> return x+2.;<br />}<br /><br />double func3(double x)<br />{<br /> return x+3.;<br />}<br /><br />void test(void)<br />{<br /> double x;<br /><br /> for(int i=0;i&lt;3;++i){<br />  x=func[i](2.0);<br />  printf("%g\n",x);<br /> }<br /><br />}<br /><br />int main(void)<br />{<br /> func=(double (**)(double))malloc(10*sizeof(double (*)(double)));<br /><br /> test();<br /><br /> return 0;<br />}<br /><br /><br />where func1, func2, func3<br />have to be __device__ functions<br />and "test"<br />has to be a (suitably modified) __global__ kernel.<br /><br />I have a NVIDIA GeForce GTS 450 (compute capability 2.1)<br />Thank you in advance<br />Michele<br /><br />]]></description>
   </item>
      <item>
      <title>NSIGHT doesn&#039;t let me choose threads with id greater than 15.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3696/nsight-doesnt-let-me-choose-threads-with-id-greater-than-15-</link>
      <pubDate>Thu, 19 Jan 2012 10:41:02 -0500</pubDate>
      <dc:creator>lucana</dc:creator>
      <guid isPermaLink="false">3696@/devforum/discussions</guid>
      <description><![CDATA[I have managed to stop CUDA debugging at breakpoints. I'm working with VS2010. I can use the Debug Focus to select threads and blocks to follow, but I can't select every thread/block I defined. The grid and block dimensions shown there are wrong. For example, I launched 1024 threads (kernel&lt;&lt;&lt;1, 1024&gt;&gt;&gt;), but it only lets me choose up to thread number 15. Is this normal? Am I doing something wrong? ]]></description>
   </item>
      <item>
      <title>linker errors while executing opencl sample codes</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3421/linker-errors-while-executing-opencl-sample-codes</link>
      <pubDate>Fri, 13 Jan 2012 06:10:44 -0500</pubDate>
      <dc:creator>Prasanna</dc:creator>
      <guid isPermaLink="false">3421@/devforum/discussions</guid>
      <description><![CDATA[Hi<br />I am new to running OpenCL code... I have downloaded the GPU Computing SDK and drivers and am running the OpenCL samples from it... I have included all the .lib files from the OpenCL part of the SDK... When I build, I get the following errors in Visual Studio 2010<br /><br />1&gt;------ Build started: Project: testopencl, Configuration: Debug Win32 ------<br />1&gt; Skipping... (no relevant changes detected)<br />1&gt; testopencl.cpp<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrComparefet referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clEnqueueReadBuffer@36 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clEnqueueNDRangeKernel@36 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clEnqueueWriteBuffer@36 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clSetKernelArg@16 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clCreateKernel@12 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clBuildProgram@24 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clCreateProgramWithSource@20 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _oclLoadProgSource referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrFindFilePath referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clCreateBuffer@24 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clCreateCommandQueue@20 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clCreateContext@24 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clGetDeviceIDs@24 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clGetPlatformIDs@12 referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrFillArray referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrRoundUp referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrLog referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrSetLogFileName referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _shrCheckCmdLineFlag referenced in function _main<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clReleaseMemObject@4 referenced in function "void __cdecl Cleanup(int,char * *,int)" (?Cleanup@@YAXHPAPADH@Z)<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clReleaseContext@4 referenced in function "void __cdecl Cleanup(int,char * *,int)" (?Cleanup@@YAXHPAPADH@Z)<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clReleaseCommandQueue@4 referenced in function "void __cdecl Cleanup(int,char * *,int)" (?Cleanup@@YAXHPAPADH@Z)<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clReleaseProgram@4 referenced in function "void __cdecl Cleanup(int,char * *,int)" (?Cleanup@@YAXHPAPADH@Z)<br />1&gt;testopencl.obj : error LNK2019: unresolved external symbol _clReleaseKernel@4 referenced in function "void __cdecl Cleanup(int,char * *,int)" (?Cleanup@@YAXHPAPADH@Z)<br />1&gt;C:\Users\Acer\Documents\Visual Studio 2010\Projects\testopencl\Debug\testopencl.exe : fatal error LNK1120: 25 unresolved externals<br />========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========<br /><br /><br />I am using 64-bit Windows 7 with an NVIDIA graphics card... It would be a great help if anyone could reply with a solution to this problem.<br />Thank You... ]]></description>
   </item>
      <item>
      <title>NPP: Unable to process multiple MCU rows with nppiDCTQuantFwd8x8LS_JPEG_8u16s_C1R</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3536/npp-unable-to-process-multiple-mcu-rows-with-nppidctquantfwd8x8ls_jpeg_8u16s_c1r</link>
      <pubDate>Tue, 17 Jan 2012 03:04:37 -0500</pubDate>
      <dc:creator>StuJey</dc:creator>
      <guid isPermaLink="false">3536@/devforum/discussions</guid>
      <description><![CDATA[The ROI passed to the function does not seem to allow the height to be set to greater than 8.<br /><br />As an example, if I pass an ROI with a width of 1280 &amp; a height of 8, the output DCTs are generated correctly. However, if I pass an ROI with a width of 1280 &amp; a height of 32, only the first row of DCTs is generated; the remainder are not calculated, but no error is reported. The source stride is 1280 in this case.<br /><br />Could this be a bug in the function or perhaps a problem with the parameters passed to the function?<br /><br />I need to be able to process multiple rows in order to get enough performance from the JPEG compression process when using the GPU. The original code is based on the Intel IPP sample UIC code. Currently, processing 1 MCU row at a time, the overall performance with the GPU is about the same as with the standard IPP version.<br /><br />The CUDA SDK version is 4.1 RC, running on Ubuntu 10.04.2 server 64 bit with a Q9650 CPU and GTX550 Ti graphics card.]]></description>
   </item>
      <item>
      <title>Asynchronous kernels, CUDA, and hardware question.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3196/asynchronous-kernels-cuda-and-hardware-question-</link>
      <pubDate>Mon, 09 Jan 2012 11:02:00 -0500</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">3196@/devforum/discussions</guid>
      <description><![CDATA[Hi all,<br /><br />I'm using streams to pipeline the segmented portions of a reduction asynchronously; e.g., norm2, dot, or max. In this way I hope to reduce the amount of resource idleness during the latter part of the reduction phase.<br /><br />(Nothing is wrong with CUBLAS; however, for our own reasons we are writing a more transparent and tunable version of many kernels in PETSc.)<br /><br />What I have read is that the scheduler will start execution once resources are available. <br /><br />By streaming and pipelining, what I am doing is breaking up the work into chunks at run time. Each of these chunks is given a stream ID and run asynchronously, dispatched one after the other to pipeline the partial results; then a further kernel is called for the final reduction. I am trying to take the GPU to the limit of its memory and to keep it from being idle during the reduction. What I have read is that the GPU will execute the next stream once the previous kernel has freed enough resources that multiple kernels can run concurrently. However, what I don't know is how many resources need to be freed before the next kernel can run concurrently. Does this mean a full warp needs to be freed by the previous kernel before the next kernel can begin its partial execution? What is the smallest unit of hardware (SM, warp, or SP) that needs to be freed by a kernel before the new kernel can utilize the GPU?]]></description>
   </item>
      <item>
      <title>Nvda.Build.CudaTasks.SanitizePaths compile error</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/3336/nvda-build-cudatasks-sanitizepaths-compile-error</link>
      <pubDate>Wed, 11 Jan 2012 21:39:02 -0500</pubDate>
      <dc:creator>jm99</dc:creator>
      <guid isPermaLink="false">3336@/devforum/discussions</guid>
      <description><![CDATA[I receive an Nvda.Build.CudaTasks.SanitizePaths error when trying to compile a program in VS2010 with SDK 4.0.<br /><br />The complete error is:<br /><br /><br />Error	1	error MSB4062: The "Nvda.Build.CudaTasks.SanitizePaths" task could not be loaded from the assembly C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\Nvda.Build.CudaTasks.v4.0.dll. Could not load file or assembly 'file:///C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\Nvda.Build.CudaTasks.v4.0.dll' or one of its dependencies. The system cannot find the file specified. Confirm that the &lt;UsingTask&gt; declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask<br /><br />The DLL referred to does exist in the specified path.  What could be the issue?  Thanks.]]></description>
   </item>
      <item>
      <title>GPU Computing SDK: problem with the OpenCL n-body simulation example.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/2116/gpu-computing-sdk-problem-with-the-opencl-n-body-simulation-example-</link>
      <pubDate>Sun, 04 Dec 2011 08:41:50 -0500</pubDate>
      <dc:creator>Bekos</dc:creator>
      <guid isPermaLink="false">2116@/devforum/discussions</guid>
      <description><![CDATA[Hello everyone!<br /><br />I have a question regarding the n-body OpenCL simulation in the NVIDIA GPU Computing SDK. First of all, I want to apologize if I am asking something very silly. My physics and n-body knowledge is not very good yet. I was checking the CPU version of the n-body algorithm in file "oclBodySystemCPU.cpp". My question is related to the void BodySystemCPU::_integrateNBodySystem(float) function. This function, at line 165, calculates the velocity of the particle at the end of the interval, and then uses this velocity to calculate the position of the particle at the end of the interval. Isn't this wrong? I thought the correct solution is to calculate the position using the velocity of the particle at the end of the previous interval, and the velocity calculated at line 165 should be used for the next interval. Am I missing something here? Thanks a lot for your time.<br /><br />Cheers,<br />Bekos<br />]]></description>
   </item>
      </channel>
</rss>
