<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
      <title>Tagged with linux - NVIDIA Developer Forums</title>
      <link>http://forums.developer.nvidia.com/devforum/discussions/tagged/linux/feed.rss</link>
      <pubDate>Wed, 16 May 2012 17:32:54 -0400</pubDate>
         <description>Tagged with linux - NVIDIA Developer Forums</description>
   <language>en-CA</language>
   <atom:link href="/devforum/discussions/tagged/linux/feed.rss" rel="self" type="application/rss+xml" />
   <item>
      <title>nvv start error code=13 (linux)</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8211/nvv-start-error-code13-linux</link>
      <pubDate>Tue, 15 May 2012 08:04:16 -0400</pubDate>
      <dc:creator>Da Ma</dc:creator>
      <guid isPermaLink="false">8211@/devforum/discussions</guid>
      <description><![CDATA[Dear all,<br /><br />I want to profile my CUDA program. In the good old days I used computeprof without any problems. Since updating to CUDA 4.1 I must use the new profiler, nvvp.<br /><br />Some system information:<br />Linux 3.2.1-gentoo-r2 x86_64<br />nvidia-drivers-295.41<br />dev-util/nvidia-cuda-sdk-4.1<br />dev-util/nvidia-cuda-toolkit-4.1<br />dev-java/sun-jdk-1.6.0.31<br />dev-java/sun-jre-bin-1.6.0.31<br /><br />When I start nvvp from the command line I get:<br /><br /><code>JVM terminated. Exit code=13<br />/opt/cuda/libnvvp/jre/bin/java<br />-jar /opt/cuda/libnvvp/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar<br />-os linux<br />-ws gtk<br />-arch x86_64<br />-showsplash<br />-launcher /opt/cuda/libnvvp/nvvp<br />-name Nvvp<br />--launcher.library /opt/cuda/libnvvp/plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.1.R36x_v20100810/eclipse_1309.so<br />-startup /opt/cuda/libnvvp/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar<br />-exitdata a848011<br />-data @user.home/nvvp_workspace<br />-vm /opt/cuda/libnvvp/jre/bin/java<br />-vmargs<br />-jar /opt/cuda/libnvvp/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar <br /></code><br /><br />This tells me nothing because I'm not much of a Java expert ... maybe someone can tell me what went wrong. Should I use a newer version of Sun Java?<br /><br />Thanks a lot in advance.<br /><br />Best,<br />David]]></description>
   </item>
      <item>
      <title>Flashing L4T on cardhu dev tablet</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6506/flashing-l4t-on-cardhu-dev-tablet</link>
      <pubDate>Wed, 28 Mar 2012 12:57:40 -0400</pubDate>
      <dc:creator>madmaze</dc:creator>
      <guid isPermaLink="false">6506@/devforum/discussions</guid>
      <description><![CDATA[I have the latest version of the Linux for Tegra development kit, but<br />I am stuck on flashing the dev tablet with the new kernel/bootloader.<br /><br />The documentation says the following:<br /><code>You must first put the target board into reset/recovery mode. Do so by first powering <br />on the board and then holding the recovery button and pressing the reset button.</code><br /><br />When I power on the tablet it announces:<br /><code><br />...<br />Checking for RCK.. press key &lt;Volume Down&gt; in 5 sec to enter RCK<br />OS will cold boot in 10 seconds if no input is detected<br />Press &lt;Volume Down&gt; to select, &lt;Volume Up&gt; for selection move</code><br />(options are USB and Android)<br /><br />If I press Vol- when the RCK message pops up, it gives me a distorted screen with an Android exclamation mark in it. After that no other input is possible, not even turning it off.<br /><br />If I select to boot USB, it claims "Starting Fastboot USB download protocol".<br />If at this point I execute flash.sh, it hangs after "Nvflash  started".<br /><br />Any suggestions on what I could/should try?<br />Is there a way to get debug output from NVFLASH about where it's stuck?<br /><br />Thanks,<br /><br />Matthias<br />]]></description>
   </item>
      <item>
      <title>Booting device from internal eMMC drive aka mmcblk0p5</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8251/booting-device-from-internal-emmc-drive-aka-mmcblk0p5</link>
      <pubDate>Tue, 15 May 2012 18:50:07 -0400</pubDate>
      <dc:creator>PouryaShirazian1</dc:creator>
      <guid isPermaLink="false">8251@/devforum/discussions</guid>
      <description><![CDATA[Sorry if my question has already been posted somewhere in the forum. I have successfully booted the device from a prepared USB drive and now want to flash the device so that it can boot from its internal eMMC drive. How can I do this properly?<br /><br />The error that I receive is:<br /><br />file not found: bootloader/system.img<br />failed executing command 2147483647 NvError 0x4<br />command failure: create failed <br />Failed to flash cardhu.<br /><br />Thanks]]></description>
   </item>
      <item>
      <title>Errors with cuda RC2 visual profiler</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/2596/errors-with-cuda-rc2-visual-profiler</link>
      <pubDate>Fri, 16 Dec 2011 19:31:09 -0500</pubDate>
      <dc:creator>tcelvis</dc:creator>
      <guid isPermaLink="false">2596@/devforum/discussions</guid>
      <description><![CDATA[1. Execution starts normally (as indicated by output lines). After approx. 20 seconds the progress status switches to collecting results and then aborts with the following error dialog:<br />"Unable to locate CUDA libraries and establish connection with CUDA dirver<br />Error com.nvidia.viper.jni.CuException: CUDA_ERROR_INVALID_VALUE"<br /><br />Some reruns give error ... CUDA_OUT_OF_MEMORY<br /><br />2. Noticed that when nvvp is launched, 42 processes show up, all looking identical. The "top" output for each line is as follows:<br />/usr/local/cuda/libnvvp/jre/bin/java -jar /usr/local/cuda/libnvvp/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar -os linux -ws gtk -arch x86_64 -showsplash -launcher /usr/local/cuda/libnvvp/nvvp -name Nvvp --launcher.library /usr/local/cuda/libnvvp/plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.1.R36x_v20100810/eclipse_1309.so -startup /usr/local/cuda/libnvvp/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar -exitdata 6e000f -data @user.home/nvvp_workspace -vm /usr/local/cuda/libnvv<br /><br />3. The Visual Profiler user's guide included with RC2 still references computeprof and not nvvp. computeprof was not part of this distribution.<br /><br />4. This application runs fine as the executable when not using the visual profiler. The version 4.0 computeprof also worked fine on this application.<br /><br />5. Using CentOS 5.5 Linux on a quad-hex core chassis containing 8 Fermi 2090 GPUs.<br />Utilizing stream ids, running 24 threads with 3 threads sharing each GPU.]]></description>
   </item>
      <item>
      <title>How to write QT project file when the QT project contain &#039;.cu&#039; file</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8156/how-to-write-qt-project-file-when-the-qt-project-contain-cu-file</link>
      <pubDate>Mon, 14 May 2012 05:39:15 -0400</pubDate>
      <dc:creator>licongsheng1206163com</dc:creator>
      <guid isPermaLink="false">8156@/devforum/discussions</guid>
      <description><![CDATA[Hello all! I have written a code file named deviceQuery.cu and it compiles successfully. Now I want to put it into a Qt project, and I want to know how to write the Qt project (.pro) file. Thanks!]]></description>
   </item>
      <item>
      <title>input devices for Tegra 3 with Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8076/input-devices-for-tegra-3-with-linux</link>
      <pubDate>Fri, 11 May 2012 11:15:54 -0400</pubDate>
      <dc:creator>compose</dc:creator>
      <guid isPermaLink="false">8076@/devforum/discussions</guid>
      <description><![CDATA[I flashed the Linux kernel and the provided sample file system to the Tegra 3 board. The operating system booted and the login prompt showed up. However, I have no idea how to type into it. Connecting a USB keyboard does not work. With only the HDMI port provided, I don't know whether I can use minicom through the serial cable. I guess I probably need to recompile the kernel to add the USB and HDMI drivers. Does anyone have experience with this? Thank you.]]></description>
   </item>
      <item>
      <title>GTX680 OpenCL in Ubuntu 11.10, clBuildProgram returns -30</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8071/gtx680-opencl-in-ubuntu-11-10-clbuildprogram-returns-30</link>
      <pubDate>Fri, 11 May 2012 09:05:02 -0400</pubDate>
      <dc:creator>yashiz</dc:creator>
      <guid isPermaLink="false">8071@/devforum/discussions</guid>
      <description><![CDATA[I've tested both driver 295.49 and 302.07.<br /><br />clBuildProgram returns -30 when building some kernel files, but not all of them, although they all use the same config:<br /><br />m_ciErrNum = clBuildProgram(m_program, 0, NULL, "-cl-fast-relaxed-math", NULL, NULL);<br /><br />One of the problem kernels is in the attachment.<br /><br />When I remove TabulateCDF1Dv: no -30, and the build passes.<br />When I keep TabulateCDF1Dv and remove both read_imagef and write_imagef inside it: no -30, and the build passes.<br />When I keep TabulateCDF1Dv and remove only write_imagef inside it: returns -30.<br /><br />I think it may be a problem with reading or writing a 2D texture with height 1, but kernel TabulateCDF2D runs well.<br /><br />This code runs on GTX460 and GTX580 with the latest driver, so maybe it is a GTX680 driver bug?<br /><br />It is quite weird...<br /><br />Thanks for your help.<br /><br />Here is the code (in case you cannot get the attachment):<br /><br />__kernel __attribute__((reqd_work_group_size(WORKGROUP_SIZE, 1, 1)))<br />void TabulateCDF1Dv(__read_only image2d_t CDF1D, sampler_t normSampler, __write_only image2d_t CDF1DTable, sampler_t pixSampler, int lenth)<br />{<br />       uint tid = get_global_id(0);<br />       float cdfValue = (tid+0.0f)/(lenth+0.0f);<br /><br />       float index = 0.5f;<br />       float step = 0.5f;<br /><br />       for(int i=0; i&lt;8; i++)<br />       {<br /><br />          float4 tex1DRefValue = read_imagef(CDF1D, normSampler, (float2)(index,0.0f));<br />          float refValue = tex1DRefValue.x;<br /><br />          step *= 0.5f;<br /><br />          float diff = (cdfValue - refValue);<br /><br />          if(diff &lt; pMAXERROR &amp;&amp; diff &gt; nMAXERROR)<br />          {<br />                 break;<br />          }<br />	  if(diff &lt; nMAXERROR)<br />          {<br />		index = index - step;<br />          }<br />          if(diff &gt; pMAXERROR)<br />          {<br />                index = index + step;<br />          }<br /> 
      }<br /><br />       write_imagef(CDF1DTable, (int2)(tid,0), (float4)(index));<br />}<br /><br />__kernel __attribute__((reqd_work_group_size(WORKGROUP_SIZE, 1, 1)))<br />void TabulateCDF2D(__read_only image2d_t CDF2D, sampler_t normSampler,__read_only image2d_t CDF1DTable, __write_only image2d_t CDF2DTable, sampler_t pixSampler, int lenth)<br />{<br />       uint tid = get_global_id(0);<br />       uint indexU = get_group_id(0);<br />       uint indexV = get_local_id(0);<br /><br />       float4 tex1DRefValue = read_imagef(CDF1DTable, pixSampler, (int2)(indexU,0));<br /><br />       float cdfValueV = (indexV+1.0f)/(lenth+1.0f);<br /><br />       float indexU0 = tex1DRefValue.x;<br />       float indexV0 = 0.5f;<br />       float step = 0.5;<br /><br /><br />       for(int i = 0; i&lt;8; i++)<br />       {<br />         float4 tex2DRefValue = read_imagef(CDF2D, normSampler, (float2)(indexV0, indexU0));<br /><br />         float refValue = tex2DRefValue.x;<br /><br />         step *= 0.5f;<br /><br />         float diff = (cdfValueV - refValue);<br /><br />          if(diff &lt; pMAXERROR &amp;&amp; diff &gt; nMAXERROR)<br />          {<br />                 break;<br />          }<br />	  if(diff &lt; nMAXERROR)<br />          {<br />		indexV0 = indexV0 - step;<br />          }<br />          if(diff &gt; pMAXERROR)<br />          {<br />                indexV0 = indexV0 + step;<br />          }<br />       }<br /><br />       write_imagef(CDF2DTable, (int2)(indexV,indexU), (float4)(indexV0));<br />}]]></description>
   </item>
      <item>
      <title>nvcc 4.2 pragma unroll issue</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7741/nvcc-4-2-pragma-unroll-issue</link>
      <pubDate>Tue, 01 May 2012 15:42:47 -0400</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">7741@/devforum/discussions</guid>
      <description><![CDATA[If the exit condition is i&lt;=nv-1, where nv is set from a macro (nv = NV, with #define NV 16), then the unroll is implemented incorrectly.<br /><br />Example:<br /><br /><code>#define NV 16 <br />nv=NV;<br />int tid = threadIdx.x+blockDim.x*blockIdx.x;<br />#pragma unroll 2<br />for(int i=0;i&lt;=nv-1;i++){<br />  y[tid]+=a[i]*x[i*n+tid];<br />}</code><br /><br />The code above produces incorrect code with nvcc 4.2, whereas nvcc 4.0 produces correct code. The code below produces correct output with nvcc 4.2.<br /><br /><code>#define NV 16 <br />nv=NV;<br />int tid = threadIdx.x+blockDim.x*blockIdx.x;<br />#pragma unroll 2<br />for(int i=0;i&lt;nv;i++){<br />  y[tid]+=a[i]*x[i*n+tid];<br />}</code><br /><br />Has anyone else seen this issue?]]></description>
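<!--
The two loop forms in the post above should be equivalent for integer i, so a plain C host-side reference may help to confirm which GPU result is wrong. This is only a sketch under assumptions: the post does not give the element type of a, x and y (float is assumed here), and the function names accumulate_le, accumulate_lt and forms_agree are hypothetical.

```c
/* Host-side reference for the loop in the post, written with both exit
 * conditions. The float element type is an assumption. */
void accumulate_le(float *y, const float *a, const float *x,
                   int nv, int n, int tid)
{
    for (int i = 0; i <= nv - 1; i++)   /* first form from the post */
        y[tid] += a[i] * x[i * n + tid];
}

void accumulate_lt(float *y, const float *a, const float *x,
                   int nv, int n, int tid)
{
    for (int i = 0; i < nv; i++)        /* second form from the post */
        y[tid] += a[i] * x[i * n + tid];
}

/* Returns 1 when both forms give identical results for every tid in [0, n).
 * Buffer sizes are fixed small values chosen only for this sketch. */
int forms_agree(int nv, int n)
{
    float a[64], x[64 * 8], y1[8], y2[8];
    if (nv > 64 || n > 8) return 0;     /* keep the buffers in bounds */
    for (int i = 0; i < nv; i++) a[i] = (float)(i + 1);
    for (int i = 0; i < nv * n; i++) x[i] = (float)(i % 7);
    for (int t = 0; t < n; t++) { y1[t] = 0.0f; y2[t] = 0.0f; }
    for (int t = 0; t < n; t++) {
        accumulate_le(y1, a, x, nv, n, t);
        accumulate_lt(y2, a, x, nv, n, t);
    }
    for (int t = 0; t < n; t++)
        if (y1[t] != y2[t]) return 0;
    return 1;
}
```

Comparing the device output for each loop form against one of these host functions would show whether the unrolled i&lt;=nv-1 variant is the one that diverges.
-->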
   </item>
      <item>
      <title>OpenCL callbacks scheduling</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/8031/opencl-callbacks-scheduling</link>
      <pubDate>Thu, 10 May 2012 09:59:48 -0400</pubDate>
      <dc:creator>rjmarques</dc:creator>
      <guid isPermaLink="false">8031@/devforum/discussions</guid>
      <description><![CDATA[Greetings,<br /><br />I am seeing a huge overhead when using callbacks on Linux. After I enqueue the necessary read operation, I set a callback for the appropriate treatment. The read takes less than a millisecond to complete, but the callback is only issued after about 19 milliseconds. Is this a driver issue?<br /><br />The graphics card is a Tesla C2050.<br />The driver version is 295.41.<br />And the GCC version is 4.4.3.<br /><br />Thanks,<br />Ricardo Marques<br /><br /> ]]></description>
   </item>
      <item>
      <title>Tegra VI/CSI interface</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/2951/tegra-vicsi-interface</link>
      <pubDate>Sat, 31 Dec 2011 07:49:13 -0500</pubDate>
      <dc:creator>Jens Andersen</dc:creator>
      <guid isPermaLink="false">2951@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br />I am attempting to get a camera working on a tegra board.<br />The camera is connected through the Tegra-isp port, using Parallel 8-bit VI interface.<br />I am looking to use the CHROMIUM provided V4L2 interface, but this is not really relevant to my query.<br /><br />I have the public TRM, but this only seems to document the serial CSI interface and a LOT of registers are missing.<br />I've attached a list of all the registers mentioned by the V4L2 driver, and comparing that to the documentation available, not even all the CSI registers are documented. <br /><br />I have also verified that this information is not available in the private TRM through third-party contacts.<br /><br />Is it possible to get any kind of documentation on these VI registers? It doesn't appear to me like this is something that is required to be kept secret, but I understand that it doesn't appear to be properly documented at the moment, so even sparse documentation with just possible values would be a huge help!<br /><br />]]></description>
   </item>
      <item>
      <title>Does current cuda-gdb allow single GPU debugging like Nsight 2.2? in CUDA 5 will support it?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7611/does-current-cuda-gdb-allow-single-gpu-debugging-like-nsight-2-2-in-cuda-5-will-support-it</link>
      <pubDate>Thu, 26 Apr 2012 19:31:13 -0400</pubDate>
      <dc:creator>oscarbg</dc:creator>
      <guid isPermaLink="false">7611@/devforum/discussions</guid>
      <description><![CDATA[Nsight 2.2 now supports single-GPU debugging via so-called software preemption. Does cuda-gdb support the same technology on Linux or Mac? If not, will it support it soon? It seems GTC will unveil Nsight for Mac and Linux, so I hope it's added there too, as I think it will use cuda-gdb underneath.]]></description>
   </item>
      <item>
      <title>Tegra4Linux?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7831/tegra4linux</link>
      <pubDate>Thu, 03 May 2012 12:03:05 -0400</pubDate>
      <dc:creator>savalik</dc:creator>
      <guid isPermaLink="false">7831@/devforum/discussions</guid>
      <description><![CDATA[Tegra2 was submitted in January 2010. It is now 2012.<br />In the summer of 2011 I bought a Toshiba AC100. The first thing I did was remove Android.<br />Since then, I have been waiting every day for stable drivers and normal multimedia software (with OpenMAX and OpenGL ES support). Why do Mali's users enjoy the rainbow while we cry?]]></description>
   </item>
      <item>
      <title>Tegra for Linux bugs</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7816/tegra-for-linux-bugs</link>
      <pubDate>Thu, 03 May 2012 04:56:52 -0400</pubDate>
      <dc:creator>mase</dc:creator>
      <guid isPermaLink="false">7816@/devforum/discussions</guid>
      <description><![CDATA[I tested the hardfp driver on my AC100. There are some issues, which also appeared with<br />softfp.<br />I am using Debian Wheezy with xfce4 and the Ventana driver package.<br />The kernel is linux-tegra-nv-ac100-3.1-exp. The chromeos kernels make the desktop<br />freeze after some minutes. Only the mouse remains movable, but no clicks are possible and<br />the keyboard does not react.<br /><br />The xfce window manager has to be restarted after boot. Otherwise the window title<br />bar disappears. The same has to be done when resuming after suspend.<br />There are still some graphical glitches in the upper panel of xfce. Some users report<br />that other desktop environments also have such glitches. Switching to the console gives a<br />black screen. I cannot return to X after that.<br /><br />BTW: Will there be OMX support and a codec package for hf?]]></description>
   </item>
      <item>
      <title>cudaEvent timers vs. Host timers</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7541/cudaevent-timers-vs-host-timers</link>
      <pubDate>Wed, 25 Apr 2012 16:35:14 -0400</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">7541@/devforum/discussions</guid>
      <description><![CDATA[In doing performance testing we are trying two methods of timing:<br />CUDA event-based timing and a system timer.<br />We are running on a Fermi 2070 (sm_20) with CUDA SDK 4.2.<br />I haven't seen anything on the internet that makes it clear whether one is superior to the other in terms of timing. I've seen cudaDeviceSynchronize() used for this purpose... any insight would be valuable.<br /><br />Thanks ahead of time!<br /><br /><br />The first is the built-in event-based timer:<br /><br /><code>cudaEventRecord(start, 0);<br />kernel&lt;&lt;&lt;grid,block&gt;&gt;&gt;(devy,devx,alpha,length);<br />cudaEventRecord(stop, 0);<br />cudaEventSynchronize(stop);</code><br /><br /><br />The second is a system-time-based timer using a barrier:<br /><br /><code>  start = getclock();<br />  kernel&lt;&lt;&lt;dimGrid,dimBlock&gt;&gt;&gt;(devy,devx,alpha,length);<br />  cudaDeviceSynchronize();<br />  finish = getclock();</code><br /><br />where getclock() is defined as:<br /><br /><code>#include &lt;sys/time.h&gt;<br />double getclock(){<br />  struct timezone tzp;<br />  struct timeval tp;<br />  gettimeofday (&amp;tp, &amp;tzp);<br />  return (tp.tv_sec + tp.tv_usec*1.0e-6);<br />}</code><br /><br /><br />The kernel we are running is:<br /><br /><code>__global__ void  kernel(double* devY,double* devX, double alpha, int length){<br /> /* w &lt;- y + alpha*x */<br />  int tid = blockIdx.x*blockDim.x+threadIdx.x;<br />  if(tid&lt;length){<br />    devY[tid]=alpha*devX[tid];<br />  }<br />}</code><br />]]></description>
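<!--
The host-timer variant in the post above can be exercised without a GPU. The following is a minimal C sketch, not the poster's code: time_call, busy_loop and time_busy_loop are hypothetical helpers, and passing NULL as the timezone argument is a small change on the assumption that the timezone value is never read. For the event-based variant, the elapsed time would instead be read with cudaEventElapsedTime, which reports milliseconds.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, following the post's getclock(). */
double getclock(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return tp.tv_sec + tp.tv_usec * 1.0e-6;
}

/* Hypothetical helper: time one call to work(arg). For a CUDA kernel,
 * work would launch the kernel and then call cudaDeviceSynchronize(),
 * so that the GPU has finished before the clock is read again. */
double time_call(void (*work)(void *), void *arg)
{
    double start = getclock();
    work(arg);
    return getclock() - start;
}

/* Stand-in CPU workload so the sketch runs without a GPU. */
void busy_loop(void *arg)
{
    volatile double sum = 0.0;
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        sum += (double)i;
}

/* Convenience wrapper: seconds spent in busy_loop for n iterations. */
double time_busy_loop(long n)
{
    return time_call(busy_loop, &n);
}
```

Because gettimeofday has microsecond resolution and measures wall-clock time, very short kernels are usually timed over many repeated launches and averaged.
-->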
   </item>
      <item>
      <title>uncertain results in CUDA program</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7711/uncertain-results-in-cuda-program</link>
      <pubDate>Tue, 01 May 2012 05:07:32 -0400</pubDate>
      <dc:creator>tanjun2525</dc:creator>
      <guid isPermaLink="false">7711@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br />I'm trying to write a matrix iteration program in CUDA, but I've run into some problems.<br />The overall procedure is: the CPU copies the matrix to the GPU; the GPU iterates the matrix following a formula; the CPU copies back the new matrix; the CPU checks whether the matrix has converged. If it has, the iteration stops; otherwise it continues.<br />Matrices are flattened to 1-D arrays when transferred between CPU and GPU.<br />Right now I cannot get correct results, because the iteration process is non-deterministic: different results are obtained when executing the program several times.<br />Each element of the matrix is assigned one thread to iterate, so I defined n1*n2 threads to iterate an n1*n2 matrix.<br />I cannot figure out why the result is non-deterministic.<br /><strong>Maybe some "volatile" keywords are needed, but I don't know how to define a volatile variable in global memory. I tried, but "error: argument type 'volatile double *' is incompatible with parameter of type 'void *'" was reported when assigning values to them using cudaMemcpy.</strong><br />The code is shown below. 
I'm sorry for my poor English.<br />Any suggestions would be highly appreciated.<br /><br /><br />CPU_function{<br />  const int num_threads = n1 * n2;<br />  const int threadsPerBlock = 16 * 16;<br />  const int blocksPerGrid = (num_threads + threadsPerBlock - 1) / threadsPerBlock;<br /><br />  ...//variables defining and values assigning<br /><br />  do<br />  {<br />	 if(times &gt; 1000)<br />              break;<br /><br />     // assign the new result to the next iteration step<br />     CUDA_SAFE_CALL( cudaMemcpy( d_Sk2, h_Sk2, n1*n2*sizeof(double), cudaMemcpyHostToDevice) );<br /><br />     // core function<br />     ComputeSim_Kernel&lt;&lt;&lt;blocksPerGrid, threadsPerBlock&gt;&gt;&gt;(d_Sim, <br />	 d_Sk2,<br />	 d_Sk1,<br />	 d_adjMatrix_Yeast,<br />	 d_adjMatrix_Fly,<br />	 d_yeastIndex,<br />	 d_flyIndex,<br />	 n1, n2<br />	 );<br />     cudaThreadSynchronize();<br /><br />    // get the new matrix from GPU<br />    CUDA_SAFE_CALL( cudaMemcpy( h_Sk1, d_Sk1, n1*n2*sizeof(double), cudaMemcpyDeviceToHost) );<br /><br />    // get the maximum element of the matrix<br />    maxw = h_Sk1[0];<br />    for(int i=1; i&lt;n1*n2; i++)<br />    {<br />	 if(h_Sk1[i] &gt; maxw)<br />	 	 maxw = h_Sk1[i];<br />    }<br /><br />    minSk = 1;<br />    maxDeltaSk01 = 0;<br />    maxDeltaSk02 = 0;<br />    deltaSk01 = 0;<br />    deltaSk02 = 0;<br /><br />    ...// some compute for judging convergency<br /><br />    tmpsk = h_Sk2;<br />    h_Sk2 = h_Sk1;<br />    h_Sk1 = h_Sk0;<br />    h_Sk0 = tmpsk;<br /><br />    ++times;<br /><br />   }while((maxDeltaSk01 &gt; 0.01) &amp;&amp; (maxDeltaSk02 &gt; 0.01));<br />}<br /><br /><br />Code on the device:<br /><br />#ifndef __COMPUTESIM_KERNEL_H__<br />#define __COMPUTESIM_KERNEL_H__<br /><br />#include "cuda.h"<br />#include "cutil.h"<br />
#include "computeSim_kernel.h"<br /><br />const int threadsPerBlock = 256;<br /><br />inline __global__ void<br />ComputeSim_Kernel(double *Sim, double *Sk2, double *Sk1, int *adjMatrix_Yeast, <br />	 int *adjMatrix_Fly, int *yeastIndex, int *flyIndex, int n1, int n2)<br />{<br />	 for(int i=0; i&lt;n1; i++)<br />	 	 yeastIndex[i] = 0;<br />	 for(int j=0; j&lt;n2; j++)<br />	 	 flyIndex[j] = 0;<br /><br />	 unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;<br /><br />	 const unsigned int index = threadIdx.x;<br />	 const unsigned int stride = blockDim.x * gridDim.x;<br />	 int iIndex, jIndex;<br />	 int iDegree, jDegree;<br />	 double N1, N2;<br /><br />	 N1 = 0.0;<br />	 N2 = 0.0;<br />	 iDegree = 0;<br />	 jDegree = 0;<br /><br />	 // iterate the n1*n2 matrix<br />	 while(tid &lt; (n1*n2))<br />	 {<br />	 	 Sk1[tid] = 0;// for storing the new value<br />	 	 if(Sim[tid] == 0)// if the element is 0, no need to iterate<br />	 	 {<br />	 	 tid += stride;<br />	 	 continue;<br />	 	 }<br /><br />		 iIndex = tid / n2; // get the row index in matrix<br />	 	 jIndex = tid % n2; // get the column index in matrix<br /><br />	 	 // some data structure for iteration<br />	 	 for(int i=0; i&lt;n1; i++)<br />	 	 	 yeastIndex[i] = adjMatrix_Yeast[iIndex * (n1+1) + i];<br />	 	 iDegree = adjMatrix_Yeast[iIndex * (n1+1) + n1];<br /><br />	 	 // some data structure for iteration<br />	 	 for(int j=0; j&lt;n2; j++)<br />	 	 	 flyIndex[j] = adjMatrix_Fly[jIndex * (n2+1) + j];<br />	 	 jDegree = adjMatrix_Fly[jIndex * (n2+1) + n2];<br /><br /><br />	 	 // compute N1 for iteration<br />	 	 if((iDegree != 0) &amp;&amp; (jDegree != 0))<br />	 	 {<br />	 	 	 for(int i=0; i&lt;n1; i++)<br />	 	 	 {<br />	 	 	 	 if(yeastIndex[i] == 1)<br />	 	 	 	 	 for(int j=0; j&lt;n2; j++)<br />	 	 	 	 	 {<br />	 	 	 	 	 	 // a2&lt;-&gt;a, b2&lt;-&gt;b<br />	 	 	 	 	 	 if(flyIndex[j] == 1)<br />	 	 	 	 	 	 	 N1 += Sk2[i * n2 + j];<br />	 	 	 	 	 }<br />	 	 	 }<br />	 	 	 N1 /= (iDegree * jDegree);<br />	 	 }<br />
	 	 else if((iDegree == 0) &amp;&amp; (jDegree == 0))<br />	 	 {<br />	 	 	 for(int i=0; i&lt;n1; i++)<br />	 	 	 {<br />	 	 	 	 for(int j=0; j&lt;n2; j++)<br />	 	 	 	 	 N1 += Sk2[i * n2 + j];<br />	 	 	 }<br />	 	 	 N1 /= (n1 * n2);<br />	 	 }<br />	 	 else<br />	 	 	 N1 = 0;<br /><br />	 	 // compute N2 for iteration<br />	 	 if((iDegree != n1) &amp;&amp; (jDegree != n2))<br />	 	 {<br />	 	 	 for(int i=0; i&lt;n1; i++)<br />	 	 	 {<br />	 	 	 	 if(yeastIndex[i] == 0)<br />	 	 	 	 	 for(int j=0; j&lt;n2; j++)<br />	 	 	 	 	 {<br />	 	 	 	 		  // a2 !&lt;-&gt;! a, b2 !&lt;-&gt;! b<br />	 	 	 	 		  if(flyIndex[j] == 0)<br />	 	 	 	 	 	 	 N2 += Sk2[i * n2 + j];<br />	 	 	 	 	 }<br />	 	 	 }<br />	 	 	 N2 /= ((n1 - iDegree) * (n2 - jDegree));<br />	 	 }<br />	 	 else if((iDegree == n1) &amp;&amp; (jDegree == n2))<br />	 	 {<br />	 	 	 for(int i=0; i&lt;n1; i++)<br />	 	 	 {<br />	 	 	 	 for(int j=0; j&lt;n2; j++)<br />	 	 	 	 	 N2 += Sk2[i * n2 + j];<br />	 	 	 }<br />	 	 	 N2 /= (n1 * n2);<br />	 	 }<br />	 	 else<br />	 	 	 N2 = 0;<br /><br />	 	 // update the matrix using N1 and N2<br />	 	 Sk1[tid] = (N1 + N2)/2 * Sim[tid];<br /><br />	 	 tid += stride;<br /><br />	 } // while(tid &lt; (n1*n2))<br />}<br /><br />#endif // __COMPUTESIM_KERNEL_H__]]></description>
   </item>
      <item>
      <title>How to avoid Xorg lockups and display corruption on Fedora 15 / kernel 3.x.y?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7681/how-to-avoid-xorg-lockups-and-display-corruption-on-fedora-15-kernel-3-x-y</link>
      <pubDate>Sun, 29 Apr 2012 05:49:56 -0400</pubDate>
      <dc:creator>xilman</dc:creator>
      <guid isPermaLink="false">7681@/devforum/discussions</guid>
      <description><![CDATA[Since upgrading to Fedora 15 I've had to run the 295 series of drivers because earlier ones won't build into the V3.x.y kernels.  So far so good. The installation works, the machine reboots, the X11 display starts up (mostly anyway) and CUDA applications build and run.  However ...<br /><br />Very frequently there is display corruption, usually taking the form of rectangular blocks not updating as windows are moved around but various other forms have been seen.  For instance, on two occasions <strong>every</strong> line of text had every 7th character (counting backwards from the end) replaced by a blank!<br /><br />Also very frequently, the Xorg process runs away and takes 100% of the cpu.  It sometimes crashes and restarts.  It sometimes, as now (I'm typing on another system), locks solid and the machine is completely unresponsive.  Most of the time switching to a text console with Alt-F2 and back to the X display is enough to restore usability for a little while.  Logging in remotely and "telinit 3" followed by "telinit 5" is also a temporary workaround.<br /><br />There's a rash of reports of this sort of behaviour to be found on the web but I've yet to find any  comment by Nvidia people.<br /><br />So, does anyone have any suggestions on how to run Nvidia device drivers reliably on recent Linux kernels?<br /><br />Thanks,<br />	 Paul]]></description>
   </item>
      <item>
      <title>nvcc 4.2; a cicc and gcc preprocessing issue</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7441/nvcc-4-2-a-cicc-and-gcc-preprocessing-issue</link>
      <pubDate>Mon, 23 Apr 2012 16:23:17 -0400</pubDate>
      <dc:creator>dlowell</dc:creator>
      <guid isPermaLink="false">7441@/devforum/discussions</guid>
      <description><![CDATA[After upgrading to SDK 4.2 for some reason when I am building my library I now get this error below:<br /><br /><br /><br /><code>#$ cicc  -arch compute_20 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 -g -O0 "/tmp/tmpxft_00002684_00000000-10_vecgpu" "/tmp/tmpxft_00002684_00000000-7_vecgpu.cpp3.i"  -o "/tmp/tmpxft_00002684_00000000-2_vecgpu.ptx"<br />&lt;built-in&gt;(2): error: "__STDC_HOSTED__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(8): error: "__WCHAR_TYPE__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(115): error: "__x86_64" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(116): error: "__x86_64__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(126): error: "__linux__" is predefined; attempted redefinition ignored<br /><br />&lt;built-in&gt;(128): error: "__unix__" is predefined; attempted redefinition ignored<br /><br />6 errors detected in the compilation of "/tmp/tmpxft_00002684_00000000-7_vecgpu.cpp3.i".<br /># --error 0x1 --</code><br /><br /><br /><br /><br />My gcc version is 4.4 though I've attempted this on 4.3<br />I am not sure why it is getting caught on this. If the redefinition is being ignored, why is it throwing an error and stopping compilation at all? Additionally the NVCC doc still mentions cicc as nvopencc, and in fact nowhere mentions cicc.<br /><br />Has anyone else had this issue? Any tips would be greatly appreciated.]]></description>
   </item>
      <item>
      <title>Out of order command execution non-working on Linux?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7526/out-of-order-command-execution-non-working-on-linux</link>
      <pubDate>Wed, 25 Apr 2012 11:23:13 -0400</pubDate>
      <dc:creator>thorfdbg</dc:creator>
      <guid isPermaLink="false">7526@/devforum/discussions</guid>
      <description><![CDATA[Could it be that out-of-order execution in command queues is currently simply not working with the current OpenCL SDK? I'm creating here a command queue with the flag CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, but when checking in the profiler, I see that the memory operations (buffer copy) and the GPU computation are still not overlapping, but executed sequentially.<br />]]></description>
   </item>
      <item>
      <title>Low memcpy performance in OpenCL, what to do about it?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6036/low-memcpy-performance-in-opencl-what-to-do-about-it</link>
      <pubDate>Fri, 16 Mar 2012 19:41:24 -0400</pubDate>
      <dc:creator>thorfdbg</dc:creator>
      <guid isPermaLink="false">6036@/devforum/discussions</guid>
      <description><![CDATA[Folks,<br /><br />the nvvp profiler shows that my current OpenCL application has low memcpy performance on Linux: only ~400MB/sec host to device and about 800MB/sec device to host. The nvvp profiler makes suggestions for CUDA, which I'm not using (this is OpenCL). The manual states that I should allocate pinned memory. <br /><br />I tried the following approaches:<br /><br />a) Allocating the buffers with CL_MEM_ALLOC_HOST_PTR and mapping the buffers to host memory. Result: Negative.<br /><br />b) Pinning memory myself with the mlock() Linux system call. Result: Negative.<br /><br />Memcpy performance remains at a crawl in both setups, at exactly the same speed. This brings me to the rather paradoxical situation that even though my kernel is fast (~10ms for an operation), the memcpy to the GPU and back takes an enormous amount of time (~70ms) and makes GPU usage rather unattractive: I can get about the same speed using the SSE2 vector instructions of the CPU.<br /><br />Any help or hints?]]></description>
   </item>
      <item>
      <title>Tegra Development Pack not working with SDK and ADT 18?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7291/tegra-development-pack-not-working-with-sdk-and-adt-18</link>
      <pubDate>Wed, 18 Apr 2012 20:18:26 -0400</pubDate>
      <dc:creator>skip</dc:creator>
      <guid isPermaLink="false">7291@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I have downloaded the <a href="http://developer.nvidia.com/tegra-android-development-pack" title="Tegra Android Developer Pack 1.0r5">Tegra Android Developer Pack 1.0r5</a>, have tried out the NVIDIA samples and written my own applications created with the help of <em>app_create.sh</em> and its <em>nv_event</em> option and everything worked fine.<br /><br />As an improved Android emulator has been <a href="http://android-developers.blogspot.de/2012/04/faster-emulator-with-better-hardware.html">announced</a> recently, I updated the SDK tools and the ADT plugin to version 18. There was a crash at the end of the update process, so I uninstalled ADT with the Eclipse plugin manager and installed the new version.<br /><br />Now all these things seem up-to-date and I can compile NVIDIA's and my NDK applications. But when I try to run them on my Tegra 3 tablet, I receive the "Unfortunately [app] has stopped" error message. Logcat gives me the error "Unable to resolve superclass of [app]", I suppose the mentioned superclass is <i>NvEventQueueActivity</i> then.<br /><br />Google led me to solutions for <a href="http://stackoverflow.com/questions/6168841/unable-to-resolve-super-class">similar problems</a> due to <a href="http://tools.android.com/recent/dealingwithdependenciesinandroidprojects">changes</a> in SDK and ADT version 17. This didn't help me though, because I'm not using JAR files and the projects only have a <em>libs</em> folder anyway.<br /><br />My question is now: Is it a known problem that the Tegra Development Pack and all the applications using NVIDIA's event framework don't work with the new SDK and ADT versions? If so, what could I do to make it all work again (except for downgrading the tools again)?]]></description>
   </item>
      <item>
      <title>Compile SDK samples on Ubuntu 10.04 plain vanilla ok, /usr/bin/ld: cannot find -lcuda on Optimus</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4331/compile-sdk-samples-on-ubuntu-10-04-plain-vanilla-ok-usrbinld-cannot-find-lcuda-on-optimus</link>
      <pubDate>Sun, 05 Feb 2012 17:04:50 -0500</pubDate>
      <dc:creator>gue22</dc:creator>
      <guid isPermaLink="false">4331@/devforum/discussions</guid>
      <description><![CDATA[Compiling the SDK samples on plain vanilla Ubuntu 10.04 + GTX 560 works, but on the Optimus machine and on a VMware [Fedora 14] VM w/o nVidia drv I get<br />make[1]: Entering directory `/home/gy/NVIDIA_GPU_Computing_SDK/C/src/deviceQueryDrv'<br />/usr/bin/ld: cannot find -lcuda<br />collect2: ld returned 1 exit status<br />make[1]: *** [../../bin/linux/release/deviceQueryDrv] Error 1<br />make[1]: Leaving directory `/home/gy/NVIDIA_GPU_Computing_SDK/C/src/deviceQueryDrv'<br />make: *** [src/deviceQueryDrv/Makefile.ph_build] Error 2<br /><br />[Edit: Just loaded a VMware Ubuntu 10.04 with CUDA toolkit and GPUcomp SDK just to double-check. Same error.]<br /><br />I don't see any difference in the setup of the machines [except the driver, and a compile / make should not depend on the drv install!] The Optimus notebook seems somewhere in between, with an Intel on-board GPU and the GTX 525 via PCIe. Could that be the cause there? The dev driver installed correctly though.<br /><br />[EDIT 2: Why 84 examples compile (global make in the C subdir) on the quad and only a handful on the Tosh and the HP with exactly the same Ubuntu 10.04.3 setup is beyond me.<br /><br />Why deviceQuery doesn't compile in the global make on the latter two machines, but compiles without a hitch in the local make is BEYOND BEYOND. Well, there must be some issues in the tool chain.]<br />Thx<br />G.]]></description>
   </item>
      <item>
      <title>linker error using cuda toolkit 4.1</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7091/linker-error-using-cuda-toolkit-4-1</link>
      <pubDate>Mon, 16 Apr 2012 05:39:43 -0400</pubDate>
      <dc:creator>sicherer</dc:creator>
      <guid isPermaLink="false">7091@/devforum/discussions</guid>
      <description><![CDATA[I just upgraded to version 4.1 of the cuda toolkit and now I get a linker error (Ubuntu 10.04):<br />CUDAPACKAGE/ipdiagsolver/CG.o: In function `cublasSdot':<br />tmpxft_00004800_00000000-1_CG.cudafe1.cpp:(.text+0x1c): undefined reference to `cublasGetCurrentCtx'<br />Using readelf -Ws I found that this symbol is no longer present in libcublas.so as it was in the 4.0 version. This is strange. I get no compiler errors, only the linker complains. <br />How can I get my code to link again? Please help!]]></description>
   </item>
      <item>
      <title>PCL - Flann Compilation Troubles due to NVCC</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7501/pcl-flann-compilation-troubles-due-to-nvcc</link>
      <pubDate>Tue, 24 Apr 2012 23:00:44 -0400</pubDate>
      <dc:creator>erwinkendo</dc:creator>
      <guid isPermaLink="false">7501@/devforum/discussions</guid>
      <description><![CDATA[Greetings<br /><br />I have been trying to compile PCL on an x86_64 Archlinux machine, and FLANN is one of its dependencies, but I am getting some CUDA-related errors.<br /><br />First, compiling it delivers the following error:<br /><br /><code>Linking CXX static library ../../lib/libflann_cpp_s-gd.a<br />[ 44%] Built target flann_cpp_s-gd<br />[ 55%] Building NVCC (Device) object src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o<br />No such file or directory<br />CMake Error at flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o.cmake:198 (message):<br />  Error generating<br />  /tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/build/src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o<br /><br /><br />make[2]: *** [src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o] Error 1<br />make[1]: *** [src/cpp/CMakeFiles/flann_cuda_s.dir/all] Error 2<br />make: *** [all] Error 2<br /></code><br /><br />Following the error to its source, a line in src/cpp/CMakeLists.txt, the <code>${NVCC_COMPILER_BINDIR}</code> variable seems not to be found when using the <code>--compiler-bindir</code> option.<br /><br /><code>set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xcompiler;-fPIC;-arch=sm_13;--compiler-bindir=${NVCC_COMPILER_BINDIR}" )</code><br /><br />To try to solve this, I changed the code to point to my gcc directory (Archlinux uses gcc-4.7), and also tried dropping the <code>--compiler-bindir</code> option entirely (which gives the same result), but it generates these errors:<br /><br /><code>[ 55%] Building NVCC (Device) object src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o<br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined<br /><br 
/>/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(54): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(57): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(60): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(63): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(66): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(69): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(72): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(75): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(78): warning: statement is unreachable<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/general.h(81): warning: statement is unreachable<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1405): error: identifier "__int128" is undefined<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1409): error: identifier "__int128" is undefined<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1412): error: identifier "__int128" is undefined<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1421): error: identifier "__int128" is undefined<br /><br 
/>/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1421): error: function call is not allowed in a constant expression<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1423): error: function call is not allowed in a constant expression<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1432): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1435): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1450): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1454): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1458): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1462): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1410): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1410): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1410): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1413): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1413): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1452): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1456): error: "__int128" is not a type 
name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1460): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1464): error: "__int128" is not a type name<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1479): error: expected a "&gt;"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1484): error: expected a ";"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1497): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1497): error: expected a ")"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1506): error: "__int128" has already been declared in the current scope<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1507): error: expected a ";"<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1524): error: "__int128" has already been declared in the current scope<br /><br />/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/limits(1525): error: expected a ";"<br /><br />/tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/src/cpp/flann/algorithms/dist.h(540): warning: integer conversion resulted in a change of sign<br /><br />31 errors detected in the compilation of "/tmp/tmpxft_00007a65_00000000-4_kdtree_cuda_3d_index.cpp1.ii".<br />CMake Error at flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o.cmake:256 (message):<br />  Error generating file<br />  /tmp/yaourt-tmp-erwin/aur-flann/src/flann-1.7.1-src/build/src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o<br /><br /><br />make[2]: *** 
[src/cpp/CMakeFiles/flann_cuda_s.dir/flann/algorithms/./flann_cuda_s_generated_kdtree_cuda_3d_index.cu.o] Error 1<br />make[1]: *** [src/cpp/CMakeFiles/flann_cuda_s.dir/all] Error 2<br />make: *** [all] Error 2<br /></code> <br /><br />When I try to use another compiler (gcc-4.5), it doesn't even begin the compilation.<br /><br />If there is any temporary workaround, I will be happy to try it.]]></description>
   </item>
      <item>
      <title>calculating on shader</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5906/calculating-on-shader</link>
      <pubDate>Wed, 14 Mar 2012 10:47:06 -0400</pubDate>
      <dc:creator>zimmerlinde</dc:creator>
      <guid isPermaLink="false">5906@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />everybody speaks about the shaders of the Tegra SoC. I read there are x vertex and y fragment shaders. But nowhere can I find an explanation of how to use parallelism in GLSL.<br /><br />How can I use more than one vertex or fragment shader? I've never seen arguments or code for using more than one vertex or fragment shader. I have no problem with using one fragment and one vertex shader, but I don't understand how I can use more than one. Does the compiler choose how many shaders to use?<br /><br />Thank you for your answers]]></description>
   </item>
      <item>
      <title>GTX580 PowerMizer cannot be changed by program on linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/4281/gtx580-powermizer-cannot-be-changed-by-program-on-linux</link>
      <pubDate>Fri, 03 Feb 2012 07:17:06 -0500</pubDate>
      <dc:creator>simtec</dc:creator>
      <guid isPermaLink="false">4281@/devforum/discussions</guid>
      <description><![CDATA[Because the PowerMizer mode of the GTX580 is reset to Adaptive after every reboot, I tried to set my GTX580 to maximum power with the following command:<br /><br />nvidia-settings --assign GPUPowerMizerMode=1<br /><br />but this does not seem to work. I use driver 290.10. What am I doing wrong, or is there another way to do this? (Changing the setting in nvidia-settings with the mouse works.)<br /><br />Thanks for help]]></description>
   </item>
      <item>
      <title>Issues with using CUDA specifically on a version of Linux, Mac or a particular Windows release?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/191/issues-with-using-cuda-specifically-on-a-version-of-linux-mac-or-a-particular-windows-release</link>
      <pubDate>Mon, 29 Aug 2011 18:06:16 -0400</pubDate>
      <dc:creator>Nadeem Mohammad</dc:creator>
      <guid isPermaLink="false">191@/devforum/discussions</guid>
      <description><![CDATA[We have a comprehensive QA procedure to test all our supported configurations, but sometimes you may need to tweak your installation, or there may even be some bugs or issues. Use these forums with the correct tags to ask a question or share some ideas.]]></description>
   </item>
      <item>
      <title>CUDA 4.X UVA and P2P is broken for 2 x GTX 680 running on AMD 990FX chipset</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7336/cuda-4-x-uva-and-p2p-is-broken-for-2-x-gtx-680-running-on-amd-990fx-chipset</link>
      <pubDate>Thu, 19 Apr 2012 16:13:18 -0400</pubDate>
      <dc:creator>mdvornik</dc:creator>
      <guid isPermaLink="false">7336@/devforum/discussions</guid>
      <description><![CDATA[CUDA kernels with UVA fetching are not running properly on 2 x GTX 680 with an AMD 990FX mobo. So far, this has been confirmed only for Scientific Linux 6.2 (2.6.32-220.13.1 x86_64) with the 295.40 Nvidia driver and CUDA 4.1(2 RC1).<br /><br />Symptoms: CUDA kernels run extremely slowly and eventually the execution hangs. When running simpleP2P, the reported bandwidth is 1GB/s. When simpleP2P is run repeatedly, it hangs at some point, just like the CUDA kernels from our software.<br /><br />The kernels run just fine with 2 x GTX 480 on an Intel X58 mobo.<br /><br />Finally, 2 x GTX 480 are also happy with the 990FX mobo!<br /><br />So the question is: Does consumer-grade Kepler have full-featured GLDirect enabled?]]></description>
   </item>
      <item>
      <title>CUDA SDK 4.1  and GCC 4.7.0</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6336/cuda-sdk-4-1-and-gcc-4-7-0</link>
      <pubDate>Sun, 25 Mar 2012 05:06:24 -0400</pubDate>
      <dc:creator>perestoronin</dc:creator>
      <guid isPermaLink="false">6336@/devforum/discussions</guid>
      <description><![CDATA[CUDA SDK 4.1 examples compile successfully with gcc-4.6.3 boost-1.48.0 glibc-2.15<br />but<br />do not compile with gcc-4.7.0 boost-1.49.0 glibc-2.15<br /><br />/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.0/include/g++-v4/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined<br /><br />/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.0/include/g++-v4/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined<br /><br />Please help me compile the CUDA SDK 4.1 examples with the gcc-4.7.0 release.]]></description>
   </item>
      <item>
      <title>I need a shader</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6641/i-need-a-shader</link>
      <pubDate>Tue, 03 Apr 2012 07:52:38 -0400</pubDate>
      <dc:creator>Kamlah</dc:creator>
      <guid isPermaLink="false">6641@/devforum/discussions</guid>
      <description><![CDATA[Hi All,<br />I would like to get some help. I work at a TV station as a graphic artist. Here we use a virtual studio system, where I need to create a frosted-glass-like material. The point is that the objects behind it need to be blurred. I found a good example here: <a href="http://www.polycount.com/forum/showthread.php?t=87743" target="_blank" rel="nofollow">http://www.polycount.com/forum/showthread.php?t=87743</a><br />The first picture shows some teapots behind a glasslike surface. That's what I need.<br />My biggest problem is that I am a total beginner in Cg. I started to learn and dig deep into the topic, but now I see that it is much more complex than I thought.<br />So I would be really thankful ...<br />The studio render engine uses Nvidia cards, but I don't know the exact parameters right now. If necessary I will look them up.<br />thx <br />KG  ]]></description>
   </item>
      <item>
      <title>does hpl-2.0_FERMI_v13 use pinned pages?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6971/does-hpl-2-0_fermi_v13-use-pinned-pages</link>
      <pubDate>Thu, 12 Apr 2012 12:30:10 -0400</pubDate>
      <dc:creator>peteroliver</dc:creator>
      <guid isPermaLink="false">6971@/devforum/discussions</guid>
      <description><![CDATA[Hi<br /><br />I've just downloaded hpl-2.0_FERMI_v13 and compiled it against cuda 4.1.28 using the goto blas libraries. However, I was wondering whether this is the pinned memory version, as performance (using M2090) is lower than suggested by the HOWTOs.<br /><br />My results<br />eg 2 GPUs, with 100,000 size ~650Gflops<br />with 4 GPUS and 100,000 size ~7680Gflops<br /><br /><br />Thanks<br /><br />Pete]]></description>
   </item>
      <item>
      <title>Compute-modify with __threadfence()</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7276/compute-modify-with-__threadfence</link>
      <pubDate>Wed, 18 Apr 2012 11:44:28 -0400</pubDate>
      <dc:creator>chrism0dwk</dc:creator>
      <guid isPermaLink="false">7276@/devforum/discussions</guid>
      <description><![CDATA[Hi All,<br /><br />I have an algorithm which requires a summary measure of a dataset to be computed before modifying that dataset (necessarily in that order).  The modification is only small -- in fact, only one element of a large array is changed.  My current code does the following:<br /><br />1. myKernel&lt;&lt;&lt;...&gt;&gt;&gt;(...) ;<br />2. (thrust::device_vector) theDataset[modIdx] = modVal ; // Memcpy implemented as a thrust::device_vector<br /><br />According to the profiler, I get a long latency associated with the cudaMemcpy() call. I wondered (since I'm actually passing the modified value to the kernel anyway) if there was a sensible way of getting the kernel to do the update, but only after all threads have done their bit of the calculation?<br /><br />I wondered whether the following kernel definition:<br /><br /><code>__global__<br />void<br />myKernel(float modifiedVal, int modifiedIdx, float* dataset,...)<br />{<br />  int tid = threadIdx.x + blockIdx.x*blockDim.x;<br /><br />  // Calculations here...<br /><br />  __threadfence(); // threads in all blocks must have <br />                   // read from dataset before modification<br /><br />  if(tid == 0) dataset[modifiedIdx] = modifiedVal;<br />}</code><br /><br />might be a good idea?  The bit that concerns me is the "if(tid==0)" line -- does this mean that the first block will sit idle, consuming resources, whilst other blocks are executed around it?  Is there a better way to achieve my aim?<br /><br />Thanks,<br /><br />Chris<br />]]></description>
   </item>
      <item>
      <title>Bad video from jack1 and jack2 on Quadro FX SDI Capture</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6431/bad-video-from-jack1-and-jack2-on-quadro-fx-sdi-capture</link>
      <pubDate>Tue, 27 Mar 2012 10:23:33 -0400</pubDate>
      <dc:creator>Sergey Gaychuk</dc:creator>
      <guid isPermaLink="false">6431@/devforum/discussions</guid>
      <description><![CDATA[Dear All!<br /><br />First of all: Sorry for my bad English :)<br />I want to develop an application that packs input video into an h264 container. I downloaded the SDI SDK and ran vid2tex to check the inputs, but I had some trouble with the video. I get a good picture from the first jack, and bad video from the second and third jacks. I tried to swap (physically) the first and third SDI inputs. After that, I still get a good picture from the first jack and bad video from jacks 2 and 3. I looked at the console and saw that VideoCapture returns PARTIAL_SUCCESS. I checked LAST_VIDEO_CAPTURE_STATUS on all jacks, and I got SUCCESS for the first and FAILURE for the others. I tried to capture video only from the second jack, and the VideoCapture call returned a failure result. I tried to get glError but there were no errors.<br />I have:<br /> Debian with nvidia drivers version 295 installed.<br /> Intel XEON server with KVM<br /> Quadro SDI Capture Card<br /> Quadro FX 4800<br /> Input on all jacks is the same:<br />   Video Format: 1920*1080i 50.00 Hz (SMPTE274)<br />   Component Sampling: 4:2:2<br />   Color space: YCbCr<br />   Bits per component: 10bpc<br /><br />I didn't develop my own program; I used vid2tex and changed NvSDIin to play with different streams.<br /><br />If you need any other info, please ask me and I'll provide it right away.<br /><br />Thanks to all in advance,<br />Sergey Gaychuk<br /> ]]></description>
   </item>
      <item>
      <title>Enabling WIFI on Tegra 3 Development board with Linux 4 Tegra</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6876/enabeling-wifi-on-tegra-3-development-board-with-linux-4-tegra</link>
      <pubDate>Tue, 10 Apr 2012 20:45:52 -0400</pubDate>
      <dc:creator>madmaze</dc:creator>
      <guid isPermaLink="false">6876@/devforum/discussions</guid>
      <description><![CDATA[Hello everyone,<br /><br />I've been playing with L4T on a Cardhu Development tablet. I used the standard kernel and sample file system, but I can't seem to get the wifi to work.<br />If I'm not mistaken, the wifi module is a bcm4329 wifi/bluetooth combo chip. In the L4T rootfs a kernel module is provided, bcm4329.ko. But it seems this only speaks to the bluetooth?<br /><br />I have used modprobe to load the module, but it cannot find/use the hardware.<br /><br />Does anyone have pointers to what I could try?<br /><br />Thanks,<br /><br />Matthias]]></description>
   </item>
      <item>
      <title>Weird PhysX problem</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6191/weird-physx-problem</link>
      <pubDate>Thu, 22 Mar 2012 08:19:31 -0400</pubDate>
      <dc:creator>Chico</dc:creator>
      <guid isPermaLink="false">6191@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />First, excuse me for my English, I'm French...<br />I have set up a vehicle entity with a main body attached to a wheel by 2 revolute joints and an intermediate actor (I don't use PhysX Vehicle).<br />That means: <br />	PxRigidDynamic (body) =&gt; revolute joint (steer) =&gt; PxRigidDynamic (intermediate actor) =&gt; revolute joint (motor) =&gt; PxRigidDynamic (wheel)<br />The positions and orientations of all actors are used to update their visual states.<br /><br />Now the problem: <br />Sometimes the visual parts of the wheels are detached from the vehicle while moving. <br />The positions returned by PhysX for the wheel at those moments stay the same even if the vehicle is moving.<br />But if the vehicle climbs a sidewalk (at least with the missing wheel), the vehicle acts as if the wheel were still there.<br />This seems to mean that the physical part of the wheel is still there and 'alive', but the position is not updated for use.<br /><br />I use Linux PhysX and I have experienced this problem in PhysX 3.1.x and the 3.2 betas.<br /><br />PS: I'm sorry I cannot post any code, pictures or videos ...<br /><br />edit: I'm allowed to post a picture.<br />You can see the vehicle on a sidewalk (with therefore only 3 wheels touching the ground).<br />The white lines are debug contacts (positions and normals) shown by the visual renderer.<br />You can see that the rear-left wheel's visual is not at its correct position because it is not being updated by the PhysX accessor, even though the wheel is still there physically and reports correct contact information.]]></description>
   </item>
      <item>
      <title>Ubuntu 10.04: trying to &quot;make&quot; examples. i get this error a lot.</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6536/ubuntu-10-04-trying-to-make-examples-i-get-this-error-alot-</link>
      <pubDate>Thu, 29 Mar 2012 02:19:33 -0400</pubDate>
      <dc:creator>aspaulding18</dc:creator>
      <guid isPermaLink="false">6536@/devforum/discussions</guid>
      <description><![CDATA[<img src="http://ubuntuone.com/3rzvaoB4kLGchsCGGVzZpV" alt="" /><br />This image shows what is going on. I have freeglut3 installed, so I have no idea what's wrong.<br />Any help would be great! <br />thanks<br />Alex]]></description>
   </item>
      <item>
      <title>Driver with realtime linux kernel?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/7071/driver-with-realtime-linux-kernel</link>
      <pubDate>Sun, 15 Apr 2012 08:06:23 -0400</pubDate>
      <dc:creator>int512</dc:creator>
      <guid isPermaLink="false">7071@/devforum/discussions</guid>
      <description><![CDATA[I tried to compile and use a patched nvidia kernel module (patches provided <a href="http://www.clemensrabe.com/linux/nvidia-driver-295-20-and-the-rt-preempt-patch">here</a>). The module loaded (the GPU fan slowed down; without a driver it runs at max), but the screen remained black, CTRL+ALT+F1 through F8 didn't do anything, and the splash logo never appeared (it does with the generic Debian kernel). Kernel version is 2.6.33.7.2-rt30 (same as in the guide), driver version is 295.40 (those patches are for 295.2, that might be a problem). What I did was pass -x to the .run file, cd into the 295.4/kernel directory created by the installer, apply the patches (no errors returned), boot into a realtime kernel, and pass the -K option to nvidia-installer. It installed without errors, but after a reboot the fan slows down and the screen remains black. Will nVidia provide support for RT kernels with a command-line option?]]></description>
   </item>
      <item>
      <title>Performance issue on first drawing of a VBO with a Quadro GPU under Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6921/performance-issue-on-first-drawing-of-a-vbo-with-a-quadro-gpu-under-linux</link>
      <pubDate>Wed, 11 Apr 2012 12:14:47 -0400</pubDate>
      <dc:creator>aguinet</dc:creator>
      <guid isPermaLink="false">6921@/devforum/discussions</guid>
      <description><![CDATA[Hello everyone,<br /><br />I am experiencing a performance issue when drawing one million random lines thanks to a VBO w/ OpenGL (as an experiment) using a Quadro GPU under Linux. Indeed, the first frame (that draws 1 million lines) takes about 2.7s, and the others about 300ms. The same thing is done for each frame :<br /><br /><code><br />$ ./lines 1000000<br />Initialising 1000000 random lines...<br />Done !<br />Drawing took 2702.0719 ms... (370086.3773 lines/s)<br />Drawing took 305.3015 ms... (3275450.3285 lines/s)<br />Drawing took 301.8497 ms... (3312906.7015 lines/s)<br />Drawing took 305.5023 ms... (3273298.2181 lines/s)<br />Drawing took 300.9025 ms... (3323335.5547 lines/s)<br /></code><br /><br />You can download the source code here: <a href="http://files.geekou.info/gl_vbo.tar.bz2">http://files.geekou.info/gl_vbo.tar.bz2</a> . It is using GLUT.<br /><br />The thing is that this performance issue does not happen with the same machine but changing its GPU with a GTX 570, so that might be a driver issue (?):<br /><br /><code><br />$ ./lines 1000000<br />Initialising 1000000 random lines...<br />Done !<br />Drawing took 0.02 ms...<br />Drawing took 303.9075 ms... (3290475.0494 lines/s)<br />Drawing took 301.9110 ms... (3312233.9943 lines/s)<br />Drawing took 305.4198 ms... (3274182.1764 lines/s)<br />Drawing took 300.9701 ms... (3322589.3309 lines/s)<br />Drawing took 304.0043 ms... (3289427.4158 lines/s)<br />Drawing took 301.0136 ms... (3322108.8025 lines/s)<br />Drawing took 305.5717 ms... (3272554.1487 lines/s)<br />Drawing took 301.8462 ms... 
(3312944.9401 lines/s)<br /></code><br /><br />Is this due to the way the lines are drawn (using a VBO), or does anyone have an idea of what causes this behaviour?<br /><br />Some information about the system used:<br /><br /><code><br />$ uname -a<br />Linux proto-01 3.2.0-2-amd64 <a href="/devforum/search?Search=%231&amp;Mode=like">#1</a> SMP Tue Mar 20 18:36:37 UTC 2012 x86_64 GNU/Linux<br /></code><br /><br />You can find the output of glxinfo here: <a href="http://pastebin.com/jt6xWyA7">http://pastebin.com/jt6xWyA7</a><br />This is an up-to-date Debian testing system (as of today).<br />The NVIDIA driver used is 285.05.33 (the latest CUDA driver that can be downloaded for Linux as of today).<br />Please let me know if any other information would be relevant.<br /><br />Thanks for any help!<br /><br />Regards]]></description>
   </item>
      <item>
      <title>GLSL floatBitsToInt() error</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6746/glsl-floatbitstoint-error</link>
      <pubDate>Fri, 06 Apr 2012 17:22:05 -0400</pubDate>
      <dc:creator>hangdou</dc:creator>
      <guid isPermaLink="false">6746@/devforum/discussions</guid>
      <description><![CDATA[Hi, in my fragment shader I tried to use floatBitsToInt() to get the bitwise representation of a float as an int. However, floatBitsToInt() only seems to work with constant values:<br /><br />When I use floatBitsToInt(2.334), it gives me the right result. <br />When I use floatBitsToInt( temp ), it simply returns 0.<br /><br />I have been looking into this for a long time. Can anybody help? Thanks a lot.<br />My system is: Linux, OpenGL 4.2, GTX 550 Ti.]]></description>
   </item>
      <item>
      <title>PhysX with CUDA on Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6671/physx-with-cuda-on-linux</link>
      <pubDate>Tue, 03 Apr 2012 23:12:10 -0400</pubDate>
      <dc:creator>Bonaducci</dc:creator>
      <guid isPermaLink="false">6671@/devforum/discussions</guid>
      <description><![CDATA[I'm currently working on the physics for an MMORPG in which the scenery can be heavily modified. Basically, I'd like to use PhysX to perform most of the expected actions, but my server runs Linux on a Tesla. So far I've been using it mainly to calculate sparse matrix-vector multiplications and similar things. Of course CUDA works fine on Linux, but I'm new to PhysX and I'm wondering whether it's possible to use CUDA for PhysX on Linux?]]></description>
   </item>
      <item>
      <title>Are there any GTX-680-specific optimisation tips?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6666/is-there-any-gtx-680-specified-optimise-tips</link>
      <pubDate>Tue, 03 Apr 2012 21:44:10 -0400</pubDate>
      <dc:creator>ryu-o</dc:creator>
      <guid isPermaLink="false">6666@/devforum/discussions</guid>
      <description><![CDATA[I have a CUDA application which was developed on the Fermi architecture. <br />I have got it running on the new GTX-680 (Kepler), but it is even slower than on the GTX-580.<br />So, are there any GTX-680-specific optimisation tips I can try?<br /><br />I am using the newest toolkit (4.2.6) on Fedora 14.<br /><br />Thanks.<br />]]></description>
   </item>
      <item>
      <title>No cuda-capable device found on both Windows and Linux</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6561/no-cuda-capable-device-found-on-both-windows-and-linux</link>
      <pubDate>Thu, 29 Mar 2012 19:37:54 -0400</pubDate>
      <dc:creator>nicolee</dc:creator>
      <guid isPermaLink="false">6561@/devforum/discussions</guid>
      <description><![CDATA[Hello!<br /><br />I'm having trouble getting any code that uses the GPU to run on my Lenovo Y570 laptop, under both Windows 7 and Ubuntu 10.04.  On both operating systems, I get a message telling me no cuda-capable device was found.  However, Windows hardware manager shows the correct graphics card, and Ubuntu shows the correct graphics driver as well.  <br /><br />I've checked that the versions of the tools that work with CUDA match those listed in the supported distro section of the Release Notes.  I'm working with a GeForce 555M.  I haven't had any trouble getting these tools up and running on my desktop, just on this laptop.<br /><br />Any help or advice would be greatly appreciated!<br /><br />Thanks,<br />Nicole]]></description>
   </item>
      <item>
      <title>Vertex buffer objects and host memory allocation</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6486/vertex-buffer-objects-and-host-memory-allocation</link>
      <pubDate>Wed, 28 Mar 2012 03:30:21 -0400</pubDate>
      <dc:creator>c_gerlach</dc:creator>
      <guid isPermaLink="false">6486@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I'm currently working on a render engine, and I use (static) vertex buffer objects to store the geometry on the GPU. The geometry can be very big, so I am concerned about the memory footprint of the application. That's why I hoped that using static VBOs would give me back some host memory.<br /><br />As the application runs on Windows and Linux, I used the task manager and top to look at the allocated memory before and after VBO creation. After creating the VBO, the scene is rendered multiple times to convince the driver that it is a really good idea to keep the data on the GPU.<br /><br />On Windows, the memory reported in the task manager dropped to an amount which seems reasonable for the application without the geometry data.<br /><br />On Linux, top did not show any real change (just a drop of a few MB).<br /><br />So here are the questions:<br />* Does the driver keep a copy of the VBO?<br />* Is there anything I can do to take better control of the VBO placement?<br />* Why do Windows and Linux behave differently in this case?<br /><br />Thanks in advance.]]></description>
   </item>
      <item>
      <title>How to choose a specific GPU device on Linux for OpenGL?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6446/how-to-use-choose-specific-gpu-device-on-linux-for-opengl</link>
      <pubDate>Tue, 27 Mar 2012 11:25:49 -0400</pubDate>
      <dc:creator>hangdou</dc:creator>
      <guid isPermaLink="false">6446@/devforum/discussions</guid>
      <description><![CDATA[Hi, I am working on a machine with two GTX 580 cards and one monitor. <br />1. It is said that OpenGL will only use the graphics card connected to the monitor, even when two graphics cards are installed. Will OpenGL make use of both cards and improve performance if I use two monitors? <br />2. How can I choose a specific device on Linux for OpenGL with an NVIDIA card? Someone told me this can only be done on Windows. Is that true?<br /><br />Any hint will help. Thanks.]]></description>
   </item>
      <item>
      <title>No global counters in CUPTI 4.1? Separate counters in CUPTI 4.1?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6426/no-global-counters-in-cupti-4-1-separate-counters-in-cupti-4-1</link>
      <pubDate>Tue, 27 Mar 2012 09:55:29 -0400</pubDate>
      <dc:creator>drcuda</dc:creator>
      <guid isPermaLink="false">6426@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I have developed monitoring software based on CUPTI 4.0. The idea was based on the event_sampling sample from CUDAToolsSDK. It seems that CUPTI 4.0 exposed the same set of counters to every process. Specifically, if process A was using the GPU, process B could detect that the GPU was in use by reading the counters. In that sense, the set of counters was global and visible to all CUPTI clients. <br /><br />Now, in CUPTI 4.1, it seems that each process has its own set of counters, so process A cannot detect any activity on the GPU even if process B executes kernels on it. Is my understanding correct, or am I missing something? <br /><br />I suspect that this might be because the new driver restricts visibility of the counters to a single Linux process and does not allow them to be shared across different processes in the system. <br /><br />I have not checked this, but maybe for "global" monitoring of the GPU state I could use the CUPTI Activity API, which is a new feature in CUPTI 4.1.<br /><br />Thanks,]]></description>
   </item>
      <item>
      <title>Simple GPU program not working with -arch=sm_12 compiler option</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6411/simple-gpu-program-not-working-with-archsm_12-compiler-option</link>
      <pubDate>Tue, 27 Mar 2012 03:04:27 -0400</pubDate>
      <dc:creator>vinodhrajagopal</dc:creator>
      <guid isPermaLink="false">6411@/devforum/discussions</guid>
      <description><![CDATA[I have a very simple CUDA program. When compiled with the -arch=sm_11 option, the program works as expected. However, when compiled with -arch=sm_12, the results are unexpected. <br />The reason I want to use sm_12 is that I want to use atomicCAS() on a __shared__ variable.<br /><br />Here is the kernel code:<br /><br /><code><br />__global__ void dev_test(int *test) {<br />*test = 100;<br />}<br /></code><br /><br />I invoke the kernel as follows:<br /><br /><code><br />int *dev_int, val;<br />val = 0;<br />cudaMalloc((void **)&amp;dev_int, sizeof(int));<br />cudaMemset((void *)dev_int, 0, sizeof(int));<br />cudaMemcpy(dev_int, &amp;val, sizeof(int), cudaMemcpyHostToDevice);<br />dev_test &lt;&lt;&lt; 1, 1&gt;&gt;&gt; (dev_int);<br />int *host_int = (int*)malloc(sizeof(int));<br />cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);<br />printf("copied back from device %d\n",*host_int);<br /></code><br /><br />When compiled with -arch=sm_11, the print statement correctly prints 100. However, when compiled with -arch=sm_12, it prints 0, i.e. the changes made inside the kernel function are not taking effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA drivers.<br /><br />CUDA version: 3.0<br />NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010<br />GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)<br /><br /><br />Any help is highly appreciated.]]></description>
   </item>
      <item>
      <title>Driver versions after 275.43 return inconsistent results</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/5826/driver-versions-after-275-43-return-inconsistent-results</link>
      <pubDate>Mon, 12 Mar 2012 08:47:52 -0400</pubDate>
      <dc:creator>radix</dc:creator>
      <guid isPermaLink="false">5826@/devforum/discussions</guid>
      <description><![CDATA[Code compiled with CUDA 4.1 does not return all expected results.  We have tested 4 different GTX models (580, 560 Ti, 480, and 295) with each driver version from 275.43 through 295.20.  With 275.43 the output for each problem is what we expect, but with every later version the result differs each time the code is run.  Was there a change in the driver code that could be affecting our program?]]></description>
   </item>
      <item>
      <title>Problems with variable arguments in the NDK</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6366/problems-with-variable-arguments-in-the-ndk</link>
      <pubDate>Mon, 26 Mar 2012 08:42:06 -0400</pubDate>
      <dc:creator>Sir Graham</dc:creator>
      <guid isPermaLink="false">6366@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I have a problem with the NDK and the use of variable arguments in functions. This only happens to me with the NDK; the same code compiled with Visual 2010 or with the GNU tools on Debian works properly.<br /><br /><code><br />#include &lt;stdarg.h&gt;<br /><br />int Test(int data,...)<br />{<br />   va_list arguments;<br /><br />   va_start ( arguments, data );<br />   int a =  va_arg ( arguments, int);<br />   va_end ( arguments );<br /><br />   return a;<br />}<br /></code><br /><br />If I call this function somewhere in the program, for example:<br /><br /><code><br />int b = Test(5,10);<br /></code><br /><br />and inspect it in the debugger, the variable "data" holds a random value inside the function, and likewise the variable "a" is not set to 10 (it contains garbage).<br /><br />The corruption affects both the fixed argument ("data") and the variable arguments (obtained with va_arg()). For the fixed argument "data", the value is already corrupted before the va_list variable is declared or any of the va_xxx() functions is used.<br /><br />If I remove the variable arguments from the function, the value of data is correct:<br /><br /><code><br />int Test(int data)<br />{<br />   return data;<br />}<br /><br />int b = Test(5);<br /></code><br /><br />It seems that when a function uses variable arguments, the stack is not set up correctly for the call. The same code works correctly with another compiler or on another platform.<br /><br />To compile with the NDK I'm using the latest version available from the NVIDIA Tegra DevPack:<br />tegra-devpack-1.0-windows-2012-02-21-11617556.exe<br /><br />Does anyone have any idea what's going on?<br /><br />Thank you very much.<br />]]></description>
   </item>
      <item>
      <title>glTexImage2D causes xorg to use 95% CPU</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6331/glteximage2d-causes-xorg-to-use-95-cpu</link>
      <pubDate>Sat, 24 Mar 2012 13:49:07 -0400</pubDate>
      <dc:creator>cameronking</dc:creator>
      <guid isPermaLink="false">6331@/devforum/discussions</guid>
      <description><![CDATA[Hello,<br /><br />I'm using a 9600 GT Mobile GPU with 256 MB, Ubuntu Linux 10.04 LTS and NVIDIA driver 195.36.24.<br /><br />I have an application that creates a compressed 4096x4096 RGBA texture atlas and then about 7 smaller 56x56 textures, which are uncompressed for FBO reasons. The next small texture I create, which happens to be 48x48 and is also uncompressed, seems to drive Xorg up to 95% CPU.<br /><br />I isolated this effect down to the glTexImage2D call. If I make that texture 56x56 like the other small uncompressed textures, everything is fine and Xorg stays around 0-1% CPU.<br /><br />But if that last small texture is 48x48, or in fact any size other than the 56x56 of the other textures, Xorg uses up a CPU core (around 95%). The application itself remains around 6% CPU as normal.<br /><br />Is this a known bug with this driver series? What am I doing wrong?<br /><br />Thanks very much for any help or advice.<br /><br />Cameron<br />]]></description>
   </item>
      <item>
      <title>Where do I get CUDA toolkits for GeForce GTX 680?</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6306/where-do-i-get-cuda-toolkits-for-geforce-gtx-680</link>
      <pubDate>Fri, 23 Mar 2012 20:27:06 -0400</pubDate>
      <dc:creator>Bill Phipps</dc:creator>
      <guid isPermaLink="false">6306@/devforum/discussions</guid>
      <description><![CDATA[Linux Users:   <br /><br />The recommended NVIDIA Driver is 295.33, which can be downloaded from <a href="http://www.geforce.com/Drivers" target="_blank" rel="nofollow">http://www.geforce.com/Drivers</a>.<br /><br />The recommended CUDA Toolkit can be downloaded from:<br /><br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_rhel5.5.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_rhel5.5.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_sles11.0.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_sles11.0.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_suse11.2.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_suse11.2.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_ubuntu10.04.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_ubuntu10.04.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_ubuntu11.04.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_32_ubuntu11.04.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_fedora14.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_fedora14.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_rhel5.5.run" 
target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_rhel5.5.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_rhel6.0.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_rhel6.0.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_sles11.0.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_sles11.0.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_suse11.2.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_suse11.2.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_ubuntu10.04.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_ubuntu10.04.run</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_ubuntu11.04.run" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_linux_64_ubuntu11.04.run</a> <br /> <br />Windows Users (GTX 680 only at this time):<br /><br />The recommended NVIDIA Driver for GTX 680 is 301.10, which can be downloaded from <a href="http://www.geforce.com/Drivers" target="_blank" rel="nofollow">http://www.geforce.com/Drivers</a>.<br /><br />The recommended CUDA Toolkit can be downloaded from:<br /><br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_32.msi" target="_blank" 
rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_32.msi</a> <br /><a href="http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_64.msi" target="_blank" rel="nofollow">http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_64.msi</a> ]]></description>
   </item>
      <item>
      <title>console broken on linux-tegra-nv-3.1</title>
      <link>http://forums.developer.nvidia.com/devforum/discussion/6296/console-broken-on-linux-tegra-nv-3-1</link>
      <pubDate>Fri, 23 Mar 2012 15:08:32 -0400</pubDate>
      <dc:creator>Marc</dc:creator>
      <guid isPermaLink="false">6296@/devforum/discussions</guid>
      <description><![CDATA[Hi,<br /><br />I tried linux-tegra-nv-3.1 on my AC100 (paz00) and found that the framebuffer console has some problems. It shows the kernel booting, but then hangs somewhere in the init scripts. Adding console=ttyS0 gets it going again. Well, more or less, because there is still no text console when I try to switch from X to text mode. Strangely, booting with my SD card (which holds the Ubuntu root fs) unplugged and mounting it manually from the initrd also makes it boot again, so I initially thought it was an MMC problem. Other OSS devs have also mentioned that there seems to be a bug in the framebuffer console. Any hints?<br /><br />Thanks<br /><br />Marc]]></description>
   </item>
      </channel>
</rss>
