C1060/C1070 GPUBench results for I/O ops

Has anyone posted GPUBench results for the C1060/C1070? I couldn't find them on GPUbench.org or with a quick search of the forums.

I'm also wondering how many I/O ops they can handle, both practically and theoretically.


“I/O ops?” You mean global memory bandwidth?

The theoretical peaks are listed here:


Note that they multiplied the per-card bandwidth by 4 to get 408 GB/s; it is really only 102 GB/s if you run on one card.

All the Teslas I’ve got access to are HPC systems without any windowing systems installed so I cannot run a graphical GPUBench on them. If it is bandwidth you are interested in, though, here are the results of my bandwidth test script: (http://forums.nvidia.com/index.php?showtopic=52806&hl=bw_test)

On a single GPU inside a Tesla S1070:

copy_gmem<float> - Bandwidth:	71.592189 GiB/s

copy_gmem<float2> - Bandwidth:	76.486863 GiB/s

copy_gmem<float4> - Bandwidth:	74.633183 GiB/s

copy_tex<float> - Bandwidth:	70.903607 GiB/s

copy_tex<float2> - Bandwidth:	76.418907 GiB/s

copy_tex<float4> - Bandwidth:	77.371516 GiB/s

write_only<float> - Bandwidth:	69.588005 GiB/s

write_only<float2> - Bandwidth:	71.199130 GiB/s

write_only<float4> - Bandwidth:	70.569901 GiB/s

read_only_gmem<float> - Bandwidth:	65.726154 GiB/s

read_only_gmem<float2> - Bandwidth:	83.037063 GiB/s

read_only_gmem<float4> - Bandwidth:	46.565674 GiB/s

read_only_tex<float> - Bandwidth:	65.170775 GiB/s

read_only_tex<float2> - Bandwidth:	71.472940 GiB/s

read_only_tex<float4> - Bandwidth:	70.653347 GiB/s

Compare this to a GTX 285 (theoretical peak ~160 GB/s):

copy_gmem<float> - Bandwidth:	121.494258 GiB/s

copy_gmem<float2> - Bandwidth:	126.038586 GiB/s

copy_gmem<float4> - Bandwidth:	104.040466 GiB/s

copy_tex<float> - Bandwidth:	124.938593 GiB/s

copy_tex<float2> - Bandwidth:	129.273315 GiB/s

copy_tex<float4> - Bandwidth:	130.567617 GiB/s

write_only<char> - Bandwidth:	18.899835 GiB/s

write_only<float> - Bandwidth:	73.363346 GiB/s

write_only<float2> - Bandwidth:	75.512689 GiB/s

write_only<float4> - Bandwidth:	73.769699 GiB/s

read_only_gmem<float> - Bandwidth:	69.645715 GiB/s

read_only_gmem<float2> - Bandwidth:	97.900049 GiB/s

read_only_gmem<float4> - Bandwidth:	52.964178 GiB/s

read_only_tex<float> - Bandwidth:	69.641488 GiB/s

read_only_tex<float2> - Bandwidth:	109.820544 GiB/s

read_only_tex<float4> - Bandwidth:	105.827061 GiB/s


Why is read_only_tex<float2> so much faster than read_only_tex<float>? Is it just an artifact, or is it actually better to read float2 from textures than plain floats?

read_only_tex<float> - Bandwidth:	69.641488 GiB/s

read_only_tex<float2> - Bandwidth:	109.820544 GiB/s

read_only_tex<float4> - Bandwidth:	105.827061 GiB/s



In terms of bandwidth, the more bytes you can read in a single texture read, the better. There are two limitations that come into play: 1) bandwidth, and 2) the hardware can only serve so many texture reads per second (regardless of their size). For texture reads of only 4-byte quantities (floats), the second can become the bottleneck.

In non-toy kernels, the extra instructions needed to calculate several texture addresses can also add overhead when you compare multiple float texture reads against a single float2/float4 texture read.

edit: answered before I posted

A bit off topic: MisterAnderson42, are there no windowing systems installed on those HPC servers, or do you just not have access to them via ssh (or whatever you're using to communicate with them)? I'm asking because I thought the Tesla drivers could only be installed with the graphics drivers, and those require a windowing system.

There is a script in the Linux release notes that shows how to load the driver without starting X.
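For reference, that script looks approximately like this (reproduced from memory, so check your copy of the release notes for the authoritative version):

```shell
#!/bin/bash
# Load the NVIDIA kernel module and create the /dev nodes by hand,
# without starting X. Must run as root.
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count NVIDIA devices (VGA and 3D controllers, e.g. Teslas)
  N3D=$(lspci | grep -i nvidia | grep -ci "3d controller")
  NVGA=$(lspci | grep -i nvidia | grep -ci "vga compatible controller")
  N=$(expr $N3D + $NVGA - 1)
  for i in $(seq 0 $N); do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
fi
```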

Some systems do have X-windows installed, but not running (no reason to waste cycles on compute nodes). Regardless, these systems are only accessible by logging into a head node and then submitting batch jobs via a queue, so I can’t exactly plug in a monitor and start X running, especially given that the Tesla system I posted a benchmark from is 3 states away :)

Linux isn't silly about this like Windows is. To Linux, a driver is a driver: just a piece of code that loads into the kernel, and you can access it whether you are sitting at the physical machine or logged in remotely. I've set up several systems without even installing X-windows, and the NVIDIA drivers install and load happily. You do need to run the dev node creation script that mfatica mentioned, though.