Tesla M10 TensorFlow/CUDA Compatibility

It is my understanding that the Tesla M10 is mainly designed for multi-user / virtualized application support. We are thinking about purchasing this GPU for deep learning purposes; our data has very high memory requirements, so a large amount of GPU memory would be very useful.

I have reviewed a lot of documentation online, but it's not clear to me whether this GPU can be used with the newest versions of CUDA (v10+), and therefore with Keras and TensorFlow. The Tesla M10 is also 4 GPUs on one board, so is it possible to utilize the full 32GB of RAM across them? Does NVIDIA restrict this in any way? Does the card work as 4 separate GPUs or as one unified GPU? I have also seen licensing requirements for using it as a vGPU, but if we are just installing this GPU in one server and plan to use it only there, do we need a separate license (does that require GRID licensing, for instance)?

Thanks for any help here!

Hi

The M10 is for entry-level workloads; it's not designed for DL. The CUDA core count is pretty low, so you'd be better off looking at other GPUs. Also, the 4 GPUs are separate, meaning 4 x 8GB, not 1 x 32GB. Even if you pass all 4 GPUs to a single VM, your application will see 4 GPUs, but each one only has 8GB of memory; they cannot be pooled into a single 32GB allocation.
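
For reference, a minimal sketch (assuming TensorFlow 2.x) of how the four M10 GPUs would appear when passed through to one VM, and how a data-parallel strategy still treats them as four separate 8GB devices rather than one 32GB pool:

```python
import tensorflow as tf

# Each of the M10's four GPUs shows up as its own device,
# e.g. /physical_device:GPU:0 ... GPU:3.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)

# MirroredStrategy replicates the model onto every visible GPU
# (data parallelism). Each replica is still limited to its own
# 8GB of memory; the memory is never combined into 32GB.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(1024,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```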

For DL, at a minimum, you'd be better off looking at either a P100 (16GB) or a P40 (24GB), which are both high-performance single GPUs. As you've mentioned that you use a lot of memory, the P100 might be a good choice because it uses HBM2; the P40 has 24GB, but it uses GDDR5, which has far lower bandwidth. On the other hand, the P40 does have a few hundred more CUDA cores (3840 vs 3584), so it depends which of the two your workload makes more use of.
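
To put rough numbers on the bandwidth difference, here's a back-of-the-envelope calculation from the bus width and effective memory clock of each card (figures below are approximate published specs, assumed for illustration):

```python
# Theoretical peak bandwidth = (bus width in bits / 8) * effective data rate (Gbps)
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

# Assumed approximate specs:
#   P100: 4096-bit HBM2 at ~1.43 Gbps effective
#   P40 :  384-bit GDDR5 at ~7.2 Gbps effective
print("P100 (HBM2): ", peak_bandwidth_gbs(4096, 1.43), "GB/s")  # ~732 GB/s
print("P40  (GDDR5):", peak_bandwidth_gbs(384, 7.2), "GB/s")    # ~346 GB/s
```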

If you use vGPU, then you need a license (most likely vCompute Server (vCS)). However, if you install the GPU in passthrough mode, you can use the standard Tesla driver without needing a license.

Either option will be far superior to an M10.

Regards

MG

Wow, thanks, you confirmed my suspicions about this GPU - it seems it's just not in the market segment for what we have in mind. The P100 and P40 are really out of our price range right now. We have been training on AWS and are looking to get something in-house for training and for using AI in our current architecture. I am now leaning towards something like the Titan RTX.

Can you explain why, if the P100 has less memory, HBM2 makes it more desirable than GDDR5? Most of our past training was on a 4GB GeForce 745 locally or a 16GB Tesla V100 on Amazon. We have a mix of 2D and 3D images, so the local GeForce can handle small batches of 2D images, but any 3D images had to go to Amazon, and even there we could only process 1 image per batch.

Hi

Because HBM2 is much faster than GDDR5. It depends how your application / workload uses the hardware: typically you want to get data into the GPU and process it as fast as possible, so HBM2 would be better in that case. If you needed more capacity than 16GB, then the P40 would obviously be better due to its higher capacity. The V100 uses HBM2 as well.
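
Since you mentioned being limited to 1 image per batch for 3D data, it may help to estimate how much memory a batch of 3D volumes needs just for the input tensor; activations, gradients and optimizer state add several times more on top. A minimal sketch (the volume sizes are assumptions for illustration only):

```python
# Rough lower bound on GPU memory needed for one batch of 3D volumes
# (input tensor only, float32).
def batch_input_gib(batch_size, depth, height, width, channels=1, bytes_per_value=4):
    return batch_size * depth * height * width * channels * bytes_per_value / 2**30

# e.g. a 256^3 single-channel float32 volume is ~0.0625 GiB per sample,
# so the raw input of a batch of 16 is already ~1 GiB before any
# activations or gradients are allocated.
print(batch_input_gib(16, 256, 256, 256))
```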

The Titan RTX is a good GPU, just be careful of its cooling requirements when pushed hard for long periods. There’s a reason why the Quadro GPUs all have Blower fans …

Regards

MG