Ocelot Pre-Release

I’m pleased to announce that we are gearing up for a few new Ocelot releases. The first one is 1.3.967.

Features in the stable 1.3.967 release include:

  • Support for PTX 1.4
  • PTX Emulator
  • PTX to LLVM to x86/ARM JIT
  • Memory Checker
  • Memory Race Detector
  • Interactive Debugger
  • Prototype AMD Device Support
  • A compiler optimization pass framework for PTX
  • Various instrumentation passes
  • A complete reimplementation of the CUDA Runtime
  • Numerous bug fixes and performance improvements

A packaged pre-release can be downloaded from ocelot-1.3.967. I am going to leave it up for a week before migrating it to the main project website. Please post back if you run into any problems with it.

This is the last release that will support devices requiring PTX 1.4; it is intended to be a stable final version that includes all bug fixes up to this point. Rather than maintaining multiple versions, we plan to evolve Ocelot along with the development of PTX, dropping support for older PTX versions as newer ones come online. Support for the older versions will not be rolled into newer releases, which limits the amount of testing that we need to do.

We are also gearing up for a PTX 2.x release, due out by the end of the week, that will carry the more interesting new developments. I’ll post back with more details about it.

The full releases are now available on the main Ocelot website, or directly at 1.3.967 and 2.0.969.

Here is the feature list for 2.0.969:

 - PTX 2.2 and Fermi device support.

   a) Floating point results should be within the ULP limits given in the PTX ISA manual.

   b) Over 500 unit tests verify that the behavior matches NVIDIA devices.

 - Four target device types:

   a) A functional PTX emulator.

   b) A PTX to LLVM to x86/ARM JIT.

   c) A PTX to CAL JIT for AMD devices (beta).

   d) A PTX to PTX JIT for NVIDIA devices.

 - A full-featured PTX 2.2 IR:

   a) An analysis/optimization pass interface over PTX.

     i)   Control flow graph.

     ii)  Dataflow graph.

     iii) Dominator/Postdominator trees.

     iv)  Structured control tree.

   b) Optimizations can be plugged in as modules.

 - Correctness checking tools:

   a) A memory checker (detects unaligned and out of bounds accesses).

   b) A race detector.

   c) An interactive debugger (allows stepping through PTX instructions).
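As an illustration (this example is mine, not part of the release), the memory checker targets bugs like the missing bounds guard in the hypothetical kernel below; on real hardware such an access may silently corrupt memory, while the checker can flag it at the faulting instruction:

```cuda
#include <cstdio>

// Hypothetical kernel: when gridDim.x * blockDim.x exceeds n, the
// highest-numbered threads read and write past the end of the buffer
// because the usual `if (i < n)` guard is missing.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = data[i] * factor;   // out of bounds for i >= n
}

int main()
{
    const int n = 100;
    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // 4 blocks of 32 threads = 128 threads for only 100 elements:
    // threads 100..127 access memory they do not own.
    scale<<<4, 32>>>(d_data, n, 2.0f);
    cudaThreadSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The same kernel is also the sort of thing the interactive debugger helps with: stepping through the emulated PTX shows exactly which thread issues the offending load or store.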

 - An instruction trace analyzer interface:

   a) Allows user-defined modules to receive callbacks when PTX instructions are executed.

   b) Can be used to compute metrics over applications or perform correctness checks.
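To make the callback model concrete, here is a sketch of a user-defined analyzer. The header paths and names (`trace::TraceGenerator`, `trace::TraceEvent`, `ocelot::addTraceGenerator`) are recalled from the Ocelot sources and may not match this release exactly; treat this as pseudocode for the interface described above:

```cuda
// Sketch only -- class and header names are assumptions, not verified
// against the 2.0.969 tree.
#include <ocelot/api/interface/ocelot.h>
#include <ocelot/trace/interface/TraceGenerator.h>
#include <iostream>

class InstructionCounter : public trace::TraceGenerator
{
public:
    long long count;

    InstructionCounter() : count(0) {}

    // Invoked by the emulator for each dynamic PTX instruction.
    virtual void event(const trace::TraceEvent &e)
    {
        ++count;
    }

    // Invoked when the kernel finishes.
    virtual void finish()
    {
        std::cout << "dynamic PTX instructions: " << count << "\n";
    }
};

// Register the analyzer before launching kernels:
//   InstructionCounter counter;
//   ocelot::addTraceGenerator(counter);
```

From a callback like this you can accumulate any metric you like (instruction mixes, memory footprints) or cross-check results, which is how the bundled correctness tools are built.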

 - A CUDA API frontend:

   a) Existing CUDA programs can be directly linked against Ocelot.

   b) Device pointers can be shared across host threads.  

   c) Multiple devices can be controlled from the same host thread (cudaSetDevice can be called multiple times).
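For example, under Ocelot a single host thread can switch devices mid-stream, which the stock CUDA runtime of this era did not allow once a context had been established. The host code below is a hypothetical sketch; the device numbering is whatever Ocelot enumerates on your machine:

```cuda
#include <cuda_runtime.h>

int main()
{
    float *a = 0, *b = 0;

    cudaSetDevice(0);                 // e.g. the PTX emulator
    cudaMalloc((void **)&a, 1024);

    // A second cudaSetDevice from the same thread would fail under the
    // stock runtime after the first allocation; Ocelot's frontend
    // permits it, so one thread can drive several devices.
    cudaSetDevice(1);                 // e.g. the LLVM x86 JIT
    cudaMalloc((void **)&b, 1024);

    cudaFree(b);
    cudaSetDevice(0);
    cudaFree(a);
    return 0;
}
```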

There are also some interesting features that are coming online in the development branches:

  1. Remote devices (start up an Ocelot server and attached devices will be visible to CUDA applications on client nodes)

  2. Software warp formation (use CUDA to program your SSE/AVX units)

  3. PTX instrumentation (arbitrary code can be inserted into CUDA kernels as they are launched; so far we have recorded hot paths, CTA schedules, and load balance)

  4. The AMD device backend is becoming more stable every day (thanks entirely to Rodrigo Dominguez), and about half of the CUDA SDK examples now execute on an AMD GPU.