We are pleased to announce the release of Ocelot 1.1.560, a dynamic compilation framework for PTX and an open-source reimplementation of the CUDA runtime. Ocelot supports emulation of PTX kernels, native execution on CPUs, and native execution on NVIDIA GPUs. It also includes a comprehensive and extendible back-end optimizing compiler for PTX.
Ocelot can be downloaded directly from http://gpuocelot.googlecode.com/files/ocelot-1.1.560.tar.bz2 , or you can visit http://code.google.com/p/gpuocelot/ for documentation and source code access.
Version 1.1 includes various bug fixes as well as several new features:
[list=1]
Three target devices
[list=1]
PTX 1.4 Emulator
[list=1]
Memory Checker
- Out-of-bounds accesses
- Misaligned accesses
Shared Memory Race Detector
PTX 1.4 JIT Compiler and CPU Runtime
[list=1]
Execute CUDA programs natively on CPU targets without emulation
Support for any LLVM target
*Requires LLVM 2.8svn
Can achieve over 80% of theoretical peak FLOPs/OPs on CPU targets
NVIDIA GPU JIT
[list=1]
Recompiles PTX kernels using the NVIDIA Driver
[b]
Reimplementation of the CUDA Runtime
[list=1]
Device Switching
- The same host thread can simultaneously control multiple devices.
New Memory Model
- Device allocations are shared among all host threads[/b]
[b]
PTXOptimizer
[list=1]
Extendible optimization pass interface for PTX
- Per-Block, Per-Kernel, Per-Module passes [/b]
Trace Generator
[list=1]
Extendible interface for instrumenting PTX kernels
Can examine the complete system state after each instruction is executed
i) Registers
ii) Memory Accesses
iii) Last instruction executed
iv) Thread activity mask
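As a rough illustration of the Trace Generator idea above, here is a minimal C++ sketch of an observer that receives a snapshot of state after each instruction and accumulates simple statistics. The names `TraceEvent`, `TraceObserver`, and `ActivityCounter` are hypothetical and chosen only for this example; they are not Ocelot's actual interface, so please check the project documentation for the real API.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of emulator state after one PTX instruction.
// Illustrative only; these are not Ocelot's real types.
struct TraceEvent {
    std::string instruction;               // last instruction executed
    std::bitset<32> activeThreads;         // thread activity mask (assumes groups of 32 threads)
    std::vector<std::uint64_t> addresses;  // memory addresses touched by this instruction
};

// Hypothetical observer interface: the emulator would call onEvent()
// once after each instruction is executed.
class TraceObserver {
public:
    virtual ~TraceObserver() = default;
    virtual void onEvent(const TraceEvent& event) = 0;
};

// Example observer: counts instructions and memory accesses, and tracks
// how many threads were active on average per instruction.
class ActivityCounter : public TraceObserver {
public:
    void onEvent(const TraceEvent& event) override {
        ++instructions_;
        activeSlots_ += event.activeThreads.count();
        memoryAccesses_ += event.addresses.size();
    }

    std::size_t instructions() const { return instructions_; }
    std::size_t memoryAccesses() const { return memoryAccesses_; }

    // Average number of active threads per instruction (0 if nothing was seen).
    double averageActiveThreads() const {
        if (instructions_ == 0) {
            return 0.0;
        }
        return static_cast<double>(activeSlots_) / static_cast<double>(instructions_);
    }

private:
    std::size_t instructions_ = 0;
    std::size_t activeSlots_ = 0;
    std::size_t memoryAccesses_ = 0;
};
```

An observer like this could be used to spot branch divergence: if the average number of active threads is well below the group size, many threads are being masked off. The registers listed above are left out of the snapshot here to keep the sketch short.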
Open Projects for Ocelot 1.2
- Full PTX 2.0 support
- AMD GPU Devices
- SIMT on CPU vector units
- Asynchronous kernel execution
- Multi-threaded emulator device