We are pleased to announce the release of Ocelot 1.1.560, a dynamic compilation framework for PTX and an open-source reimplementation of the CUDA runtime. Ocelot supports emulation of PTX kernels, native execution on CPUs, and native execution on NVIDIA GPUs. It also includes a comprehensive and extendible back-end optimizing compiler for PTX.
Ocelot can be downloaded directly from http://gpuocelot.googlecode.com/files/ocelot-1.1.560.tar.bz2 , or you can visit http://code.google.com/p/gpuocelot/ for documentation and source code access.
Version 1.1 includes various bug fixes as well as several new features:
[list=1]
[*] Three target devices
[list=1]
[*] PTX 1.4 Emulator
[list=1]
[*] Memory Checker
- Out-of-bounds accesses
- Misaligned accesses
[*] Shared Memory Race Detector
[*] PTX 1.4 JIT Compiler and CPU Runtime
[list=1]
[*] Execute CUDA programs natively on CPU targets without emulation
[*] Support for any LLVM target
[*] *Requires LLVM 2.8svn
[*] Can achieve over 80% of theoretical peak FLOPs/OPs on CPU targets
[*] NVIDIA GPU JIT
[list=1]
[*] Recompiles PTX kernels using the NVIDIA Driver
[b]
[*] Reimplementation of the CUDA Runtime
[list=1]
[*] Device Switching
- The same host thread can simultaneously control multiple devices.
[*] New Memory Model
- Device allocations are shared among all host threads[/b]
[b]
[*] PTXOptimizer
[list=1]
[*] Extendible optimization pass interface for PTX
- Per-Block, Per-Kernel, Per-Module passes [/b]
[*] Trace Generator
[list=1]
[*] Extendible interface for instrumenting PTX kernels
[*] Can examine the complete system state after each instruction is executed
i) Registers
ii) Memory Accesses
iii) Last instruction executed
iv) Thread activity mask
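To give a feel for what the emulator's Memory Checker looks for, here is a minimal sketch of the two checks listed above (out-of-bounds and misaligned accesses). This is not Ocelot's code or API: `Allocation`, `AccessError`, and `checkAccess` are hypothetical names made up for illustration, and the sketch assumes natural alignment (an N-byte access must start at a multiple of N) and reports misalignment before bounds violations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not part of Ocelot.
enum class AccessError { None, OutOfBounds, Misaligned };

struct Allocation {
    std::uint64_t base;  // address of the first valid byte
    std::size_t size;    // length of the allocation in bytes
};

// Classify a single access of `bytes` bytes starting at `address`.
// Assumption: natural alignment, i.e. the address must be a multiple
// of the access size. Alignment is checked before bounds.
AccessError checkAccess(const std::vector<Allocation>& allocations,
                        std::uint64_t address, std::size_t bytes) {
    assert(bytes > 0);  // precondition: an access touches at least one byte

    if (address % bytes != 0) {
        return AccessError::Misaligned;
    }

    for (const Allocation& a : allocations) {
        if (address < a.base) {
            continue;  // access starts before this allocation
        }
        const std::uint64_t offset = address - a.base;
        // Written as a subtraction so the comparison cannot overflow.
        if (offset <= a.size && bytes <= a.size - offset) {
            return AccessError::None;  // fully inside this allocation
        }
    }
    return AccessError::OutOfBounds;  // not contained in any allocation
}
```

With a single 64-byte allocation at 0x1000, a 4-byte access at 0x103C is accepted, one at 0x1040 is flagged as out of bounds, and one at 0x1002 is flagged as misaligned. A real checker would also have to distinguish address spaces (global, shared, local, and so on), which this sketch ignores.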
Open Projects for Ocelot 1.2:
- Full PTX 2.0 support
- AMD GPU Devices
- SIMT on CPU vector units
- Asynchronous kernel execution
- Multi-threaded emulator device