Hi thstart,
Can you tell me more about your visual SIMD simulator
application and code generator?
It is already patent pending, so I can describe the most interesting features and benefits.
The list is long and the tool is still in development, so here is a very short list of what it does:
1.1) Graphical Mapping Tool - Simplifies mapping application requirements to processor resources
1.2) Static Performance Analysis - Speeds up performance estimation
Assesses processor resource utilization prior to code generation
2.1) Run-time tools
During the design phase, the parallel SIMD architecture tool generates and tests different
scenarios with optimized binary code.
At run time, the analysis tool analyzes the data and the creation tool
generates the optimal data-dependent binary code.
Instruction selection, Address calculations, Execution domains selection, Execution ports selection, Ordering instruction sequences
Full register length utilization, Address alignment, Cache usage optimization, Software pre-fetch scheduling distance optimization
Load and store execution bandwidth optimization.
Using data-dependent weighted cost metrics, it generates and ranks most of the possible permutations.
It generates only code custom-tailored to the run-time CPU. This can also solve the copy protection problem: the code can run on only
one machine, and the unused binary can be deleted.
Certain data-dependent optimizations are postponed to runtime, where they can be done more effectively because there is more
information about the data.
There are three costs associated with runtime code generation:
creation cost, execution cost, and management cost.
In order to win, the savings of using the runtime-created code
must exceed the cost of creating and managing that code.
This means that for many applications, a fast code generator
that creates good code will be superior to a slow code generator
that creates excellent code.
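To put numbers on that trade-off, here is a minimal sketch of the break-even arithmetic. It is not part of the tool; the function names and the cycle counts are made up for illustration, and all costs are assumed to be integers in the same unit (for example CPU cycles).

```python
def break_even_calls(creation_cost, management_cost,
                     generic_cost_per_call, specialized_cost_per_call):
    """Smallest number of calls at which runtime-created code wins.

    Hypothetical helper: returns None when the specialized code is not
    faster per call, because then the overhead is never paid back.
    """
    per_call_saving = generic_cost_per_call - specialized_cost_per_call
    if per_call_saving <= 0:
        return None
    overhead = creation_cost + management_cost
    # Savings must strictly exceed the overhead, hence the +1.
    return overhead // per_call_saving + 1


def total_cost(creation_cost, management_cost, cost_per_call, calls):
    """Creation + management + execution cost for a given number of calls."""
    return creation_cost + management_cost + cost_per_call * calls


# Illustrative numbers only (not measurements):
# a fast generator producing good code vs. a slow one producing excellent code.
fast = break_even_calls(1000, 100, generic_cost_per_call=10,
                        specialized_cost_per_call=6)    # 276 calls
slow = break_even_calls(50000, 100, generic_cost_per_call=10,
                        specialized_cost_per_call=4)    # 8351 calls

calls = 1000
print(total_cost(0, 0, 10, calls))         # generic code:   10000
print(total_cost(1000, 100, 6, calls))     # fast generator:  7100
print(total_cost(50000, 100, 4, calls))    # slow generator: 54100
```

With these made-up numbers the fast generator already pays off after 276 calls, while the slow one needs 8351 calls before its better code compensates for the creation cost.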
Byte, word, and double word shuffles using a control mask (constant) are
another group of SIMD operations that produce very valuable transformations
of the input data. Each byte in the shuffle control mask (constant) forms an
index to permute the corresponding byte in the destination operand.
Generating appropriate constants for bitwise, bytewise, wordwise,
double wordwise, quad wordwise, shuffle, and other SIMD operations is the most
important step in implementing parallel SIMD software.
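As an illustration of how a control mask drives a byte shuffle, here is a small scalar model in Python. It assumes the behavior of the 128-bit SSSE3 PSHUFB instruction (low four bits of each mask byte select a source byte, a set high bit zeroes the destination byte); the function name shuffle_bytes is only for this sketch and is not taken from the tool.

```python
def shuffle_bytes(src, mask):
    """Scalar model of a 16-byte shuffle in the style of PSHUFB.

    src and mask are 16-byte sequences. For each destination byte i:
    if bit 7 of mask[i] is set the result byte is 0, otherwise it is
    src[mask[i] & 0x0F].
    """
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("src and mask must both be 16 bytes long")
    out = bytearray(16)
    for i, m in enumerate(mask):
        out[i] = 0 if (m & 0x80) else src[m & 0x0F]
    return bytes(out)


src = bytes(range(0x10, 0x20))            # 0x10, 0x11, ..., 0x1F

# Different constants turn the same operation into different transformations.
reverse_mask = bytes(range(15, -1, -1))   # byte reversal
broadcast_mask = bytes(16)                # every index is 0: broadcast byte 0
zero_mask = bytes([0x80] * 16)            # high bit set: clear every byte

print(shuffle_bytes(src, reverse_mask).hex())
print(shuffle_bytes(src, broadcast_mask).hex())
print(shuffle_bytes(src, zero_mask).hex())
```

The only thing that changes between the three calls is the constant, which is why generating the right constant is the important part.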
I would say that we are extending our visual tool to the NVIDIA GPU
architecture. Simulation and visualization can help in understanding how
the data moves in the processor. This is even more important for
NVIDIA multiprocessors. Certain memory access patterns can be
discovered that speed up GPU processing a lot. Some of them
are counterintuitive and can be discovered only through automated benchmarks.
It is possible to emulate some SIMD operations on the NVIDIA
platform. Some SIMD instructions are very useful.
Simulation and visualization are the first step in understanding what happens in
the CPU and GPU, so we extended the tool to CPU/GPU.
I also believe the CPU and GPU have to be used to their full potential and
have to work together. Not everything is well suited to the CPU and not everything
is well suited to the GPU. The optimal solution is to mix the best features available.
I am attaching several screenshots showing examples of the PSHUFD
instruction with different immediate constants, which effectively create new
instructions:
Imm=00 00 00 00 is in practice a Broadcast 1st DWord to 4 DWords
Imm=03 02 01 00 is in practice a Copy 4 DWords to 4 DWords
Imm=02 03 00 01 is in practice a Swap HL DWords
Imm=02 00 03 01 is in practice a Grouping of DWords by HL
Imm=02 00 03 01 is in practice a Grouping of DWords by LH
Imm=03 03 02 01 is in practice a Shift Right DWords
There are many possible permutations, so you can enter the desired input and
desired output and run the visual simulator to find the appropriate immediate
constant. This input/output mapping and automatic permutation generation
turns out to be particularly interesting for NVIDIA GPUs because there are
many more possibilities there.
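The idea of going from a desired input/output pair to an immediate can be sketched in a few lines of Python. This is only a scalar model of PSHUFD, not the simulator itself: the names pshufd, imm_from_fields and find_immediates are hypothetical, and the four 2-bit fields are assumed to be written highest destination DWord first, which matches the identity constant 03 02 01 00 above.

```python
def pshufd(src, imm8):
    """Scalar model of PSHUFD on four DWords.

    src is a list of four values, index 0 being the lowest DWord.
    Bits [2*i+1 : 2*i] of imm8 select the source DWord for destination i.
    """
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


def imm_from_fields(d3, d2, d1, d0):
    """Pack four 2-bit selectors, written highest destination first."""
    return (d3 << 6) | (d2 << 4) | (d1 << 2) | d0


def find_immediates(src, wanted):
    """Brute-force all 256 immediates and keep those producing wanted."""
    return [imm for imm in range(256) if pshufd(src, imm) == wanted]


src = ["a", "b", "c", "d"]                       # lowest DWord first

print(pshufd(src, imm_from_fields(0, 0, 0, 0)))  # ['a', 'a', 'a', 'a'] broadcast
print(pshufd(src, imm_from_fields(3, 2, 1, 0)))  # ['a', 'b', 'c', 'd'] copy
print(pshufd(src, imm_from_fields(2, 3, 0, 1)))  # ['b', 'a', 'd', 'c'] swap
print(pshufd(src, imm_from_fields(3, 3, 2, 1)))  # ['b', 'c', 'd', 'd'] shift right

# Desired output -> immediate: reverse the four DWords.
print([hex(i) for i in find_immediates(src, ["d", "c", "b", "a"])])  # ['0x1b']
```

With only 256 candidates an exhaustive search is enough here; when the input DWords are all distinct, the matching immediate is unique.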
thstart: Sounds very interesting!! Is your patent application published? If so, can you send me the application number? I am sure it must have more details.
Thanks
Better to contact me directly; this is indeed a little off topic.
Sorry guys, but this is completely off-topic…