After finally getting my hands on a GB10, I ran some simple torch and diffusers Python code to see how quickly Z-Image-Turbo (mostly BF16) can generate images on this little magic box. Every run used 9 steps at 1024x1024 with the same prompt: “A dog chasing a stick thrown by an astronaut on the moon, with a lunar lander in the background, and the Earth on the horizon”. Results:
| Configuration | Time (s) |
|---|---|
| Default attention, BF16, no compile | 12.1 |
| Sage attention, BF16, no compile | 13.2 |
| Flash attention, BF16, no compile | 12.9 |
| Default attention, BF16, compile | 8.1 |
| Default attention, GGUF Q8_0, compile | 9.1 |
Each time is the average of the last 5 runs, ignoring the first 2 runs. Interestingly, it took 2 runs for torch’s lazy compile to finish its job, with those runs taking about 30s each. For me, 8s is not bad at all.
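
For anyone who wants to reproduce the methodology, here is a minimal sketch of the timing loop described above: discard the first 2 runs, then average the next 5. This is not the exact script I used. The `benchmark` helper and its `sync` parameter are illustrative names, and `pipe` in the usage comment stands for an already loaded diffusers pipeline.

```python
import time
from statistics import mean
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 2,
    runs: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per call over `runs` timed calls.

    The first `warmup` calls are executed but not counted, so one-off
    costs such as lazy compilation do not skew the average.
    """

    def timed_call() -> float:
        if sync is not None:
            sync()  # drain pending GPU work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous kernels before stopping the clock
        return time.perf_counter() - start

    for _ in range(warmup):
        timed_call()  # result discarded on purpose

    return mean(timed_call() for _ in range(runs))


# Hypothetical usage with a loaded pipeline (assumes `pipe`, `prompt`, and torch):
# avg = benchmark(
#     lambda: pipe(prompt, num_inference_steps=9, height=1024, width=1024),
#     sync=torch.cuda.synchronize,
# )
```

Passing `torch.cuda.synchronize` as `sync` matters on GPU because kernels launch asynchronously, so without it the clock can stop before the work has finished.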
I’m wondering if I can run in FP8. NVIDIA’s TransformerEngine looks like it could be of use, but I haven’t tried installing it yet, and the code would be a little less simple. Anyway, I’ll post a few more results in this thread.