Float to TF32 conversion

lee.deepm · July 28, 2021, 6:49pm

Hello, I would like to know if there is any instruction to convert TF32 to float. Thanks.

I tried this:

#include <mma.h>
#include <iostream>
#include <stdio.h>

__global__ void Cvt_FT_F() {
    float a = 1.0;
    uint32_t b;

    // Cvt Float - TF32
    asm("cvt.rna.tf32.f32         %0, %1;\n" : "=r"(b) : "f"(a));
    // Cvt TF32 - Float
    asm("cvt.rna.f32.tf32         %0, %1;\n" : "=f"(a) : "r"(b));

    printf("%f \n", a);
}

int main() {
    Cvt_FT_F<<<1, 32>>>();
    cudaDeviceSynchronize();
}

But I get these errors:

ptxas /tmp/tmpxft_000016eb_00000000-6_Cvt_FT_F.ptx, line 43; error   : Illegal rounding modifier for instruction 'cvt'
ptxas /tmp/tmpxft_000016eb_00000000-6_Cvt_FT_F.ptx, line 43; error   : Unexpected instruction types specified for 'cvt'
ptxas fatal   : Ptx assembly aborted due to errors

Thanks for your help.

njuffa · July 28, 2021, 8:19pm

Are you building for compute capability >= 8.0 (sm_80) with a fairly recent CUDA version?

The PTX documentation only lists cvt.rna.tf32.f32. I don’t see any mention of conversion in the opposite direction, so an operation cvt.rna.f32.tf32 probably doesn’t exist.
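Because PTX only offers the float → TF32 direction, a value that is already at TF32 precision can simply be kept and used as a float. The host-side sketch below checks whether a float is exactly representable in TF32. It assumes that a TF32 value is an FP32 bit pattern (1 sign bit, 8 exponent bits, 10 explicit mantissa bits) whose 13 low-order mantissa bits are zero; treat that layout as an assumption rather than a documented guarantee. The helper names `f32_bits`, `f32_from_bits` and `is_tf32_exact` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host helpers: copy the bits of a float into a 32-bit integer
// and back (host analogues of CUDA's __float_as_uint / __uint_as_float).
inline std::uint32_t f32_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float f32_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Assumption: TF32 = FP32 layout with the 13 low-order mantissa bits zero.
// A float passing this check already holds an exact TF32 value, so no
// "TF32 -> float" instruction is needed to use it as a float.
inline bool is_tf32_exact(float f) {
    return (f32_bits(f) & 0x1FFFu) == 0;
}
```

For example, `1.0f + 1.0f / 1024` passes the check (it needs only 10 mantissa bits), while `1.0f + 1.0f / 2048` and `0.1f` do not.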

Robert_Crovella · July 28, 2021, 8:28pm

Float to TF32 is already provided for by the first instruction. The second instruction, cvt.rna.f32.tf32, is not valid.

TF32 is already compatible with float from a format perspective, so the 32-bit result of the conversion can simply be reinterpreted as a float. Here is an example:

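#include <cstdio>

// Round a float to TF32 precision and return the result as a float.
// %mr is a .b32 register holding the TF32 bits; mov.b32 copies those bits
// unchanged into the float output, i.e. a plain reinterpretation.
__device__ float cvt_demo(float a){
    float ret;
    asm ("{.reg .b32 %mr;\n"
        "cvt.rna.tf32.f32 %mr, %1;\n"
        "mov.b32 %0, %mr;}\n" : "=f"(ret) : "f"(a));
    return ret;
}
__global__ void Cvt_FT_F(float *b) {
    *b = cvt_demo(*b);
}

int main() {
    float *a;
    cudaMallocManaged(&a, sizeof(float));
    *a = 1.0f / 3.0f;         // example input (was left uninitialized)
    Cvt_FT_F<<<1, 1>>>(a);    // requires a cc 8.0+ target, e.g. -arch=sm_80
    cudaDeviceSynchronize();
    printf("%.9f\n", *a);     // the TF32-rounded value, read as a float
    cudaFree(a);
}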

And yes, if you compile for a GPU architecture below cc 8.0 (i.e. not at least -arch=sm_80), you will get compile errors.
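If the same source must also build for GPUs below cc 8.0, or run on the host, one option is to guard the inline PTX with `__CUDA_ARCH__` and fall back to a software rounding elsewhere. The sketch below does that; `to_tf32`, `tf32_bits_of` and `tf32_float_of` are hypothetical names. On sm_80+ it uses `cvt.rna.tf32.f32` and reads the result back with `__uint_as_float`. The software path is our own approximation of `cvt.rna` (round to nearest, ties away from zero, low 13 mantissa bits cleared, Inf/NaN passed through) and has not been verified to be bit-identical to the hardware for every input, notably NaN payloads, subnormals and values near overflow.

```cpp
// Sketch: builds with nvcc or as plain host C++.
//   nvcc -arch=sm_80 ...   -> device code uses cvt.rna.tf32.f32
//   older -arch / host     -> software approximation below
#include <cstdint>
#include <cstring>

#ifndef __CUDACC__
#define __host__    // plain C++ build: CUDA qualifiers expand to nothing
#define __device__
#endif

// Reinterpret float <-> 32-bit pattern without changing any bits.
__host__ __device__ inline std::uint32_t tf32_bits_of(float f) {
#ifdef __CUDA_ARCH__
    return __float_as_uint(f);
#else
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
#endif
}

__host__ __device__ inline float tf32_float_of(std::uint32_t u) {
#ifdef __CUDA_ARCH__
    return __uint_as_float(u);
#else
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
#endif
}

// Round x to TF32 precision and return it as an ordinary float.
__host__ __device__ inline float to_tf32(float x) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    std::uint32_t b;
    asm("cvt.rna.tf32.f32 %0, %1;" : "=r"(b) : "f"(x));
    return tf32_float_of(b);  // "TF32 -> float" is just a reinterpretation
#else
    // Assumed equivalent of cvt.rna: keep 10 explicit mantissa bits,
    // rounding to nearest with ties away from zero.
    std::uint32_t u = tf32_bits_of(x);
    if ((u & 0x7F800000u) != 0x7F800000u) {  // leave Inf/NaN unchanged
        u += 0x1000u;       // add half a TF32 ulp (bit 12)
        u &= 0xFFFFE000u;   // clear the 13 low-order mantissa bits
    }
    return tf32_float_of(u);
#endif
}
```

In a kernel this is used as `float y = to_tf32(x);`. Keeping the result in a `float` means no separate TF32 → float step is ever needed.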

lee.deepm · July 30, 2021, 10:20am

Thank you for your help.

system · September 28, 2021, 10:20am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
