Help with Inline Assembly Syntax

grynet · January 26, 2023, 9:09am

Hi All,

I am trying to use inline assembly to generate different flavors of loads and stores, but I cannot get the syntax right. Can someone help me, please?

Here is the CUDA code I am trying to reproduce with inline asm:

__global__ void exampleCuda(float * array, float * arrayout) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    arrayout[tid] = 1 + array[tid];
}

This is what I wrote, but it is not correct. What am I doing wrong?

__global__ void ptxCode(float * array, float * arrayout) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    float* ptr = &array[tid];
    float *outPtr = &arrayout[tid];    
    float data;
    asm volatile("ld.global.f32 %0, [%1];" : "=f"(data) : "l"(ptr));
    data++;
    asm volatile("st.global.f32 [%0], %1;" : "=l"(outPtr) : "f"(data));
}

Godbolt link

Thanks in advance

grynet · January 26, 2023, 5:06pm

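I think I found a solution. The store writes to memory, not to a C++ variable, so the pointer has to be an input operand ("l") rather than an output operand ("=l"):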

__global__ void ptxCode(float * array, float * arrayout) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    float* ptr = &array[tid];
    float *outPtr = &arrayout[tid];    
    float data;
    asm volatile ("ld.global.f32 %0, [%1];" : "=f"(data) : "l"(ptr));
    data++;
    asm volatile ("st.global.f32 [%0], %1;" : : "l"(outPtr), "f"(data));
}

But now I cannot get the vector loads right. How can I write the following code with inline PTX?

__global__ void cudaCodeVector(float* array, float * arrayout) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    float4 data = reinterpret_cast<float4*>(array)[tid];
    data.x++;
    data.y++;
    data.z++;
    data.w++;
    reinterpret_cast<float4*>(arrayout)[tid] = data;
}

rs277 · January 26, 2023, 6:23pm

This might be helpful:
https://stackoverflow.com/questions/56719743/simple-add-of-vectors-in-inline-ptx-cuda
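
For the float4 case asked about above, a minimal sketch could look like the following. It is not taken from the linked answer. It assumes both pointers come from cudaMalloc (so they point to global memory and are 16-byte aligned) and that the launch covers exactly one thread per float4. The kernel name ptxCodeVector and the host helper runVectorKernel are made up for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Sketch: 128-bit vector load/store through inline PTX.
// Assumes array/arrayout are global-memory pointers aligned to 16 bytes.
__global__ void ptxCodeVector(const float* array, float* arrayout) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    const float4* ptr = reinterpret_cast<const float4*>(array) + tid;
    float4* outPtr = reinterpret_cast<float4*>(arrayout) + tid;
    float4 data;
    // Four "=f" outputs receive the vector components; the address is an "l" input.
    asm volatile("ld.global.v4.f32 {%0, %1, %2, %3}, [%4];"
                 : "=f"(data.x), "=f"(data.y), "=f"(data.z), "=f"(data.w)
                 : "l"(ptr));
    data.x++;
    data.y++;
    data.z++;
    data.w++;
    // The store has no outputs; the "memory" clobber tells the compiler memory is written.
    asm volatile("st.global.v4.f32 [%0], {%1, %2, %3, %4};"
                 :
                 : "l"(outPtr), "f"(data.x), "f"(data.y), "f"(data.z), "f"(data.w)
                 : "memory");
}

// Hypothetical host-side check: returns true if every output equals input + 1.
bool runVectorKernel() {
    const int n = 1024;              // number of floats, a multiple of 4 * threads
    const int threads = 128;
    const int vecCount = n / 4;      // one thread per float4
    std::vector<float> in(n), out(n, 0.0f);
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    ptxCodeVector<<<vecCount / threads, threads>>>(dIn, dOut);

    // The blocking copy also waits for the kernel to finish.
    bool ok = cudaMemcpy(out.data(), dOut, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (out[i] == in[i] + 1.0f);

    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}

int main() {
    std::printf("%s\n", runVectorKernel() ? "OK" : "MISMATCH");
    return 0;
}
```

The braces in the PTX string group the four scalar registers into one vector operand, so each component still gets its own "f" constraint. As in the scalar version, the store lists the pointer as an input.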
