[Solved] CUDA inline PTX Internal Compiler Error

iamkaka · June 6, 2016, 10:21pm

I want to measure the cache behavior of GPU global memory, and below is the micro-benchmark I designed. What I want to do is load from the global memory address r_addr0 and store the value into shared memory s_tvalue[0]. For my purposes, I need to replace the load from global memory with inline PTX code.

i = *r_addr0;
//asm("ldu.global.f64.cs %1, [%2];":"=l"(i):"l"(r_addr0));
s_tvalue[0] = i;

However, when I compile it with nvcc, it fails with the following error:

error: Internal Compiler Error (codegen): "asm operand index requested is larger than the number of asm operands provided!"

Does anybody know what is wrong with my code?

The complete code is below:

__global__ void global_latency (long long * my_array, long long array_length, int position, long long *d_time) {

    unsigned int start_time, end_time;

    __shared__ long long s_tvalue[2];//2: number of threads per block

    int k;
    long long i, j;
    for(k=0; k<2; k++)
        s_tvalue[k] = 0L;
    long long addr0,addr1;

    addr0=(long long)my_array;

    addr1 = ( addr0 ^ (1 << position));

    long long *r_addr0, *r_addr1;
    r_addr0 = (long long *)addr0;
    r_addr1 = (long long *)addr1;

    start_time = clock();
    //i = *r_addr0;
    asm("ldu.global.f64.cs %1, [%2];":"=l"(i):"l"(r_addr0));

    s_tvalue[0] = i;
    //j = *r_addr1;
    asm("ld.global.f64.cs %3, [%4];" : "=l"(j):"l"(r_addr1));
    s_tvalue[1] = j;


    end_time = clock();

    d_time[0] = end_time-start_time;
    d_time[1] = s_tvalue[0];
    printf("[%p]=%lld\n",addr0,d_time[1]);
    d_time[2] = s_tvalue[1];
    printf("[%p]=%lld\n",addr1,d_time[2]);
}

njuffa · June 7, 2016, 12:18am

Counting starts at 0, not 1. Try using %0,%1 instead of %1,%2 (the %2 presumably triggers the index-out-of-bounds error).
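A minimal sketch of that fix, assuming the 64-bit integer (`long long`) data from the question. Inline PTX operands are numbered from 0 within each `asm` statement, outputs first and then inputs, so each load uses `%0` for its result and `%1` for its address. That includes the second load, which the question numbered `%3`/`%4`. The sketch also uses the `.u64` type rather than `.f64` so the instruction type matches the 64-bit integer register produced by the `"l"` constraint. The names `load_u64`, `copy_one`, `round_trip`, and `check` are hypothetical, and passing the generic pointer straight to `ld.global` follows the common practice seen in the question rather than an explicit address-space conversion.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: one 64-bit load from global memory via inline PTX.
// Numbering restarts at 0 in every asm statement: %0 is the output v,
// %1 is the input address p.
__device__ long long load_u64(const long long *p)
{
    long long v;
    asm volatile("ld.global.u64 %0, [%1];" : "=l"(v) : "l"(p));
    return v;
}

// Copies src[0] into dst[0] through the inline-PTX load.
__global__ void copy_one(const long long *src, long long *dst)
{
    dst[0] = load_u64(src);
}

// Hypothetical host wrapper: round-trips one value through the kernel.
long long round_trip(long long value)
{
    long long *d_src = nullptr, *d_dst = nullptr, result = 0;
    check(cudaMalloc(&d_src, sizeof(long long)));
    check(cudaMalloc(&d_dst, sizeof(long long)));
    check(cudaMemcpy(d_src, &value, sizeof value, cudaMemcpyHostToDevice));
    copy_one<<<1, 1>>>(d_src, d_dst);
    check(cudaGetLastError());
    check(cudaMemcpy(&result, d_dst, sizeof result, cudaMemcpyDeviceToHost));
    check(cudaFree(d_src));
    check(cudaFree(d_dst));
    return result;
}
```

On a working setup, calling `round_trip(42)` from host code should return 42, which confirms that the operand numbering and register type are accepted by both the front end and ptxas.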

iamkaka · June 7, 2016, 3:29am

Thanks.
Although there are some other minor errors, you pointed out the most important one.
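The thread does not say which other errors were fixed. Going by the PTX ISA syntax for `ld` and `ldu`, the loads in the question also have these PTX-level problems: `ldu` has no cache-operator field, so `.cs` is not valid there; in `ld`, the cache operator comes before the type (`ld.global.cs.f64`, not `ld.global.f64.cs`); and an `.f64` load needs an `.f64` register, i.e. the `"d"` constraint with a `double`, whereas 64-bit integer data such as the question's `long long` pairs with an integer type like `.u64` and the `"l"` constraint. The sketch below shows the loads with those points addressed, in a kernel that loosely mirrors the question's structure. All names (`ldu_u64`, `ld_cs_u64`, `ld_cs_f64`, `two_loads`, `LoadResult`, `run_two_loads`) are hypothetical. The loads are written as `asm volatile` so the compiler cannot discard them as dead code, and the sketch swaps in `clock64()` for `clock()` to avoid 32-bit wraparound. It makes no claim about what the tick count actually measures on a given GPU.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// ldu takes no cache operator, so ".cs" is dropped here.
__device__ long long ldu_u64(const long long *p)
{
    long long v;
    asm volatile("ldu.global.u64 %0, [%1];" : "=l"(v) : "l"(p));
    return v;
}

// In ld, the cache operator (.cs) precedes the type.
__device__ long long ld_cs_u64(const long long *p)
{
    long long v;
    asm volatile("ld.global.cs.u64 %0, [%1];" : "=l"(v) : "l"(p));
    return v;
}

// An .f64 load writes an .f64 register, which the "d" constraint supplies.
__device__ double ld_cs_f64(const double *p)
{
    double v;
    asm volatile("ld.global.cs.f64 %0, [%1];" : "=d"(v) : "l"(p));
    return v;
}

// Loads two integers into shared memory between two clock reads, then
// writes the values, the elapsed ticks, and one double load back out.
__global__ void two_loads(const long long *a, const long long *b,
                          const double *c, long long *out, double *out_f)
{
    __shared__ long long s_tvalue[2];
    long long start = clock64();
    s_tvalue[0] = ldu_u64(a);
    s_tvalue[1] = ld_cs_u64(b);
    long long stop = clock64();
    out[0] = s_tvalue[0];
    out[1] = s_tvalue[1];
    out[2] = stop - start;  // elapsed SM clock ticks
    *out_f = ld_cs_f64(c);
}

struct LoadResult { long long x, y, ticks; double z; };

// Hypothetical host wrapper: uploads x, y and z, runs the kernel once on a
// single thread, and returns what the device loaded.
LoadResult run_two_loads(long long x, long long y, double z)
{
    long long h_src[2] = {x, y};
    long long h_out[3] = {0, 0, 0};
    LoadResult r{};
    long long *d_src = nullptr, *d_out = nullptr;
    double *d_z = nullptr, *d_out_f = nullptr;
    check(cudaMalloc(&d_src, sizeof h_src));
    check(cudaMalloc(&d_out, sizeof h_out));
    check(cudaMalloc(&d_z, sizeof(double)));
    check(cudaMalloc(&d_out_f, sizeof(double)));
    check(cudaMemcpy(d_src, h_src, sizeof h_src, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_z, &z, sizeof z, cudaMemcpyHostToDevice));
    two_loads<<<1, 1>>>(d_src, d_src + 1, d_z, d_out, d_out_f);
    check(cudaGetLastError());
    check(cudaMemcpy(h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost));
    check(cudaMemcpy(&r.z, d_out_f, sizeof r.z, cudaMemcpyDeviceToHost));
    r.x = h_out[0];
    r.y = h_out[1];
    r.ticks = h_out[2];
    check(cudaFree(d_src));
    check(cudaFree(d_out));
    check(cudaFree(d_z));
    check(cudaFree(d_out_f));
    return r;
}
```

Because `ldu` is meant for data that is read-only for the kernel's lifetime, this sketch only uses it on a buffer the kernel never writes.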
