For "unsigned int max(-5, 4) " , what should it return? -5 or 4?

halomoooonk · November 29, 2024, 9:42am

Hi everyone,

When I use the integer function unsigned long int max(const unsigned long int a, const long int b), and I provide the inputs -5 and 4, it returns the two’s complement representation of -5, which is a very large number. This behavior seems a little different from how max works in C or C++.

Is this because the return type is unsigned? Is this behavior by design? Sorry if this is a basic question, but I’m a beginner.

I am using the sm_86 architecture, and here is my code:

#include <stdio.h>

__global__ void kernel_max(unsigned long int a, long int b, unsigned long int *result) {
    *result = max(a, b);
}

int main() {
    unsigned long int a = 4;
    long int b = -5;
    unsigned long int *dev_result, result;

    cudaMalloc(&dev_result, sizeof(unsigned long int));

    kernel_max<<<1, 1>>>(a, b, dev_result);

    cudaMemcpy(&result, dev_result, sizeof(unsigned long int), cudaMemcpyDeviceToHost);
    cudaFree(dev_result);
    printf("The result of max(4, -5) is: %lu\n", result);
    return 0;
}

This is the max function’s definition in the documentation:
device unsigned long int max(const unsigned long int a, const long int b)

halomoooonk · November 29, 2024, 10:10am

This is the PTX code:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-34714021
// Cuda compilation tools, release 12.6, V12.6.68
// Based on NVVM 7.0.1
//

.version 8.5
.target sm_52
.address_size 64

        // .globl       _Z10kernel_maxmlPm

.visible .entry _Z10kernel_maxmlPm(
        .param .u64 _Z10kernel_maxmlPm_param_0,
        .param .u64 _Z10kernel_maxmlPm_param_1,
        .param .u64 _Z10kernel_maxmlPm_param_2
)
{
        .reg .b64       %rd<6>;


        ld.param.u64    %rd1, [_Z10kernel_maxmlPm_param_0];
        ld.param.u64    %rd2, [_Z10kernel_maxmlPm_param_1];
        ld.param.u64    %rd3, [_Z10kernel_maxmlPm_param_2];
        cvta.to.global.u64      %rd4, %rd3;
        max.u64         %rd5, %rd1, %rd2;
        st.global.u64   [%rd4], %rd5;
        ret;

}

It looks like max treats the long int value -5 as an unsigned integer initially.

njuffa · November 29, 2024, 4:43pm

std::max() from algorithms does not support mixed-type arguments, according to the C++11 standard I consulted (section 24.5.7). I also tried compiling with clang and -std=c++23 and it throws an error error: no matching function for call to 'max'.

The closest equivalent to CUDA’s max() that will compile is therefore

*result = (a > b) ? a : b;

With this, the sample program below returns

The result of max(4, -5) is: 18446744073709551611

The reason for this is that when comparing a signed and an unsigned integer of the same storage size in C++, the signed integer is implicitly converted to an unsigned integer prior to the comparison. This is because unsigned T is considered a wider type than T, and by the rules governing implicit type conversions, for a two-input operator where one argument has a narrower type and the other has a wider type, the argument of the narrower type is implicitly converted to the wider type prior to performing the operation. In other words, this applies to an arithmetic operator like + or * just as it applies to a comparison operator like >.

Because this can lead to “surprising” behavior of C++ programs, as you noted, it is a best practice to avoid mixing signed and unsigned integer types in the same expression.

#include <cstdio>
#include <cstdlib>
#include <algorithm>

void kernel_max (unsigned long int a, long int b, unsigned long int *result) {
#if 0 // does not compile with C++23
    *result = std::max (a, b);
#else
    *result = (a > b) ? a : b;
#endif    
}

int main (void)
{
    unsigned long int a = 4;
    long int b = -5;
    unsigned long int result;

    kernel_max (a, b, &result);
    printf("The result of max(4, -5) is: %lu\n", result);
    return EXIT_SUCCESS;
}

Robert_Crovella · November 29, 2024, 5:05pm

I wonder what the function documentation indicates:

__device__ unsigned long int max(const unsigned long int a, const long int b)

Calculate the maximum value of the input unsigned long int and long int arguments.

Calculate the maximum value of the arguments a and b, perform integer promotion first.

I wonder what “perform integer promotion first” means in this context. Does that refer to promotion to signed integer? Or unsigned integer? The function prototypes that don’t include mixed parameter types don’t have that notation.

FWIW I tried using max in host code and did not have any difficulty, and FWIW it produced the same answer as device code:

# cat t316.cu
#include <stdio.h>
#include <math.h>
__global__ void kernel_max(unsigned long int a, long int b, unsigned long int *result) {
    *result = max(a, b);
}

int main() {
    unsigned long int a = 4;
    long int b = -5;
    unsigned long int *dev_result, result;

    cudaMalloc(&dev_result, sizeof(unsigned long int));

    kernel_max<<<1, 1>>>(a, b, dev_result);

    cudaMemcpy(&result, dev_result, sizeof(unsigned long int), cudaMemcpyDeviceToHost);
    cudaFree(dev_result);
    printf("The result of max(4, -5) is: %lu\n", result);
    printf("host: %lu\n", max(a,b));
    return 0;
}
# nvcc -o t316 t316.cu
# ./t316
The result of max(4, -5) is: 18446744073709551611
host: 18446744073709551611
#

njuffa · November 29, 2024, 5:24pm

CUDA’s max() and C++ std::max() from algorithms are not the same thing. To my recollection, min() and max() in CUDA predate CUDA’s transition from C to C++, which presumably explains the different semantics in the form of support for mixed-type arguments in CUDA. You can double check with the NVIDIA compiler folks, but I am fairly certain that

(1) max(a,b) is essentially implemented as (((a)>(b))?(a):(b)), presumably in a way that avoids multiple evaluations of a and b.

(2) This implementation is supplied for both host and device code to achieve matching behavior, which is important when this is used in __host__ __device__ functions.

“perform integer promotion” is standard terminology for the various forms of silent integer type conversions applied in C++. See section 4.5 integral promotions in the C++11 standard document (I don’t own a copy of later versions, the sections might be numbered differently). What I loosely called “width” of an integer type is related but not identical to the “rank” in the terminology of the standard. The section on relational operators in the standard dryly specifies:

The usual arithmetic conversions are performed on operands of arithmetic or enumeration type

Section 5 Expressions of the C++11 standard specifies:

Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.

The implicit conversion to the wider type on two-operand operator works just the same as if the programmer had supplied an explicit cast:

unsigned T a, result;
T b;
result = (a > (unsigned T)b) ? a : b;

Given the values in the example (a=, b=-5), this leads to b being selected, and subsequently being converted to unsigned T on assignment to result.

Curefab · November 30, 2024, 9:56am

The ‘trunk’ for the C++ standard is available here, it may contain some C++26 features:

The integral promotions are now in chapter 7.3.7 or conv.prom (chapter id has more stability than the chapter number, if other chapters are inserted or deleted, and also the link uses the chapter id)
https://eel.is/c++draft/conv.prom#:promotion,integral

For an easier read, also see the quite faithful CppReference, there is a section integral promotion further down the page

https://en.cppreference.com/w/cpp/language/implicit_conversion

The mentioned specific wordings are still present:

https://eel.is/c++draft/expr#mul-2

The usual arithmetic conversions are performed on the operands and determine the type of the result.

with

https://eel.is/c++draft/expr#arith.conv-1.5

Otherwise, each operand is converted to a common type C.

The integral promotion rules ([conv.prom]) are used to determine a type T1 and type T2 for each operand.48

Then the following rules are applied to determine C:

If T1 and T2 are the same type, C is that type.

Otherwise, if T1 and T2 are both signed integer types or are both unsigned integer types, C is the type with greater rank.

Otherwise, let U be the unsigned integer type and S be the signed integer type.

If U has rank greater than or equal to the rank of S, C is U.

Otherwise, if S can represent all of the values of U, C is S.

Otherwise, C is the unsigned integer type corresponding to S.

halomoooonk · December 2, 2024, 1:04am

Thank you all for your kindness and help. I learned a lot from this.

system · December 16, 2024, 1:04am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
(short)1 + (int)1 type? CUDA Programming and Performance	2	4260	May 24, 2011
Bizarre type promotion Pretending int is a uint CUDA Programming and Performance	8	3528	September 26, 2008
Possible CUDA runtime library error? CUDA Programming and Performance	5	1811	March 30, 2009
"unsigned int" vs "long int" CUDA Programming and Performance	1	3482	April 17, 2011
atomicMax() for unsigned long long error compiling CUDA Programming and Performance	2	2432	August 25, 2013
float -> unsigned int conversion problem CUDA Programming and Performance	9	46376	October 7, 2008
reintepret_cast double into `unsigned long long' CUDA Programming and Performance	1	2320	February 5, 2010
Error in calculation with unsigned chars CUDA Programming and Performance	1	4162	September 7, 2007
min max and sign functions in CUDA do they exist? if so where? CUDA Programming and Performance	6	48742	February 27, 2012
Can it be a problem with 'unsigned long long' (64 bit) ? CUDA Programming and Performance	0	1947	June 11, 2009

For "unsigned int max(-5, 4) " , what should it return? -5 or 4?

Related topics