I’m running into an odd issue with a simulation I’ve coded. I’ve setup an interface function for other code to call: BFieldAtS. This function returns the result of a function pointed to by a global function pointer which I define in another place before calling this interface function. Imagine, if you would, a simulation of the earth’s magnetic field. There are a number of models that are more or less accurate in different regions. I would like the user to be able to define the model used at run time and for the BFieldAtS interface to call whatever model the user defines - hence the callback function. Perhaps there is a better way to achieve this - I’m open to that. However, that is not specifically where my question is.

The issue:

In order to accomplish the above, I’ve placed this BFieldAtS function in a .cu file and the assignment and implementation of the callback in another file. This is where I run into issues. An equation is producing blatantly wrong results. The equation is:

```
double x{ asinh(sqrt(3.0) * sin(lambdaDegrees * 3.1415927 / 180.0)) };
```

I’ve found hard-coding values for the angle (lambdaDegrees) in the equation in question produces the (correct) expected results. Additionally, if I place everything in one file and consolidate headers/remove externs, the function produces the expected results again. The rest of the functions are working correctly - i.e. with the wrong x value, the functions still calculate what would be expected with that x value - compared with hand/excel calculations.

The issue is when the implementation of the model is separated. With the code separated into two files, printing the angle before and after the equation for “x” is executed, the angle is unchanged, yet without hard-coding the angle, the equation produces the wrong results. Furthermore, it produces the same wrong value over and over. The weirdest part is simply adding another diagnostic printf function caused the function to produce yet another (different, also wrong) result. I’m wondering if it has anything to do with the way I’ve set up a callback function on the GPU…maybe multiple threads using the function at the same time leading to some (consistent) undefined behavior? Taking this one step further, sin and sqrt seems to calculate correctly, and the wrong value produced is greater than the equation allows (i.e. there is no angle that produces the output result). It seems as if the issue is with asinh specifically.

Expected value of this equation, producing “x” within getSAtLambda (the printf statement) is 1.268… Result is 1.768… Let me know what you think. By the way, I’m aware that PI as defined here has less accuracy than double (thanks njuffa). However, I don’t feel like this should impact the issue. Simply placing all code in one file (with the same definition for PI) corrects the issue.

CUDA Version: release 8.0, V8.0.60

Flags: -rdc=true, -gencode=arch=compute_50,code=“sm_50,compute_50”, otherwise normal visual studio flags (windows)

otherfuncs.cu

```
#include <cmath>
#include <iostream>
//CUDA includes
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda_profiler_api.h"
typedef double(*callbackFcn)(double*, int, double, double, int);
__device__ double* fieldConstArray_GPU{ nullptr };
__device__ int arraySize_GPU{ 7 };
__device__ callbackFcn callback_GPU{ nullptr };
__host__ __device__ double getSAtLambda(double* consts, int arrayLength, double lambdaDegrees, double simtime, int thdInd)
{//returns s in units of L
double x{ asinh(sqrt(3.0) * sin(lambdaDegrees * 3.1415927 / 180.0)) };
if (simtime == 0.0 && thdInd == 31487) { printf("\n\ngetSAtLambda: %f, %f\n\n", lambdaDegrees, x); }
return (0.5 * consts[2] / sqrt(3.0)) * (x + sinh(x) * cosh(x));
}
__host__ __device__ double getLambdaAtS(double* consts, int arrayLength, double s, double simtime, int thdInd)
{// consts: [ B0, ILATDeg, L, L_norm, s_max, ds, errorTolerance ]
double lambda_tmp{ (-consts[1] / consts[4]) * s + consts[1] }; //-ILAT / s_max * s + ILAT
double s_tmp{ consts[4] - getSAtLambda(consts, arrayLength, lambda_tmp, simtime, thdInd) };
double dlambda{ 1.0 };
bool over{ 0 };
while (abs((s_tmp - s) / s) > 1e-4) //errorTolerance
{
while (1)
{
over = (s_tmp >= s);
if (over)
{
lambda_tmp += dlambda;
s_tmp = consts[4] - getSAtLambda(consts, arrayLength, lambda_tmp, simtime, 0);
if (s_tmp < s)
break;
}
else
{
lambda_tmp -= dlambda;
s_tmp = consts[4] - getSAtLambda(consts, arrayLength, lambda_tmp, simtime, 0);
if (s_tmp >= s)
break;
}
}
if (dlambda < 1e-4 / 100.0) //errorTolerance
break;
dlambda /= 5.0; //through trial and error, this reduces the number of calculations usually (compared with 2, 2.5, 3, 4, 10)
}
return lambda_tmp;
}
__host__ __device__ double BFieldAtS(double* consts, int arrayLength, double s, double simtime, int thdInd)
{// consts: [ B0, ILATDeg, L, L_norm, s_max, ds, errorTolerance ]
double lambda_deg{ getLambdaAtS(consts, arrayLength, s, simtime, thdInd) };
double lambda_rad{ lambda_deg * 3.1415927 / 180.0 };
double rnorm{ consts[3] * pow(cos(lambda_rad), 2) };
return -consts[0] / pow(rnorm, 3) * sqrt(1.0 + 3 * pow(sin(lambda_rad), 2));
}
__host__ __device__ double gradBAtS(double* consts, int arrayLength, double s, double simtime, int thdInd)
{
return (BFieldAtS(consts, arrayLength, s + 6371.0, simtime, thdInd) - BFieldAtS(consts, arrayLength, s - 6371.0, simtime, thdInd)) / (2 * 6371.0);
}
__global__ void setupEnvironmentGPU(double* constArrayPtr)
{
callback_GPU = gradBAtS; //sets pointer to callback function
arraySize_GPU = 7;
fieldConstArray_GPU = constArrayPtr;
}
```

main.cu

```
//CUDA includes
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda_profiler_api.h"
typedef double(*callbackFcn)(double*, int, double, double, int);
//on GPU global variables
extern __device__ double* fieldConstArray_GPU;
extern __device__ int arraySize_GPU;
extern __device__ callbackFcn callback_GPU;
__host__ __device__ double BFieldAtS(double* consts, int arrayLength, double s, double simtime, int thdInd);
__host__ __device__ double gradBAtS(double* consts, int arrayLength, double s, double simtime, int thdInd);
__global__ void setupEnvironmentGPU(double* constArrayPtr);
__global__ void execute()
{
int thdInd{ blockIdx.x * blockDim.x + threadIdx.x };
callback_GPU(fieldConstArray_GPU, arraySize_GPU, (thdInd == 31487) ? 1233005.097 : ((115200 - thdInd) / 50000.0 * 6.371e6), 0.0, thdInd ); //3rd argument are example values
}
void setupEnvironment()
{// consts: [ B0, ILATDeg, L, L_norm, s_max ]
double fieldConstArray_h[]{ 3.12e-5, 72.0, 66717978.17, 10.47213595, 85670894.1 };
double* fieldConstants_d{ nullptr };
cudaMalloc((void **)&fieldConstants_d, 5 * sizeof(double));
cudaMemcpy(fieldConstants_d, fieldConstArray_h, 5 * sizeof(double), cudaMemcpyHostToDevice);
setupEnvironmentGPU <<< 1, 1 >>> (fieldConstants_d);
}
int main()
{
setupEnvironment();
int loops{ 0 };
while (loops < 3)
{
execute <<< 115200 / 256, 256 >>> ();
cudaDeviceSynchronize();
loops++;
}
return 0;
}
```