nvcc compilation problem with struct

Hi everybody. First of all, I'm not a very experienced programmer, so I'm sorry if my problem is rather basic; I've already tried searching the forum, but without success.

OK, here is a description of my problem. It's not strictly related to kernel execution but, at least I think, to the way nvcc compiles my C code.

I have two source files. The first one, a .c file (“cross_gpu.c”), contains the main function, where a particular structure is created and filled; its pointer is then passed as an argument to another function:

// Structure allocation
LWPR_Model *model = (LWPR_Model *)malloc(sizeof(LWPR_Model));

// Structure filling by a function of a third-party library
lwpr_update(model, xx, yy, yp, NULL);

// Call to my function, with the structure pointer as an argument
gpu_lwpr_predict(model, X, nstep*nstep, 0.001, Yp, NULL, NULL);

Here is the declaration of the structure in a header file:

typedef struct LWPR_Model {
   int nIn;             /**< \brief Number of input dimensions */
   int nInStore;        /**< \brief Storage length of input vectors (possibly padded) */
   int nOut;            /**< \brief Number of output dimensions */
   int n_data;          /**< \brief Number of training data the model has seen */
   double *mean_x;      /**< \brief Mean of all training data the model has seen (Nx1) */
   double *var_x;       /**< \brief Variance of all training data the model has seen (Nx1) */
   char *name;          /**< \brief An optional description of the model (Mx1) */
   int diag_only;       /**< \brief Flag that determines whether distance matrices are handled as diagonal-only */
   int meta;            /**< \brief Flag that determines whether 2nd order updates to LWPR_ReceptiveField.M are computed */
   double meta_rate;    /**< \brief Learning rate for 2nd order updates */
   double penalty;      /**< \brief Penalty factor used within distance metric updates */
   double *init_alpha;  /**< \brief Initial learning rate for 2nd order distance metric updates (NxN) */
   double *norm_in;     /**< \brief Input normalisation (Nx1). Adjust this to the expected variation of your data. */
   double *norm_out;    /**< \brief Output normalisation. Adjust this to the expected variation of your output data. */
   double *init_D;      /**< \brief Initial distance metric (NxN). This often requires some tuning */
   double *init_M;      /**< \brief Cholesky factorisation of LWPR_Model.init_D (NxN) */
   double w_gen;        /**< \brief Threshold that determines the minimum activation before a new RF is created. */
   double w_prune;      /**< \brief Threshold that determines above which (second highest) activation a RF is pruned. */
   double init_lambda;  /**< \brief Initial forgetting factor */
   double final_lambda; /**< \brief Final forgetting factor */
   double tau_lambda;   /**< \brief This parameter describes the annealing schedule of the forgetting factor */
   double init_S2;      /**< \brief Initial value for sufficient statistics LWPR_ReceptiveField.SSs2 */
   double add_threshold;/**< \brief Threshold that determines when a new PLS regression axis is added */
   LWPR_Kernel kernel;  /**< \brief Describes which kernel function is used (Gaussian or BiSquare) */
   int update_D;        /**< \brief Flag that determines whether distance metric updates are performed (default: 1) */
   LWPR_SubModel *sub;  /**< \brief Array of SubModels, one for each output dimension. */
   struct LWPR_Workspace *ws;  /**< \brief Array of Workspaces, one for each thread (cf. LWPR_NUM_THREADS) */
   double *storage;     /**< \brief Pointer to allocated memory. Do not touch. */
   double *xn;          /**< \brief Used to hold a normalised input vector (Nx1) */
   double *yn;          /**< \brief Used to hold a normalised output vector (Nx1) */
} LWPR_Model;

My function “gpu_lwpr_predict”, declared as:

void gpu_lwpr_predict(LWPR_Model *model, float **x, unsigned int nx, float cutoff, float **y, float *conf, float *max_w);

is defined in the second .cu file (“gpu_lwpr_aux.cu”).

To compile, I use gcc for the .c file and nvcc for the .cu file; here are the shell commands I use:

gcc -I/home/cavadini/LWPR/lwpr-GPU/src/ -xc++ -c cross_gpu.c

nvcc -I/home/cavadini/LWPR/lwpr-GPU/src/ -c gpu_lwpr_aux2.cu

gcc  -L/home/cavadini/CUDA/cuda/lib/ -I/home/cavadini/LWPR/lwpr-GPU/src/ gpu_lwpr_aux2.o cross_gpu.o -o cross_gpu -lm -llwpr -lcudart

where /home/cavadini/LWPR/lwpr-GPU/src/ is the directory containing my header files, and lwpr is the third-party library I'm using.

OK, now the problem. The compilation ends without errors, but when I try to access some of the fields of the structure within the “gpu_lwpr_predict” function, I get incorrect numbers (fields which obviously return correct results when accessed in main, even after the call to my function).

After some investigation, I've discovered that within my “gpu_lwpr_predict” function the addresses of the fields of the structure differ (by a certain offset) from those seen in main. In particular, I obtain:

In main():

&model->meta = 0x8055140

&model->meta_rate = 0x8055144

while in gpu_lwpr_predict():

&model->meta = 0x8055140

&model->meta_rate = 0x8055148

meaning that a 4-byte offset has somehow been introduced.

Now my question is, naturally: how is this possible, and how can I solve it?

To check whether I had made some mistake, I tried compiling the .cu file with gcc (leaving only the C parts of my function), and the results obtained are correct, which makes me think I'm somehow using nvcc in an erroneous way.

I hope I've been clear enough, and I'm happy to answer any questions. Thanks a lot for any help.

P.S.: I forgot to mention my configuration:

Distributor ID: Ubuntu

Description: Ubuntu 8.04.1

Release: 8.04

Codename: hardy

and I’m using the 2.0beta2 version of CUDA with 177.13 driver.

OK, I think the problem is related to the -malign-double flag that nvcc passes to gcc. The third-party library I'm using has probably not been compiled with that option.
I'll try to solve the problem. Obviously, any suggestion is very much appreciated :)


I've solved the problem by recompiling the library with the -malign-double flag. I'm sorry for having bothered you, and thanks for your time.


Be careful with using doubles. On older hardware they'll give issues (because the GPU demotes them to floats), and on the latest hardware they work correctly but with much lower performance than single precision.

Can you explain a little more about -malign-double?