tests on r/w device and host-device cudaMemcpy memory bandwidth issues

yk_cadcg · March 12, 2007, 9:56am

Hi,
1, My kernel reads elements from one device array, d_Src, and writes them to another, d_Dst. When the array elements are int4, it’s blazing fast; when I change int4 to Pos, where Pos is my own struct:
struct Pos
{
int x, y, z, w;
};
it’s 4~5 times slower. Why is that?
Btw, the int4 bandwidth is roughly 70 GB/s. What’s the peak bandwidth of the G80, please?

2, I use cudaMemcpy() to test host-device bandwidth. Roughly, copy-in is 0.6 GB/s and copy-out is 0.8 GB/s. What’s the peak bandwidth for copy-in/out, please?
thanks!

prkipfer · March 12, 2007, 12:16pm

see section 6.1.2.1 of the programming guide.

Peter

yk_cadcg · March 12, 2007, 2:48pm

Thank you very much.

1, if I write

struct align(16) {int x, y, z, w};

where do I put the struct name “Pos”?

“struct Pos align(16) {int x, y, z, w};” and

“struct align(16) Pos {int x, y, z, w};”

both give errors, as do the other orderings I tried.

2, the bandwidth of global memory reads/writes, and that of host-device cudaMemcpy, still don’t seem to be mentioned in 6.1.2.1?

Thanks!

prkipfer · March 12, 2007, 2:54pm

struct align Name {};
section 5.1

It doesn’t hurt if you read the entire manual btw.

Peter

yk_cadcg · March 12, 2007, 3:17pm

Thanks!

Has anybody gotten the code below to compile in a .cu file? Thanks!

struct align Pos
{
  int x;
  int y;
  int z;
  int w;
};

I still don’t know the bandwidths, in GB/s.

Mark_Harris · March 12, 2007, 3:19pm

If you change the align to align(16) I think your Pos struct will be identical to the int4 struct defined in vector_types.h

Mark

yk_cadcg · March 12, 2007, 3:24pm

Thanks! I’m using SDK 0.8.

This compiles:

struct Pos
{
  int4 x;
};

This also compiles (with the alignment specifier commented out):

struct /*align(16)*/ Pos
{
  int x;
  int y;
  int z;
  int w;
};

But this fails to compile, with the errors shown after it:

struct align(16) Pos
{
  int x;
  int y;
  int z;
  int w;
};

error: expected an identifier
      struct align(16) Pos

error: expected a ";"
      struct align(16) Pos

prkipfer · March 12, 2007, 3:44pm

OK, maybe I wasn’t verbose enough. The following compiles and works nicely for me:

struct __align__(16) Pos
{
  int x;
  int y;
  int z;
  int w;
};

As I said, section 5.1. I cite

Peter

yk_cadcg · March 13, 2007, 8:44am

Thank you very much! I was using the old Jan 15 programming guide, sorry for the trouble :) My account doesn’t currently have the rights to install the Feb 6 version. Please forgive me for following the Jan 15 guide’s sample code, which uses align(16) and might be a typo.

Thanks again:)
