streams: feature is not yet implemented in 64 bit Ubuntu 9.04

I recently switched from 32-bit Ubuntu 9.04 to 64-bit Ubuntu 9.04.

I am using the 180.44 NVIDIA driver that comes with the unrestricted drivers released by Ubuntu.

I have installed CUDA 2.2, since I could not get 2.3 to run with the 180.44 driver, and newer drivers would not work correctly (X would always either drop into low-graphics mode or hang).

Everything in the SDK compiles successfully.

The output of deviceQuery is:

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "GeForce 8700M GT"

  CUDA Capability Major revision number:		 1

  CUDA Capability Minor revision number:		 1

  Total amount of global memory:				 267714560 bytes

  Number of multiprocessors:					 4

  Number of cores:							   32

  Total amount of constant memory:			   65536 bytes

  Total amount of shared memory per block:	   16384 bytes

  Total number of registers available per block: 8192

  Warp size:									 32

  Maximum number of threads per block:		   512

  Maximum sizes of each dimension of a block:	512 x 512 x 64

  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1

  Maximum memory pitch:						  262144 bytes

  Texture alignment:							 256 bytes

  Clock rate:									1.25 GHz

  Concurrent copy and execution:				 Yes

  Run time limit on kernels:					 Yes

  Integrated:									Yes

  Support host page-locked memory mapping:	   Yes

  Compute mode:								  Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce 8700M GT"

  CUDA Capability Major revision number:		 1

  CUDA Capability Minor revision number:		 1

  Total amount of global memory:				 268173312 bytes

  Number of multiprocessors:					 4

  Number of cores:							   32

  Total amount of constant memory:			   65536 bytes

  Total amount of shared memory per block:	   16384 bytes

  Total number of registers available per block: 8192

  Warp size:									 32

  Maximum number of threads per block:		   512

  Maximum sizes of each dimension of a block:	512 x 512 x 64

  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1

  Maximum memory pitch:						  262144 bytes

  Texture alignment:							 256 bytes

  Clock rate:									1.25 GHz

  Concurrent copy and execution:				 Yes

  Run time limit on kernels:					 No

  Integrated:									Yes

  Support host page-locked memory mapping:	   Yes

  Compute mode:								  Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit

If I run simpleStreams, I get:

running on: GeForce 8700M GT

cudaSafeCall() Runtime API error in file <simpleStreams.cu>, line 132 : feature is not yet implement

I never got this error running CUDA 2.2 on the 32-bit version. Any suggestions?