OpenCL: weird behavior

I am using the following parameters for my simulation on a GeForce GT 220 card:

number of compute units = 6
local size = 32
global size = 32*6*256 = 49152
(everything is one dimensional)

But in the Visual Profiler, I see that the number of work groups per compute unit is 768. With 49152/32 = 1536 work groups in total, that would mean only 1536/768 = 2 compute units are being used. Why is that? How can I make sure all the compute units are busy? Ideally, I would expect 49152/(32*6) = 256 work groups per compute unit. I am confused by this behavior.
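To make the arithmetic explicit, here is a small sketch in Python that reproduces the numbers above. The function names are made up for illustration. It only restates my own calculation and assumes the work groups are spread evenly over the compute units; it says nothing about how the profiler actually defines its counter.

```python
def work_group_stats(global_size, local_size, compute_units):
    """Return (total work groups, expected work groups per compute unit).

    Assumes a 1-D range and an even spread of groups over all compute units.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    total_groups = global_size // local_size
    expected_per_cu = total_groups / compute_units
    return total_groups, expected_per_cu


def implied_busy_units(total_groups, reported_per_cu):
    """Compute units implied by a reported 'work groups per compute unit' value."""
    return total_groups / reported_per_cu


# Values from the question: GT 220, 6 compute units, local size 32.
total, expected = work_group_stats(49152, 32, 6)
print(total)                            # 1536 work groups in total
print(expected)                         # 256.0 expected per compute unit
print(implied_busy_units(total, 768))   # 2.0 units implied by the profiler value
```

So the profiler value of 768 is three times what I would expect if all 6 compute units shared the work equally.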

Thanks in advance.