Only a fraction of the threads I launch are running. What's the issue?

Any ideas?

cudaMemcpy() takes a size in bytes (not in items, as it does not know the type of the items).
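To illustrate the point about byte counts, here is a minimal sketch. The kernel `mark`, the helper `count_marked`, and the `int` element type are all hypothetical, since the original code is not shown here; the only thing it is meant to demonstrate is passing `n * sizeof(int)` rather than `n` to `cudaMemcpy()`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes 1 into its own slot.
__global__ void mark(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1;
}

// Hypothetical helper: runs the kernel over n items, copies the
// results back, and returns how many slots the host sees as marked.
int count_marked(int n) {
    size_t bytes = n * sizeof(int);   // size in bytes, not in items
    std::vector<int> host(n, 0);
    int *dev = nullptr;

    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1;
    cudaMemset(dev, 0, bytes);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    mark<<<blocks, threads>>>(dev, n);

    // Passing n instead of bytes here would copy only part of the array.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int marked = 0;
    for (int v : host) marked += v;
    return marked;
}

int main() {
    int n = 20;
    printf("marked %d of %d\n", count_marked(n), n);
    return 0;
}
```

If the copy size were `n` instead of `n * sizeof(int)`, only the first `n / 4` elements would reach the host (assuming a 4-byte `int`), which can look as if only a fraction of the threads ran even though all of them did.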

BTW, the code as shown here would never print “N = 20”, because the value of test_items is 0.