If you omit CUT_DEVICE_INIT, the initialization will occur on the first call to a CUDA runtime function.
“Warm up” usually refers to initializing the device, allocating memory, and so on before benchmark timing begins. These actions are only performed once, at the beginning of the program’s execution, so they won’t contribute significantly to the run time of a long-running application.
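To make the warm-up idea concrete, here is a rough sketch of how it could look with the plain runtime API. The kernel `scale`, the buffer size, and the use of `cudaFree(0)` to trigger initialization are my own choices for illustration, not something taken from the SDK, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies every element by a constant.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d_data = NULL;

    // Warm-up phase (not timed): the first runtime call initializes the
    // device, then we allocate memory and run the kernel once.
    cudaFree(0);
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    // Timed phase: only the kernel launch sits between the two events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If you move the `cudaEventRecord(start, 0)` call above the warm-up phase, the one-time setup cost gets included in the measurement, which is exactly what warming up is meant to avoid.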
It’s probably also worth noting to a fellow newcomer this extract from NVIDIA CUDA SDK\common\cutil_readme.txt, if you weren’t already aware of it:
From looking at the code, all CUT_DEVICE_INIT() actually does is check that you have a valid CUDA device and let you specify an alternate device with the command-line argument --device.
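For anyone who wants to do the same thing without cutil, something along these lines should be roughly equivalent. This is only a sketch of the behaviour described above, not the actual macro body: `parseDeviceArg` and `initDevice` are hypothetical helper names, and I'm assuming the argument is passed in the form `--device=N`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not part of cutil): returns N from "--device=N",
// or 0 if the argument is not present.
static int parseDeviceArg(int argc, char **argv)
{
    const char *prefix = "--device=";
    const size_t len = strlen(prefix);
    for (int i = 1; i < argc; ++i) {
        if (strncmp(argv[i], prefix, len) == 0) {
            return atoi(argv[i] + len);
        }
    }
    return 0;
}

// Hypothetical stand-in for the behaviour described above: check that a
// CUDA device exists, then select the one requested on the command line.
static void initDevice(int argc, char **argv)
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess || deviceCount == 0) {
        fprintf(stderr, "No CUDA-capable device found.\n");
        exit(EXIT_FAILURE);
    }

    int dev = parseDeviceArg(argc, argv);
    if (dev < 0 || dev >= deviceCount) {
        fprintf(stderr, "Device %d is out of range (found %d device(s)).\n",
                dev, deviceCount);
        exit(EXIT_FAILURE);
    }

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    printf("Using device %d: %s\n", dev, prop.name);
}

int main(int argc, char **argv)
{
    initDevice(argc, argv);
    return 0;
}
```

Running it as `./app --device=1` would select the second device; with no argument it falls back to device 0.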