No, just the opposite: most of the intermediate quantities used in the quadratic sieve are about the size of the number you want to factor, but most of the runtime in a good QS code should be spent sieving, not performing multiple-precision arithmetic. However, if the number to be factored is hundreds of digits long, all the GPUs in the world won’t help QS complete the factorization. For practical purposes, QS tops out at numbers of around 90 digits; beyond that, it’s better to switch to the number field sieve.
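To illustrate where the time goes, here is a toy single-polynomial sieve in Python. It is a sketch under stated assumptions, not a production QS: the parameters `B` (factor-base bound), `M` (sieve length) and `fudge` (threshold slack), along with all function names, are hypothetical, and modular square roots are found by brute force. The sieve loop touches only small primes and float log approximations. Big-integer work (trial division of Q(x)) happens only on the few positions the sieve flags.

```python
import math


def small_primes(limit):
    """Primes <= limit via the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (limit + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if is_p[i]]


def sqrt_mod_small(n, p):
    """All t in [0, p) with t*t = n (mod p); brute force, fine for tiny p."""
    return [t for t in range(p) if (t * t - n) % p == 0]


def qs_sieve(N, B=200, M=1000, fudge=1.0):
    """Toy QS sieve (hypothetical parameters B, M, fudge).

    Sieves Q(x) = (x + m)^2 - N for 0 <= x < M with m = ceil(sqrt(N)) and
    returns (x, Q(x)) pairs where Q(x) factors completely over the base.
    """
    m = math.isqrt(N)
    if m * m < N:
        m += 1
    # Factor base: primes p <= B for which N is a square mod p.
    base = [p for p in small_primes(B) if sqrt_mod_small(N, p)]

    # Sieve phase: only small primes and float logs are touched. Each prime
    # makes strided writes that jump across the array; in a real QS the array
    # is large, so these scattered updates are latency-bound.
    logs = [0.0] * M
    for p in base:
        lp = math.log(p)
        for r in sqrt_mod_small(N, p):
            start = (r - m) % p  # first x >= 0 with x + m = r (mod p)
            for x in range(start, M, p):
                logs[x] += lp

    # Check phase: big-integer trial division, but only on sieve survivors.
    slack = fudge * math.log(B)
    relations = []
    for x in range(M):
        q = (x + m) ** 2 - N
        if q == 0 or logs[x] < math.log(q) - slack:
            continue
        rest = q
        for p in base:
            while rest % p == 0:
                rest //= p
        if rest == 1:
            relations.append((x, q))
    return relations


if __name__ == "__main__":
    N = 87463  # = 149 * 587
    for x, q in qs_sieve(N)[:5]:
        print(x, q)
```

A real implementation would sieve over many polynomials and then combine the relations with linear algebra mod 2; this sketch stops at collecting smooth values, which is the step that dominates the runtime.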

Bear in mind that sieve methods rely on low-latency random access to large arrays of memory, whereas GPUs provide huge memory bandwidth but not low latency.

ECM on numbers hundreds of digits long is easy to implement and trivial to parallelize; see Dan Bernstein’s work on ECM using CUDA. If you’re new to both factoring and CUDA, I’d suggest starting there. You still need multiple-precision arithmetic, and essentially all the runtime will go into it, but running different elliptic curves in parallel is easy. The downside is that you will probably never find factors larger than about 60 digits, which means you’ll never be able to break a decent-size RSA key (the two prime factors of a 2048-bit RSA modulus are each roughly 309 digits long).
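To make the "different curves in parallel" point concrete, here is a minimal stage-1-only sketch of Lenstra’s ECM in Python. It uses textbook affine Weierstrass arithmetic, not the optimized curve arithmetic a real GPU implementation would use, and the names `ecm`, `ecm_one_curve` and the parameters `B1` and `curves` are hypothetical. Python’s built-in integers stand in for the multiple-precision arithmetic. A factor shows up when a slope denominator fails to be invertible mod n.

```python
import math
import random


class FactorFound(Exception):
    """Raised when a denominator is not invertible mod n; carries the gcd."""

    def __init__(self, d):
        super().__init__(d)
        self.d = d


def inv_mod(a, n):
    g = math.gcd(a % n, n)
    if g != 1:
        raise FactorFound(g)
    return pow(a, -1, n)  # modular inverse, Python 3.8+


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b mod n; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def ec_mul(k, P, a, n):
    """Double-and-add scalar multiplication k*P."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R


def small_primes(limit):
    is_p = bytearray([1]) * (limit + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if is_p[i]]


def ecm_one_curve(n, B1, rng):
    """Stage 1 on one random curve; returns a nontrivial factor of n or None."""
    x, y, a = rng.randrange(n), rng.randrange(n), rng.randrange(n)
    b = (y * y - x ** 3 - a * x) % n  # b chosen so (x, y) lies on the curve
    g = math.gcd((4 * a ** 3 + 27 * b * b) % n, n)
    if g == n:
        return None  # singular curve mod n
    if g > 1:
        return g
    P = (x, y)
    try:
        for p in small_primes(B1):
            pk = p
            while pk * p <= B1:  # largest power of p not exceeding B1
                pk *= p
            P = ec_mul(pk, P, a, n)
            if P is None:
                return None  # hit infinity mod every prime at once
    except FactorFound as e:
        if 1 < e.d < n:
            return e.d
    return None


def ecm(n, B1=2000, curves=200, seed=1):
    """Try independent curves; each call could run on its own GPU thread."""
    rng = random.Random(seed)
    for _ in range(curves):
        d = ecm_one_curve(n, B1, rng)
        if d:
            return d
    return None


if __name__ == "__main__":
    print(ecm(1000003 * 1000033))
```

Nothing is shared between iterations of the loop in `ecm` except the random number generator, which is why this maps so naturally onto a GPU. The catch is that a curve succeeds only when the group order modulo the hidden prime happens to be `B1`-smooth, and that becomes rapidly less likely as the prime grows.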