CUDA 4.0 and CUBLAS Update? Will CUBLAS functions be updated for multiple GPUs?

Hello

Since CUDA 4.0 will improve communication between GPUs, will CUBLAS be rewritten to take advantage of multi-GPU computing? For example, multiplication of very large matrices could benefit greatly from dividing the input across 2 GPUs, running the multiplication on each part, and then merging the partial results back together.
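
To make the idea concrete, here is a rough sketch of the decomposition I have in mind. It is plain Python running on the CPU, with each row block standing in for the work one GPU would do; it is not how CUBLAS works or would work. The names `matmul`, `split_matmul`, and `num_devices` are made up for illustration. In a real version each block would presumably be handed to a GEMM call on its own device.

```python
def matmul(a, b):
    """Plain product of two row-major matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_matmul(a, b, num_devices=2):
    """Row-block decomposition of A * B (hypothetical helper).

    Each "device" receives a contiguous slice of A's rows plus all of B,
    computes its slice of the result, and the slices are concatenated.
    """
    rows = len(a)
    chunk = (rows + num_devices - 1) // num_devices  # ceiling division
    partial = []
    for d in range(num_devices):
        block = a[d * chunk:(d + 1) * chunk]
        if block:  # skip devices that received no rows
            partial.append(matmul(block, b))  # stand-in for a per-GPU GEMM
    # Merge step: stack the partial results in device order.
    return [row for part in partial for row in part]
```

Splitting A by rows means the partial results do not overlap, so the merge is just a concatenation. The cost is that every device needs a full copy of B, which is where faster GPU-to-GPU communication would help.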

Also, since CUDA 4 will work with MPI, will NVIDIA release any libraries for matrix operations on a CUDA cluster?