I noticed that if I compile with the -arch compute_13 flag, the resulting .cu_o file is more than double the size of the one generated with the -arch sm_13 flag. However, there is no noticeable difference in performance between the two (running on a GTX 280). What is the main difference between the two flags?
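According to the documentation, compute_13 is a virtual architecture. With it, nvcc stores more abstract GPU code (PTX), which the driver then compiles into the real device cubin at runtime. sm_13 is a real architecture, so nvcc builds the cubin at compile time. The runtime compilation happens once, when the module is loaded, so the kernels themselves should run at essentially the same speed on the GTX 280.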
Thanks for the clarification.