NDA expiration - new GF100 information

NDAs expired at 9PM.
Lots of sites have similar summaries:

NVIDIA's DirectX 11 Architecture: GF100 (Fermi) In Detail - AlienBabelTech
http://www.bjorn3d.com/read.php?cID=1778
http://www.hardocp.com/article/2010/01/17/…_facts_opinions
http://www.pcper.com/article.php?aid=858
Guru3D.com
NVIDIA's GF100: Architected for Gaming - http://www.anandtech.com/video/showdoc.aspx?i=3721

Most of the information is about the previously undisclosed graphics architecture of Fermi: tessellation, pixel sampling, and so on.
There are some carefully chosen but impressive benchmarks against a Radeon 5870.

My favorite Fermi preview:

http://www.hardwarecanucks.com/forum/hardw…microscope.html

Ya just gotta look at the funny ATI 5000 video posted at XS.

http://www.xtremesystems.org/forums/showth…120&page=50

(Post 1239)

Man, it’s not NF100. GF100.

Er, well, I guess we’ll have to call it NF100 now since the BBS software doesn’t let even the thread creator rename the title.

Whups!

Anand probably has the best summary:

With rumours of a third respin underway, it could be a while yet before any of those things are known with certainty.

Something has gone wrong with the server @ hardocp.com. It looks like it (at random points) is trying to exploit a buffer overflow in the browser.

The whitepaper they reproduce looks nice though. Can we have the original PDF from Nvidia instead, please? … Pretty please?

Example of odd link:

[codebox]http://www.hardocp.com/image.html?image=MTI2MzYwODIxNHh4VHN0ekRuc2RfMl8yM19sLmdpZg==&ref=aHR0cDovL3d3dy5oYXJkb2NwLmNvbS9pbWFnZS5odG1sP2ltYWdl[… several kilobytes of base64 omitted …]bEJUTUQwPQ==[/codebox]

http://www.nvidia.com/object/IO_86775.html

I didn’t know it would support out-of-order execution. From my somewhat limited hardware knowledge, I understand this is used to exploit instruction-level parallelism?

How does this affect the way you write your CUDA code? Or will it simply give a performance boost?

The description of the out-of-order execution in the Anandtech article wasn’t very clear. They only mentioned it in the context of these “Polymorph Engines”, and not the CUDA cores themselves.

One general sense I’m getting from these articles is that Fermi is continuing the process of putting graphics and computing on equal footing in the hardware. We know there were features in the GT200/G92/G80 chips to support graphics processing that CUDA could not directly access. We could see the texture cache, but the articles (for example) mention FIFOs that were used to pipeline data between render operations. Now that is going to be done with the L2 cache, which means that CUDA kernels will also be able to efficiently operate with a pipeline architecture, if that makes sense for the problem.

(I wonder if this means we will see atomic operations that will make it easier to operate a ring buffer in global memory…)
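For what it’s worth, something like that ring buffer can already be approximated with the atomics we have today. A speculative device-side sketch (the struct, field names, and power-of-two layout are all invented for illustration, not anything NVIDIA has announced):

```cuda
// Speculative sketch: a global-memory ring buffer built on atomicAdd.
// Capacity must be a power of two so wrap-around is a cheap mask.
struct RingBuffer {
    int *data;
    unsigned int capacity;  // power of two
    unsigned int head;      // next write slot, advanced atomically
};

__device__ void push(RingBuffer *rb, int value) {
    // atomicAdd returns the old value, so each thread reserves a
    // unique slot; the mask wraps it into [0, capacity).
    unsigned int slot = atomicAdd(&rb->head, 1u) & (rb->capacity - 1);
    rb->data[slot] = value;
}
```

The missing piece, of course, is a cheap way for a consumer to know a slot has actually been written, which is exactly where better atomics or fences would help.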

Well, for one it means that you’ll be able to execute multiple kernels simultaneously. Since the SMs can be running different kernels, if you have several kernels lined up to execute, then when the last bits of kernel #1 are finishing (and perhaps not using the entire card), kernel #2 can just start executing on the idle SMs instead of having to wait for kernel #1 to completely finish.
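That behaviour would presumably be driven through streams, which already exist in the runtime API. A hedged sketch (kernel names, sizes, and the claim that these two launches overlap on Fermi are illustrative assumptions):

```cuda
#include <cuda_runtime.h>

// Two independent kernels issued into different streams. On pre-Fermi
// hardware they serialize; on Fermi the second may start on idle SMs.
__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int blocks = (n + 255) / 256;
    kernelA<<<blocks, 256, 0, s1>>>(d_x, n);
    kernelB<<<blocks, 256, 0, s2>>>(d_y, n);  // no dependency on kernelA

    cudaThreadSynchronize();  // wait for both streams
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```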

Also, some of the other hardware improvements will change CUDA coding a bit; for example, texture fetching is much faster and per-SM, which means that you’ll probably be using textures a lot more often on Fermi (since they can give you a big processing boost if your application can use them). Having more cache/shared memory on the chip will also let you port some things to CUDA that you previously couldn’t (or that were ported and didn’t perform very well because there wasn’t enough shared memory for the algorithm to use).

Yeah I think the “out of order” references are nothing to do with the shader core execution model, but rather with the geometry pipeline (remembering that “hardware” tessellation support is now mandatory in DX11).

@avidday

That makes a bit more sense. I’ve heard that OoO is complicated and on CPUs you rarely get the theoretical boosts that it promises. Making the cores much more complex doesn’t really sing in tune with GPU architectures…

I’m working at a company that produces medical appliances, and quality assurance is very important.

The speedup from using CUDA is very convincing, but there are two very important questions.

  1. Will Fermi have full hardware debug support, like on the CPU?

  2. Will there be memory protection or a kind of MMU to detect segmentation faults and to avoid crashing the whole PC with a buggy kernel? Crashing the application is inevitable, but crashing the whole PC must be impossible.

In most cases the emulation mode is not an option, since execution differs from the GPU and many errors can’t be found in emulation mode. I have often had the case that a kernel runs fine in emulation mode but crashes the whole PC when running on the device. It’s very annoying to “debug” that by searching every possible line of code that might cause the behavior; often these are indexing errors when accessing data, which are hard to find. Better error messages would also be a great help.

I didn’t find exact information on these topics, though the Fermi whitepaper mentions “better debugging support”.

What exactly is meant by “better debugging support”, and will there be memory protection to avoid the problems I mentioned above?

Hi,

It’s not that related to Fermi, but NVIDIA has already put out a cuda-gdb debugger for Linux (much like the regular gdb debugger), and its Windows version (Nexus) is in beta - a very impressive tool. You also have a few emulators like Ocelot, and something from TotalView, I think.

The best thing is to make sure the kernel is not buggy :) Second, on Linux it’s usually less of a problem. On Windows I have a GTX280 and a C1060. I usually debug on the C1060, and even if it crashes or doesn’t function properly, no harm is done.

eyal

Your solution is good for development, but it requires two GPUs, and if you plan to distribute your software you can’t demand two GPUs from the client. In my PC it is not even possible to fit two PCI-E graphics cards, since it has only one x16 PCI-E slot and the power supply is too weak. If you intend to have 3D rendering and CUDA in your application at the same time, you would need two powerful graphics cards.

The second point, you cannot be sure that your program is 100 % correct. So there is always a chance that your application (kernel) can crash and when you are running the graphics driver on your CUDA card your PC can crash completely. That can happen during development (good) or when it’s already running on the client’s PC (very bad).

Another problem is “silent segmentation faults”: your program accesses data that was not allocated by cudaMalloc, without any error message. You assume your program runs fine, but it doesn’t; you receive a wrong result and no error message. On the CPU you would very probably get a segmentation fault and would know that your program has an error.

For safety-critical applications, memory protection is a “must have” feature and not an optional one. x86 CPUs support this, since they have an MMU. A full MMU is probably too much overhead for a GPU, but there must be some kind of memory protection and a mechanism to detect segmentation faults.

The introduction of GPUs into industry products depends on these key features, which allow you to assure the quality of software products. Like I said before, a PC today must not crash completely due to a user application. That era ended with the introduction of Windows 2000.

CapJo,

You have raised valid points… especially the silent failures that are not caught; that’s quite dangerous. One way would be for the driver to fill the memory regions (those that contribute to silent faults) with some debug values, then examine them after your kernel completes and report errors. This could be a debug feature that can be selectively enabled…
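The fill-and-examine idea can be sketched on the host side. This is a hypothetical illustration of the canary technique, not anything the driver actually does; the names, pad size, and byte pattern are all invented:

```cpp
#include <cstring>
#include <cstdint>
#include <cstddef>

// Pad every allocation with a known byte pattern ("canaries") and
// inspect the pads after the kernel runs; a stray write into a pad
// reveals an out-of-bounds access that would otherwise be silent.
static const uint8_t kCanary = 0xA5;
static const size_t kPad = 16;

uint8_t *guarded_alloc(size_t n) {
    uint8_t *raw = new uint8_t[n + 2 * kPad];
    std::memset(raw, kCanary, kPad);             // front guard
    std::memset(raw + kPad + n, kCanary, kPad);  // rear guard
    return raw + kPad;                           // user-visible pointer
}

bool guards_intact(const uint8_t *p, size_t n) {
    const uint8_t *raw = p - kPad;
    for (size_t i = 0; i < kPad; ++i)
        if (raw[i] != kCanary || raw[kPad + n + i] != kCanary)
            return false;                        // a write strayed into a pad
    return true;
}

void guarded_free(uint8_t *p) { delete[] (p - kPad); }
```

The obvious limitation is that it only catches writes that land near the allocation’s edges, which is exactly why it would be a debug mode rather than real memory protection.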

On a lighter vein, I can’t help making this comment… please take it easy…

Yeah… right… systems crash due to the operating system itself… lol… Actually, thinking of it, even Windows 95 was a 32-bit OS utilizing 386 features, wasn’t it? So memory protection has been there since the WinNT/Win95 era itself… But probably the operating-system crashes stopped by 2000… And yeah, XP is reasonably stable, and I hear Windows 7 is pretty good too…

Windows NT was the first Microsoft operating system that had proper memory protection, but Windows 2000 and especially Windows XP were broadly available and a real success. Windows 95 / 98 / ME don’t have this feature.

Try to crash your PC with a C or C++ user-mode application on the CPU (without using any special operating-system commands to change clock speeds, filling the host memory, etc.). You won’t be able to do it.

With CUDA you can crash your PC very easily just by implementing a matrix multiplication and getting some indices wrong.
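To make the failure mode concrete, here is a sketch (not taken from any real product) of a matrix-multiply kernel with exactly that kind of bug: the bounds check is missing, so when the grid is rounded up to a whole number of blocks, the edge threads read and write past the arrays:

```cuda
// Buggy n x n matrix multiply: C = A * B.
// The grid is usually launched with ceil(n/16) blocks per dimension,
// so some threads have row >= n or col >= n. Without the commented-out
// guard below, those threads access memory outside A, B, and C.
__global__ void matmul_buggy(const float *A, const float *B,
                             float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Missing: if (row >= n || col >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;  // out-of-bounds write for edge threads
}
```

On current hardware nothing stops those stray writes from landing in memory the display driver cares about, which is how a one-line indexing mistake can take the whole machine down.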

Most crashes on Windows are due to driver issues; drivers run in supervisor mode without memory protection, as in CUDA.

Back to topic.

Is there more information from NVIDIA on this topic?

I have had silent segmentation faults on a CPU many times. It is not hard to exceed array bounds in a C++ program and overwrite some other variable which is also yours, so the OS does not complain. On a GPU most of the memory is “yours”, as there are no other programs besides your kernel running in parallel.

Most of my (silent or not) segmentation faults on GPU are because of array out-of-bounds actions. To immediately track the problem during debugging I simply check the index on every access.

[codebox]template <typename T>
class Array {
private:
	unsigned int size;
	T *data;
public:
	[...] // some boring constructors

	__host__ __device__ inline T &operator[](int addr) {
		if (addr < 0 || addr >= (int)size) {
			debugLong(60, "<" << blockIdx.x << ":" << threadIdx.x << "> Out of bounds! [" << size << "]->" << addr);
			return data[0];
		}
		return data[addr];
	}

	__host__ __device__ inline const T &operator[](int addr) const {
		if (addr < 0 || addr >= (int)size) {
			debugLong(60, "<" << blockIdx.x << ":" << threadIdx.x << "> Out of bounds! [" << size << "]->" << addr);
			return data[0];
		}
		return data[addr];
	}
};[/codebox]

(debugLong is a macro which prints a message on the screen from within the kernel, something similar to cuPrintf)

That’s not entirely correct: you can use an onboard card (I think it must support CUDA), which will be responsible for the screen, while the “other” GPU runs the kernels.

The second point, you cannot be sure that your program is 100 % correct. So there is always a chance that your application (kernel) can crash and when you are running the graphics driver on your CUDA card your PC can crash completely. That can happen during development (good) or when it's already running on the client's PC (very bad).

Right, much like any other software… I have caused many deadlocks in the system with “simple non-GPU” multi-threaded code.

You have to have a good QA department, and you should run your code through tools such as valgrind and Ocelot, plus a simple test like the one Cygnus detailed (very nice, btw :) ).

Right again, for every piece of software in the world. Blue screens are something NVIDIA didn’t invent; that was someone else.

Bottom line: the tools, debuggers, profilers and validators for GPUs will improve over time, but that doesn’t mean you can’t provide a stable environment with the current hardware and tools.

my 1 cent, anyway :)

eyal