+0.25 15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern

Name: HRCB Evaluation: 15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern
Item: 15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern
Rating: 0.247
Author: HN HRCB

+0.25	15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern (nicolasdickenmann.com)
	202 points by fp64enjoyer 5 days ago | 77 comments on HN | Mild positive Editorial · vv3.4 · 2026-02-24

Aggregates

Weighted Mean	+0.25	Unweighted Mean	+0.23
Max	+0.30 Article 19	Min	+0.20 Article 23
Signal	3	No Data	28
Confidence	3%	Volatility	0.05 (Low)
Negative	0	Channels	E: 0.6 S: 0.4
SETL	ND

Evidence: High: 0 Medium: 1 Low: 2 No Data: 28

Domain Context Profile

Element	Modifier	Note
Privacy	—	No privacy policy or tracking disclosure observable on-domain.
Terms of Service	—	No terms of service observable on-domain.
Accessibility	—	No accessibility statements observable on-domain.
Mission	—	No mission statement observable on-domain.
Editorial Code	—	No editorial code of conduct observable on-domain.
Ownership	—	Personal blog; no ownership restrictions observed.
Access Model	—	Content appears openly accessible, no paywalls or restrictions observed.
Ad/Tracking	—	No advertising or tracking mechanisms observable in provided content.

HN Discussion 10 top-level comments

wtallis 2026-02-19 03:11 UTC

It's amazing to step back and look at how much of NVIDIA's success has come from unforeseen directions. For their original purpose of making graphics chips, the consumer vs pro divide was all about CAD support and optional OpenGL features that games didn't use. Programmable shaders were added for the sake of graphics rendering needs, but ended up spawning the whole GPGPU concept, which NVIDIA reacted to very well with the creation and promotion of CUDA. GPUs have FP64 capabilities in the first place because back when GPGPU first started happening, it was all about traditional HPC workloads like numerical solutions to PDEs.

Fast forward several years, and the cryptocurrency craze drove up GPU prices for many years without even touching the floating-point capabilities. Now, FP64 is out because of ML, a field that's almost unrecognizable compared to where it was during the first few years of CUDA's existence.

NVIDIA has been very lucky over the course of its history, but has also done a great job of reacting to new workloads and use cases. Those shifts have definitely created some awkward moments where its existing strategies and roadmaps were upended.

jjmarr 2026-02-19 03:20 UTC

FP64 performance is limited on consumer because the US government deems it important to nuclear weapons research.

Past a certain threshold of FP64 throughput, your chip goes in a separate category and is subject to more regulation about who you can sell to and know-your-customer. FP32 does not matter for this threshold.

https://en.wikipedia.org/wiki/Adjusted_Peak_Performance

It is not a market segmentation tactic and has been around since 2006. It's part of the mind-numbing annual export control training I get to take.
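To make the threshold idea in this comment concrete, here is a minimal Python sketch of an APP-style calculation. It assumes the weighted-sum definition as described in the linked Wikipedia article: APP, in Weighted TeraFLOPS (WT), is the sum over processors of a weight times the peak 64-bit-or-wider floating-point rate, with a weight of 0.9 for vector processors and 0.3 for non-vector ones. Those weights are stated from memory and should be checked against the regulation. The function name, the example rates, and the threshold value are hypothetical and chosen only for illustration; they are not real product figures or the current regulatory limit.

```python
# Weights assumed from the APP definition: 0.9 for vector processors,
# 0.3 for non-vector processors. Verify against the regulation text.
VECTOR_WEIGHT = 0.9
NON_VECTOR_WEIGHT = 0.3


def adjusted_peak_performance(fp64_tflops_per_processor, vector=True):
    """Return APP in Weighted TeraFLOPS (WT) for a list of FP64 peak rates.

    Only 64-bit (or wider) floating-point rates enter the sum, which is
    why FP32 throughput does not move the result.
    """
    weight = VECTOR_WEIGHT if vector else NON_VECTOR_WEIGHT
    return sum(weight * rate for rate in fp64_tflops_per_processor)


# Hypothetical numbers, for illustration only.
HYPOTHETICAL_THRESHOLD_WT = 10.0
full_rate_chip = adjusted_peak_performance([20.0])  # 20 FP64 TFLOPS peak
cut_down_chip = adjusted_peak_performance([1.0])    # 1 FP64 TFLOPS peak

full_rate_exceeds = full_rate_chip > HYPOTHETICAL_THRESHOLD_WT
cut_down_exceeds = cut_down_chip > HYPOTHETICAL_THRESHOLD_WT
```

Under these made-up inputs, only the chip with the higher FP64 rate crosses the hypothetical threshold, regardless of how much FP32 throughput either chip has.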

throwaway81523 2026-02-19 04:07 UTC

No mention of the Radeon VII from 2019 where for some unfathomable reason AMD forgot about the segmentation scam and put real FP64 into a gaming GPU. From this 2023 list, it's still faster at FP64 than any other consumer GPU by a wide margin (enterprise GPUs aren't in the list). Scroll all the way to the end.

https://www.eatyourbytes.com/list-of-gpus-by-processing-powe...

gdiamos 2026-02-19 04:09 UTC

I'm not sure why the article dismisses cost.

Let's say X=10% of the GPU area (~75mm^2) is dedicated to FP32 SIMD units. Assume FP64 units are ~2-4x bigger. That would be 150-300mm^2, a huge amount of area that would increase the price per GPU. You may not agree with these assumptions. Feel free to change them. It is an overhead that is replicated per core. Why would gamers want to pay for any features they don't use?

Not to say there isn't market segmentation going on, but FP64 cost is higher for massively parallel processors than it was in the days of high frequency single core CPUs.
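The back-of-the-envelope estimate in this comment can be reproduced with a short Python sketch. The 10% share, the ~75 mm^2 FP32 area, and the 2-4x size ratio come from the comment itself; the implied total die area, the assumption of one FP64 unit per FP32 unit, and the function name are illustrative additions, and all inputs are meant to be changed.

```python
def fp64_area_range_mm2(fp32_area_mm2, size_ratio_low=2.0, size_ratio_high=4.0):
    """Scale the FP32 area by an assumed FP64/FP32 unit size ratio.

    Assumes one FP64 unit per FP32 unit (full-rate FP64).
    """
    return fp32_area_mm2 * size_ratio_low, fp32_area_mm2 * size_ratio_high


fp32_area_mm2 = 75.0   # from the comment: FP32 SIMD units
fp32_share = 0.10      # from the comment: X = 10% of the GPU area

# Implied total die area, roughly 750 mm^2.
die_area_mm2 = fp32_area_mm2 / fp32_share

low_mm2, high_mm2 = fp64_area_range_mm2(fp32_area_mm2)

# FP64 area expressed as a fraction of the implied die area.
low_share = low_mm2 / die_area_mm2
high_share = high_mm2 / die_area_mm2

print(f"FP64 area: {low_mm2:.0f}-{high_mm2:.0f} mm^2 "
      f"({low_share:.0%}-{high_share:.0%} of the implied die)")
```

With the comment's inputs this gives the quoted 150-300 mm^2, which would be on the order of a fifth to two fifths of the implied die area.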

SubiculumCode 2026-02-19 06:27 UTC

To me it is crazy that NVIDIA somehow got away with telling owners of consumer-grade hardware that it cannot be used in datacenters.

botusaurus 2026-02-19 06:35 UTC

this article is so dumb. NVIDIA delivered what the market wanted - gamers don't need FP64, so they don't waste silicon on it. now enterprise doesn't want FP64 anymore and they are reducing silicon for it too

weird way to frame delivering exactly what the consumer wants as a big market segmentation fuck the user conspiracy

juleiie 2026-02-19 08:43 UTC

I hope for their fall. I invest in their success

numbers_guy 2026-02-19 09:09 UTC

A question that has been bugging me for a while is what will NVIDIA do with its HPC business? By HPC I mean clusters intended for non-AI-related workloads. Are they going to cater to them separately, or are they going to tell them to just emulate FP64?

adrian_b 2026-02-19 10:18 UTC

While implementing double-precision by double-single may be a solution in some cases, the article fails to mention the overflow/underflow problem, which is critical in scientific/technical computing (a.k.a. HPC).

With the method from the article, the exponent range remains the same as in single precision, instead of being increased to that of double precision.

There are a lot of applications for which such an exponent range would cause far too frequent overflows and underflows. This could be avoided by introducing a lot of carefully-chosen scaling factors in all formulae, but this tedious work would remove the main advantage of floating-point arithmetic, i.e. the reason why computations are not done in fixed-point.

The general solution to this problem is to emulate double precision with three numbers: two FP32 values for the significand and a third number for the exponent, either an FP number or an integer, depending on which format is more convenient for a given GPU.

This is possible, and although it considerably lowers the achievable ratio between emulated FP64 throughput and hardware FP32 throughput, that ratio is still better than the vendor-enforced 1:64 ratio.

Nevertheless, for now any small business or individual user can achieve much better FP64 performance per dollar by buying Intel Battlemage GPUs, which have a 1:8 FP64/FP32 throughput ratio. This is much better than what you can achieve by emulating FP64 on NVIDIA or AMD GPUs.

The Intel B580 is a small GPU, so its FP64 throughput is only about equal to that of a Ryzen 9 9900X and lower than that of a Ryzen 9 9950X. However, it provides that throughput at a much lower price. Thus, if you start with a PC with a 9900X/9950X, you can double or almost double the FP64 throughput for a low additional price with an Intel GPU. Multiple GPUs will multiply the throughput proportionally.

The sad part is that with the current Intel CEO and with NVIDIA being a shareholder of Intel, it is unclear whether Intel will continue to compete in the GPU market or abandon it, leaving us at the mercy of NVIDIA and AMD, both of which refuse to provide products with good FP64 support to small businesses and individual users.
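The exponent-range concern raised earlier in this comment can be illustrated with a small Python sketch of double-single arithmetic. It simulates FP32 on the CPU by rounding Python floats through `struct`, so it only demonstrates the numerics and is not how a GPU kernel would be written. The helper names (`f32`, `split_double`, `two_sum`, `ds_add`) are hypothetical, and the addition uses the textbook two-sum construction, which is not necessarily the exact scheme from the article.

```python
import math
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # struct rejects finite values beyond the FP32 range;
        # FP32 hardware would produce an infinity here.
        return math.copysign(math.inf, x)


def split_double(x):
    """Split a double into a (hi, lo) pair of FP32 values with hi + lo close to x."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def two_sum(a, b):
    """Error-free FP32 addition: returns (s, e) with s + e == a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def ds_add(x, y):
    """Add two double-single numbers, each given as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo


# Precision: the pair carries far more significand bits than one FP32.
pi_hi, pi_lo = split_double(math.pi)
single_error = abs(pi_hi - math.pi)
pair_error = abs((pi_hi + pi_lo) - math.pi)

# Range: the exponent is still that of FP32, so values that are
# ordinary in FP64 overflow or underflow.
big_hi, big_lo = split_double(1e300)     # hi overflows to infinity
tiny_hi, tiny_lo = split_double(1e-300)  # both parts flush to zero

# A sum computed entirely with FP32 operations.
sum_hi, sum_lo = ds_add(split_double(1 / 3), split_double(0.1))
```

The pair recovers most of the significand that a single FP32 loses, but `1e300` and `1e-300` cannot be represented at all, which is the overflow/underflow problem the comment describes and the reason a separate exponent would be needed.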

Sweepi 2026-02-19 13:25 UTC

Table to compare Blackwell U300 to U200 (-97% FP64 performance): https://www.forum-3dcenter.org/vbulletin/showpost.php?p=1380...

Score Breakdown

Preamble

Content is technical/commercial analysis with no observable connection to UDHR preamble dignity, equality, or inalienable rights themes.

Article 1 Freedom, Equality, Brotherhood

No observable editorial or structural content addressing human dignity or equal rights.

Article 2 Non-Discrimination

No observable content addressing discrimination or distinction on any enumerated basis.

Article 3 Life, Liberty, Security

No observable content addressing life, liberty, or security of person.

Article 4 No Slavery

No observable content addressing slavery or servitude.

Article 5 No Torture

No observable content addressing torture or cruel treatment.

Article 6 Legal Personhood

No observable content addressing right to recognition as person before law.

Article 7 Equality Before Law

No observable content addressing equal protection before law.

Article 8 Right to Remedy

No observable content addressing remedy for rights violations.

Article 9 No Arbitrary Detention

No observable content addressing arbitrary arrest or detention.

Article 10 Fair Hearing

No observable content addressing fair trial or due process.

Article 11 Presumption of Innocence

No observable content addressing criminal liability or presumption of innocence.

Article 12 Privacy

No observable content addressing privacy, family, home, or correspondence.

Article 13 Freedom of Movement

No observable content addressing freedom of movement or residence.

Article 14 Asylum

No observable content addressing asylum or refuge.

Article 15 Nationality

No observable content addressing nationality or citizenship.

Article 16 Marriage & Family

No observable content addressing marriage, family, or property rights.

Article 17 Property

No observable content addressing property rights or deprivation thereof.

Article 18 Freedom of Thought

No observable content addressing freedom of thought, conscience, or religion.

Article 19 Freedom of Expression	+0.30

Evidence: Medium. A: Advocacy for technical information accessibility. F: Framing of hardware constraints as market segmentation strategy.

Editorial	+0.30
Structural
SETL
Combined
Context Modifier

Content presents technical analysis of GPU market segmentation with implicit advocacy for transparency regarding artificial performance limitations on consumer hardware. It discusses how contractual restrictions (EULA) replaced implicit technical segmentation; describing this as a 'divisive move' signals a mild editorial lean toward information freedom and access transparency. However, the content is primarily descriptive technical analysis rather than advocacy for freedom of opinion or expression itself.

Article 20 Assembly & Association

No observable content addressing freedom of assembly or association.

Article 21 Political Participation

No observable content addressing political participation or democracy.

Article 22 Social Security

No observable content addressing social security or welfare rights.

Article 23 Work & Equal Pay	+0.20

Evidence: Low. A: Implicit advocacy for market accessibility and barrier removal.

Editorial	+0.20
Structural
SETL
Combined
Context Modifier

Content discusses how market segmentation artificially restricts access to computational capabilities for researchers, startups, and hobbyists. The implicit framing is that restrictions on datacenter use of consumer GPUs limit economic opportunity and access to work. However, the content does not directly address work rights, fair wages, or conditions of employment; the connection is tangential.

Article 24 Rest & Leisure

No observable content addressing rest, leisure, or reasonable working hours.

Article 25 Standard of Living

No observable content addressing standard of living, health, or social services.

Article 26 Education

No observable content addressing education or its free/equal character.

Article 27 Cultural Participation	+0.20

Evidence: Low. A: Implicit advocacy for scientific research accessibility.

Editorial	+0.20
Structural
SETL
Combined
Context Modifier

Content documents technical progress in GPU computing and emulation schemes that enable research on consumer hardware. The discussion of the AI boom enabling 'researchers, startups, and hobbyists' to conduct meaningful computational work carries an implicit lean toward access to scientific and technological progress. However, this frames technical capability rather than directly advocating for cultural or scientific participation rights.

Article 28 Social & International Order

No observable content addressing social/international order or realization of rights.

Article 29 Duties to Community

No observable content addressing duties to community or limitations on rights.

Article 30 No Destruction of Rights

No observable content addressing interpretation or destruction of rights.

build fc56cf0+0q5s · 2026-02-25 01:32 UTC