It's amazing to step back and look at how much of NVIDIA's success has come from unforeseen directions. For their original purpose of making graphics chips, the consumer vs pro divide was all about CAD support and optional OpenGL features that games didn't use. Programmable shaders were added for the sake of graphics rendering needs, but ended up spawning the whole GPGPU concept, which NVIDIA reacted to very well with the creation and promotion of CUDA. GPUs have FP64 capabilities in the first place because back when GPGPU first started happening, it was all about traditional HPC workloads like numerical solutions to PDEs.
Fast forward several years, and the cryptocurrency craze drove up GPU prices for many years without even touching the floating-point capabilities. Now, FP64 is out because of ML, a field that's almost unrecognizable compared to where it was during the first few years of CUDA's existence.
NVIDIA has been very lucky over the course of its history, but has also done a great job of reacting to new workloads and use cases. Those shifts have definitely created some awkward moments, though, where its existing strategies and roadmaps have been upended.
FP64 performance is limited on consumer GPUs because the US government deems it important to nuclear weapons research.
Past a certain threshold of FP64 throughput, your chip goes in a separate category and is subject to stricter regulation on who you can sell to, including know-your-customer requirements. FP32 does not matter for this threshold.
No mention of the Radeon VII from 2019 where for some unfathomable reason AMD forgot about the segmentation scam and put real FP64 into a gaming GPU. From this 2023 list, it's still faster at FP64 than any other consumer GPU by a wide margin (enterprise GPUs aren't in the list). Scroll all the way to the end.
Let's say X=10% of the GPU area (~75mm^2) is dedicated to FP32 SIMD units. Assume FP64 units are ~2-4x bigger. That would be 150-300mm^2, a huge amount of area that would increase the price per GPU. You may not agree with these assumptions. Feel free to change them. It is an overhead that is replicated per core. Why would gamers want to pay for any features they don't use?
Not to say there isn't market segmentation going on, but FP64 cost is higher for massively parallel processors than it was in the days of high frequency single core CPUs.
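To make it easier to plug in different assumptions, the area estimate above can be written as a few lines of Python. This is only a sketch of the commenter's back-of-envelope arithmetic: the 750 mm^2 die size is inferred from "10% is ~75mm^2", and the function name and parameters are made up for illustration.

```python
def fp64_area_estimate(die_area_mm2, fp32_fraction, fp64_size_factors):
    """Return (fp32_area, fp64_areas) in mm^2.

    fp64_areas holds one entry per assumed FP64-to-FP32 unit size factor.
    """
    fp32_area = die_area_mm2 * fp32_fraction
    fp64_areas = [fp32_area * k for k in fp64_size_factors]
    return fp32_area, fp64_areas


# Assumptions from the comment: 10% of the die is FP32 SIMD, FP64 units are 2-4x bigger.
# Assumption added here: a 750 mm^2 die, which is what "10% ~ 75 mm^2" implies.
fp32_area, fp64_areas = fp64_area_estimate(750.0, 0.10, [2, 4])
print(f"FP32 area: {fp32_area:.0f} mm^2")
print(f"FP64 area range: {fp64_areas[0]:.0f}-{fp64_areas[1]:.0f} mm^2")
```

Changing `fp32_fraction` or the size factors shows how sensitive the conclusion is to those two guesses.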
this article is so dumb. NVIDIA delivered what the market wanted - gamers don't need FP64, so they don't waste silicon on it. now enterprise doesn't want FP64 anymore and they are reducing silicon for it too
weird way to frame delivering exactly what the consumer wants as a big market-segmentation, fuck-the-user conspiracy
A question that has been bugging me for a while is what will NVIDIA do with its HPC business? By HPC I mean clusters intended for non-AI related workloads. Are they going to cater to them separately, or are they going to tell them to just emulate FP64?
While implementing double-precision by double-single may be a solution in some cases, the article fails to mention the overflow/underflow problem, which is critical in scientific/technical computing (a.k.a. HPC).
With the method from the article, the exponent range remains the same as in single precision, instead of being increased to that of double precision.
There are a lot of applications for which such an exponent range would cause far too frequent overflows and underflows. This could be avoided by introducing a lot of carefully-chosen scaling factors in all formulae, but this tedious work would remove the main advantage of floating-point arithmetic, i.e. the reason why computations are not done in fixed-point.
The general solution to this problem is to emulate double precision with 3 numbers: 2 FP32 values for the significand and a third number for the exponent, either an FP number or an integer, depending on which format is more convenient for a given GPU.
This is possible, but it considerably lowers the achievable ratio between emulated FP64 throughput and hardware FP32 throughput, although the ratio is still better than the vendor-enforced 1:64 ratio.
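A rough sketch of the three-number representation, under the assumption that the exponent is kept as a separate integer: the significand is normalized into [0.5, 1) with `math.frexp`, split into two FP32 values, and the exponent travels alongside. The names `to_triple` and `from_triple` are hypothetical, and only conversion is shown; real arithmetic on such triples needs extra renormalization steps, which is where the throughput cost comes from.

```python
import math
import struct


def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_triple(x):
    """Represent x as (hi, lo, e) with x ~= (hi + lo) * 2**e.

    hi and lo are FP32 values; e is a plain integer exponent.
    """
    m, e = math.frexp(x)      # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    hi = f32(m)
    lo = f32(m - hi)
    return hi, lo, e


def from_triple(hi, lo, e):
    """Reconstruct an FP64 value from the triple (for checking only)."""
    return math.ldexp(hi + lo, e)


for x in (1e200, 1e-200, 1.0 / 3.0):
    hi, lo, e = to_triple(x)
    back = from_triple(hi, lo, e)
    print(x, (hi, lo, e), abs(back - x) / x)
```

Because the two FP32 words only ever hold a normalized significand, values such as 1e200 that break the plain two-float scheme round-trip without overflow.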
Nevertheless, for now any small business or individual user can achieve much better FP64 performance per dollar by buying Intel Battlemage GPUs, which have a 1:8 FP64/FP32 throughput ratio. This is much better than you can achieve by emulating FP64 on NVIDIA or AMD GPUs.
The Intel B580 is a small GPU, so its FP64 throughput is only about equal to that of a Ryzen 9 9900X and lower than that of a Ryzen 9 9950X. However, it provides that throughput at a much lower price. Thus, if you start with a PC with a 9900X/9950X, you can double or almost double the FP64 throughput for a low additional price with an Intel GPU. Multiple GPUs will multiply the throughput proportionally.
The sad part is that with the current Intel CEO, and with NVIDIA being a shareholder of Intel, it is unclear whether Intel will continue to compete in the GPU market or abandon it, leaving us at the mercy of NVIDIA and AMD, both of which refuse to provide products with good FP64 support to small businesses and individual users.
Content is technical/commercial analysis with no observable connection to UDHR preamble dignity, equality, or inalienable rights themes.
Article 1 (Freedom, Equality, Brotherhood): ND. No observable editorial or structural content addressing human dignity or equal rights.
Article 2 (Non-Discrimination): ND. No observable content addressing discrimination or distinction on any enumerated basis.
Article 3 (Life, Liberty, Security): ND. No observable content addressing life, liberty, or security of person.
Article 4 (No Slavery): ND. No observable content addressing slavery or servitude.
Article 5 (No Torture): ND. No observable content addressing torture or cruel treatment.
Article 6 (Legal Personhood): ND. No observable content addressing right to recognition as person before law.
Article 7 (Equality Before Law): ND. No observable content addressing equal protection before law.
Article 8 (Right to Remedy): ND. No observable content addressing remedy for rights violations.
Article 9 (No Arbitrary Detention): ND. No observable content addressing arbitrary arrest or detention.
Article 10 (Fair Hearing): ND. No observable content addressing fair trial or due process.
Article 11 (Presumption of Innocence): ND. No observable content addressing criminal liability or presumption of innocence.
Article 12 (Privacy): ND. No observable content addressing privacy, family, home, or correspondence.
Article 13 (Freedom of Movement): ND. No observable content addressing freedom of movement or residence.
Article 14 (Asylum): ND. No observable content addressing asylum or refuge.
Article 15 (Nationality): ND. No observable content addressing nationality or citizenship.
Article 16 (Marriage & Family): ND. No observable content addressing marriage, family, or property rights.
Article 17 (Property): ND. No observable content addressing property rights or deprivation thereof.
Article 18 (Freedom of Thought): ND. No observable content addressing freedom of thought, conscience, or religion.
Article 19 (Freedom of Expression): +0.30
Medium. A: Advocacy for technical information accessibility. F: Framing of hardware constraints as market segmentation strategy.
Editorial: +0.30. Structural: ND. SETL: ND. Combined: ND. Context Modifier: ND.
Content presents technical analysis of GPU market segmentation with implicit advocacy for transparency regarding artificial performance limitations on consumer hardware. Discusses how contractual restrictions (EULA) replaced implicit technical segmentation; noting the 'divisive move' signals mild editorial lean toward information freedom and access transparency. However, content is primarily descriptive technical analysis rather than advocacy for freedom of opinion/expression itself.
Article 20 (Assembly & Association): ND. No observable content addressing freedom of assembly or association.
Article 21 (Political Participation): ND. No observable content addressing political participation or democracy.
Article 22 (Social Security): ND. No observable content addressing social security or welfare rights.
Article 23 (Work & Equal Pay): +0.20
Low. A: Implicit advocacy for market accessibility and barrier removal.
Editorial: +0.20. Structural: ND. SETL: ND. Combined: ND. Context Modifier: ND.
Content discusses how market segmentation artificially restricts access to computational capabilities for researchers, startups, and hobbyists. Implicit framing that restrictions on consumer GPU datacenter use limits economic opportunity and work access. However, content does not directly address work rights, fair wages, or conditions of employment; connection is tangential.
Article 24 (Rest & Leisure): ND. No observable content addressing rest, leisure, or reasonable working hours.
Article 25 (Standard of Living): ND. No observable content addressing standard of living, health, or social services.
Article 26 (Education): ND. No observable content addressing education or its free/equal character.
Article 27 (Cultural Participation): +0.20
Low. A: Implicit advocacy for scientific research accessibility.
Editorial: +0.20. Structural: ND. SETL: ND. Combined: ND. Context Modifier: ND.
Content documents technical progress in GPU computing and emulation schemes that enable research on consumer hardware. Discussion of AI boom enabling 'researchers, startups, and hobbyists' to conduct meaningful computational work carries implicit lean toward access to scientific/technological progress. However, this is framing of technical capability rather than direct advocacy for cultural or scientific participation rights.
Article 28 (Social & International Order): ND. No observable content addressing social/international order or realization of rights.
Article 29 (Duties to Community): ND. No observable content addressing duties to community or limitations on rights.
Article 30 (No Destruction of Rights): ND. No observable content addressing interpretation or destruction of rights.