0.00 55 GiB/s FizzBuzz (codegolf.stackexchange.com) S:ND
1260 points by luu 1583 days ago | 255 comments on HN | Neutral Community · v3.7 · 2026-02-28 10:50:01
Summary Technical Content (Non-UDHR) Neutral
This Stack Exchange Code Golf page presents a technical programming challenge on optimizing FizzBuzz throughput. The content contains no observable editorial engagement with UDHR provisions. Structurally, the platform exhibits mild positive signals for Articles 19 (freedom of expression) and 27 (participation in cultural life) through its open discussion features and inclusive international challenge design.
Article Heatmap
Preamble: ND
Article 1 (Freedom, Equality, Brotherhood): ND
Article 2 (Non-Discrimination): ND
Article 3 (Life, Liberty, Security): ND
Article 4 (No Slavery): ND
Article 5 (No Torture): ND
Article 6 (Legal Personhood): ND
Article 7 (Equality Before Law): ND
Article 8 (Right to Remedy): ND
Article 9 (No Arbitrary Detention): ND
Article 10 (Fair Hearing): ND
Article 11 (Presumption of Innocence): ND
Article 12 (Privacy): ND
Article 13 (Freedom of Movement): ND
Article 14 (Asylum): ND
Article 15 (Nationality): ND
Article 16 (Marriage & Family): ND
Article 17 (Property): ND
Article 18 (Freedom of Thought): ND
Article 19 (Freedom of Expression): ND
Article 20 (Assembly & Association): ND
Article 21 (Political Participation): ND
Article 22 (Social Security): ND
Article 23 (Work & Equal Pay): ND
Article 24 (Rest & Leisure): ND
Article 25 (Standard of Living): ND
Article 26 (Education): ND
Article 27 (Cultural Participation): ND
Article 28 (Social & International Order): ND
Article 29 (Duties to Community): ND
Article 30 (No Destruction of Rights): ND
Legend: Negative, Neutral, Positive, No Data (ND)
Aggregates
Editorial Mean ND Structural Mean ND
Weighted Mean 0.00 Unweighted Mean 0.00
Max 0.00 N/A Min 0.00 N/A
Signal 0 No Data 31
Confidence 4% Volatility 0.00 (Low)
Negative 0 Channels E: 0.6 S: 0.4
SETL ND
FW Ratio 81% 13 facts · 3 inferences
Evidence: High: 0 Medium: 2 Low: 0 No Data: 29
Theme Radar
Foundation: 0.00 (0 articles)
Security: 0.00 (0 articles)
Legal: 0.00 (0 articles)
Privacy & Movement: 0.00 (0 articles)
Personal: 0.00 (0 articles)
Expression: 0.00 (0 articles)
Economic & Social: 0.00 (0 articles)
Cultural: 0.00 (0 articles)
Order & Duties: 0.00 (0 articles)
HN Discussion 20 top-level · 30 replies
jtchang 2021-10-28 20:51 UTC
The amount of low level CPU architecture knowledge to write such a program is mind boggling. Just goes to show how much room for improvement a lot of programs have.
no_time 2021-10-28 20:58 UTC
next up: custom FizzBuzz ASIC with 128 fizz and 128 buzz cores for maximum throughput.
dmitrygr 2021-10-28 21:05 UTC
The amount of completely unnecessary effort unleashed on this problem in this post is amazing. I want to meet the author and shake his hand! It has everything from Linux kernel trivia in terms of output speed to intel AVX2 code. So unnecessary and so awesome!

I've done stuff like this before, and I imagine the satisfaction of completing it! A tip of my hat to you, sir!

m0ck 2021-10-28 21:14 UTC
Thanks for my daily dose of software engineer imposter syndrome.
fnord77 2021-10-28 21:23 UTC

    > // Most FizzBuzz routines produce output with `write` or a similar
    > // system call, but these have the disadvantage that they need to copy
    > // the data being output from userspace into kernelspace. It turns out
    > // that when running full speed (as seen in the third phase), FizzBuzz
    > // actually runs faster than `memcpy` does, so `write` and friends are
    > // unusable when aiming for performance - this program runs five times
    > // faster than an equivalent that uses `write`-like system calls.
Why can't `write` use a reference like vmsplice?
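For readers wondering what a `vmsplice`-based write looks like, here is a minimal Python sketch, assuming Linux and a libc that exports `vmsplice` (the `os` module does not wrap it, so it is called through `ctypes`). The helper `vmsplice_all` and the `IoVec` mirror of `struct iovec` are illustrative names, not taken from the FizzBuzz program. It also shows the catch that comes up later in the thread: the buffer has to stay untouched until the reader drains the pipe, because the kernel may reference those pages instead of copying them.

```python
import ctypes
import ctypes.util
import os

# Linux only: os has no vmsplice wrapper, so call the libc symbol directly.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


class IoVec(ctypes.Structure):
    """Mirror of struct iovec from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


_libc.vmsplice.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec),
                           ctypes.c_size_t, ctypes.c_uint]
_libc.vmsplice.restype = ctypes.c_ssize_t


def vmsplice_all(pipe_wfd: int, buf: ctypes.Array) -> int:
    """Hand every byte of `buf` to the write end of a pipe; return the byte count.

    The kernel may keep references to these user pages rather than copying
    them, so `buf` must stay alive and unmodified until the reader has
    consumed the data.
    """
    base = ctypes.addressof(buf)
    length = ctypes.sizeof(buf)
    done = 0
    while done < length:
        # Describe the not-yet-spliced tail of the buffer as one iovec.
        iov = IoVec(base + done, length - done)
        n = _libc.vmsplice(pipe_wfd, ctypes.byref(iov), 1, 0)  # flags = 0
        if n < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        done += n
    return done
```

To try it, create a pipe with `os.pipe()`, copy some bytes into a `(ctypes.c_char * len(data))` array, pass the write end to `vmsplice_all`, and read the same bytes back from the read end.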
nobrains 2021-10-28 21:24 UTC
Did anyone else get recommended and see the FizzBuzz video on YouTube ( https://youtu.be/QPZ0pIK_wsc ) just before this article or did I just witness an incredible coincidence?
jerf 2021-10-28 21:45 UTC
"The Grid. A digital frontier. I tried to picture clusters of information as they moved through the computer. What did they look like? Ships, motorcycles? Were the circuits like freeways? I kept dreaming of a world I thought I'd never see. And then, one day I got in..."

It turns out The Grid is just a guy sitting in a chair, shouting about "Fizz!" and "Buzz!" as fast as he can.

It wasn't really what I had in mind.

(The image of this poor program, stuck shouting "fizz!" and "buzz!" for subjectively centuries at a time struck me...)

savant_penguin 2021-10-28 22:00 UTC
"future-proofed where possible to be able to run faster if the relevant processor bottleneck – L2 cache write speed – is ever removed)."

Being able to write a function limited mostly by the l2 cache size and able to realize that is rad

And btw this is an interesting example of how hand optimized assembly can be much much faster than any other solution. Can you get as fast as this solution with mostly C/C++? It uses interesting tricks to avoid memcopy (calling it slow rofl)

kens 2021-10-28 22:13 UTC
You could probably get some blazing performance out of an FPGA. I made an FPGA version of FizzBuzz a few years ago, but it was optimized for pointless VGA animation rather than performance.

https://www.righto.com/2018/04/fizzbuzz-hard-way-generating-...

ot 2021-10-28 22:24 UTC
To me it's almost as impressive that pv does not become the bottleneck.
HALtheWise 2021-10-28 22:26 UTC
Now I'm kinda curious to see how much faster you could go on an M1 Max with the GPU generating the data. Once his solution gets to the point of being a bytecode interpreter, it's trivially parallelizable and the M1 has _fantastic_ memory bandwidth. Does anyone know if the implementation of pv or /dev/null actually requires loading the data into CPU cache?
kvathupo 2021-10-28 22:49 UTC
> 64 bytes of FizzBuzz per 4 clock cycles

That's borderline incredulous, given a single AVX2 instruction can last multiple clock-cycles. The reciprocal throughput also doesn't go below ~0.3 to my, admittedly shallow, knowledge. A remarkable piece of engineering!
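The quoted figures can be cross-checked with a little arithmetic. The Python sketch below uses only numbers from the program's comments (64 bytes per 4 cycles in the main loop, 32 bytes of bytecode handled by four instructions) plus the 55 GiB/s headline. It assumes the main loop does nothing else, so the clock rate it derives is only a lower bound on what the measured throughput requires.

```python
GIB = 1 << 30  # bytes in one GiB

# Figures quoted from the program's comments.
BYTES_PER_ITER = 64   # "64 bytes of FizzBuzz per four clock cycles"
CYCLES_PER_ITER = 4
CHUNK_BYTES = 32      # "32 bytes of bytecode ... four machine instructions"
INSNS_PER_CHUNK = 4


def bytes_per_cycle() -> float:
    """Output bytes produced per clock cycle by the main loop."""
    return BYTES_PER_ITER / CYCLES_PER_ITER


def instructions_per_cycle() -> float:
    """Interpreter instructions per cycle needed to sustain that rate."""
    chunks = BYTES_PER_ITER // CHUNK_BYTES
    return chunks * INSNS_PER_CHUNK / CYCLES_PER_ITER


def min_clock_ghz(throughput_gib_s: float) -> float:
    """Lowest clock at which the main-loop rate alone could reach the throughput.

    Real runs also pause for I/O, so the actual clock must be at least this high.
    """
    return throughput_gib_s * GIB / bytes_per_cycle() / 1e9


print(bytes_per_cycle())             # 16.0
print(instructions_per_cycle())      # 2.0
print(round(min_clock_ghz(55), 2))   # 3.69
```

Two interpreter instructions per cycle is comfortably below what modern out-of-order x86 cores can sustain, which fits the point made downthread that individual instruction latency matters less than keeping the pipeline full.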

booleandilemma 2021-10-28 23:27 UTC
Wow. There is programming and then there is programming. Whenever I see something like this I feel like what I do for a living is duplo blocks in comparison.
brokenmachine 2021-10-28 23:35 UTC
I love it that humans are so diverse - there's always someone prepared to dedicate their life to the most obscure things.

It's so impressive and hilarious to me that he actually spent months on this. Well done!!

mastax 2021-10-28 23:42 UTC
Only getting 7GiB/s on a Ryzen 7 1800X w/DDR4 3000. Zen 1 executes AVX2 instructions at half speed, but that doesn't account for all of the difference. I guess I need a new computer. To run FizzBuzz.
_fjb4 2021-10-29 00:31 UTC
As an aside, it would be fun to see a programming challenge website focused on performance and optimizations, rather than code golf (shortest program) or edge case correctness (interview type sites). I took a course like this in uni where we learned low-level optimization and got to compete with other classmates to see who had the fastest program - a fun experience that I don't really see much of online.
wly_cdgr 2021-10-29 00:35 UTC
Literal 10x programmer moment
srcreigh 2021-10-29 05:17 UTC

    ///// Third phase of output
    //
    // This is the heart of this program. It aims to be able to produce a
    // sustained output rate of 64 bytes of FizzBuzz per four clock cycles
    // in its main loop (with frequent breaks to do I/O, and rare breaks
    // to do more expensive calculations).
    //
    // The third phase operates primarily using a bytecode interpreter; it
    // generates a program in "FizzBuzz bytecode", for which each byte of
    // bytecode generates one byte of output. The bytecode language is
    // designed so that it can be interpreted using SIMD instructions; 32
    // bytes of bytecode can be loaded from memory, interpreted, and have
    // its output stored back into memory using just four machine
    // instructions.
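To make the "one byte of bytecode, one byte of output" idea concrete, here is a small Python sketch. The encoding is invented for illustration and is not the program's real bytecode: bytes below 0x80 are literal characters, and a byte of the form `0x80 | off << 3 | d` means "digit `d` of the number on line `off` of this 15-line block". It assumes every number in a block has the same digit count and that the block starts at a number congruent to 1 mod 15, so one compiled program can be reused for many blocks. `compile_block` and `run_block` are hypothetical names.

```python
LITERAL_LIMIT = 0x80  # bytes below this are copied to the output unchanged
MAX_WIDTH = 8         # digit index must fit in the low 3 bits


def compile_block(width: int) -> bytes:
    """Bytecode for 15 lines whose numbers all have `width` digits.

    The block is assumed to start at a number congruent to 1 mod 15.
    """
    if not 1 <= width <= MAX_WIDTH:
        raise ValueError("width must be between 1 and 8")
    prog = bytearray()
    for off in range(15):
        k = off + 1  # n % 15 == k % 15 because the block starts at 1 (mod 15)
        if k % 15 == 0:
            prog += b"FizzBuzz\n"
        elif k % 3 == 0:
            prog += b"Fizz\n"
        elif k % 5 == 0:
            prog += b"Buzz\n"
        else:
            # One opcode per digit: "digit d of the number on line off".
            prog += bytes(LITERAL_LIMIT | (off << 3) | d for d in range(width))
            prog += b"\n"
    return bytes(prog)


def run_block(prog: bytes, base: int) -> bytes:
    """Interpret `prog` for the block starting at `base`.

    Produces exactly one output byte per bytecode byte. The caller must pass a
    program compiled for the digit width of base..base+14.
    """
    if base % 15 != 1:
        raise ValueError("block must start at a number congruent to 1 mod 15")
    digits = [str(base + off).encode("ascii") for off in range(15)]
    if len({len(x) for x in digits}) != 1:
        raise ValueError("all numbers in the block must have the same width")
    out = bytearray(len(prog))
    for i, b in enumerate(prog):
        if b < LITERAL_LIMIT:
            out[i] = b
        else:
            out[i] = digits[(b >> 3) & 0x0F][b & 0x07]
    return bytes(out)
```

Each output byte depends only on the bytecode byte at the same position plus a small table of digits, and that position-wise independence is what makes a scheme like this a fit for SIMD; this scalar loop makes no attempt at vectorization.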
sireat 2021-10-29 10:04 UTC
The amount of optimizations remind me of story of Mel: http://www.catb.org/jargon/html/story-of-mel.html
e12e 2021-10-29 10:05 UTC
But does it support arbitrarily large integers? ;)

Ed: no, but does pretty well:

> The program outputs a quintillion lines of FizzBuzz and then exits (going further runs into problems related to the sizes of registers). This would take tens of years to accomplish, so hopefully counts as "a very high astronomical number" (although it astonishes me that it's a small enough timespan that it might be theoretically possible to reach a number as large as a quintillion without the computer breaking).

loser777 2021-10-28 21:03 UTC
FizzBuzz has many properties that make it very suitable for these kinds of optimizations that might not be applicable to general purpose code:
+ extremely small working set (a few registers' worth of state)
+ extremely predictable branching behavior
+ no I/O

These properties however don't diminish the achievement of leveraging AVX-2 (or any vectorization) for a problem that doesn't immediately jump out as SIMD.
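The first two properties are easy to see in code. In the Python sketch below (an illustration, not how the assembly solution is structured), the only per-line state is the number itself and a phase counter that tracks `n % 15` incrementally, and the fizz/buzz decision is a lookup in a 15-entry table, so the remaining branch follows a fixed period-15 pattern.

```python
def _label(k: int):
    """Fizz/Buzz label for a number congruent to k mod 15, or None for a plain number."""
    if k % 15 == 0:
        return "FizzBuzz"
    if k % 3 == 0:
        return "Fizz"
    if k % 5 == 0:
        return "Buzz"
    return None


# CYCLE[i] is the label for any n with n % 15 == i; built once, up front.
CYCLE = tuple(_label(k) for k in range(15))


def fizzbuzz_lines(limit: int) -> list:
    """FizzBuzz for 1..limit without any division inside the loop."""
    lines = []
    phase = 1  # equals n % 15, advanced by one each line
    for n in range(1, limit + 1):
        label = CYCLE[phase]
        lines.append(label if label is not None else str(n))
        phase = 0 if phase == 14 else phase + 1
    return lines
```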

mordechai9000 2021-10-28 21:03 UTC
I think optimizing binary tree inversion is a higher priority, right now.
function_seven 2021-10-28 21:06 UTC
Substandard design. You should have a suitable ratio of cores to keep them load balanced. I suggest 160 fizz cores and 96 buzz cores.

EDIT: And a fizzbuzz co-processor, obviously.

randy408 2021-10-28 21:10 UTC
FizzBuzzCoin?
wtallis 2021-10-28 21:11 UTC
Or if your ASIC generalizes the problem in a slightly different direction, you end up reinventing TWINKLE and TWIRL: https://en.wikipedia.org/wiki/TWINKLE
tehjoker 2021-10-28 21:34 UTC
I'm not an assembly programmer, but my intuition is that this is safer. You can only rely on "zero-copy" behavior when you have total control of the program and know that that memory region is going to stay put and remain uncorrupted. Therefore, most external code will make a copy because they can't insist on this (and because for most people, making a copy is pretty fast).
andkon 2021-10-28 21:39 UTC
I was very grateful to see this disclaimer:

> I've spent months working on this program

jedberg 2021-10-28 21:49 UTC
I kinda hope as a joke in Tron 3 they have a guy muttering in the corner "1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz" as someone walks by.
nonameiguess 2021-10-28 21:58 UTC
Could probably store all multiples of 3 and 5 up to some really big number burned directly to hardware and then do something like a big CAM table the way Internet core routers do mapping the numbers to ASCII bit strings. Then speedup IO by not having a general purpose display, but something more like an old-school digital clock that can only show digits and the words "fizz" and "buzz."
BeeOnRope 2021-10-28 22:12 UTC
You could probably get very close to this solution with C or C++, plus AVX intrinsics. Some might consider that "cheating" since intrinsics occupy kind of a grey area between a higher level language and asm.
tigershark 2021-10-28 22:22 UTC
Still better than throwing away a significant percentage of the world total power output just to compute some useless hashes if you ask me..
NextHendrix 2021-10-28 22:27 UTC
When I saw this I did wonder how much faster I could do it in hardware, but similarly I expect the bottleneck would be outputting it in a fashion that would be considered fair to be compared to software implementations (eg outputting readable text on a screen).

Regardless, I very much enjoyed your DVD screensaver-esque output.

jason0597 2021-10-28 22:37 UTC
Couldn't we load it onto an NVIDIA RTX A6000? It is much much faster than the M1 Max. It has a much greater memory bandwidth too
eyelidlessness 2021-10-28 23:27 UTC
This is one where I’m fully comfortable with feeling like an impostor. I’ve gotten this far (~20 years) without more than a cursory glance at machine code, I’ll be pleased if I retire at the same level of relevant expertise.

Edit: don’t get me wrong! I admire the talent that goes into this and similar efforts, and find performance-chasing particularly inspiring in any context. This is just an area of that which I don’t anticipate ever wanting to tread myself.

gpderetta 2021-10-28 23:32 UTC
I was wondering the same thing! pv probably never touches its input and might itself be using splice to never read the bytes in userspace and just accumulate the byte counts.

Edit: yes it has a --no-splice parameter.

dgunay 2021-10-28 23:43 UTC
I am thankful every day for the work of those who came before me. Their long hours toiling over hardware, assembly, compilers, protocol stacks, libraries, frameworks, etc let us focus on the bigger picture, how to write same-y CRUD apps for businesses :)
gpderetta 2021-10-28 23:44 UTC
That's 16 bytes per clock cycle, i.e. one avx register per clock cycle. As most intel CPUs can only do one store per clock cycle, that's also the theoretical limit with avx. I wonder if using non temporal stores would help make the code be cache size agnostic.

Note that the instruction latency is not important as long as you can pipeline the computation fully (which appear to be the case here!).

Edit: to go faster you would need to be able to use the bandwidth of more than one cpu. I wonder if you could precompute where the output will cross a page to be able to have distinct cores work on distinct pages... Hum I need a notepad.

Edit2: it is much simpler than that, you do not need to fill a page to vmsplice it. So in principle the code should parallelize quite trivially. Have each thread grab, say 1M numbers at a time, for example by atomically incrementing a counter, serialize them to a bunch of pages, then grab the next batch. A simple queue will hold the ready batches that can be spliced as needed either by a service thread or opportunistically by the thread that has just finished the batch next in sequence.
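The batching scheme in that edit can be sketched in Python as below. It swaps in a simpler ordering mechanism than the one described: instead of an atomic counter plus a queue of ready batches, it relies on `ThreadPoolExecutor.map`, which yields results in submission order, so batches are emitted in sequence even if they finish out of order. The `emit` callback stands in for whatever would write or splice each batch; `render_batch` and `parallel_fizzbuzz` are hypothetical names. Because of the GIL, Python threads will not actually speed up this pure-Python rendering; the sketch only shows the structure, and `ProcessPoolExecutor` offers the same `map` interface if real parallelism is wanted.

```python
from concurrent.futures import ThreadPoolExecutor


def render_batch(start: int, count: int) -> bytes:
    """Serialize FizzBuzz lines start..start+count-1 into one bytes object."""
    parts = []
    for n in range(start, start + count):
        if n % 15 == 0:
            parts.append("FizzBuzz\n")
        elif n % 3 == 0:
            parts.append("Fizz\n")
        elif n % 5 == 0:
            parts.append("Buzz\n")
        else:
            parts.append(f"{n}\n")
    return "".join(parts).encode("ascii")


def parallel_fizzbuzz(limit: int, emit, batch: int = 1_000_000, workers: int = 4) -> None:
    """Render 1..limit in fixed-size batches on a pool and emit them in order."""
    starts = list(range(1, limit + 1, batch))
    counts = [min(batch, limit + 1 - s) for s in starts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every batch up front and yields results in submission
        # order, so the output stays sequential whatever order workers finish in.
        for chunk in pool.map(render_batch, starts, counts):
            emit(chunk)
```

For example, `parallel_fizzbuzz(100, out.append, batch=7, workers=3)` fills the list `out` with byte chunks whose concatenation is the ordinary FizzBuzz output for 1 to 100.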

gpderetta 2021-10-28 23:50 UTC
Pv never touches the data. It splices the input into the output, /dev/null in this case, so once written, a page is never touched again.

Splice is linux specific though, so you would need to run it on M1.

monktastic1 2021-10-28 23:56 UTC
> incredulous

FYI "incredible" might be a better word here. "Incredulous" would mean that it finds something incredible.

jjeaff 2021-10-29 00:11 UTC
Don't worry. It just seems like everyone else is so talented because no one writes articles about the 2 hrs they spent just trying to get their project to just build without errors. Or if they do, they don't get voted to the top of HN.
mhh__ 2021-10-29 00:37 UTC
Benchmarking reliably and fairly for minimal cost is really hard in a cloud environment.
iruoy 2021-10-29 00:58 UTC
I ran this on a server with a 5950X (same CPU as this test was run on), but with 2666MHz memory (128GB) instead of 3600MHz memory (32GB) and I only got 41GB/s.
throwamon 2021-10-29 01:04 UTC
I haven't been on YouTube today, but I literally JUST got out of the shower and I was thinking about FizzBuzz while there.

But yeah, frequency illusion.

irjustin 2021-10-29 01:08 UTC
Would you be willing to go back and try to optimize this for pure output speed?

I thought the exact same question, but wondered if FPGA's gate connections are too distant for FizzBuzz to beat 55 GiB/s.

ehsankia 2021-10-29 03:20 UTC
Yeah, the fact that it's a whole order of magnitude faster than all the other solutions, even the assembly-based ones, is insane.
xtracto 2021-10-29 03:31 UTC
I've had the opportunity to tinker with ASM, the Z80 architecture, low-level programming and other similar stuff (I'm less than 1/1000th as able as the referenced author).

I find this kind of programming very beautiful and rewarding in that you really know that you are programming the hardware. Unfortunately it's not an easy path to a well-paying job (unless you are exceptional like the gentleman). So I ended up building fintech web apps.

MH15 2021-10-29 04:05 UTC
This is called "specialization" and it's the only reason we've gotten so far as a species. Not everyone can or should know this level in any subject.
ummonk 2021-10-29 04:23 UTC
GPU memory is an order of magnitude higher bandwidth than RAM, so that would seem to me to be the way to go to beat this. The output wouldn’t be accessible from CPU without a big slowdown though.
haliskerbas 2021-10-29 05:30 UTC
And imagine not coming up with this solution in your next MANGA interview!
Editorial Channel
What the content says
ND
Preamble Preamble

No editorial engagement with fundamental human rights or human dignity principles.

ND
Article 1 Freedom, Equality, Brotherhood

No commentary on equal dignity or inherent rights.

ND
Article 2 Non-Discrimination

No discussion of discrimination or non-discrimination.

ND
Article 3 Life, Liberty, Security

Not applicable to technical content.

ND
Article 4 No Slavery

Not applicable.

ND
Article 5 No Torture

Not applicable.

ND
Article 6 Legal Personhood

Not applicable.

ND
Article 7 Equality Before Law

Not applicable.

ND
Article 8 Right to Remedy

No discussion of the right to an effective remedy.

ND
Article 9 No Arbitrary Detention

Not applicable.

ND
Article 10 Fair Hearing

Not applicable.

ND
Article 11 Presumption of Innocence

Not applicable.

ND
Article 12 Privacy

Not applicable.

ND
Article 13 Freedom of Movement

Not applicable.

ND
Article 14 Asylum

Not applicable.

ND
Article 15 Nationality

Not applicable.

ND
Article 16 Marriage & Family

Not applicable.

ND
Article 17 Property

Not applicable.

ND
Article 18 Freedom of Thought

Not applicable.

ND
Article 19 Freedom of Expression
Medium Practice

No editorial commentary on freedom of opinion or expression.

ND
Article 20 Assembly & Association

Not applicable.

ND
Article 21 Political Participation

Not applicable.

ND
Article 22 Social Security

Not applicable.

ND
Article 23 Work & Equal Pay

Not applicable.

ND
Article 24 Rest & Leisure

Not applicable.

ND
Article 25 Standard of Living

Not applicable.

ND
Article 26 Education

Not applicable.

ND
Article 27 Cultural Participation
Medium Practice

No explicit discussion of participation in cultural life.

ND
Article 28 Social & International Order

Not applicable.

ND
Article 29 Duties to Community

Not applicable.

ND
Article 30 No Destruction of Rights

Not applicable.

Structural Channel
What the site does
ND
Preamble Preamble

Page structure is standard web platform display; no structural signals regarding human rights foundations.

ND
Article 1 Freedom, Equality, Brotherhood

Platform structure treats all challenge participants equally; no specific structural signal unique to Article 1.

ND
Article 2 Non-Discrimination

Challenge rules apply uniformly to all participants; no observable discriminatory barriers.

ND
Article 3 Life, Liberty, Security

No structural engagement with right to life, liberty, or security.

ND
Article 4 No Slavery

No structural signals regarding slavery or servitude.

ND
Article 5 No Torture

No structural signals regarding torture or cruel treatment.

ND
Article 6 Legal Personhood

No structural signals regarding right to recognition as person.

ND
Article 7 Equality Before Law

No structural signals regarding equality before law.

ND
Article 8 Right to Remedy

Page contains standard web platform tracking and session management; privacy neither prioritized nor egregiously violated.

ND
Article 9 No Arbitrary Detention

No structural signals regarding freedom from arbitrary arrest.

ND
Article 10 Fair Hearing

No structural signals regarding fair and public hearing.

ND
Article 11 Presumption of Innocence

No structural signals regarding criminal liability.

ND
Article 12 Privacy

No structural signals regarding interference with privacy.

ND
Article 13 Freedom of Movement

No structural signals regarding freedom of movement.

ND
Article 14 Asylum

No structural signals regarding asylum.

ND
Article 15 Nationality

No structural signals regarding nationality.

ND
Article 16 Marriage & Family

No structural signals regarding marriage or family.

ND
Article 17 Property

No structural signals regarding property rights.

ND
Article 18 Freedom of Thought

No structural signals regarding freedom of conscience or belief.

ND
Article 19 Freedom of Expression
Medium Practice

Platform structure enables uncensored posting of code and ideas. Challenge rules explicitly welcome diverse solutions and approaches.

ND
Article 20 Assembly & Association

No structural signals regarding freedom of assembly.

ND
Article 21 Political Participation

No structural signals regarding political participation.

ND
Article 22 Social Security

No structural signals regarding social security.

ND
Article 23 Work & Equal Pay

Challenge is voluntary, unpaid participation; no labor rights signals.

ND
Article 24 Rest & Leisure

No structural signals regarding rest or leisure.

ND
Article 25 Standard of Living

No structural signals regarding health or standard of living.

ND
Article 26 Education

No structural signals regarding education.

ND
Article 27 Cultural Participation
Medium Practice

Platform enables programmers to participate in technical culture and share knowledge and achievements. Challenge recognizes authors publicly in leaderboard.

ND
Article 28 Social & International Order

No structural signals regarding social or international order.

ND
Article 29 Duties to Community

No structural signals regarding duties to community.

ND
Article 30 No Destruction of Rights

No structural signals regarding restriction of rights.

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.80 low claims
Sources
0.8
Evidence
0.8
Uncertainty
0.7
Purpose
0.9
Propaganda Flags
No manipulative rhetoric detected
0 techniques detected
Emotional Tone
Emotional character: positive/negative, intensity, authority
measured
Valence
+0.3
Arousal
0.5
Dominance
0.5
Transparency
Does the content identify its author and disclose interests?
0.95
✓ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.91 solution oriented
Reader Agency
0.8
Stakeholder Voice
Whose perspectives are represented in this content?
0.60 3 perspectives
Speaks: individuals
Temporal Framing
Is this content looking backward, at the present, or forward?
prospective unspecified
Geographic Scope
What geographic area does this content cover?
global
Complexity
How accessible is this content to a general audience?
technical high jargon domain specific
Audit Trail 1 entry
2026-02-28 10:50 eval Evaluated by claude-haiku-4-5-20251001: 0.00 (Neutral)