+0.14 How Taalas “prints” LLM onto a chip?

Name: HRCB Evaluation: How Taalas “prints” LLM onto a chip?
Item: How Taalas “prints” LLM onto a chip?
Rating: 0.145
Author: HN HRCB

Y	HN HRCB new \| past \| comments \| ask \| show \| by right \| domains \| dashboard \| about hrcb

+0.14	How Taalas “prints” LLM onto a chip? (www.anuragk.com)
	425 points by beAroundHere 3 days ago \| 256 comments on HN \| Mild positive Editorial · vv3.4 · 2026-02-24

Article Heatmap

Negative Neutral Positive No Data

Aggregates

Weighted Mean	+0.14	Unweighted Mean	+0.13
Max	+0.19 Article 19	Min	+0.10 Article 2
Signal	3	No Data	28
Confidence	3%	Volatility	0.04 (Low)
Negative	0	Channels	E: 0.6 S: 0.4
SETL	+0.08	Editorial-dominant

Evidence: High: 0 Medium: 1 Low: 2 No Data: 28

Theme Radar

Domain Context Profile

Element	Modifier	Affects	Note
Privacy	—		No privacy policy observable on provided content.
Terms of Service	—		No terms of service observable on provided content.
Accessibility	+0.05	Article 2 Article 19	Blog uses semantic HTML structure with proper heading hierarchy, code formatting, and image support. Modest positive signal for information accessibility.
Mission	—		No explicit mission statement observable.
Editorial Code	+0.02	Article 19	Content appears to aim for clarity and explanation. Modest positive signal for freedom of expression in technical communication.
Ownership	—		Individual blog by Anurag K. No corporate ownership concerns identified.
Access Model	+0.03	Article 19	Content appears freely accessible without paywalls or registration. Mild positive signal for information access.
Ad/Tracking	—		No advertising or tracking mechanisms observable in provided content.

HN Discussion 20 top-level comments

rustyhancock 2026-02-22 06:18 UTC link

Edit: reading the below it looks like I'm quite wrong here but I've left the comment...

The single transistor multiply is intriguing.

Id assume they are layers of FMA operating in the log domain.

But everything tells me that would be too noisy and error prone to work.

On the other hand my mind is completely biased to the digital world.

If they stay in the log domain and use a resistor network for multiplication, and the transistor is just exponentiating for the addition that seems genuinely ingenious.

Mulling it over, actually the noise probably doesn't matter. It'll average to 0.

It's essentially compute and memory baked together.

I don't know much about the area of research so can't tell if it's innovative but it does seem compelling!

Hello9999901 2026-02-22 06:18 UTC link

This would be a very interesting future. I can imagine Gemma 5 Mini running locally on hardware, or a hard-coded "AI core" like an ALU or media processor that supports particular encoding mechanisms like H.264, AV1, etc.

Other than the obvious costs (but Taalas seems to be bringing back the structured ASIC era so costs shouldn't be that low [1]), I'm curious why this isn't getting much attention from larger companies. Of course, this wouldn't be useful for training models but as the models further improve, I can totally see this inside fully local + ultrafast + ultra efficient processors.

[1] https://en.wikipedia.org/wiki/Structured_ASIC_platform

owenpalmer 2026-02-22 06:29 UTC link

> Kinda like a CD-ROM/Game cartridge, or a printed book, it only holds one model and cannot be rewritten.

Imagine a slot on your computer where you physically pop out and replace the chip with different models, sort of like a Nintendo DS.

rustybolt 2026-02-22 06:51 UTC link

Note that this doesn't answer the question in the title, it merely asks it.

abrichr 2026-02-22 07:14 UTC link

ChatGPT Deep Research dug through Taalas' WIPO patent filings and public reporting to piece together a hypothesis. Next Platform notes at least 14 patents filed [1]. The two most relevant:

"Large Parameter Set Computation Accelerator Using Memory with Parameter Encoding" [2]

"Mask Programmable ROM Using Shared Connections" [3]

The "single transistor multiply" could be multiplication by routing, not arithmetic. Patent [2] describes an accelerator where, if weights are 4-bit (16 possible values), you pre-compute all 16 products (input x each possible value) with a shared multiplier bank, then use a hardwired mesh to route the correct result to each weight's location. The abstract says it directly: multiplier circuits produce a set of outputs, readable cells store addresses associated with parameter values, and a selection circuit picks the right output. The per-weight "readable cell" would then just be an access transistor that passes through the right pre-computed product. If that reading is correct, it's consistent with the CEO telling EE Times compute is "fully digital" [4], and explains why 4-bit matters so much: 16 multipliers to broadcast is tractable, 256 (8-bit) is not.

The same patent reportedly describes the connectivity mesh as configurable via top metal masks, referred to as "saving the model in the mask ROM of the system." If so, the base die is identical across models, with only top metal layers changing to encode weights-as-connectivity and dataflow schedule.

Patent [3] covers high-density multibit mask ROM using shared drain and gate connections with mask-programmable vias, possibly how they hit the density for 8B parameters on one 815mm2 die.

If roughly right, some testable predictions: performance very sensitive to quantization bitwidth; near-zero external memory bandwidth dependence; fine-tuning limited to what fits in the SRAM sidecar.

Caveat: the specific implementation details beyond the abstracts are based on Deep Research's analysis of the full patent texts, not my own reading, so could be off. But the abstracts and public descriptions line up well.

[1] https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...

[2] https://patents.google.com/patent/WO2025147771A1/en

[3] https://patents.google.com/patent/WO2025217724A1/en

[4] https://www.eetimes.com/taalas-specializes-to-extremes-for-e...

londons_explore 2026-02-22 07:28 UTC link

So why only 30,000 tokens per second?

If the chip is designed as the article says, they should be able to do 1 token per clock cycle...

And whilst I'm sure the propagation time is long through all that logic, it should still be able to do tens of millions of tokens per second...

cpldcpu 2026-02-22 07:42 UTC link

I wonder how well this works with MoE architectures?

For dense LLMs, like llama-3.1-8B, you profit a lot from having all the weights available close to the actual multiply-accumulate hardware.

With MoE, it is rather like a memory lookup. Instead of a 1:1 pairing of MACs to stored weights, you suddenly are forced to have a large memory block next to a small MAC block. And once this mismatch becomes large enough, there is a huge gain by using a highly optimized memory process for the memory instead of mask ROM.

At that point we are back to a chiplet approach...

punnerud 2026-02-22 08:04 UTC link

Could we all get bigger FPGAs and load the model onto it using the same technique?

thesz 2026-02-22 08:06 UTC link

8B coefficients are packed into 53B transistors, 6.5 transistors per coefficient. Two-inputs NAND gate takes 4 transistors and register takes about the same. One coefficient gets processed (multiplied by and result added to a sum) with less than two two-inputs NAND gates.

I think they used block quantization: one can enumerate all possible blocks for all (sorted) permutations of coefficients and for each layer place only these blocks that are needed there. For 3-bit coefficients and block size of 4 coefficients only 330 different blocks are needed.

Matrices in the llama 3.1 are 4096x4096, 16M coefficients. They can be compressed into only 330 blocks, if we assume that all coefficients' permutations are there, and network of correct permutations of inputs and outputs.

Assuming that blocks are the most area consuming part, we have block's transistor budget of about 250 thousands of transistors, or 30 thousands of 2-inputs NAND gates per block.

250K transistors per block * 330 blocks / 16M transistors = about 5 transistors per coefficient.

Looks very, very doable.

It does look doable even for FP4 - these are 3-bit coefficients in disguise.

brainless 2026-02-22 09:34 UTC link

If we can print ASIC at low cost, this will change how we work with models.

Models would be available as USB plug-in devices. A dense < 20B model may be the best assistant we need for personal use. It is like graphic cards again.

I hope lots of vendors will take note. Open weight models are abundant now. Even at a few thousand tokens/second, low buying cost and low operating cost, this is massive.

m101 2026-02-22 09:39 UTC link

So if we assume this is the future, the useful life of many semiconductors will fall substantially. What part of the semiconductor supply chain would have pricing power in a world of producing many more different designs?

Perhaps mask manufacturers?

briansm 2026-02-22 10:20 UTC link

I wonder if you could use the same technique (RAM models as ROM) for something like Whisper Speech-to-text, where the models are much smaller (around a Gigabyte) for a super-efficient single-chip speech recognition solution with tons of context knowledge.

kioku 2026-02-22 11:21 UTC link

I’m just wondering how this translates to computer manufacturers like Apple. Could we have these kinds of chips built directly into computers within three years? With insanely fast, local on-demand performance comparable to today’s models?

peteforde 2026-02-22 12:29 UTC link

I would appreciate some clarification on the "store 4 bits of data with one transistor" part.

This doesn't sound remotely possible, but I am here to be convinced.

bsenftner 2026-02-22 12:35 UTC link

I'm surprised people are surprised. Of course this is possible, and of course this is the future. This has been demonstrated already: why do you think we even have GPUs at all?! Because we did this exact same transition from running in software to largely running in hardware for all 2D and 3D Computer Graphics. And these LLMs are practically the same math, it's all just obvious and inevitable, if you're paying attention to what we have, what we do to have what we have.

qoez 2026-02-22 12:58 UTC link

> It took them two months, to develop chip for Llama 3.1 8B. In the AI world where one week is a year, it's super slow. But in a world of custom chips, this is supposed to be insanely fast.

LLama 3.1 is like 2 years at this point. Taking two months to convert a model that only updates every 2 years is very fast

ramshanker 2026-02-22 14:37 UTC link

I can imagine, where this becomes a mainstream PCIe extension card. Like back in days we had separate graphics card, audio card etc. Now AI card. So to upgrade the PC to latest model, we could buy a new card, load up the drivers and boom, intelligence upgrade of the PC. This would be so cool.

atentaten 2026-02-22 15:26 UTC link

Does this mean computer boards will someday have one or more slots for an AI chip? Or peripheral devices containing AI models, which can be plugged into computer's high speed port?

odyssey7 2026-02-22 17:17 UTC link

Quick! We have to approve all the nuclear plants for AI now, before efficiency from optimization shows up

kop316 2026-02-22 18:21 UTC link

Ohh neat! A generalized version of this was the topic of my PhD dissertation:

https://kilthub.cmu.edu/articles/thesis/Modern_Gate_Array_De...

And they are likely doing something similar to put their LLMs in silicon. I would believe a 10x electricity boost along with it being much faster.

The idea is that you can create a sea of generalized standard cells and it makes for a gate array at the manufacturing layer. This was also done 20 or so years ago, it was called a "structured ASIC".

I'd be curious to see if they use the LUT design of traditional structured ASICs or figured what what I did: you can use standard cells to do the same thing and use regular tools/PDKs to make it.

Score Breakdown

Preamble Preamble

Preamble addresses dignity, equality, and rights of all members of human family. Content is technical explainer with no engagement with these themes.

Article 1 Freedom, Equality, Brotherhood

Article 1 addresses equal dignity and rationality of all human beings. Content does not address human dignity or equality.

+0.10

Article 2 Non-Discrimination

Low Practice

Editorial

Structural

+0.05

SETL

Combined

Context Modifier

Accessible blog structure supports non-discriminatory information access. Minor positive signal from structural accessibility.

Article 3 Life, Liberty, Security

Article 3 addresses right to life, liberty, security. Technical content about chip design contains no relevant engagement.

Article 4 No Slavery

Article 4 addresses slavery and servitude. Content makes no statements or structural implications regarding these issues.

Article 5 No Torture

Article 5 addresses torture and cruel punishment. Content contains no relevant engagement.

Article 6 Legal Personhood

Article 6 addresses right to recognition before law. Content does not address legal personhood or status.

Article 7 Equality Before Law

Article 7 addresses equal protection before law. Content makes no statements regarding legal equality or discrimination.

Article 8 Right to Remedy

Article 8 addresses remedies for rights violations. Content does not engage with justice or remedy themes.

Article 9 No Arbitrary Detention

Article 9 addresses arbitrary arrest and detention. Content contains no relevant engagement.

Article 10 Fair Hearing

Article 10 addresses fair and public hearings by independent tribunal. Content does not address judicial processes.

Article 11 Presumption of Innocence

Article 11 addresses presumption of innocence and criminal liability. Content does not engage with criminal justice.

Article 12 Privacy

Article 12 addresses privacy, family, home, and correspondence. Content does not address privacy rights or personal matters.

Article 13 Freedom of Movement

Article 13 addresses freedom of movement and residence. Content does not engage with mobility or movement rights.

Article 14 Asylum

Article 14 addresses right to seek and enjoy asylum. Content does not address asylum or refuge.

Article 15 Nationality

Article 15 addresses right to nationality. Content does not engage with citizenship or nationality issues.

Article 16 Marriage & Family

Article 16 addresses marriage and family rights. Content does not address family or marital status.

Article 17 Property

Article 17 addresses property rights. Content discusses technology but not property ownership or deprivation.

Article 18 Freedom of Thought

Article 18 addresses freedom of thought, conscience, and religion. Content does not engage with ideological or religious matters.

+0.19

Article 19 Freedom of Expression

Medium Advocacy Practice

Editorial

+0.15

Structural

+0.08

SETL

+0.10

Combined

Context Modifier

Content exemplifies freedom to seek, receive, and impart information through detailed technical explanation. Author openly discusses research process, sources (LocalLLaMA, blogs), and reasoning. Blog structure with back-links supports information flow. Mild positive signal.

Article 20 Assembly & Association

Article 20 addresses freedom of peaceful assembly and association. Content does not address collective assembly or political association.

Article 21 Political Participation

Article 21 addresses right to participate in government. Content does not engage with political participation or governance.

Article 22 Social Security

Article 22 addresses social security and economic rights. Content does not address welfare, insurance, or social provision.

Article 23 Work & Equal Pay

Article 23 addresses work rights, fair wages, and trade unions. Content does not engage with labor or employment issues.

Article 24 Rest & Leisure

Article 24 addresses rest and leisure. Content does not address work-life balance or leisure rights.

Article 25 Standard of Living

Article 25 addresses standard of living and health. Content does not address healthcare, nutrition, or living standards.

+0.11

Article 26 Education

Low Practice Framing

Editorial

+0.08

Structural

+0.04

SETL

+0.06

Combined

Context Modifier

Content implicitly supports education through clear technical explanation and knowledge dissemination. Author provides learning pathway from basics to advanced concepts. Free accessibility supports educational access. Mild positive signal.

Article 27 Cultural Participation

Article 27 addresses participation in cultural life and protection of intellectual property. Content does not address cultural participation or IP rights.

Article 28 Social & International Order

Article 28 addresses social and international order. Content does not engage with systemic order or international relations.

Article 29 Duties to Community

Article 29 addresses community duties and restrictions on rights. Content does not address social duties or limitations on freedoms.

Article 30 No Destruction of Rights

Article 30 addresses prohibition of rights destruction. Content does not address attacks on or negation of human rights.

build fc56cf0+0q5s · 2026-02-25 01:32 UTC