+0.10 Absurd Success

Name: HRCB Evaluation: Absurd Success
Item: Absurd Success
Rating: 0.1
Author: HN HRCB

+0.10	Absurd Success (www.marginalia.nu)
	629 points by asicsp 910 days ago | 160 comments on HN | Mild positive Editorial · v3.7

Summary Digital Access & Infrastructure Neutral

This technical blog post describes performance optimizations to an independent search engine, including database refactoring and reduced memory requirements. While the content does not explicitly address human rights, the structural choices to maintain accessible, decentralized search infrastructure have a mild positive inclination toward Article 19 (freedom of information and expression).

[Figure: Article Heatmap. Legend: Negative, Neutral, Positive, No Data]

Aggregates

Weighted Mean: +0.10
Unweighted Mean: +0.10
Max: +0.10 (Article 19)
Min: +0.10 (Article 19)
Signal: 1
No Data: 30
Confidence: 2%
Volatility: 0.00 (Low)
Negative: 0
Channels: E: 0.6, S: 0.4
SETL: ND
FW Ratio: 60% (3 facts · 2 inferences)

Evidence: High: 0 Medium: 1 Low: 0 No Data: 30

HN Discussion 18 top-level · 31 replies

BLKNSLVR 2023-08-31 03:16 UTC link

Totally useless commentary:

It makes me deeply happy to hear success stories like this for a project that's moving in the correctly opposite direction to that of the rest of the world.

Engildification. Of which there should be more!

My soul was also satisfied by the Sleeping At Night post which, along with the recent "Lie Still in Bed" article, makes for very simple options to attempt to fix sleep (discipline) issues.

bomewish 2023-08-31 03:28 UTC link

I enjoyed reading this but I also fundamentally don't get it at a basic level like... why re-implement stuff that has already been done by entire teams? There are so many bigger and productionised search and retrieval systems. Why invest the human capital in doing it all again yourself? I just don't get it.

38 2023-08-31 03:59 UTC link

if I search encoding/json, I get some interesting stuff:

https://search.marginalia.nu/search?query=encoding%2Fjson

but NOT what I am looking for. If I try again with Google:

https://google.com/search?q=encoding%2Fjson

first result is exactly what I want.

nicbou 2023-08-31 04:21 UTC link

I always love seeing marginalia.nu updates here. You are a cherished user on this website, and I hope that you keep posting.

fouc 2023-08-31 04:23 UTC link

Good example of how complexity often engenders complexity. The wrong abstraction might create 100x more work to support it.

gnyman 2023-08-31 04:56 UTC link

On a side note inspired by this blog post.

I'm wondering if humans are mostly incapable of producing great things without (artificial) restrictions.

In this case, marginalia is (ridiculously) efficient because Victor (the creator) is intentionally restricting what hardware it runs on and how much ram it has.

If he just caved in and added another 32GiB it would work for a while, but the inefficient design would persist and the problem would just show its head later, and then there would be more complexity around that design and it might not be as easy to fix then.

If the original thesis is correct, then I think it explains why most software is so bad (bloated, slow, buggy) nowadays. It's because very few individual pieces of software nowadays are hitting any limits (in isolation). So each individual piece is terribly inefficient but with the latest M2 Pro and GiB connection you can just keep ahead of the curve where it becomes a problem.

Anyways, this turned into a rant, but the conclusion might be to limit yourself, and you (and everyone else) will be better off long term.

janvdberg 2023-08-31 05:02 UTC link

I love this as another example where restriction breeds innovation. More often than not this is found not in abundance but in limits.

newman123 2023-08-31 05:03 UTC link

Wonder why SQLite was chosen over a key value store. Seems he wanted reads by id and no other columns so a relational db seems unnecessary?

flexagoon 2023-08-31 08:15 UTC link

By the way, Kagi, the paid search engine you might've seen on HackerNews as well, uses Marginalia as one of its data sources

https://help.kagi.com/kagi/search-details/search-sources.htm...

If you use the "non-commercial" lens, those results, along with results from Kagi's own index and a few other independent sources, will be prioritized.

donotsay 2023-08-31 09:22 UTC link

I just tried "Russian sources about Ukraine war", and it returns Ukrainian sources. So I guess it is of limited use to avoid censorship.

mananaysiempre 2023-08-31 10:08 UTC link

> In brief, every time an SSD updates a single byte anywhere on disk, it needs to erase and re-write that entire page.

Is that actually true for SSDs? For raw flash it’s not, provided you are overwriting “empty” all-ones values or otherwise only changing 1s to 0s. Writing is orders of magnitude slower than reading, but still a couple orders of magnitude faster than erasing (resetting back to “empty”), and only erases count against your wear budget. It sounds like an own goal for an SSD controller to not take advantage of that, although if the actual guts of it are log-structured then I could imagine it not being able to.
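
The raw-flash behaviour described in this comment can be sketched as a toy model in Python. This is only an illustration of the "programming clears bits, erasing sets them" rule, not the logic of any real controller; the helper names `program`, `can_program_in_place` and `erase` are hypothetical.

```python
ERASED = 0xFF  # an erased flash byte reads as all ones


def program(cell: int, value: int) -> int:
    """Programming can only clear bits (1 -> 0), so the stored result is a bitwise AND."""
    return cell & value


def can_program_in_place(cell: int, value: int) -> bool:
    """True if `value` can be reached from `cell` without an erase (no 0 -> 1 flips)."""
    return (cell & value) == value


def erase(page: bytearray) -> None:
    """Erasing resets every byte of the page back to all ones."""
    for i in range(len(page)):
        page[i] = ERASED
```

For example, `can_program_in_place(0b10100101, 0b10100001)` is true because only one bit is cleared, while `can_program_in_place(0b10100101, 0b11100101)` is false because one bit would have to go from 0 back to 1, which needs an erase first.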

csours 2023-08-31 11:32 UTC link

I took a start script from 90 seconds to 30 seconds yesterday, by finding a poorly named timeout value. Now I'm working on a graceful fallback from itimer to alarm instead of outdated c directives.

anyfactor 2023-08-31 11:33 UTC link

Oh thank you. I have been doing a hobby project on search engines, and I kept searching for variations of "Magnolia" for some reason. "Marginalia", at least for me, is hard to remember. Currently, I am trying to figure my way around Searx.

Does Marginalia support "time filters" for search like past day, past week etc? According to the special keywords, the only search params accepted are based on years.

  year>2005 (beta) The document was ostensibly published in or after 2005
  year=2005 (beta) The document was ostensibly published in 2005
  year<2005 (beta) The document was ostensibly published in or before 2005
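
A minimal sketch of how one of these year keywords might be combined with a query, in Python. The base URL and the `query` parameter are taken from the search link posted earlier in the thread; appending the keyword to the query text with a space is an assumption, and `search_url` is a hypothetical helper, not part of any Marginalia API.

```python
from urllib.parse import urlencode

# Base URL as it appears in the search link posted earlier in the thread.
SEARCH_URL = "https://search.marginalia.nu/search"


def search_url(terms: str, year_filter: str = "") -> str:
    """Build a search URL; the year keyword is appended to the query text (assumption)."""
    query = f"{terms} {year_filter}".strip()
    return f"{SEARCH_URL}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(search_url("encoding/json"))
    print(search_url("encoding/json", "year>2005"))
```

The `>` and `/` characters are percent-encoded by `urlencode`, so the filter survives being placed in a URL.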

_madmax_ 2023-08-31 13:44 UTC link

I'm happy for you !

jorgeleo 2023-08-31 13:49 UTC link

This baffled me:

"I wish I knew what happened, or how to replicate it. It’s involved more upfront design than I normally do, by necessity. I like feeling my way forward in general, but there are problems where the approach just doesn’t work"

Yes, immediate (or soon enough) gratification feels good... To me, and maybe it is because I am an old fart, this is the difference between programming and engineering.

ricardo81 2023-08-31 14:33 UTC link

Just a shout out to my boss at Mojeek who presumably has a very similar path to this (the post resonates a lot with past conversations). Mojeek started back in 2004 and for the most part has been a single developer who built the bones of it, and in that, pretty much all of the IR and infrastructure.

Limitations of finance and hardware, making decisions about 32 vs 64 bit ids, sharding, speed of updating all sound very familiar.

Reminds me of Google way back when and their 'Google dance' that updated results once a month, nowadays it's a daily flux. It's all an evolution, and great to see Marginalia offering another view point into the web beyond big tech.

aidenn0 2023-08-31 18:44 UTC link

Great to read this!

Lots of people treat optimization as some deep-black-magic thing[1], but most of the time, it's actually easier than fixing a typical bug; all you have to do is treat excessive resource usage identical to how you would treat a bug.

I'm going to make an assertion: most bugs that you can easily reproduce don't require wizardry to fix. If you can poke at a bug, then you can usually categorize it. Even the rare bugs that reveal a design flaw tend to do so readily once you can reproduce it.

Software that nobody has taken a critical eye to performance on is like software with 100s of easily reproducible bugs that nobody has ever debugged. You can chip away at them for quite a while until you run into anything that is hard.

1: I think this attitude is a bit of a hold-out from when people would do things like set their branch targets so that the drum head would reach the target at the same time the CPU wanted the instruction, and when resources were so constrained that everything was hand-written assembly with global memory-locations having different semantics depending on the stage the program was in. In that case, really smart people had already taken a critical eye to performance, so you need to find things they haven't found yet. This is rarely true of modern code.

alberth 2023-09-01 21:28 UTC link

TL;DR - SQLite is amazing even at scale.

brutusborn 2023-08-31 03:26 UTC link

Not useless at all, thanks for posting!

I’ve been struggling with sleep this year and finding out what works for others is very useful. I wouldn’t have found it if not for your comment.

Link for others interested: https://www.marginalia.nu/log/86-sleep/

noman-land 2023-08-31 03:31 UTC link

I like this term, engildification.

lbotos 2023-08-31 03:35 UTC link

How does one learn new things if not first by understanding them, and then looking to evolve them?

Sure, we can shell out to libraries and to other people's work, but at some point, you will have to understand the thing that you've abstracted away if you want to evolve it.

Or as the kids say, let him cook: https://knowyourmeme.com/memes/let-him-cook-let-that-boy-coo...

catchnear4321 2023-08-31 03:36 UTC link

someone climbed everest. others tried and didn’t make it. some didn’t make it back. people keep doing it.

plenty of folks dabble in art, many of which quite poorly, when good and even great artwork can be purchased for not all that much.

some paint the walls of their house rather than call a professional painter.

there are countless reasons, including but not limited to the easiest, most flippant, and possibly the most human response to your question.

why not?

BLKNSLVR 2023-08-31 03:40 UTC link

I want to do my own version of something like this to have a personally curated search function. The "it's mine" factor is enticing, if it does something unexpected, then I know all the dependent, interacting parts so I can trace the problem and fix it.

But I'm a privacy and self-hosting nut, which is probably just another way of saying the same thing.

(I will probably never actually do it, but that doesn't stop it being on the list).

keyle 2023-08-31 03:43 UTC link

It's a breath of fresh air to read of someone that

- cut his resources burning in half,

- is more productive with a smaller screen than before, and

- sleeps like a log at night

(his 3 last blog posts!)

EdwardDiego 2023-08-31 04:06 UTC link

Because you enjoy it.

Because you're able to offer a product with differences in the market.

Because you can.

It's not like they're implementing their own DB to get to a MVP of their product "Tinder, but like for dogs".

algas 2023-08-31 04:12 UTC link

Google still kills for useful applications. I use marginalia when I have either A) something deeply obscure I want to research or B) when I just want to read some entertaining long-form text posts about something in a category that interests me.

Stephen_0xFF 2023-08-31 04:15 UTC link

“So it’s a search engine. It’s perhaps not the greatest at finding what you already knew was there. Instead it is designed to help you find some things you didn’t even know you were looking for.”[0]

[0] https://www.marginalia.nu/marginalia-search/about/

gary_0 2023-08-31 04:38 UTC link

I also like how they decided to mix in sqlite alongside the existing MariaDB database because it gets the job done, and "a foolish consistency is the hobgoblin of little minds".

MichaelZuo 2023-08-31 05:19 UTC link

American Airlines ran SABRE, a sizeable airline ticketing and reservation system, in the mid-1970s on two system/360 mainframes that could only process a few tens of millions of instructions per second.

A raspberry pi 2 can do over 4 billion Dhrystone instructions per second, and a pi 4 over 10 billion per second.

Of course by modern standards mid-1970s SABRE was pretty barebones for an airline's main system, but it's at least theoretically possible to run simplified systems for over 100 airlines simultaneously on a single pi 2...

So yes modern programs are very far from optimized. 1000x or 10 000x improvements are possible, less for math heavy stuff.
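
The arithmetic behind the "over 100 airlines" estimate can be checked with a few lines of Python. The Raspberry Pi figures are the lower bounds quoted in the comment; the SABRE figure of 30 million instructions per second is an assumption, picked as one reading of "a few tens of millions".

```python
# Assumption: "a few tens of millions" is read as 30 million instructions per second.
SABRE_IPS = 30_000_000
# Lower bounds quoted in the comment (Dhrystone instructions per second).
PI2_IPS = 4_000_000_000
PI4_IPS = 10_000_000_000


def speedup(modern_ips: int, old_ips: int) -> int:
    """Whole-number ratio of modern to historical instruction throughput."""
    return modern_ips // old_ips


if __name__ == "__main__":
    print(speedup(PI2_IPS, SABRE_IPS))  # ratio for a pi 2
    print(speedup(PI4_IPS, SABRE_IPS))  # ratio for a pi 4
```

Under that assumption the pi 2 ratio comes out at 133, which is consistent with the comment's estimate; a different reading of "a few tens of millions" shifts the number but not the order of magnitude.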

nottheengineer 2023-08-31 05:41 UTC link

Good point about not hitting limits individually.

I think Microsoft has a huge problem with this. Even $3000 laptops from 5 years ago struggle with running a Teams call, some Office instances and a browser with 30 tabs at the same time without slowing down to unacceptable levels.

They test stuff individually and running one thing alone is fine, but that's not what people do.

I'd imagine that artificial limits in the form of run time on well-defined hardware that are only raised after an explicit decision could be the solution to this.

But then again I only write business software where the performance aspect comes down to "don't do stupid shit with the database and don't worry about the rest because the client won't pay for those worries", so I might be on the wrong track entirely.

pjerem 2023-08-31 06:14 UTC link

Oh yes ! That’s my pet theory too.

I think it’s why old computers felt good and also why old games were so good.

Maybe it has something to do with the complexity of the systems we deal with.

When you have a restricted amount of some resource (RAM, physical space, food, materials, time, money …) you have to plan how you will use it. You are forced to be smart.

When you have a virtually infinite resource, you can make whatever you feel like making but you don’t have to really care about the final state, you just start and you’ll see when it will work.

I’m not exactly a true gamer, but I’ve always been amazed by the fact that humans were capable of storing so much emotion, adventures and time to enjoy in the good old cartridges with some kb/mb of rom. I mean, Ocarina Of Time rom is just the size of the last 8 photos I took with my iPhone.

crote 2023-08-31 06:23 UTC link

It is mostly a matter of priorities.

For most applications it simply does not make any sense to spend this much time on relatively small optimizations. If you can choose to either buy 32GiB of RAM for your server for less than $50 or spend probably over 40 hours of developer time at at least $20 / hour, it is quite obvious which one makes more sense from a business perspective. Not to mention that the website was offline for an entire week - that alone would've killed most businesses!

A lot of tech people really like doing such deep dives and would happily spend years micro-optimizing even the most trivial code, but endless "yak shaving" isn't going to pay any bills. When the code runs on a trivial number of machines, it probably just isn't worth it. Not to mention that such optimizations often end up in code which is more difficult to maintain.

In my opinion, a lot of "software bloat" we see these days for apps running on user machines comes from a mismatch between the developer machine and the user machine. The developer is often equipped with a high-end workstation as they simply need those resources to do their job, but they end up using the same machine to do basic testing. On the other hand, the user is running it on a five-year-old machine which was at best mid-range when they bought it.

You can't really sell "we can save 150MB of memory" to your manager, but you can sell "saving 150MB of memory will make our app's performance go from terrible to borderline for 10% of users".

marginalia_nu 2023-08-31 07:15 UTC link

Yeah this aligns with my view. Limitations breed ingenuity, and that isn't limited to demo scene outputs. You're going to run into scaling problems sooner or later, and they're a lot easier to deal with early than late. If your software runs well on a raspberry pi[1], it's going to be absurdly performant on a real server.

It's actually how we used to build software. It's why we could have an entire operating system perform well on a machine like a Pentium 1 with most of what you'd expect today, etc. while at the same time we have web pages that struggle to scroll smoothly on a smartphone with literally a thousand times more resources across all axes. The Word 95 team were constantly faced with limits and performance tradeoffs, and it very clearly worked or did not.

If I had just gone and added more RAM (or whatever), I would still have been stuck with an inferior design, and soon enough I would need to buy even more RAM. The crazy part about this change is that it isn't just reducing the resource utilization, it's actually making the system more capable, and faster because free RAM means more disk caching.

[1] e.g. this runs on a single pi, and is much faster than production wikipedia because it doesn't permit updates: https://encyclopedia.marginalia.nu/article/Hacker_News

marginalia_nu 2023-08-31 07:18 UTC link

Could have, yeah. Could also just have created a temporary table in mariadb.

SQLite has the benefit that it's a single file though, and you can do cool things with that. Such as copy it, share it, etc.
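
The "single file, read by id" point can be sketched with Python's standard `sqlite3` module. This is an illustration only: the `doc` table, its columns and the file names are hypothetical and are not Marginalia's actual schema.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def build_db(path: Path) -> None:
    """Create a tiny id -> url table in a single SQLite file (hypothetical schema)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO doc (id, url) VALUES (?, ?)",
            [(1, "https://example.com/a"), (2, "https://example.com/b")],
        )
        conn.commit()
    finally:
        conn.close()


def url_for(path: Path, doc_id: int):
    """Look a row up by primary key; returns None if the id is unknown."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute("SELECT url FROM doc WHERE id = ?", (doc_id,)).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def demo() -> str:
    """Build the database, copy the file, and query the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "docs.db"
        copy = Path(tmp) / "docs-copy.db"
        build_db(original)
        shutil.copy(original, copy)  # the whole database is just one file
        return url_for(copy, 2)
```

Because the database is an ordinary file, copying or sharing it needs nothing beyond a file copy, which is the property the comment highlights.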

marginalia_nu 2023-08-31 07:18 UTC link

If all you want is the same results as Google I say you should just use Google. I'm not trying to compete with them, I give the results you won't find on the big G.

marginalia_nu 2023-08-31 07:22 UTC link

Most of what exists doesn't work for my application. It either assumes an unbounded resource budget, or makes different priorities that don't scale by e.g. permitting arbitrary realtime updates.

I'm building stuff myself because it's the only way I'm aware of to run a search engine capable of indexing a quarter of a billion documents on a PC.

marginalia_nu 2023-08-31 08:19 UTC link

Aww shucks.

sph 2023-08-31 09:17 UTC link

It's a function of scale: the larger the team/company behind the product, the greater its enshittification factor/potential.

The author recently went full time on their Marginalia search engine. AFAIK it's a team size of 1, so it's the farthest away from any enshittification risk. Au contraire, like you say: it's at these sizes where you make jewels. Where creativity, ingenuity and vision shine.

This comment is sponsored by the "quit your desk job and go work for yourself" gang.

marginalia_nu 2023-08-31 10:14 UTC link

In this scenario I was basically rewriting the entire drive in a completely random order, which is the worst case scenario for an SSD.

Normally the controller will use a whole bunch of tricks (e.g. overprovisioning, buffering and reordering of writes) to avoid this type of worst case pattern, but that only goes so far.

uoaei 2023-08-31 10:14 UTC link

Did you try searching in Russian? Is it reasonable to expect English-language sources?

Filligree 2023-08-31 10:16 UTC link

> Is that actually true for SSDs?

It's completely false. Even the most primitive SSD controllers would make some attempt at mitigating this.

marginalia_nu 2023-08-31 10:24 UTC link

This is a keyword based search engine. It shows you pages with the keywords you typed.

Though I strongly disagree with Putin's illegal war of aggression, I don't censor Russian websites. You'll find Russian truth on there as well: https://search.marginalia.nu/site/english.pravda.ru

lelanthran 2023-08-31 10:31 UTC link

>> In brief, every time an SSD updates a single byte anywhere on disk, it needs to erase and re-write that entire page.

> Is that actually true for SSDs? For raw flash it’s not, provided you are overwriting “empty” all-ones values or otherwise only changing 1s to 0s.

Maybe it depends. I wrote the driver for more than one popular flash chip (don't remember which ones now, but that employer had a policy of never using components that were not mainstream and available from multiple suppliers) and all the chips I dealt with did read and write exclusively via fixed-size pages.

Since SSDs are a collection of chips, I'd expect each chip on the SSD to only support fixed-size paged IO.

mikehollinger 2023-08-31 11:17 UTC link

> Is that actually true for SSDs?

Not precisely. The logical view of a page living at some address of flash is not the reality. Pages get moved around the physical device as writes happen. The drive itself maintains a map of what addresses are used for what purpose, their health and so on. It’s a sparse storage scheme.

There’s even maintenance ops and garbage collection that happens occasionally or on command (like a TRIM).

In reality a “write” to a non-full drive is: 1. Figure out which page the data goes to. 2. Figure out if there’s data there or not. Read / modify / write if needed. 3. Figure out where to write the data. 4. Write the data. It might not go back where it started. In fact it probably won’t because of wear leveling.

You’re right that the controller does a far more complex set of steps for performance. That’s why an empty / new drive performs better for a while (page cache aside) then literally slows down compared to a “full” drive that’s old, with no spare pages.

Source: I was chief engineer for a cache-coherent memory mapped flash accelerator. We let a user map the drive very very efficiently in user space Linux, but eventually caved to the “easier” programming model of just being another hard drive after a while.

marginalia_nu 2023-08-31 11:54 UTC link

The search index isn't updated more than once every month, so no such filters. The year-filter is pretty rough too. It's very hard to accurately date most webpages.

gavinray 2023-08-31 12:46 UTC link

I was under the following impressions:

1. Writable Unit: The smallest unit you can write to in an SSD is a page.

2. Erasable Unit: The smallest unit you can erase in an SSD is a block, which consists of multiple pages.

So if a write operation impacts only 1 byte within a page, the SSD cannot erase just that byte. However, it does not need to erase the entire block either.

The SSD can perform a "read-modify-write" type of operation:

- Read the full page containing the byte that needs to change into the SSD's cache buffer.

- Modify just the byte that needs updating in the page cache.

- Erase a new empty block.

- Write the modified page from cache to the new block.

- Update the FTL mapping tables to point to the updated page in the new block.

So, a page does need to be rewritten even if just 1 byte changes. Whole-block erasure is avoided until many pages within it need to be modified.
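
The read-modify-write steps listed in this comment can be sketched as a toy flash translation layer in Python. This is a simplified model under stated assumptions, not how any particular SSD works: `ToyFTL` and its fields are hypothetical, free pages are assumed to be already erased, and block-level erasure and garbage collection are left out.

```python
class ToyFTL:
    """Toy logical-to-physical page map illustrating read-modify-write."""

    def __init__(self, num_pages: int, page_size: int = 4) -> None:
        self.page_size = page_size
        # Every physical page starts out erased (all ones).
        self.physical = [bytes([0xFF] * page_size) for _ in range(num_pages)]
        self.free = list(range(num_pages))  # physical pages available for writing
        self.stale = set()                  # physical pages holding outdated data
        self.mapping = {}                   # logical page -> physical page

    def write_byte(self, logical_page: int, offset: int, value: int) -> int:
        # 1. Read the full page holding the byte into a buffer.
        old = self.mapping.get(logical_page)
        if old is None:
            buf = bytearray([0xFF] * self.page_size)
        else:
            buf = bytearray(self.physical[old])
        # 2. Modify just the one byte in the buffer.
        buf[offset] = value
        # 3. Write the whole modified page to a fresh physical page.
        new = self.free.pop(0)
        self.physical[new] = bytes(buf)
        # 4. Update the mapping; the old copy becomes stale until it is erased.
        self.mapping[logical_page] = new
        if old is not None:
            self.stale.add(old)
        return new

    def read(self, logical_page: int) -> bytes:
        return self.physical[self.mapping[logical_page]]
```

Changing a single byte twice in the same logical page consumes two physical pages and leaves one stale copy behind, which is why a full page is rewritten even for a one-byte change while the erase itself can be deferred.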

not_your_vase 2023-08-31 15:11 UTC link

  > Engildification. Of which there should be more!

There are. You will just never find them with Google.

meithecatte 2023-08-31 15:53 UTC link

It's the search engine for the niche stuff. Marginal stuff, if you will. The name makes sense to me.

marginalia_nu 2023-08-31 19:10 UTC link

I agree in general, but I think bugs are a lot easier to track down with divide and conquer strategies. If you're able to reproduce the bug by sending request X to service Y, gradually shrink the test case down until you've found the culprit.

Optimization is often an architectural problem. Sure there are cases where you're copying a thing where you could recycle a buffer, but you run out of those fairly quickly, and a profiler will tell you what you need to know.

A lot of the big performance wins are in changing the entire data logistics, possibly eliminating significant portions of the flow until the code does what it needs to in as few steps as possible.
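
The divide-and-conquer approach to shrinking a failing test case mentioned in this comment can be sketched in Python. It is a minimal sketch that assumes exactly one culprit and a failure that reproduces whenever the culprit is present; `find_culprit` and the `fails` callback are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(items: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> T:
    """Halve the input repeatedly until a single failing item remains.

    Assumes exactly one item triggers the failure and that any subset
    containing it still fails.
    """
    if not fails(items):
        raise ValueError("the full input does not reproduce the failure")
    lo, hi = 0, len(items)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[lo:mid]):
            hi = mid  # the culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return items[lo]
```

For example, with `fails = lambda subset: "bad" in subset`, calling `find_culprit(["a", "b", "c", "bad", "e"], fails)` narrows the input down to `"bad"` in a handful of checks instead of testing every item.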

Editorial Channel

What the content says

Preamble: Not addressed
Article 1 Freedom, Equality, Brotherhood: Not addressed
Article 2 Non-Discrimination: Not addressed
Article 3 Life, Liberty, Security: Not addressed
Article 4 No Slavery: Not addressed
Article 5 No Torture: Not addressed
Article 6 Legal Personhood: Not addressed
Article 7 Equality Before Law: Not addressed
Article 8 Right to Remedy: Not addressed
Article 9 No Arbitrary Detention: Not addressed
Article 10 Fair Hearing: Not addressed
Article 11 Presumption of Innocence: Not addressed
Article 12 Privacy: Not addressed
Article 13 Freedom of Movement: Not addressed
Article 14 Asylum: Not addressed
Article 15 Nationality: Not addressed
Article 16 Marriage & Family: Not addressed
Article 17 Property: Not addressed
Article 18 Freedom of Thought: Not addressed

Article 19 Freedom of Expression

Medium Coverage Practice

Not addressed. Blog post does not explicitly discuss freedom of opinion, expression, or information access.

Article 20 Assembly & Association: Not addressed
Article 21 Political Participation: Not addressed
Article 22 Social Security: Not addressed
Article 23 Work & Equal Pay: Not addressed
Article 24 Rest & Leisure: Not addressed
Article 25 Standard of Living: Not addressed
Article 26 Education: Not addressed
Article 27 Cultural Participation: Not addressed
Article 28 Social & International Order: Not addressed
Article 29 Duties to Community: Not addressed
Article 30 No Destruction of Rights: Not addressed

Structural Channel

What the site does

+0.10

Article 19 Freedom of Expression

Medium Coverage Practice

Structural

+0.10

Context Modifier

SETL

The site maintains an independent search engine and implements optimization choices (reduced RAM requirements, support for low-powered hardware operation) that structurally support decentralized information infrastructure. This has a mild positive inclination toward enabling broader access to information resources.

Preamble: Not addressed
Article 1 Freedom, Equality, Brotherhood: Not addressed
Article 2 Non-Discrimination: Not addressed
Article 3 Life, Liberty, Security: Not addressed
Article 4 No Slavery: Not addressed
Article 5 No Torture: Not addressed
Article 6 Legal Personhood: Not addressed
Article 7 Equality Before Law: Not addressed
Article 8 Right to Remedy: Not addressed
Article 9 No Arbitrary Detention: Not addressed
Article 10 Fair Hearing: Not addressed
Article 11 Presumption of Innocence: Not addressed
Article 12 Privacy: Not addressed
Article 13 Freedom of Movement: Not addressed
Article 14 Asylum: Not addressed
Article 15 Nationality: Not addressed
Article 16 Marriage & Family: Not addressed
Article 17 Property: Not addressed
Article 18 Freedom of Thought: Not addressed
Article 20 Assembly & Association: Not addressed
Article 21 Political Participation: Not addressed
Article 22 Social Security: Not addressed
Article 23 Work & Equal Pay: Not addressed
Article 24 Rest & Leisure: Not addressed
Article 25 Standard of Living: Not addressed
Article 26 Education: Not addressed
Article 27 Cultural Participation: Not addressed
Article 28 Social & International Order: Not addressed
Article 29 Duties to Community: Not addressed
Article 30 No Destruction of Rights: Not addressed

Supplementary Signals

Epistemic Quality: 0.74
Propaganda Flags: 0 techniques detected
Solution Orientation: No data
Emotional Tone: No data
Stakeholder Voice: No data
Temporal Framing: No data
Geographic Scope: No data
Complexity: No data
Transparency: No data

Event Timeline 6 events

2026-02-26 12:19	dlq	Dead-lettered after 1 attempts: Absurd Success	- -
2026-02-26 12:17	rate_limit	OpenRouter rate limited (429) model=llama-3.3-70b	- -
2026-02-26 12:16	rate_limit	OpenRouter rate limited (429) model=llama-3.3-70b	- -
2026-02-26 12:15	rate_limit	OpenRouter rate limited (429) model=llama-3.3-70b	- -
2026-02-26 09:30	dlq	Dead-lettered after 1 attempts: Absurd Success	- -
2026-02-26 09:19	credit_exhausted	Credit balance too low, retrying in 280s	- -

build 1686d6e+53hr · deployed 2026-02-26 10:15 UTC · evaluated 2026-02-26 12:13:57 UTC