The hunt for the missing data type (www.hillelwayne.com)
690 points by todsacerdoti 725 days ago | 254 comments
Summary
A technical essay examining why programming languages lack built-in graph data types, based on interviews with four domain experts.
HN Discussion 20 top-level · 30 replies
obi1kenobi 2024-03-04 17:01 UTC link
As someone who did a lot of work with graphs, "why don't programming languages have a built-in graph data type?" is a question I’ve been asked a million times.

I'm thrilled I'll be able to point folks to a much more in-depth analysis like the one here, instead of saying some variation of "it's really hard to do it well" and having them just take my word for it.

rb-2 2024-03-04 17:07 UTC link
I wonder if it would be possible to mathematically define (in a theorem proving language like Coq) a bunch of accessor methods as well as a bunch of implementation primitives and then "compile" a custom graph implementation with whatever properties you need for your application. Some accessor methods will be very efficient for some implementations and very inefficient for others, but every method will still be available for every implementation. Profiling your application performance can help adjust the implementation "compiler" settings.

Ironically, this is a graph problem.

andsens 2024-03-04 17:11 UTC link
Devil's advocate: Is this maybe a case of discarding an 80% solution because you can’t do the last 20%?

I understand the constraints, but imagine how legible you could make code by replacing some key parts with a graph type that everybody knows. I honestly think that having a type that supports a small subset of possibilities and only has the simplest algorithms implemented would go a long way.

montmorency88 2024-03-04 17:19 UTC link
I'd highly recommend Erwig's FGL library in Haskell as a really nice example of a generally performant graph data structure that is easy to work with. The API feels like working with lists, because you are essentially consing contexts (node, neighbours) onto a list of contexts that form your graph. Many standard graph algorithms are then built up from depth- or breadth-first search, and you can compose really succinct programs to manipulate the graph. Graphs are labelled with any Haskell data structure, and there is a graphviz library complementary to FGL to generate dot files, which can be rendered according to the data carried in the node labels. Often in an application you want to both perform computations on your graph and render a visualization to the end user or for debugging, and with FGL you minimise duplication of application and rendering logic.
gue5t 2024-03-04 17:26 UTC link
One interesting perspective is to view the sequence of lists -> trees -> DAGs -> general graphs as a loosening of restrictions:

- list nodes may have at most one child

- tree nodes may have multiple

- DAG nodes may have multiple parents though restricted by topological ordering

- graph nodes may have multiple parents from anywhere in the collection

Lists and trees can be fully captured by sum and product types, but extending this representation style to DAGs and graphs doesn't work--you either get inefficiency (for DAGs) and then infinite regress (for cyclic graphs) attempting to continue the "syntactic" style of representation, or you need to adopt an "indirect" representation based on identifiers or indices or hash consing.
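As a rough Python illustration of the two styles (the names here are my own, not from the comment): a tree nests directly like a sum/product type, while a cyclic graph has to fall back on identifiers.

```python
from dataclasses import dataclass, field

# "Syntactic" style: a tree nests directly, like a recursive sum/product type.
@dataclass
class Tree:
    value: int
    children: list["Tree"] = field(default_factory=list)

t = Tree(1, [Tree(2), Tree(3, [Tree(4)])])

# "Indirect" style: a cyclic graph cannot nest finitely, so nodes become
# identifiers and the edges live in an adjacency mapping.
graph: dict[str, list[str]] = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],  # the cycle makes a finite nested literal impossible
}
```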

taeric 2024-03-04 17:26 UTC link
A graph, as presented in this article, is a model of something else. That we have more than one way to implement this model is rather natural, all told. The hope that we can have a singular syntax and data structure that represents a graph in code is almost exactly why the author of Java's Linked List posted this gem: https://twitter.com/joshbloch/status/583813919019573248

My favorite point is that having a linked list where the node is first class in your code is almost precisely the problem: you rarely want/need to work at that level. In a very real sense, objects that hold other objects are already trees of data. Many can back-reference, and then you have a graph.

And then there is the joy of trying to use matrix operations to work with graphs. You can do some powerful things, but at that point, you almost certainly want the matrix to be the abstraction.

Excited to see someone come up with good things in this idea. I retain very serious doubts that I want a singular model for my data.

dustingetz 2024-03-04 17:27 UTC link
Electric Clojure uses Clojure itself (s-expressions) as a graph authoring syntax, using a macro to reify dataflow through a reactive client/server system (here the use case is full stack user interfaces but the idea generalizes) https://github.com/hyperfiddle/electric (I'm the founder).

IMO, the answer to the question "Where are all the graph types?" is: the graph authoring DSL needs to express scope, control flow and abstraction, which essentially makes it isomorphic to a programming language, freed of its evaluation model. In Python and Typescript, embedding a complete programming language is something that's rather hard to do!

Also see my blog post "Four problems preventing visual flowchart programming from expressing web applications" https://www.dustingetz.com/#/page/four%20problems%20preventi...

hobofan 2024-03-04 17:51 UTC link
I think the post mostly answers the question "why are graph _algorithms_ not better supported in programming languages?", with a focus much more on "big data" graph processing than on graph support in general.

I think if you look at graph support in general you are also looking at wider questions, like "why are OGMs (Object Graph Mappers) not as popular as ORMs" and "why is JSON so prevalent while RDF (or another low-level graph serialization) isn't"?

And I think in the end it comes down to historic reasons (RDF emerged a bit too early, never evolved, and accrued an ecosystem of horrible academic standards and implementations), and to graphs having just a smidge more inherent complexity in implementation and learning curve, which just doesn't scale well across many developers.

------

I also wouldn't put too much weight on the "Graph Querying Language" part of the article. It sadly reads like exactly the marketing copy you would read from Neo4J or SPARQL enthusiasts that haven't tried building a product on top of it.

> The main difference between all GQLs and SQL is that the “joins” (relationships) are first-class entities.

Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)

If you also go down to the lower levels of any graph query language and look at its query plans, you'll notice that there isn't any meaningful difference from those you'll find for an SQL-based query. The standardization of GQL[0] as an SQL extension is evidence for that.

> In SPARQL relationships are just edges, making the same query easy.

SPARQL is easy if you need to do exact path traversals. If you try to do anything sophisticated with it (like you would in the backend of a webapp), you'll quickly run into footguns like joins with unbound values, where you accidentally join your whole result set away.

[0]: https://en.wikipedia.org/wiki/Graph_Query_Language

criloz2 2024-03-04 18:39 UTC link
Graph drawing tools are also very underwhelming. They work pretty well for small graphs, but once you have something like 500 nodes or more, their output becomes completely incomprehensible or very difficult to look at. They miss the ability to automatically organize those graphs into hierarchical structures and to provide a nice interface for exploring them; we are used to everything around us having some kind of hierarchy. I think that is the same kind of problem that will need to be solved in order to have a generic graph data type. This kind of thing will also need to be implemented at the compiler level, where the generic graph algorithms are adapted to the generated hierarchy of structures; and if you add a theorem prover that can check that a certain subgraph will always have a certain structure, those procedures can be generated statically, while for the other graphs those methods are generated dynamically at runtime.

So whoever solve the problem for generic graph drawing will have the ability or the insight to implement this too.

rhelz 2024-03-04 18:51 UTC link
Ya, the central obstacle is that:

1. for simple and small graph problems, a simple vector-of-vectors adjacency list is easy enough to code up.

2. For complex and huge graph problems, the only way to get performant solutions is to tailor the graph implementation to the specific details of the problem to be solved.

And it's hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc. was the best way to implement it. That's the kind of optimization we won't see in compilers for a while.
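For point 1, a minimal sketch (illustrative, not from any particular library) of a vector-of-vectors adjacency list with a breadth-first traversal on top of it:

```python
from collections import deque

# Adjacency list: adj[u] is the list of neighbours of node u.
adj = [
    [1, 2],  # node 0
    [2],     # node 1
    [0, 3],  # node 2
    [],      # node 3
]

def bfs_order(adj, start):
    """Visit nodes reachable from `start` in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order
```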

It's another instance of the phenomenon Stroustrup noticed: we are really good at sharing code for small things like vectors, and for large things like operating systems. It's the middle-sized problems we are bad at.

graphviz 2024-03-04 19:04 UTC link
Graphviz has its own foundation graph library, that's not used by any other project. It has its good and bad points.

Based on that experience, we had our very own second-system-syndrome experience.

We decided our graph library should be modular, type safe, and efficient. (These properties came up in the comments here, too.) This is probably just a variation of "good, fast, cheap - pick any two."

By modular, we mean we want to write collections of graph algorithm libraries that are developed and even compiled independently.

By type safe, we mean we want to detect programming errors during compilation, or link time at the latest. We don't want programs to throw runtime errors like "your node does not have a color attribute".

By efficient, we mean that accessing an attribute of a graph is as cheap as accessing a field in a C struct. (So we don't want to carry around an external hash table, or do a lot of string conversions, for instance.)

You can argue about whether these things are worth the price or even make sense, but that's what we wanted. We had some famous C++ creators in our lab, so we figured we could get help, and we were willing to give C++ another chance.

Gordon Woodhull, who had been an intern and kept working for us, is a brilliant programmer, and wrote an implementation of this kind of graph library working in templated C++. It's even published at https://www.dynagraph.org/ via sourceforge. The rest of us were not really sure we could ever understand how the code worked, so we had a code review with said famous C++ inventors. There were a lot of screens of code, and chinstroking, and greybeards pronounced "That would probably work." We knew we might have gone over the complexity cliff. (Let's not even talk about compile-time template errors, where one error fills an entire screen with details that only a... C++ inventor could love.) It was our fault, not anyone else's, and Gordon kept plugging away and even made all the dynamic graph layout stuff work, in Microsoft OLE, too. In hindsight it was probably our own Project Xanadu. While we got lost in this, a lot of things like Gephi (Java) and NetworkX and NetworKit (python) happened.

Also, John Ellson, a very talented software engineer who had written parts of Graphviz, revitalized the main effort.

skrebbel 2024-03-04 19:45 UTC link
I think most languages support representing many kinds of graphs very well, particularly directed graphs without data on the edges: with objects, lists, and object references (or pointers).

A tree is a graph. A typical Java-style object composing other objects composing other objects again, etc etc, often with cycles and parent backreferences and whatnot, is a graph. The html DOM is a graph.

I recognize that these are often very tree-like, which feels like cheating in the same way as saying “well, a list is also a graph!” is. But given that cycles are common enough that serializers (e.g. JSON.stringify) need to special-case them, I think maybe this is simply not true, and they’re really just graphs. Very few tree-like class structures tend to remain pure trees.

The only thing missing from references/pointers to be able to represent what the author is looking for is having data on the edges. I think this is trivially solvable by putting nodes halfway along the edge (= adding a level of indirection, an operation so common that we don’t even think of it as “adding data to the edges”).

So I think the answer is that there’s no explicit data structure named “graph” because the basic building block of composition in nearly every language (reference/pointer) is an edge, and the basic building block of data representation (objects/structs/records) is a node. So for most graphs, trying to pour it all into some fancy Graph<V, E> datastructure feels like needless complexity.
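One way to picture the "node halfway along the edge" trick: insert a small object on each edge to carry the edge data, built from nothing but plain objects and references. A hypothetical sketch (all names are my own):

```python
from dataclasses import dataclass, field

@dataclass
class City:
    name: str
    roads: list["Road"] = field(default_factory=list)

# The node halfway along the edge: a Road object carries the edge data
# (here, a distance) plus an ordinary reference to the destination.
@dataclass
class Road:
    distance_km: float
    to: City

a, b = City("A"), City("B")
a.roads.append(Road(42.0, b))  # an edge A -> B with data on it
```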

rdtsc 2024-03-04 20:20 UTC link
> There’s a gap between how often software engineers could use graphs and how little our programming ecosystems support them. Where are all the graph types?

They've been there for quite a while :-) https://www.erlang.org/doc/man/digraph.html https://www.erlang.org/doc/man/digraph_utils

And if you want to do some set theoretical stuff you're covered as well: https://www.erlang.org/doc/man/sofs.html

jkaptur 2024-03-04 20:23 UTC link
I've often wondered about a missing application: "Excel for graphs".

Just like Excel for tabular data, it would support RAM-sized data (enough to require a computer, but not so much that you need a data center), implement lots of algorithms and visualizations "well enough", and require no programming skill to operate.

As the article says, a lot of our real-world problems are graph problems - why are programmers the only ones who should have the tools to solve them?

ylow 2024-03-04 20:53 UTC link
I think this is because a graph is not a data-structure nor a data-type. It is really an abstraction.

Fundamentally, all I need to define a graph is a set of vertices v \in V and a function Neighbors(v). And that really is all that is needed for the most foundational set of graph algorithms.

Everything else is case-by-case constraints. Does A->B imply B->A? Is the node set partitionable with certain constraints? Are there colors? Labels?

To make things even more general I can go up one level and consider the hypergraph. In which case I just have a set of vertices, and a set of sets of vertices. This can be represented in a myriad of different ways depending on what you are interested in. Of which (non-hyper) graph is simply a special case.

An alternative way to think about it, perhaps from the database perspective, is that it's a query optimization and indexing problem. Depending on what questions you want to ask about the graph, there will be different ways to represent it that answer the question better. Just as there is not one way to represent the abstraction called "Table", there is not one way to do "Graph" either. It really depends on the questions you care about.
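The "set of vertices plus Neighbors(v)" framing really is enough for the foundational algorithms. A minimal sketch (names are illustrative, not from any particular library):

```python
def reachable(neighbors, start):
    """All vertices reachable from `start`, given nothing but Neighbors(v)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in neighbors(v):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

# The same function works whether Neighbors is backed by a dict...
adj = {"a": ["b"], "b": ["c"], "c": [], "d": []}
assert reachable(adj.__getitem__, "a") == {"a", "b", "c"}

# ...or computed on the fly (an implicit graph over small integers).
assert reachable(lambda n: [n + 1] if n + 1 < 8 else [], 5) == {5, 6, 7}
```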

fatherzine 2024-03-04 21:06 UTC link
What an awesome article. Kudos to the author!

On the core observation "there are too many implementation choices", that is not quite right. True, the author mentions 4, and there are further variations. In practice, a library can:

1. Implement all suitable graph representations.

2. Implement algorithms tailored to the representation(s) that offer the highest performance.

3. Provide transformations from one representation to another. This is O(#representations), trivial to implement and trivial to use. Quite fair workload for both maintainers and users.

4. Bonus, provide import / export transformations from / to common standard library datatypes and idioms.

Memory and transformations are cheap, 99% of use-cases would likely find the overhead of transforming data, both in RAM and CPU, negligible.
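As a rough illustration of point 3, a pair of hypothetical conversion helpers between two common representations (the function names are mine):

```python
def list_to_matrix(adj):
    """Adjacency list (dict of lists) -> (sorted node labels, dense matrix)."""
    nodes = sorted(adj)
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        for v in nbrs:
            matrix[index[u]][index[v]] = 1
    return nodes, matrix

def matrix_to_list(nodes, matrix):
    """(node labels, dense matrix) -> adjacency list (dict of lists)."""
    return {u: [nodes[j] for j, bit in enumerate(row) if bit]
            for u, row in zip(nodes, matrix)}

# Round-tripping preserves the graph.
nodes, m = list_to_matrix({"a": ["b"], "b": ["a", "c"], "c": []})
assert matrix_to_list(nodes, m) == {"a": ["b"], "b": ["a", "c"], "c": []}
```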

Edit: "the harsh truth of working at Google is that in the end you are moving protobufs from one place to another." -- https://news.ycombinator.com/item?id=20132880

js8 2024-03-04 22:31 UTC link
Another data type that would be quite useful is a table (like in a database). For the same reasons, too many design choices.

Anyway, that being said, I have felt that progress will be made in programming languages if the compiler gets to choose an implementation of a data structure, kinda like when a database chooses an execution plan. So you just use an abstract structure (like sequence, map, set, table, graph) and based on the program profile, the compiler will pick the specific implementation. It will also transform the structure into another isomorphic one as needed. (Some programming languages already do something like this, for example, array of structs to struct of arrays conversion.)
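The array-of-structs to struct-of-arrays conversion mentioned at the end can be sketched like this (illustrative code, not from any actual compiler; the helper name is mine):

```python
# Array of structs: one dict per record.
aos = [
    {"x": 1, "y": 10},
    {"x": 2, "y": 20},
    {"x": 3, "y": 30},
]

def aos_to_soa(records):
    """Rearrange records into one list per field -- an isomorphic layout
    a compiler could switch to when columnar access dominates."""
    return {key: [r[key] for r in records] for key in records[0]}

soa = aos_to_soa(aos)
assert soa == {"x": [1, 2, 3], "y": [10, 20, 30]}
```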

Ptitselov 2024-03-05 00:25 UTC link
If you code in .NET, please try my graph library Arborescence [1]. It's small and not very feature-rich. However, I believe it provides a good separation between abstractions [2], algorithms, and data structures. Regarding the points mentioned in the article:

- you can use the edges with or without their own identity,

- you can use implicit graphs unfolded on the fly,

- you can use both adjacency (out-neighbors) and incidence (out-edges + head) interfaces,

- no edge type is imposed by the library, although it does provide the basic tail-head-pair structure as a utility.

[1] https://github.com/qbit86/arborescence

[2] https://github.com/qbit86/arborescence/tree/develop/src/Arbo...

kajic 2024-03-05 06:44 UTC link
One of the complications described by the author is performance. Personally, I find stdlib graph libraries extremely useful even if their performance is poor, because my dataset is often small enough; and even if performance turns out to be an issue, first spending time on the problem with an underperforming graph library is a very worthwhile exercise before trying to write my own optimized implementation. By analogy, many programming languages are far from being the fastest, but they can nevertheless be very useful.

That said, I’m not surprised performance came up in interviews with experts; they probably have tons of interesting performance-related stories to tell from their extensive work on graphs.

superlopuh 2024-03-05 11:18 UTC link
This is basically my PhD thesis proposal. I don't think there's any fundamental technological problem here; it's just that for a graph to be efficient to process you need high-level optimisations that can take the mathematical properties of graphs into account. For that you need to either reimplement a compiler inside your framework or be integrated into an existing compiler, and both obviously take a lot of work.

Some comments here mention GraphBLAS, which is the big breakthrough in decoupling the layout of the graph from an efficient implementation of an algorithm, but none mention MLIR-GraphBLAS [0] which is the most promising integration into a compiler that I've seen.

I still think it's early days, I wouldn't throw in the towel quite yet :)

[0]: https://mlir-graphblas.readthedocs.io/en/latest/index.html

jjice 2024-03-04 17:11 UTC link
I used to think that, since graphs are such a broad data structure that can be represented in different ways depending on requirements, it just made more sense to implement them at a domain-ish level (the article mentions this in the "There are too many implementation choices" section).

Then I saw Petgraph [0] which is the first time I had really looked at a generic graph library. It's very interesting, but I still have implemented graphs at a domain level.

[0] https://github.com/petgraph/petgraph

boothby 2024-03-04 17:41 UTC link
It isn't just an 80/20 problem. Imagine that you replace every linked list, binary tree, trie, etc., with a generic directed graph datatype. The resulting performance would be abysmal. The resulting notation would be horrid, too.

Here's our nice linked list:

  def last_element(ll):
      last = ll
      while ll is not None:
          last = ll
          ll = ll.next
      return last
And here's an implementation with generic graph notation:

  def last_element(g):
      for v, deg in g.out_degree:
          if deg == 0:
              return v
      return None
There are several problems with this; most importantly, there can be silent failures when g is not a linked list. But it also throws out a useful abstraction where a list is equivalent to a node, so I wrote a horrid implementation that takes O(n) regardless of the position in the list. And then comes all the baggage of representation, because you can't just represent a node with a pointer anymore.

When your data structure better reflects the, well, structure of your data, you can go faster and safer. There's a reason we teach undergrads about these specific datatypes and don't just sweep it all under a rug with "it's a graph!"

nickpsecurity 2024-03-04 17:54 UTC link
That’s a neat idea. The other side of it might be expressiveness.

The more expressive constructs are usually more productive and concise. The less expressive constructs are usually easier to optimize or analyze with a machine. The old rule, like in LANGSEC, is to pick the least-expressive option that works. Some people also develop transforming code (e.g. metaprogramming) to let you write highly-expressive code that generates correct, low-expressiveness code.

kevindamm 2024-03-04 18:13 UTC link
This sounds like Partial Evaluation and the Futamura Projection. The research around that shows that your interpreter determines the shape of the compiled output, so a formal proof of its application isn't necessary, if the $mix$-equivalent has the appropriate syntax and semantics for graph processes in its design.

I know this has been done for procedural languages and for declarative logical languages but I'm not aware of something like this specifically for graph processing and highly specialized code generation of graph processing. I wouldn't be surprised if Mix has been extended for this already, even if it has I'm sure there is still value in it.

tikhonj 2024-03-04 18:14 UTC link
FGL is a great example of how to make a "nice" high-level graph interface suited for functional programming. I'm a big fan. But it's orders of magnitude too slow and memory-inefficient for performance-sensitive graph computations—if you have even moderately sized graphs and graph algorithms are a bottleneck, you'll need to use something else, and probably something domain-specific. Given the way the interface works, I don't think you could have a high-performance version that would scale to large graphs.

In my experience this leaves FGL in an awkward spot: on the one hand, it isn't sufficient for heavy-duty graph processing; on the other, if you don't need fancy high-performance graph algorithms, chances are that encoding your problem as a graph is going to be more awkward than defining some domain-specific types for what you're doing. Graphs are such a general structure that they're usually the wrong level of abstraction for higher-level domain-specific logic.

Of course, sometimes you're writing graph code specifically and you need a nice way to express your graph algorithms without worrying about performance. In that case, FGL is great. I wrote a tutorial about using it to [generate mazes][1] and it helped me express the algorithms better than I would have been able to do without it. But that still leaves it as too narrow for something to be "the" graph representation in a language's standard library.

[1]: https://jelv.is/blog/Generating-Mazes-with-Inductive-Graphs/

JonChesterfield 2024-03-04 18:46 UTC link
I think this is a worthwhile direction.

For example, I'd like to program against a sequence abstraction. When sort is applied to it, I hope it's a vector. When slice or splice, I hope it's some sort of linked structure. Size is as cheap as empty for the vector but much more expensive for a linked list.

It should be possible to determine a reasonable data representation statically based on the operations and control flow graph, inserting conversions where the optimal choice is different.

The drawback of course is that people write different programs for different data structures. Knowing what things are cheap and what aren't guides the design. There's also a relinquishing of control implied by letting the compiler choose for you that people may dislike.

As an anecdote for the latter, Clojure uses vectors for lambda arguments. I thought that was silly: since it's a Lisp that mostly works in terms of seq abstractions, why not have the compiler choose based on what you do with the sequence? The professional Clojure devs I was talking to really didn't like that idea.

andoando 2024-03-04 18:58 UTC link
I've been thinking about something like this: a mathematical definition of a function such that we can search for it. Imagine we had something like "find a function that fits this signature: input arr[numbers], output such that for every adjacent pair x1, x2 in arr, x2 > x1".
somat 2024-03-04 19:05 UTC link
This is a super naive take, but I would consider the pointer to be the native graph type. What is wanted is not a graph type but the tooling to traverse graphs.
hobofan 2024-03-04 19:05 UTC link
> we are used that everything around us have some kind of hierarchy

I think the problem is more that we are used to the illusion/delusion that everything is hierarchical. The problem we then encounter with graph drawing is that it has to try to reconcile the fact that things in practice are rarely really hierarchical, and it's hard to draw the lines of where the hierarchies are with mathematical rigor. And that problem gets worse the fewer properties you are allowed to assume about the underlying graph structure (connectedness, cyclic/acyclic, sparse/dense).

In practice, when you want to build a UI that interacts with graphs, it's often feasible to determine/impose one or two levels of meta-hierarchy with which you can do clustering (which reduces the layout-destroying impact of hairball nodes and improves rendering performance by reducing node count) and do layout with fCOSE (Cytoscape.js has an implementation of that).

transitionnel 2024-03-04 19:24 UTC link
That all sounds fantastic!
schindlabua 2024-03-04 19:25 UTC link
I like thinking of this concept via free constructions.

As is somewhat commonly known the free Monoid is the List type; monoids are not commutative so we get a sense of "direction", like a list has a start and an end.

If we add commutativity and look at free groups, we find they are equivalent to multisets.

If we take associativity away from monoids and look at free semigroups, we get binary finger trees, I think?

In some sense removing constraints from the binary operator results in more general free types. Would be interesting to find what free construction makes digraphs but I have to bounce.

nine_k 2024-03-04 19:26 UTC link
Ideal things are often tree-like. Real-world structures are usually DAGs if they are nice and well-behaved.

Making things planar, or almost planar with few crossings and nice clustering of related nodes, is usually hard past a couple dozen nodes :(

paulddraper 2024-03-04 19:48 UTC link
> "it's really hard to do it well"

More importantly, there are a lot of tradeoffs.

Virtually every language offers a hash map. You can roll your own to outperform it in an individual circumstance, but the default works pretty well.

You can't really do that with a graph. Maybe if you offered a bunch of graph types.

---

PS. Bit of trivia: Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.

bigbillheck 2024-03-04 20:14 UTC link
I think it's more like discarding a 20% solution that can't do the last 80%.
twoodfin 2024-03-04 20:46 UTC link
> And its hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc was the best way to implement it. That's the kind of optimization which we won't see in compilers for a while.

I’m not so sure? Looking at an algorithm against an abstract graph type, then filling in the implementation to optimize for the particular algorithm seems right in the wheelhouse of code-specialized LLM’s.

swagasaurus-rex 2024-03-04 21:14 UTC link
The illustrations I've seen of neural networks really highlight the sheer incomprehensibility of visualizing large graphs.
josephg 2024-03-04 21:27 UTC link
Sounds like the makings of a huge library that I'm not sure I'd even use. I use graphs heavily in my work, and my experience matches the people the author interviewed.

We always end up reimplementing graphs because:

- Performance matters, and no off the shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode because almost all nodes just have an edge pointing to the last added item).

- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.

- Writing something custom just isn't much work anyway! Graphs are way simpler to reimplement than B-trees. You can have a simple graph implementation in tens of lines. Our highly optimised library, with all the supporting algorithms, is still only a few hundred lines of code.

I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.
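As a rough illustration of the "tens of lines" claim, a generic directed graph with a reachability query fits comfortably in a few dozen lines. This is a sketch of the general pattern, not the commenter's actual code:

```python
# A directed graph in a few dozen lines: adjacency dict plus DFS.
# A generic sketch, not any particular production library.
class Digraph:
    def __init__(self):
        self.adj = {}

    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set())

    def neighbors(self, v):
        return self.adj.get(v, set())

    def reachable(self, start):
        """All vertices reachable from start, via iterative DFS."""
        seen, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.neighbors(v))
        return seen

g = Digraph()
for u, v in [("a", "b"), ("b", "c"), ("d", "a")]:
    g.add_edge(u, v)
assert g.reachable("a") == {"a", "b", "c"}
```

The supporting algorithms (topological sort, diffing, etc.) add more, but as the comment says, the core structure itself is small.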

matusp 2024-03-04 21:42 UTC link
Yeah, I feel like the article is too quick with its conclusions. Many other problems can be made arbitrarily complex and difficult with additional requirements, but there are still data structures and standard libraries that provide a good enough experience for most use cases. And if you need something extra spicy, you build a custom solution.

The article claims that graphs are often just too big, and yeah, if you ask people who are actively working on graph algorithms, they might have that sort of experience. But most programmers and users probably only work with really small graphs.

kjqgqkejbfefn 2024-03-04 21:58 UTC link
>Graph drawing tools

It's hard

Graphviz-like generic graph-drawing library. More options, more control.

https://eclipse.dev/elk/

Experiments by the same team responsible for the development of ELK, at Kiel University

https://github.com/kieler/KLighD

Kieler project wiki

https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...

Constraint-based graph drawing libraries

https://www.adaptagrams.org/

JS implementation

https://ialab.it.monash.edu/webcola/

Some cool stuff:

HOLA: Human-like Orthogonal Network Layout

https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf

Confluent graph demos, which make edges more readable:

https://www.aviz.fr/~bbach/confluentgraphs/

Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with Ports

https://arxiv.org/pdf/1408.4626.pdf

Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs

https://arxiv.org/pdf/1311.6996v1.pdf

gloftus 2024-03-04 22:02 UTC link
Yes, graphs are ubiquitous because they are so abstract. They live at the same level of abstraction as pure numbers. There are useful "numerical" libraries, and by analogy you could say there are also useful "graphical" libraries. But we don't really have "number" libraries, and we don't really have "graph" libraries, because those concepts are a bit too abstract to write APIs against.
e_y_ 2024-03-04 22:09 UTC link
Back in the C days, it was common not to use generic data structures like lists; instead you'd have a next_item pointer as a field in the struct itself. For linked lists, this saved you another memory allocation or wrapper struct, and since C doesn't have templates, you'd have to blindly cast the type, use macros, or write a type-specific iterator anyway.

Lists eventually became a standard feature in C++ and other languages, but it's trickier for trees and graphs. Taking the DOM example, you might be searching through child elements (div, span, etc.) or nodes (text nodes, comment nodes), and different operations might only work with a specific subset of the "edges". There might be pointers to other objects, like from a DOM node to an accessibility-tree node. You might even have multiple parent-node pointers, such as one that takes you to the nearest shadow root.

Since there are multiple ways to traverse the same data structure, generic functions don't work on it. You could create a separate tree/graph for each purpose, but that takes additional memory and has to be updated whenever the original struct changes. Or you could create some kind of adapter with a get_edges() function, but that might not be well optimized, or might be clunky for other reasons. So it usually ends up being simpler to roll your own functions instead of using a library.
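The get_edges() adapter idea above can be sketched as a single traversal parameterized by an edge function, so the same algorithm walks whichever view of the structure you hand it. All node and field names below are hypothetical illustrations, not DOM API:

```python
# A generic traversal parameterized by an edge function, so the
# same code can walk the same nodes under different "edge" views.
def traverse(root, get_edges):
    seen, stack, order = set(), [root], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node)
        stack.extend(get_edges(node))
    return order

# Hypothetical node type with two distinct kinds of outgoing "edges".
class Node:
    def __init__(self, name, children=(), linked=()):
        self.name, self.children, self.linked = name, list(children), list(linked)

leaf = Node("leaf")
aux = Node("aux")
root = Node("root", children=[leaf], linked=[aux])

# Same structure, two different edge views:
by_children = traverse(root, lambda n: n.children)
by_links = traverse(root, lambda n: n.linked)
assert [n.name for n in by_children] == ["root", "leaf"]
assert [n.name for n in by_links] == ["root", "aux"]
```

The clunkiness the comment mentions is real, though: each call through the adapter is an indirection the compiler may not be able to optimize away.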

lmm 2024-03-04 22:14 UTC link
> Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)

Having its own keyword is pretty strong evidence that something isn't first-class (e.g. typeclasses are not first-class in Haskell; control flow is not first-class in most programming languages).

js8 2024-03-04 22:39 UTC link
> we are really good at code sharing of small things like vectors, and of large things like operating systems. Its the middle-sized problems we are bad at.

Interesting. But I am not sure we are good at sharing small things: every programming language has its own implementation of vectors. Within one language ecosystem, the API of a vector is small, and that's probably what makes it easy to share.

For operating systems, the API is relatively small compared to the internal complexity of the OS. This is also true for numerical libraries, which are likewise easily shared. But the more you want to customize things (e.g. share a complicated data structure), the more this complicates the API and inhibits sharing.

So it seems to me this is determined by the surface area (the relative size of the API) of the thing being shared.

DylanSp 2024-03-04 23:02 UTC link
Erlang's briefly mentioned at the end of the article:

> There are two other languages I found with graph types: Erlang and SWI-Prolog. I don’t know either language and cannot tell when they were added; with Erlang, at least, it was before 2008. I reached out to a person on the Erlang core language committee but did not hear back.

Kamq 2024-03-05 00:23 UTC link
> So you just use an abstract structure (like sequence, map, set, table, graph) and based on the program profile, the compiler will pick the specific implementation. It will also transform the structure into another isomorphic one as needed.

I'm so not looking forward to having to debug a sudden change in perf characteristics when one additional usage of some feature tips a heuristic over the line and an implementation gets swapped out between builds.

samatman 2024-03-05 00:43 UTC link
Some algorithms do better at this than others, but "make a good diagram of a graph" is an intelligence-complete problem in the general case. Two people might render structurally-identical graphs in very different ways, to emphasize different aspects of the data. This is in fact a similar problem to the "generic graph algorithm" and "generic graph data structure" problems.

Graphs straddle the line between code and data. For instance, any given program has a call graph, so in a real sense, the "generic graph algorithm" is just computation.

kccqzy 2024-03-05 00:47 UTC link
In Haskell, though, I think Alga has an even nicer API. I don't know about performance, as I haven't had a need to use Haskell to process enormous graphs. https://github.com/snowleopard/alga
dataflow 2024-03-05 00:51 UTC link
> Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v).

Even that is severely overconstrained. It doesn't allow multiple edges to the same neighbor!
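One way to see the objection: if Neighbors(v) returns a set, parallel edges collapse into one, whereas a multiset of neighbors (here a Counter, as one possible fix) preserves multiplicity. A minimal sketch:

```python
from collections import Counter

# With set-valued neighbors, two parallel edges collapse into one;
# a multiset (Counter) of neighbors preserves edge multiplicity.
edges = [("a", "b"), ("a", "b"), ("a", "c")]  # two parallel a->b edges

set_neighbors = {}
multi_neighbors = {}
for u, v in edges:
    set_neighbors.setdefault(u, set()).add(v)
    multi_neighbors.setdefault(u, Counter())[v] += 1

assert set_neighbors["a"] == {"b", "c"}  # parallel edge lost
assert multi_neighbors["a"]["b"] == 2    # parallel edge kept
```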

dataflow 2024-03-05 00:57 UTC link
> "why don't programming languages have a built-in graph data type?"

What I find a little funny about that question is that people miss the fact that there isn't even a tree data structure in most languages. Most languages have static arrays, dynamic arrays, linked lists, and... that's it as far as structural types go. Everything else (BSTs, hashtables, etc.) is some semantic abstraction that hides some capabilities of the underlying structure, not a purely structural representation.

jonahss 2024-03-05 02:31 UTC link
I agree. And I think the key to this is VR.

Another comment in this thread is about how hard graphs are to visualize, but a 3D interface gives you a lot more room.

When VR hype began, I thought: "well, what's the Excel of VR?" Microsoft's answer was "2D spreadsheets floating in 3D space". What nonsense. I think it's graphs.

email my username at gmail.com if anyone is interested in exploring this together!
