301 points by robinhouston 7 days ago | 193 comments on HN
Mild positive · Low agreement (3 models)
Editorial · v3.7 · 2026-03-15 22:43:48
Summary · Cybersecurity & Digital Safety Advocates
This Aikido Security blog article reports on malware research (Glassworm Unicode attacks), advocating for developer awareness and protection of digital infrastructure. Editorially, the content champions security rights and knowledge sharing (Articles 18-19, 26); structurally, the site deploys tracking and analytics infrastructure that undermines privacy rights (Article 12). The evaluation reflects strong positive signals around expression, education, and work rights, tempered by significant privacy and surveillance concerns.
Rights Tensions · 2 pairs
Art 12 ↔ Art 26 — Privacy rights (Article 12) are subordinated to educational benefit (Article 26) through unilateral tracking and data collection without explicit consent, privileging institutional knowledge dissemination over individual privacy autonomy.
Art 12 ↔ Art 19 — Freedom of expression (Article 19) is enabled while privacy rights (Article 12) are undermined; the platform facilitates speech but uses surveillance infrastructure to collect behavioral data about readers without transparent consent.
IMO while the bar is high to say "it's the responsibility of the repository operator itself to guard against a certain class of attack" - I think this qualifies. The same way GitHub provides Secret Scanning [0], it should alert upon spans of zero-width characters that are not used in a linguistically standard way (don't need an LLM for this, just n-tuples).
Sure, third-party services like the OP can provide bots that can scan. But if you create an ecosystem in which PRs can be submitted by threat actors, part of your commitment to the community should be to provide visibility into attacks that cannot be seen by the naked eye, and make that protection the norm rather than the exception.
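A minimal sketch of what such a check could look like (hypothetical, not an actual GitHub feature; the character set and run threshold are illustrative): flag runs of zero-width code points, while tolerating the single zero-width joiner that legitimately appears inside emoji sequences.

```python
# Sketch: flag suspicious runs of invisible Unicode code points.
# The set below is illustrative, not exhaustive.
INVISIBLE = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
}

def suspicious_runs(text: str, min_run: int = 2):
    """Yield (offset, length) for each run of invisible characters.

    A single ZWJ inside an emoji sequence is plausible; a long run
    of zero-width characters almost never is.
    """
    run_start = None
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                yield run_start, i - run_start
            run_start = None
    if run_start is not None and len(text) - run_start >= min_run:
        yield run_start, len(text) - run_start

payload = "const x = '" + "\u200b\u200c\u200d" * 4 + "';"
print(list(suspicious_runs(payload)))  # [(11, 12)]
```

This is the "n-tuples, no LLM" idea from the parent comment: a run of zero-width characters in a linguistically normal text essentially never exceeds length one.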
It baffles me that any maintainer would merge code like the one highlighted in the issue without knowing what it does, regardless of whether or not they can see the "invisible" characters. There's a transforming function here and an eval() call.
The mere fact that a software maintainer would merge code without knowing what it does says more about the terrible state of software.
Looks like the repo owner force-pushed a bad commit to replace an existing one. But then, why not forge it to maintain the existing timestamp + author, e.g. via `git commit --amend -C df8c18`?
Unicode should be for visible characters. Invisible characters are an abomination. So are ways to hide text by using Unicode so-called "characters" to cause the cursor to go backwards.
Things that vanish on a printout should not be in Unicode.
Attacks employing invisible characters are not new. Prior efforts include terminal escape sequences, possibly hidden with CSS, which if blindly copied and pasted would execute who knows what, provided the particular terminal allowed escape sequences to do too much (a common symptom of featuritis) or had errors in its invisible-character parsing code.
For data or code hiding the Acme::Bleach Perl module is an old example though by no means the oldest example of such. This is largely irrelevant given how relevant not learning from history is for most.
Invisible characters may also cause hard to debug issues, such as lpr(1) not working for a user, who turned out to have a control character hiding in their .cshrc. Such things as hex viewers and OCD levels of attention to detail are suggested.
I wonder if this could be used for prompt injection: if you copy and paste the seemingly empty string into an LLM, does it understand it? Maybe the affected Unicode characters aren't tokenized.
I use non-Unicode mode in the terminal emulator (and text editors, etc.), I use a non-Unicode locale, and I will always use ASCII for most kinds of source code files (mainly C); in some cases other character sets will be used, such as the PC character set, but usually it will be ASCII. Doing this will mitigate much of this when maintaining your own software. I am apparently not the only one; I have seen others suggest similar things. (If you need non-ASCII text, e.g. for documentation, you might store it in separate files instead. If you only need a small number of non-ASCII characters in a few string literals, then you might use \x escapes; add comments if necessary to explain them.)
The article is about JavaScript, although the attack can apply to other programming languages as well. However, even in JavaScript, you can use \u escapes in place of the non-ASCII characters. (One of my ideas for a programming language designed to be better than C is that it forces visible ASCII (and a few control characters, with some restrictions on their use), unless you specify by a directive or switch that you want to allow non-ASCII bytes.)
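As a sketch of that escaping idea (Python for illustration; JavaScript's \u escapes work analogously), any non-ASCII character in a literal can be rewritten as an escape so that nothing invisible survives in the source file:

```python
# Rewrite any non-ASCII character as a \uXXXX (or \UXXXXXXXX) escape,
# so the source file itself stays pure visible ASCII.
def to_ascii_escapes(s: str) -> str:
    return "".join(
        ch if ord(ch) < 0x80 else
        "\\u{:04x}".format(ord(ch)) if ord(ch) <= 0xFFFF else
        "\\U{:08x}".format(ord(ch))
        for ch in s
    )

print(to_ascii_escapes("naïve café"))  # na\u00efve caf\u00e9
```

After such a pass, a zero-width payload would show up as an obvious wall of \u200b-style escapes rather than as an "empty" string.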
Why can't code editors have a default-on feature where they show any invisible character (other than newlines)? I seem to remember Sublime doing this at least in some cases... the characters were rendered as a lozenge shape with the hex value of the character.
Is there ever a circumstance where the invisible characters are both legitimate and you as a software developer wouldn't want to see them in the source code?
I feel like the threat of this type of thing is really overstated.
Sure, the payload is invisible (although tbh I'm surprised it is; PUA characters usually show up as boxes with hex codes for me), but the part where you put an "empty" string through eval isn't.
If you are not reviewing your code closely enough to notice something as nonsensical as eval() on an empty string, would you really notice the non-obfuscated payload either?
Invisible characters, lookalike characters, text-order-reversing attacks [1]... the only way to use Unicode safely seems to be whitelisting a small subset of it.
And please, everyone arguing the code snippet should never have passed review - do you honestly believe this is the only kind of attack that can exploit invisible characters?
My hot take is that all programming languages should go back to only accepting source code saved in 7-bit ASCII. With perhaps an exception for comments.
The scary part is how invisible this is in code review. Unicode direction overrides and zero-width characters don't show up in most editors by default. Anyone know a solid pre-commit hook config that catches this reliably?
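One possible shape for such a hook, sketched in Python (the banned-character list is illustrative, not exhaustive; wire `main()` into `.git/hooks/pre-commit` or a pre-commit framework):

```python
#!/usr/bin/env python3
# Sketch of a pre-commit check: reject staged files containing bidi
# control characters or zero-width characters.
import subprocess
import sys

BANNED = set(
    "\u200b\u200c\u200d\u200e\u200f"  # zero-width space/joiners, LRM/RLM
    "\u202a\u202b\u202c\u202d\u202e"  # bidi embeddings and overrides
    "\u2066\u2067\u2068\u2069"        # bidi isolates
    "\u2060\ufeff"                    # word joiner, BOM
)

def check(path: str, data: bytes) -> bool:
    """Return True if the blob is clean (or not UTF-8 text, hence skipped)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return True  # binary / non-UTF-8 file; out of scope for this check
    bad = sorted({f"U+{ord(c):04X}" for c in text if c in BANNED})
    if bad:
        print(f"{path}: banned code points {bad}", file=sys.stderr)
        return False
    return True

def main() -> int:
    """Check every staged file; a non-zero exit blocks the commit."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        capture_output=True, check=True,
    ).stdout.decode()
    ok = True
    for path in filter(None, names.split("\0")):
        blob = subprocess.run(
            ["git", "show", f":{path}"], capture_output=True, check=True
        ).stdout
        ok = check(path, blob) and ok
    return 0 if ok else 1
```

Note this checks the staged blob (`git show :path`), not the working tree, so the hook sees exactly what would be committed.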
The exact quote is "Thanks for the submission! We have reviewed your report and validated your findings. After internally assessing your report based on factors including the complexity of successfully exploiting the vulnerability, the potential data and information exposure, as well as the systems and users that would be impacted, we have determined that they do not present a significant security risk to be eligible under our rewards structure." The funny thing is, they actually gave me $500 and a lifetime GitHub Pro for the submission.
Yeah it would have been nice to end with "and here's a five-line shell script to check if your project is likely affected". But to their credit, they do have an open-source tool [1], I'm just not willing to install a big blob of JavaScript to look for vulns in my other big blobs of JavaScript
The rule must be very simple: any occurrence of `eval()` should be a BIG RED FLAG. It should be handled like a live bomb, which it is.
Then, any appearance of unprintable characters should also be flagged. There are rather few legitimate uses of some zero-width characters, like ZWJ in emoji composition. Ideally all such characters should be inserted as \uNNNN escape sequences, not as literal characters.
Simple lint rules would suffice for that, with zero AI involvement.
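For illustration, both rules fit in a few lines of regex (a sketch; a real rule would also allow the ZWJ inside emoji sequences mentioned above):

```python
import re

# Rule 1: any call to eval() is a finding.
# Rule 2: any literal zero-width/invisible character is a finding
#         (escaped forms like \u200b pass, literal ones do not).
EVAL_RE = re.compile(r"\beval\s*\(")
INVISIBLE_RE = re.compile("[\u200b-\u200f\u2060\u202a-\u202e\ufeff]")

def lint(source: str):
    findings = []
    for n, line in enumerate(source.splitlines(), start=1):
        if EVAL_RE.search(line):
            findings.append((n, "call to eval()"))
        if INVISIBLE_RE.search(line):
            findings.append((n, "literal invisible character"))
    return findings

code = "const s = '\u200b\u200d';\neval(s);\n"
print(lint(code))  # [(1, 'literal invisible character'), (2, 'call to eval()')]
```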
In this case LLMs were obviously used to dress the code up as more legitimate, adding more human or project relevant noise. It's social engineering, but you leave the tedious bits to an LLM. The sophisticated part is the obscurity in the whole process, not the code.
Yeah, I would have loved to see an example where it was not obvious that there is an exploit. Where it would be possible for a reviewer to actually miss it.
The value of the technique, I suppose, is that it hides a large payload a bit better. The part you can see stinks (a bunch of magic numbers and eval), but I suppose it’s still easier to overlook than a 9000-character line of hexadecimal (if still encoded or even decoded but still encrypted) or stuff mentioning Solana and Russian timezones (I just decoded and decrypted the payload out of curiosity).
But really, it still has to be injected after the fact. Even the most superficial code review should catch it.
So we need a new standard to fix problems caused by the complexity of the last standard? Isn't Unicode supposed to be a superset of ASCII, which already has control characters like backspace, CR, and newline? xD
I'm not a JS person, but taking the line at face value, shouldn't it do nothing? Which, if I understand correctly, should never be merged. Why would you merge no-ops?
I think a "force visible ASCII for files whose names match a specific pattern" mode would be a simple thing to help. (You might be able to use the "encoding" command in the .gitattributes file for this, although I don't know if this would cause errors or warnings to be reported, and it might depend on the implementation.)
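Outside of .gitattributes, the same policy is easy to check directly. A minimal sketch (the patterns and allowed byte set are illustrative):

```python
from pathlib import Path

# Sketch: enforce "visible ASCII only" (plus tab and newline) for files
# whose names match given glob patterns, e.g. "*.c" or "*.js".
ALLOWED = set(range(0x20, 0x7F)) | {0x09, 0x0A}  # printable ASCII, tab, LF

def violations(root: str, patterns=("*.c", "*.h")):
    """Yield (path, offending byte values) for files breaking the policy."""
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            data = path.read_bytes()
            bad = sorted({b for b in data if b not in ALLOWED})
            if bad:
                yield str(path), [hex(b) for b in bad]
```

Run over a source tree, it reports every file matching the pattern that contains any byte outside the visible-ASCII whitelist, which catches zero-width characters as their multi-byte UTF-8 encodings.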
That ship has sailed, but I consider Unicode a good thing, yet I consider it problematic to support Unicode in every domain.
I should be able to use Ü as a cursed smiley in text, and many more writing systems supported by Unicode support even more funny things. That's a good thing.
On the other hand, if technical and display file names (to GUI users) were separate, my need for crazy characters in file names, code bases and such would be very limited. Lower ASCII for actual file names consumed by technical people is sufficient for me.
In this instance the PR that was merged was from 6 years ago and was clear https://github.com/pedronauck/reworm/pull/28. Looks to me like a force push overwrote the commit that now exists in history since it was done 6y later.
> ... and will always use ASCII for most kind of source code files
Same. And I enforce it. I've got scripts and hooks that restrict source files to only ever a subset of ASCII (not even all ASCII codes have their place in source code).
Unicode strings are perfectly fine in resource files. You can build perfectly i18n/l10n apps and webapps without ever putting a single non-ASCII character in a source file. And if you really do need one, there's indeed ASCII escaping available in many languages.
Some shall complain that their name in "Author: ..." comments cannot be written properly in ASCII. If I wanted to be facetious I'd say that soon we'll see:
> It baffles me that any maintainer would merge code like the one highlighted in the issue, without knowing what it does.
I don't know if it is relevant to any specific case being discussed here, but if the exploit route is gaining access to the accounts of previously trusted submitters (or otherwise being able to impersonate them), it could be a case of a team with a pile of PRs to review (many of which are the sloppy unverified LLM output that is causing problems for some popular projects) letting through an update from a trusted source that has been compromised.
It could correctly be argued that this is a problem caused by laziness and corner cutting, but it is still understandable because projects that are essentially run by a volunteer workforce have limited time resources available.
No need to remove them. Just make them visible for applications that don't need to render every language. Make that behavior optional as well in case you really want to name characters with Hangul or Tibetan.
Some middle ground so that you can use greek letters in Julia might be nice as well.
But I don't see any purpose in using the Personal Use Areas (PUA) in programming.
And, yes, there is a circumstance: if you want to include Arabic or Hebrew in comments or strings, you need the zero-width directional marks (LRM/RLM) to make that work.
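Those marks are real, named code points; a quick way to inspect them (using Python's unicodedata, for illustration):

```python
import unicodedata

# The directional marks referred to above: invisible but legitimate,
# used to nudge the bidi algorithm in mixed LTR/RTL text.
LRM, RLM = "\u200e", "\u200f"

print(unicodedata.name(LRM))  # LEFT-TO-RIGHT MARK
print(unicodedata.name(RLM))  # RIGHT-TO-LEFT MARK

# Both are category Cf (format characters): no glyph, zero width,
# which is exactly why a blanket ban on invisible characters
# needs a carve-out for bidi text.
print(unicodedata.category(LRM), unicodedata.category(RLM))  # Cf Cf
```

This is why the lint/whitelist approaches discussed in this thread have to distinguish context: the same Cf characters are essential in bidi text and weaponizable in code.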
Content exemplifies freedom of expression and opinion through technical analysis and security research. Article publicly communicates findings about malware threats.
FW Ratio: 50%
Observable Facts
Article headline and content communicate security research findings.
Author name and professional affiliation provided, supporting attribution.
Blog platform hosted on company domain with structured publication infrastructure.
Inferences
Technical writing exemplifies freedom to express research findings and professional opinions.
Open blog platform enables expression without apparent editorial restriction.
Attribution model supports accountability essential to meaningful expression.
Content supports freedom of movement and residence by addressing threats (malware) that could restrict developer access to repositories and ecosystems.
FW Ratio: 50%
Observable Facts
Article references GitHub (global service) and npm (global package registry).
No geographic access restrictions or content localization visible.
Inferences
Global platform scope implicitly supports cross-border information access.
Absence of geofencing suggests commitment to unrestricted movement within digital ecosystem.
Content supports right to adequate standard of living and health by addressing security threats that could undermine economic stability and digital wellbeing.
Article frames security research as defense against malicious threats. Implicitly affirms human dignity through protection of digital infrastructure and developer community.
FW Ratio: 50%
Observable Facts
Headline references 'invisible unicode attacks' targeting 'hundreds of repositories'.
Article byline attributes content to named author (Ilyas Makari) with LinkedIn profile link.
Page metadata identifies publisher as 'Aikido Security' with logo and organization schema.
Inferences
Framing security threats as pervasive signals implicit concern for collective digital welfare.
Named attribution suggests editorial accountability consistent with human dignity principles.
Corporate publishing signals institutional commitment to security knowledge sharing.
Content addresses threats to intellectual property (malware targeting code repositories) and defends developer rights to own and control their digital work.
FW Ratio: 60%
Observable Facts
Article title references 'GitHub repositories' as targets, implying protection of code ownership.
Schema markup identifies article as intellectual property of Aikido Security (publisher).
Tracking pixels collect user behavioral data without visible compensation or opt-in.
Inferences
Security research supports developer property rights by identifying threats to repositories.
Tracking infrastructure treats reader data as publisher property without reader consent or compensation.
Content advocates for awareness of malware threats, implicitly supporting equal protection and dignity for all developers regardless of technical expertise.
FW Ratio: 50%
Observable Facts
Article title promises to explain a security threat and its scope.
Page embeds Google Tag Manager and Dalton tracking pixels.
Inferences
Educational framing of security threats supports principle that all humans deserve equal dignity and protection.
Tracking infrastructure treats reader data as commodity, reducing dignity of reader as autonomous individual.
Page loads third-party tracking scripts (Dalton, Google Tag Manager) and sets UTM parameter cookies without explicit first-party consent mechanism visible in provided content. Privacy policy not inspected.
Terms of Service
—
Terms of service not accessible from provided content.
Identity & Mission
Mission
+0.25
Article 3 Article 8 Article 12
Aikido Security positions itself as a security platform protecting digital assets. Mission implicitly supports safety, integrity, and privacy rights.
Editorial Code
—
No editorial code or ethics statement accessible from provided content.
Ownership
—
Corporate entity (Aikido Security) identified in schema, but ownership structure not disclosed in provided content.
Access & Distribution
Access Model
-0.05
Article 27
Content appears freely accessible, but underlying platform likely requires subscription/payment for full feature access. Not determinable from blog article alone.
Ad/Tracking
-0.10
Article 12
Multiple tracking pixels and UTM cookie collection detected; implies behavioral tracking for marketing purposes.
Accessibility
+0.10
Article 2 Article 25
CSS includes antialiasing and responsive design considerations, but no explicit accessibility features (ARIA, alt text for images) visible in provided content.
Accessibility features present (responsive design, monospace font styling) but limited. Tracking infrastructure may burden less-privileged users with slower connections.
Site deploys extensive tracking infrastructure: Google Tag Manager, Dalton analytics, UTM parameter collection. No visible privacy notice or explicit consent mechanism.