shuntaro-okuma — HRO

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

shuntaro-okuma · 1 karma · 27d on HN

I am a software engineer interested in LLM evaluation, prompt engineering, and infrastructure. I build tools that reveal hidden structures in AI systems.

- AdaptGauge: LLM adaptation efficiency
- Chatbot Benchmark: multi-turn benchmarking
- Local Sidekick: privacy-first attention tracking

Coverage

We've seen 3 of ~3 submissions

Full eval: 0 · Lite-only: 0 · Unevaluated: 3

3 stories

1.		Show HN: Tested 12 LLMs with few-shot examples
		I evaluated 12 models (6 cloud, 6 local) across 5 tasks at shot counts 0, 1, 2, 4, and 8, with 3 tri...
		2 points by shuntaro-okuma 4 days ago | 0 comments | skipped
2.		Show HN: ConvoProbe – Multi-turn scenario testing for Dify chatbots
		I've been building chatbots with Dify and kept hitting the same problem: single-turn Q&A te...
		1 point by shuntaro-okuma 10 days ago | skipped
3.		Show HN: AdaptGauge – I found that adding few-shot examples can make LLMs worse (github.com)
		1 point by shuntaro-okuma 30 days ago | 0 comments | skipped

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC