shuntaro-okuma 1 karma 27d on HN
I am a software engineer interested in LLM evaluation, prompt engineering, and infrastructure. I build tools that reveal hidden structures in AI systems.
- AdaptGauge: LLM adaptation efficiency
- Chatbot Benchmark: multi-turn benchmarking
- Local Sidekick: Privacy-first attention tracking
Coverage
We've seen 3 of ~3 submissions
Full eval: 0 Lite-only: 0 Unevaluated: 3
3 stories
1. Show HN: Tested 12 LLMs with few-shot examples
I evaluated 12 models (6 cloud, 6 local) across 5 tasks at shot counts 0, 1, 2, 4, and 8, with 3 tri...
2 points by shuntaro-okuma 4 days ago | 0 comments | skipped
2. Show HN: ConvoProbe – Multi-turn scenario testing for Dify chatbots
I've been building chatbots with Dify and kept hitting the same problem: single-turn Q&A te...
1 point by shuntaro-okuma 10 days ago | skipped
3. Show HN: AdaptGauge – I found that adding few-shot examples can make LLMs worse (github.com)
1 point by shuntaro-okuma 30 days ago | 0 comments | skipped