Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding (deep-reinforce.com)
37 points by kordlessagain 1 day ago | 6 comments

I added this to a benchmark I've been doing of how well agents find security bugs, specifically security bugs originally found by Mythos. It performs poorly with only read/grep/ls tools, but in a follow-up test with a full shell and Python, it doubled its findings (still a poor showing, but it does at least indicate it is doing what it says on the tin: making tools to help it solve problems). It also did worse than Qwen AgentWorld, another recent post-train of Qwen 3.6 MoE intended for agentic use.

https://swelljoe.com/post/will-it-mythos/


Good to know. Thanks for the research!

So instead of training the model to answer questions directly, we trained the model to always write and execute the code that would solve the question?

If that's the case, isn't this just a fancy way to perform prompt optimization?
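Roughly, the distinction being asked about looks like this. It is a minimal sketch that assumes a hypothetical harness (`run_generated_code` is illustrative, not anything from the Ornith release). The harness runs whatever Python the model emits in a fresh interpreter and returns its output, instead of treating the model's text as the final answer:

```python
import subprocess
import sys


def run_generated_code(code: str, timeout: float = 10.0) -> str:
    """Run a model-written Python snippet in a fresh interpreter and return its stdout.

    Hypothetical harness for illustration only; assumes the snippet prints its answer.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Return the error text so a model could revise its code on the next turn.
        return "ERROR:\n" + result.stderr
    return result.stdout.strip()


# Stand-in for a model response: rather than stating the sum of squares of 1..10,
# the "model" writes code that computes it.
generated = "print(sum(i * i for i in range(1, 11)))"
print(run_generated_code(generated))  # 385
```

In this setup the answer comes from executing the code rather than from the model's text. That is the part that goes beyond rewording a prompt.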


I'd have expected this to get more HN attention. Qwen 3.6 35B capability in a 9B model is a bonkers claim.

It looks like they're comparing Ornith 9B to Qwen 3.5 35B, not Qwen 3.6. I guess that kind of makes sense since it's a finetune of 3.5, but I totally missed it until I looked closely.

In my brief tests, Ornith 35B performed quite well. It won't replace DeepSeek V4 Flash for me, but if it was fast and cheap enough it might.

I don't remember being super impressed with Ornith 9B, but I could see it being on par with Qwen 3.5 35B.


I thought so too when I read the headline, but I expect it's basically Qwen3.5-9B.
