It's pretty safe to say that AI will be used on the battlefield making real life and death decisions before it will be able to render a decent pelican on a bike in SVG.
I read the article and it doesn’t say it was used for targeting or prioritizing?
> Neither Claude nor any other LLMs detects targets, processes radar, fuses sensor data or pairs weapons to targets. LLMs are late additions to Palantir’s ecosystem. In late 2024, years after the core system was operational, Palantir added an LLM layer – this is where Claude sits – that lets analysts search and summarise intelligence reports in plain English
There’s a lot of humans in that loop who make those decisions.
Yeah militaries don't use commercial chatbots for that, they have their own machine learning implementations. Look into Project Maven for example.
And while there are still humans in the loop, the impression I get is that this is increasingly becoming meaningless, from the way they talk about optimizing the "kill chain" and letting small teams make hundreds of targeting decisions per hour.
> The paradigm shift has already begun. Despite the row, Anthropic’s Claude has reportedly facilitated the massive and intensifying offensive which has already killed an estimated thousand-plus civilians in Iran. This is an era of bombing “quicker than the speed of thought”, experts told the Guardian this week, with AI identifying and prioritising targets, recommending weaponry and evaluating legal grounds for a strike.
I think it's beyond decent. I don't understand how people are not more impressed by this. Just a few years ago the only expectation would be garbled nonsense.
"shift left" on the battlefield. break down those silos. if you have to ask for permission it's already too late. remember the goal. find the bottlenecks in your system and remove them.
In many battlefield scenarios, there is more than one "somebody" on it. The "somebody" that you kill might not be the "somebody" that you intended to kill.
Depending on how the pelicans are created, it is entirely possible to indirectly kill "somebody" due to the externalised costs of global warming etc.
Haha, yeah. I tried to get it to create an SVG of scissors and it was hopelessly overwhelmed. I think at least the SVG design niche will be safe a little while longer.
Maybe all along what mattered most to them was making good software that people love, not the day to day part of writing code.
Now it’s the industry they’ve always wanted, and less the industry of people who wanted to get paid to write code.
Software engineers who never cared about the higher level product design aspect are finding themselves in the wrong industry. It’s dismal.
No, the handlebar is wrong. The handlebar is rotating the frame instead of rotating the front wheel. The handlebar should be mounted on the same line as the front wheel.
I bet someone shares this link every time you post about bicycles, but since I didn't see anyone share it yet in this thread, I'll take the opportunity to do so:
On a new model release, you can guarantee two things are in the replies to Simon. One is your link, the other is "surely the models are being trained on this now"
Sure, but no one is trying to force art from most people into just about every area of the economy where anyone ever pays for something visual. If you asked professional artists to draw a realistic bicycle, I'm guessing few of them would try to just randomly guess what the mechanical parts looked like.
But if you need to draw a bicycle, you wouldn’t pick a random person in the street. You would hire an artist and you’d be guaranteed to have at least a believable one if not a perfect rendering.
The lack of guarantees is why using an LLM is akin to gambling. Every new context is essentially picking someone out of the crowd.
As an aside, some of the renders have only a single side connection to the wheel and that is a valid bike design, the Cannondale Lefty front fork only has a left leg:
Simon, does your pelican test really capture differences among models, or should you at least try it 10 times or so to average out the random effects?
I've been meaning to do a "run 3 times and pick the best" version for quite a while, I should really pull the trigger on that one. Currently it's one-shot only.
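For what it's worth, both ideas (best of 3, or averaging over ~10 runs) fit in a few lines. This is only a rough sketch: `sample_runs`, `summarize`, `generate` and `score` are made-up names, `generate` stands in for whatever call produces one SVG from the model, and `score` is a placeholder, since in practice the judging is done by eye.

```python
import statistics
from typing import Callable, Dict, List, Tuple

def sample_runs(
    generate: Callable[[], str],      # hypothetical: returns one SVG string per call
    score: Callable[[str], float],    # hypothetical: higher means a better drawing
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Call the generator n times and pair each output with its score."""
    outputs = [generate() for _ in range(n)]
    return [(svg, score(svg)) for svg in outputs]

def summarize(runs: List[Tuple[str, float]]) -> Dict[str, object]:
    """Report the best run (best-of-n) and the mean score (average over n)."""
    best_output, best_score = max(runs, key=lambda run: run[1])
    mean_score = statistics.mean([s for _, s in runs])
    return {
        "best_output": best_output,
        "best_score": best_score,
        "mean_score": mean_score,
    }

if __name__ == "__main__":
    # Stand-in generator and scorer, purely for illustration.
    fake_outputs = iter(["<svg/>", "<svg><circle/></svg>", "<svg><circle/><path/></svg>"])
    runs = sample_runs(lambda: next(fake_outputs), lambda s: float(s.count("<")), n=3)
    print(summarize(runs))
```

Best-of-n tells you what the model can do on a good day; the mean tells you what you should expect from a single shot. They answer different questions, so it is probably worth reporting both.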
The vast majority (if not all) of these make it impossible to turn, among other fun things. Only out of curiosity, have you tried prompting further with how a bike must operate to see if it does the right thing?
Sadly, I think the correlation between this benchmark and performance is starting to break down. Still, it's a legendary idea that will be remembered and ingrained in the models forever, haha.
I find the most miraculous thing about 4.7 to be that the pelican is facing left. I wonder why right-facing everything is so ubiquitous in these images.
This happened to me in elementary school. We were doing fingerpaintings using plasticine. After all the bikes were hung on the wall, mine was racing the other way... Somehow it really stuck with me.
Lends credence to my vibe-based assertion that GPT-5.5 > Opus 4.7 (and now 4.8), which is why I've cancelled my Claude plan. Opus 4.8 is them seeing it reflected in their own numbers and having to pull stopgap measures to avoid falling behind while they embargo Mythos.
https://gist.github.com/simonw/68560eddb0b268a8417f80ceb7304...
The high one is notably better - the bicycle frame is the correct shape, unlike thinking level low.
For comparison, here's Opus 4.7: https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c...