naultic 3 hours ago [-]
I'm working on something a little similar, but mine's more a dev tool than process automation. I love where yours is headed. The biggest issue I've run into is handling retries with agents. My current solution is to have them set checkpoints so they can revert easily; when they can't make an edit or can't get a test passing, they just restart from the earlier state. Problem is this burns a lot of tokens on retries. How did you handle this issue in your app?
jawiggins 3 hours ago [-]
Generally I've found agents are capable of self-correcting as long as they can bash up against a guardrail and see the errors. So in optio the agent is resumed and told to fix any CI failures or address review feedback.
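The resume-on-failure loop described here can be sketched roughly as below. `run_checks` and `ask_agent_to_fix` are hypothetical callables (not a real optio API): one runs CI and returns its output, the other stands in for resuming the agent with that output as its prompt.

```python
def fix_until_green(run_checks, ask_agent_to_fix, max_rounds=5):
    """Resume the agent with concrete failure output until checks pass.

    run_checks() -> (ok, output); ask_agent_to_fix(prompt) resumes the
    coding agent. Both are stand-ins for illustration only.
    """
    for _ in range(max_rounds):
        ok, output = run_checks()
        if ok:
            return True
        # the guardrail: hand the agent the literal errors, not a summary
        ask_agent_to_fix(f"CI failed, fix these errors:\n{output}")
    return False
```

The key difference from checkpoint-and-restart is that the agent keeps its context and only sees the delta (the error output), rather than replaying the whole task.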
denysvitali 4 hours ago [-]
FWIW, a "cheaper" version of this is triggering Claude via GitHub Actions and `@claude`ing your agents like that. If you run your CI on Kubernetes (ARC), it's pretty much the same thing.
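A minimal sketch of that trigger, assuming the `anthropics/claude-code-action` GitHub Action; the runner label and secret name are placeholders, and on ARC the job would land on a self-hosted runner pod:

```yaml
name: claude
on:
  issue_comment:
    types: [created]
jobs:
  claude:
    # only wake the agent when someone writes "@claude" in a comment
    if: contains(github.event.comment.body, '@claude')
    runs-on: self-hosted   # e.g. an ARC (Actions Runner Controller) runner
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```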
MrDarcy 5 hours ago [-]
Looks cool, congrats on the launch. Is there any sandbox isolation from the k8s platform layer? Wondering if this is suitable for multiple tenants or customers.
jawiggins 5 hours ago [-]
Oh good question, I haven't thought deeply about this.
Right now nothing special happens, so claude/codex can access their normal tools and make web calls. I suppose that also means they could figure out they're running in a k8s pod and do service discovery and start calling things.
What kind of features would you be interested in seeing around this? Maybe a toggle to disable internet connections or other connections outside of the container?
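One shape that toggle could take, sketched as a hypothetical Kubernetes NetworkPolicy (the pod label and allowed ports are placeholders, and this assumes a CNI that enforces NetworkPolicy): default-deny egress for agent pods, with only DNS allowed.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-sandbox-deny-egress
spec:
  podSelector:
    matchLabels:
      role: agent-sandbox      # placeholder label for agent pods
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector: {}   # allow DNS lookups inside the cluster
      ports:
        - protocol: UDP
          port: 53
```

This would also block the service-discovery scenario above, since the agent couldn't reach other in-cluster services unless you add explicit allow rules.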
antihero 5 hours ago [-]
And what stops it making total garbage that wrecks your codebase?
jawiggins 5 hours ago [-]
There are a few things:
a) you can create CI/build checks that run in GitHub, and the agents will make sure they pass before merging anything
b) you can configure a review agent with any prompt you'd like to make sure any specific rules you have are followed
c) you can disable all the auto-merge settings and review all the agent code yourself if you'd like.
kristjansson 4 hours ago [-]
> to make sure
you've really got to be careful with absolute language like this in reference to LLMs. A review agent provides no guarantees whatsoever, just shifts the distribution of acceptable responses, hopefully in a direction the user prefers.
jawiggins 4 hours ago [-]
Fair, it's something like a semantic enforcement rather than a hard one. I think current AI agents are good enough that if you tell it, "Review this PR and request changes anytime a user uses a variable name that is a color", it will do a pretty good job. But for complex things I can still see them falling short.
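To contrast the two enforcement styles: a rule that simple could also be a deterministic lint rather than an LLM review. A toy sketch (the color list and the "+"-prefixed diff format are assumptions for illustration):

```python
import re

COLORS = {"red", "green", "blue", "purple", "orange", "yellow"}
# match an assignment on an added ("+") diff line
ASSIGN = re.compile(r"^\+\s*([A-Za-z_]\w*)\s*=")

def color_named_vars(diff: str):
    """Return variable names in added lines that are color words."""
    hits = []
    for line in diff.splitlines():
        m = ASSIGN.match(line)
        if m and m.group(1).lower() in COLORS:
            hits.append(m.group(1))
    return hits
```

The LLM review earns its keep on the rules you can't regex, at the cost of the guarantees discussed above.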
SR2Z 1 hour ago [-]
I mean, having unit tests and not allowing PRs in unless they all pass is pretty easy (or requiring human review to remove a test!).
A software engineer takes a spec which "shifts the distribution of acceptable responses" for their output. If they're 100% accurate (snort), how good does an LLM have to be for you to accept its review as reasonable?
upupupandaway 5 hours ago [-]
Ticket -> PR -> Deployment -> Incident
abybaddi009 2 hours ago [-]
Does this support skills and MCP?
conception 4 hours ago [-]
What’s the most complicated, finished project you’ve done with this?
jawiggins 4 hours ago [-]
Recently I used it to finish up my re-implementation of curl/libcurl in Rust (https://news.ycombinator.com/item?id=47490735). At first I tried having a single Claude Code session run in an iterative loop, but eventually I found it was way too slow.
I started tasking subagents with each remaining chunk of work, and found I was really just recreating a normal sprint tasking cycle, except subagents completed the tasks with the unit tests as exit criteria. That's where optio came in: I asked an agent to run the test suite, see what was failing, and make tickets for each group of remaining failures. Then I used optio to manage instances of agents working on and closing out each ticket.
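The ticketing step described above can be sketched in a few lines. This assumes pytest-style `path::test_name` failure ids and groups them one ticket per module, with the failing tests as the exit criteria; the ticket shape is purely illustrative, not a real optio schema.

```python
from collections import defaultdict

def failures_to_tickets(failing_tests):
    """Group failing test ids by module into one ticket per module."""
    groups = defaultdict(list)
    for test_id in failing_tests:
        module = test_id.split("::", 1)[0]
        groups[module].append(test_id)
    return [
        {"title": f"Fix failing tests in {module}",
         "exit_criteria": tests}   # the unit tests are the exit criteria
        for module, tests in sorted(groups.items())
    ]
```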
hmokiguess 4 hours ago [-]
the misaligned columns in the claude made ASCII diagrams on the README really throw me off, why not fix them?
jawiggins 4 hours ago [-]
Should be fixed now :)
rafaelbcs 5 hours ago [-]
[dead]
QubridAI 5 hours ago [-]
[flagged]
knollimar 5 hours ago [-]
I don't want to accuse you of being an LLM but geez this sounds like satire