thih9 a day ago

> Agents report that they enjoy working with Beads, and they will use it spontaneously for both recording new work and reasoning about your project in novel ways.

I’m surprised by this wording. I hadn’t encountered anyone talking about AI preferences before.

Can a trained LLM develop a preference for a given tool within some context and reliably report on that?

Is “what AI reports enjoying” aligned with AI’s optimal performance?

  • Etheryte a day ago

    LLMs also report that they enjoy my questions; in fact, they tell me it's a good question literally every time I ask about their weird choices.

    • lnenad a day ago

      You're absolutely right!

      • giancarlostoro a day ago

        The more infuriating part of that remark is when it's due to you pointing out something really dumb; then you ask yourself, why didn't it ask this in its reasoning? lol

  • skybrian a day ago

    Yegge makes stuff up and is known to say controversial things for fun, so I assume it’s trolling. Product pages often have endorsements and adding funny endorsements is an old joke.

    But I also can’t rule out that he somehow believes it, which I suppose makes it a good troll.

  • kissgyorgy a day ago

    I think (hope) it's meant to be a joke.

  • dude250711 a day ago

    The author has a vested interest in AI, which is why its capabilities may be greatly exaggerated/anthropomorphised, as is typical for LLM start-ups. Proceed with caution.

  • fnord77 a day ago

    the readme.md seems ai generated

wowamit a day ago

I went through the whole readme first and kept wondering what problem the system aims to address. I understood that it is a distributed issue tracker. But how can that lead to a memory upgrade? It also hints at replacing markdown for plans.

So is the issue the format or lack of structure which a local database can bring in?

  • simonw a day ago

    LLMs famously don't have a memory - every time you start a new conversation with them you are effectively resetting them to a blank slate.

    Giving them somewhere to jot down notes is a surprisingly effective way of working around this limitation.

    The simplest version of this is to let them read and write files. I often tell my coding agents "append things you figure out to notes.md as you are working" - then in future sessions I can tell them to read or search that file.

    Beads is a much more structured way of achieving the same thing. I expect it works well partly because LLM training data makes them familiar with the issue/bug tracker style of working already.

    • qudat a day ago

      I’ve been using beads for a few projects and I find it superior to spec kit or any other form of structured workflow.

      I also find it faster to use. I tell the agent the problem, ask them to write a set of tasks using beads, it creates the tasks and it creates the “depends on” tree structure. Then I tell it to work on one task at a time and require my review before continuing.

      The added benefit is the agent doesn’t need to hold so much context in order to work on the tasks. I can start a new session and tell it to continue the tasks.

      Most of this can work without beads but it’s so easy to use it’s the only spec tool I’ve found that has stuck.

      • threecheese a day ago

        Do you find that it interferes with coding agents’ built-in task management features? I tried beads a few weeks ago and Claude exhibited some strange behavior there. I’ll have to try it again, everything is changing so quickly.

      • wredcoll a day ago

        Is there a good way to use beads without pushing your .beads dir upstream?

        • simonw a day ago

          Add the .beads directory to .gitignore and always make edits on that same machine.

          • Sammi 19 hours ago

            Is there a good way to put it in a separate repo?

    • wowamit a day ago

      Thanks! It is the structure that matters here, then. Just like you, I ask my agents to keep updating a markdown file locally and use it as a reference during working sessions. This mechanism has worked well for me.

      I even occasionally ask agents to move some learnings back to my Claude.md or Agents.md file.

      I'm curious whether complicating this behaviour with a database integration would further abstract the work in progress. Are we heading down a slippery slope?

    • andai a day ago

      Using Claude code recently I was quite impressed by the TODO tool. It seemed like such a banal solution to the problem of keeping agents on track. But it works so well and allows even much smaller models to do well on long horizon tasks.

      Even more impressive lately is how good the latest models are without anything keeping them on track!

    • Jeff_Brown a day ago

      I often have them append to notes, too, but also often ask them to deduplicate those notes, without which they can become quite redundant. Maybe redundancy doesn't matter to the AI because I've got tokens to burn, but it feels like the right thing to do. Particularly because sometimes I read the notes myself.

simonw a day ago

The Beads project uses Beads itself as an issue tracker, which means their issues data is available here as JSONL:

https://github.com/steveyegge/beads/blob/main/.beads/issues....

Here's that file opened in Datasette Lite which makes it easier to read and adds filters for things like issue type and status:

https://lite.datasette.io/?json=https://github.com/steveyegg...
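
Since it's plain JSONL, you can also poke at it with a few lines of Python. Note the field names below (`id`, `status`) are illustrative guesses, not the documented beads schema:

```python
import json

# Read a beads-style JSONL issue file: one JSON object per line.
# The field names used here ("id", "status") are illustrative
# guesses, not the documented beads schema.
def load_issues(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def open_issues(issues):
    # Treat anything not explicitly marked closed as still open.
    return [i for i in issues if i.get("status") != "closed"]
```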

pbw a day ago

Whether this exact approach catches on or not, it's turning the corner from "teaching AIs to develop using tools that were designed for humans" to "inventing new tools and techniques that are designed specifically for AI use". This makes sense because AIs are not human; they have different strengths and limitations.

  • thethimble a day ago

    Absolutely. The limitations of AI (namely statelessness) require us to rethink our interfaces. It seems like there's going to be a new discipline of "UX for agents" or maybe even just Agent Experience or AX.

    Software that has great AX will become significantly more useful in the same way that good UX has been critical.

mbanerjeepalmer a day ago

Makes me wonder whether you can just give agents [Taskwarrior](https://taskwarrior.org/).

Set the TASKDATA to `./.task/`. Then tell the agents to use the task CLI.

The benefit is most LLMs already understand Taskwarrior. They've never heard of Beads.

  • catketch a day ago

    That's mentioned in the beads doc; Taskwarrior could work decently, but beads is optimizing for agent use: semantic issue relationships, conflict resolution, etc. I've had success with just using gh issues, and agents are pretty good at looking for new issues and closing them when done. I have a couple of toy projects where maintaining the code is basically filing a bug report or feature request.

    Also, when you say 'never heard of beads' --- it spits out onboarding text to tell the agent exactly what it needs to know.

    It requires a deep dive, but this is an interesting direction for agent tooling.

themgt a day ago

It does theoretically look like a useful project. At the same time I'm starting to feel like we're slipping into the Matrix. I check a GitHub issue questioning the architecture.md doc:

> I appreciate that this is a very new project, but what’s missing is an architectural overview of the data model.

Response:

You're right to call me out on this. :)

Then I check the latest commit on architecture.md, which looks like a total rewrite in response to a beads.jsonl issue logged for this.

> JSONL for git: One entity per line means git diffs are readable and merges usually succeed automatically.

Hmm, ok. So readme says:

> .beads/beads.jsonl - Issue data in JSONL format (source of truth, synced via git)

But the beads.jsonl for that commit to fix architecture.md still has the issue to fix architecture.md in the beads.jsonl? So I wonder, does that line get removed now that it's fixed ... so I check master, but now beads.jsonl is gone?

But the readme still references beads.jsonl as source of truth? But there is no beads.jsonl in the dogfooded repo, and there's like ~hundreds of commits in the past few days, so I'm not clear how I'm supposed to understand what's going on with the repo. beads.jsonl is the spoon, but there is no spoon.

I'll check back later, or have my beads-superpowered agent check back for me. Agents report that they enjoy this.

https://github.com/steveyegge/beads/issues/376#issuecomment-...

https://github.com/steveyegge/beads/commit/c3e4172be7b97effa...

https://github.com/steveyegge/beads/tree/main/.beads

  • rmonvfer a day ago

    lmao, agent powered development at its finest.

    Reminds me of the guy who recently spammed PRs to the OCaml compiler but this time the script is flipped and all the confusion is self inflicted.

    I wonder how long it will take us to see a vibe-coded, slop-covered OS or database or whatever (I guess the “braveness” of these slop creators will be (already is?) directly proportional to the quality of the SOTA coding LLMs).

    Do we have a term for this yet? I mean the person, not the product (slop)

    • andai a day ago

      Slorchestrator.

simonw a day ago

There are a ton of interesting ideas in the README - things like the way it uses the birthday paradox to decide when to increase the length of the hash IDs.
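
The README's exact rule isn't reproduced here, but the underlying birthday-paradox arithmetic is simple: with n issues drawn from 16^k possible k-hex-character IDs, the collision probability is roughly 1 - exp(-n(n-1)/(2·16^k)). A sketch, where the 1% threshold is my assumption rather than whatever beads actually uses:

```python
import math

def collision_probability(n, hex_chars):
    """Birthday-paradox estimate: chance of at least one collision
    among n random IDs drawn from 16**hex_chars possible values."""
    space = 16 ** hex_chars
    return 1 - math.exp(-n * (n - 1) / (2 * space))

def chars_needed(n, max_p=0.01):
    """Smallest ID length keeping collision odds under max_p.
    The 1% default threshold is an assumption for illustration."""
    k = 1
    while collision_probability(n, k) > max_p:
        k += 1
    return k
```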

This tool works by storing JSONL in a .beads/ folder. I wonder if it could work using a separate initially-empty "beads" branch for this data instead? That way the beads data (with its noisy commit history) could travel with the repository without adding a ton of noise to the main branch history.

The downside of that is that you wouldn't be able to branch the .beads/ data or keep it synchronized with main on a per-commit basis. I haven't figured out if that would break the system.

  • wowamit a day ago

    The way I read it, beads steers agents to make use of the .beads/ folder to stay in sync across machines. So, my understanding is a dedicated branch for beads data would break the system.

    • simonw a day ago

      But wouldn't that dedicated branch, pushed to origin, also work for staying synced across multiple machines?

      • amonks a day ago

        Depends what you mean by “synced”—do you want your beads state to be coupled with commits (eg: checking out an old commit also shows you the beads state at that snapshot)? Using a separate branch would decouple this. I think the coupling is a nice feature, but it isn’t a feature that other bug trackers have, so using a separate branch would make beads more like other bugtrackers. If you see the coupling as noise, though, then it sounds like that is what you want.

      • wowamit a day ago

        The way I understand this, when the agent runs `bd onboard` at startup, it gets the instructions from beads, which might refer to data files in the beads directory. Keeping them in sync via a separate branch would be an unnecessary overhead. Right?

        • simonw a day ago

          I don't see it as extra overhead - it just changes the git one-liner they use to push and pull their issue tracking content by a few characters.

          I like the idea of keeping potentially noisy changes out of my main branch history, since I look at that all the time.

CuriouslyC a day ago

I don't understand the point of this project. We already have github/gitlab for tasks, and if you want to query the history of a chat just stuff the spans in otel.

  • Sammi 18 hours ago

    They are made for / tuned for humans. This is tuned for LLMs.

_joel a day ago

I use the gh CLI to make and track issues on the repo's issue tracker, and create and reference the issue in the PR. I use Claude normally, and have Gemini and Codex sitting as automated reviewers (GitHub apps), then get Claude to review their comments. Rinse and repeat. It works quite well and catches some major issues. Reading the PRs yourself (at least skimming them for sanity) is still vital.

pradeeproark a day ago

Ha, I was working on the same problem and updating my article when this hit. My focus is on making the agent integration more seamless with the tool. Claude offers a fantastic way to do this using "skills" and now a "marketplace":

[1] Demo with Claude - https://pradeeproark.github.io/pensieve/demos/

[2] Article about it - https://pradeeproark.com/posts/agentic-scratch-memory-using-...

[3] https://github.com/cittamaya/cittamaya - Claude Code Skills Marketplace for Pensieve

[4] https://claude.com/blog/skills

stingraycharles a day ago

If there’s any type of memory upgrade for a coding agent I would want, it’s the ability to integrate a RAG into the context.

The information being available is not the problem; the agent not realizing that it doesn’t have all the info is, though. If you put it behind an MCP server, it becomes a matter of ensuring the agent will invoke the MCP at the right moment, which is a whole challenge in itself.

Are there any coding agents out there that enable you to plug middleware in there? I’ve been thinking about MITM’ing Claude Code for this, but wouldn’t mind exploring alternative options.

  • simonw a day ago

    What do you mean by a RAG here?

    I've been having a ton of success just from letting them use their default grep-style search tools.

    I have a folder called ~/dev/ with several hundred git projects checked out, and I'll tell Claude Code things like "search in ~/dev/ for relevant examples and documentation".

    (I'd actually classify what I'm doing there as RAG already.)

    • qudat a day ago

      I do the same thing for libraries I’m using in a project. It’s a huge power-up for code agents.

      Like you mentioned, agents are insanely good at grep. So much so that I’ve been trying to figure out how to create an llmgrep tool because it’s so good at it. Like, I want to learn how to be that good at grep, hah.

    • stingraycharles a day ago

      What I mean is basically looking at the last (few) messages in the context, translating that to a RAG query, query your embeddings database + BM25 lookup if desired, and if you find something relevant inject that right before the last message in the context.

      It’s pretty common in a lot of agents, but I don’t see a way to do that with Claude Code.
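
      That loop is straightforward to sketch outside any particular agent. Everything below is hypothetical (Claude Code exposes no such hook), and the keyword-overlap scorer is only a stand-in for a real embeddings + BM25 lookup:

```python
# Hypothetical RAG middleware: score stored snippets against the
# latest user messages and splice the best hits in just before the
# last message. The keyword-overlap scorer below stands in for a
# real embeddings + BM25 retrieval step.
def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def inject_context(messages, corpus, top_k=2, min_score=0.2):
    # Build the retrieval query from the last couple of messages.
    query = " ".join(m["content"] for m in messages[-2:])
    hits = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    hits = [d for d in hits[:top_k] if score(query, d) >= min_score]
    if not hits:
        return messages
    note = {"role": "user", "content": "Relevant context:\n" + "\n".join(hits)}
    # Inject right before the final message, as described above.
    return messages[:-1] + [note] + messages[-1:]
```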

      • UmGuys a day ago

        I'm not familiar with Claude's architecture, but I'd be surprised if it doesn't index your codebase for semantic search with the explore feature it has. How else would they find context? They already have a semantic search tool -- which is RAG.

        • simonw a day ago

          Claude Code doesn't do anything with semantic search or embeddings out of the box. They use a simple grep tool instead.

          Neither does OpenAI's Codex CLI - you can confirm that by looking at the source code https://github.com/openai/codex

          Cursor and Windsurf both use semantic search via embeddings.

          You can get semantic search in Claude Code using this unofficial plugin: https://github.com/zilliztech/claude-context - it's built by and uses a managed vector database called Zilliz Cloud.

          • UmGuys a day ago

            That's shocking to me. Although it does make sense from a UX perspective as indexing can take minutes depending on the setup.

            • stingraycharles 17 hours ago

              It’s surprisingly fast to generate embeddings. I don’t think it’s a UX issue as much as it’s that Anthropic themselves don’t offer any embeddings API (they only have one internally, but publicly recommend Cohere).

              They do use RAG a lot for their desktop app; their projects implementation makes heavy use of it.

losvedir a day ago

Is this that Steve Yegge? A former Googler/Amazon guy with long interesting rants? I don't even remember what about anymore, but I liked to read him back in the day.

  • Quiark an hour ago

    yep lol, one of the rants was about how strong typing sucks and slows everything down, another was about Haskell

  • rolisz a day ago

    Yes. And he just published a book on vibe coding last month.

iand675 a day ago

I've been trying `beads` out for some projects, in tandem with https://github.com/github/spec-kit with pretty good results.

I set up spec-kit first, then updated its templates to tell it to use beads to track features and all that instead of writing markdown files. If nothing else, this is a quality-of-life improvement for me, because recent LLMs seem to have an intense penchant for writing one or more markdown files per large task. Ending up with loads of markdown poop feels like the new `.DS_Store`, but harder to `.gitignore` because they'll name files whatever floats their boat.

  • vidarh a day ago

    I usually just use a commit agent that has as one of its instructions to review various aspects of the prospective commit, including telling it to consolidate any documentation and remove documentation of completed work except where it should be rolled into lasting documentation of architecture or features. I've not rolled it out in all my projects yet, but for the ones I do, it's gotten rid of the excess files.

  • adamgordonbell a day ago

    I've found it pretty useful as well. It doesn't compete with gh issues as much as it competes with markdown specs.

    It's helpful for getting Claude code to work with tasks that will span multiple context windows.

  • hmokiguess a day ago

    This is the first I’ve heard of spec-kit; it looks very promising and I’m interested in trying it. My approach is to combine beads with the superpowers skills: https://github.com/obra/superpowers I’m wondering how it compares to this. Gonna give it a try, thanks!

iddan a day ago

Cool stuff. The readme is pretty lengthy, so it was a little hard to identify the core problem this tool aims to solve and how it tackles it differently from existing solutions.

  • mimischi a day ago

    A classic issue of AI-generated READMEs: never to the point, always repetitive and verbose.

    • cube2222 a day ago

      Funnily, AI already knows what stereotypical AI sounds like, so when I tell Claude to write a README but "make it not sound like AI, no buzzwords, to the point, no repetition, but also don't overdo it, keep it natural" it does a very decent job.

      Actually drastically improves any kind of writing by AI, even if just for my own consumption.

    • SwellJoe a day ago

      I'm not saying it is or isn't written by an LLM, but, Yegge writes a lot and usually well. It somehow seems unlikely he'd outsource the front page to AI, even if he's a regular user of AI for coding and code docs.

    • kieckerjan a day ago

      And full of marketing hyperbole. When I have an AI produce a README I always have to ask it to tone it down and keep it factual.

  • zaphirplane a day ago

    This looks like a ticketing CLI.

    • simonw a day ago

      That's exactly what this is, but it's one that's designed with coding agents in mind as its principal users.

aschearer a day ago

Neat! I am working on something similar and arriving at similar conclusions, e.g. a local SQLite index. I am not ready to give up human authoring, though. How do you tackle the quality gate problem and conformance? For programmatic checks like linting it’s reasonably clear, but what about checks that require intelligence?

frodo76 a day ago

Could you do the same thing with your real issue-tracking software? Your agent could use an MCP to create a Jira ticket and create subtasks or tasks for your subagents. Then you wouldn't need to clutter up your repo with these MD files and .beads directories and whatnot.

  • simonw a day ago

    Yes you can. I've experimented a bit with using the `gh` CLI tool to work with issues in a GitHub repository, but I don't particularly like the aesthetics of having a bunch of LLM-generated prose in my issue trackers like that.

igor47 a day ago

I've been trialing jj as my vcs on my latest project, but I guess this only supports git? Anyone using this with jujutsu?

  • steveklabnik a day ago

    It works fine with jj. I have a line in my Claude.md to tell it to make sure to close before committing, and I don’t use the hooks that are provided.

jauntywundrkind a day ago

Somewhat of an aside, but I love the data architecture: JSONL lines checked into git, and a local SQLite cache, with auto-sync to replicate changes between the canonical file and the cache. https://github.com/steveyegge/beads?tab=readme-ov-file#the-m...
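
The pattern is compact enough to sketch: the JSONL file stays the committed source of truth, while the SQLite cache is disposable and rebuilt from it, so the cache never needs to be versioned. The schema below is illustrative, not beads' actual one:

```python
import json
import sqlite3

# Sketch of the JSONL-as-truth / SQLite-as-cache pattern: the
# table is dropped and rebuilt from the JSONL file, so the cache
# can always be regenerated and never has to be committed.
def rebuild_cache(jsonl_path, db_path=":memory:"):
    db = sqlite3.connect(db_path)
    db.execute("DROP TABLE IF EXISTS issues")
    db.execute("CREATE TABLE issues (id TEXT PRIMARY KEY, status TEXT, title TEXT)")
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            db.execute(
                "INSERT OR REPLACE INTO issues VALUES (?, ?, ?)",
                (rec["id"], rec.get("status", "open"), rec.get("title", "")),
            )
    db.commit()
    return db
```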

I finally started digging into OpenCode for real these past couple weeks. It has a planning mode, which nicely builds out a plan in text chat as usual, but also a right pane in the TUI builds out a todo list, which has been really nice. I often give it the go-ahead to do the next item or two or three. I've wondered how this is implemented, how OpenCode sets up and picks up on this structuring.

Beads formalizing that a bit more is tempting. I also deeply enjoy that Beads is checked in. With both Aider and OpenCode, there's a nice history, but it's typically not checked in. OpenCode's history in particular isn't even kept in the project directory, and can be quite complex with multiple sessions and multiple agents all flying around. Beads, as a strategy to record the work and understand it better, is also very tempting.

Would love to see deeper OpenCode + Beads integration.