Hi there,
This week the underlying theme was harnesses, not models. The single biggest story is a CLI agent harness that just became the fastest repository in GitHub's history to cross 100k stars. Around it, a whole vocabulary is forming: workflow layers on top of Codex, “harness engineering” as a discipline, screen recording reframed as a developer-grade tool, and a real attempt at killing the vector database for retrieval. The model layer is starting to feel almost boring compared to what is happening one floor up.
📃 In this Monday Morning Mashup:
⭐Highlight: Claw Code blows past 186k stars and pulls a whole CLI workflow layer with it
🤖AI: Qwen3.5 distillations make local “Claude-class” reasoning genuinely practical
🔧Tools: Recordly is the open source Screen Studio that actually feels finished
💽Data: Sirchmunk argues vector databases were the wrong abstraction all along
Have a great week!
⭐Highlight: Claw Code is the fastest repo in GitHub history to cross 100k stars, and it’s mostly an opinionated harness
ultraworkers/claw-code is the public Rust implementation of the “claw” CLI agent harness, and it is sitting at roughly 186k stars - the fastest repository in GitHub’s history to cross 100k. The repo describes itself as a build-from-source workspace centered on a single claw binary that wraps the Anthropic API, with a parity-tracker against an upstream agent harness, a container-first workflow, and a strict claw doctor health check. There is a fairly stern README warning that cargo install claw-code from crates.io is a deprecated stub and should not be used.
What makes the explosion in popularity interesting is that the actual model behavior is unchanged - the value sits entirely in the harness layer. The repo even describes itself as “Built in Rust using oh-my-codex”, referring to Yeachan-Heo/oh-my-codex, a 24k-star workflow layer for the OpenAI Codex CLI that adds canonical skills, structured plans in a .omx/ folder, and parallel agent teams via commands like $deep-interview, $ralplan, and $team. Together they are a strong signal that “agent” is increasingly synonymous with “harness”, and that the differentiation is moving up the stack.

The Rust source-of-truth implementation of the claw CLI agent harness, with a build-from-source workspace, container workflow, parity tracker, and the claw doctor first-time health check.
🤖AI: Qwen3.5 distillations make “local Claude” sound a lot less like marketing
Two posts this week landed on the same idea from different angles. 0xSero highlighted a 20% compression of Qwen3.5-35B that drops average benchmark performance by only ~1%, allowing it to fit in 4-bit on a single 24GB card - basically anything from a used 3090 upward. Meanwhile, the open community pushed out a 27B Qwen3.5 variant explicitly distilled on Claude 4.6 Opus reasoning traces and reported that, in 4-bit, it beats Claude Sonnet 4.5 on SWE-bench, this time on a 16GB local GPU.
You should always discount these kinds of claims a bit until independent reproductions land, but the trend is hard to ignore. The window where you needed an API subscription to get serious coding-grade reasoning is closing fast. If your workflow can tolerate a slightly slower local model in exchange for not paying per token and not shipping your code to a frontier lab, the trade is starting to look genuinely competitive instead of merely “good for a local model”.
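The VRAM claims are at least easy to sanity-check with weights-only arithmetic. A rough sketch (real runtimes also need headroom for the KV cache, activations, and runtime buffers on top of the weights):

```python
def quantized_weight_gb(n_params_b: float, bits: int) -> float:
    """Weights-only memory estimate: params (billions) * bits / 8 = gigabytes.
    KV cache, activations, and runtime buffers come on top of this."""
    return n_params_b * bits / 8

# Qwen3.5-35B compressed ~20% leaves roughly 28B parameters:
print(quantized_weight_gb(28, 4))  # 14.0 GB of weights -> comfortable on a 24GB card
# The 27B distillation:
print(quantized_weight_gb(27, 4))  # 13.5 GB of weights -> tight but workable on 16GB
```

Both numbers line up with the cards being quoted, which makes the claims plausible on memory grounds even before the benchmark reproductions arrive.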
Qwen3.5-35B compressed 20% with ~1% average drop
A short writeup on a Qwen3.5-35B compression that fits in 4-bit on a single 24GB GPU, useful as a baseline for what “local frontier” looks like right now.
🔧Tools: Recordly is the open source Screen Studio that actually ships
webadderall/Recordly is the most polished open source screen recorder I have seen in a long time - around 9.2k stars and feature-complete enough that it competes seriously with paid tools like Screen Studio. It does the things developers actually want: automatic zoom suggestions on activity, smooth cursor polish, motion effects, styled frames with wallpapers and gradients, dynamic webcam bubble overlays, and a drag-and-drop demo timeline. macOS gets ScreenCaptureKit; Windows gets the native Windows Graphics Capture path with WASAPI audio; Linux runs through Electron capture.
This matters more than yet-another-recorder usually does. Most software now grows its userbase through demo videos and clips, and the production gap between “screen recording” and “polished demo” has been quietly priced at roughly $90/year by closed-source incumbents. Recordly closing that gap as a free local app changes the default for every indie dev, founder, or open source maintainer who needs to ship a smooth ten-second clip on Friday afternoon.

An open source desktop app for polished screen recordings, with automatic zooms, cursor polish, styled frames, and webcam overlays across macOS, Windows, and Linux.
💽Data: Sirchmunk argues that the vector DB era was a detour
Of all the retrieval projects this week, modelscope/sirchmunk from the Modelscope team is the most pointed. The premise is that vector-based retrieval pipelines are “rigid and brittle” - they require expensive embeddings, slow indexing, are blind to real-time changes, and trade fidelity for compression. Sirchmunk skips the embedding store entirely and works directly on raw files using agentic search, knowledge clustering, and Monte Carlo evidence sampling, with built-in MCP server support and a web UI.
The “self-evolving” framing is a bit grand, but the underlying claim is reasonable: a lot of “memory” pipelines today are accidentally turning every workflow into a stale snapshot. If your data is actually a stream - chats, repos, documents in flight - then Sirchmunk’s argument that an LLM-driven, indexless approach is closer to the right shape is at least worth taking seriously. Combined with last week’s recommendation to layer markdown, structured retrieval, and SQLite, it suggests the agent memory conversation is starting to mature past “just throw everything into pgvector”.
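The indexless idea is simple enough to sketch. The toy below is not Sirchmunk's actual API (all names here are invented); it only shows the shape of Monte Carlo evidence sampling over raw files: sample random chunks, score them, keep the best. Because nothing is pre-computed, an edit to any file is visible on the very next query - the property the vector-index critique hinges on. A real system would replace the keyword scorer with an LLM judge and loop agentically:

```python
import random
import re

def score(chunk: str, query: str) -> float:
    """Toy relevance score: fraction of query terms present in the chunk.
    A real system would ask an LLM to judge relevance instead."""
    terms = set(re.findall(r"\w+", query.lower()))
    text = chunk.lower()
    return sum(t in text for t in terms) / max(len(terms), 1)

def monte_carlo_retrieve(files: dict[str, str], query: str,
                         samples: int = 200, chunk_size: int = 400,
                         seed: int = 0) -> list[tuple[float, str, str]]:
    """Sample random chunks directly from raw files (no index, no embeddings),
    score each, and keep the best evidence."""
    rng = random.Random(seed)
    scored: list[tuple[float, str, str]] = []
    for _ in range(samples):
        path = rng.choice(sorted(files))
        text = files[path]
        start = rng.randrange(max(len(text) - chunk_size, 1))
        chunk = text[start:start + chunk_size]
        scored.append((score(chunk, query), path, chunk))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:3]
```

The obvious cost is that every query pays for search at query time instead of amortizing it into an index - which is exactly the trade Sirchmunk is betting is worth making for live data.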

An embedding-free, agentic search and self-evolving knowledge framework that operates on raw data instead of vector indexes, with a web UI and MCP server out of the box.
⚡Quick Hits
Yeachan-Heo/oh-my-codex - The 24k-star workflow layer for the OpenAI Codex CLI mentioned above. Adds canonical skills, structured plans in a .omx/ folder, parallel agent teams, and a doctor-style smoke test, all while keeping Codex itself as the execution engine.
github.com
aiming-lab/AutoHarness - A small but interesting framework that wraps any OpenAI client with an “aha-moment” governance pipeline: tool governance, cost tracking, session persistence, risk pattern matching, and a YAML-based constitution. Two-line install, Python 3.10+. The framing - “agent = model + harness” - is the clearest articulation of the trend this week.
github.com
kevinrgu/autoagent - Another entry in the same harness-engineering bucket, focused more narrowly on autonomous, end-to-end harness construction for agents.
github.com
Fine-tune Gemma 4 locally on 8GB VRAM, with bug fixes - Unsloth shipped notebooks that fine-tune Gemma 4 E2B in 8GB of VRAM at ~1.5x the speed of FA2 setups, plus fixes for the gradient-accumulation loss explosion, an inference IndexError on the 26B and 31B variants, and the float16 audio overflow. If you have been parking Gemma 4 because of training instability, this is the week to revisit it.
reddit.com
A “performance skill” for coding agents - Pointed at any project, claims to come back an hour later with significant behavior-preserving performance improvements by mechanically applying classic LeetCode and IOI tricks. Even discounted heavily, this is a useful frame: skills as repeatable specialist passes over a codebase.
x.com
flash-moe: 397B MoE on a laptop - One of those “this should not work” repos that runs a 397B-parameter mixture-of-experts model on consumer hardware via aggressive expert offloading. Worth tracking even if you do not need it today.
x.com
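The “agent = model + harness” framing from the AutoHarness entry above is easy to make concrete. The sketch below is not AutoHarness's actual API (every name here is invented); it just shows the shape of the split: the model stays a plain callable, and the harness layers the governance - tool allow-listing, cost tracking, session transcript - around it:

```python
class Harness:
    """Minimal 'agent = model + harness' sketch. The model is any callable;
    the harness owns everything around it: which tools may run, what each
    call costs, and a transcript for session persistence."""

    def __init__(self, model, allowed_tools=(), cost_per_call=0.01):
        self.model = model
        self.allowed_tools = set(allowed_tools)
        self.cost_per_call = cost_per_call
        self.total_cost = 0.0
        self.transcript = []

    def run_tool(self, name, fn, *args):
        # Tool governance: refuse anything outside the allow-list.
        if name not in self.allowed_tools:
            raise PermissionError(f"tool {name!r} not allowed by harness policy")
        return fn(*args)

    def ask(self, prompt):
        # Cost tracking and session persistence wrap every model call.
        self.total_cost += self.cost_per_call
        reply = self.model(prompt)
        self.transcript.append((prompt, reply))
        return reply

# Usage with a stub model standing in for a real API client:
agent = Harness(model=lambda p: f"echo: {p}", allowed_tools={"read_file"})
print(agent.ask("hello"))          # echo: hello
print(round(agent.total_cost, 2))  # 0.01
```

Swapping the stub for a frontier API client changes nothing structurally, which is the point of the framing: the differentiation lives in this wrapper, not in the model behind it.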
Have a great week!