# The Software Ops Agent Framework — Keepstone

> The Software Ops Agent Framework is Keepstone's agent-native operating system for running small-business software with the discipline of a real engineering org.

Source: https://keepstone.tech/framework
Last modified: 2026-04-24

---

Software Ops · core IP

# The _Software Ops Agent Framework._

The operating system behind Software Ops. A set of agent-native layers — wrapped around every system we run — that detect, diagnose, remediate, ship code, and keep things documented. Built by operators who are all-in on this ecosystem and shipping into it at the pace the industry moves, not the pace a vendor roadmap allows.

always-on · self-healing · standardized

SOFTWARE OPS · framework 8 layers

01 · Thesis

## A real operating system for the business with one system to run.

Serious engineering orgs have rigorous operations. Most small businesses running custom software have none of it — and the old way of bringing it in (hire a team) no longer makes sense. Our framework is an agent-native operating model where subagents, skills, and tools do most of the work and operators focus on the handful of decisions that need a named human behind them.

i · Agent-native

### Subagents, skills, and tools — not scripts.

Composable subagents with scoped skills and [MCP tool](/glossary#mcp-tool) access. They detect, diagnose, remediate, ship code, and keep documentation current. With the right guardrails this outperforms humans doing the same work by hand — and it's why SMB pricing is even possible.

ii · Standardized

### Every account, same elements.

Your system gets made [framework-compatible](/glossary#framework-compatible) before it goes into Ops. From there every platform we run looks the same to our team — which means your system never depends on one operator who happened to learn its quirks.

iii · Portable

### You own your system and your work product.

The hardened app, your source code, documentation, credentials, observability dashboards, and deploy pipelines are yours — forever. The framework itself is ours, and travels with us to every account we run. Your system will continue to run without it; you just won't have us running it.

02 · Inside the framework

## The layers that run every Software Ops account.

Each layer is a set of tools, conventions, and — where useful — agents. Fragility in a system almost always maps to one or two of these that were never set up. Hardening is the work of bringing them in.

layer_01_

### Engineering Workflow

Standardizes how changes get made. Source control, branching model, test and QA expectations, AI-led coding with strong guardrails. The floor: no change ships without being reviewable and revertible — agent or operator.

agent-led

layer_02_

### Infrastructure & DevOps

The production environment itself — and how changes get there safely. Infrastructure as code, separated environments, secrets management, reversible deploys, staged rollouts. Most "it's broken in prod" calls trace back to something never set up at this layer.

automated

layer_03_

### Observability

Makes systems visible and diagnosable. Uptime and synthetic checks, structured logs, error tracking, p95 latency, queue depth, cost metering. Standardized health monitoring across whatever stack your system happens to be on.

agent-watched

layer_04_

### Triage & Support

Turns noise into tickets and tickets into decisions. Issue intake, defect-vs-enhancement classification, durable history, handoff. Triage subagents route automatically and front-line support agents resolve common questions — operators see only what actually needs their judgment.

agent-led

layer_05_

### Documentation & Training

Reduces key-person dependency — and feeds every other layer. Architecture diagrams, data maps, onboarding, provenance, user guides, training material. Docs-sync subagents keep everything current against the code, and the same corpus grounds agentic support — so answers to "how do I..." come from the same source of truth the engineers use.

agent-maintained

layer_06_

### Security

Keeps the system defensible. Identity and access, secrets hygiene, dependency scanning, patch management, audit trails, anomaly detection. Agents watch for the obvious — drift, exposed keys, dependency CVEs — and escalate the rest.

agent-watched

layer_07_

### Business Continuity & DR

Plans for the days nothing is going right. Backup strategy with tested restores, RPO/RTO targets matched to the business, documented failover, vendor-outage playbooks, cross-region contingencies where it matters. Backups you haven't restored from aren't backups.

agent-verified
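To make the "tested restores" idea concrete, here is a minimal sketch of the kind of check a verification agent might run. It is an illustration under stated assumptions, not Keepstone's implementation: `BackupStatus`, `continuity_gaps`, the gap labels, and the 30-day drill window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupStatus:
    """Hypothetical snapshot of one system's backup state."""
    last_backup_at: datetime
    last_restore_test_at: Optional[datetime]      # None means never restored
    last_restore_duration: Optional[timedelta]    # how long the last drill took


def continuity_gaps(
    status: BackupStatus,
    now: datetime,
    rpo: timedelta,
    rto: timedelta,
    max_drill_age: timedelta = timedelta(days=30),  # assumed drill cadence
) -> List[str]:
    """Return the continuity gaps found; an empty list means none."""
    gaps = []
    # RPO: the newest backup bounds how much data could be lost right now.
    if now - status.last_backup_at > rpo:
        gaps.append("rpo_exceeded")
    # A backup that was never restored, or not restored recently, is unproven.
    if (status.last_restore_test_at is None
            or now - status.last_restore_test_at > max_drill_age):
        gaps.append("restore_untested")
    # RTO: the last measured restore must fit inside the recovery target.
    if (status.last_restore_duration is not None
            and status.last_restore_duration > rto):
        gaps.append("rto_exceeded")
    return gaps
```

In this sketch a system with a fresh backup but no restore drill still reports a gap, which mirrors the rule that an unrestored backup does not count.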

layer_08_

### Governance

Decides fit, enforces boundaries, defines required hardening, keeps systems inside operable conditions. The one layer that stays firmly in operator hands — because fit and scope are judgment calls that need a named human behind them.

operator

03 · Why it's the moat

## What the framework means for you, in plain terms.

### What you get that other vendors can't match

*   A named operator backed by an always-on agent fleet: triage, diagnosis, and often the fix before anyone picks up the phone.
*   Entire classes of issues detected, fixed, and PR'd automatically under guardrails, with no humans queuing up to rubber-stamp work the framework already got right.
*   Standardized observability across whatever stack your system was built on (Lovable, Replit, Bolt, Supabase, custom), so it's no harder to operate than any other.
*   Documentation the framework itself keeps current, so losing a specific engineer never costs you tribal knowledge.
*   A governance layer that says no: to enhancement-creep, to risky changes, to work that should be scoped separately. You pay a predictable monthly number; we make it work.

### What it replaces

*   Hiring an internal engineer (~$180k fully loaded, takes six months to onboard, can't cover nights).
*   A freelance retainer that disappears when the freelancer gets a bigger client.
*   An MSP whose stack is last-decade desktop, not this-decade AI-assisted development.
*   An offshore dev shop where every ticket is "someone will look at it tomorrow."
*   Learning, personally, how to operate software.

04 · Framework maintenance

## It's a product, not a pile of scripts.

We version it. We improve it. Every account benefits from what we learn on every other account.

Agent updates

#### New subagents, skills, and tool integrations ship weekly — sometimes daily.

We move at the pace of the AI ecosystem, not a vendor release calendar. Every Software Ops account runs on the current capability set. When something new lands (a better model, a new [MCP tool](/glossary#mcp-tool), a smarter diagnostic subagent), it's rolled out to your system inside the month, not the quarter.

_weekly_

Subagent & skill library

#### Scoped subagents and skills for Lovable, Replit, Bolt, Supabase, Vercel, AWS primitives, Postgres, common AI providers.

Every platform we run on has a dedicated set of subagents with skills tuned to that stack. When something new shows up, we write the skill — and every future account benefits on day one.

_growing_

Framework health

#### We measure how well it's running — and report back.

Framework-compatible account percentage, complete-provenance-doc percentage, standardized-observability coverage, agent-action volume, escalation rate. The quarterly value report shows the framework working on your system, specifically.

_measured_
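As a rough illustration of how the health metrics named above could be rolled up, here is a small sketch. The field names and the definition of escalation rate (escalations divided by agent actions) are assumptions made for the example; the page does not say how Keepstone computes them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class AccountStats:
    """Hypothetical per-account inputs for the health roll-up."""
    framework_compatible: bool
    provenance_docs_complete: bool
    observability_standardized: bool
    agent_actions: int
    escalations: int


def pct(flags: Iterable[bool]) -> float:
    """Percentage of True values, rounded to one decimal (0.0 if empty)."""
    flags = list(flags)
    return round(100 * sum(flags) / len(flags), 1) if flags else 0.0


def framework_health(accounts: List[AccountStats]) -> Dict[str, float]:
    """Aggregate the five metrics across all accounts."""
    actions = sum(a.agent_actions for a in accounts)
    escalations = sum(a.escalations for a in accounts)
    return {
        "framework_compatible_pct": pct(a.framework_compatible for a in accounts),
        "complete_provenance_doc_pct": pct(a.provenance_docs_complete for a in accounts),
        "standardized_observability_pct": pct(a.observability_standardized for a in accounts),
        "agent_action_volume": actions,
        # Assumed definition: share of agent actions handed to an operator.
        "escalation_rate": round(escalations / actions, 3) if actions else 0.0,
    }
```

A per-account version of the same roll-up is one plausible input to a quarterly report, though the actual report format is not described here.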

Honesty

## Where we stand on _AI._

We don't think AI is "just tooling." We don't believe it needs a human reviewing every decision, sanity-checking every output, standing by for the inevitable mistake. That framing comes from firms that haven't actually operated agents at depth — and it produces economics that can only work at enterprise prices.

With the right frameworks, guardrails, and tooling, agents vastly exceed what a human operator can do alone — in throughput, in consistency, and increasingly in judgment on the bounded problems they're scoped to. Our job is to pick those scopes well, build the guardrails tight, and keep shipping into the ecosystem as it matures.

Operators are for the work that's actually ours: fit decisions, architecture, incidents with business weight, the handful of changes that need a name behind them. Everything else is a machine's job done well.

FAQ

## Frequently asked questions

### How much of the work is actually done by AI agents versus humans?

Most of the standing operational volume is done by agents — monitoring, triage, classification, documentation, low-risk fixes, dependency updates, security scanning, restore drills. Operators handle the things that genuinely need human judgment: incidents with business consequences, architecture decisions, scope and fit calls, anything risky enough that we wouldn't trust an agent to make the call. The split shifts somewhat by account size and complexity, but the rule holds: volume is automated, judgment is human.

### What technology stacks can your framework operate on top of?

Most of them. The framework is designed to wrap whatever your system is already built on, including Node, Python, PHP, Ruby, Go, Postgres, MySQL, MongoDB, Supabase, Firebase, AWS, Azure, GCP, Vercel, Netlify, and Render. We don't migrate you to "our" stack because we don't have one. Hardening is the work of fitting your specific stack into our framework's operational seams. If your system is running on something exotic, we'll tell you during the free assessment whether we can operate it economically. The answer is almost always yes.

### How do you prevent the AI agents from making changes that break things?

The threshold for an agent to act on its own is conservative on purpose. Agents only ship changes they can both classify as low-risk and verify against existing tests and deployment checks. Anything ambiguous, anything material, anything that touches money, identity, or external systems gets escalated to a named operator before it merges. We watch the agents the way the agents watch your system — if one starts making calls outside its lane, it gets pulled and an operator reviews what it did.
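The rule described above can be sketched as a small decision function. This is a simplified illustration, not the framework's actual gate: `ProposedChange`, `decide`, the risk labels, and the `SENSITIVE_AREAS` names are hypothetical stand-ins for "money, identity, or external systems."

```python
from dataclasses import dataclass

# Hypothetical labels for the areas that always require a named operator.
SENSITIVE_AREAS = frozenset({"money", "identity", "external_systems"})


@dataclass(frozen=True)
class ProposedChange:
    """Hypothetical description of a change an agent wants to ship."""
    risk: str                       # "low", "ambiguous", or "material"
    tests_pass: bool                # verified against the existing tests
    deploy_checks_pass: bool        # verified against deployment checks
    touches: frozenset = frozenset()  # areas of the system the change affects


def decide(change: ProposedChange) -> str:
    """Return "auto_merge" or "escalate" for a proposed agent change."""
    # Anything touching a sensitive area goes to an operator, whatever the risk.
    if change.touches & SENSITIVE_AREAS:
        return "escalate"
    # Only changes classified as low-risk are eligible to ship unattended.
    if change.risk != "low":
        return "escalate"
    # Low-risk is not enough: the change must also be verifiable.
    if not (change.tests_pass and change.deploy_checks_pass):
        return "escalate"
    return "auto_merge"
```

Every branch except the last one escalates, so a change has to pass all three checks before an agent merges it on its own.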

### What happens to the framework if we end our engagement with Keepstone?

The framework belongs to us. The work product belongs to you. Your hardened application, source code, infrastructure configuration, documentation, credentials, dashboards, and deployment pipelines stay with you in your accounts, forever. When we leave, the framework comes off — meaning the agents we were running on your system stop running. The system itself keeps operating; you just won't have us operating it. Thirty-day exit, any time, everything handed over.

### Is the framework open source? Are we locked into Keepstone-specific tooling?

The framework itself is proprietary — it's our software ops operating system, and the AI agent layer that monitors, orchestrates, and manages everything is the part that travels with us. Everything underneath that orchestration layer, though, is built on open source or best-of-breed commercial tools — industry-standard observability platforms, monitoring services, deployment pipelines, infrastructure tooling. There's no vendor lock-in by design. If we leave, the underlying tooling stays in place, configured the way it was. You can pick it up and run it in-house, or hand it to another provider that knows the same stack. The only thing that comes off the system is our agent framework.

### Is the framework something we can buy and run ourselves?

No. The framework is the operating system we use to run Software Ops; it isn't a product we license. The reason it works is that it's continuously developed and operated by a team that's all-in on it. Selling it as software would mean splitting our attention between operating systems and shipping a product, and the moment we do that, we're a software company instead of an operations company. We're not interested in being a software company.

### Why is the framework AI-first instead of just AI-assisted?

Because the math doesn't work otherwise. Real operating discipline — six standing functions, applied continuously, to a custom system — is something a human team would charge tens of thousands of dollars a month for. To deliver it at $500–$9,500+/month, the volume of standing work has to be done by software. Agents handle the volume. Operators handle the judgment. That's the only cost structure that makes Software Ops economically deliverable to a small business in the first place.

The honest next step

## See the framework against your actual system.

The free assessment is a structured review scored against every layer. You get a written report — what's [framework-compatible](/glossary#framework-compatible) now, what needs hardening, and a fixed quote to get there.

[Free assessment →](start?path=assess) [See pricing →](pricing)
