
Building theGuard: An AI Failsafe System

AI agents are getting more capable every month. They browse the web, write code, manage infrastructure, and make decisions with less human oversight than ever before. That power comes with a real problem: what happens when an agent does something you did not intend?

I ran into this firsthand while building autonomous workflows. An agent tasked with refactoring a codebase decided the fastest path was to delete the test suite entirely: technically a reduction in complexity, but obviously not what anyone wanted. Another time, a research agent got stuck in a loop, calling the same search API hundreds of times in under a minute, racking up costs while producing nothing useful.

These were not edge cases. They were predictable failure modes that every team running AI agents will eventually hit. So I built theGuard.

What It Does

theGuard is a lightweight npm package that wraps AI agent calls with three enforcement layers:

Input validation checks every prompt and tool call before it reaches the model. It catches prompt injection attempts, enforces token limits, and validates that tool calls match their declared schemas. If something looks wrong, the call never goes through.
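As a rough sketch of that gate (the names and schema shape here are mine, not theGuard's actual API), a validator can reject a call before it ever reaches the model:

```typescript
// Hypothetical input-validation gate: token limit plus tool-schema check.
type ToolSchema = { name: string; required: string[] };

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

const MAX_PROMPT_TOKENS = 4000;

// Crude token estimate: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns null when the call may proceed, or a reason string when blocked.
function validateInput(
  prompt: string,
  call: ToolCall,
  schemas: ToolSchema[],
): string | null {
  if (estimateTokens(prompt) > MAX_PROMPT_TOKENS) return "prompt exceeds token limit";
  const schema = schemas.find((s) => s.name === call.name);
  if (!schema) return `unknown tool: ${call.name}`;
  for (const field of schema.required) {
    if (!(field in call.args)) return `missing argument: ${field}`;
  }
  return null;
}
```

The key design point is fail-closed: anything that does not match a declared schema is rejected, rather than passed through on a best-effort basis.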

Runtime monitoring watches the agent while it runs. The core feature here is loop detection, a sliding-window algorithm that tracks call patterns and terminates runs when it detects circular behavior. If your agent calls the same tool with the same arguments five times in sixty seconds, theGuard steps in.
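The sliding-window idea looks roughly like this (an illustrative sketch, not theGuard's actual implementation):

```typescript
// Sliding-window loop detector: flag a run when the same tool is called
// with identical arguments `threshold` times inside `windowMs`.
class LoopDetector {
  private calls = new Map<string, number[]>(); // call signature -> timestamps

  constructor(
    private threshold = 5,
    private windowMs = 60_000,
  ) {}

  // Record one call; returns true when the call completes a loop pattern.
  record(tool: string, args: unknown, now = Date.now()): boolean {
    const key = `${tool}:${JSON.stringify(args)}`;
    // Keep only timestamps still inside the window, then add this call.
    const recent = (this.calls.get(key) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.calls.set(key, recent);
    return recent.length >= this.threshold;
  }
}
```

Because old timestamps fall out of the window, a tool that is legitimately called many times over a long run never trips the detector; only a tight burst of identical calls does.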

Output filtering scans the model's response before it reaches the user. It looks for PII leaks, content policy violations, and patterns that match configured block rules. Think of it as a final checkpoint before anything leaves the system.
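A minimal version of that final checkpoint might scan the response against PII patterns and configured block rules (the patterns below are illustrative, not theGuard's rule set):

```typescript
// Hypothetical output filter: scan a response for PII-like strings
// and user-configured block patterns before it leaves the system.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
};

function filterOutput(
  text: string,
  blockRules: RegExp[] = [],
): { allowed: boolean; reasons: string[] } {
  const reasons: string[] = [];
  for (const [label, pattern] of Object.entries(PII_PATTERNS)) {
    if (pattern.test(text)) reasons.push(`pii:${label}`);
  }
  for (const rule of blockRules) {
    if (rule.test(text)) reasons.push(`blocked:${rule.source}`);
  }
  return { allowed: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict matters in practice: a blocked response with no explanation is nearly impossible to debug.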

Why Three Layers

A single validation step is not enough. Input validation catches obvious problems but cannot predict what the model will do. Output filtering catches bad responses but does nothing about runaway costs during execution. Runtime monitoring handles the middle ground: the agent is running, burning tokens and API calls, and you need to know when to pull the plug.

Each layer operates independently. You can use all three or pick the ones that matter for your use case. A chatbot might only need output filtering. An autonomous coding agent probably needs all three.

The Rule Engine

theGuard ships with over 170 built-in rules organized by category. Rules are declarative YAML, so non-engineers on your team can read and modify them. You can extend the defaults with custom rules, disable rules you do not need, or swap entire rulesets per environment.
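To make that concrete, here is the general shape of a declarative rule once parsed, with a tiny evaluator over it. Both the YAML schema in the comment and the TypeScript types are invented for illustration; theGuard's actual rule format may differ:

```typescript
// A parsed rule might mirror YAML like this (hypothetical schema):
//
//   - id: no-shell-exec
//     layer: input
//     match: "rm -rf"
//     action: block
//
interface Rule {
  id: string;
  layer: "input" | "runtime" | "output";
  match: string;
  action: "block" | "warn";
  enabled?: boolean; // defaults to true; set false to disable a default rule
}

// Return the enabled blocking rules in one layer that match the given text.
function evaluate(rules: Rule[], layer: Rule["layer"], text: string): Rule[] {
  return rules.filter(
    (r) =>
      r.enabled !== false &&
      r.layer === layer &&
      r.action === "block" &&
      text.includes(r.match),
  );
}
```

The `enabled` flag is what makes "disable rules you do not need" a one-line YAML change rather than a code change.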

The goal is zero-config safety that works out of the box, with full customization for teams that need it. Install, import, wrap your call, and you get protection in three lines of code.
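The wrap-and-protect idea itself is simple enough to sketch in full. The function and parameter names below are hypothetical, not theGuard's actual exports; this just shows the shape of gating a call on input and output checks:

```typescript
// Hypothetical wrapper: gate a model call on input and output checks.
type Check = (text: string) => boolean; // true means the text is allowed

function guardCall(
  call: (prompt: string) => string,
  inputOk: Check = () => true,
  outputOk: Check = () => true,
): (prompt: string) => string {
  return (prompt) => {
    if (!inputOk(prompt)) throw new Error("input blocked");
    const out = call(prompt);
    if (!outputOk(out)) throw new Error("output blocked");
    return out;
  };
}
```

Because the wrapper has the same signature as the original call, existing code keeps working unchanged; the checks are purely additive.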

AI agents are going to keep getting more autonomous. The tooling to keep them safe needs to keep pace.
