# unotest documentation — full text

> AI-native E2E testing for web and iOS. This file concatenates every docs page as Markdown.


---

# How it works

> The architecture behind unotest — MCP, semantic perception, a sandboxed engine, and local-first execution.

unotest connects your editor's AI agent to your real application through an
**MCP server**, and turns what the agent does into a reviewable test.

## The loop

1. **You describe a flow** in plain English to your agent.
2. **The agent explores** your live app through ~37 MCP tools — reading a
   semantic snapshot, clicking, filling, recording each action.
3. **It writes a scenario** to `unotest/e2e/<name>.js` with stable selectors and
   `step("intent", …)` labels.
4. **It runs the scenario** through the sandboxed engine and, on failure, pauses
   to inspect, patch and resume.
5. **You review and commit** the `.js`.

## Semantic perception, not pixels

The agent doesn't look at screenshots. It reads a **semantic snapshot** of the
page (web) or the **accessibility tree** (iOS) — roles, names, labels, test IDs,
rendered as a token-cheap text outline. Compared with screenshots, this costs
fewer tokens, is more reliable, and stays stable across visual redesigns.

## A sandboxed engine

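Scenarios are plain JavaScript, but they don't run in Node. They execute in a
**sandboxed AST interpreter**: no `require`, no `fetch`, and no direct filesystem
or network access. The outside world is reachable only through the typed sandbox
helpers (`apiCall`, `dbQuery` / `dbExec`, `shell`), whose targets are pinned in
config. That's why AI-generated tests are safe to run blindly.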
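
As a rough illustration of why an interpreter with an allow-list is safer than handing a file to Node, the sketch below walks a tiny hand-written syntax tree and resolves every call against an explicit table. The node shapes, the `ALLOWED` table and the `run` function are hypothetical and far simpler than unotest's engine. The point is only that a name like `require` has nothing to resolve to.

```js
// Hypothetical mini-interpreter: NOT unotest's engine, just the allow-list idea.
const calls = []; // records what the "scenario" did, for illustration

// The only names a scenario may call. Anything else is rejected.
const ALLOWED = {
  goto: (path) => calls.push(["goto", path]),
  click: (target) => calls.push(["click", target]),
  getByTestId: (id) => `testid=${id}`,
};

// Evaluate one node of a tiny expression tree.
function evaluate(node) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Call": {
      // Own-property check so inherited names like "constructor" do not resolve.
      if (!Object.prototype.hasOwnProperty.call(ALLOWED, node.callee)) {
        throw new Error(`"${node.callee}" is not available in the sandbox`);
      }
      const args = node.args.map(evaluate);
      return ALLOWED[node.callee](...args);
    }
    default:
      throw new Error(`Unsupported syntax: ${node.type}`);
  }
}

// Run a list of statements and return a copy of the recorded calls.
function run(program) {
  calls.length = 0;
  program.forEach(evaluate);
  return calls.slice();
}

// Equivalent of: goto("/products"); click(getByTestId("add"));
const okProgram = [
  { type: "Call", callee: "goto", args: [{ type: "Literal", value: "/products" }] },
  {
    type: "Call",
    callee: "click",
    args: [{ type: "Call", callee: "getByTestId", args: [{ type: "Literal", value: "add" }] }],
  },
];

// Equivalent of: require("fs"), which throws because the name is not allow-listed.
const badProgram = [
  { type: "Call", callee: "require", args: [{ type: "Literal", value: "fs" }] },
];
```

Calling `run(okProgram)` records both calls, while `run(badProgram)` throws before anything executes.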

## Stable by construction

Selectors follow a strict priority — `getByTestId → getByRole → getByLabel →
getByText → locator(css)` — and the linter flags brittle patterns. Tests survive
refactors because they target meaning, not markup.

## Local-first

The MCP server, the browser/Simulator, the runner and the viewer all run on your
machine. Your app never leaves it. No cloud, no account.

## The ecosystem

| Package | Role |
| --- | --- |
| `@unotest/web` | CLI · MCP server · runner (web) |
| `@unotest/mobile` | CLI · MCP server · runner (iOS) |
| `@unotest/viewer` | local results browser |
| `@unotest/dsl` | scenario parser + validator |
| `@unotest/protocol` | shared types |


---

# Install & setup

> Requirements, browser choices, and wiring your editor to unotest over MCP.

## Requirements

- **Node 20+** (for `npx`; you don't need a Node project).
- **Web:** any OS. **iOS:** macOS with Xcode + the iOS Simulator.

## Set up a project

```sh
# web
npx @unotest/web init

# iOS
npx @unotest/mobile install /path/to/Your.app --update-env
```

`init` / `install` scaffold the `unotest/` layout, write `unotest.config.*`, and
wire MCP config (`.mcp.json` / editor settings) so your agent finds the server
automatically.

## Choosing a browser (web)

During `init` you pick how Chromium is provided:

- **System Chrome / Edge** — zero download. Sets `channel: "chrome"` (or
  `"msedge"`) in your config.
- **Bundled Chromium** — Playwright's build (~150 MB). Run
  `npx @unotest/web install-chromium` if you didn't during init.

You can also run **Firefox** and **WebKit** — set `browsers` in
[config](/reference/config/). Run a single browser in dev; CI can run all three.

## Wire your editor

unotest is an MCP server. After `init`, these editors pick it up automatically:

- **Claude Code** — via `.mcp.json` and project settings.
- **Cursor** / **Codex** — via their MCP config.

See [Connect your editor](/agent/connect/) for details.

## Verify

```sh
# web
npx @unotest/web e2e welcome

# iOS
npx @unotest/mobile doctor
```

If something's off, the CLI prints a single actionable line — no stack traces in
normal operation. Set `UNOTEST_DEBUG=1` for full diagnostics.


---

# Introduction

> What unotest is — AI-native E2E testing where your agent writes the tests and you stay in control.

**unotest is AI-native end-to-end testing.** You don't write the tests — your
AI agent does, by driving your real app through an MCP server. You review the
result and commit it.

It's not a black box. Everything the agent does ends up as ordinary,
human-readable `.js` in your repository, with stable selectors and plain-English
step labels. You can open it, understand it, and rewrite it by hand at any time.

**AI does the work. You keep control.**

## Two surfaces, one idea

| | **unotest web** | **unotest mobile** |
| --- | --- | --- |
| Target | Web apps | iOS apps (React Native + native Swift) |
| Engine | Playwright / Chromium | WebDriverAgent / XCUI |
| Perception | semantic DOM snapshots | accessibility tree |
| Runs on | any OS | macOS only (Apple licensing) |

Both share the same DNA: an MCP server, a sandboxed JavaScript DSL, plain `.js`
tests in your repo, a pause-on-failure debugger, and local-first execution.

## Why it's different

- **The agent perceives structure, not pixels.** It reads a semantic snapshot
  (roles, labels, test IDs) — cheap on tokens and stable across redesigns.
- **Stable selectors by default.** `getByTestId → getByRole → getByLabel →
  getByText → locator(css)`. The linter steers away from brittle CSS.
- **Self-healing, with you in the loop.** A failing step pauses mid-run; the
  agent inspects, patches and resumes. `agent_fix` never calls an LLM itself and
  never applies a patch silently — you approve the diff.
- **Safe to run blindly.** Scenarios execute in a sandboxed AST interpreter — no
  `require`, no `fetch`, no filesystem.
- **Local-first.** Everything runs on your machine. No cloud, no account.

## Next

- [Quick start — web](/start/quickstart-web/)
- [Quick start — iOS](/start/quickstart-ios/)
- [How it works](/start/how-it-works/)


---

# Quick start — iOS

> From a built .app to a passing E2E test on the iOS Simulator, written by your agent.

`unotest mobile` drives your iOS Simulator and writes real test scenarios for
your app — React Native / Expo or native Swift. Works with any backend stack; no
JS expertise required from you.

:::note[macOS only]
iOS simulators can't run on Linux/Windows (Apple licensing). You need macOS with
Xcode + the iOS Simulator, and Node 20+.
:::

## 1. Point it at your built `.app`

```sh
npx @unotest/mobile install /path/to/Your.app --update-env
```

This installs the app on the Simulator and auto-detects the bundle ID, URL
scheme and required permissions from `Info.plist`. On first run it offers to
bootstrap `unotest/` and `.mcp.json`.

Common `.app` locations:

- **React Native / Expo:** `./ios/build/Build/Products/Debug-iphonesimulator/<App>.app`
  after `npx expo run:ios`.
- **Native Swift:** Xcode → Product → Show Build Folder → `Products/Debug-iphonesimulator/`.

:::caution[First run compiles WebDriverAgent]
The first run builds WebDriverAgent once (~5–15 min) and caches it under
`~/.cache/unotest/mobile/wda/`. Subsequent runs reuse it.
:::

## 2. Ask your agent for a test

Open Claude Code (or Cursor / Codex) **in the same directory** and ask for a
test. The agent drives the Simulator over MCP, reading the live accessibility
tree — not screenshots — and records a clean `.js` scenario.

## 3. Run it

```sh
npx @unotest/mobile e2e <name>
```

Scenarios are plain `.js` in `unotest/e2e/`, so they go straight into git, code review and CI.

## Next

- [How it works](/start/how-it-works/)
- [CLI — mobile](/reference/cli-mobile/)


---

# Quick start — web

> From zero to a passing E2E test in your web app, written by your agent.

Get your first test written and running in a few minutes. Works with any backend
stack (Node, Django, Rails, Go…) — no `package.json` required.

## 1. Initialize

Run once in your project:

```sh
npx @unotest/web init
```

`init` writes the config and a starter scenario, sets up the browser (bundled
Chromium, or your system Chrome if you pick it), and wires the MCP server so
Claude Code / Cursor / Codex pick it up automatically. Re-run anytime — it never
overwrites your edits.

## 2. Open the viewer

```sh
npx @unotest/web viewer
```

A local IDE-style UI for your tests — no cloud, no account. Browse scenarios,
run them, and watch each step live. This is your home base.

## 3. Ask your agent for a test

In your AI editor, open your project and describe the flow in plain English:

> Open my app, sign in as the demo user, and check that the dashboard heading
> appears.

Through the MCP server the agent explores your live app — clicking, filling,
reading the real DOM — then writes a clean scenario to `unotest/e2e/<name>.js`
with stable selectors, runs it, and debugs itself if it fails.

```js
function test_login() {
  step("Sign in as the demo user", () => {
    goto("/login");
    fill(getByLabel("Email"), TEST_USER_EMAIL);
    fill(getByLabel("Password"), TEST_PASSWORD);
    click(getByRole("button", { name: "Continue" }));
  });

  step("Dashboard is shown", () => {
    assertVisible(getByRole("heading", { name: "Dashboard" }));
  });
}
```

## 4. Run it

Click any scenario in the viewer, or from the CLI:

```sh
npx @unotest/web e2e login
```

## 5. Review & commit

You get a reviewable `.js` test. Read it, tweak it, commit it. That's the whole
loop — the agent does the legwork, you own the result.

:::tip
Point your agent at its authoring guide — see [For your AI agent](/agent/overview/).
:::

## Next

- [Concepts: scenarios & `step()`](/concepts/scenarios/)
- [Write your first test (guided)](/guides/first-test/)
- [The viewer](/viewer/overview/)


---

# Collections

> Group scenarios and run them as a set — smoke, regress, and more.

A **collection** is a named set of scenarios you run together — typically
`smoke` (fast, key flows) and `regress` (the full sweep).

## Run a collection

```sh
npx @unotest/web collection smoke --workers=4
```

| Flag | Effect |
| --- | --- |
| `--workers=N` | run N scenarios in parallel (default: serial) |
| `--bail` | stop after the first failure |
| `--headed` | run with a visible browser |

## In the viewer

The [viewer](/viewer/overview/) manages collections visually: create, rename,
reorder by drag, and run the whole set. During a run you see **per-scenario
status**, a progress bar, and a one-click **abort**.

## In CI

Collections are the natural CI unit — run `smoke` on every push, `regress`
nightly. See [Run in CI](/guides/ci/).


---

# Debugging & self-healing

> One pause-on-failure debugger for both the agent and you — and why fixes are never silent.

When a step fails, the run **freezes on that step** instead of crashing. The same
paused state is available to both the agent and you.

## The agent's loop

1. The step throws → the runtime pauses.
2. The agent calls `inspect_runtime` — live DOM, variables, the last event.
3. It patches the scenario and calls `resume` from the same step — no browser
   restart, no re-running setup.

## Your loop (the viewer)

In the [viewer's debugger](/viewer/debugger/) you do the same visually: set
gutter breakpoints, run paused, inspect on each stop (**Vars / Call stack /
Trace / Breakpoints**), and step with **Continue / Step / Stop**.

```sh
npx @unotest/web e2e checkout --debug
```

## Self-healing is agent-assisted, never silent

`agent_fix` composes the failure context and a suggested fix — but it **does not
call an LLM itself and never applies a patch on its own**. The agent forms a
diff; **you review and commit**. That's the core balance: AI repairs, a human
approves. Tests are never changed behind your back.


---

# Failure bundles

> The evidence captured when a run fails — screenshots, console, semantic DOM, and trace.

When a run fails, unotest writes a **failure bundle** — everything needed to
understand what happened, for you or the agent.

## What's captured

| Tier | Contents | Default |
| --- | --- | --- |
| **1** | error + stack, console log, semantic DOM snapshot, DSL trace | always on |
| **2** | screenshot (viewport + element-focused) | on |
| **3** | HAR (network) + video | off — **not wired yet** |

:::caution[Tier 3 is not shipped]
Network HAR and video capture aren't wired yet. Use the console + screenshot +
semantic DOM + trace. Don't promise HAR/video in your own docs or dashboards.
:::

## Where it lives

Bundles are written under `.unotest/failures/` (configurable), with retention
(default: keep 20 runs / 7 days). The [viewer](/viewer/inspector/) renders each
artifact per run; the agent reads them through the
[failure-artifact MCP tools](/reference/mcp-tools/).
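
To make the retention defaults concrete, here is a minimal sketch of a pruning pass. It assumes a bundle is kept only while it is both among the newest 20 and younger than 7 days. The real rule may combine the two limits differently, and the function name and bundle shape are hypothetical.

```js
// Hypothetical retention pass; not unotest's implementation.
// Assumes a bundle survives only if it is among the newest `maxRuns`
// AND younger than `maxAgeDays`.
const DAY_MS = 24 * 60 * 60 * 1000;

function selectBundlesToDelete(bundles, now, maxRuns = 20, maxAgeDays = 7) {
  // Sort a copy so the caller's array is left untouched.
  const newestFirst = [...bundles].sort((a, b) => b.createdAt - a.createdAt);
  return newestFirst
    .filter((bundle, index) => index >= maxRuns || now - bundle.createdAt > maxAgeDays * DAY_MS)
    .map((bundle) => bundle.id);
}
```

Each bundle here is an object like `{ id: "run-3", createdAt: 1700000000000 }`, with `createdAt` in milliseconds.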

## How the agent uses it

On failure the agent calls `agent_fix`, which bundles the trace, console,
semantic snapshot and scenario source plus a classification
(`rewrite-selector` / `add-waitfor` / `change-assertion`). The agent then
proposes a diff, and [you approve it](/concepts/debugging/).
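
One way to picture the classification step is a small lookup from failure symptom to one of the three labels. The sketch below is an assumption about how such a mapping could look: the `kind` values are invented for illustration, and only the three returned labels come from the text above.

```js
// Hypothetical mapping; the `kind` strings are made up for this sketch.
function classifyFailure(failure) {
  if (failure.kind === "locator-timeout") return "rewrite-selector";
  if (failure.kind === "not-actionable") return "add-waitfor";
  if (failure.kind === "assertion-mismatch") return "change-assertion";
  return null; // unknown symptom: leave it to the agent or a human
}
```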


---

# Helpers — flows & mocks

> Reusable JavaScript functions for repeated journeys and for seeding data.

Helpers are ordinary JS functions in `unotest/e2e/_helpers/`. There are two
kinds, and keeping them separate keeps scenarios clean.

## Flows — replay a UI journey

`flow_*` functions wrap a repeated user journey: sign-in, checkout, onboarding.
Write once, call from any scenario.

```js
// _helpers/flows.js
function flow_signin(email, password) {
  goto("/login");
  fill(getByLabel("Email"), email);
  fill(getByLabel("Password"), password);
  click(getByRole("button", { name: "Sign in" }));
}
```

An agent can also replay a flow live (`explore_run_flow`) to seed state, then
record a new test on top of it.

## Mocks — seed & reset data

Use the sandbox helpers to put the app into a known state before a test and clean
up after. Connection details are pinned in [config](/reference/config/) — a
scenario can't redirect them.

```js
// _helpers/mocks.js
function seed_cart(userId) {
  dbExec("INSERT INTO carts (uid) VALUES ($1)", userId);
  apiCall("POST", "/test/checkout/reset");
}
```

| Helper | Use |
| --- | --- |
| `dbQuery` / `dbExec` | parameterized SQL (postgres / mysql / sqlite) |
| `apiCall` | relative-path HTTP against `apiBaseUrl` |
| `shell` | run a binary (execFile, no shell interpretation) |
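
The "relative-path only" rule for `apiCall` can be sketched with the standard `URL` class. This is an illustration of the idea, not unotest's code: `resolveApiPath` is a hypothetical name and the base URL value is made up.

```js
// Hypothetical sketch of relative-path resolution against a pinned base.
// `apiBaseUrl` mirrors the config key named above; the value is invented.
const apiBaseUrl = "https://staging.example.test";

function resolveApiPath(path, base) {
  // Accept only paths rooted at "/". Reject "//host" (protocol-relative)
  // and anything that carries its own scheme, such as "https://evil.test".
  if (typeof path !== "string" || !path.startsWith("/") || path.startsWith("//")) {
    throw new Error(`apiCall expects a relative path, got: ${path}`);
  }
  const url = new URL(path, base);
  // Second check: URL parsers treat a backslash like a slash for http(s),
  // so confirm the resolved origin is still the pinned one.
  if (url.origin !== new URL(base).origin) {
    throw new Error("resolved URL escaped the pinned base");
  }
  return url.href;
}
```

With this shape, a scenario can say `"/test/checkout/reset"` but has no way to name a different host.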

:::tip[Keep project knowledge in helpers]
The core stays project-agnostic. Anything specific to *your* app — seed scripts,
fixtures, endpoints — lives in `_helpers/`, never in the tool.
:::


---

# Scenarios & step()

> How a unotest scenario is structured — readable intent on top, precise DSL underneath.

A scenario is a plain `.js` file in `unotest/e2e/`. It defines top-level `test_*`
functions, and **every executable step lives inside a `step()` block**.

```js
function test_checkout() {
  step("Add the first product to the cart", () => {
    goto("/products");
    click(getByRole("button", { name: "Add to cart" }));
  });

  step("Cart shows one item", () => {
    assertText(getByTestId("cart-count"), "1");
  });
}
```

## Two layers

Each `step()` carries two layers at once:

- **Intent** — the string label, plain English. Reads like a checklist, even to
  a non-engineer.
- **Execution** — the DSL calls inside the closure. One step can be several
  calls.

This is why repair is precise: the agent knows **what** a step is meant to do
(its label) and **how** it does it (the calls), so it rewrites only the broken
part — it doesn't guess. The same duality helps you: collapse a step to see the
logic, expand it to see the exact commands.

:::note[step() is required]
The validator requires every direct child of a `test_*` body to be a `step()`
call. Helpers (`flow_*`, `snake_case`) are exempt. The older `//@collapse`
comment form has been removed.
:::
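
The rule is easy to picture as a check over a syntax tree. The sketch below is not the `@unotest/dsl` validator. It uses hand-written nodes in the common ESTree shape so that no parser is needed, and the function names are hypothetical.

```js
// Hypothetical check in the spirit of the rule above.
function isStepCall(statement) {
  return (
    statement.type === "ExpressionStatement" &&
    statement.expression.type === "CallExpression" &&
    statement.expression.callee.type === "Identifier" &&
    statement.expression.callee.name === "step"
  );
}

function validateTestFunction(fn) {
  // Only test_* functions are checked; helpers such as flow_* are exempt.
  if (!fn.id.name.startsWith("test_")) return [];
  return fn.body.body
    .filter((statement) => !isStepCall(statement))
    .map((statement) => `${fn.id.name}: ${statement.type} must be wrapped in step()`);
}

// Tiny builders for hand-written ESTree-like nodes.
const call = (name) => ({
  type: "ExpressionStatement",
  expression: { type: "CallExpression", callee: { type: "Identifier", name }, arguments: [] },
});
const fnDecl = (name, statements) => ({
  type: "FunctionDeclaration",
  id: { type: "Identifier", name },
  body: { type: "BlockStatement", body: statements },
});
```

A `test_checkout` body containing a bare `goto(...)` next to a `step(...)` yields one error, while the same bare call inside `flow_signin` yields none.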

## The DSL in one breath

Navigation (`goto`, `waitForUrl`), locators by stability (`getByTestId` →
`getByRole` → `getByLabel` → `getByText` → `locator`), actions (`click`, `fill`,
`press`, `selectOption`…), assertions (`assertText`, `assertVisible`…), chaining
(`getByRole(...).filter(...).first()`), multi-tab and iframes, and sandbox
helpers (`dbQuery`, `apiCall`, `shell`). See the [DSL reference](/reference/dsl/).

## What's not in the DSL

Comparison/logical operators in conditions aren't supported — use bare truthy
variables. Loops and branching are plain JS *around* steps. Regex literals are
allowed in matcher args (ES5 flags only). See [DSL → Not supported](/reference/dsl/).


---

# Stable selectors

> The selector priority that keeps tests resilient, and the linter that enforces it.

Tests break when they target markup that changes. unotest steers every locator
toward **meaning** over **structure**, so tests survive refactors.

## The priority

1. **`getByTestId(id)`** — an explicit `data-testid`. Most stable. Prefer it.
2. **`getByRole(role, { name })`** — semantic role + accessible name. Stable when
   the name is unique.
3. **`getByLabel(text)`** — form controls by their label.
4. **`getByText(text)`** — unique visible text.
5. **`locator(css)`** — raw CSS. Last resort.

```js
// good — resilient
click(getByRole("button", { name: "Save changes" }));
fill(getByLabel("Email"), TEST_USER_EMAIL);

// avoid — brittle
click(locator(".btn.btn-primary.css-1a2b3c"));
```
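
The priority list reads naturally as a fall-through. As a minimal sketch, assuming a hypothetical element description with `testId`, `role`, `name`, `label`, `text` and `css` fields, a recorder could pick a locator like this:

```js
// Hypothetical helper: emit the most stable locator expression available,
// following the priority list above. The input field names are assumptions.
function pickLocator(el) {
  if (el.testId) return `getByTestId(${JSON.stringify(el.testId)})`;
  if (el.role && el.name) {
    return `getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.name)} })`;
  }
  if (el.label) return `getByLabel(${JSON.stringify(el.label)})`;
  if (el.text) return `getByText(${JSON.stringify(el.text)})`;
  return `locator(${JSON.stringify(el.css)})`; // last resort
}
```

A button with no test ID but a role and accessible name resolves to `getByRole(...)`, even if a CSS selector is also available.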

## Refine, don't index blindly

Narrow with `filter()` before reaching for position:

```js
click(getByRole("row").filter({ hasText: "Uma Quinn" }).first());
```

`nth()` / index-only refinement is fragile — the linter flags it.

## The linter enforces it

The [linter](/reference/linter/) warns on deep CSS, XPath, hashed class names
(Tailwind JIT, CSS Modules), and unexplained `pause()`. Run it anytime:

```sh
npx @unotest/web lint
```
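
To give a feel for what "brittle" means in practice, here is a sketch of three heuristics for raw selectors. They are assumptions for illustration only: the real linter's rules, names and thresholds may differ, and the depth cut-off of 3 is arbitrary.

```js
// Hypothetical heuristics; not the unotest linter.
function brittleReasons(selector) {
  const reasons = [];
  // XPath: an explicit "xpath=" prefix or a leading "//".
  if (/^(xpath=|\/\/)/.test(selector)) reasons.push("xpath");
  // Deep CSS: more than 3 compound selectors chained by spaces or ">".
  const depth = selector.split(/\s*>\s*|\s+/).filter(Boolean).length;
  if (depth > 3) reasons.push("deep-css");
  // Hashed classes such as ".css-1a2b3c" or a CSS Modules style ".Button_root__x7Yz9".
  if (/\.(css-[a-z0-9]{5,}|[A-Za-z]+_[A-Za-z]+__[A-Za-z0-9]{5,})/.test(selector)) {
    reasons.push("hashed-class");
  }
  return reasons;
}
```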


---

# Variables & secrets

> Keep credentials out of code and run the same test across environments.

Environments, tokens and credentials live in variables — not hardcoded in
scenarios.

## Two files

- `unotest/.env` — ordinary variables.
- `unotest/.secrets` — secrets, git-ignored.

Reference them by **bare `UPPER_SNAKE` identifiers** in scenarios:

```js
goto(APP_BASE_URL + "/login");
fill(getByLabel("Email"), TEST_USER_EMAIL);
fill(getByLabel("Password"), TEST_PASSWORD);
```

:::caution[No mustache]
Don't write `"{{VAR}}"` string literals — reference the bare identifier. The
linter errors on mustache in the DSL.
:::
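
A check for this rule can be approximated with a regular expression over string literals. The sketch below is a simplification under stated assumptions: `findMustache` is a hypothetical name, it scans source text line by line instead of using a parser, and it ignores template literals.

```js
// Hypothetical lint check: flag "{{VAR}}" placeholders inside quoted strings.
function findMustache(source) {
  const findings = [];
  // A single- or double-quoted literal, allowing backslash escapes.
  const stringLiteral = /(["'])(?:\\.|(?!\1).)*?\1/g;
  source.split("\n").forEach((line, i) => {
    for (const match of line.matchAll(stringLiteral)) {
      if (/\{\{\s*[A-Z0-9_]+\s*\}\}/.test(match[0])) {
        findings.push({ line: i + 1, literal: match[0] });
      }
    }
  });
  return findings;
}
```

A line such as `fill(getByLabel("Password"), "{{TEST_PASSWORD}}")` is reported, while the bare identifier `TEST_PASSWORD` passes.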

## Secrets are masked

Values registered as secrets are redacted in logs and failure artifacts (shown
as `‹secret:NAME›`), so they never leak into a trace or a screenshot bundle.
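
Redaction of this kind amounts to replacing each secret value with its marker before text is written out. Here is a minimal sketch: the `‹secret:NAME›` marker is the format quoted above, while the function name and input shape are assumptions.

```js
// Hypothetical redaction pass; not unotest's implementation.
function maskSecrets(text, secrets) {
  // Replace longer values first so a secret that contains another
  // secret is not left partially visible.
  const entries = Object.entries(secrets)
    .filter(([, value]) => value !== "")
    .sort(([, a], [, b]) => b.length - a.length);
  let masked = text;
  for (const [name, value] of entries) {
    masked = masked.split(value).join(`‹secret:${name}›`);
  }
  return masked;
}
```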

## Across environments

Because the test references variables, the same scenario runs against dev,
staging or prod — just change the values. The [viewer](/viewer/overview/)'s
**Variables** panel lets you edit values, reveal/hide secrets, and toggle boolean
flags without leaving the window.


---

# Auth & cached login

> Reuse a logged-in session instead of signing in at the start of every test.

Signing in at the top of every scenario is slow and brittle. Cache a logged-in
session once and reuse it.

## storageState

Point your config at a Playwright `storageState.json`:

```js
// unotest.config.mjs
export default {
  storageState: "unotest/.auth/state.json",
};
```

Scenarios then start already authenticated — no `flow_signin` per test.

## Seeding the state

Create the state once (an agent flow, or a small setup scenario that signs in and
captures cookies/localStorage). Keep the file git-ignored and refresh it when it
expires.
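
To decide when a cached state needs refreshing, you can look at cookie expiry. The sketch below relies on Playwright's storage state shape, where each cookie carries `expires` in Unix seconds and `-1` marks a session cookie. The function name, the example values and the "any expired cookie means stale" policy are assumptions.

```js
// Hypothetical staleness check for a storageState object.
function isStateStale(state, nowSeconds) {
  return state.cookies.some(
    (cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds
  );
}

// Example state in Playwright's storage state shape; all values are invented.
const exampleState = {
  cookies: [
    {
      name: "session", value: "abc", domain: "app.example.test", path: "/",
      expires: 1700003600, httpOnly: true, secure: true, sameSite: "Lax",
    },
  ],
  origins: [
    { origin: "https://app.example.test", localStorage: [{ name: "theme", value: "dark" }] },
  ],
};
```

In a setup script you would read the JSON file, run a check like this with the current time, and re-run the sign-in flow when it returns `true`.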

## When to still sign in

- Tests that specifically exercise the **login flow** itself.
- Tests that need a **different user** than the cached one — use
  [`flow_signin`](/guides/flows-and-data/) for those.

:::tip
Pair cached auth with [variables](/concepts/variables/) so the same suite runs
against dev / staging / prod with the right credentials.
:::


---

# Write a test by hand

> Author a scenario yourself — the format is just readable JavaScript.

You never *have* to let the agent write everything. Scenarios are plain `.js`,
so you can author or edit them by hand.

## The shape

```js
// unotest/e2e/<feature>/search.js
function test_search() {
  step("Search for a known product", () => {
    goto("/");
    fill(getByRole("searchbox"), "wireless mouse");
    press(getByRole("searchbox"), "Enter");
  });

  step("Results contain the product", () => {
    assertVisible(getByRole("link", { name: /wireless mouse/i }));
  });
}
```

## Rules to keep in mind

- Put every executable step inside `step("intent", () => { … })`.
- Prefer [stable selectors](/concepts/selectors/).
- Reference [variables](/concepts/variables/) by bare `UPPER_SNAKE`.
- Scenarios live in a feature subfolder of `unotest/e2e/`, not its root.

## Lint as you go

```sh
npx @unotest/web lint
```

The [linter](/reference/linter/) catches brittle selectors, missing step
wrappers, and unexplained `pause()`. See the full [DSL reference](/reference/dsl/).


---

# Run in CI

> Run unotest scenarios and collections in continuous integration.

Because tests are plain `.js` in your repo with no proprietary format, CI is
straightforward.

## A minimal job

```yaml
# .github/workflows/e2e.yml
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npx @unotest/web install-chromium
      - run: npx @unotest/web collection smoke --workers=4
        env:
          APP_BASE_URL: ${{ secrets.APP_BASE_URL }}
          TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
          TEST_PASSWORD: ${{ secrets.TEST_PASSWORD }}
```

## Tips

- **Browsers:** run the bundled Chromium in CI; you can widen `browsers` to
  Firefox/WebKit in your [config](/reference/config/).
- **Retries:** turn on `retry.count` for CI (off in dev) to absorb transient
  flakiness — assertion failures still never retry.
- **Secrets:** inject [variables](/concepts/variables/) from your CI secret store;
  the runner masks them in logs and artifacts.
- **Split:** `smoke` on every push, `regress` nightly.
- **iOS:** `unotest mobile` needs macOS runners with Xcode.
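
The retry behaviour in the tips above can be pictured as a small wrapper. This is a sketch under stated assumptions: `retry.count` is the config key named above, while `AssertionFailure` and `runWithRetry` are invented for illustration, and the wrapper is synchronous for brevity.

```js
// Hypothetical retry wrapper; not the unotest runner.
class AssertionFailure extends Error {}

function runWithRetry(runScenario, retryCount) {
  let attempt = 0;
  for (;;) {
    try {
      return { result: runScenario(attempt), attempts: attempt + 1 };
    } catch (error) {
      // A failed assertion is a real finding, so it is never retried.
      // Other errors are retried until the budget is spent.
      if (error instanceof AssertionFailure || attempt >= retryCount) throw error;
      attempt += 1;
    }
  }
}
```

A scenario that times out once and then passes reports two attempts. One that fails an assertion is run exactly once.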

## Failure artifacts

On failure the [bundle](/concepts/failure-bundles/) (screenshot, console,
semantic DOM, trace) is written under `.unotest/failures/` — upload it as a CI
artifact for inspection.


---

# Run smoke / regress

> Group scenarios into collections and run them as a set.

Group related scenarios into a [collection](/concepts/collections/) and run them
together.

## Create one

In the [viewer](/viewer/overview/): **Collections** (⌘3) → create → add
scenarios → reorder by drag. A collection is a small YAML file under
`unotest/e2e/collections/`.

## Run it

```sh
# fast key flows
npx @unotest/web collection smoke --workers=4

# full sweep, stop on first failure
npx @unotest/web collection regress --bail
```

During a run you get per-scenario status and a progress bar (in the viewer), or a
serial/parallel summary on the CLI.

## A good split

- **smoke** — login, checkout, the 5 flows that must never break. Run on every
  push.
- **regress** — everything. Run nightly or before release.

Wire it into CI: [Run in CI](/guides/ci/).


---

# Debug a failing test

> Use the pause-on-failure debugger — in the viewer or with your agent.

A failing step freezes the run so you can inspect it, instead of dumping a stack
trace.

## In the viewer

1. Open the scenario, click a gutter dot to set a **breakpoint**.
2. Click **Debug** (headed + pause on breakpoints and failure).
3. When it pauses, inspect **Vars / Call stack / Trace**, then **Continue /
   Step / Stop**.

See [Step debugger](/viewer/debugger/).

## From the CLI

```sh
npx @unotest/web e2e checkout --debug
# or target a line directly
npx @unotest/web e2e checkout --break 24:5
```

## With your agent

Ask the agent to fix the failure. It calls `inspect_runtime` at the pause point,
reads the [failure bundle](/concepts/failure-bundles/), and proposes a diff via
`agent_fix`. **You review and commit** — nothing is applied silently.

## Common fixes

| Symptom | Likely fix |
| --- | --- |
| Locator timed out | selector drifted → [stable selector](/concepts/selectors/) |
| Element found but not actionable | add a `waitFor` |
| Expected ≠ actual | update the assertion |

For full diagnostics (JSONL log + artifacts), set `UNOTEST_DEBUG=1`.


---

# Write your first test

> Let your agent write and verify a real E2E test, step by step.

This is the canonical loop: describe a flow, let the agent build it, review.

## 1. Make sure the project is set up

```sh
npx @unotest/web init
```

## 2. Describe the flow to your agent

In your AI editor, be specific about the start, the actions, and the assertion:

> Open `/login`, sign in with `TEST_USER_EMAIL` / `TEST_PASSWORD`, and check that
> a heading "Dashboard" is visible.

The agent will explore your live app over MCP, then write a scenario.

## 3. Let it write & run

The agent records actions, generates `unotest/e2e/login.js`, and runs it. If a
step fails, it pauses, inspects, patches and resumes — then shows you the result.

## 4. Read the result

```js
function test_login() {
  step("Sign in with the demo user", () => {
    goto("/login");
    fill(getByLabel("Email"), TEST_USER_EMAIL);
    fill(getByLabel("Password"), TEST_PASSWORD);
    click(getByRole("button", { name: "Continue" }));
  });

  step("Dashboard is shown", () => {
    assertVisible(getByRole("heading", { name: "Dashboard" }));
  });
}
```

## 5. Commit

It's ordinary `.js` in your repo. Tweak the labels or selectors if you like,
then commit it like any code.

:::tip
Keep `TEST_USER_EMAIL` / `TEST_PASSWORD` in [variables](/concepts/variables/),
not in the test.
:::


---

# Reuse flows & seed data

> Factor repeated journeys into flows and put the app into a known state with mocks.

Keep scenarios short and reliable by extracting repetition into
[helpers](/concepts/helpers/).

## Extract a flow

Move a repeated journey into a `flow_*` helper:

```js
// _helpers/flows.js
function flow_signin(email, password) {
  goto("/login");
  fill(getByLabel("Email"), email);
  fill(getByLabel("Password"), password);
  click(getByRole("button", { name: "Sign in" }));
}
```

Call it from a scenario:

```js
function test_orders() {
  step("Sign in", () => {
    flow_signin(TEST_USER_EMAIL, TEST_PASSWORD);
  });
  step("Orders page loads", () => {
    assertVisible(getByRole("heading", { name: "Your orders" }));
  });
}
```

## Seed data with mocks

Put the backend into a known state before the test, and clean up after:

```js
// _helpers/mocks.js
function seed_order(userId) {
  dbExec("INSERT INTO orders (uid, status) VALUES ($1, 'paid')", userId);
}
```

`dbQuery` / `dbExec` use the `database` URL from config; `apiCall` uses
`apiBaseUrl`; `shell` runs a binary. These are pinned in
[config](/reference/config/) — scenarios can't point them elsewhere.

:::tip[Cache login instead of repeating it]
For auth specifically, prefer a cached session over running `flow_signin` in
every test — see [Auth & cached login](/guides/auth/).
:::


---

# Multi-tab & iframes

> Drive multiple tabs and work inside nested iframes.

unotest scenarios can span tabs and reach into frames.

## Switch tabs

When an action opens a new tab, switch to it by index (0-based):

```js
step("Open the invoice in a new tab", () => {
  click(getByRole("link", { name: "View invoice" }));
  setPage(1);                      // focus the new tab
  assertVisible(getByText("Invoice #"));
});
```

Inspect tabs with the `list_pages` / `get_active_context` MCP tools while
exploring.

:::note[Waiting for a new tab]
`waitForPage()` isn't shipped yet. For a tab that opens asynchronously, use
`pause(ms)` with a `// reason:` comment until it lands.
:::

## Work inside an iframe

Enter a frame to scope subsequent calls; exit when done:

```js
step("Submit the embedded payment form", () => {
  enterFrame(getByTestId("payment-iframe"));
  fill(getByLabel("Card number"), TEST_CARD);
  click(getByRole("button", { name: "Pay" }));
  exitFrame();
});
```

Or resolve a single element across the boundary with `contentFrame()`. See the
[DSL reference](/reference/dsl/).


---

# Step debugger

> Gutter breakpoints, pause-on-failure, and live inspection.

The viewer's debugger is the same pause-on-failure machinery the agent uses —
exposed visually.

## Breakpoints

Click the gutter next to any step to toggle a breakpoint. Breakpoints persist to
`unotest/.debugger.json`, so the CLI and agent see them too.

## Run paused

Click **Debug** (or `e2e <name> --debug`) to run with a visible browser that
pauses on breakpoints and on failure. A paused run shows the reason (breakpoint /
manual / failure) and the exact line, highlighted in the gutter.

## Controls

- **Continue** — run to the next breakpoint or the end.
- **Step** — execute the next statement.
- **Stop** — abort the run.

## Inspect

While paused, the inspector offers:

| Tab | Shows |
| --- | --- |
| **Vars** | live variables in scope (truncated to 4 KB each) |
| **Call stack** | your function frames, outer → inner (`file:line`) |
| **Trace** | the full event log since the run started |
| **Breakpoints** | every breakpoint, with add/remove |

This is the human side of [self-healing](/concepts/debugging/): you can take over
the same paused state the agent works from.


---

# Inspector & artifacts

> The failure evidence shown per run — screenshot, semantic DOM, console, trace.

When a run fails, the inspector shows the [failure bundle](/concepts/failure-bundles/)
for that run.

| Tab | Contents |
| --- | --- |
| **Screenshot** | the page at the moment of failure (PNG) |
| **Semantic DOM** | a readable outline of the page structure at failure |
| **Console** | browser console output, with level (log / warn / error) |
| **Trace** | the full DSL execution trace |
| **Network** | placeholder — HAR capture isn't wired yet |

Artifacts are served from the run directories under `unotest/.runs/` and
`.unotest/failures/`. Past runs keep their bundles (subject to retention), so you
can reopen any historical failure from the **Runs** tab.

:::note
Network HAR and video aren't captured yet. Use the screenshot + console +
semantic DOM + trace.
:::


---

# Overview & launch

> A local IDE-style viewer for your tests — no cloud, no account.

The viewer is your home base: browse scenarios, run them, watch each step live,
and debug failures — all locally.

## Launch

```sh
npx @unotest/web viewer
```

It starts a localhost HTTP + WebSocket server and opens your browser. Set
`UNOTEST_VIEWER_NO_OPEN=1` to start without opening a tab. Your agent can also
launch it via the `open_viewer` MCP tool.

## What it is

- **Local-only.** No cloud, no account. It reads run artifacts from
  `unotest/.runs/` and your scenarios/helpers/collections from disk.
- **Read + run.** Browse and run; results stream live over WebSocket.
- **Single source of truth.** Breakpoints and variables are files on disk, so
  the CLI, the agent and the viewer all agree.

## Next

- [Tour](/viewer/tour/) — the activity bar and panels
- [Running tests](/viewer/running/)
- [Step debugger](/viewer/debugger/)


---

# Running tests

> Run from the UI and watch results stream live, step by step.

## Run a scenario

Open a scenario and click **Run** (headless) or **Debug** (visible browser +
pause on breakpoints and failure). A run tab opens immediately and updates live.

## Live streaming

Results stream over WebSocket as each step executes — no waiting for the whole
run to finish. Step status updates in place: pending → running → passed/failed.
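
The status lifecycle is a small state machine. As a sketch, assuming a hypothetical event shape of `{ step, status }` arriving over the stream, a client could fold events into a status map like this:

```js
// Hypothetical model of the transitions described above.
const TRANSITIONS = {
  pending: ["running"],
  running: ["passed", "failed"],
  passed: [],
  failed: [],
};

// Apply one streamed event and return a new status map.
function applyEvent(steps, event) {
  const current = steps[event.step] ?? "pending";
  if (!TRANSITIONS[current].includes(event.status)) {
    throw new Error(`Illegal transition ${current} -> ${event.status}`);
  }
  return { ...steps, [event.step]: event.status };
}
```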

## Block view

The scenario renders as a **block view**: each `step("…")` is a foldable group,
each DSL call a row with a status icon, line number and duration. Fold a step to
see intent; unfold to see the calls.

## Error cards

When a step fails, an inline **error card** pins to the offending line — the
error class, message, and the `file:line` source snippet — so you see exactly
what broke without scrolling away.

## Collections

Run a whole [collection](/concepts/collections/) from its view. You get a
per-scenario status list, a progress bar, and a one-click **abort** that stops
all in-flight scenarios.


---

# Shortcuts & themes

> Keyboard shortcuts and theme switching in the viewer.

## Keyboard shortcuts

| Shortcut | Action |
| --- | --- |
| `⌘1` … `⌘6` | switch activity-bar section (Scenarios … Variables) |
| `⌘K` | focus the tree search/filter |
| `⌘W` | close the active tab |
| `` ⌃` `` | toggle the docked terminal |
| `Esc` | blur search / cancel a drag-reorder |

## Themes

Light and dark, toggled from the activity bar (bottom). The viewer follows your
system preference on first load and remembers your choice. Every surface — block
view, debugger, terminal — adapts.


---

# Terminal (AI inside)

> A docked terminal with local, Claude and Codex sessions — drive the agent without leaving the viewer.

The viewer has a docked terminal (toggle ⌃\`) so you can drive the CLI — and your
AI agent — without leaving the window.

## Sessions

| Session | What it is |
| --- | --- |
| **local** | a native shell on your machine |
| **claude** | a Claude Code session, in the viewer |
| **codex** | a Codex session |

Open multiple tabs, switch between them, clear the buffer, and resize the dock.
Each session is its own PTY over a dedicated WebSocket.

## Why it matters

Ask the agent to write or fix a test right here — then watch it run in the same
window. The whole loop (author → run → inspect → repair) stays in one place.


---

# Tour

> The activity bar, document tabs, and inspector panel.

The viewer is a three-part IDE: an **activity bar** (left), **document tabs**
(center), and an **inspector** (right).

## Activity bar

Six sections, switchable with ⌘1–⌘6:

| Section | ⌘ | Shows |
| --- | --- | --- |
| **Scenarios** | ⌘1 | scenario files with last-run status |
| **Helpers** | ⌘2 | helper files (read-only) |
| **Collections** | ⌘3 | collections — create / rename / reorder / run |
| **Runs** | ⌘4 | history of every run, filterable by status |
| **Active** | ⌘5 | live, in-progress runs |
| **Variables** | ⌘6 | edit `.env` / `.secrets`, reveal secrets, toggle flags |

At the bottom: a terminal toggle, a theme toggle, and the version.

## Document tabs

Open scenarios, helpers, collections and runs as tabs. ⌘W closes the active tab;
right-click for Close / Close Others / Close All. Tabs and layout persist across
reloads.

## Inspector (right)

Context-aware: during a **live run** it shows Vars / Call stack / Trace /
Breakpoints; for a **failed run** it shows the artifact tabs (screenshot, semantic
DOM, console, trace). See [Inspector & artifacts](/viewer/inspector/).


---

# CLI — mobile

> Every npx @unotest/mobile subcommand (iOS).

The `unotest-mobile` CLI (iOS, macOS only). Run with `npx @unotest/mobile <command>`.

## `install`

Install a built `.app` on the Simulator; auto-detect the bundle ID, URL scheme and permissions; optionally bootstrap `unotest/` and `.mcp.json`.

```sh
npx @unotest/mobile install <path-to-.app> [--update-env]
```

- `--update-env` — Persist the detected app/simulator config to the environment.

**Example**

```sh
npx @unotest/mobile install ./ios/build/.../MyApp.app --update-env
```

## `init`

Bootstrap `unotest/` + `.mcp.json` without installing an app.

```sh
npx @unotest/mobile init
```

## `doctor`

Re-check the environment: Xcode, Simulator, Node, WebDriverAgent cache.

```sh
npx @unotest/mobile doctor
```

## `e2e`

Run `unotest/e2e/<name>.js` against the Simulator.

```sh
npx @unotest/mobile e2e <name> [--quiet]
```

- `--quiet` — Suppress per-step trace output.

## `lint`

Static-check every scenario and helper.

```sh
npx @unotest/mobile lint
```

## `mcp`

Run as an MCP stdio server (what the editor launches). Running the package with no subcommand starts the server.

```sh
npx @unotest/mobile
```


---

# CLI — web

> Every npx @unotest/web subcommand, with flags and examples.

The `unotest-web` CLI. Run any command with `npx @unotest/web <command>` — works with npm, pnpm, yarn or bun.

## `init`

Bootstrap the unotest/ layout, config, and MCP wiring for your editor. Re-run anytime — it never overwrites your edits.

```sh
npx @unotest/web init [target] [--browser system|bundled|none]
```

- `--browser` — Skip the interactive browser prompt: use system Chrome/Edge, bundled Chromium, or none.

**Example**

```sh
npx @unotest/web init --browser system
```

## `e2e`

Run a single scenario. `name` resolves to `unotest/e2e/<name>.js`.

```sh
npx @unotest/web e2e <name> [--debug] [--break line:col]
```

- `--debug` — Pause on breakpoints (.debugger.json) and on failure.
- `--break line:col[,line:col]` — Override breakpoints (comma-separated, no spaces).

**Example**

```sh
npx @unotest/web e2e auth/login --debug
```

## `collection`

Run a collection (a named set of scenarios), e.g. smoke or regress.

```sh
npx @unotest/web collection <name> [--workers=N] [--bail] [--headed]
```

- `--workers=N` — Run N scenarios in parallel (default: serial).
- `--bail` — Stop after the first failure.
- `--headed` — Run with a visible browser.

**Example**

```sh
npx @unotest/web collection smoke --workers=4
```

## `viewer`

Open the local IDE-style viewer (HTTP + WebSocket). No cloud, no account.

```sh
npx @unotest/web viewer
```

- `UNOTEST_VIEWER_NO_OPEN=1` — Env: start the server without auto-opening a browser.

**Example**

```sh
UNOTEST_VIEWER_NO_OPEN=1 npx @unotest/web viewer
```

## `lint`

Static-check scenarios and helpers. Exits non-zero on errors.

```sh
npx @unotest/web lint [paths...]
```

**Example**

```sh
npx @unotest/web lint
```

## `install-chromium`

Download Playwright’s bundled Chromium (~150 MB). Not needed if you chose system Chrome/Edge.

```sh
npx @unotest/web install-chromium
```

## `mcp`

Run as an MCP stdio server. This is what your editor launches automatically; you rarely run it by hand.

```sh
npx @unotest/web mcp
```


---

# Configuration

> Every field in unotest.config.{js,mjs,ts}.

Configuration lives in `unotest.config.{js,mjs,ts}` at your project root, auto-discovered by the CLI and MCP server.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `baseUrl` | `string (URL)` | `—` | Base for relative goto() paths. If unset, scenarios use absolute URLs. |
| `browsers` | `("chromium" / "firefox" / "webkit")[]` | `["chromium"]` | Browser families to run. Single in dev; CI can run all three. |
| `channel` | `"chrome" / "msedge" / "chrome-beta" / null` | `null` | For chromium: use a system browser (zero download) or bundled Chromium (null). |
| `viewport` | `{ width, height }` | `{ width: 1280, height: 720 }` | Browser viewport size. |
| `retry.count` | `int 0–10` | `0` | Retries for transient failures. Off in dev; 1+ in CI. |
| `retry.on` | `("transient" / "network" / "crash")[]` | `["transient"]` | Which failure classes retry. Assertion failures never retry. |
| `failureBundle.tier1` | `true (locked)` | `true` | Always on: error + console + semantic DOM snapshot + DSL trace. |
| `failureBundle.tier2` | `boolean` | `true` | Screenshots (viewport + element-focused). |
| `failureBundle.tier3` | `{ network, video }` | `{ network: false, video: false }` | HAR + video. Not wired yet; leave off. |
| `failureBundle.retention` | `{ runs, days }` | `{ runs: 20, days: 7 }` | Keep the `runs` most recent runs; delete runs older than `days` days. |
| `failureBundle.storageDir` | `string` | `.unotest/failures` | Where failure bundles are written. |
| `dialogPolicy` | `"accept" / "dismiss" / "manual"` | `"accept"` | How native dialogs (alert/confirm) are handled. |
| `storageState` | `string (path)` | `—` | Playwright storageState.json for cached login. |
| `defaultTimeoutMs` | `int` | `3000` | Action / locator-resolution timeout. Override per call. |
| `defaultNavigationTimeoutMs` | `int` | `15000` | Navigation timeout (goto/reload/waitForUrl). |
| `testDir` | `string` | `unotest/e2e` | Scenario directory. |
| `helpersDir` | `string` | `unotest/e2e/_helpers` | Helper functions directory. |
| `sandbox.shellCwd` | `string` | `cwd` | Working directory for shell(). |
| `sandbox.database` | `string (URL)` | `—` | Connection string for dbQuery/dbExec (postgres://, mysql://, sqlite:). |
| `sandbox.apiBaseUrl` | `string (URL)` | `—` | Base URL for apiCall(method, path). |
| `linter.enabled` | `boolean` | `true` | Enable the scenario linter. |
| `mcp.transport` | `"stdio"` | `"stdio"` | MCP server transport (stdio in the current release). |


---

# DSL reference

> Every function available inside a scenario, grouped — the unotest web vocabulary.

Scenarios are plain `.js` on a sandboxed engine. The vocabulary mirrors Playwright, so it reads the way you expect. Every executable step lives inside `step("intent", () => { ... })`.

:::tip[Selector priority]
getByTestId → getByRole(name) → getByLabel → getByText → locator(css)
:::

## Navigation

Drive the page: load URLs, go back/forward, and wait for state.

- `goto(url, { waitUntil?, timeout? })` — Navigate to a URL. `waitUntil`: 'load' | 'domcontentloaded' | 'networkidle'. Relative paths resolve against `baseUrl`.
- `reload({ waitUntil?, timeout? })` — Reload the current page.
- `goBack({ timeout? })` — Navigate back in history.
- `goForward({ timeout? })` — Navigate forward in history.
- `waitForUrl(pattern, { timeout? })` — Wait until the URL contains `pattern` (substring match).
- `waitForNavigation({ timeout? })` — Wait for the next navigation event.
- `waitFor(locator, { state?, timeout? })` — Wait for an element state: 'attached' | 'visible' | 'hidden'.
- `waitForText(text, { timeout? })` — Wait until `text` appears anywhere on the page.
- `pause(ms)` — Explicit delay. Discouraged — the linter wants a `// reason:` comment. Prefer a `waitFor*`.

## Locators

Build a locator. Prefer the most stable matcher available; see [Stable selectors](/concepts/selectors/).

- `getByTestId(id)` — Match by `data-testid`. Most stable — prefer this.
- `getByRole(role, { name?, exact? })` — Match by ARIA role + accessible name. `name` accepts a string or `/regex/`.
- `getByLabel(text, { exact? })` — Match a form control by its associated label.
- `getByText(text, { exact? })` — Match by visible text content.
- `getByPlaceholder(text, { exact? })` — Match an input by its placeholder.
- `getByAltText(text, { exact? })` — Match an image by its `alt` text.
- `getByTitle(text, { exact? })` — Match by `title` attribute.
- `locator(css)` — Match by raw CSS selector. Last resort — the linter warns on deep/brittle CSS.

## Chain refiners

Narrow a locator. Chains desugar to free calls: `getByRole(...).filter(...).first()` is equivalent to `first(filter(getByRole(...), ...))`.

- `filter(locator, { hasText?, hasNotText?, has?, hasNot? })` — Keep matches by contained text or a nested child locator.
- `first(locator)` — First match.
- `last(locator)` — Last match.
- `nth(locator, index)` — The N-th match (0-indexed). Index-only refinement is fragile (linter info).
- `contentFrame(locator)` — Resolve an `<iframe>` element to its content frame.
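
A short sketch of the two spellings side by side, assuming a chain maps onto the free calls from the inside out. The scenario name, the list markup and the "Milk" text are hypothetical, and how the function is exported is left out because it is not shown in this reference.

```javascript
// Two spellings of one locator. Assumption: chains map onto free calls inside-out.
function test_cart_row() {
  step("Milk row is listed (free-call form)", () => {
    // filter() first, then first(), so the match is picked by meaning, not position.
    const milkRow = first(filter(getByRole("listitem"), { hasText: "Milk" }));
    assertVisible(milkRow);
  });
  step("Milk row is listed (chained form)", () => {
    const milkRow = getByRole("listitem").filter({ hasText: "Milk" }).first();
    assertVisible(milkRow);
  });
}
```

Both steps should resolve the same element; pick one spelling and use it consistently within a scenario.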

## Actions

Interact with elements. Most actions take an optional options object, e.g. `{ force: true }`.

- `click(locator, { force?, timeout?, noWaitAfter? })` — Click an element.
- `doubleClick(locator, { force?, timeout? })` — Double-click an element.
- `fill(locator, value, { timeout?, noWaitAfter? })` — Clear and type a value into an input.
- `press(locator, key, { delay?, timeout? })` — Press a key (e.g. 'Enter', 'Control+A').
- `check(locator, { force?, timeout? })` — Check a checkbox/radio.
- `uncheck(locator, { force?, timeout? })` — Uncheck a checkbox.
- `hover(locator, { force?, timeout? })` — Hover over an element.
- `selectOption(locator, value, { timeout? })` — Select option(s) in a `<select>`. `value` is a string or string[].
- `scrollIntoView(locator)` — Scroll an element into the viewport.
- `dragAndDrop(from, to, { force?, timeout? })` — Drag one element onto another. Uses synthetic mouse events (not native HTML5 DragEvent).
- `uploadFile(locator, files)` — Set files on a file input. `files` is a path or path[].
- `clipboardPaste(locator, text)` — Paste text. Synthetic, not a native ClipboardEvent.

## Assertions

Polling assertions (default timeout 5000ms). Assertion failures never retry.

- `assertText(locator, expected, { exact?, timeout? })` — Element's text equals/contains `expected`.
- `assertVisible(locator, { timeout? })` — Element is visible.
- `assertHidden(locator, { timeout? })` — Element is hidden or detached.
- `assertValue(locator, expected, { timeout? })` — Input's value equals `expected`.
- `assertCount(locator, expected, { timeout? })` — Number of matches equals `expected`.
- `assertUrl(pattern, { timeout? })` — Current URL contains `pattern`.
- `assertTrue(condition, message?)` — Assert a boolean condition.

## Queries

Read values (non-polling). Use to branch logic in plain JS.

- `count(locator) → number` — Number of matching elements.
- `textContent(locator) → string` — Text content ("" if none).
- `inputValue(locator) → string` — Current input value.
- `isVisible(locator) → boolean` — Whether the element is visible.
- `getAttribute(locator, name) → string` — Attribute value ("" if missing).
- `getInnerText(locator) → string` — Rendered inner text.
- `getInputValue(locator) → string` — Input value (alias of inputValue).
- `getTitle() → string` — Document title of the active page.
- `getUrl() → string` — URL of the active page.
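
As a sketch of branching on a query: the result is stored in a variable and tested as a bare truthy value, since comparison operators are not available in DSL conditions. The page, test ID and button name are hypothetical, and whether the sandbox accepts every construct used here (for example, assigning to an outer `let` from inside a step callback) is an assumption.

```javascript
// Branch on a non-polling query using a bare truthy variable.
function test_open_pricing() {
  let bannerShown = false;
  step("Open the pricing page", () => {
    goto("/pricing");
    // isVisible() reads the current state once; it does not wait.
    bannerShown = isVisible(getByTestId("cookie-banner"));
  });
  if (bannerShown) {
    step("Dismiss the cookie banner", () => {
      click(getByRole("button", { name: "Accept" }));
    });
  }
  step("Plans are listed", () => {
    assertVisible(getByRole("heading", { name: "Plans" }));
  });
}
```

Because queries do not poll, a banner that renders late would be missed; use `waitFor` first when the element is expected to appear.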

## Storage & cookies

Read/write localStorage and cookies for setup and assertions.

- `setLocalStorage(key, value)` — Set a localStorage item.
- `getLocalStorage(key) → string` — Read a localStorage item ("" if missing).
- `setCookie(name, value, options?)` — Set a cookie (options: path, domain, expires, httpOnly, secure, sameSite).
- `getCookie(name) → string` — Read a cookie value ("" if missing).

## Multi-tab & iframes

Work across tabs and nested frames.

- `setPage(index)` — Switch the active tab/page by index (0-based).
- `enterFrame(locator)` — Scope subsequent calls to an iframe (persists until exitFrame).
- `exitFrame()` — Exit the innermost iframe scope.
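
A minimal sketch of frame scoping, assuming a payment form embedded in an iframe with a `title` attribute. The frame title, labels and the `TEST_CARD_NUMBER` variable are hypothetical.

```javascript
// Scope calls to an iframe, then return to the main page.
function test_pay_with_card() {
  step("Fill the card number inside the payment iframe", () => {
    enterFrame(getByTitle("Payment form")); // scope persists until exitFrame()
    fill(getByLabel("Card number"), TEST_CARD_NUMBER);
    exitFrame(); // leave the innermost frame scope
  });
  step("Submit the order from the main page", () => {
    click(getByRole("button", { name: "Place order" }));
  });
}
```

Pairing every `enterFrame` with an `exitFrame` in the same step keeps later steps from silently running inside the frame.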

## Setup & data (sandbox)

Seed and verify state. Connection details are pinned in config — scenarios cannot redirect them.

- `dbQuery(sql, ...params) → rows[]` — Parameterized SELECT. Dialect from the config `database` URL (postgres/mysql/sqlite).
- `dbExec(sql, ...params) → number` — Parameterized INSERT/UPDATE/DELETE. Returns affected row count.
- `apiCall(method, path, body?, headers?) → { status, body, headers }` — HTTP call. `path` is relative — base is the config `apiBaseUrl`.
- `shell(cmd, ...args) → { stdout, stderr, code }` — Run a binary (execFile, no shell interpretation). cwd from config `shellCwd`.
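
A sketch of seeding state before driving the UI. The table, column, endpoint and the `DEMO_EMAIL` variable are hypothetical, and the `?` placeholder style is an assumption: parameter syntax depends on the database dialect (PostgreSQL drivers typically use `$1`, `$2`, …).

```javascript
// Reset and seed data, then verify it through the UI.
function test_seeded_todo_is_listed() {
  step("Reset the demo user's todos", () => {
    // Assumption: "?" placeholders; adjust to the dialect of sandbox.database.
    dbExec("DELETE FROM todos WHERE owner_email = ?", DEMO_EMAIL);
  });
  step("Create one todo through the API", () => {
    // The path is relative; the base comes from sandbox.apiBaseUrl in the config.
    apiCall("POST", "/todos", { title: "Buy milk" });
  });
  step("The seeded todo is listed", () => {
    goto("/todos");
    assertVisible(getByText("Buy milk"));
  });
}
```

Values are passed as parameters rather than concatenated into the SQL string, which is what the parameterized signatures are for.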

## Structure & escape hatch

The required step wrapper, logging, and the raw-JS escape hatch.

- `step(label, () => { ... })` — Required around every executable step in a `test_*` function. `label` is the human-readable intent the agent reads to repair the step.
- `log(...args)` — Write to the run trace.
- `evaluate(js, ...args) → any` — Run raw JS in the page context. Last resort — prefer typed helpers. Returns JSON-serializable values only.
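
Putting the vocabulary together, a scenario might look like the sketch below. The labels, test ID and the `DEMO_EMAIL` / `DEMO_PASSWORD` variables are hypothetical, and how the `test_*` function is exported is left out because it is not shown in this reference.

```javascript
// Sketch of a scenario such as unotest/e2e/auth/login.js.
// DEMO_EMAIL and DEMO_PASSWORD are hypothetical variables, referenced as bare identifiers.
function test_sign_in() {
  step("Open the login page", () => {
    goto("/login"); // relative path, resolved against baseUrl
  });
  step("Submit valid credentials", () => {
    fill(getByLabel("Email"), DEMO_EMAIL);
    fill(getByLabel("Password"), DEMO_PASSWORD);
    click(getByRole("button", { name: "Sign in" }));
  });
  step("Land on the dashboard", () => {
    waitForUrl("/dashboard"); // substring match
    assertVisible(getByTestId("dashboard-header"));
  });
}
```

Each `step` label states the intent of the calls inside it, which is what the agent reads when it repairs a failing step.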

## Not supported

- Comparison/logical operators in DSL conditions (`== != < <= > >= && ||`) are not supported; use bare truthy variables.
- Control-flow keywords are not DSL primitives; write them as plain JS around steps.
- Regex literals in matcher args accept ES5 flags only (`g i m`); the `s u y d` flags, named groups and lookbehind are rejected at parse time.
- `waitForPage()` is not shipped yet; use `pause(ms)` with a `// reason:` comment for tab races.


---

# Linter rules

> The scenario linter rules and their default severities.

The linter keeps scenarios robust — stable selectors, no silent waits. Override any rule under `linter.rules` in your config (`"off" | "warn" | "error"`).

| Rule | Default | Purpose |
| --- | --- | --- |
| `lint:deep-css` | `warn` | Reject deep / brittle CSS selectors. Prefer role/label/testid. |
| `lint:xpath` | `warn` | Reject XPath selectors. |
| `lint:obfuscated-class` | `warn` | Reject hashed class names (Tailwind JIT, CSS Modules). |
| `lint:pause-explicit` | `warn` | Discourage pause(ms) without a `// reason:` comment. |
| `lint:disambig-by-index` | `off` | Flag index-only refinement (.nth) as fragile (info). |
| `lint:evaluate-discouraged` | `warn` | Discourage bare evaluate() — prefer typed helpers. |
| `lint:scenario-in-root` | `error` | Scenarios must live in a feature subfolder, not the e2e root. |
| `lint:mustache-in-dsl` | `error` | No `{{VAR}}` string literals — reference bare identifiers. |
| `validator:unknown-function` | `error` | Unknown DSL function (typo, unsupported call). |
| `validator:dsl` | `error` | Shape / argument / unsupported-syntax violations. |


---

# MCP tools

> Every MCP tool the agent uses, grouped.

unotest is an MCP server. Your editor’s agent calls these tools to explore your app, run scenarios, debug failures and write tests. You never wire them by hand.

## Core lifecycle & snapshots

Open/close the browser and read what the agent "sees" — a semantic outline, not pixels.

- `new_context` `—` — Open a fresh, isolated browser context.
- `close_context` `—` — Tear down the active context.
- `get_url` `—` — URL of the active page.
- `get_title` `—` — Document title of the active page.
- `get_page_snapshot` `—` — Compact, region-grouped semantic outline of the page (with `[ref=eN]` handles).
- `get_aria_snapshot` `—` — YAML ARIA tree; same-origin iframes stitched in.
- `get_frame_snapshot` `{ locator }` — Outline scoped to one iframe.
- `find_element` `{ role, name?, near?, nth? }` — Targeted role+name search (≤20 hits + total count). Pierces open shadow roots.

## Exploration & recording

The agent drives the live app and records a scenario.

- `explore_start` `{ scenario_name, title?, description? }` — Begin a recording session; returns available variables and flows.
- `explore_stop` `{ explorationId }` — End recording.
- `explore_state` `{ explorationId }` — Inspect the session and recorded entries.
- `explore_step` `{ action, locator?, value?, ... }` — Execute one action — record it (with a session) or run ad-hoc.
- `explore_record` `{ action, section, description, ... }` — Record an action with mandatory section + intent.
- `explore_remove_step` `{ explorationId, entryId }` — Delete a recorded entry.
- `generate_dsl_from_exploration` `{ explorationId }` — Emit DSL (+ warnings) from the session.
- `save_exploration_as_test` `{ explorationId, scenarioName? }` — Write the scenario to disk; extract `flow:` steps into helpers.
- `explore_run_flow` `{ explorationId, name, ... }` — Replay a saved flow to seed state, recorded as one step.

## Debugger & runtime

Run a scenario through the sandboxed engine and step through it.

- `run_test` `{ scenario, browsers? }` — Run a scenario; returns a runtimeId.
- `step` `{ runtimeId }` — Execute one step of a paused runtime.
- `resume` `{ runtimeId }` — Resume to completion or the next breakpoint.
- `inspect_runtime` `{ runtimeId }` — Variables, call stack and last event at the pause point.
- `abort_runtime` `{ runtimeId }` — Abort with clean teardown.
- `list_runtimes` `—` — All active runtimes + status.

## Multi-context

Tabs and frames.

- `list_pages` `—` — All open tabs (index + URL).
- `list_frames` `—` — The frame stack.
- `get_active_context` `—` — Which tab is active.
- `switch_page` `{ index }` — Switch the active tab.

## Failure artifacts

Read the evidence bundled when a run fails.

- `list_failures` `—` — All failure bundles with timestamps.
- `get_failure_trace` `{ runId }` — DSL execution trace.
- `get_failure_console` `{ runId }` — Browser console logs.
- `get_failure_a11y` `{ runId }` — ARIA tree at the failure (YAML).
- `get_failure_screenshot` `{ runId }` — Screenshot at the failure.
- `get_failure_network` `{ runId }` — HAR (only if tier3.network is enabled).

## Agent self-service & viewer

Diagnostics, the human-in-the-loop fix flow, and the viewer.

- `agent_fix` `{ location, fix }` — Composes failure context for a fix. No LLM call, no auto-apply — you review the diff.
- `get_last_mcp_log` `—` — JSONL debug log of the last run (under UNOTEST_DEBUG).
- `audit_last_run` `—` — Deterministic rubric audit of the run against the rules (no LLM).
- `open_viewer` `—` — Launch (or reuse) the local viewer; returns its URL.


---

# Authoring guide

> The rules an agent should follow when writing unotest scenarios.

This is the contract for writing good scenarios — the same guidance shipped with
the package at `node_modules/@unotest/web/guides/agent-integration.md`. Point your
agent here.

## Structure

- One scenario file per feature, in a subfolder of `unotest/e2e/` (not its root).
- Export `test_*` functions.
- Wrap **every** executable step in `step("intent", () => { … })`. The label is
  the human-readable intent; keep it specific.
- Factor repeated journeys into `flow_*` helpers; seed data with the `dbExec` /
  `apiCall` / `shell` sandbox helpers.

## Selectors

Use the most stable matcher available:

```
getByTestId → getByRole(name) → getByLabel → getByText → locator(css)
```

Refine with `filter({ hasText })` before `first()`. Avoid index-only `nth()`,
deep CSS, XPath, and hashed class names — the linter flags them.

## Waiting

Prefer `waitFor` / `assert*` (which poll) over `pause(ms)`. If you must pause,
add a `// reason:` comment.
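
A sketch of both cases: a polling wait where one is available, and a commented pause for a tab race. The page, test ID and link name are hypothetical, the 500 ms value is arbitrary, and placing the `// reason:` comment on the line above `pause` is an assumption about where the linter looks for it.

```javascript
// Prefer polling waits; pause only with a stated reason.
function test_open_invoice() {
  step("Wait for the invoice list", () => {
    goto("/invoices");
    // Polls until the table is visible; no fixed sleep needed.
    waitFor(getByTestId("invoice-table"), { state: "visible" });
  });
  step("Open the latest invoice in a new tab", () => {
    click(getByRole("link", { name: "Open latest invoice" }));
    // reason: the new tab opens asynchronously and waitForPage() is not shipped yet
    pause(500);
    setPage(1); // switch to the second tab (0-based index)
  });
  step("The invoice is shown", () => {
    assertVisible(getByRole("heading", { name: "Invoice" }));
  });
}
```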

## Variables

Reference secrets and config by bare `UPPER_SNAKE` identifiers — never hardcode
credentials, never use `{{mustache}}`.

## The repair loop

On failure: call `inspect_runtime`, read the failure bundle, classify the cause
(`rewrite-selector` / `add-waitfor` / `change-assertion`), produce a **diff**, and
hand it to the human. Do not apply patches silently.

## Verify before finishing

Run the scenario (`run_test`) and confirm it passes. Lint it
(`npx @unotest/web lint`). The result must be clean, readable `.js`.


---

# Connect your editor

> Wire Claude Code, Cursor or Codex to the unotest MCP server.

After `npx @unotest/web init` (or `npx @unotest/mobile install`), the MCP server is
wired for you. This page explains what happens under the hood, and how to wire it
by hand if needed.

## What init writes

- `.mcp.json` — the MCP server entry (stdio transport).
- Editor settings (e.g. `.claude/settings.json`) — the tool allowlist + an agent
  authoring guide.

## Claude Code

`init` writes `.mcp.json` at the project root, which Claude Code auto-discovers.
The server is launched as `npx @unotest/web mcp` (stdio). Open Claude Code in the
project and ask it to write a test.

## Cursor / Codex

Both read MCP server config from their settings. `init` populates it. If you wire
it manually, point the server command at `npx @unotest/web` (web) or
`npx @unotest/mobile` (iOS) — running with no subcommand starts the MCP server.
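
For editors that read the common JSON `mcpServers` shape, a hand-written entry might look like the output of the sketch below. The server key `unotest` is a hypothetical name, and what `init` actually writes may differ; editors with a different config format need the same command and arguments expressed in that format.

```javascript
// Build a stdio server entry and print it as JSON.
// Assumption: the editor reads the "mcpServers" shape; the key "unotest" is hypothetical.
const mcpConfig = {
  mcpServers: {
    unotest: {
      command: "npx",
      args: ["@unotest/web", "mcp"], // for iOS: ["@unotest/mobile"]
    },
  },
};

console.log(JSON.stringify(mcpConfig, null, 2));
```

Reload the editor after changing the file so it re-reads the server list.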

## Monorepos

Open your editor at the **monorepo root** and run `init` there, so `.mcp.json`
lives where the editor looks for it.

## Verify

Ask the agent to "write an e2e test for sign-in". If the MCP tools are available,
it will explore your app and produce a scenario. If not, re-run `init` and reload
the editor.


---

# Machine-readable docs

> Every page as Markdown, plus llms.txt and llms-full.txt for agents.

These docs are built to be consumed by agents as well as people.

## Per-page Markdown

Every page is available as raw Markdown by appending `.md` to its path:

```
https://docs.unotest.com/reference/dsl.md
https://docs.unotest.com/concepts/scenarios.md
```

Each page also has **Copy as Markdown** and **Open in ChatGPT / Claude** buttons
under its title.
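
The `.md` rule can be applied mechanically. A minimal sketch, assuming site paths look like the links used in these docs (for example `/reference/dsl/`); the helper name is hypothetical and the site root is deliberately not handled.

```javascript
const DOCS_ORIGIN = "https://docs.unotest.com";

// Turn a site path such as "/reference/dsl/" into its raw-Markdown URL.
function markdownUrl(pagePath) {
  const trimmed = pagePath.replace(/^\/+|\/+$/g, ""); // drop leading/trailing slashes
  if (trimmed === "") {
    throw new Error("The site root is not covered by this sketch");
  }
  return `${DOCS_ORIGIN}/${trimmed}.md`;
}

console.log(markdownUrl("/reference/dsl/")); // https://docs.unotest.com/reference/dsl.md
```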

## llms.txt

A curated index of the docs for LLMs (per [llmstxt.org](https://llmstxt.org)):

```
https://docs.unotest.com/llms.txt
```

It lists every page with a one-line description and a link to its `.md` form.

## llms-full.txt

The entire documentation concatenated as Markdown, in one fetch:

```
https://docs.unotest.com/llms-full.txt
```

## In your editor

For authoring tests in-context, the unotest **MCP server** is the richest path —
the agent reads the live app and these conventions directly. See
[Connect your editor](/agent/connect/).


---

# Overview

> How an AI agent uses unotest — MCP-native, with a human in the loop.

unotest is built for agents. Your editor's AI agent connects to the MCP server
and drives your real app to write, run and repair tests — while you stay in
control of what gets committed.

## What the agent does

1. **Explore** — reads a [semantic snapshot](/start/how-it-works/) and interacts
   with the live app through MCP tools.
2. **Record** — captures actions into a scenario with stable selectors and
   `step("intent", …)` labels.
3. **Run** — executes the scenario through the sandboxed engine.
4. **Repair** — on failure, pauses, inspects, and proposes a fix.
5. **Hand off** — you review the `.js` diff and commit.

## The contract

- The agent calls ~37 [MCP tools](/reference/mcp-tools/) — it doesn't need to
  know your infrastructure.
- `agent_fix` **never calls an LLM and never auto-applies** a patch. The agent
  forms the diff; the human approves it.
- Output is always plain `.js` in the repo — auditable, reviewable, revertible.

## For agents reading this

The whole site is machine-readable: every page has a `.md` form, there's a
[`/llms.txt`](/llms.txt) index and a [`/llms-full.txt`](/llms-full.txt) dump. See
[Machine-readable docs](/agent/machine/).

## Next

- [Connect your editor](/agent/connect/)
- [Authoring guide](/agent/authoring/)
- [Worked example — the exact MCP call trace](/agent/worked-example/)
- [MCP tools reference](/reference/mcp-tools/)


---

# Worked example — authoring over MCP

> The exact MCP tool-call sequence to explore an app, record a scenario, save it, and verify it runs green.

This is the orchestration an agent runs to author a test end-to-end — the literal
tool-call trace, plus the exact input schema for the recording tools.

## The loop

```
explore_start  →  explore_step(goto)  →  get_page_snapshot  →  explore_step×N (with refs)
               →  save_exploration_as_test  →  run_test  →  (step / resume / inspect_runtime on pause)
```

## Trace: record a sign-in

```jsonc
// 1. Start a recording session.
explore_start({
  scenario_name: "auth/login",
  title: "Sign in",
  description: "Demo user signs in"
})
// → { explorationId, availableVariables, availableFlows }
// If availableFlows already has "signin", DON'T re-record it — call explore_run_flow.

// 2. Navigate. goto needs no element ref.
explore_step({
  explorationId, action: "goto", url: "/login",
  section: "Sign in", description: "Open the login page"
})

// 3. Snapshot to get element ref handles ([ref=eN]).
get_page_snapshot()
// → outline containing [ref=e3] email input, [ref=e5] password input, [ref=e8] submit button

// 4. Record actions, addressing elements by ref.
explore_step({
  explorationId, action: "fill",
  locator: { kind: "locator", steps: [{ kind: "ref", ref: "e3" }] },
  value: "demo@example.com",
  section: "Sign in", description: "Type the email"
})
explore_step({
  explorationId, action: "fill",
  locator: { kind: "locator", steps: [{ kind: "ref", ref: "e5" }] },
  value: "hunter2",
  section: "Sign in", description: "Type the password"
})
explore_step({
  explorationId, action: "click",
  locator: { kind: "locator", steps: [{ kind: "ref", ref: "e8" }] },
  section: "Sign in", description: "Submit the form"
})

// 5. Save to disk. flow:-marked steps are extracted into _helpers/<flow>.js.
save_exploration_as_test({ explorationId, scenarioName: "auth/login" })
// → writes unotest/e2e/auth/login.js

// 6. Verify it runs green.
run_test({ scenario: "auth/login" })
// → { runtimeId }. If it pauses (breakpoint/failure): inspect_runtime → patch → resume.
```

:::note[Assertions aren't recorded actions]
The recordable actions are interactions and waits — not `assert*`. Add assertions
(`assertVisible`, `assertText`, …) to the generated `.js`, or use `wait_for_text`
as an in-flight checkpoint while recording. Credentials should be
[variables](/concepts/variables/), not literals like above.
:::

## explore_step / explore_record — input schema

`explore_step` runs one action. With an `explorationId` it **records** the step;
without one it executes **ad-hoc** (no recording). `explore_record` is the same
envelope but `section` + `description` are **required** and locators **must** be
ref-form.

| Field | Type | Notes |
| --- | --- | --- |
| `action` | string | **required** — one of the actions below |
| `explorationId` | string | present → record · absent → ad-hoc run |
| `locator` | ref locator | element to act on (see below) |
| `url` | string | for `goto` |
| `pattern` | string | for `wait_for_url` (substring match) |
| `value` | string \| string[] | for `fill`, `select_option` |
| `key` | string | for `press` (e.g. `"Enter"`) |
| `text` | string | for `wait_for_text` |
| `options` | object | action options (e.g. `{ force: true }`, nav options) |
| `section` | string | group label · **required when recording** |
| `description` | string | step intent · **required when recording** |
| `flow` | string | mark step as part of a reusable `flow_<name>` |
| `allowNoRef` | boolean | allow a non-ref locator (last resort) |

### Actions and their fields

| Action | Fields |
| --- | --- |
| `goto` | `url` |
| `reload`, `go_back`, `go_forward` | — |
| `click`, `double_click`, `hover`, `check`, `uncheck`, `scroll_into_view`, `wait_for` | `locator` |
| `fill` | `locator`, `value` (string) |
| `press` | `locator`, `key` |
| `select_option` | `locator`, `value` (string \| string[]) |
| `wait_for_text` | `text` |
| `wait_for_url` | `pattern` |
| `enter_frame` | `locator` |
| `exit_frame`, `set_page` | tab/frame switch — see the live tool description |
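
The two tables can be turned into a small client-side pre-check. This is a hypothetical helper for illustration, not part of unotest; the server performs its own validation. `exit_frame` and `set_page` are left out because the table defers to the live tool description.

```javascript
// Required fields per action, transcribed from the table above.
const REQUIRED_FIELDS = {
  goto: ["url"],
  reload: [], go_back: [], go_forward: [],
  click: ["locator"], double_click: ["locator"], hover: ["locator"],
  check: ["locator"], uncheck: ["locator"], scroll_into_view: ["locator"],
  wait_for: ["locator"],
  fill: ["locator", "value"],
  press: ["locator", "key"],
  select_option: ["locator", "value"],
  wait_for_text: ["text"],
  wait_for_url: ["pattern"],
  enter_frame: ["locator"],
};

// Return a list of problems with an explore_step payload; empty means it looks complete.
function checkExploreStep(payload) {
  const required = REQUIRED_FIELDS[payload.action];
  if (!required) return [`unknown or unlisted action: ${payload.action}`];
  const problems = required
    .filter((field) => payload[field] === undefined)
    .map((field) => `${payload.action} needs "${field}"`);
  // section + description are required when recording (explorationId present).
  if (payload.explorationId !== undefined) {
    for (const field of ["section", "description"]) {
      if (payload[field] === undefined) problems.push(`recording needs "${field}"`);
    }
  }
  return problems;
}

console.log(checkExploreStep({ action: "goto", url: "/login" })); // []
```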

## Ref locators

While recording, address elements by the `[ref=eN]` handles from
`get_page_snapshot` (or `get_aria_snapshot`):

```json
{ "kind": "locator", "steps": [{ "kind": "ref", "ref": "e8" }] }
```

Recording **rejects non-ref locators** (so saved tests get stable
`getByRole`/`getByTestId` selectors, not brittle ones). Pass `allowNoRef: true`
only as a last resort for a hand-written locator. On save, refs are resolved to
the stable selector form per the [selector priority](/concepts/selectors/).
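
A sketch of building ref locators from a snapshot. Both helper names are hypothetical, the sample outline text is invented for illustration, and the assumption that handles are always `e` followed by digits comes only from the `[ref=eN]` notation used here.

```javascript
// Build the ref-form locator for a handle such as "e8" (from "[ref=e8]" in a snapshot).
function refLocator(ref) {
  if (!/^e\d+$/.test(ref)) {
    throw new Error(`Expected a handle like "e8", got "${ref}"`);
  }
  return { kind: "locator", steps: [{ kind: "ref", ref }] };
}

// Pull every [ref=eN] handle out of a snapshot outline, in order of appearance.
function extractRefs(outline) {
  return Array.from(outline.matchAll(/\[ref=(e\d+)\]/g), (match) => match[1]);
}

// Invented outline text; real snapshots will be formatted differently.
const outline = "textbox Email [ref=e3]\ntextbox Password [ref=e5]\nbutton Sign in [ref=e8]";
console.log(extractRefs(outline)); // [ 'e3', 'e5', 'e8' ]
console.log(JSON.stringify(refLocator("e8")));
```

Refs come from the most recent snapshot, so take a fresh snapshot after navigation before building new locators.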

## After saving

The generated `unotest/e2e/auth/login.js` is plain `.js` with `step("…")`
blocks. Add assertions, run `run_test`, then `npx @unotest/web lint`. On failure,
read the [failure bundle](/concepts/failure-bundles/) and propose a **diff** — the
human approves it ([self-healing is never silent](/concepts/debugging/)).


---

# Changelog

> Notable changes to unotest.

The packages are pre-1.0 and move fast. This page tracks notable, user-facing
changes to the documentation and ecosystem.

## Unreleased

- **Docs launched** at `docs.unotest.com` — human + agent first-class, with
  `llms.txt`, per-page Markdown, and a reference section generated from the code.
- `step("intent", () => { … })` is the scenario step form. The older
  `//@collapse` comment syntax has been removed.

:::note
Per-package release notes live in each package's `CHANGELOG.md` on npm. This page
summarizes changes that affect how you use unotest.
:::


---

# Ecosystem & packages

> The packages that make up unotest and how they fit together.

unotest is a small set of focused packages.

| Package | Role |
| --- | --- |
| [`@unotest/web`](https://www.npmjs.com/package/@unotest/web) | CLI · MCP server · runner · DSL engine (web) |
| [`@unotest/mobile`](https://www.npmjs.com/package/@unotest/mobile) | CLI · MCP server · runner (iOS) |
| [`@unotest/viewer`](https://www.npmjs.com/package/@unotest/viewer) | local results browser (HTTP + WS) |
| [`@unotest/dsl`](https://www.npmjs.com/package/@unotest/dsl) | scenario parser + vocab-agnostic validator |
| [`@unotest/protocol`](https://www.npmjs.com/package/@unotest/protocol) | shared types across runners |

## How they fit

- `@unotest/web` / `@unotest/mobile` are what you install. Each is a CLI **and**
  an MCP server **and** the runner.
- `@unotest/viewer` reads run artifacts and drives runs through a runner adapter
  — today the web adapter.
- `@unotest/dsl` parses and validates scenarios; the web vocab contracts live in
  `@unotest/web`.
- `@unotest/protocol` is pure types — the shared contract that lets the viewer
  drive any runner.

## Links

- Marketing site — [unotest.com](https://unotest.com)
- Playground — [playground.unotest.com](https://playground.unotest.com)


---

# Troubleshooting

> Common issues and known limitations.

## Diagnostics

In normal operation the CLI prints a single actionable line — no stack traces.
For full diagnostics (JSONL log + per-call artifacts), set:

```sh
UNOTEST_DEBUG=1 npx @unotest/web e2e <name>
```

## Common issues

- **`astro: not found` / command missing** — install dependencies first
  (`npm install`).
- **No MCP tools in the editor** — re-run `init`, then reload the editor so it
  re-reads `.mcp.json`.
- **Selector keeps drifting** — switch to a [stable selector](/concepts/selectors/)
  (`getByTestId` / `getByRole`).
- **Flaky tab/iframe timing** — `waitForPage()` isn't shipped; use `pause(ms)`
  with a `// reason:` comment.
- **iOS: "command not found" on `xcrun`** — you're not on macOS or missing Xcode
  CLI tools. iOS is macOS-only.
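
For the tab/iframe timing workaround above, a fixed wait might look like the
sketch below. `step()` and `pause(ms)` come from the scenario DSL. The two
stand-in definitions at the top are hypothetical and exist only so the snippet
runs outside unotest. The intent text, the 1500 ms value, and the placement of
the `// reason:` comment directly above the call are assumptions.

```javascript
// Hypothetical stand-ins so this sketch runs outside unotest. Inside a real
// scenario the engine provides step() and pause(), so omit these definitions.
const calls = [];
function step(intent, body) {
  calls.push(`step: ${intent}`);
  body();
}
function pause(ms) {
  calls.push(`pause: ${ms}`);
}

step("Open the invoice preview in a new tab", () => {
  // (the action that opens the tab would go here)

  // reason: the new tab needs time to render and waitForPage() isn't shipped
  pause(1500);
});
```

Keep the wait as short as the flow allows, and state the cause in the comment so
the pause can be removed once a proper wait is available.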

## Known limitations

- **Network HAR & video** aren't captured yet (failure bundle tier 3). Use the
  screenshot, console, semantic DOM and trace instead.
- **`dragAndDrop` / clipboard paste** use synthetic events, not native HTML5
  `DragEvent` / `ClipboardEvent`. For apps that require native events, dispatch
  via `evaluate()`.
- **The viewer currently drives web only.** Its runner architecture is
  multi-runner-ready, but mobile isn't wired into it yet.
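
For the drag-and-drop limitation, the function you hand to `evaluate()` could
resemble this sketch. It relies only on the standard `DataTransfer` and
`DragEvent` browser constructors. The name `dispatchNativeDrag`, the `win`
parameter, and the exact event sequence are illustrative assumptions. How
`evaluate()` receives the function and its element arguments isn't shown,
because that signature isn't covered on this page.

```javascript
// Dispatches real DragEvent objects from `source` to `target`.
// `win` defaults to the page's global object; it is a parameter only so the
// function can be exercised with fakes outside a browser (an assumption of
// this sketch, not part of unotest).
function dispatchNativeDrag(source, target, win = globalThis) {
  // One DataTransfer is shared by every event in the gesture.
  const dataTransfer = new win.DataTransfer();

  const fire = (element, type) =>
    element.dispatchEvent(
      new win.DragEvent(type, { bubbles: true, cancelable: true, dataTransfer })
    );

  fire(source, "dragstart");
  fire(target, "dragenter");
  fire(target, "dragover");
  fire(target, "drop");
  fire(source, "dragend");

  return dataTransfer;
}
```

Events dispatched from script report `isTrusted: false`, so an app that checks
that flag will still ignore them.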

For agent-facing rules, see the [Authoring guide](/agent/authoring/).


---

# unotest documentation

> AI-native E2E testing for web and iOS. Your agent writes the tests — you review and commit.

## Start building

<CardGrid>
  <LinkCard title="Quick start — web" href="/start/quickstart-web/" description="npx @unotest/web init → ask your agent → run → review." />
  <LinkCard title="Quick start — iOS" href="/start/quickstart-ios/" description="Point it at a built .app and ask your agent for a test." />
  <LinkCard title="How it works" href="/start/how-it-works/" description="MCP, semantic snapshots, a sandboxed engine, local-first." />
  <LinkCard title="Install & setup" href="/start/install/" description="Node, browsers, and wiring your editor over MCP." />
</CardGrid>

## Learn the model

<CardGrid>
  <Card title="Scenarios & step()" icon="document">
    Tests are readable on top, precise underneath. Every step is `step("intent", () => { … })` — plain-English intent over real DSL calls. [Read more](/concepts/scenarios/)
  </Card>
  <Card title="The Viewer" icon="laptop">
    A local IDE for your tests: live runs, a step debugger, collections, variables — no cloud, no account. [Tour the viewer](/viewer/overview/)
  </Card>
  <Card title="Reference" icon="open-book">
    Auto-generated from the code: [CLI](/reference/cli-web/), [DSL](/reference/dsl/), [config](/reference/config/), [MCP tools](/reference/mcp-tools/).
  </Card>
  <Card title="For your AI agent" icon="rocket">
    MCP-native. Connect [Claude Code / Cursor / Codex](/agent/connect/) and let the agent drive. Docs are machine-readable too: [llms.txt](/llms.txt).
  </Card>
</CardGrid>