
How We Built a Slack AI Legal Agent with OpenAI Agent Builder - Vizzia

Context

The client: Vizzia, cloud video-surveillance for French municipalities

Vizzia is a scale-up building cloud video-surveillance with on-board detection algorithms, sold exclusively to French municipalities. Their 4G-based offering replaces traditional wired systems that require civil engineering, and the cloud layer makes judicial requisitions a one-click operation instead of a physical USB-stick handoff.

Like most scale-ups selling into a sector where the law is still being written, Vizzia's Legal team is structurally outnumbered: Julien (public affairs / legal) and Inès (in-house counsel). They're the only two people who can answer "can we use algorithm X in context Y without triggering CNIL obligations?" and everyone in the company needs that answer sometimes.


The problem: a 2-person Legal team about to get hit by a 15× volume spike

When we kicked off the project in January 2026, the situation was already familiar:

  • Every question routed to two people. A Sales rep in a discovery call, a CSM preparing a renewal, a Product Manager scoping a new algorithm: they all pinged Julien or Inès on Slack for legal clarity.

  • No self-service. Knowledge lived across Notion pages, scattered internal documents, 8 months of Slack Q&A history, and official sources (Légifrance, Conseil d'État, Conseil constitutionnel, CNIL, Dalloz). No single place to ask.

  • A hard deadline. The April 2026 French municipal elections were going to unblock a wave of new accounts. Vizzia's own projection: volume jumping from a few questions per day to ~30/day in mid-April. A 15× spike on a team of two.

  • A regulated sector with no settled case law. B2G video-surveillance with algorithmic detection sits in a legal grey zone: jurisprudence is thin, doctrine is fuzzy, and non-Legal teammates routinely conflate adjacent legal frameworks. You can't paper over that with a generic ChatGPT bot.

The ask: build something that lets Sales, CSM and Tech self-serve 80% of recurring questions, keeps Legal in the loop for everything that goes out, and is ready before the April peak.


Why OpenAI Agent Builder (and not "just a ChatGPT")

OpenAI released Agent Builder in October 2025 as part of AgentKit. It's a no-code visual canvas for building multi-step AI agents: nodes for prompts and models, native File Search for RAG, MCP connectors, guardrails, conditional branching, and a first-class User Approval node that pauses execution pending human validation.

Concretely, Agent Builder let us ship three things we would otherwise have had to assemble ourselves:

  1. A production-grade RAG pipeline on Vizzia's internal corpus, without running our own vector DB or chunking logic.

  2. A scoped web search restricted to a whitelist of official legal sources.

  3. A human-in-the-loop validation flow where every answer lands in Legal's review queue before reaching the requester.

Everything stays inside OpenAI's ecosystem. No extra infrastructure to operate. And if we ever need to sidestep the canvas, the workflow exports cleanly to TypeScript via the Agents SDK — so we're not locked into the no-code editor.

The architecture in production

The system has four moving parts:




The only custom layer is a Cloudflare Worker that acts as the Slack ↔ Agent Builder bridge: it handles inbound Slack events, maintains thread continuity, calls the agent workflow, surfaces the draft answer in #legal-review, listens for the Legal team's validation actions, and posts the approved answer back in the original Slack thread. Roughly 300 lines of TypeScript, co-written with Claude, shipped in a day.
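To give a flavour of what that bridge does, the Worker's inbound routing can be reduced to one small pure function. This is an illustrative sketch, not the actual Vizzia code: the payload shapes are simplified, and signature verification and the workflow call itself are elided.

```typescript
// Simplified Slack payloads (illustrative; real events carry more fields).
type SlackEvent =
  | { type: "url_verification"; challenge: string }
  | {
      type: "event_callback";
      event: { type: "app_mention"; channel: string; ts: string; thread_ts?: string; text: string };
    };

type Action =
  | { kind: "ack_challenge"; body: string }
  | { kind: "run_workflow"; threadKey: string; question: string };

// Decide what to do with an inbound Slack payload. Slack's URL-verification
// handshake must be echoed back; a mention kicks off an agent run keyed by
// thread, so follow-up questions land in the same conversation.
function routeSlackEvent(payload: SlackEvent): Action {
  if (payload.type === "url_verification") {
    return { kind: "ack_challenge", body: payload.challenge };
  }
  const ev = payload.event;
  // Thread continuity: reuse the root ts when the mention is inside a thread.
  const threadKey = `${ev.channel}:${ev.thread_ts ?? ev.ts}`;
  return { kind: "run_workflow", threadKey, question: ev.text };
}
```

In the real Worker this sits inside the `fetch` handler, after Slack signature verification and before the call into the Agent Builder workflow.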

Everything else (the three agents, the RAG, the guardrails, the approval gate) lives inside Agent Builder.



The workflow, step by step

Step 1 - Critical analysis agent

A gpt-5.4 node that does three things in one pass: (1) verifies the question's premises against Vizzia's internal corpus (File Search) and the injected legal-formation knowledge, flagging any false assumption baked into the question, (2) lists the points of law the research agent will need to investigate, and (3) decides the response format the synthesis agent should produce — adapted to the nature of the question (a quick scoping ask doesn't need the same scaffolding as a doctrine question).

Prompt (abridged):

You are the critical analysis agent for Vizzia's internal legal assistant.

You receive a free-form question from a non-legal Vizzia employee
(Sales, CSM, Product, or Tech).

Your job:
1. Verify the premises of the question against the internal knowledge
   you have access to (File Search + injected formation). Flag any
   false premise explicitly.
2. Identify the points of law that need to be researched downstream.
3. Decide the response format the synthesis agent should produce —
   adapted to the question (long structured response vs. short
   clarifier).

Output JSON:
{
  "premises_check": "<what holds, what doesn't, why>",
  "points_of_law": ["<point 1>", "<point 2>", ...],
  "open_questions_for_research": ["<question 1>", ...],
  "response_format": "<long structured response | short clarifier>"
}


Step 2 - Legal research agent

The workhorse. gpt-5.4 with two tools attached, and the only agent in the pipeline with web access:

  • File Search on a vector store of Vizzia's internal legal corpus, kept fresh by a daily Drive → OpenAI sync we wrote (any document removed from Drive is detached and the underlying File deleted, so storage cost stays bounded).

  • Web Search scoped to a whitelist of official French and EU legal domains: legifrance.gouv.fr, cnil.fr, conseil-etat.fr, conseil-constitutionnel.fr, service-public.fr, ...

The agent pulls grounding from both sources depending on the triage output and returns a raw research payload with explicit citations.
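The daily Drive → OpenAI sync that keeps the File Search corpus fresh is essentially a reconciliation pass. Here is a minimal sketch of that diff logic; `computeSyncPlan` is an illustrative name, and the actual OpenAI upload, detach, and file-deletion API calls are elided.

```typescript
// Given the set of document ids currently in Drive and the set already
// attached to the vector store, compute what to upload and what to remove.
// Anything removed from Drive gets detached AND its underlying File deleted,
// which is what keeps storage cost bounded.
function computeSyncPlan(
  driveIds: Set<string>,
  storeIds: Set<string>
): { toUpload: string[]; toDelete: string[] } {
  const toUpload = [...driveIds].filter((id) => !storeIds.has(id));
  const toDelete = [...storeIds].filter((id) => !driveIds.has(id));
  return { toUpload, toDelete };
}
```

The pure diff is trivially testable; all the failure-prone work (retries, rate limits, partial failures) lives in the API-calling layer around it.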

Prompt (abridged):





Step 3 - Answer synthesis agent

Takes the research brief and writes the final answer. gpt-5.4. The synthesis agent's only job is to turn validated research into a clean response.

The format is adapted to the question based on the routing call from Step 1: a longer, structured legal response when the question warrants it (bottom-line answer up top, reasoning, conditions and nuances, practical phrasing the Sales or CSM rep can reuse, citable sources at the end), or a tight one-paragraph answer when the question is a quick clarification or a scoping ask. The rep gets either the lawyerly explanation and the customer-safe phrasing side by side, or just the short answer they actually need — never more scaffolding than the question calls for.

This separation matters. Legal doesn't want a Sales rep to paste a jurisprudence excerpt into a deal email. They want the rep to understand the legal state and then have a pre-chewed customer-safe sentence to use in the conversation. The agent produces both, side by side.

Prompt (abridged):





Step 4 - User Approval node

The synthesised answer does not go back to the requester directly. It lands in a dedicated #legal-review Slack channel as an interactive message where Julien or Inès can:

  • ✅ Approve → the answer is published back to the original Slack thread.

  • ✏️ Correct → opens a Slack modal to rewrite the answer, then publishes it. The correction is persisted and re-injected into all three agents on subsequent runs as a reference example (Agent 1 uses it as a triage signal, Agent 2 as a sourcing guide, Agent 3 as a strong template).

  • 🔁 Improve → fires a follow-up instruction back into the pipeline to refine the draft.

Agent Builder's native User Approval node handles the pause-and-resume semantics. The Cloudflare Worker handles the Slack interactive payloads (button clicks and modal submissions) and resumes the workflow with the matching validation payload.
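The resume side of the Worker boils down to mapping a review action to a validation payload. A sketch under stated assumptions: the action ids and the payload shape below are invented for illustration; the real pause/resume contract is whatever the Agent Builder workflow expects.

```typescript
// The three buttons in #legal-review (illustrative action ids).
type ReviewAction =
  | { actionId: "approve" }
  | { actionId: "correct"; editedText: string }
  | { actionId: "improve"; instruction: string };

// Hypothetical shape of the payload used to resume the paused workflow.
type ResumePayload =
  | { decision: "approve"; finalAnswer?: string }
  | { decision: "retry"; feedback: string };

function toResumePayload(action: ReviewAction): ResumePayload {
  switch (action.actionId) {
    case "approve":
      return { decision: "approve" };
    case "correct":
      // The human rewrite replaces the draft; it is also persisted as a
      // reference example for future runs.
      return { decision: "approve", finalAnswer: action.editedText };
    case "improve":
      return { decision: "retry", feedback: action.instruction };
  }
}
```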


The guardrails

We layered four guardrails across the pipeline:

  • PII handling on input and output: we don't want a Sales rep accidentally pasting a prospect's personal data into the bot, and we don't want the bot echoing PII back.

  • Moderation: standard OpenAI moderation on output.

  • Self-validation in the synthesis prompt: the explicit checklist Agent 3 runs on its own draft (claim-vs-source audit, no internal docs cited as sources, format compliance). It replaces the separate validator agent we initially considered: fewer moving parts, no validator/author disagreement loop.

  • Domain whitelist on web search, handled at the tool level: only gouv.fr, cnil.fr, conseil-etat.fr and the rest of the official-domain list above can ever appear in a citation.

And one structural guardrail upstream of all of them: every answer is human-validated by Legal before it reaches the requester. Nothing auto-publishes today.
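For illustration, the whitelist check amounts to a suffix match on the citation's hostname. This is a sketch only (the real enforcement lives in the web-search tool configuration, and the domain list here is abridged):

```typescript
// Abridged official-domain list; the production list is longer.
const OFFICIAL_DOMAINS = [
  "legifrance.gouv.fr",
  "cnil.fr",
  "conseil-etat.fr",
  "conseil-constitutionnel.fr",
  "service-public.fr",
];

// Accept the domain itself or any subdomain of it; a plain substring match
// would be spoofable (e.g. "cnil.fr.example.com"), so we anchor on hostname.
function isWhitelistedCitation(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname;
  } catch {
    return false; // malformed URLs never pass
  }
  return OFFICIAL_DOMAINS.some((d) => host === d || host.endsWith("." + d));
}
```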

Results

What shipped, the output in production

Time to live: 6 days of build, spread across 2 months. Phase 1 (agent + Slack integration) in early February. Phase 2 (test & iterate with Julien/Inès validating every answer) through mid-March. Phase 3 (production handover, monitoring, hardening of the human-validation flow) in late March, live before the April election peak.

Response latency: ~30 seconds to a couple of minutes end-to-end from Slack question to review-ready draft (the longer runs hit web search + complex synthesis). Fast enough that a Sales rep can ask during a prep session, not only overnight.

Coverage at launch: the recurring topics Vizzia's Legal team identified during the 8-month Slack Q&A audit we ran before building. The pipeline adapts the response format to the question: a long structured answer when the topic deserves it, a short clarifier when it doesn't.

Citations, where they belong. Every claim that needs a public source carries an inline citation to a whitelisted official domain or an exact article of law. Internal documents are absorbed into the agent's reasoning, never quoted as a source, by design.

Legal keeps control. Every answer goes through human validation today. Validated rewrites are re-injected into all three agents on subsequent runs as reference Q&A, so the system gets sharper each week without any model fine-tuning.

Full handover. At the end of Phase 3 we transferred the OpenAI workspace, the Slack bot credentials, the Cloudflare Worker repo, and the prompt documentation to Vizzia's team. They own the system outright; we stay on retainer for category expansion and quarterly prompt reviews.


What we learned building on Agent Builder

What worked

  • File Search removes the RAG tax. Upload the corpus, OpenAI handles chunking, embeddings, indexing and inline citations. For a POC that needs to run in days, this saves a week of infra work against a self-hosted RAG stack.

  • The iteration loop is short. Editing a prompt, adding a node, swapping a model: you do it live on the canvas and test it immediately in preview. For a build where the client wants to see progress every week, that pace matters.

  • The built-in dashboard is enough for week one. Every run is logged with the full execution path, per-node inputs and outputs, tool calls, tokens, and latency. Debugging a weird answer means replaying the exact run, no external observability stack required on day one. This is the single feature that saved us the most time.

  • The SDK export is a real escape hatch. If we ever outgrow the canvas, we export the workflow to TypeScript and continue in our own repo. Not a prison.


The limits we hit

  • MCP integration doesn't handle OAuth. The MCP node takes a single access_token field: fine for a service-account setup like this one, but blocking as soon as you want each user to authenticate against their own Notion or HubSpot. That would require the SDK export and a custom OAuth flow. There's an open feature request on OpenAI's community board.

  • Fine-grained vector store ops require the API. Creating a static corpus through the UI is clean. Automating re-indexing when a document changes, setting expiration policies, versioning the corpus per tenant, all of that moves to the API. Agent Builder is a great place to build an agent, not to operate a production RAG lifecycle.

  • Still in beta, still OpenAI-locked. Six months post-launch, a few rough edges remain. And the whole system is locked into OpenAI's models, vector store and infrastructure. Not disqualifying, but it belongs in the long-term equation.


Agent Builder vs n8n: where we drew the line

The first question people ask when we show this project: "why not n8n?"

Because n8n and Agent Builder don't solve the same problem.

n8n runs deterministic steps. Trigger fires, step A runs, step B runs, if condition then C. Same input, same output. That's procedural automation. For Vizzia's lead enrichment, scoring and Sales routing pipeline (which we also built), n8n is exactly right: hundreds of leads per week, explicit scoring weights, auditable routing logic. The CRO needs to explain to Finance why lead #47 is scored 85 and not 72; "the LLM decided" is not an acceptable answer.
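To make that contrast concrete, here is what an auditable, deterministic scoring step looks like. The fields and weights are invented for illustration; they are not Vizzia's actual scoring model.

```typescript
// Hypothetical lead shape for a municipal video-surveillance pipeline.
interface Lead {
  population: number;
  hasExistingCctv: boolean;
  budgetVoted: boolean;
}

// Every point is traceable to an explicit weight, so Finance can audit
// any individual score line by line. Same input, same output, every time.
function scoreLead(lead: Lead): number {
  let score = 0;
  score += lead.population > 10_000 ? 40 : 20; // municipality size band
  score += lead.hasExistingCctv ? 25 : 0;      // replacement deals close faster
  score += lead.budgetVoted ? 20 : 0;          // security budget already approved
  return score;
}
```

An LLM call in that position would produce plausible but unexplainable numbers; a weighted sum produces defensible ones.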

Agent Builder orchestrates reasoning. You describe an objective and a toolbox; at runtime the agent picks which tool to call, in what order, with what parameters. Same input, possibly different outputs. For the Legal agent, this is the right shape: the question is ambiguous, the context is non-structured, and the agent needs to decide whether File Search is enough or whether a web search on official sources is also required.

In practice on Vizzia's broader stack, we stack both: n8n is the backbone for deterministic RevOps flows (enrichment, scoring, HubSpot syncs, Datapolitics signals), and Agent Builder agents are called via API from n8n whenever a step genuinely needs reasoning (ambiguous classification, unstructured text synthesis, legal Q&A like this one). You keep determinism and cost transparency on 90% of the stack, and you pay for LLM calls only where the reasoning is actually required.

Want a production AI agent for your own team?

Written by the Cashmyrr team. Technical deep-dive by Pierre Pénelon, Lead Tech.

Cashmyrr designs and builds AI agents, RevOps automation and CRM workflows for B2B scale-ups. If you have a recurring-question bottleneck, a knowledge base nobody queries, or a two-person team about to get hit by a volume spike → talk to us.
