Building a self-healing query loop for a production Google Ads AI agent
One of the hardest parts of shipping an AI agent that reads real data is making it resilient. When you're generating database queries on the fly, things go wrong constantly: wrong field names, incompatible resource combinations, missing dependencies between queries. You can prompt your way around some of it, but at some point you need the system to detect failure, figure out what went wrong, and try again.
This is a writeup of how I built a self-healing GAQL query loop for Wolfy, a Google Ads analyst agent. The loop generates queries, executes them against the Google Ads API, diagnoses failures, and retries — all without user intervention.
The problem
Wolfy answers natural language questions about Google Ads accounts. "Why did conversions drop last week?" or "Which search terms are wasting budget?" Under the hood, that means generating and executing GAQL queries — Google Ads Query Language, which is SQL-like but with its own constraints and incompatibilities.
GAQL is not forgiving. You can't mix certain resources in the same query. Some metrics require specific segments. Some fields only exist in certain API versions. Quality score data has to come from a completely separate query from performance data. And some of these rules aren't surfaced until the API returns an error.
Early versions of Wolfy used a straightforward approach: generate queries, run them, return results. When a query failed, the whole thing failed. That's fine for demos, not fine for production.
The architecture
The loop lives inside a SequentialAgent that handles a full data retrieval cycle:
- Initial analyst — interprets the user's request, decides what data is needed, and either calls fixed pre-built tools or writes a data request for GAQL generation
- GAQL loop — generates, executes, and heals queries (more on this below)
- Data summarizer — post-processes results in pure Python before any LLM touches them
- Report writer — produces the final user-facing analysis
The loop itself is a LoopAgent with three sub-agents:
gaql_loop_agent = LoopAgent(
    name="wolfy_gaql_loop",
    sub_agents=[gaql_generation_agent, gaql_execution_agent, analyst_agent],
    max_iterations=5,
    before_agent_callback=skip_gaql_loop_if_data_complete,
    after_agent_callback=mark_incomplete_gaql_steps_failed,
)
Each iteration runs all three agents in sequence: generate, execute, analyze. The loop exits when the analyst calls exit_loop(), or after 5 iterations.
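Stripped of the ADK wiring, the control flow of the loop reduces to a few lines of plain Python. This sketch uses illustrative function names (`generate`, `execute`, `analyze` stand in for the three sub-agents), not the actual framework API:

```python
MAX_ITERATIONS = 5

def run_gaql_loop(generate, execute, analyze) -> bool:
    """Run generate → execute → analyze until the analyst exits or the cap is hit."""
    for _ in range(MAX_ITERATIONS):
        queries = generate()
        results = execute(queries)
        if analyze(results) == "exit_loop":
            return True   # analyst verified the data and exited early
    return False          # cap reached; a callback marks unfinished queries failed
```

The important property is that `False` is a normal return value, not an exception: hitting the cap hands partial data downstream rather than aborting.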
Generation: writing GAQL that actually works
The generation agent's job is to write valid GAQL. But "valid" is harder than it sounds.
The agent has access to a get_view_doc tool that fetches live field documentation for any GAQL resource:
def get_view_doc(tool_context: ToolContext, view: str) -> dict:
    """Get resource view docs for a given Google Ads API view resource."""
    with open(f"context/views/{view}.yaml", "r") as f:
        raw = f.read()
    # Returns structured field metadata: types, enums, valid segments, metrics
    return yaml.safe_load(raw)
The view files are pre-generated YAML from the Google Ads API schema: every valid field, its type, and whether it's filterable or sortable. The agent is instructed to always call get_view_doc before writing a query for a resource. This prevents a whole class of hallucinated field names.
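Based on that description, a view file might contain something like this (hypothetical excerpt; the real files are generated from the API schema):

```yaml
# context/views/campaign.yaml (illustrative)
fields:
  campaign.id:
    type: INT64
    filterable: true
    sortable: true
  metrics.cost_micros:
    type: INT64
    filterable: true
    sortable: true
```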
Beyond documentation, there's a static validation layer that catches known-bad patterns before the query ever hits the API:
def _validate_gaql_query(query: str) -> Optional[str]:
    # DURING must be preceded by segments.date
    # LIMIT required (unless segments.search_subcategory is present)
    # LIMIT must not exceed 1000
    # No subqueries (GAQL doesn't support them)
    # No OR in WHERE clause (split into separate queries instead)
    # campaign_search_term_insight requires a WHERE filter
    # ad_group_criterion doesn't support any metrics
    # ...and more
This runs inside the upsert_gaql_query tool, so the agent gets immediate feedback if a query violates a known constraint, before committing it to state.
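A few of those rules are simple enough to sketch with string inspection. This is a minimal approximation, not the real validator (the actual rule set is larger and surely more careful about parsing):

```python
import re
from typing import Optional

def validate_gaql_sketch(query: str) -> Optional[str]:
    """Return an error message for known-bad patterns, or None if the query passes."""
    q = query.upper()
    if " OR " in q:
        return "OR is not allowed in WHERE; split into separate queries"
    if "(SELECT" in q.replace(" ", ""):
        return "GAQL does not support subqueries"
    if " DURING " in q and "SEGMENTS.DATE" not in q:
        return "DURING must be preceded by segments.date"
    m = re.search(r"LIMIT\s+(\d+)", q)
    if m and int(m.group(1)) > 1000:
        return "LIMIT must not exceed 1000"
    if not m and "SEGMENTS.SEARCH_SUBCATEGORY" not in q:
        return "LIMIT is required"
    return None
```

Returning an error string (rather than raising) is the useful shape here: the tool can hand the message straight back to the agent as feedback.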
Queries are stored in session state as a dictionary keyed by query ID, with status tracked per query: pending, pending_dependencies, success, or error. This lets the execution agent run only what's ready, and lets the analyst agent see the full picture across iterations.
Handling query dependencies
Some analyses require data from one query to generate another. To get quality scores for the top 10 campaigns by spend, for example, you first need the campaign IDs, which come from the first query.
The system handles this with a dependency model:
{
    "query-1": {
        "id": "query-1",
        "query_title": "Top campaigns by spend",
        "status": "pending",
        "query": "SELECT campaign.id, metrics.cost_micros FROM campaign ...",
        "depends_on": None
    },
    "query-2": {
        "id": "query-2",
        "query_title": "Quality scores for top campaigns",
        "status": "pending_dependencies",
        "query": None,
        "depends_on": ["query-1"]
    }
}
When query-1 succeeds, the loop analyst inspects its results, extracts the literal campaign IDs, and provides them to the generation agent. The generator then writes query-2 with those IDs hardcoded. No placeholders, no dynamic substitution at execution time. The validator explicitly rejects any query containing placeholder text.
This is intentional. Dynamic value substitution at execution time is a footgun. Hardcoded literal IDs in the query string make execution simple and deterministic.
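Under that model, deciding what the executor can run is a small state check. This helper is hypothetical, sketched from the status values shown above:

```python
def executable_queries(queries: dict) -> list[str]:
    """IDs of queries that are written and whose dependencies (if any) succeeded."""
    ready = []
    for qid, q in queries.items():
        if q["status"] != "pending" or q["query"] is None:
            continue  # not yet written, or waiting on dependencies
        deps = q.get("depends_on") or []
        if all(queries[d]["status"] == "success" for d in deps):
            ready.append(qid)
    return ready
```

Note that a `pending_dependencies` query stays ineligible even after its dependency succeeds: the generator has to rewrite it with literal IDs first, which flips it to `pending`.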
Execution: parallel and stateful
The execution agent runs all pending queries in parallel via the execute_gaql_query tool. It's deliberately simple: no retry logic, no error recovery, just run everything and record what happened:
async def execute_gaql_query(tool_context, id, query, query_title):
    # Validate no placeholders
    # Execute against Google Ads API
    # Store full results in gaql_execution_results[id]
    # Update query status in gaql_queries[id]
    # Return simple success/error dict
Results go into two places in session state: gaql_execution_results (full row data) and gaql_queries (status and metadata). Keeping these separate matters because execution results can be large, but the query status dict is small and gets passed to every subsequent agent as context.
The execution agent outputs a structured summary and does nothing else. Analysis is the next agent's job.
The loop analyst: where the healing happens
After each execution pass, the loop analyst reviews the state of all queries and decides what to do.
It has access to get_gaql_result_summary, a tool that returns pre-calculated KPIs, row counts, dimensions, and a sample of rows for any successfully executed query. The analyst only falls back to get_gaql_query_results (full rows) when it specifically needs more IDs than the summary sample provides.
The analyst handles four cases:
All queries succeeded. Verify data quality, then call exit_loop(). Done.
Some queries failed. Read the error messages, diagnose the root cause, and write guidance for the generator. The analyst doesn't fix queries itself — it explains what went wrong so the generator can produce corrected versions in the next iteration.
Some queries have unresolved dependencies. Check whether the dependencies succeeded, extract literal IDs from their results, and provide those IDs explicitly in the guidance. The generator needs literal values, not references.
Data quality issues. Even successful queries can return the wrong thing. The analyst checks whether retrieved data actually satisfies the original goal, and requests additional queries if not.
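The first three cases reduce to a check over aggregate query status; the fourth (data quality) needs actual reasoning about the results, so it can't be captured this way. Roughly, and with illustrative action names:

```python
def analyst_next_action(queries: dict) -> str:
    """Map the aggregate query state to the analyst's next move (sketch)."""
    statuses = {q["status"] for q in queries.values()}
    if statuses <= {"success"}:
        return "exit_loop"             # everything succeeded: verify quality, exit
    if "error" in statuses:
        return "diagnose_errors"       # write guidance for the generator
    if "pending_dependencies" in statuses:
        return "resolve_dependencies"  # extract literal IDs from finished queries
    return "wait_for_execution"
```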
The guidance the analyst produces gets picked up by the generator in the next iteration. This is not a function call — it's text output that the generator reads as its input context (analyst_output is injected into the generator's prompt via session state). The loop runs again.
Exiting the loop
The loop exits in one of three ways:
- The analyst calls exit_loop() — data is complete and verified
- max_iterations=5 is hit — the after_agent_callback marks any still-in-progress queries as failed
- A pre-loop callback (skip_gaql_loop_if_data_complete) detects that fixed tools already handled the request and skips the loop entirely
Hitting max_iterations without success is a legitimate outcome. The report writer downstream handles partial data gracefully — it knows which queries succeeded and which didn't, and can still produce useful output from whatever data came back.
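The effect of that after-loop callback amounts to something like this sketch (the real callback operates on ADK session state, and the `error_message` key here is hypothetical):

```python
INCOMPLETE = ("pending", "pending_dependencies")

def mark_incomplete_failed(queries: dict) -> int:
    """Mark any query that never finished as failed; return how many were marked."""
    marked = 0
    for q in queries.values():
        if q["status"] in INCOMPLETE:
            q["status"] = "error"
            q["error_message"] = "max_iterations reached before this query completed"
            marked += 1
    return marked
```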
The data summarizer: keeping LLMs out of arithmetic
Once the loop exits, results go through a pure Python processing step before any LLM sees them. This is a BaseAgent subclass with no model — it just runs Python:
class DataSummarizer(BaseAgent):
    async def _run_async_impl(self, ctx):
        gaql_results = ctx.session.state.get("gaql_execution_results", {})
        summarized = {}
        for qid, result in gaql_results.items():
            summarized[qid] = summarize_gaql_result(qid, result)
        ctx.session.state[SUMMARIZED_RESULTS_KEY] = summarized
        yield Event(actions=EventActions(skip_summarization=True))
The summarization pre-computes grand totals, KPIs (CTR, ROAS, CPA, CVR), and enriched per-row metrics. The report writer gets these pre-calculated numbers — it never sees raw cost_micros values or has to divide conversions by clicks.
This matters for two reasons. LLMs are unreliable at arithmetic, especially with large numbers. And it significantly reduces token usage: the report writer gets a compact, semantically rich payload rather than thousands of raw API rows.
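A minimal version of that pre-computation might look like this. Field names are assumed to mirror the API's metric names, and micros are converted to currency units up front so no LLM ever divides by 1,000,000:

```python
def summarize_rows(rows: list[dict]) -> dict:
    """Aggregate raw rows and derive the KPIs the report writer actually needs."""
    clicks = sum(r.get("clicks", 0) for r in rows)
    impressions = sum(r.get("impressions", 0) for r in rows)
    cost = sum(r.get("cost_micros", 0) for r in rows) / 1_000_000  # micros → currency
    conversions = sum(r.get("conversions", 0.0) for r in rows)
    conv_value = sum(r.get("conversions_value", 0.0) for r in rows)
    return {
        "cost": round(cost, 2),
        "ctr": clicks / impressions if impressions else 0.0,
        "cvr": conversions / clicks if clicks else 0.0,
        "cpa": round(cost / conversions, 2) if conversions else None,
        "roas": round(conv_value / cost, 2) if cost else None,
    }
```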
What this pattern is actually solving
The self-healing loop exists because GAQL errors are not random. They're structural. The API has rules about what can be combined, what requires what, and what's unavailable for certain account types. You can encode the most common rules into static validation. For the rest, you need the system to observe actual API errors and reason about them.
The key design decisions:
Separating generation, execution, and analysis into distinct agents keeps each concern clean. The generator doesn't need to know about API error formats. The executor doesn't need to understand query semantics. The analyst doesn't generate code.
State as the integration layer means agents communicate through structured data in session state rather than through model outputs. Query status, results, and analyst guidance all live in state, not in conversation history.
Validation at write time catches the easy errors before they hit the API, making the error budget available for the genuinely hard cases.
The data summarizer as a non-LLM agent is a pattern worth stealing. Anytime you have deterministic post-processing that shouldn't involve a model, a BaseAgent subclass is the right tool.
The loop won't fix everything. There are GAQL constraints that require allowlisted API access, account-specific configurations, or edge cases that no prompt can navigate. When those hit, the analyst reports them clearly and the system degrades gracefully. But for the broad class of errors that come from imperfect query generation, five iterations is almost always enough.