Telecommunications

Agentic Analytics Platform

Challenge

A major network provider required an on-premise “Chat with Data” interface for their wireless and GPON domains. The initial internal prototype had stalled: it attempted fully autonomous agentic reasoning on CPU-only infrastructure, resulting in 10-minute query latencies and accuracy that was erratic and never measured.

Solution

We restructured the project using our 'Complexity-Last' methodology. We replaced the monolithic agent with a Dual-Path Architecture that routes 80% of queries through a deterministic 'Fast Path' (sub-2s) and reserves expensive reasoning only for complex temporal or multi-domain queries. We shifted from 'prompt engineering' to an Evaluation-Driven pipeline.

Results

The system achieved >80% accuracy on a rigorously curated Golden Set while reducing average latency from roughly 10 minutes to 6-8 seconds on commodity hardware. The architecture democratized data access for Field Engineers without requiring GPUs.

The Challenge: The “Impossible” Triangle

The client approached us with a paralyzed initiative. They sought to deploy a natural language interface allowing Field Engineers and NOC analysts to query massive datasets (700M+ records) using plain English (e.g., “Show sites with high congestion”).

However, the project was trapped in an “impossible triangle” of constraints:

  1. Capability: They demanded “Agentic” reasoning to handle complex, multi-step troubleshooting.
  2. Infrastructure: Strict data sovereignty required air-gapped, on-premise deployment on standard CPU servers.
  3. Performance: Users expected near-instant “Google-like” responsiveness.

Their initial prototype attempted to solve this by throwing a monolithic LLM agent at every query. The result was a system that took up to 10 minutes to process a single request on a CPU, with no quantitative way to measure if the answer was even correct.

Our Approach: Engineering over “Magic”

We diagnosed the failure as a classic case of “Premature Complexity.” The team was attempting to use a Level 3 Agent to solve Level 1 Retrieval problems. We dismantled the monolith and implemented a Dual-Path Architecture.

1. Evaluation-Driven Development (EDD)

Before writing code, we established a Golden Dataset. We discovered that the previous system’s accuracy claims were anecdotal; no benchmark existed to measure against.

  • We curated a benchmark of Question-SQL pairs.
  • We established a strict metric: Execution Match Accuracy. It does not matter if the SQL looks right; it matters if it returns the correct data.
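
To make the metric concrete, here is a minimal sketch of Execution Match scoring in Python, assuming a SQLite fixture and (generated, golden) SQL pairs; the function names and fixture are illustrative, not the client’s actual harness.

```python
import sqlite3
from collections import Counter

def execution_match(conn: sqlite3.Connection, generated_sql: str, golden_sql: str) -> bool:
    """A pair matches only if both queries return the same multiset of rows.
    The SQL text is never compared -- only the data it actually returns."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to execute counts as wrong
    expected = conn.execute(golden_sql).fetchall()
    return Counter(got) == Counter(expected)

def execution_match_accuracy(conn, pairs) -> float:
    """pairs: iterable of (generated_sql, golden_sql) tuples from the Golden Set."""
    hits = [execution_match(conn, gen, gold) for gen, gold in pairs]
    return sum(hits) / len(hits)
```

Scoring on returned data rather than SQL text means two syntactically different queries can both be correct, which is exactly the behavior an evaluation pipeline needs.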

2. The Dual-Path Architecture

We recognized that not all queries require “reasoning.” A query like “What is the RRC success rate for Site X?” is a retrieval task, not a planning task.

  • Path 1 (Fast Path): For 80% of queries (Simple Analytics), we bypassed the agent entirely. We used a streamlined Retrieval-Augmented Generation (RAG) pipeline to map questions directly to validated SQL patterns. This reduced latency to <2 seconds.
  • Path 2 (Complex Path): Only when the Triage Agent detected multi-step logic (e.g., “Find sites with SINR < 5dB for more than 6 hours”) did we escalate to the “Decompose-Plan-Execute” framework. This path allows for higher latency (Tier 3) but ensures correctness for complex temporal reasoning.
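
A minimal sketch of the triage step, assuming a rule-based router; the regex signals below are illustrative assumptions, not the production routing logic.

```python
import re

# Signals that a question needs multi-step temporal or cross-domain reasoning.
# These patterns are illustrative; production triage can use a trained classifier.
COMPLEX_SIGNALS = [
    r"\bfor (more|less) than\b",       # temporal windows ("for more than 6 hours")
    r"\b(trend|compare|correlate)\b",  # multi-step analysis
    r"\bwireless\b.*\bgpon\b",         # cross-domain joins
]

def route(question: str) -> str:
    """Return 'fast' for retrieval-style questions, 'complex' for planning tasks."""
    q = question.lower()
    if any(re.search(p, q) for p in COMPLEX_SIGNALS):
        return "complex"  # Decompose-Plan-Execute agent (Tier 3 latency budget)
    return "fast"         # RAG lookup against validated SQL patterns (<2 s)

assert route("What is the RRC success rate for Site X?") == "fast"
assert route("Find sites with SINR < 5dB for more than 6 hours") == "complex"
```

Keeping the fast path deterministic means its latency and accuracy can be tested independently of the agent, so the 80% of traffic it serves never pays the reasoning tax.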

3. Intermediate Representation

When the client’s previous “black box” model failed, engineers had no way to tell where the problem lay: in table selection, column matching, or filtering. We implemented an Intermediate Representation (IR) pipeline: instead of generating SQL directly, the system first generates a JSON “Plan”.

  • Schema Pruning: A dedicated ML model reduces the search space from hundreds of tables to the specific few relevant to the query.
  • Deterministic Validation: By validating the JSON plan before SQL generation, we eliminated hallucinations like SELECT * or invented columns.
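
To make the IR concrete, below is a sketch of a JSON plan validated against a pruned schema catalog before any SQL exists; the field names, table, and columns are illustrative assumptions.

```python
# An illustrative IR plan emitted instead of raw SQL. Field names, the table,
# and its columns are assumptions for this sketch, not the client's schema.
plan = {
    "tables": ["wireless_kpi_hourly"],
    "columns": ["site_id", "rrc_success_rate"],
    "filters": [{"column": "site_id", "op": "=", "value": "SITE_X"}],
}

CATALOG = {  # the pruned search space: only tables the pruning model kept
    "wireless_kpi_hourly": {"site_id", "rrc_success_rate", "timestamp"},
}

def validate_plan(plan: dict) -> list[str]:
    """Reject hallucinated tables and columns *before* SQL generation."""
    errors = [f"unknown table: {t}" for t in plan["tables"] if t not in CATALOG]
    known = set().union(*(CATALOG.get(t, set()) for t in plan["tables"]))
    for col in plan["columns"] + [f["column"] for f in plan["filters"]]:
        if col not in known:
            errors.append(f"unknown column: {col}")
    return errors  # an empty list means the plan is safe to compile to SQL

assert validate_plan(plan) == []
```

Because the plan is plain JSON, a failed query now produces a pinpointed error (wrong table vs. wrong column) instead of an opaque SQL string.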

Business Impact

By treating Intelligence as a Cost rather than a Virtue, we delivered a system that was viable for production deployment.

  • Operational Viability: Latency dropped from ~10 minutes to an average of 6-8 seconds, falling within the acceptable “Tier 2/3” window for complex analytics.
  • Trust: The shift to an evaluation-based workflow raised accuracy on the Golden Set to over 80%, recovering the previously failing project.
  • Cost Efficiency: By optimizing for CPU inference (using quantized models), we eliminated the need for GPU procurement, cutting both hardware cost and deployment lead time.

Technology Stack

  • Orchestration: Custom
  • Text-to-SQL: Custom Implementation
  • Vector Store: ChromaDB (Local/On-Prem)
  • Inference: vLLM
  • Caching: Redis (L1 Query Cache & L2 KV Cache)
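
For illustration, the L1 query cache can be as simple as the sketch below, using the standard redis-py client; the key scheme, TTL, and the run_pipeline placeholder are assumptions for this sketch.

```python
import hashlib
import json

import redis  # standard redis-py client

r = redis.Redis(host="localhost", port=6379)

def run_pipeline(question: str) -> dict:
    """Placeholder for the dual-path pipeline described above (hypothetical)."""
    raise NotImplementedError

def answer(question: str, ttl_s: int = 300) -> dict:
    """L1 cache: a repeated question returns instantly, skipping the pipeline."""
    key = "l1:" + hashlib.sha256(question.strip().lower().encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: no LLM or database call at all
    result = run_pipeline(question)
    r.setex(key, ttl_s, json.dumps(result))  # expire stale answers after the TTL
    return result
```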
