🏗️ ARCHITECTURE · TECH DEEP DIVE 11 min read

Building a Bangla-First Voice Bot: Lessons from Taritbandhu's Three-Tier Architecture

A practical guide to designing voice systems that handle millions of calls — with AI, deterministic data, and seamless human escalation.

By Speaklar Engineering Team ⚙️ March 2026

Bangla voice bot architecture · multimodal AI Bangladesh · speech API Bangladesh · AI-human escalation

A voice bot that sounds great in a demo but fails under load isn't a solution — it's a liability. Taritbandhu, the hybrid AI call center for Bangladesh's power sector, handles over 50,000 calls per day during peak billing cycles. Its success isn't accidental; it's the result of a carefully designed three-tier architecture that balances AI flexibility with deterministic reliability.

In this deep dive, we unpack that architecture and show how any Bangladeshi organization can build a similar system.

The three-tier model at a glance

Taritbandhu's architecture separates concerns into three distinct layers:

Tier 1: Interface layer — where customers connect (voice, SMS, web).
Tier 2: Logic layer — the brain, combining AI, deterministic matching, and business rules.
Tier 3: Data layer — databases, APIs, and knowledge bases that hold the truth.

Let's explore each layer in detail.

📞 Tier 1: Interface layer — meeting the customer where they are

The interface layer is designed for universal access. Taritbandhu customers can interact via:

Phone call (primary): Speaklar's telephony integration handles SIP trunks, PSTN, and mobile networks. A single national short code (e.g., 1676) works for all.
SMS: For customers who prefer text or have low bandwidth.
Website/App: For digital-native users, a web widget with voice input.

Key lesson: Build for the lowest common denominator — basic phones with 2G. Voice works everywhere.

Speaklar's ASR is optimized for telephony-grade audio, with noise cancellation for rural environments. It converts “বিল কত?” (“How much is the bill?”) to text accurately even with background static.

🧠 Tier 2: Logic layer — the brain that decides

This is where routing decisions are made. Each incoming query is classified into one of three paths:

Path A: Deterministic queries. These are questions such as “আমার বিল কত?” (“How much is my bill?”) or “গত মাসের পেমেন্ট কবে করেছিলাম?” (“When did I make last month's payment?”). They are routed to the data layer via API calls: the system fetches the exact figures from the billing database and returns them. No AI guesswork, just facts.

Path B: Conversational queries. These are questions such as “নতুন সংযোগ পেতে কী কী লাগে?” (“What do I need to get a new connection?”) or “লোডশেডিং কবে হবে?” (“When will load shedding happen?”). They go to a RAG pipeline that retrieves answers from FAQs, manuals, and outage schedules.

Path C: Complex or ambiguous queries, such as “আমার বিল বেশি এসেছে, কী করব?” (“My bill came out too high, what should I do?”). The AI collects details (account number, nature of dispute) and prepares a summary for a human agent.

Key innovation: A classifier (trained on thousands of past calls) decides the path in under 200ms. If confidence is low, it defaults to Path C (human escalation).
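
The routing rule above can be sketched in a few lines. This is a minimal illustration, not Taritbandhu's production code: the `Path` enum, the `route` function, and the 0.7 threshold are assumptions made for the example, and the classifier's output is passed in as a plain dictionary of scores.

```python
from enum import Enum


class Path(Enum):
    DETERMINISTIC = "A"   # exact lookups answered from the data layer
    CONVERSATIONAL = "B"  # answered through the RAG pipeline
    ESCALATE = "C"        # collect details, then hand off to a human


# Hypothetical threshold; the article does not state the real value.
CONFIDENCE_THRESHOLD = 0.7


def route(scores: dict[Path, float], threshold: float = CONFIDENCE_THRESHOLD) -> Path:
    """Pick the highest-scoring path, falling back to escalation when unsure."""
    if not scores:
        return Path.ESCALATE
    best_path, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score < threshold:
        # Low confidence: a human is safer than a wrong automated answer.
        return Path.ESCALATE
    return best_path


# A confident bill lookup stays automated; an uncertain query escalates.
confident = route({Path.DETERMINISTIC: 0.92, Path.CONVERSATIONAL: 0.05, Path.ESCALATE: 0.03})
uncertain = route({Path.DETERMINISTIC: 0.40, Path.CONVERSATIONAL: 0.35, Path.ESCALATE: 0.25})
```

Making escalation the fallback means a misclassification costs agent time instead of giving the customer a wrong answer.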

🗄️ Tier 3: Data layer — the source of truth

The data layer consists of:

Transactional databases: Billing records, payment history, connection details. Accessed via secure APIs with real-time queries.
Document store (vector DB): FAQs, tariff manuals, outage schedules — chunked and embedded for RAG retrieval.
Conversation history: Stored transcripts for analytics and agent context.
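
To make the retrieval step concrete, here is a toy sketch of how chunks in the document store might be ranked against a query. It is only an illustration under loud assumptions: word counts stand in for learned embeddings, a Python list stands in for the vector DB, and the sample chunks are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:top_k]


# Hypothetical document chunks, invented for this example.
CHUNKS = [
    "A new connection requires a national ID copy and a land ownership document.",
    "Load shedding schedule for each zone is published every evening.",
    "Bills can be paid through mobile banking or at any designated bank branch.",
]

best = retrieve("what documents are required for a new connection", CHUNKS, top_k=1)
```

In a real pipeline the retrieved chunks would then be passed to the language model as grounding context for the answer.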

Security: All PII (phone numbers, account details) is encrypted. The logic layer never sees raw credentials — it uses tokenized identifiers.
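
The article does not say how the tokenized identifiers are produced. One common approach, sketched here purely as an assumption, is a keyed hash (HMAC-SHA256): the logic layer works with the token, while the raw account number and the key stay behind the data layer's API. The `tokenize` function, the `tok_` prefix, and the demo key are hypothetical.

```python
import hashlib
import hmac


def tokenize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable token from a raw identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions are a concern.
    return "tok_" + digest.hexdigest()[:32]


# Demo key only. A real key would come from a secrets manager, never from source code.
DEMO_KEY = b"demo-key-not-for-production"

token = tokenize("123456", DEMO_KEY)
same_token = tokenize("123456", DEMO_KEY)
other_token = tokenize("654321", DEMO_KEY)
```

Because the same input always yields the same token, the logic layer can still correlate calls from one customer without ever handling the underlying account number.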

The escalation flow: AI → human with context

When the logic layer determines a human is needed, it doesn't just transfer — it passes a context package:

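“গ্রাহক: মোঃ আলী, মিটার নং ১২৩৪৫৬। অভিযোগ: বিল বেশি আসায় আপত্তি। AI ইতিমধ্যে গত ৩ মাসের ব্যবহার দেখিয়েছে যা স্বাভাবিকের চেয়ে ৪০% বেশি। মিটার রিডিং ২৩৪৫৬ থেকে ২৩৫৭৮। সম্ভাব্য ত্রুটি।”

In English: “Customer: Md. Ali, meter no. 123456. Complaint: objection to an unusually high bill. The AI has already shown usage for the last 3 months, which is 40% above normal. Meter reading went from 23456 to 23578. Possible fault.”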

The agent sees this summary instantly. No need to ask "what's your account number?" again. Resolution time drops by 40%.
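
A context package like the one above could be modeled as a small data structure that the logic layer fills in during the call. The sketch below is hypothetical: the `EscalationContext` class and its field names are assumptions, and the values mirror the example summary.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationContext:
    """What the AI has already learned, handed to the agent at transfer time."""
    customer_name: str
    meter_number: str
    complaint: str
    findings: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render a short summary for the agent's screen."""
        lines = [
            f"Customer: {self.customer_name}, meter no. {self.meter_number}.",
            f"Complaint: {self.complaint}.",
        ]
        lines.extend(self.findings)
        return " ".join(lines)


context = EscalationContext(
    customer_name="Md. Ali",
    meter_number="123456",
    complaint="objection to an unusually high bill",
    findings=[
        "Usage over the last 3 months is 40% above normal.",
        "Meter reading went from 23456 to 23578.",
        "Possible meter fault.",
    ],
)
print(context.summary())
```

Keeping the package structured lets the same data feed the agent's screen, the stored transcript, and later analytics.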

API-first design: connecting everything

Taritbandhu's architecture is built on APIs. Every component exposes well-defined interfaces:

Billing API: GET /customer/{id}/bill — returns current bill, due date, payment status.
Outage API: GET /zone/{id}/schedule — returns load-shedding times.
Document API: POST /retrieve — accepts query, returns relevant chunks from manuals.

This API-first approach means new channels (like a mobile app) can be added without touching the core logic.
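
One way to picture this decoupling: the logic layer depends only on an interface, so any channel can reuse it and the backing client can be swapped. In this sketch the `Bill` fields, the `BillingAPI` protocol, and the in-memory stand-in are all assumptions for illustration. Only the endpoint path in the docstring comes from the list above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Bill:
    amount_bdt: float  # hypothetical field names; the real response schema is not given
    due_date: str
    status: str


class BillingAPI(Protocol):
    def get_bill(self, customer_id: str) -> Bill:
        """Wraps GET /customer/{id}/bill."""
        ...


class InMemoryBillingAPI:
    """Stand-in for the real HTTP client, so the example runs offline."""

    def __init__(self, bills: dict[str, Bill]) -> None:
        self._bills = bills

    def get_bill(self, customer_id: str) -> Bill:
        return self._bills[customer_id]


def answer_bill_query(api: BillingAPI, customer_id: str) -> str:
    """Core logic shared by every channel: voice, SMS, web, or a future app."""
    bill = api.get_bill(customer_id)
    return f"Your current bill is {bill.amount_bdt:.2f} BDT, due on {bill.due_date} ({bill.status})."


api = InMemoryBillingAPI({"tok_abc": Bill(1250.0, "2026-03-25", "unpaid")})
reply = answer_bill_query(api, "tok_abc")
```

The voice channel would pass `reply` to TTS, while an SMS or app channel could send the same string as text, with no change to `answer_bill_query`.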

Scaling for peak loads

During billing cycles, call volume can spike 5x. Taritbandhu's architecture scales horizontally:

Stateless logic layer: Any number of instances can handle requests.
Caching: Frequent queries (e.g., "bill amount") are cached for 1 hour, reducing database load.
Queue-based escalation: If all agents are busy, escalations go to a queue with estimated wait time.

The system has handled peaks of 120 calls/second without degradation.
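
The one-hour caching rule can be illustrated with a small time-to-live cache. This is a sketch, not the deployed implementation: the `TTLCache` class is hypothetical, and a production system with many stateless instances would more likely use a shared cache service than a per-process dictionary.

```python
import time
from typing import Callable


class TTLCache:
    """Cache values for a fixed time-to-live, then fetch them again."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], object]) -> object:
        """Return the cached value if still fresh; otherwise call fetch and store the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


# One-hour TTL, matching the figure above.
bill_cache = TTLCache(ttl_seconds=3600)
amount = bill_cache.get_or_fetch("bill:tok_abc", lambda: 1250.0)
```

A cached bill can go stale if the customer pays within the hour, so a real deployment would probably also invalidate the entry when a payment event arrives.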

Monitoring and observability

You can't improve what you don't measure. Taritbandhu tracks:

Containment rate: percentage of calls resolved without a human agent (target: above 50%).
Path distribution: How many calls go deterministic vs. conversational vs. escalated.
Retrieval accuracy: Is the RAG finding the right documents?
Agent occupancy: Are humans overloaded or underutilized?

A dashboard gives operators real-time visibility.
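
Two of these metrics are simple to compute from call logs. The sketch below assumes a hypothetical `CallRecord` with just the path taken and whether the AI resolved the call. Real logs would carry far more detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CallRecord:
    path: str             # "A" (deterministic), "B" (conversational), or "C" (escalated)
    resolved_by_ai: bool  # True if no human agent was needed


def containment_rate(calls: list[CallRecord]) -> float:
    """Percentage of calls resolved without a human agent."""
    if not calls:
        return 0.0
    return 100.0 * sum(call.resolved_by_ai for call in calls) / len(calls)


def path_distribution(calls: list[CallRecord]) -> dict[str, float]:
    """Share of calls that took each path, as fractions summing to 1."""
    counts = Counter(call.path for call in calls)
    total = len(calls)
    return {path: count / total for path, count in counts.items()}


# Invented sample data for illustration.
sample = [
    CallRecord("A", True),
    CallRecord("A", True),
    CallRecord("B", True),
    CallRecord("C", False),
]
rate = containment_rate(sample)     # 75.0
shares = path_distribution(sample)  # {"A": 0.5, "B": 0.25, "C": 0.25}
```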

Lessons for your own voice bot

Based on Taritbandhu's experience, here's what to prioritize:

Start with deterministic paths. Bill lookup, order status — these are easy to automate and build trust.
Design for handoff from day one. Assume some calls will need humans. Make that seamless.
Invest in your classifier. The decision of which path to take is make-or-break.
APIs are everything. If your data isn't accessible via API, your bot will be crippled.

How Speaklar enables this architecture

Speaklar provides the foundational components:

Telephony layer: SIP trunking, number provisioning, call routing.
Bangla ASR/TTS: Optimized for telephony and regional accents.
Conversational AI: Intent detection, entity extraction, dialog management.
RAG pipeline: Document ingestion, chunking, embedding, retrieval.
API gateway: Easy integration with your existing systems.

You focus on your business logic and data; we handle the voice complexity.

📊 Ready to build? Speaklar's professional services team can help you design your three-tier architecture in a 2-week sprint.

🏗️ See the architecture in action

Speaklar demo →

From telephony to RAG — we provide the building blocks.

⚙️ স্থাপত্য যত শক্তিশালী, সেবা তত নির্ভরযোগ্য (The stronger the architecture, the more reliable the service)

🔍 Architecture white papers at speaklar.com
Based on the Taritbandhu production deployment, 2026.