🏗️ ARCHITECTURE · TECH DEEP DIVE 11 min read

Building a Bangla-First Voice Bot: Lessons from Taritbandhu's Three-Tier Architecture

A practical guide to designing voice systems that handle millions of calls — with AI, deterministic data, and seamless human escalation.
Bangla voice bot architecture · multimodal AI Bangladesh · speech API Bangladesh · AI-human escalation

A voice bot that sounds great in a demo but fails under load isn't a solution — it's a liability. Taritbandhu, the hybrid AI call center for Bangladesh's power sector, handles over 50,000 calls per day during peak billing cycles. Its success isn't accidental; it's the result of a carefully designed three-tier architecture that balances AI flexibility with deterministic reliability.

In this deep dive, we unpack that architecture and show how any Bangladeshi organization can build a similar system.

The three-tier model at a glance

Taritbandhu's architecture separates concerns into three distinct layers: an interface layer (Tier 1), a logic layer (Tier 2), and a data layer (Tier 3).

Let's explore each layer in detail.

📞 Tier 1: Interface layer — meeting the customer where they are

The interface layer is designed for universal access. Taritbandhu customers can interact via:

Key lesson: Build for the lowest common denominator — basic phones with 2G. Voice works everywhere.

Speaklar's ASR is optimized for telephony-grade audio and applies noise cancellation for rural environments. It transcribes “বিল কত?” ("How much is the bill?") accurately even with background static on the line.

🧠 Tier 2: Logic layer — the brain that decides

This is where routing decisions are made. Incoming queries are classified into three paths:

Path A: Deterministic queries, such as “আমার বিল কত?” ("How much is my bill?") or “গত মাসের পেমেন্ট কবে করেছিলাম?” ("When did I pay last month's bill?"). These are routed to the data layer via API calls. The system fetches exact values from the billing database and returns them. No AI guesswork, just facts.
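To make the "no AI guesswork" idea concrete, here is a minimal sketch of a Path A handler, assuming a hypothetical in-memory dictionary (`BILLING_DB`) standing in for the billing database API and hypothetical function names. The point it illustrates: the answer is assembled only from stored values, and a missing record leads to escalation instead of an invented number.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class BillingRecord:
    amount_bdt: float
    due_date: date
    last_payment_date: Optional[date]


# Hypothetical stand-in for the billing database behind the data-layer API.
# Keys are tokenized account identifiers, not raw account numbers.
BILLING_DB: Dict[str, BillingRecord] = {
    "tok_demo_001": BillingRecord(1450.0, date(2026, 3, 15), date(2026, 2, 10)),
}


def answer_bill_query(account_token: str) -> str:
    """Path A: answer only from exact stored values; no text generation."""
    record = BILLING_DB.get(account_token)
    if record is None:
        # Nothing to look up: hand off rather than let a model improvise a figure.
        return "ESCALATE"
    return (f"Your current bill is {record.amount_bdt:.2f} BDT, "
            f"due on {record.due_date.isoformat()}.")


def answer_last_payment_query(account_token: str) -> str:
    """Path A: report the last recorded payment date, if the data layer has one."""
    record = BILLING_DB.get(account_token)
    if record is None or record.last_payment_date is None:
        return "ESCALATE"
    return f"Your last payment was recorded on {record.last_payment_date.isoformat()}."
```

In a real deployment the dictionary lookup would be an authenticated call to the data layer, and the English templates would be Bangla response templates fed to TTS.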

Path B: Conversational queries, such as “নতুন সংযোগ পেতে কী কী লাগে?” ("What do I need to get a new connection?") or “লোডশেডিং কবে হবে?” ("When will load-shedding happen?"). These go to a RAG pipeline that retrieves answers from FAQs, manuals, and outage schedules.
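The retrieve-then-generate shape of Path B can be sketched in a few lines. This is a toy under stated assumptions: the documents are hypothetical English snippets, and ranking uses simple word overlap (Jaccard similarity) as a swapped-in stand-in for the embedding-based vector search a production RAG pipeline would more likely use. `build_prompt` only shows how retrieved text would be placed in front of a language model; no model is called here.

```python
import re
from typing import List, Set, Tuple

# Hypothetical snippets standing in for FAQs, manuals, and outage schedules.
DOCUMENTS = [
    "New connection requires national ID copy, land ownership papers and a completed application form.",
    "Load-shedding schedule for each feeder is published daily by the distribution office.",
    "Prepaid meter recharge can be done through mobile financial services.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase ASCII word tokens (a real Bangla pipeline needs a Bangla-aware tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Rank documents by Jaccard overlap with the query and return the top k."""
    q = tokenize(query)
    scored = []
    for doc in docs:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Ground the generator: it may only answer from the retrieved context."""
    context = "\n".join(doc for _, doc in retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

For example, `retrieve("What do I need for a new connection?", DOCUMENTS)` ranks the new-connection snippet first, and that snippet is what the generator would be asked to paraphrase.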

Path C: Complex or ambiguous queries, such as “আমার বিল বেশি এসেছে, কী করব?” ("My bill came in too high, what should I do?"). The AI collects the details (account number, nature of the dispute) and prepares a summary for the human agent who takes over.
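The "collect details, then summarize" step of Path C resembles classic slot filling. Below is a minimal sketch, assuming two hypothetical slots (`account_number`, `dispute_type`) and English prompts; the source does not specify how Taritbandhu tracks these details. The intake asks for the first missing slot on each turn and produces a summary only once every slot is filled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slot names and prompts; a real deployment would define its own (in Bangla).
PROMPTS = {
    "account_number": "Please tell me your account number.",
    "dispute_type": "What exactly seems wrong with your bill?",
}


@dataclass
class EscalationIntake:
    utterance: str
    slots: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the question for the first unfilled slot, or None when complete."""
        for slot, prompt in PROMPTS.items():
            if slot not in self.slots:
                return prompt
        return None

    def fill(self, slot: str, value: str) -> None:
        if slot not in PROMPTS:
            raise KeyError(f"unknown slot: {slot}")
        self.slots[slot] = value

    def summary(self) -> str:
        """One-line summary for the human agent; refuses to summarize an incomplete intake."""
        if self.next_prompt() is not None:
            raise ValueError("intake incomplete")
        return (f"Caller said: {self.utterance} | "
                f"Account: {self.slots['account_number']} | "
                f"Dispute: {self.slots['dispute_type']}")
```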

Key innovation: A classifier (trained on thousands of past calls) decides the path in under 200ms. If confidence is low, it defaults to Path C (human escalation).
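The low-confidence fallback can be expressed as a small routing function. This sketch assumes the classifier (not shown, and hypothetical here) already returns one probability per path; the 0.70 threshold is an assumed value, not a figure from Taritbandhu. The design choice it captures: an uncertain prediction is never acted on automatically, it goes to Path C.

```python
from enum import Enum
from typing import Dict


class Path(Enum):
    DETERMINISTIC = "A"   # data-layer lookup
    CONVERSATIONAL = "B"  # RAG pipeline
    ESCALATE = "C"        # collect details, hand to a human


CONFIDENCE_THRESHOLD = 0.70  # assumed value; tune on held-out labelled calls


def route(probs: Dict[Path, float], threshold: float = CONFIDENCE_THRESHOLD) -> Path:
    """Choose the most probable path, defaulting to Path C when confidence is low."""
    if not probs:
        return Path.ESCALATE
    best_path, best_prob = max(probs.items(), key=lambda item: item[1])
    return best_path if best_prob >= threshold else Path.ESCALATE
```

Raising the threshold sends more calls to humans but reduces wrong automated answers; lowering it does the opposite, so the value is best chosen from measured error rates rather than fixed up front.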

🗄️ Tier 3: Data layer — the source of truth

The data layer consists of:

Security: All PII (phone numbers, account details) is encrypted. The logic layer never sees raw credentials — it uses tokenized identifiers.
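One common way to give the logic layer "tokenized identifiers" is keyed hashing (HMAC) inside the data layer. The sketch below assumes that approach; the source does not say which tokenization scheme Taritbandhu uses, and `TokenVault` is a hypothetical name. It covers tokenization only: encryption at rest and key management (the key would come from a secrets manager, not a literal) are outside its scope.

```python
import hashlib
import hmac
from typing import Dict


class TokenVault:
    """Data-layer component mapping raw identifiers to opaque tokens (hypothetical design)."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse: Dict[str, str] = {}  # token -> raw value; never leaves the data layer

    def tokenize(self, raw_id: str) -> str:
        """Deterministic: the same account always maps to the same token."""
        digest = hmac.new(self._key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:24]
        self._reverse[token] = raw_id
        return token

    def detokenize(self, token: str) -> str:
        """Only data-layer code should call this; the logic layer works with tokens."""
        return self._reverse[token]
```

Because the mapping is deterministic, the logic layer can correlate repeat calls from the same customer without ever holding the raw account number.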

The escalation flow: AI → human with context

When the logic layer determines a human is needed, it doesn't just transfer — it passes a context package:

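“গ্রাহক: মোঃ আলী, মিটার নং ১২৩৪৫৬। অভিযোগ: বিল বেশি আসায় আপত্তি। AI ইতিমধ্যে গত ৩ মাসের ব্যবহার দেখিয়েছে যা স্বাভাবিকের চেয়ে ৪০% বেশি। মিটার রিডিং ২৩৪৫৬ থেকে ২৩৫৭৮। সম্ভাব্য ত্রুটি।”
("Customer: Md. Ali, meter no. 123456. Complaint: disputes a high bill. The AI has already shown that usage over the last 3 months is 40% above normal. Meter reading went from 23456 to 23578. Possible error.")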

The agent sees this summary instantly. No need to ask "what's your account number?" again. Resolution time drops by 40%.
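A context package like the one above could be carried as a small structured record that renders to the agent's screen and serializes for the handoff API. This is a sketch with hypothetical field names based on the example summary; the source does not publish Taritbandhu's actual schema. `ensure_ascii=False` keeps Bangla text readable in the JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class HandoffContext:
    customer_name: str
    meter_number: str
    complaint: str
    ai_findings: List[str] = field(default_factory=list)
    suspected_issue: str = ""

    def to_json(self) -> str:
        """Payload for the escalation API; Bangla characters are kept as-is."""
        return json.dumps(asdict(self), ensure_ascii=False)

    def agent_summary(self) -> str:
        """Plain-text view shown to the agent when the call is transferred."""
        lines = [f"Customer: {self.customer_name}, meter no. {self.meter_number}",
                 f"Complaint: {self.complaint}"]
        lines += [f"AI finding: {finding}" for finding in self.ai_findings]
        if self.suspected_issue:
            lines.append(f"Assessment: {self.suspected_issue}")
        return "\n".join(lines)
```

Usage, mirroring the example: `HandoffContext("Md. Ali", "123456", "Disputes a high bill", ["Last 3 months usage 40% above normal", "Meter reading 23456 to 23578"], "Possible error").agent_summary()`.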

API-first design: connecting everything

Taritbandhu's architecture is built on APIs. Every component exposes well-defined interfaces:

This API-first approach means new channels (like a mobile app) can be added without touching the core logic.
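The "new channels without touching core logic" claim usually comes down to an adapter boundary: each channel converts its own payload into one normalized request that the core understands. Here is a minimal sketch assuming hypothetical payload keys and class names, using `typing.Protocol` to describe the adapter interface. Adding a mobile app means writing one more adapter; `handle` stays unchanged.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class NormalizedQuery:
    channel: str
    account_token: str
    text: str


class ChannelAdapter(Protocol):
    def to_query(self, raw: dict) -> NormalizedQuery: ...


class IvrAdapter:
    """Telephony channel: the ASR transcript arrives with a tokenized caller ID."""

    def to_query(self, raw: dict) -> NormalizedQuery:
        return NormalizedQuery("ivr", raw["caller_token"], raw["transcript"])


class MobileAppAdapter:
    """Hypothetical new channel added without changing the core logic."""

    def to_query(self, raw: dict) -> NormalizedQuery:
        return NormalizedQuery("app", raw["user_token"], raw["message"])


def handle(adapter: ChannelAdapter, raw: dict) -> str:
    # Core logic sees only NormalizedQuery, never channel-specific payloads.
    query = adapter.to_query(raw)
    return f"[{query.channel}] routing: {query.text}"
```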

Scaling for peak loads

During billing cycles, call volume can spike 5x. Taritbandhu's architecture scales horizontally:

The system has handled peaks of 120 calls/second without degradation.
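Horizontal scaling can be sized with Little's law, L = λ × W: concurrent calls equal the arrival rate times the average call duration. The numbers below are assumptions for illustration only (120 new calls per second at peak, a 90-second average call, 200 concurrent sessions per instance, 80% target utilization); they are not published Taritbandhu figures.

```python
import math


def instances_needed(arrival_rate_per_s: float, avg_call_s: float,
                     sessions_per_instance: int, target_utilization: float = 0.8) -> int:
    """Estimate instance count from average concurrency (Little's law: L = lambda * W)."""
    concurrent_calls = arrival_rate_per_s * avg_call_s
    usable_sessions = sessions_per_instance * target_utilization
    return math.ceil(concurrent_calls / usable_sessions)
```

Under those assumptions, `instances_needed(120, 90, 200)` gives 10,800 concurrent calls and 68 instances, while a normal-day rate one fifth as high (24 calls/s) needs 14. This is an average-based estimate; short bursts and queueing targets call for extra margin, for example via an Erlang C model.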

Monitoring and observability

You can't improve what you don't measure. Taritbandhu tracks:

A dashboard gives operators real-time visibility.
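As an illustration of the kind of figures such a dashboard might compute, here is a sketch over per-call records. The metrics chosen (escalation rate, containment rate, and 95th-percentile classifier latency, which relates to the 200ms routing target mentioned earlier) are assumptions, not the list Taritbandhu actually tracks, and `CallRecord` is a hypothetical structure.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    escalated: bool      # True if the call ended with a human agent (Path C)
    classify_ms: float   # time the router took to pick a path


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(calls: List[CallRecord]) -> Dict[str, float]:
    if not calls:
        raise ValueError("no calls to summarize")
    total = len(calls)
    escalated = sum(1 for c in calls if c.escalated)
    return {
        "calls": total,
        "escalation_rate": escalated / total,
        "containment_rate": (total - escalated) / total,
        "p95_classify_ms": percentile([c.classify_ms for c in calls], 95),
    }
```

Tracking the 95th percentile rather than the mean makes it visible when a minority of calls exceed the routing latency budget.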

Lessons for your own voice bot

Based on Taritbandhu's experience, here's what to prioritize:

How Speaklar enables this architecture

Speaklar provides the foundational components:

You focus on your business logic and data; we handle the voice complexity.

📊 Ready to build? Speaklar's professional services team can help you design your three-tier architecture in a 2-week sprint.

🏗️ See the architecture in action

Speaklar demo →

From telephony to RAG — we provide the building blocks.

⚙️ স্থাপত্য যত শক্তিশালী, সেবা তত নির্ভরযোগ্য (The stronger the architecture, the more reliable the service.)


🔍 Architecture white papers at speaklar.com
Keywords: Bangla voice bot architecture, multimodal AI Bangladesh, speech API Bangladesh, AI-human escalation · based on Taritbandhu production deployment 2026