A voice bot that sounds great in a demo but fails under load isn't a solution — it's a liability. Taritbandhu, the hybrid AI call center for Bangladesh's power sector, handles over 50,000 calls per day during peak billing cycles. Its success isn't accidental; it's the result of a carefully designed three-tier architecture that balances AI flexibility with deterministic reliability.
In this deep dive, we unpack that architecture and show how any Bangladeshi organization can build a similar system.
Taritbandhu's architecture separates concerns into three distinct layers:
Let's explore each layer in detail.
The interface layer is designed for universal access. Taritbandhu customers can interact via:
Key lesson: Build for the lowest common denominator — basic phones with 2G. Voice works everywhere.
Speaklar's ASR is optimized for telephony-grade audio, with noise cancellation for rural environments. It transcribes a spoken “বিল কত?” (“How much is the bill?”) accurately even with background static.
This is where the magic happens. Incoming queries are classified into three paths:
Path A: Deterministic queries, such as “আমার বিল কত?” (“How much is my bill?”) or “গত মাসের পেমেন্ট কবে করেছিলাম?” (“When did I make last month's payment?”). These are routed to the data layer via API calls. The system fetches exact data from the billing database and returns it. No AI guesswork, just facts.
Path B: Conversational queries, such as “নতুন সংযোগ পেতে কী কী লাগে?” (“What is needed to get a new connection?”) or “লোডশেডিং কবে হবে?” (“When will load-shedding happen?”). These go to a RAG pipeline that retrieves answers from FAQs, manuals, and outage schedules.
Path C: Complex or ambiguous queries, such as “আমার বিল বেশি এসেছে, কী করব?” (“My bill came out too high, what should I do?”). The AI collects details (account number, nature of dispute) and prepares a summary for the human agents in Tier 3.
Key innovation: A classifier (trained on thousands of past calls) decides the path in under 200 ms. If its confidence is low, it defaults to Path C (human escalation).
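The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not Taritbandhu's actual code: the `Path`, `Prediction`, and `route` names are hypothetical, and the 0.75 threshold is an assumed value, since the real cut-off is not published.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    DETERMINISTIC = "A"   # exact lookups through the data-layer APIs
    CONVERSATIONAL = "B"  # answered through the RAG pipeline
    ESCALATE = "C"        # details collected, then handed to a human agent


@dataclass
class Prediction:
    path: Path         # the path the classifier believes is correct
    confidence: float  # classifier score in [0, 1]


# Assumed cut-off for illustration only; tune it on real call data.
CONFIDENCE_THRESHOLD = 0.75


def route(prediction: Prediction, threshold: float = CONFIDENCE_THRESHOLD) -> Path:
    """Follow the predicted path, but fall back to Path C when confidence is low."""
    if prediction.confidence < threshold:
        return Path.ESCALATE
    return prediction.path
```

The useful property is that the fallback is the safe option: a misrouted billing question costs a little agent time, while a confidently wrong automated answer costs customer trust.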
The data layer consists of:
Security: All PII (phone numbers, account details) is encrypted. The logic layer never sees raw credentials — it uses tokenized identifiers.
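One way to produce tokenized identifiers is a keyed hash. The sketch below is an assumption for illustration, since the article does not say how Taritbandhu generates its tokens. It uses Python's standard `hmac` module; the `tokenize` function and the `tok_` prefix are hypothetical, and a vault-based token service would be an equally valid design.

```python
import hashlib
import hmac


def tokenize(raw_identifier: str, secret_key: bytes) -> str:
    """Derive a stable token from a raw identifier using HMAC-SHA256.

    The same input always yields the same token, so the logic layer can
    correlate records without ever holding the raw phone or account number.
    """
    digest = hmac.new(secret_key, raw_identifier.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:32]
```

The secret key would live only in the data layer, so a token leaked from the logic layer cannot be matched back to a phone number without it.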
When the logic layer determines a human is needed, it doesn't just transfer — it passes a context package:
“Customer: Md. Ali, meter no. 123456. Complaint: objection to an unusually high bill. The AI has already shown the last 3 months' usage, which is 40% higher than normal. Meter reading: from 23456 to 23578. Possible fault.”
The agent sees this summary instantly. No need to ask "what's your account number?" again. Resolution time drops by 40%.
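A context package like this can be modeled as a small data structure. The sketch below is hypothetical: the `ContextPackage` class and its field names are illustrative choices that mirror the example summary, not Taritbandhu's real schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextPackage:
    customer_name: str
    meter_number: str
    complaint: str
    ai_findings: List[str] = field(default_factory=list)  # what the AI already checked

    def summary(self) -> str:
        """Render the short text an agent sees when the call is transferred."""
        parts = [
            f"Customer: {self.customer_name}, meter no. {self.meter_number}.",
            f"Complaint: {self.complaint}.",
        ]
        parts.extend(f"AI note: {finding}." for finding in self.ai_findings)
        return " ".join(parts)


package = ContextPackage(
    customer_name="Md. Ali",
    meter_number="123456",
    complaint="objection to an unusually high bill",
    ai_findings=["last 3 months' usage is 40% higher than normal", "possible meter fault"],
)
print(package.summary())
```

Keeping the findings as a list lets the AI append each check it has already performed, so the agent starts from the end of the conversation instead of the beginning.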
Taritbandhu's architecture is built on APIs. Every component exposes well-defined interfaces:
GET /customer/{id}/bill: returns current bill, due date, payment status.
GET /zone/{id}/schedule: returns load-shedding times.
POST /retrieve: accepts query, returns relevant chunks from manuals.
This API-first approach means new channels (like a mobile app) can be added without touching the core logic.
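To make the API-first idea concrete, here is a minimal in-memory sketch of the three endpoints behind a single dispatcher. Only the paths come from the article; the sample data, response field names, handler functions, and the keyword-overlap retriever are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical sample data standing in for the real billing, schedule, and document stores.
BILLS = {"c-1001": {"amount_bdt": 1450, "due_date": "2026-02-15", "status": "unpaid"}}
SCHEDULES = {"z-07": ["18:00-19:00"]}
CHUNKS = [
    "A new connection requires a national ID and a site plan",
    "Bills can be paid through mobile banking",
]


def get_bill(params: Dict[str, str], body: dict) -> dict:
    return BILLS[params["id"]]


def get_schedule(params: Dict[str, str], body: dict) -> dict:
    return {"zone": params["id"], "load_shedding": SCHEDULES[params["id"]]}


def retrieve(params: Dict[str, str], body: dict) -> dict:
    # Naive keyword overlap; a real retriever would use a search index or embeddings.
    terms = set(body["query"].lower().split())
    return {"chunks": [c for c in CHUNKS if terms & set(c.lower().split())]}


ROUTES = [
    ("GET", re.compile(r"/customer/(?P<id>[^/]+)/bill"), get_bill),
    ("GET", re.compile(r"/zone/(?P<id>[^/]+)/schedule"), get_schedule),
    ("POST", re.compile(r"/retrieve"), retrieve),
]


def dispatch(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Find the handler whose method and path pattern match, then call it."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if route_method == method and match:
            return handler(match.groupdict(), body or {})
    raise LookupError(f"no route for {method} {path}")
```

A voice channel, a chat widget, or a mobile app would all call `dispatch` (or its HTTP equivalent) the same way, which is why a new channel does not require changes to the handlers.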
During billing cycles, call volume can spike 5x. Taritbandhu's architecture scales horizontally:
The system has handled peaks of 120 calls/second without degradation.
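For capacity planning, Little's law (concurrent calls = arrival rate × mean handling time) gives a rough worker count for a given peak. The sketch below is a back-of-the-envelope aid, not Taritbandhu's sizing method: the `workers_needed` function, the 20% headroom default, and any handling time or per-worker capacity you pass in are assumptions you would replace with your own measurements.

```python
import math


def workers_needed(
    arrival_rate_per_s: float,
    mean_handle_time_s: float,
    calls_per_worker: int,
    headroom: float = 0.2,
) -> int:
    """Estimate how many identical workers a peak load requires.

    Little's law: average concurrent calls = arrival rate * mean handling time.
    Headroom adds spare capacity on top of that average.
    """
    concurrent_calls = arrival_rate_per_s * mean_handle_time_s
    return math.ceil(concurrent_calls * (1 + headroom) / calls_per_worker)


# Hypothetical inputs: 90 s average call, 50 concurrent calls per worker.
normal = workers_needed(24, 90, 50)   # a baseline one fifth of the peak
peak = workers_needed(120, 90, 50)    # the 120 calls/second peak
print(normal, peak)
```

Because the estimate is linear in the arrival rate, a 5x spike needs roughly 5x the workers, which is the case horizontal scaling is designed for.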
You can't improve what you don't measure. Taritbandhu tracks:
A dashboard gives operators real-time visibility.
Based on Taritbandhu's experience, here's what to prioritize:
Speaklar provides the foundational components:
You focus on your business logic and data; we handle the voice complexity.
📊 Ready to build? Speaklar's professional services team can help you design your three-tier architecture in a 2-week sprint.
⚙️ The stronger the architecture, the more reliable the service
🔍 Architecture white papers at speaklar.com
Keywords: Bangla voice bot architecture, multimodal AI Bangladesh, speech API Bangladesh, AI-human escalation · based on Taritbandhu production deployment 2026