Enterprise RAG System
AI-powered knowledge management for enterprise document processing
Overview
Built a production RAG (Retrieval-Augmented Generation) system that processes over 10,000 documents to power an enterprise knowledge management chatbot. The system handles diverse document formats including PDFs, Word documents, and complex Excel spreadsheets.
Beyond the core chatbot, developed internal LLM tooling including a Slack bot for quick queries, a knowledge base management UI, and automated deployment pipelines.
Challenge
The client's knowledge base was scattered across thousands of documents in various formats. Employees spent significant time searching for information, and institutional knowledge was often locked in the heads of senior staff.
Specific challenges included:
- Excel files with complex structures and embedded business logic
- Legacy documents with inconsistent formatting
- Need for accurate citations and source references
- Multi-language support (Japanese and English)
Solution
1. Semantic Chunking Pipeline
Developed custom chunking strategies for different document types. For Excel files, built a tool that understands table structures, preserves relationships between cells, and extracts meaningful chunks that maintain context.
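A minimal sketch of the header-preserving idea, assuming openpyxl and a simple first-row-is-header layout; the chunk size and helper names are illustrative, and the handling of embedded business logic mentioned above is omitted:

```python
from openpyxl import load_workbook

ROWS_PER_CHUNK = 20  # illustrative size, not the production value

def _fmt(v) -> str:
    return "" if v is None else str(v)

def chunk_workbook(path: str) -> list[str]:
    """Split each worksheet into row groups that repeat the header row."""
    chunks = []
    wb = load_workbook(path, data_only=True)  # data_only: read cached formula results
    for ws in wb.worksheets:
        rows = list(ws.iter_rows(values_only=True))
        if len(rows) < 2:
            continue  # empty sheet, or nothing beyond a header
        header = " | ".join(_fmt(v) for v in rows[0])
        for start in range(1, len(rows), ROWS_PER_CHUNK):
            body = rows[start:start + ROWS_PER_CHUNK]
            # Prefix every chunk with the sheet name and header row so each
            # chunk remains interpretable on its own after retrieval.
            lines = [f"Sheet: {ws.title}", header]
            lines += [" | ".join(_fmt(v) for v in row) for row in body]
            chunks.append("\n".join(lines))
    return chunks
```

Repeating the sheet name and header in every chunk is what keeps cell relationships legible once a chunk is retrieved out of its table context.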
2. RAG Architecture
Implemented a hybrid search approach combining dense vector embeddings with BM25 keyword search. The retrieval pipeline uses re-ranking to improve relevance, especially for domain-specific queries.
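Since Azure AI Search is the vector store (see Infrastructure below) and runs BM25 for keyword queries by default, the hybrid query path can be sketched with the azure-search-documents SDK (11.4+). The write-up doesn't name the re-ranker, so the sketch uses Azure's built-in semantic ranker as one plausible option; the index name, field names, and semantic configuration name are all illustrative:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",  # placeholder
    index_name="knowledge-base",
    credential=AzureKeyCredential("<api-key>"),
)

def hybrid_search(query: str, query_vector: list[float], k: int = 5):
    # Passing both search_text and vector_queries makes Azure AI Search
    # fuse BM25 keyword scores with vector similarity; the semantic query
    # type then re-ranks the fused results with a deeper model, which is
    # where domain-specific queries gain the most.
    return client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(
            vector=query_vector, k_nearest_neighbors=k, fields="embedding")],
        query_type="semantic",
        semantic_configuration_name="default",
        top=k,
    )
```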
3. Chatbot Interface
Built a conversational interface using LangChain that maintains context across turns, provides source citations, and gracefully handles queries outside its knowledge base.
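An illustrative LangChain sketch of those three behaviors, not the production chain: it assumes the langchain-core and langchain-openai packages, a retriever like the one above, and made-up field names on the retrieved documents; the model choice is also an assumption.

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # assumed model

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Answer only from the numbered sources below. Cite them as [1], [2], ... "
     "If the sources do not contain the answer, say you don't know.\n\n{sources}"),
    MessagesPlaceholder("history"),  # prior turns carry conversational context
    ("human", "{question}"),
])
chain = prompt | llm

def answer(question: str, history: list, docs: list[dict]) -> str:
    # Number the retrieved chunks so the model's [n] citations map back
    # to concrete source documents.
    sources = "\n".join(
        f"[{i + 1}] {d['title']}: {d['content']}" for i, d in enumerate(docs))
    reply = chain.invoke(
        {"sources": sources, "history": history, "question": question})
    return reply.content
```

Grounding the system prompt in numbered sources is also what lets the bot decline gracefully: with no relevant source to cite, it is instructed to say so rather than guess.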
4. Internal Tooling
Beyond the main chatbot, created:
- Slack Bot: Quick Q&A directly in team channels (see the sketch after this list)
- Knowledge Base UI: Document upload, tagging, and management
- Deploy Pipeline: Automated testing and deployment for model updates
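A hedged sketch of the Slack integration, assuming slack-bolt in Socket Mode; it composes the illustrative `hybrid_search()` and `answer()` helpers from the earlier sketches, and `embed()` is an assumed helper that turns text into an embedding vector:

```python
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def handle_mention(event, say):
    question = event["text"]
    # hybrid_search(), answer(), and embed() come from the sketches above;
    # embed() is an assumed embedding helper, not a real library call.
    docs = list(hybrid_search(question, embed(question)))
    # Reply in a thread so each Q&A stays attached to its originating message.
    say(text=answer(question, history=[], docs=docs), thread_ts=event["ts"])

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```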
Infrastructure
Cloud-native architecture on Azure with infrastructure as code:
- Compute: Azure Container Apps for scalable API hosting
- Vector Store: Azure AI Search for embedding storage and retrieval
- Storage: Azure Blob Storage for document management
- IaC: Terraform modules for consistent environment provisioning
Results
The system has become a critical part of the client's daily operations:
- Average time to find information reduced from 15 minutes to under 30 seconds
- 85% of queries answered accurately without human intervention
- Onboarding time for new employees reduced significantly
- Knowledge documentation improved as gaps were identified
Technology Stack
Backend
- Python
- FastAPI
- LangChain
AI/ML
- OpenAI API
- Azure AI Search
- Embeddings
Infrastructure
- Azure
- Terraform
- Docker
DevOps
- GitHub Actions
- Container Apps
- Blob Storage