
Docira - AWS Serverless RAG System

90-Minute Build | <$5/Month Costs | 100+ MB PDFs
Built: 2025 | Timeline: 90 minutes | Status: Production

📊 Overview

A complete AWS serverless RAG (Retrieval-Augmented Generation) pipeline, built in 90 minutes with no prior AWS experience and production-ready on first deployment.

The Achievement:


🎯 The Problem

The client needed a Q&A system for PDF documents:


💡 The Solution

AWS Serverless Architecture:

Three-Level Extraction:

  1. PyPDF2 (fast, machine-readable PDFs)
  2. pdfplumber (better tables, more reliable)
  3. AWS Textract (OCR for scanned documents)
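
The three-level fallback above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `MIN_CHARS` threshold, and the "too little text means try the next level" rule are all assumptions. The PyPDF2 and pdfplumber calls are imported lazily so the cascade itself runs without them installed, and the Textract level is left as a caller-supplied function because multi-page OCR normally goes through Textract's asynchronous S3-based API.

```python
import io
from typing import Callable, Optional, Sequence, Tuple

# An extractor takes raw PDF bytes and returns whatever text it could find.
Extractor = Callable[[bytes], str]

# Hypothetical threshold: below this many characters we assume the level
# failed (for example a scanned PDF with no text layer) and fall through.
MIN_CHARS = 50


def extract_with_pypdf2(pdf_bytes: bytes) -> str:
    """Level 1: fast path for machine-readable PDFs."""
    from PyPDF2 import PdfReader  # lazy import, only needed if this level runs

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def extract_with_pdfplumber(pdf_bytes: bytes) -> str:
    """Level 2: slower, but usually better with tables and odd layouts."""
    import pdfplumber  # lazy import

    with pdfplumber.open(io.BytesIO(pdf_bytes)) as pdf:
        return "\n".join((page.extract_text() or "") for page in pdf.pages)


def extract_text(
    pdf_bytes: bytes,
    extractors: Sequence[Tuple[str, Extractor]],
    min_chars: int = MIN_CHARS,
) -> Tuple[str, str]:
    """Try each extractor in order; return (level_name, text) from the first
    one that yields at least `min_chars` characters of non-blank text."""
    last_error: Optional[Exception] = None
    for name, extractor in extractors:
        try:
            text = extractor(pdf_bytes)
        except Exception as exc:  # a failing level should not abort the cascade
            last_error = exc
            continue
        if len(text.strip()) >= min_chars:
            return name, text
    raise RuntimeError("no extractor produced usable text") from last_error
```

In use, the levels would be passed in cheapest-first, for example `extract_text(data, [("pypdf2", extract_with_pypdf2), ("pdfplumber", extract_with_pdfplumber), ("textract", my_textract_fn)])`, where `my_textract_fn` is a hypothetical wrapper around the OCR call. Ordering by cost keeps the paid OCR step as a last resort.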

Result:


🏗️ Architecture

User → React Frontend
  ↓
API Gateway (REST)
  ↓
Lambda (Docker)
  ├─ PDF Upload → S3
  ├─ Extraction (PyPDF2 → pdfplumber → Textract)
  ├─ Embeddings (semantic search)
  └─ Claude API (Q&A)
  ↓
Response to user
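
The "Embeddings (semantic search)" and "Claude API (Q&A)" steps in the diagram could look something like the sketch below. It assumes each text chunk already has an embedding vector (the embedding model is not named in this write-up), ranks chunks by cosine similarity to the question's vector, and assembles a prompt from the best matches. The helper names and the prompt wording are illustrative assumptions, and the actual call to the Claude API is left out.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Vector,
    chunks: Sequence[Tuple[str, Vector]],
    k: int = 3,
) -> List[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _vec in ranked[:k]]


def build_prompt(question: str, context_chunks: Sequence[str]) -> str:
    """Assemble a question-answering prompt from the retrieved chunks.
    The wording here is a placeholder, not the prompt used in Docira."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

A linear scan like this is enough for a handful of documents and needs no vector database, which fits a low-cost Lambda deployment; a larger corpus would likely want a proper index.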

Cost Optimization:

Result: <$5/month for small-to-medium workloads


🛠️ Technology Stack

AWS:

Backend:

Frontend:


🎓 What I Learned

AWS (learned in 90 minutes):

Problem Solving:


📊 Metrics

Metric                 Value
Build Time             90 minutes (concept to production)
AWS Experience Before  Zero (never used AWS)
Monthly Cost           <$5 (Lambda + S3 + API Gateway)
File Size              100+ MB PDFs supported
Extraction Levels      3 (PyPDF2 → pdfplumber → Textract)

Built by Craig Bosman