Selected Work

A collection of AI agents, RAG systems, and ML pipelines I've built. Click any project to see the full story.

Full-Stack CRM System

Multi-tenant CRM with automatic Row-Level Security at the ORM layer. Each company's data is completely isolated via JWT-injected tenant filters.

The Challenge

Building a CRM that could serve multiple companies on the same database without any data leakage, while keeping the codebase clean and maintainable.

My Approach

Instead of adding WHERE company_id = ? to every query manually, I implemented Row-Level Security at the SQLAlchemy ORM level using event listeners. Every query automatically gets filtered by the company_id extracted from the JWT token.

Technical Implementation

  • Backend: Flask + Flask-RESTful with JWT authentication
  • Database: MySQL with SQLAlchemy ORM
  • Security: Custom @event.listens_for(db.session, 'do_orm_execute') decorator that injects with_loader_criteria on all queries
  • Frontend: Vanilla JS with modular API client
  • Modules: Leads, Contacts, Opportunities, Accounts, Tasks, Notes, Products, Quotes with full CRUD
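
The tenant-filter pattern above can be sketched with SQLAlchemy's documented do_orm_execute + with_loader_criteria recipe. This is a minimal illustration, not the project's actual code: the Lead model, the in-memory SQLite engine, and stashing the JWT-decoded company_id in session.info are all assumed stand-ins.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, with_loader_criteria

Base = declarative_base()

class TenantMixin:
    # Every tenant-scoped table carries a company_id column.
    company_id = Column(Integer, nullable=False, index=True)

class Lead(Base, TenantMixin):  # hypothetical model for illustration
    __tablename__ = "leads"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

engine = create_engine("sqlite://")  # in-memory stand-in for MySQL
Base.metadata.create_all(engine)

@event.listens_for(Session, "do_orm_execute")
def _inject_tenant_filter(execute_state):
    # Fires for every ORM execution; appends WHERE company_id = :tenant
    # to all SELECTs against any TenantMixin subclass.
    if execute_state.is_select and not execute_state.is_relationship_load:
        tenant_id = execute_state.session.info.get("company_id")
        if tenant_id is not None:
            execute_state.statement = execute_state.statement.options(
                with_loader_criteria(
                    TenantMixin,
                    lambda cls: cls.company_id == tenant_id,
                    include_aliases=True,
                )
            )
```

In a request handler, the company_id claim decoded from the JWT would be placed in session.info before any query runs, so every SELECT is tenant-scoped without touching individual query sites.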

Key Features

  • Real-time dashboard with 8 KPI cards and sales pipeline visualization
  • Global search across all modules with keyboard shortcuts
  • Role-based access control (Admin vs Member)
  • Quick action modals for rapid data entry

Gemini Autonomous Agent

ReAct-pattern agentic loop using Gemini 2.0 Flash with native function calling. The agent autonomously decides when to search the web, read emails, or fetch news.

The Challenge

Create an AI agent that can autonomously perform real-world tasks—not just chat, but actually take actions like searching the web, reading emails, and aggregating news.

My Approach

Implemented the ReAct (Reasoning + Acting) pattern where the model thinks about what tool to use, executes it, observes the result, and decides the next step—all in a loop until the task is complete.

Technical Implementation

  • Model: Gemini 2.0 Flash with native function calling via types.FunctionDeclaration
  • Tools: Web search (Google Custom Search), Gmail read/send (simplegmail), News aggregator (feedparser + newspaper3k)
  • Architecture: Flask backend with persistent chat history stored as JSON files
  • Loop: Model generates function call → backend executes tool → result fed back to model → repeat until final answer
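
The loop in the last bullet can be sketched generically. The model_step callable and the tool registry below are stand-in stubs, not the Gemini SDK calls the project uses:

```python
def run_react_loop(model_step, tools, user_query, max_turns=10):
    """Generic ReAct loop: the model either requests a tool call or
    returns a final answer; tool results are fed back as observations."""
    history = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        decision = model_step(history)      # model reasons and acts
        if "answer" in decision:            # final answer -> stop looping
            return decision["answer"]
        name, args = decision["call"], decision.get("args", {})
        result = tools[name](**args)        # backend executes the tool
        history.append({"role": "tool", "name": name, "content": result})
    return "Stopped: turn limit reached."
```

The turn limit guards against the model looping forever on tool calls, which is a common failure mode in agentic loops.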

Key Features

  • Chat session management with create/load/save functionality
  • Tool descriptions engineered for minimal hallucination
  • Automatic OAuth flow for Gmail integration

Multi-Modal Document Search

RAG system supporting PDFs, images, audio, and spreadsheets. Uses Gemini Vision for OCR and SentenceTransformers for semantic retrieval.

The Challenge

Build a document search system that can answer questions across any file type—not just text files, but scanned documents, photos of whiteboards, meeting recordings, and spreadsheets.

My Approach

Created a unified pipeline where every file gets converted to text (using the right tool for each format), then embedded into a vector space for semantic search. The AI answers based strictly on uploaded content.

Technical Implementation

  • File Processing: PyMuPDF (PDF), python-docx (Word), pandas (Excel/CSV), python-pptx (Slides), Pillow (Images)
  • Vision AI: Gemini 2.5 Flash Lite for OCR on images, plus audio transcription with speaker separation
  • Embeddings: SentenceTransformers (all-MiniLM-L6-v2) with cosine similarity via Scikit-Learn
  • Two Modes: Search Query (Kendra-style citations) and Answer Mode (conversational response)
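
The retrieval step can be illustrated with a dependency-free cosine-similarity ranker. The real pipeline uses SentenceTransformers vectors and Scikit-Learn; the tiny 3-d vectors here are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=3):
    """Rank embedded text chunks by similarity to the query vector."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]),
                  reverse=True)[:k]
```

The top-k chunks, each tagged with its source file, become the context the model is allowed to answer from, which is what enables strict file-level citations.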

Key Features

  • Automatic keyword extraction from queries for better retrieval
  • Context-aware answers that cite specific files
  • Simple upload interface with automatic embedding generation

RL Server Autoscaler

DQN agent trained in a custom Gymnasium environment to learn optimal server scaling policies. Outperforms threshold-based autoscalers on the cost-latency tradeoff.

The Challenge

Traditional autoscalers use simple thresholds (if CPU > 80%, add server). But this leads to cold-start latency and wasted idle resources. Can an AI learn a smarter policy?

My Approach

Built a custom Gymnasium environment simulating realistic traffic patterns (sinusoidal + random flux). Trained a DQN agent to minimize a reward function balancing utilization efficiency vs. server costs.

Technical Implementation

  • Environment: Custom ServerEnv(gym.Env) with observation space [CPU Load, Normalized Server Count] and action space [Scale Down, Hold, Scale Up]
  • Reward Engineering: +10 for optimal utilization (40-70%), -20 for overload risk (>90%), -5 for waste (<20%), minus server costs
  • Agent: PyTorch DQN with experience replay (10k buffer), target network sync every 5 episodes, epsilon-greedy exploration
  • Network: 2 hidden layers (256 neurons each) with ReLU, trained for 3000 episodes on Apple Metal GPU
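
The reward shaping in the bullets above can be written out directly. The per-server cost constant is a placeholder, and treating the in-between utilization zones as neutral is an assumption, not a detail from the trained environment:

```python
def reward(cpu_util, n_servers, cost_per_server=0.5):
    """Reward for one autoscaling step, per the bands above.
    cost_per_server is a placeholder constant."""
    if cpu_util > 0.90:             # overload risk
        r = -20.0
    elif 0.40 <= cpu_util <= 0.70:  # optimal utilization band
        r = 10.0
    elif cpu_util < 0.20:           # wasted idle capacity
        r = -5.0
    else:                           # in-between zones: assumed neutral
        r = 0.0
    return r - cost_per_server * n_servers
```

Subtracting cost per live server every step is what pushes the agent away from the "just over-provision" policy a naive threshold rule drifts toward.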

Key Features

  • Interactive testing CLI for real-time scaling recommendations
  • Training visualization with reward history plots
  • Model exports to .pth for deployment

Offline Voice Chatbot

100% offline voice assistant running Vosk STT, Ministral 3B (GGUF), and Kokoro TTS locally. No internet, no cloud, complete privacy.

The Challenge

Build a voice assistant that works entirely offline—no API calls, no cloud dependencies. Must run on consumer hardware with acceptable latency.

My Approach

Carefully selected lightweight models for each stage of the pipeline: Vosk for fast local STT, a quantized GGUF model via llama.cpp for reasoning, and Kokoro for natural-sounding TTS. Optimized the loop to minimize latency.

Technical Implementation

  • STT: Vosk (vosk-model-small-en-us-0.15) for lightweight speech recognition
  • LLM: Ministral 3B via llama-cpp-python, Q4_K_M quantization for memory efficiency
  • TTS: Kokoro for high-fidelity voice synthesis with multiple voice options
  • Audio: sounddevice + soundfile for real-time mic input and speaker output
  • Optimization: Microphone disabled during TTS to prevent self-hearing
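
One turn of the pipeline can be sketched with the three stages as injected callables. The listen, think, and speak stubs stand in for Vosk, llama.cpp, and Kokoro respectively; the mic object is hypothetical:

```python
def run_turn(listen, think, speak, mic):
    """One voice-assistant turn: STT -> LLM -> TTS. The microphone is
    muted during playback so the assistant never transcribes its own
    speech, mirroring the self-hearing fix above."""
    text = listen()        # Vosk STT in the real pipeline
    reply = think(text)    # quantized GGUF model via llama.cpp
    mic.mute()             # disable input before TTS starts
    try:
        speak(reply)       # Kokoro TTS playback
    finally:
        mic.unmute()       # re-arm the mic even if playback fails
    return reply
```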

Key Features

  • Zero internet dependency after initial model download
  • Configurable context window and GPU layer offloading
  • Works on Apple Silicon and x86 CPUs

Email Briefing Assistant

Background service that polls Gmail, generates executive-style AI summaries, and delivers briefings to Slack or WhatsApp.

The Challenge

Email overload is real. I wanted a system that would monitor my inbox, intelligently summarize what matters, and push a briefing to my preferred channel—without me checking email constantly.

My Approach

Built a polling service that reads unread emails (including attachments), uses Gemini to generate human-like summaries, and forwards them to Slack/WhatsApp webhooks.

Technical Implementation

  • Email: Gmail API via simplegmail with OAuth authentication
  • AI: Gemini 2.5 Flash for smart summarization with attachment context
  • Delivery: Slack webhooks and WhatsApp Business API
  • Architecture: Flask backend with continuous polling loop (configurable interval)
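
The polling loop can be sketched with the four stages injected as callables. In the real service they wrap simplegmail, Gemini, and the Slack/WhatsApp webhooks; the max_cycles parameter exists only to make the sketch terminate:

```python
import time

def poll_inbox(fetch_unread, summarize, deliver, mark_read,
               interval_s=300, max_cycles=None):
    """Fetch unread mail, summarize, deliver the briefing, mark as
    read, then sleep for the configured interval and repeat."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        emails = fetch_unread()
        if emails:
            deliver(summarize(emails))
            mark_read(emails)  # avoid re-summarizing next cycle
        cycle += 1
        if max_cycles is None or cycle < max_cycles:
            time.sleep(interval_s)
```

Marking messages as read only after delivery means a failed webhook call leaves the batch eligible for the next cycle instead of silently dropping it.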

Key Features

  • Attachment analysis included in summaries
  • Automatic mark-as-read after processing
  • Multi-platform delivery (Slack or WhatsApp)

Product Insight System

SQL product catalog with FAISS vector embeddings for natural language queries. Ask about products in plain English, get structured answers.

The Challenge

Traditional product databases require exact queries. I wanted to enable natural language questions like "What's a good laptop under $1000 for video editing?" against a structured catalog.

My Approach

Embedded all product data (name, specs, features, reviews) using SentenceTransformers, stored in FAISS for fast similarity search. The AI retrieves relevant products and generates contextual responses.

Technical Implementation

  • Database: SQLAlchemy with Products, ProductReviews, and ChatHistory models
  • Embeddings: SentenceTransformers → FAISS index with automatic sync on product updates
  • Retrieval: Top-20 semantic matches fed as context to Gemini 2.5 Flash
  • Features: Compare products, explain features, get specs, summarize reviews, generate quotations
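
The automatic embedding sync can be sketched with a brute-force, in-memory stand-in for the FAISS index. The embed callable is injected (SentenceTransformers in the real system), and the product dict shape is assumed for illustration:

```python
class ProductIndex:
    """Keeps embeddings in sync with product rows. A dict of vectors
    stands in for FAISS; the key idea is that every create/update/
    delete re-runs through here so the index never goes stale."""

    def __init__(self, embed):
        self.embed = embed
        self.vectors = {}  # product_id -> embedding

    def upsert(self, product):
        # Called on create AND update hooks alike.
        text = f"{product['name']} {product['specs']}"
        self.vectors[product["id"]] = self.embed(text)

    def delete(self, product_id):
        self.vectors.pop(product_id, None)
```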

Key Features

  • Automatic embedding sync when products change
  • Persistent chat history per user per feature
  • Multiple specialized endpoints (compare, specs, quotes, reviews)

Resume Scoring System

AI-powered resume scorer with human-in-the-loop feedback. Uses Pinecone to learn each user's preferences over time.

The Challenge

Generic resume scorers don't account for individual hiring manager preferences. I wanted a system that learns from feedback—when you disagree with a score, it remembers why.

My Approach

Built a feedback loop where user corrections get embedded and stored in Pinecone. On subsequent evaluations, the system retrieves similar past feedback to adjust its scoring criteria.

Technical Implementation

  • AI: Mistral AI via LangChain for semantic resume-JD matching
  • Vector Store: Pinecone for storing user feedback embeddings
  • Document Parsing: pypdf, python-docx for multi-format resume support
  • Feedback Loop: Combined JD+Resume text embedded, stored with user's reason for disagreement
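
The feedback-adjusted scoring can be sketched as a retrieve-and-correct step. The store schema, the delta field, and the similarity threshold are illustrative assumptions; the real system queries Pinecone rather than scanning a list:

```python
def adjust_score(base_score, query_vec, feedback_store, similar,
                 threshold=0.8):
    """Retrieve past disagreements whose JD+resume embedding is close
    to this one and apply their stored corrections, keeping the
    user's reasons as a transparent audit trail."""
    notes, score = [], base_score
    for item in feedback_store:
        if similar(query_vec, item["vec"]) >= threshold:
            score += item["delta"]        # the user's past correction
            notes.append(item["reason"])  # why they disagreed
    return max(0, min(100, score)), notes
```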

Key Features

  • Multi-resume batch scoring
  • Per-user preference learning
  • Transparent scoring with pros/cons breakdown

Web Search Powered LLM

Real-time, web-search-augmented responses. Fast mode (~6s) uses search snippets; deep mode (~20s) scrapes full pages for comprehensive answers.

The Challenge

LLM knowledge has a cutoff date. For questions about current events, prices, or news, I needed a system that can search the web and synthesize real-time information.

My Approach

Built a pipeline that converts user queries into optimized search terms, fetches top results via Google Custom Search API, and feeds the content to Gemini for synthesis.

Technical Implementation

  • Search: Google Custom Search API with configurable result count and pagination
  • Fast Mode: Uses title + snippet from search results (4-6s response time)
  • Deep Mode: Scrapes full page content via BeautifulSoup (15-20s, more accurate)
  • Chat: Persistent history per session with SQLAlchemy storage
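
The two retrieval depths can be sketched as one context builder. The result-dict keys mirror Google Custom Search responses, and fetch_page is a stand-in for the BeautifulSoup scraper:

```python
def build_context(results, deep=False, fetch_page=None):
    """Fast mode concatenates title + snippet straight from the search
    results; deep mode scrapes each result URL for full page text."""
    if deep:
        pages = (fetch_page(r["link"]) for r in results)
    else:
        pages = (f"{r['title']}: {r['snippet']}" for r in results)
    return "\n\n".join(pages)
```

The assembled context, fast or deep, is then handed to the model for synthesis, which is where the 6s-vs-20s latency gap comes from: snippets are free, page fetches are not.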

Key Features

  • Automatic search query generation from conversational input
  • Token usage tracking for cost monitoring
  • Configurable search depth and result count

Bank API Agent

LangChain tool agent that dynamically constructs API calls from JSON schema. Give it a bank's API spec and it figures out how to query it.

The Challenge

Banks have complex APIs with many endpoints. I wanted an agent that could take a natural language query like "What's my account balance?" and automatically figure out which API to call.

My Approach

Feed the agent a structured JSON schema of all bank endpoints. It reads the schema, constructs the right URL, makes the API call, and synthesizes the response—all autonomously.

Technical Implementation

  • Framework: LangChain with ChatGoogleGenerativeAI
  • Agent: create_agent() with custom tool for API requests
  • Schema: JSON files per bank containing endpoint structures
  • Tool: make_api_request() with rate limiting and error handling
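
The schema-driven request construction might look like the sketch below. The schema layout and the endpoint name are hypothetical, chosen only to show the idea of resolving a natural-language intent into a concrete call:

```python
def build_request(schema, endpoint, params=None, path_args=None):
    """Resolve an endpoint entry from a bank's JSON schema into a
    concrete request dict the agent's tool can execute."""
    ep = schema["endpoints"][endpoint]
    url = schema["base_url"].rstrip("/") + "/" + ep["path"].lstrip("/")
    if path_args:
        url = url.format(**path_args)  # fill {account_id}-style slots
    return {"method": ep.get("method", "GET"), "url": url,
            "params": params or {}}
```

Keeping the schema in data rather than code is what lets the same agent support multiple banks: adding a bank means adding a JSON file, not new tool functions.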

Key Features

  • Minimal API calls—agent inspects context before calling
  • Supports multiple bank schemas
  • System instructions prevent over-calling and hallucination

News Summariser

Web search powered news aggregation with automatic summarization. Get caught up on any topic in seconds.

The Challenge

News is scattered across many sources. I wanted a simple way to ask "What's happening with [topic]?" and get a synthesized summary from multiple sources.

My Approach

Combined web search with article extraction. The system searches for recent news, extracts key content from top results, and generates a cohesive summary.

Technical Implementation

  • Search: Google Custom Search targeting news sources
  • Extraction: BeautifulSoup for article text parsing
  • Summarization: Gemini 2.5 Flash for coherent multi-source synthesis
  • Backend: Flask-RESTful API

Key Features

  • Multi-source aggregation
  • Configurable search parameters
  • Clean, readable summaries

Codebase Chat Interface

Interactive Q&A over any local codebase. Point it at a project folder and ask questions about the code in natural language.

The Challenge

Understanding a new codebase takes time. I wanted a tool where I could load any project and immediately start asking "What does this function do?" or "How are these modules connected?"

My Approach

Scan all code files in a project, store their content, and feed it as context to Gemini. Maintain chat history for follow-up questions that build on previous answers.

Technical Implementation

  • File Scanning: Automatic detection of .py, .js, .ts, .go, .java, .txt files
  • Smart Filtering: Skips .venv, __pycache__, node_modules
  • Context: All code content stored in JSON, sent with each query
  • Session: Persistent chat history for contextual follow-ups
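
The scan-and-filter step can be sketched in a few lines of standard-library Python; the extension and skip lists come straight from the bullets above:

```python
import os

CODE_EXTS = {".py", ".js", ".ts", ".go", ".java", ".txt"}
SKIP_DIRS = {".venv", "__pycache__", "node_modules"}

def scan_codebase(root):
    """Walk a project tree, pruning dependency/cache folders in place,
    and return {relative_path: file_content} for recognized files."""
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fn in filenames:
            if os.path.splitext(fn)[1] in CODE_EXTS:
                path = os.path.join(dirpath, fn)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    files[os.path.relpath(path, root)] = f.read()
    return files
```

The resulting mapping is what gets serialized to JSON and sent as context with each question.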

Key Features

  • Three modes: New project, continue session, refresh codebase
  • Local path persistence across sessions
  • Works with any project that fits in Gemini's context window