Technical Arsenal

The tools and technologies I use daily to build intelligent systems. Each skill is backed by real implementations you can explore in my projects.

Agentic AI Systems

Building autonomous agents that can reason, plan, and execute multi-step tasks without human intervention.

ReAct Pattern Implementation

I implement the Thought-Action-Observation loop where the model reasons about what tool to use, executes it, observes the result, and iterates until the task is complete. My Gemini Agent uses native function calling with types.FunctionDeclaration to autonomously decide when to search the web, read emails, or fetch news.
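The loop itself is simple enough to sketch without the API. Below is a minimal, self-contained version with the model stubbed out; in the real agent the decide() step is a Gemini function-calling request, and the tool name search_web is illustrative, not the project's actual tool set:

```python
# Thought-Action-Observation loop with a stubbed model.
# decide() stands in for a Gemini function-calling request.
TOOLS = {
    "search_web": lambda query: f"results for {query!r}",
}

def decide(history):
    """Stubbed 'model': request a tool call, then finish once an observation exists."""
    if not any(kind == "observation" for kind, _ in history):
        return ("call", "search_web", "llama.cpp quantization")
    return ("finish", "Answer grounded in the observation.", None)

def react_loop(task, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        kind, a, b = decide(history)        # Thought: model picks next step
        if kind == "finish":
            return a, history
        observation = TOOLS[a](b)           # Action: execute the chosen tool
        history.append(("action", f"{a}({b!r})"))
        history.append(("observation", observation))  # Observation: feed back
    return "step budget exhausted", history

answer, trace = react_loop("What is Q4_K_M quantization?")
```

The real agent replaces decide() with an LLM call and grows TOOLS to web search, email, and news fetchers, but the control flow is exactly this loop.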

Tool Orchestration with LangChain

I build tool-calling agents using LangChain's create_agent() that dynamically construct API calls from JSON schemas. My Bank Agent reads a bank's endpoint schema and figures out how to query it—minimal calls, maximum efficiency.
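As a hedged illustration of the schema-driven approach: the helper below turns a JSON endpoint description into a concrete request spec after validating required parameters. The schema, endpoint, and field names are invented for this sketch, not the Bank Agent's actual schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint schema -- illustrative, not the real bank's schema.
SCHEMA = {
    "name": "get_balance",
    "method": "GET",
    "path": "/accounts/{account_id}/balance",
    "params": {
        "account_id": {"type": "string", "required": True},
        "currency": {"type": "string", "required": False},
    },
}

def build_request(schema, **kwargs):
    """Turn a JSON endpoint schema plus model-chosen arguments into a request
    spec, the way a tool-calling agent would after selecting this tool."""
    missing = [p for p, spec in schema["params"].items()
               if spec["required"] and p not in kwargs]
    if missing:
        raise ValueError(f"missing required params: {missing}")
    # Path parameters fill the template; the rest become the query string.
    path_args = {k: kwargs.pop(k) for k in list(kwargs)
                 if "{" + k + "}" in schema["path"]}
    path = schema["path"].format(**path_args)
    query = urlencode(kwargs)
    return {"method": schema["method"], "url": path + ("?" + query if query else "")}

req = build_request(SCHEMA, account_id="12345", currency="USD")
```

In the agent, the model sees the schema as a tool definition and supplies the kwargs; this helper is the deterministic half that keeps each call minimal and well-formed.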

Model Selection Strategy

Different tasks need different models. I use Gemini 2.0 Flash for function calling (best tool-use support), Gemini 2.5 Flash for fast general inference, and Ministral 3B (GGUF) via llama.cpp for fully offline reasoning.
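That routing decision can be made explicit in a small table; the task keys here are assumptions for the sketch, while the model names come from the text above:

```python
# Illustrative task-to-model routing table.
MODEL_FOR = {
    "function_calling": "gemini-2.0-flash",   # best tool-use support
    "general": "gemini-2.5-flash",            # fast general inference
    "offline": "ministral-3b-gguf",           # local via llama.cpp
}

def pick_model(task_type, offline=False):
    """Offline requirements trump everything; otherwise route by task type."""
    if offline:
        return MODEL_FOR["offline"]
    return MODEL_FOR.get(task_type, MODEL_FOR["general"])
```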

Gemini 2.0 Flash · LangChain · Function Calling · ReAct · Tool Agents

RAG & Vector Search

Retrieval-Augmented Generation that actually works—from document ingestion to semantic retrieval to grounded answers.

Vector Database Selection

I use Pinecone (serverless) for production user-preference storage where I need persistence and filtering—like in my Resume Scoring system's feedback loop. For local high-speed search, FAISS handles my Product Insight catalog queries.
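Under the hood, both stores answer the same question: which vectors are closest to the query? A brute-force sketch of that operation, the one FAISS accelerates with optimized indexes (the 3-d vectors are toy embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query, index, k=2):
    """Exhaustive nearest-neighbor search over an in-memory index."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

INDEX = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.05, 0.0], INDEX, k=2)
```

Brute force is fine for a few thousand catalog items; beyond that, FAISS's approximate indexes (or Pinecone's serverless tier, when persistence and metadata filtering matter) take over.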

Embedding Strategy

I embed with SentenceTransformers (all-MiniLM-L6-v2) for its balance of speed and accuracy. My systems automatically sync embeddings when source data changes—no stale vectors, no manual reindexing.
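The sync logic reduces to content fingerprinting: re-embed only what changed. A sketch of that idea, where `embed` stands in for a SentenceTransformers encode call:

```python
import hashlib

def content_key(text):
    """Stable fingerprint of a source document; if it changes, re-embed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def sync(docs, index, embed):
    """Re-embed only documents whose content hash changed.
    No stale vectors, no full manual reindex."""
    updated = []
    for doc_id, text in docs.items():
        key = content_key(text)
        if index.get(doc_id, {}).get("key") != key:
            index[doc_id] = {"key": key, "vector": embed(text)}
            updated.append(doc_id)
    return updated

index = {}
docs = {"a": "hello", "b": "world"}
first = sync(docs, index, embed=len)   # len() as a dummy embedder
docs["a"] = "hello!"                   # source data changes...
second = sync(docs, index, embed=len)  # ...only "a" is re-embedded
```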

Multi-Modal RAG Pipeline

My Document Search project isn't limited to text. It handles PDFs (PyMuPDF), images via Gemini Vision OCR, audio transcription with speaker separation, and spreadsheets—all unified into a single semantic search index.
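Structurally, a multi-modal ingest is a dispatcher: route each file to a modality-specific extractor, then index the resulting text uniformly. A stub sketch (the real handlers call PyMuPDF, Gemini Vision, a transcription model, and a spreadsheet parser):

```python
from pathlib import Path

# Stub extractors that tag the modality; real ones return extracted text.
EXTRACTORS = {
    ".pdf": lambda p: f"pdf-text:{p.name}",
    ".png": lambda p: f"ocr-text:{p.name}",
    ".wav": lambda p: f"transcript:{p.name}",
    ".csv": lambda p: f"table-text:{p.name}",
}

def extract(path):
    """Route a file to the right extractor so every modality lands in one
    semantic search index."""
    p = Path(path)
    handler = EXTRACTORS.get(p.suffix.lower())
    if handler is None:
        raise ValueError(f"unsupported file type: {path}")
    return handler(p)
```

Because every branch ends in plain text, the embedding and retrieval layers never need to know which modality a chunk came from.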

Pinecone · FAISS · SentenceTransformers · Gemini Vision · Cosine Similarity

Backend & ML Engineering

Building the infrastructure that makes AI systems production-ready—APIs, databases, authentication, and training pipelines.

Flask Microservices Architecture

My CRM project is a full-stack Flask-RESTful application with SQLAlchemy ORM, JWT authentication, and something I'm proud of—Row-Level Security at the ORM level. Using @event.listens_for(db.session, 'do_orm_execute'), every query automatically filters by the tenant ID from the JWT. No manual WHERE clauses, no data leaks.
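The pattern condenses into a runnable sketch against an in-memory SQLite database. The model, column names, and tenant lookup are simplified stand-ins; the real system resolves the tenant ID from JWT claims per request (requires SQLAlchemy 1.4+):

```python
from sqlalchemy import Column, Integer, String, create_engine, event, select
from sqlalchemy.orm import Session, declarative_base, with_loader_criteria

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    tenant_id = Column(Integer, nullable=False)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

CURRENT_TENANT = {"id": 1}  # stand-in for the tenant claim decoded from the JWT

@event.listens_for(Session, "do_orm_execute")
def add_tenant_filter(state):
    # Rewrite every ORM SELECT to carry the tenant predicate automatically --
    # no manual WHERE clause at any call site.
    if state.is_select and not state.is_relationship_load:
        state.statement = state.statement.options(
            with_loader_criteria(
                Customer,
                lambda cls: cls.tenant_id == CURRENT_TENANT["id"],
                include_aliases=True,
            )
        )

with Session(engine) as s:
    s.add_all([Customer(tenant_id=1, name="alice"),
               Customer(tenant_id=2, name="bob")])
    s.commit()
    names = [c.name for c in s.execute(select(Customer)).scalars()]
```

Every query through the session now sees only tenant 1's rows; the filter lives in one place instead of scattered across handlers.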

Deep Reinforcement Learning

My RL Autoscaler trains a PyTorch DQN agent on a custom Gymnasium environment. The setup includes experience replay (10k buffer), target network sync, epsilon-greedy exploration, and a reward function that balances utilization efficiency against server costs. Trained on Apple Metal GPU.
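Two of those pieces, the replay buffer and the exploration schedule, are framework-independent and easy to sketch. The decay constants below are illustrative, not the trained configuration:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay, as in the autoscaler's 10k-step buffer.
    Old transitions fall off the left end once capacity is reached."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform random minibatch to decorrelate consecutive transitions."""
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

def epsilon(step, start=1.0, end=0.05, decay_steps=5_000):
    """Linear epsilon-greedy schedule: explore early, exploit late."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

buf = ReplayBuffer()
for t in range(12_000):                 # overfill to exercise the cap
    buf.push(t, 0, 0.0, t + 1, False)
```

The DQN training step then samples minibatches from the buffer, computes TD targets against a periodically synced target network, and picks actions with probability epsilon(step) of exploring.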

API Integrations

I work extensively with external APIs: Gmail API (OAuth + simplegmail), Google Custom Search, Slack/WhatsApp webhooks for notifications, and the Google Gemini SDK for AI capabilities. My Email Briefing system ties them all together.

Flask · SQLAlchemy · PyTorch · Gymnasium · JWT · REST APIs

Local AI & Edge Deployment

Running AI entirely offline—no cloud, no API calls, complete privacy. Making large models work on consumer hardware.

Offline Voice Pipeline

My Voice Chatbot runs 100% locally. Vosk handles speech-to-text (lightweight, fast), llama-cpp-python runs the Ministral 3B GGUF model for reasoning, and Kokoro synthesizes natural-sounding speech. No internet required after model download.
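The pipeline is three stages composed in order. A stub sketch of the wiring, where each lambda stands in for the real Vosk, llama.cpp, and Kokoro calls:

```python
def make_pipeline(stt, llm, tts):
    """Compose speech-to-text, local reasoning, and text-to-speech into a
    single audio-in, audio-out callable."""
    def respond(audio_in):
        text = stt(audio_in)   # speech-to-text (Vosk in the real system)
        reply = llm(text)      # local reasoning (Ministral 3B via llama.cpp)
        return tts(reply)      # speech synthesis (Kokoro)
    return respond

# Placeholder stages that just tag their output so the flow is visible.
respond = make_pipeline(
    stt=lambda audio: f"heard:{audio}",
    llm=lambda text: f"reply-to:{text}",
    tts=lambda text: f"audio:{text}",
)
```

Keeping each stage behind a plain callable makes them swappable: a different STT engine or model drops in without touching the loop.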

Model Quantization

I work with GGUF format (Q4_K_M quantization) to run 3B+ parameter models on consumer hardware without a dedicated GPU. The key is balancing quality vs. memory footprint—Q4_K_M gives good results while fitting in ~3GB RAM.
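A back-of-envelope check on that footprint: Q4_K_M averages roughly 4.5 bits per weight (an approximation; the format mixes quantization levels across tensors), and the overhead term for KV cache and runtime buffers below is a rough assumption:

```python
def quant_ram_gb(params_billion, bits_per_weight, overhead_gb=0.5):
    """Rough RAM estimate: quantized weights plus a fixed overhead for the
    KV cache and buffers. Not exact -- GGUF mixes bit widths per tensor."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

estimate = quant_ram_gb(3, 4.5)  # ~2.2 GB for a 3B model at ~4.5 bits/weight
```

The weights alone land well under the ~3 GB figure, which leaves headroom for a larger context window and runtime buffers.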

Latency Optimization

Voice assistants need to feel responsive. I disable the microphone during TTS playback to prevent self-hearing loops, configure context windows carefully (n_ctx), and use GPU layer offloading (n_gpu_layers) where available. The result: natural conversation flow on Apple Silicon.
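Those knobs come together in the llama-cpp-python load call. A configuration sketch with a placeholder model path and illustrative values; it only runs with the library installed and a local GGUF file:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/ministral-3b.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window: larger costs memory and prompt time
    n_gpu_layers=-1,   # offload all layers to Metal/GPU when available
    verbose=False,
)
```

On CPU-only machines, n_gpu_layers=0 keeps everything in RAM; on Apple Silicon, full offload is what makes the conversation feel real-time.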

llama.cpp · GGUF · Vosk · Kokoro TTS · Apple Metal