The tools and technologies I use daily to build intelligent systems. Each skill is backed by real implementations you can explore in my projects.
Building autonomous agents that can reason, plan, and execute multi-step tasks without human intervention.
I implement the Thought-Action-Observation loop where the model reasons about what tool to
use, executes it, observes the result, and iterates until the task is complete. My Gemini Agent uses
native function calling with types.FunctionDeclaration to autonomously decide
when to search the web, read emails, or fetch news.
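A condensed sketch of that loop with the google-genai SDK, assuming a single search_web tool; the tool body, the prompt, and the conversation wiring here are illustrative, not the agent's exact code:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

search_decl = types.FunctionDeclaration(
    name="search_web",
    description="Search the web and return the top results as text.",
    parameters=types.Schema(
        type="OBJECT",
        properties={"query": types.Schema(type="STRING")},
        required=["query"],
    ),
)
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[search_decl])]
)

def search_web(query: str) -> str:
    return "stub result"  # the real tool calls Google Custom Search

contents = [types.Content(role="user", parts=[types.Part(text="Any GGUF news?")])]
while True:
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=contents, config=config
    )
    part = response.candidates[0].content.parts[0]
    if not part.function_call:            # thought resolved to a final answer
        print(response.text)
        break
    observation = search_web(**part.function_call.args)   # action
    contents.append(response.candidates[0].content)       # keep the model's turn
    contents.append(types.Content(                        # feed observation back
        role="tool",
        parts=[types.Part.from_function_response(
            name=part.function_call.name,
            response={"result": observation},
        )],
    ))
```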
I build tool-calling agents using LangChain's create_agent() that dynamically
construct API calls from JSON schemas. My Bank Agent reads a bank's endpoint
schema and figures out how to query it—minimal calls, maximum efficiency.
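A hedged sketch of that shape, assuming LangChain 1.x; the two tools below are illustrative stand-ins for the Bank Agent's real schema reader and HTTP caller:

```python
from langchain.agents import create_agent
from langchain_core.tools import tool

@tool
def get_endpoint_schema() -> str:
    """Return the bank API's endpoint schema as a JSON string."""
    return '{"GET /accounts/{id}/balance": {"params": {"id": "string"}}}'

@tool
def call_endpoint(method: str, path: str) -> str:
    """Call a bank API endpoint and return the raw JSON response."""
    return '{"balance": 1042.17}'  # stub: the real tool issues the HTTP call

agent = create_agent(
    model="google_genai:gemini-2.0-flash",  # any tool-calling chat model works
    tools=[get_endpoint_schema, call_endpoint],
    system_prompt="Read the schema first, then make the fewest calls possible.",
)
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Balance of account 42?"}]}
)
print(result["messages"][-1].content)
```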
Different tasks need different models. I use Gemini 2.0 Flash for function calling (best tool-use support), Gemini 2.5 Flash for fast general inference, and Ministral 3B (GGUF) via llama.cpp for fully offline reasoning.
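In code that policy is just an explicit routing table; the task names here are illustrative:

```python
# Minimal sketch of per-task model routing (task keys are illustrative).
MODEL_ROUTES = {
    "function_calling": "gemini-2.0-flash",           # best tool-use support
    "general": "gemini-2.5-flash",                    # fast general inference
    "offline": "ministral-3b-instruct-q4_k_m.gguf",   # local via llama.cpp
}

def pick_model(task: str) -> str:
    return MODEL_ROUTES.get(task, MODEL_ROUTES["general"])
```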
Retrieval-Augmented Generation that actually works—from document ingestion to semantic retrieval to grounded answers.
I use Pinecone (serverless) for production user-preference storage where I need persistence and filtering—like in my Resume Scoring system's feedback loop. For local high-speed search, FAISS handles my Product Insight catalog queries.
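A hedged sketch of the split, with illustrative index names, dimensions, and metadata keys: Pinecone where I need persistence and metadata filtering, FAISS where I need raw in-process speed.

```python
import os
import faiss
import numpy as np
from pinecone import Pinecone, ServerlessSpec

# --- Pinecone (serverless): persistent, filterable user-preference vectors ---
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
if "resume-feedback" not in pc.list_indexes().names():
    pc.create_index(
        name="resume-feedback", dimension=384, metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("resume-feedback")
index.upsert(vectors=[{"id": "fb-1", "values": [0.01] * 384,
                       "metadata": {"user_id": "u42", "label": "good_match"}}])
hits = index.query(vector=[0.01] * 384, top_k=5, include_metadata=True,
                   filter={"user_id": {"$eq": "u42"}})

# --- FAISS: local high-speed search, no server (e.g. a product catalog) ---
catalog = np.random.rand(1000, 384).astype("float32")
faiss.normalize_L2(catalog)              # cosine similarity via inner product
local = faiss.IndexFlatIP(384)
local.add(catalog)
scores, ids = local.search(catalog[:1], k=5)
```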
I embed with SentenceTransformers (all-MiniLM-L6-v2) for its balance of speed
and accuracy. My systems automatically sync embeddings when source data changes—no stale
vectors, no manual reindexing.
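The sync trick is content hashing: re-embed only when the text actually changed. A minimal sketch, assuming an in-memory dict where my real systems persist the hash alongside the vector:

```python
import hashlib
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
store = {}  # doc_id -> (content_hash, embedding)

def sync(doc_id: str, text: str) -> None:
    digest = hashlib.sha256(text.encode()).hexdigest()
    cached = store.get(doc_id)
    if cached and cached[0] == digest:
        return  # source unchanged: keep the existing vector
    store[doc_id] = (digest, model.encode(text, normalize_embeddings=True))
```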
My Document Search project isn't limited to text. It handles PDFs (PyMuPDF), images via Gemini Vision OCR, audio transcription with speaker separation, and spreadsheets—all unified into a single semantic search index.
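The core of that is a dispatch on file type into one extraction interface. A hedged sketch with only the PDF path fleshed out; ocr_image and transcribe_audio are illustrative stubs for the Gemini Vision and transcription steps:

```python
from pathlib import Path
import fitz  # PyMuPDF

def ocr_image(path: str) -> str:
    return ""  # stub: Gemini Vision OCR in the real pipeline

def transcribe_audio(path: str) -> str:
    return ""  # stub: transcription + speaker separation in the real pipeline

def extract_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix in {".png", ".jpg", ".jpeg"}:
        return ocr_image(path)
    if suffix in {".mp3", ".wav"}:
        return transcribe_audio(path)
    raise ValueError(f"unsupported file type: {suffix}")
```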
Building the infrastructure that makes AI systems production-ready—APIs, databases, authentication, and training pipelines.
My CRM project is a full-stack Flask-RESTful
application with SQLAlchemy ORM, JWT authentication, and something I'm proud
of—Row-Level Security at the ORM level. A listener registered with
@event.listens_for(db.session, 'do_orm_execute') automatically filters every
query by the tenant ID from the JWT. No manual WHERE clauses, no data leaks.
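A hedged reconstruction of the pattern rather than the CRM's exact code; TenantMixin and the tenant_id claim name are illustrative. The listener injects the filter into every SELECT via SQLAlchemy's with_loader_criteria:

```python
from flask_jwt_extended import get_jwt
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import event
from sqlalchemy.orm import with_loader_criteria

db = SQLAlchemy()

class TenantMixin:
    """Inherit from this to opt a model in to tenant filtering."""
    tenant_id = db.Column(db.String(36), nullable=False, index=True)

@event.listens_for(db.session, "do_orm_execute")
def add_tenant_filter(execute_state):
    if execute_state.is_select:
        tenant_id = get_jwt()["tenant_id"]  # claim set at login
        execute_state.statement = execute_state.statement.options(
            with_loader_criteria(
                TenantMixin,                 # applies to every subclass
                lambda cls: cls.tenant_id == tenant_id,
                include_aliases=True,
            )
        )
```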
My RL Autoscaler trains a PyTorch DQN agent on a custom Gymnasium environment. The setup includes experience replay (10k buffer), target network sync, epsilon-greedy exploration, and a reward function that balances utilization efficiency against server costs. Trained on Apple Metal GPU.
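A condensed sketch of that training core; the network size, hyperparameters, and observation/action dimensions are illustrative, and the real agent plugs a custom Gymnasium env into this loop.

```python
import random
from collections import deque
import torch
import torch.nn as nn

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
obs_dim, n_actions = 4, 3  # e.g. load metrics in; scale down / hold / up out

def make_net():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions)).to(device)

policy, target = make_net(), make_net()
target.load_state_dict(policy.state_dict())  # re-sync every N steps in training
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # experience replay: (s, a, r, s2, done) tuples
gamma, eps = 0.99, 0.1

def act(state):
    if random.random() < eps:                  # epsilon-greedy exploration
        return random.randrange(n_actions)
    with torch.no_grad():
        return policy(torch.as_tensor(state, device=device)).argmax().item()

def train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(
        lambda x: torch.as_tensor(x, dtype=torch.float32, device=device),
        zip(*random.sample(replay, batch_size)),
    )
    q = policy(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                      # target net: stable bootstrap
        q_next = target(s2).max(1).values
    loss = nn.functional.mse_loss(q, r + gamma * q_next * (1 - done))
    opt.zero_grad(); loss.backward(); opt.step()
```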
I work extensively with external APIs: Gmail API (OAuth + simplegmail), Google Custom Search, Slack/WhatsApp webhooks for notifications, and the Google Gemini SDK for AI capabilities. My Email Briefing system ties them all together.
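A hedged sketch of the glue, assuming a Slack incoming webhook URL in the environment; the real pipeline also runs each thread through a Gemini summarization step before posting:

```python
import os
import requests
from simplegmail import Gmail

gmail = Gmail()  # first run walks through the Google OAuth consent flow

lines = [f"- {msg.sender}: {msg.subject}" for msg in gmail.get_unread_inbox()]

requests.post(os.environ["SLACK_WEBHOOK_URL"],
              json={"text": "Morning briefing:\n" + "\n".join(lines)})
```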
Running AI entirely offline—no cloud, no API calls, complete privacy. Making large models work on consumer hardware.
My Voice Chatbot runs 100% locally. Vosk handles speech-to-text (lightweight, fast), llama-cpp-python runs the Ministral 3B GGUF model for reasoning, and Kokoro synthesizes natural-sounding speech. No internet required after model download.
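A condensed, hedged sketch of that pipeline: Vosk STT into llama.cpp reasoning into Kokoro TTS. Model paths and the voice name are illustrative, and the real app streams microphone audio instead of reading a WAV file.

```python
import json
import wave
import sounddevice as sd
from vosk import Model, KaldiRecognizer
from llama_cpp import Llama
from kokoro import KPipeline

stt = Model("models/vosk-model-small-en-us-0.15")
llm = Llama(model_path="models/ministral-3b-q4_k_m.gguf", n_ctx=4096)
tts = KPipeline(lang_code="a")  # American English

def transcribe(path: str) -> str:
    with wave.open(path, "rb") as wf:
        rec = KaldiRecognizer(stt, wf.getframerate())
        while data := wf.readframes(4000):
            rec.AcceptWaveform(data)
    return json.loads(rec.FinalResult())["text"]

user_text = transcribe("question.wav")
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": user_text}]
)["choices"][0]["message"]["content"]

for _, _, audio in tts(reply, voice="af_heart"):
    sd.play(audio, samplerate=24000)  # mic stays muted while this plays
    sd.wait()
```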
I work with GGUF format (Q4_K_M quantization) to run 3B+ parameter models on consumer hardware without a dedicated GPU. The key is balancing quality vs. memory footprint—Q4_K_M gives good results while fitting in ~3GB RAM.
Voice assistants need to feel responsive. I disable the microphone during TTS playback to
prevent self-hearing loops, configure context windows carefully (n_ctx), and
use GPU layer offloading (n_gpu_layers) where available. The result: natural
conversation flow on Apple Silicon.
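A minimal sketch of those llama.cpp knobs; the values are where I'd start on Apple Silicon, not universal settings:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/ministral-3b-q4_k_m.gguf",  # Q4_K_M: ~3 GB resident
    n_ctx=4096,        # context window: larger costs memory and prefill time
    n_gpu_layers=-1,   # offload all layers to Metal when available
)
```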