Ex-Disney · Ex-Globant · Freelance since 2014
Integrate Claude into your product — by an engineer who ships LLM features to production.
I build Claude-powered features that actually work under real traffic: tool use, RAG pipelines, prompt caching, agents, structured output. Not a POC in a notebook — a system your users depend on.
Start an integration

Prompt caching and cost discipline
Most Claude integrations burn money on tokens. I cache what should be cached, stream what should stream, and move logic out of prompts when it belongs in code. That usually cuts the bill by 40-70%.
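A minimal sketch of the caching idea, assuming the `@anthropic-ai/sdk` Messages API shape: the large, stable prefix (instructions, schemas, reference docs) is marked with `cache_control` so repeated calls hit the cache, while the per-request user message stays outside it. The model id and helper name here are illustrative.

```typescript
// Stable content first, marked cacheable; volatile content last.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

interface CachedRequest {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Illustrative helper: builds request params with a cacheable system prefix.
function buildCachedRequest(
  stablePrompt: string,
  userMessage: string,
): CachedRequest {
  return {
    model: "claude-sonnet-4-5", // placeholder id; use your deployed model
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: stablePrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  };
}

// With the official SDK this would then be sent as:
//   const client = new Anthropic();
//   const res = await client.messages.create(buildCachedRequest(docs, q));
```

The point of the helper is discipline: every call site gets the same prefix bytes, because even a one-character difference in the prefix is a cache miss.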
Tool use and agents that don't loop
Agents are easy to prototype, hard to ship. I scope the loop bounds, handle partial failures, add observability, and keep humans in the loop where it matters. No "the agent just kept calling the API" stories.
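The bounded-loop idea can be sketched as a small harness: a hard turn cap, tool failures fed back to the model instead of crashing the run, and a loud error when the bound is hit. `step` and `runTool` are illustrative stand-ins for one model turn and one tool execution, not SDK calls.

```typescript
// One model turn either produces a final answer or requests a tool call.
type StepResult =
  | { kind: "final"; answer: string }
  | { kind: "tool"; name: string; input: unknown };

async function runBoundedAgent(
  step: (history: string[]) => Promise<StepResult>,
  runTool: (name: string, input: unknown) => Promise<string>,
  maxTurns = 6,
): Promise<string> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await step(history);
    if (result.kind === "final") return result.answer;
    try {
      history.push(await runTool(result.name, result.input));
    } catch (err) {
      // Partial failure: surface the error to the model, don't kill the loop.
      history.push(`tool ${result.name} failed: ${String(err)}`);
    }
  }
  // Bound hit: fail loudly for a human to inspect instead of spinning forever.
  throw new Error(`agent did not converge in ${maxTurns} turns`);
}
```

In production the history would be real message blocks and the bound would be paired with a token budget, but the shape is the same: the loop always terminates.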
RAG that retrieves the right chunks
Chunking strategy, reranking, hybrid search (BM25 + vectors), metadata filters, freshness — retrieval quality determines answer quality. I spend more time on retrieval than on prompts, because it matters more.
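One standard way to merge the BM25 and vector result lists is reciprocal rank fusion. A minimal sketch, assuming each ranking is a best-first list of chunk ids; `k = 60` is the conventional damping constant:

```typescript
// Reciprocal rank fusion: a chunk's fused score is the sum of
// 1 / (k + rank) across every ranking it appears in.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Chunks that score well in both lexical and semantic retrieval rise to the top, without having to normalize incompatible BM25 and cosine scores against each other — which is exactly why fusion on ranks tends to be more robust than fusion on raw scores.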
Multi-provider without lock-in
Claude-first where it wins, OpenAI for specific cases, open-source for edge workloads. I design the abstraction layer so you can swap providers without rewriting features.
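The abstraction layer can be as thin as one interface. A sketch with stub adapters — real ones would wrap the `@anthropic-ai/sdk` and `openai` clients; the names and routing rule here are illustrative:

```typescript
// Features depend on this interface, never on a vendor SDK directly.
interface CompletionProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

class ClaudeProvider implements CompletionProvider {
  name = "claude";
  async complete(prompt: string): Promise<string> {
    // Real adapter: client.messages.create(...) and extract the text block.
    return `claude:${prompt}`;
  }
}

class OpenAIProvider implements CompletionProvider {
  name = "openai";
  async complete(prompt: string): Promise<string> {
    // Real adapter: chat.completions.create(...) and read the first choice.
    return `openai:${prompt}`;
  }
}

// Routing lives in one place; swapping vendors is a registry change,
// not a feature rewrite.
function pickProvider(task: "reasoning" | "auxiliary"): CompletionProvider {
  return task === "reasoning" ? new ClaudeProvider() : new OpenAIProvider();
}
```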
The LLM stack I work with
- Claude API: default for reasoning, long context, tool use, agents
- Anthropic SDK: official TypeScript SDK — streaming, caching, retries
- OpenAI: for embeddings, Whisper, or when Claude isn't the right fit
- LangChain: when a project really needs it — often it doesn't
- Vercel AI SDK: streaming UI and tool calling in Next.js apps
- Prompt caching: cut token costs by 40-70% on repeated-context calls
- RAG: retrieval-augmented generation — the hard part is retrieval
- pgvector: Postgres-native vectors when you don't need a dedicated DB
- n8n: visual workflow automation — great for ops, not for core product logic
- TypeScript: types on prompt inputs, tool schemas, structured outputs
- NestJS: production backend wrapping the LLM layer with queues, retries, rate limits
- Next.js: frontend with streaming responses and server actions
When Claude is the right provider
Choose Claude first when your feature lives or dies on reasoning quality — long-context document analysis, complex multi-step tool use, nuanced writing, anything where "sort of right" isn't good enough. Claude's long context windows, strong tool-use behavior and refusal handling give you a shorter path to production than building around a model that sometimes loops or hallucinates tools.
Choose OpenAI when you need cheap embeddings, Whisper speech-to-text, DALL-E image generation, or when a specific feature lives only in their API (realtime voice, for example). I often use both in the same project — Claude for reasoning, OpenAI for auxiliary tasks.
Where integrations fail: teams treat the LLM as a product instead of a primitive. A prompt is not a feature — the feature is the UX around the prompt. Cache hits, retry logic, fallback paths, content moderation, observability, user feedback loops, eval harnesses — that's the engineering work. The prompt is often the last 5% of the job.
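That engineering layer is mostly unglamorous wrappers. A sketch of one of them — retry with exponential backoff, then a fallback path before the user ever sees an error; the function names and timings are illustrative:

```typescript
// Retry the primary call, then degrade gracefully instead of failing
// the feature. `primary` might be the Claude call; `fallback` a cheaper
// model, a cached answer, or a canned response.
async function withRetryAndFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  retries = 2,
  backoffMs = 200,
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await primary();
    } catch {
      if (attempt < retries) {
        // Exponential backoff between attempts.
        await new Promise((r) => setTimeout(r, backoffMs * 2 ** attempt));
      }
    }
  }
  // Primary exhausted: take the fallback path.
  return fallback();
}
```

In a real system this wrapper is also where you log the failure, increment a cost/error metric, and decide whether the fallback is good enough to show or should be flagged to the user.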
I've integrated Claude and OpenAI into CRMs, content workflows, agent-style automation, document Q&A systems, and internal tooling for freelance operations. The pattern is consistent: start narrow, instrument everything, iterate on the prompt AND the retrieval AND the UX together. Projects that succeed treat the LLM layer as regular software — with tests, logs, cost budgets and rollback plans.
FAQs
Do you only integrate Claude?

Multi-provider by default. Claude as the primary reasoning model, OpenAI for embeddings or voice, open-source via Bedrock/Azure for specific workloads. I design the integration with an abstraction so you can swap or add providers without rewriting features.
Can prompt caching really cut my costs?

Yes — this is usually the first optimization I run on existing integrations. I identify stable prefix content, structure messages for cache hits, and measure the cache hit rate. A well-cached integration can cut bills by 40-70% with no quality change.
Do you build RAG pipelines end to end?

Full stack: document ingestion, chunking strategy, embedding pipelines (OpenAI or Cohere), hybrid search (BM25 + vector), reranking, freshness filters, and the eval harness to measure if it's actually getting better over time.
Do you build autonomous agents?

Yes, but carefully. Agents that work in production have bounded loops, clear success criteria, human-in-the-loop checkpoints where appropriate, and serious observability. "Let the agent figure it out" rarely ships — I scope agent autonomy based on blast radius.
Can you audit an existing LLM integration?

Absolutely. Common issues I find: no caching, chatty agent loops, prompt injection vulnerabilities, no cost budgets per user, retrieval that misses because chunks are too small. I'll deliver a written audit with prioritized fixes — you decide what to implement.
Do you work with Claude Code and the Agent SDK?

Yes. Claude Code as a harness, the Agent SDK for custom agents, managed agents for offloaded execution. I can build, extend or migrate between these depending on what fits your infrastructure and team.
Need Claude in your product, done right?
Integrations, audits, RAG pipelines, agent systems. 24-hour reply time, technical scoping call included.
Start the conversation