Ex-Disney · Ex-Globant · Freelance since 2014
Integrate Claude into your product — by an engineer who ships LLM features to production.
I build Claude-powered features that actually work under real traffic: tool use, RAG pipelines, prompt caching, agents, structured output. Not a POC in a notebook — a system your users depend on.
Start an integration

Prompt caching and cost discipline
Most Claude integrations burn money on tokens. I cache what should be cached, stream what should stream, and move logic out of prompts when it belongs in code. That usually cuts the bill by 40-70%.
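A minimal sketch of the caching idea, assuming the `@anthropic-ai/sdk` Messages API shape: the large, stable prefix (instructions, schemas, reference docs) is marked with `cache_control` so repeated calls hit the cache, while the per-request user message stays outside it. The model id and helper name here are illustrative.

```typescript
// Stable content first, marked cacheable; volatile content last.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

interface CachedRequest {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Illustrative helper: builds request params with a cacheable system prefix.
function buildCachedRequest(
  stablePrompt: string,
  userMessage: string,
): CachedRequest {
  return {
    model: "claude-sonnet-4-5", // placeholder id; use your deployed model
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: stablePrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  };
}

// With the official SDK this would then be sent as:
//   const client = new Anthropic();
//   const res = await client.messages.create(buildCachedRequest(docs, q));
```

The point of the helper is discipline: every call site gets the same prefix bytes, because even a one-character difference in the prefix is a cache miss.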
Tool use and agents that don't loop
Agents are easy to prototype, hard to ship. I scope the loop bounds, handle partial failures, add observability, and keep humans in the loop where it matters. No "the agent just kept calling the API" stories.
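The bounded-loop idea can be sketched as a small harness: a hard turn cap, tool failures fed back to the model instead of crashing the run, and a loud error when the bound is hit. `step` and `runTool` are illustrative stand-ins for one model turn and one tool execution, not SDK calls.

```typescript
// One model turn either produces a final answer or requests a tool call.
type StepResult =
  | { kind: "final"; answer: string }
  | { kind: "tool"; name: string; input: unknown };

async function runBoundedAgent(
  step: (history: string[]) => Promise<StepResult>,
  runTool: (name: string, input: unknown) => Promise<string>,
  maxTurns = 6,
): Promise<string> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await step(history);
    if (result.kind === "final") return result.answer;
    try {
      history.push(await runTool(result.name, result.input));
    } catch (err) {
      // Partial failure: surface the error to the model, don't kill the loop.
      history.push(`tool ${result.name} failed: ${String(err)}`);
    }
  }
  // Bound hit: fail loudly for a human to inspect instead of spinning forever.
  throw new Error(`agent did not converge in ${maxTurns} turns`);
}
```

In production the history would be real message blocks and the bound would be paired with a token budget, but the shape is the same: the loop always terminates.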
RAG that retrieves the right chunks
Chunking strategy, reranking, hybrid search (BM25 + vectors), metadata filters, freshness — retrieval quality determines answer quality. I spend more time on retrieval than on prompts, because it matters more.
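One standard way to merge the BM25 and vector result lists is reciprocal rank fusion. A minimal sketch, assuming each ranking is a best-first list of chunk ids; `k = 60` is the conventional damping constant:

```typescript
// Reciprocal rank fusion: a chunk's fused score is the sum of
// 1 / (k + rank) across every ranking it appears in.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Chunks that score well in both lexical and semantic retrieval rise to the top, without having to normalize incompatible BM25 and cosine scores against each other — which is exactly why fusion on ranks tends to be more robust than fusion on raw scores.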
Multi-provider without lock-in
Claude-first where it wins, OpenAI for specific cases, open-source for edge workloads. I design the abstraction layer so you can swap providers without rewriting features.
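The abstraction layer can be as thin as one interface. A sketch with stub adapters — real ones would wrap the `@anthropic-ai/sdk` and `openai` clients; the names and routing rule here are illustrative:

```typescript
// Features depend on this interface, never on a vendor SDK directly.
interface CompletionProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

class ClaudeProvider implements CompletionProvider {
  name = "claude";
  async complete(prompt: string): Promise<string> {
    // Real adapter: client.messages.create(...) and extract the text block.
    return `claude:${prompt}`;
  }
}

class OpenAIProvider implements CompletionProvider {
  name = "openai";
  async complete(prompt: string): Promise<string> {
    // Real adapter: chat.completions.create(...) and read the first choice.
    return `openai:${prompt}`;
  }
}

// Routing lives in one place; swapping vendors is a registry change,
// not a feature rewrite.
function pickProvider(task: "reasoning" | "auxiliary"): CompletionProvider {
  return task === "reasoning" ? new ClaudeProvider() : new OpenAIProvider();
}
```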
The LLM stack I work with
- Claude API: default for reasoning, long context, tool use, agents
- Anthropic SDK: official TypeScript SDK — streaming, caching, retries
- OpenAI: for embeddings, Whisper, or when Claude isn't the right fit
- LangChain: when a project really needs it — often it doesn't
- Vercel AI SDK: streaming UI and tool calling in Next.js apps
- Prompt caching: cut token costs by 40-70% on repeated-context calls
- RAG: retrieval-augmented generation — the hard part is retrieval
- pgvector: Postgres-native vectors when you don't need a dedicated DB
- n8n: visual workflow automation — great for ops, not for core product logic
- TypeScript: types on prompt inputs, tool schemas, structured outputs
- NestJS: production backend wrapping the LLM layer with queues, retries, rate limits
- Next.js: frontend with streaming responses and server actions
When Claude is the right provider
Choose Claude first when your feature lives or dies on reasoning quality — long-context document analysis, complex multi-step tool use, nuanced writing, anything where "sort of right" isn't good enough. Claude's long context windows, strong tool-use behavior and refusal handling give you a shorter path to production than building around a model that sometimes loops or hallucinates tools.
Choose OpenAI when you need cheap embeddings, Whisper speech-to-text, DALL-E image generation, or when a specific feature lives only in their API (realtime voice, for example). I often use both in the same project — Claude for reasoning, OpenAI for auxiliary tasks.
Where integrations fail: teams treat the LLM as a product instead of a primitive. A prompt is not a feature — the feature is the UX around the prompt. Cache hits, retry logic, fallback paths, content moderation, observability, user feedback loops, eval harnesses — that's the engineering work. The prompt is often the last 5% of the job.
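That engineering layer is mostly unglamorous wrappers. A sketch of one of them — retry with exponential backoff, then a fallback path before the user ever sees an error; the function names and timings are illustrative:

```typescript
// Retry the primary call, then degrade gracefully instead of failing
// the feature. `primary` might be the Claude call; `fallback` a cheaper
// model, a cached answer, or a canned response.
async function withRetryAndFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  retries = 2,
  backoffMs = 200,
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await primary();
    } catch {
      if (attempt < retries) {
        // Exponential backoff between attempts.
        await new Promise((r) => setTimeout(r, backoffMs * 2 ** attempt));
      }
    }
  }
  // Primary exhausted: take the fallback path.
  return fallback();
}
```

In a real system this wrapper is also where you log the failure, increment a cost/error metric, and decide whether the fallback is good enough to show or should be flagged to the user.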
I've integrated Claude and OpenAI into CRMs, content workflows, agent-style automation, document Q&A systems, and internal tooling for freelance operations. The pattern is consistent: start narrow, instrument everything, iterate on the prompt AND the retrieval AND the UX together. Projects that succeed treat the LLM layer as regular software — with tests, logs, cost budgets and rollback plans.
FAQs
Do you only integrate Claude?

Multi-provider by default. Claude as the primary reasoning model, OpenAI for embeddings or voice, open-source via Bedrock/Azure for specific workloads. I design the integration with an abstraction so you can swap or add providers without rewriting features.
Can prompt caching really cut my costs?

Yes — this is usually the first optimization I run on existing integrations. I identify stable prefix content, structure messages for cache hits, and measure the cache hit rate. A well-cached integration can cut bills by 40-70% with no quality change.
Do you build RAG pipelines end to end?

Full stack: document ingestion, chunking strategy, embedding pipelines (OpenAI or Cohere), hybrid search (BM25 + vector), reranking, freshness filters, and the eval harness to measure if it's actually getting better over time.
Do you build autonomous agents?

Yes, but carefully. Agents that work in production have bounded loops, clear success criteria, human-in-the-loop checkpoints where appropriate, and serious observability. "Let the agent figure it out" rarely ships — I scope agent autonomy based on blast radius.
Can you audit an existing LLM integration?

Absolutely. Common issues I find: no caching, chatty agent loops, prompt injection vulnerabilities, no cost budgets per user, retrieval that misses because chunks are too small. I'll deliver a written audit with prioritized fixes — you decide what to implement.
Do you work with Claude Code and the Agent SDK?

Yes. Claude Code as a harness, the Agent SDK for custom agents, managed agents for offloaded execution. I can build, extend or migrate between these depending on what fits your infrastructure and team.
Need Claude in your product, done right?
Integrations, audits, RAG pipelines, agent systems. 24-hour reply time, technical scoping call included.
Start the conversation