We help clients ship chat, agents, RAG, and internal copilots on top of their data, with permissions, cost controls, and audit-friendly logging built in. You will design and build the platform patterns those products depend on, not one-off demos.
What you'll do
- Design and implement AI-facing services: routing to LLM providers, tool use, streaming, retries, and fallbacks
- Build and harden retrieval pipelines (embeddings, chunking, re-ranking) with access control that matches customer identity models
- Define evaluation loops: golden sets, regression checks, and lightweight dashboards so quality does not drift silently
- Partner with cloud and app engineers on deployment, secrets, networking, and cost visibility (tokens, GPU, vector stores)
- Document patterns and defaults so teams can ship consistently; playbooks beat heroics
What we're looking for
- Strong backend engineering (e.g. Python and/or TypeScript) and comfort owning services end to end
- Hands-on experience shipping LLM features beyond prompts: RAG, agents, or structured tool calling in production
- Solid grasp of API design, observability (logs, traces, metrics), and failure modes at scale
- Pragmatic security mindset: PII boundaries, retention, and least-privilege access to data stores
- Excellent written communication. Most of us are remote; clarity is velocity
Nice to have
- Experience with major model providers (OpenAI, Anthropic, Azure OpenAI, Amazon Bedrock) and the cost tradeoffs of switching between them
- Familiarity with vector databases and hybrid search patterns
- Kubernetes or managed container platforms; infrastructure-as-code exposure
- Prior consulting or client-facing engineering: you can explain tradeoffs without jargon walls