Building AI features is expensive. API costs, infrastructure, and compute add up fast. Here's how to build AI products that are actually profitable.
## Understanding AI Cost Structures
```typescript
interface AICostBreakdown {
  perRequest: {
    llmCost: number       // API calls
    embeddingCost: number // Vector operations
    storageCost: number   // Conversation history, vectors
    computeCost: number   // Processing, orchestration
  }
  monthly: {
    infrastructure: number // Servers, databases
    monitoring: number     // Observability tools
    development: number    // Ongoing improvement
  }
}

// Example: Customer support bot
const supportBotCosts = {
  perConversation: {
    avgMessages: 6,
    inputTokens: 2000,
    outputTokens: 1000,
    llmCost: 0.015, // $0.015 per conversation
    ragCost: 0.002, // Vector search
    total: 0.017,
  },
  conversationsPerMonth: 50000,
  monthlyLLMCost: 850, // 50,000 × $0.017
  monthlyInfra: 500,
  totalMonthlyCost: 1350,
}
```
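The example figures can be sanity-checked with a small helper. The function and its names are illustrative, not a real billing API:

```typescript
// Illustrative monthly-cost calculator using the example rates above.
interface ConversationCosts {
  llmCost: number // LLM API cost per conversation
  ragCost: number // vector search cost per conversation
}

function monthlyCost(
  perConversation: ConversationCosts,
  conversationsPerMonth: number,
  monthlyInfra: number
): number {
  const perConv = perConversation.llmCost + perConversation.ragCost
  return perConv * conversationsPerMonth + monthlyInfra
}

// $0.017 per conversation × 50,000 conversations + $500 infra ≈ $1,350/month
const total = monthlyCost({ llmCost: 0.015, ragCost: 0.002 }, 50000, 500)
```

Running the numbers this way, rather than in a spreadsheet, makes it easy to re-check margins whenever a provider changes pricing.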
## Pricing Models That Work
```typescript
// Model 1: Per-use pricing
const perUsePricing = {
  costPerConversation: 0.017,
  pricePerConversation: 0.05, // ~3x markup
  marginPercent: 66,
  // Pros: Aligns cost with revenue
  // Cons: Unpredictable revenue, discourages usage
}
```
```typescript
// Model 2: Subscription with limits
const subscriptionPricing = {
  tiers: [
    { name: 'Starter', price: 49, conversations: 1000, costToServe: 17 },
    { name: 'Pro', price: 199, conversations: 5000, costToServe: 85 },
    { name: 'Enterprise', price: 999, conversations: 30000, costToServe: 510 },
  ],
  // Note: Enterprise tier has the lowest margin % but the highest absolute margin
}
```
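The margin note in that comment can be verified from the tier numbers themselves. A hypothetical helper, using the example prices:

```typescript
// Margin math for the example tiers above.
function marginPercent(price: number, costToServe: number): number {
  return Math.round(((price - costToServe) / price) * 100)
}

// Starter: 65%, Pro: 57%, Enterprise: 49% —
// yet Enterprise nets $489/month absolute margin, the highest of the three.
```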
```typescript
// Model 3: Hybrid (base + usage)
const hybridPricing = {
  baseFee: 99,
  includedConversations: 2000,
  additionalPrice: 0.03, // Per conversation after the limit
  // Predictable base revenue + growth upside
}
```
## Cost Optimization Strategies
```typescript
// 1. Prompt caching (up to 90% savings on cached input tokens)
const cachedPrompt = {
  systemPrompt: longSystemPrompt, // 5000 tokens, cached
  userMessage: userInput,         // 100 tokens, not cached
  // Without caching: 5100 tokens × $3/M = $0.0153
  // With caching: 100 + (5000 × 0.1) = 600 effective tokens = $0.0018
}
```
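That arithmetic generalizes to a small estimator. The $3/M input rate and the 10% cached-token multiplier are this example's assumptions, not a quoted price list:

```typescript
// Sketch of cached vs. uncached input cost, under the example's assumed rates.
const RATE_PER_TOKEN = 3 / 1_000_000 // $3 per million input tokens (assumed)

function inputCost(cachedTokens: number, freshTokens: number, cached: boolean): number {
  const effectiveTokens = cached
    ? freshTokens + cachedTokens * 0.1 // cached tokens billed at ~10% (assumed)
    : freshTokens + cachedTokens
  return effectiveTokens * RATE_PER_TOKEN
}

// Without caching: 5,100 tokens → $0.0153
// With caching: 600 effective tokens → $0.0018, a ~88% reduction
```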
```typescript
// 2. Model routing
type Model = 'gpt-4o-mini' | 'claude-3-5-sonnet'
interface AIRequest { complexity: 'low' | 'medium' | 'high' }

function selectModel(request: AIRequest): Model {
  // Simple queries → cheap model
  if (request.complexity === 'low') {
    return 'gpt-4o-mini' // ~15x cheaper than GPT-4o
  }
  // Complex reasoning → capable model
  if (request.complexity === 'high') {
    return 'claude-3-5-sonnet'
  }
  return 'gpt-4o-mini' // Default to the cheap model
}
```
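Routing needs a complexity signal to act on. One possible heuristic, with keywords and thresholds invented purely for illustration (production systems often use a cheap classifier model instead):

```typescript
// Toy complexity estimator: long messages or reasoning-style keywords → 'high'.
function estimateComplexity(message: string): 'low' | 'high' {
  const needsReasoning = /\b(why|explain|compare|analyze|debug)\b/i.test(message)
  return message.length > 500 || needsReasoning ? 'high' : 'low'
}
```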
```typescript
// 3. Response length limits
const costControlledRequest = {
  maxTokens: 500, // Cap response length
  // Each 100 output tokens ≈ $0.0015 with Sonnet ($15/M output)
}
```
## Margin Protection: When to Say No
```typescript
function shouldImplementFeature(
  feature: FeatureProposal,
  currentRevenue: number
): Decision {
  const analysis = {
    additionalCostPerUser: feature.estimatedCostIncrease,
    additionalRevenuePerUser: feature.estimatedRevenueIncrease,
    marginImpact:
      (feature.estimatedRevenueIncrease - feature.estimatedCostIncrease) / currentRevenue,
  }

  // Red flags
  if (analysis.additionalCostPerUser > analysis.additionalRevenuePerUser) {
    return { decision: 'reject', reason: 'Negative unit economics' }
  }
  if (analysis.marginImpact < -0.05) {
    return { decision: 'reconsider', reason: 'Significant margin compression' }
  }
  return { decision: 'approve', reason: 'Positive unit economics' }
}

// Features that seem cool but destroy margins:
// - Unlimited conversations
// - Very long context windows by default
// - Expensive models for simple tasks
// - Features users don't pay more for
```
## The Path to Profitability

- **Start with high margins:** Price at 3-5x cost initially
- **Optimize relentlessly:** Caching, model routing, response limits
- **Monitor unit economics:** Track cost per user action
- **Tier appropriately:** Heavy users should pay more
- **Say no to margin-killers:** Not every feature is worth building
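The "monitor unit economics" step can be sketched as a minimal in-memory tracker. This is a toy aggregator that assumes you log each AI call's cost; a production system would persist and segment this data:

```typescript
// Toy per-action cost tracker: record each AI call's cost, query the average.
class UnitEconomics {
  private costs = new Map<string, { total: number; count: number }>()

  record(action: string, cost: number): void {
    const entry = this.costs.get(action) ?? { total: 0, count: 0 }
    entry.total += cost
    entry.count += 1
    this.costs.set(action, entry)
  }

  avgCost(action: string): number {
    const entry = this.costs.get(action)
    return entry ? entry.total / entry.count : 0
  }
}
```

Tracking at the action level (per conversation, per document processed) is what surfaces the margin-killers before they show up in the monthly bill.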
## Key Takeaways
**Understand your costs granularly.** Track cost per conversation, per user, and per feature.

**Price for sustainability.** A roughly 3x markup over cost is the minimum viable margin for AI products.

**Optimize continuously.** Prompt caching and model routing can 10x your margins.

**Protect margins proactively.** Say no to features with negative unit economics.
