Building AI features is expensive. API costs, infrastructure, and compute add up fast. Here's how to build AI products that are actually profitable.
## Understanding AI Cost Structures
```typescript
interface AICostBreakdown {
  perRequest: {
    llmCost: number       // API calls
    embeddingCost: number // Vector operations
    storageCost: number   // Conversation history, vectors
    computeCost: number   // Processing, orchestration
  }
  monthly: {
    infrastructure: number // Servers, databases
    monitoring: number     // Observability tools
    development: number    // Ongoing improvement
  }
}

// Example: Customer support bot
const supportBotCosts = {
  perConversation: {
    avgMessages: 6,
    inputTokens: 2000,
    outputTokens: 1000,
    llmCost: 0.015, // $0.015 per conversation
    ragCost: 0.002, // Vector search
    total: 0.017,
  },
  conversationsPerMonth: 50000,
  monthlyLLMCost: 850, // 50,000 × $0.017
  monthlyInfra: 500,
  totalMonthlyCost: 1350,
}
```
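The example figures can be sanity-checked with a small helper. The function and its names are illustrative, not a real billing API:

```typescript
// Illustrative monthly-cost calculator using the example rates above.
interface ConversationCosts {
  llmCost: number // LLM API cost per conversation
  ragCost: number // vector search cost per conversation
}

function monthlyCost(
  perConversation: ConversationCosts,
  conversationsPerMonth: number,
  monthlyInfra: number
): number {
  const perConv = perConversation.llmCost + perConversation.ragCost
  return perConv * conversationsPerMonth + monthlyInfra
}

// $0.017 per conversation × 50,000 conversations + $500 infra ≈ $1,350/month
const total = monthlyCost({ llmCost: 0.015, ragCost: 0.002 }, 50000, 500)
```

Running the numbers this way, rather than in a spreadsheet, makes it easy to re-check margins whenever a provider changes pricing.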
## Pricing Models That Work
```typescript
// Model 1: Per-use pricing
const perUsePricing = {
  costPerConversation: 0.017,
  pricePerConversation: 0.05, // ~3x markup
  marginPercent: 66,
  // Pros: Aligns cost with revenue
  // Cons: Unpredictable revenue, discourages usage
}
```
```typescript
// Model 2: Subscription with limits
const subscriptionPricing = {
  tiers: [
    { name: 'Starter', price: 49, conversations: 1000, costToServe: 17 },
    { name: 'Pro', price: 199, conversations: 5000, costToServe: 85 },
    { name: 'Enterprise', price: 999, conversations: 30000, costToServe: 510 },
  ],
  // Note: Enterprise tier has the lowest margin % but the highest absolute margin
}
```
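The margin note in that comment can be verified from the tier numbers themselves. A hypothetical helper, using the example prices:

```typescript
// Margin math for the example tiers above.
function marginPercent(price: number, costToServe: number): number {
  return Math.round(((price - costToServe) / price) * 100)
}

// Starter: 65%, Pro: 57%, Enterprise: 49% —
// yet Enterprise nets $489/month absolute margin, the highest of the three.
```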
```typescript
// Model 3: Hybrid (base + usage)
const hybridPricing = {
  baseFee: 99,
  includedConversations: 2000,
  additionalPrice: 0.03, // Per conversation after the limit
  // Predictable base revenue + growth upside
}
```
## Cost Optimization Strategies
```typescript
// 1. Prompt caching (up to 90% savings on cached input tokens)
const cachedPrompt = {
  systemPrompt: longSystemPrompt, // 5000 tokens, cached
  userMessage: userInput,         // 100 tokens, not cached
  // Without caching: 5100 tokens × $3/M = $0.0153
  // With caching: 100 + (5000 × 0.1) = 600 effective tokens = $0.0018
}
```
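That arithmetic generalizes to a small estimator. The $3/M input rate and the 10% cached-token multiplier are this example's assumptions, not a quoted price list:

```typescript
// Sketch of cached vs. uncached input cost, under the example's assumed rates.
const RATE_PER_TOKEN = 3 / 1_000_000 // $3 per million input tokens (assumed)

function inputCost(cachedTokens: number, freshTokens: number, cached: boolean): number {
  const effectiveTokens = cached
    ? freshTokens + cachedTokens * 0.1 // cached tokens billed at ~10% (assumed)
    : freshTokens + cachedTokens
  return effectiveTokens * RATE_PER_TOKEN
}

// Without caching: 5,100 tokens → $0.0153
// With caching: 600 effective tokens → $0.0018, a ~88% reduction
```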
```typescript
// 2. Model routing
type Model = 'gpt-4o-mini' | 'claude-3-5-sonnet'
interface AIRequest { complexity: 'low' | 'medium' | 'high' }

function selectModel(request: AIRequest): Model {
  // Simple queries → cheap model
  if (request.complexity === 'low') {
    return 'gpt-4o-mini' // ~15x cheaper than GPT-4o
  }
  // Complex reasoning → capable model
  if (request.complexity === 'high') {
    return 'claude-3-5-sonnet'
  }
  return 'gpt-4o-mini' // Default to the cheap model
}
```
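Routing needs a complexity signal to act on. One possible heuristic, with keywords and thresholds invented purely for illustration (production systems often use a cheap classifier model instead):

```typescript
// Toy complexity estimator: long messages or reasoning-style keywords → 'high'.
function estimateComplexity(message: string): 'low' | 'high' {
  const needsReasoning = /\b(why|explain|compare|analyze|debug)\b/i.test(message)
  return message.length > 500 || needsReasoning ? 'high' : 'low'
}
```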
```typescript
// 3. Response length limits
const costControlledRequest = {
  maxTokens: 500, // Cap response length
  // Each 100 output tokens ≈ $0.0015 with Sonnet ($15/M output)
}
```
## Margin Protection: When to Say No
```typescript
function shouldImplementFeature(
  feature: FeatureProposal,
  currentRevenue: number
): Decision {
  const analysis = {
    additionalCostPerUser: feature.estimatedCostIncrease,
    additionalRevenuePerUser: feature.estimatedRevenueIncrease,
    marginImpact:
      (feature.estimatedRevenueIncrease - feature.estimatedCostIncrease) / currentRevenue,
  }

  // Red flags
  if (analysis.additionalCostPerUser > analysis.additionalRevenuePerUser) {
    return { decision: 'reject', reason: 'Negative unit economics' }
  }
  if (analysis.marginImpact < -0.05) {
    return { decision: 'reconsider', reason: 'Significant margin compression' }
  }
  return { decision: 'approve', reason: 'Positive unit economics' }
}

// Features that seem cool but destroy margins:
// - Unlimited conversations
// - Very long context windows by default
// - Expensive models for simple tasks
// - Features users don't pay more for
```
## The Path to Profitability

- **Start with high margins:** Price at 3-5x cost initially
- **Optimize relentlessly:** Caching, model routing, response limits
- **Monitor unit economics:** Track cost per user action
- **Tier appropriately:** Heavy users should pay more
- **Say no to margin-killers:** Not every feature is worth building
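The "monitor unit economics" step can be sketched as a minimal in-memory tracker. This is a toy aggregator that assumes you log each AI call's cost; a production system would persist and segment this data:

```typescript
// Toy per-action cost tracker: record each AI call's cost, query the average.
class UnitEconomics {
  private costs = new Map<string, { total: number; count: number }>()

  record(action: string, cost: number): void {
    const entry = this.costs.get(action) ?? { total: 0, count: 0 }
    entry.total += cost
    entry.count += 1
    this.costs.set(action, entry)
  }

  avgCost(action: string): number {
    const entry = this.costs.get(action)
    return entry ? entry.total / entry.count : 0
  }
}
```

Tracking at the action level (per conversation, per document processed) is what surfaces the margin-killers before they show up in the monthly bill.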
## Key Takeaways
**Understand your costs granularly.** Track cost per conversation, per user, and per feature.

**Price for sustainability.** A roughly 3x markup over cost is the minimum viable margin for AI products.

**Optimize continuously.** Prompt caching and model routing can 10x your margins.

**Protect margins proactively.** Say no to features with negative unit economics.
