# Workers AI Edge Inference

**Priority**: P1 (High Value)
## What is Workers AI?
Serverless AI inference on Cloudflare's GPU network. Run 50+ open-source models (LLMs, embeddings, image gen, TTS, STT, classification) with no infrastructure management. Pay per inference.
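Workers AI is exposed inside a Worker through an AI binding, which is what the `env.AI` calls throughout this document refer to. A minimal `wrangler.toml` sketch (the worker name, entry point, and compatibility date are placeholders):

```toml
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-09-01"

[ai]
binding = "AI"  # surfaced as env.AI inside the Worker
```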
## Why This Matters for Company Manager
### Current AI Usage
| Service | Current Provider | Model | Use Case |
|---|---|---|---|
| Content generation | OpenAI API | GPT-4o-mini | Product descriptions, SEO |
| City descriptions | OpenAI API | GPT-4o-mini | Business/association descriptions |
| OCR | Tesseract | Local | Document processing |
| Tool discovery | OpenAI API | Function calling | TRPC tool auto-generation |
| Press AI | OpenAI API | Various | Layout detection, segmentation |
### Benefits of Workers AI
| Factor | OpenAI API | Workers AI |
|---|---|---|
| Latency | 200-500ms (US East) | 10-50ms (nearest PoP) |
| Privacy | Data sent to OpenAI | Data stays on Cloudflare |
| Cost (small tasks) | ~$0.15/M input tokens | 10K Neurons/day free |
| Availability | Rate limits | Edge-distributed |
| Model lock-in | OpenAI only | 50+ open models |
| Streaming | Yes | Yes (SSE) |
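The streaming row corresponds to passing `stream: true` to `env.AI.run`, which resolves to a `ReadableStream` of SSE chunks that can be handed straight to a `Response`. A minimal sketch (the inline `Env` typing is a stand-in for `@cloudflare/workers-types`):

```typescript
// Minimal typing for the AI binding (the real type comes from @cloudflare/workers-types).
interface Env {
  AI: { run(model: string, options: Record<string, unknown>): Promise<unknown> };
}

// Streams model tokens to the client as server-sent events.
export async function streamCompletion(env: Env, prompt: string): Promise<Response> {
  const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: prompt }],
    stream: true, // with stream: true, run() resolves to a ReadableStream of SSE chunks
  });
  return new Response(stream as ReadableStream, {
    headers: { "content-type": "text/event-stream" },
  });
}
```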
## Integration Opportunities

### 1. Content Moderation (Live Chat)
The chat-worker already exists. Add real-time content moderation:
```typescript
// In chat-worker
async function moderateMessage(env: Env, message: string): Promise<boolean> {
  const result = await env.AI.run("@cf/meta/llama-guard-3-8b", {
    messages: [{ role: "user", content: message }],
  });
  // Llama Guard replies "safe" or "unsafe" plus violated categories.
  // Note: "unsafe" contains the substring "safe", so test for "unsafe".
  return !result.response.toLowerCase().includes("unsafe");
}

export default {
  async fetch(request: Request, env: Env) {
    const { message, tenantId } = (await request.json()) as { message: string; tenantId: string };
    if (!(await moderateMessage(env, message))) {
      return Response.json({ blocked: true, reason: "content_policy" });
    }
    // Forward to Durable Object for broadcast
    // ...
  },
};
```
**Benefit**: Sub-10ms moderation at the edge, with no external API call.
### 2. Product Description Generation
Replace OpenAI for ContentManagementAgent's description generation:
```typescript
// In content-worker or via REST API
async function generateProductDescription(
  env: Env,
  product: { name: string; category: string; attributes: Record<string, string> }
): Promise<string> {
  const prompt = `Write a compelling product description for an e-commerce store.
Product: ${product.name}
Category: ${product.category}
Attributes: ${JSON.stringify(product.attributes)}
Write in French. Keep it under 200 words. Be engaging and SEO-friendly.`;

  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "You are an expert e-commerce copywriter." },
      { role: "user", content: prompt },
    ],
    max_tokens: 500,
  });
  return result.response;
}
```
**Cost**: Llama 3.1 8B is inexpensive per inference; most description workloads fit in the free tier.
### 3. SEO Optimization
Auto-generate meta descriptions, titles, and keywords:
```typescript
async function generateSEO(env: Env, content: { title: string; body: string }) {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Generate SEO metadata for this article:
Title: ${content.title}
Content: ${content.body.slice(0, 2000)}
Return JSON: { "metaTitle": "...", "metaDescription": "...", "keywords": ["..."] }`,
    }],
    max_tokens: 300,
  });
  // Note: JSON.parse will throw if the model wraps the JSON in extra prose;
  // validate or retry in production code.
  return JSON.parse(result.response);
}
```
### 4. Image Alt Text Generation
For city portal image enrichment and product images:
```typescript
async function generateAltText(env: Env, imageUrl: string): Promise<string> {
  // Fetch the image and base64-encode it. Encode in chunks: spreading a large
  // Uint8Array into String.fromCharCode(...) overflows the call stack.
  const imageResponse = await fetch(imageUrl);
  const bytes = new Uint8Array(await imageResponse.arrayBuffer());
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  const base64 = btoa(binary);

  const result = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe this image in one sentence for use as alt text." },
        { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
      ],
    }],
    max_tokens: 100,
  });
  return result.response;
}
```
**Benefit**: Replaces the manual alt text workflow in CityImageEnrichmentAgent.
### 5. Sentiment Analysis for Customer Operations
For CustomerOperationsAgent ticket processing:
```typescript
async function analyzeSentiment(env: Env, text: string) {
  const result = await env.AI.run("@cf/huggingface/distilbert-sst-2-int8", {
    text: text,
  });
  return {
    label: result[0].label, // "POSITIVE" or "NEGATIVE"
    score: result[0].score, // confidence 0-1
  };
}
```
### 6. Translation (Multilingual Support)
For content translation between French and English:
```typescript
async function translate(env: Env, text: string, from: string, to: string) {
  const result = await env.AI.run("@cf/meta/m2m100-1.2b", {
    text: text,
    source_lang: from,
    target_lang: to,
  });
  return result.translated_text;
}
```
### 7. Text-to-Speech for Press Center
Audio articles from press center content:
```typescript
async function generateAudio(env: Env, text: string): Promise<ArrayBuffer> {
  const result = await env.AI.run("@cf/deepgram/aura-2-en", {
    text: text,
  });
  return result; // Audio buffer (WAV format)
}
```
### 8. Image Generation for Marketing
For MarketingAgent campaign visuals:
```typescript
async function generateImage(env: Env, prompt: string): Promise<ArrayBuffer> {
  const result = await env.AI.run("@cf/black-forest-labs/flux-1-schnell", {
    prompt: prompt,
    num_steps: 4, // Schnell is optimized for few steps
  });
  // flux-1-schnell returns the image as a base64 string ({ image: "..." });
  // decode it to raw bytes before storing or serving.
  const binary = atob(result.image);
  return Uint8Array.from(binary, (c) => c.charCodeAt(0)).buffer;
}
```
### 9. Spam/Fraud Detection
For classified ads and order management:
```typescript
async function detectSpam(env: Env, content: string): Promise<{ isSpam: boolean; confidence: number }> {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Classify this content as SPAM or NOT_SPAM. Return only the classification and a confidence (0-1).
Content: ${content}`,
    }],
    max_tokens: 50,
  });
  // Parse the response. "NOT_SPAM" contains "SPAM", so check the negative label first.
  const label = result.response.toUpperCase();
  const isSpam = label.includes("SPAM") && !label.includes("NOT_SPAM");
  const match = result.response.match(/\b(0(?:\.\d+)?|1(?:\.0+)?)\b/);
  const confidence = match ? parseFloat(match[1]) : 0.5; // fall back if no number is returned
  return { isSpam, confidence };
}
```
### 10. RAG Pipeline (with Vectorize)
Retrieval-Augmented Generation for intelligent support:
```typescript
async function ragAnswer(env: Env, question: string, tenantId: string) {
  // 1. Embed the question (bge-m3 from the embeddings table; returns one vector per input)
  const embedding = await env.AI.run("@cf/baai/bge-m3", { text: [question] });
  const queryVector = embedding.data[0];

  // 2. Find relevant documents
  const docs = await env.KNOWLEDGE_INDEX.query(queryVector, {
    topK: 5,
    namespace: tenantId,
    returnMetadata: "all",
  });

  // 3. Build context
  const context = docs.matches.map((d) => d.metadata?.content).join("\n\n");

  // 4. Generate answer with context
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content: `Answer the user's question using ONLY the provided context. If the answer isn't in the context, say "I don't know."

Context:
${context}`,
      },
      { role: "user", content: question },
    ],
    max_tokens: 500,
  });

  return {
    answer: result.response,
    sources: docs.matches.map((d) => ({ id: d.id, score: d.score, title: d.metadata?.title })),
  };
}
```
## Model Selection Guide

### Text Generation
| Model | Best For | Speed | Quality |
|---|---|---|---|
| `llama-3.1-8b-instruct` | General tasks, descriptions | Fast | Good |
| `llama-3.3-70b-instruct-fp8-fast` | Complex reasoning | Slow | Excellent |
| `mistral-small-3.1-24b-instruct` | Vision + text, function calling | Medium | Very good |
| `gemma-3-12b-it` | Multilingual, 140+ languages | Medium | Very good |
| `hermes-2-pro-mistral-7b` | Function calling, JSON output | Fast | Good |
### Embeddings
| Model | Dimensions | Best For |
|---|---|---|
| `bge-small-en-v1.5` | 384 | Fast English embeddings |
| `bge-base-en-v1.5` | 768 | Balanced English |
| `bge-m3` | Variable | **Multilingual** (best for FR+EN) |
| `embeddinggemma-300m` | Variable | 100+ languages, state-of-the-art |
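As a sketch of the embedding call shape for `bge-m3` (a `text` array in, a `data` array of vectors out; `embedTexts` is an illustrative helper, and the inline `Env` typing stands in for `@cloudflare/workers-types`):

```typescript
interface Env {
  AI: { run(model: string, options: Record<string, unknown>): Promise<any> };
}

// Embeds a batch of strings with bge-m3; each result is a dense vector
// suitable for Vectorize upserts or queries.
export async function embedTexts(env: Env, texts: string[]): Promise<number[][]> {
  const result = await env.AI.run("@cf/baai/bge-m3", { text: texts });
  return result.data; // one vector per input string, in input order
}
```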
### Specialized
| Model | Use Case |
|---|---|
| `llama-guard-3-8b` | Content safety classification |
| `distilbert-sst-2-int8` | Sentiment analysis |
| `m2m100-1.2b` | Translation |
| `whisper-large-v3-turbo` | Speech-to-text |
| `aura-2-en` / `aura-2-es` | Text-to-speech |
| `flux-1-schnell` | Image generation |
| `llama-3.2-11b-vision-instruct` | Image understanding |
## Agent Integration Map
| Agent | Workers AI Use | Model |
|---|---|---|
| ContentManagementAgent | Description gen, SEO, alt text | llama-3.1-8b, vision |
| CustomerOperationsAgent | Sentiment, ticket routing | distilbert, llama-3.1-8b |
| InventoryPricingAgent | Demand forecasting context | llama-3.1-8b |
| MarketingAgent | Copy gen, image gen, A/B text | llama-3.1-8b, flux |
| OrderManagementAgent | Fraud detection text analysis | llama-3.1-8b |
| CityDescriptionAgent | Business descriptions | llama-3.1-8b (replaces OpenAI) |
| CityImageEnrichmentAgent | Alt text, image classification | vision model |
## Pricing
| Plan | Included | Overage |
|---|---|---|
| Free | 10,000 Neurons/day | N/A |
| Paid | 10,000 Neurons/day | $0.011/1K Neurons |
**Neuron costs** (approximate per operation):
- Text generation (8B model, 100 tokens): ~5-10 Neurons
- Embedding (single text): ~1-2 Neurons
- Image generation: ~50-100 Neurons
- Sentiment classification: ~1 Neuron
**Estimated cost**: $5-30/mo for moderate usage (most small tasks fit in the free tier).
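A quick arithmetic check of the per-operation figures above against the free tier; the daily volumes here are illustrative assumptions:

```typescript
// Rough daily Neuron budget check against the 10,000 Neurons/day free tier.
const FREE_NEURONS_PER_DAY = 10_000;

const dailyWorkload = [
  { task: "product descriptions", ops: 500, neuronsPerOp: 8 }, // ~5-10 Neurons each
  { task: "embeddings", ops: 2_000, neuronsPerOp: 1.5 },       // ~1-2 Neurons each
  { task: "sentiment checks", ops: 1_000, neuronsPerOp: 1 },   // ~1 Neuron each
];

const totalNeurons = dailyWorkload.reduce((sum, w) => sum + w.ops * w.neuronsPerOp, 0);
console.log(totalNeurons, totalNeurons <= FREE_NEURONS_PER_DAY ? "fits free tier" : "overage");
// 500*8 + 2000*1.5 + 1000*1 = 8000 Neurons/day → fits the free tier
```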
## Migration Strategy

### Phase 1: New Capabilities (no OpenAI replacement)
Add Workers AI for new features that don't exist yet:
- Content moderation in chat-worker
- Sentiment analysis for tickets
- Image alt text generation
### Phase 2: Supplementary Use
Use Workers AI alongside OpenAI:
- Edge-local tasks → Workers AI (latency-sensitive)
- Complex reasoning → OpenAI GPT-4 (quality-sensitive)
- Cost optimization → Workers AI for high-volume, low-complexity tasks
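The Phase 2 split can be expressed as a small routing helper; the task kinds and the 4,000-token threshold are illustrative assumptions, not measured cutoffs:

```typescript
type TaskKind = "moderation" | "sentiment" | "description" | "complex-reasoning";

// Routes a task to Workers AI (edge, cheap, low-latency) or OpenAI (quality-sensitive).
function pickProvider(kind: TaskKind, estimatedTokens: number): "workers-ai" | "openai" {
  if (kind === "complex-reasoning") return "openai"; // quality-sensitive reasoning
  if (estimatedTokens > 4_000) return "openai";      // long-context tasks
  return "workers-ai";                               // high-volume, latency-sensitive
}
```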
### Phase 3: Primary AI Provider
Evaluate replacing OpenAI for most tasks:
- Description generation → Llama 3.1 / Gemma 3
- Translation → M2M100
- SEO → Llama 3.1
- Keep OpenAI only for tasks requiring GPT-4 quality
## Estimated Impact
- **New capabilities**: Content moderation, sentiment analysis, image understanding
- **Latency**: 200-500ms → 10-50ms for edge inference
- **Cost**: Reduce OpenAI spend by 50-80% for routine tasks
- **Privacy**: Sensitive data stays on Cloudflare (no third-party API)
- **Effort**: 1-2 weeks per use case, can be rolled out incrementally