Workers AI Edge Inference

Priority: P1 (High Value)

What is Workers AI?

Serverless AI inference on Cloudflare's GPU network. Run 50+ open-source models (LLMs, embeddings, image gen, TTS, STT, classification) with no infrastructure management. Pay per inference.

Why This Matters for Company Manager

Current AI Usage

| Service | Current Provider | Model | Use Case |
|---|---|---|---|
| Content generation | OpenAI API | GPT-4o-mini | Product descriptions, SEO |
| City descriptions | OpenAI API | GPT-4o-mini | Business/association descriptions |
| OCR | Tesseract | Local | Document processing |
| Tool discovery | OpenAI API | Function calling | TRPC tool auto-generation |
| Press AI | OpenAI API | Various | Layout detection, segmentation |

Benefits of Workers AI

| Factor | OpenAI API | Workers AI |
|---|---|---|
| Latency | 200-500ms (US East) | 10-50ms (nearest PoP) |
| Privacy | Data sent to OpenAI | Data stays on Cloudflare |
| Cost (small tasks) | ~$0.15/M input tokens | 10K Neurons/day free |
| Availability | Rate limits | Edge-distributed |
| Model lock-in | OpenAI only | 50+ open models |
| Streaming | Yes | Yes (SSE) |
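
The SSE streaming listed in the table can be consumed by collecting the `response` field from each `data:` line. This is a minimal sketch, assuming the stream carries events of the form `data: {"response":"..."}` and ends with `data: [DONE]`; verify that shape against the model you use. `collectSseText` is a hypothetical helper name, not part of any SDK.

```typescript
// Hypothetical helper: extracts generated text from a chunk of SSE output.
// Assumes each event is a line of the form `data: {"response":"..."}` and
// that the stream ends with `data: [DONE]`.
function collectSseText(chunk: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines and comments
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    try {
      const event = JSON.parse(payload) as { response?: unknown };
      if (typeof event.response === "string") text += event.response;
    } catch {
      // Ignore payloads that are not complete JSON (e.g. an event split across chunks)
    }
  }
  return { text, done };
}
```

Because an event can be split across network chunks, a caller would normally buffer incoming text and pass only complete lines to this helper.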

Integration Opportunities

1. Content Moderation (Live Chat)

The chat-worker already exists. Add real-time content moderation:


// In chat-worker
async function moderateMessage(env: Env, message: string): Promise<boolean> {
  const result = await env.AI.run("@cf/meta/llama-guard-3-8b", {
    messages: [{ role: "user", content: message }],
  });

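  // Llama Guard answers "safe", or "unsafe" followed by category codes.
  // "unsafe" contains "safe", so check how the verdict starts instead of using includes().
  return result.response.trim().toLowerCase().startsWith("safe");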
}

export default {
  async fetch(request: Request, env: Env) {
    const { message, tenantId } = await request.json();

    if (!await moderateMessage(env, message)) {
      return Response.json({ blocked: true, reason: "content_policy" });
    }

    // Forward to Durable Object for broadcast
    // ...
  },
};

Benefit: low-latency moderation at the edge, with no external API call.
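
When the chat UI or an audit log needs to know why a message was blocked, the raw verdict can be parsed into a structured result. This is a sketch that assumes the Llama Guard 3 output format: `safe`, or `unsafe` followed by a line of comma-separated hazard codes such as `S1,S10`. `parseGuardVerdict` and `GuardVerdict` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: turns Llama Guard's raw text verdict into a structured result.
// Assumes the output is "safe", or "unsafe" followed by a line of
// comma-separated hazard codes such as "S1,S10".
interface GuardVerdict {
  safe: boolean;
  categories: string[];
}

function parseGuardVerdict(raw: string): GuardVerdict {
  const lines = raw
    .trim()
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const verdict = (lines[0] ?? "").toLowerCase();
  if (verdict === "safe") return { safe: true, categories: [] };

  // Anything that is not an explicit "safe" is treated as unsafe (fail closed).
  const categories = (lines[1] ?? "")
    .split(",")
    .map((code) => code.trim())
    .filter((code) => /^S\d+$/i.test(code));
  return { safe: false, categories };
}
```

Failing closed means an empty or malformed model response blocks the message; whether that is the right default for live chat is a product decision.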

2. Product Description Generation

Replace OpenAI for ContentManagementAgent's description generation:


// In content-worker or via REST API
async function generateProductDescription(
  env: Env,
  product: { name: string; category: string; attributes: Record<string, string> }
): Promise<string> {
  const prompt = `Write a compelling product description for an e-commerce store.
Product: ${product.name}
Category: ${product.category}
Attributes: ${JSON.stringify(product.attributes)}

Write in French. Keep it under 200 words. Be engaging and SEO-friendly.`;

  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "You are an expert e-commerce copywriter." },
      { role: "user", content: prompt },
    ],
    max_tokens: 500,
  });

  return result.response;
}

Cost: Llama 3.1 8B is inexpensive; most descriptions fit within the free tier.

3. SEO Optimization

Auto-generate meta descriptions, titles, and keywords:


async function generateSEO(env: Env, content: { title: string; body: string }) {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Generate SEO metadata for this article:
Title: ${content.title}
Content: ${content.body.slice(0, 2000)}

Return JSON: { "metaTitle": "...", "metaDescription": "...", "keywords": ["..."] }`,
    }],
    max_tokens: 300,
  });

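  // The model may wrap the JSON in prose or a code fence, so extract the
  // outermost object before parsing instead of parsing the raw response.
  const match = result.response.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("Model did not return a JSON object");
  return JSON.parse(match[0]);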
}
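
Since the JSON comes from a language model, it helps to check its shape before storing or rendering it. The sketch below validates the three fields requested in the prompt; `SeoMetadata` and `toSeoMetadata` are hypothetical names, and dropping malformed keywords (rather than rejecting the whole result) is an assumption about the desired behavior.

```typescript
interface SeoMetadata {
  metaTitle: string;
  metaDescription: string;
  keywords: string[];
}

// Hypothetical validator: checks that parsed model output has the shape
// requested in the prompt before it is stored or rendered.
function toSeoMetadata(value: unknown): SeoMetadata {
  if (typeof value !== "object" || value === null) {
    throw new Error("SEO metadata must be a JSON object");
  }
  const { metaTitle, metaDescription, keywords } = value as Record<string, unknown>;
  if (typeof metaTitle !== "string" || typeof metaDescription !== "string") {
    throw new Error("metaTitle and metaDescription must be strings");
  }
  if (!Array.isArray(keywords)) {
    throw new Error("keywords must be an array");
  }
  return {
    metaTitle: metaTitle.trim(),
    metaDescription: metaDescription.trim(),
    // Drop non-string or empty entries rather than failing the whole result
    keywords: keywords
      .filter((k): k is string => typeof k === "string")
      .map((k) => k.trim())
      .filter((k) => k.length > 0),
  };
}
```

A caller would pass the parsed value through it, for example `toSeoMetadata(await generateSEO(env, content))`.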

4. Image Alt Text Generation

For city portal image enrichment and product images:


async function generateAltText(env: Env, imageUrl: string): Promise<string> {
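  // Fetch the image and fail early on a bad response
  const imageResponse = await fetch(imageUrl);
  if (!imageResponse.ok) throw new Error(`Image fetch failed: ${imageResponse.status}`);
  const bytes = new Uint8Array(await imageResponse.arrayBuffer());

  // Build the binary string in chunks: spreading a large image into a single
  // String.fromCharCode call can exceed the engine's argument limit.
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  const base64 = btoa(binary);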

  const result = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe this image in one sentence for use as alt text." },
        { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
      ],
    }],
    max_tokens: 100,
  });

  return result.response;
}

Benefit: replaces the manual alt text workflow in CityImageEnrichmentAgent.

5. Sentiment Analysis for Customer Operations

For CustomerOperationsAgent ticket processing:


async function analyzeSentiment(env: Env, text: string) {
  const result = await env.AI.run("@cf/huggingface/distilbert-sst-2-int8", {
    text: text,
  });

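  // The classifier can return one entry per label, so pick the highest-scoring
  // entry rather than assuming the first one is the prediction.
  const top = result.reduce((a, b) => ((b.score ?? 0) > (a.score ?? 0) ? b : a));
  return {
    label: top.label,     // "POSITIVE" or "NEGATIVE"
    score: top.score,     // confidence 0-1
  };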
}
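
For ticket processing, the sentiment result can feed a simple routing rule. This is an illustrative sketch only: `routeTicket`, the queue names, and the 0.9 threshold are hypothetical and would need tuning against real tickets.

```typescript
interface Sentiment {
  label: string; // "POSITIVE" or "NEGATIVE"
  score: number; // confidence 0-1
}

// Hypothetical routing rule: the queue names and the default threshold are
// illustrative assumptions, not values from an existing agent.
function routeTicket(sentiment: Sentiment, threshold = 0.9): "escalate" | "standard" {
  const stronglyNegative = sentiment.label === "NEGATIVE" && sentiment.score >= threshold;
  return stronglyNegative ? "escalate" : "standard";
}
```

A caller could combine the two steps as `routeTicket(await analyzeSentiment(env, ticketText))`.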

6. Translation (Multilingual Support)

For content translation between French and English:


async function translate(env: Env, text: string, from: string, to: string) {
  const result = await env.AI.run("@cf/meta/m2m100-1.2b", {
    text: text,
    source_lang: from,
    target_lang: to,
  });

  return result.translated_text;
}

7. Text-to-Speech for Press Center

Audio articles from press center content:


async function generateAudio(env: Env, text: string): Promise<ArrayBuffer> {
  const result = await env.AI.run("@cf/deepgram/aura-2-en", {
    text: text,
  });

  return result; // Audio buffer (WAV format)
}

8. Image Generation for Marketing

For MarketingAgent campaign visuals:


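async function generateImage(env: Env, prompt: string): Promise<ArrayBuffer> {
  // Assumption (check the model page): this model takes `steps`, not
  // `num_steps`, and returns a base64-encoded image in `result.image`.
  const result = await env.AI.run("@cf/black-forest-labs/flux-1-schnell", {
    prompt: prompt,
    steps: 4, // Schnell is optimized for few steps
  });

  // Decode the base64 string into raw image bytes
  const binary = atob(result.image);
  return Uint8Array.from(binary, (c) => c.charCodeAt(0)).buffer;
}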

9. Spam/Fraud Detection

For classified ads and order management:


async function detectSpam(env: Env, content: string): Promise<{ isSpam: boolean; confidence: number }> {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Classify this content as SPAM or NOT_SPAM. Return only the classification and confidence (0-1).
Content: ${content}`,
    }],
    max_tokens: 50,
  });

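  // Check for NOT_SPAM explicitly: a plain includes("spam") also matches "not_spam"
  const text = result.response.toLowerCase();
  const isSpam = !/not[_\s-]?spam/.test(text) && text.includes("spam");
  // Use the model's self-reported confidence when present (a rough signal only);
  // fall back to 0.5, an arbitrary neutral value, when none can be parsed.
  const reported = text.match(/\b(?:0(?:\.\d+)?|1(?:\.0+)?)\b/);
  return { isSpam, confidence: reported ? Number(reported[0]) : 0.5 };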
}

10. RAG Pipeline (with Vectorize)

Retrieval-Augmented Generation for intelligent support:


async function ragAnswer(env: Env, question: string, tenantId: string) {
  // 1. Embed the question
  const queryVector = await embed(env, question);

  // 2. Find relevant documents
  const docs = await env.KNOWLEDGE_INDEX.query(queryVector, {
    topK: 5,
    namespace: tenantId,
    returnMetadata: "all",
  });

  // 3. Build context
  const context = docs.matches.map(d => d.metadata?.content).join("\n\n");

  // 4. Generate answer with context
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content: `Answer the user's question using ONLY the provided context. If the answer isn't in the context, say "I don't know."

Context:
${context}`,
      },
      { role: "user", content: question },
    ],
    max_tokens: 500,
  });

  return {
    answer: result.response,
    sources: docs.matches.map(d => ({ id: d.id, score: d.score, title: d.metadata?.title })),
  };
}
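
The pipeline above calls an `embed` helper that is not defined in this document. The sketch below shows one way it might look, together with a `buildContext` helper that caps the context length in step 3. Both rest on assumptions: that `@cf/baai/bge-m3` accepts `{ text: string[] }` and returns `{ data: number[][] }`, that `AiBinding` is an adequate stand-in for the real binding type, and that a 6,000-character budget is reasonable. None of these come from the source.

```typescript
// Minimal stand-in for the Workers AI binding so the sketch is self-contained.
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical embed helper. Assumes the embedding model accepts
// { text: string[] } and responds with { data: number[][] }.
async function embed(env: { AI: AiBinding }, text: string): Promise<number[]> {
  const result = (await env.AI.run("@cf/baai/bge-m3", { text: [text] })) as {
    data?: number[][];
  };
  const vector = result.data?.[0];
  if (!vector) throw new Error("Embedding model returned no vector");
  return vector;
}

// Joins document snippets into one context string, stopping before the
// character budget is exceeded so the prompt stays within the model's window.
function buildContext(snippets: Array<string | undefined>, maxChars = 6000): string {
  const parts: string[] = [];
  let used = 0;
  for (const snippet of snippets) {
    if (!snippet) continue; // skip matches without stored content
    const cost = snippet.length + (parts.length > 0 ? 2 : 0); // 2 for the "\n\n" separator
    if (used + cost > maxChars) break;
    parts.push(snippet);
    used += cost;
  }
  return parts.join("\n\n");
}
```

The Vectorize index must be created with the same number of dimensions as the embedding model's output, and documents must be indexed with the same model that embeds the query.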

Model Selection Guide

Text Generation

| Model | Best For | Speed | Quality |
|---|---|---|---|
| `llama-3.1-8b-instruct` | General tasks, descriptions | Fast | Good |
| `llama-3.3-70b-instruct-fp8-fast` | Complex reasoning | Slow | Excellent |
| `mistral-small-3.1-24b-instruct` | Vision + text, function calling | Medium | Very good |
| `gemma-3-12b-it` | Multilingual, 140+ languages | Medium | Very good |
| `hermes-2-pro-mistral-7b` | Function calling, JSON output | Fast | Good |

Embeddings

| Model | Dimensions | Best For |
|---|---|---|
| `bge-small-en-v1.5` | 384 | Fast English embeddings |
| `bge-base-en-v1.5` | 768 | Balanced English |
| `bge-m3` | 1024 | **Multilingual** (best for FR+EN) |
| `embeddinggemma-300m` | Variable | 100+ languages, state-of-the-art |

Specialized

| Model | Use Case |
|---|---|
| `llama-guard-3-8b` | Content safety classification |
| `distilbert-sst-2-int8` | Sentiment analysis |
| `m2m100-1.2b` | Translation |
| `whisper-large-v3-turbo` | Speech-to-text |
| `aura-2-en` / `aura-2-es` | Text-to-speech |
| `flux-1-schnell` | Image generation |
| `llama-3.2-11b-vision-instruct` | Image understanding |

Agent Integration Map

| Agent | Workers AI Use | Model |
|---|---|---|
| ContentManagementAgent | Description gen, SEO, alt text | llama-3.1-8b, vision |
| CustomerOperationsAgent | Sentiment, ticket routing | distilbert, llama-3.1-8b |
| InventoryPricingAgent | Demand forecasting context | llama-3.1-8b |
| MarketingAgent | Copy gen, image gen, A/B text | llama-3.1-8b, flux |
| OrderManagementAgent | Fraud detection text analysis | llama-3.1-8b |
| CityDescriptionAgent | Business descriptions | llama-3.1-8b (replaces OpenAI) |
| CityImageEnrichmentAgent | Alt text, image classification | vision model |

Pricing

| Plan | Included | Overage |
|---|---|---|
| Free | 10,000 Neurons/day | N/A |
| Paid | 10,000 Neurons/day | $0.011/1K Neurons |

Cost estimates (approximate per operation):

Estimated monthly cost: $5-30/mo for moderate usage (most small tasks fit within the free tier).
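
The pricing table translates into a simple overage formula. The sketch below assumes constant daily usage and a free allocation that applies per day without carrying over; `estimateMonthlyCostUsd` is a hypothetical helper, and the Neuron counts passed to it are illustrative rather than measured.

```typescript
const FREE_NEURONS_PER_DAY = 10000; // included on both plans (from the table above)
const USD_PER_1K_NEURONS = 0.011;   // paid-plan overage rate (from the table above)

// Estimates the monthly overage cost in USD on the paid plan.
// Assumes the same usage every day and a per-day free allocation
// that does not carry over.
function estimateMonthlyCostUsd(neuronsPerDay: number, days = 30): number {
  const billablePerDay = Math.max(0, neuronsPerDay - FREE_NEURONS_PER_DAY);
  return (billablePerDay / 1000) * USD_PER_1K_NEURONS * days;
}
```

For example, a workload averaging 60,000 Neurons per day has 50,000 billable Neurons per day, which comes to about $16.50 over 30 days, while anything at or below 10,000 Neurons per day costs nothing.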

Migration Strategy

Phase 1: New Capabilities (no OpenAI replacement)

Add Workers AI for new features that don't exist yet:

Phase 2: Supplementary Use

Use Workers AI alongside OpenAI:

Phase 3: Primary AI Provider

Evaluate replacing OpenAI for most tasks:

Estimated Impact