# Workers AI Edge Inference

**Priority**: P1 (High Value)
## What is Workers AI?
Serverless AI inference on Cloudflare's GPU network. Run 50+ open-source models (LLMs, embeddings, image gen, TTS, STT, classification) with no infrastructure management. Pay per inference.
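Workers AI is exposed inside a Worker through an AI binding, which is what the `env.AI` calls throughout this document refer to. A minimal `wrangler.toml` sketch (the worker name, entry point, and compatibility date are placeholders):

```toml
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-09-01"

[ai]
binding = "AI"  # surfaced as env.AI inside the Worker
```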
## Why This Matters for Company Manager
### Current AI Usage
| Service | Current Provider | Model | Use Case |
|---|---|---|---|
| Content generation | OpenAI API | GPT-4o-mini | Product descriptions, SEO |
| City descriptions | OpenAI API | GPT-4o-mini | Business/association descriptions |
| OCR | Tesseract | Local | Document processing |
| Tool discovery | OpenAI API | Function calling | TRPC tool auto-generation |
| Press AI | OpenAI API | Various | Layout detection, segmentation |
### Benefits of Workers AI
| Factor | OpenAI API | Workers AI |
|---|---|---|
| Latency | 200-500ms (US East) | 10-50ms (nearest PoP) |
| Privacy | Data sent to OpenAI | Data stays on Cloudflare |
| Cost (small tasks) | ~$0.15/M input tokens | 10K Neurons/day free |
| Availability | Rate limits | Edge-distributed |
| Model lock-in | OpenAI only | 50+ open models |
| Streaming | Yes | Yes (SSE) |
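The streaming row corresponds to passing `stream: true` to `env.AI.run`, which resolves to a `ReadableStream` of SSE chunks that can be handed straight to a `Response`. A minimal sketch (the inline `Env` typing is a stand-in for `@cloudflare/workers-types`):

```typescript
// Minimal typing for the AI binding (the real type comes from @cloudflare/workers-types).
interface Env {
  AI: { run(model: string, options: Record<string, unknown>): Promise<unknown> };
}

// Streams model tokens to the client as server-sent events.
export async function streamCompletion(env: Env, prompt: string): Promise<Response> {
  const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: prompt }],
    stream: true, // with stream: true, run() resolves to a ReadableStream of SSE chunks
  });
  return new Response(stream as ReadableStream, {
    headers: { "content-type": "text/event-stream" },
  });
}
```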
## Integration Opportunities

### 1. Content Moderation (Live Chat)
The chat-worker already exists. Add real-time content moderation:
```typescript
// In chat-worker
async function moderateMessage(env: Env, message: string): Promise<boolean> {
  const result = await env.AI.run("@cf/meta/llama-guard-3-8b", {
    messages: [{ role: "user", content: message }],
  });
  // Llama Guard replies "safe" or "unsafe" plus violated categories.
  // Note: "unsafe" contains the substring "safe", so test for "unsafe".
  return !result.response.toLowerCase().includes("unsafe");
}

export default {
  async fetch(request: Request, env: Env) {
    const { message, tenantId } = (await request.json()) as { message: string; tenantId: string };
    if (!(await moderateMessage(env, message))) {
      return Response.json({ blocked: true, reason: "content_policy" });
    }
    // Forward to Durable Object for broadcast
    // ...
  },
};
```
**Benefit**: Sub-10ms moderation at the edge, with no external API call.
### 2. Product Description Generation
Replace OpenAI for ContentManagementAgent's description generation:
```typescript
// In content-worker or via REST API
async function generateProductDescription(
  env: Env,
  product: { name: string; category: string; attributes: Record<string, string> }
): Promise<string> {
  const prompt = `Write a compelling product description for an e-commerce store.
Product: ${product.name}
Category: ${product.category}
Attributes: ${JSON.stringify(product.attributes)}
Write in French. Keep it under 200 words. Be engaging and SEO-friendly.`;

  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "You are an expert e-commerce copywriter." },
      { role: "user", content: prompt },
    ],
    max_tokens: 500,
  });
  return result.response;
}
```
**Cost**: Llama 3.1 8B is inexpensive per inference; most description workloads fit in the free tier.
### 3. SEO Optimization
Auto-generate meta descriptions, titles, and keywords:
```typescript
async function generateSEO(env: Env, content: { title: string; body: string }) {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Generate SEO metadata for this article:
Title: ${content.title}
Content: ${content.body.slice(0, 2000)}
Return JSON: { "metaTitle": "...", "metaDescription": "...", "keywords": ["..."] }`,
    }],
    max_tokens: 300,
  });
  // Note: JSON.parse will throw if the model wraps the JSON in extra prose;
  // validate or retry in production code.
  return JSON.parse(result.response);
}
```
### 4. Image Alt Text Generation
For city portal image enrichment and product images:
```typescript
async function generateAltText(env: Env, imageUrl: string): Promise<string> {
  // Fetch the image and base64-encode it. Encode in chunks: spreading a large
  // Uint8Array into String.fromCharCode(...) overflows the call stack.
  const imageResponse = await fetch(imageUrl);
  const bytes = new Uint8Array(await imageResponse.arrayBuffer());
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  const base64 = btoa(binary);

  const result = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe this image in one sentence for use as alt text." },
        { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
      ],
    }],
    max_tokens: 100,
  });
  return result.response;
}
```
**Benefit**: Replaces the manual alt text workflow in CityImageEnrichmentAgent.
### 5. Sentiment Analysis for Customer Operations
For CustomerOperationsAgent ticket processing:
```typescript
async function analyzeSentiment(env: Env, text: string) {
  const result = await env.AI.run("@cf/huggingface/distilbert-sst-2-int8", {
    text: text,
  });
  return {
    label: result[0].label, // "POSITIVE" or "NEGATIVE"
    score: result[0].score, // confidence 0-1
  };
}
```
### 6. Translation (Multilingual Support)
For content translation between French and English:
```typescript
async function translate(env: Env, text: string, from: string, to: string) {
  const result = await env.AI.run("@cf/meta/m2m100-1.2b", {
    text: text,
    source_lang: from,
    target_lang: to,
  });
  return result.translated_text;
}
```
### 7. Text-to-Speech for Press Center
Audio articles from press center content:
```typescript
async function generateAudio(env: Env, text: string): Promise<ArrayBuffer> {
  const result = await env.AI.run("@cf/deepgram/aura-2-en", {
    text: text,
  });
  return result; // Audio buffer (WAV format)
}
```
### 8. Image Generation for Marketing
For MarketingAgent campaign visuals:
```typescript
async function generateImage(env: Env, prompt: string): Promise<ArrayBuffer> {
  const result = await env.AI.run("@cf/black-forest-labs/flux-1-schnell", {
    prompt: prompt,
    num_steps: 4, // Schnell is optimized for few steps
  });
  // flux-1-schnell returns the image as a base64 string ({ image: "..." });
  // decode it to raw bytes before storing or serving.
  const binary = atob(result.image);
  return Uint8Array.from(binary, (c) => c.charCodeAt(0)).buffer;
}
```
### 9. Spam/Fraud Detection
For classified ads and order management:
```typescript
async function detectSpam(env: Env, content: string): Promise<{ isSpam: boolean; confidence: number }> {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{
      role: "user",
      content: `Classify this content as SPAM or NOT_SPAM. Return only the classification and a confidence (0-1).
Content: ${content}`,
    }],
    max_tokens: 50,
  });
  // Parse the response. "NOT_SPAM" contains "SPAM", so check the negative label first.
  const label = result.response.toUpperCase();
  const isSpam = label.includes("SPAM") && !label.includes("NOT_SPAM");
  const match = result.response.match(/\b(0(?:\.\d+)?|1(?:\.0+)?)\b/);
  const confidence = match ? parseFloat(match[1]) : 0.5; // fall back if no number is returned
  return { isSpam, confidence };
}
```
### 10. RAG Pipeline (with Vectorize)
Retrieval-Augmented Generation for intelligent support:
```typescript
async function ragAnswer(env: Env, question: string, tenantId: string) {
  // 1. Embed the question (bge-m3 from the embeddings table; returns one vector per input)
  const embedding = await env.AI.run("@cf/baai/bge-m3", { text: [question] });
  const queryVector = embedding.data[0];

  // 2. Find relevant documents
  const docs = await env.KNOWLEDGE_INDEX.query(queryVector, {
    topK: 5,
    namespace: tenantId,
    returnMetadata: "all",
  });

  // 3. Build context
  const context = docs.matches.map((d) => d.metadata?.content).join("\n\n");

  // 4. Generate answer with context
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content: `Answer the user's question using ONLY the provided context. If the answer isn't in the context, say "I don't know."

Context:
${context}`,
      },
      { role: "user", content: question },
    ],
    max_tokens: 500,
  });

  return {
    answer: result.response,
    sources: docs.matches.map((d) => ({ id: d.id, score: d.score, title: d.metadata?.title })),
  };
}
```
## Model Selection Guide

### Text Generation
| Model | Best For | Speed | Quality |
|---|---|---|---|
| `llama-3.1-8b-instruct` | General tasks, descriptions | Fast | Good |
| `llama-3.3-70b-instruct-fp8-fast` | Complex reasoning | Slow | Excellent |
| `mistral-small-3.1-24b-instruct` | Vision + text, function calling | Medium | Very good |
| `gemma-3-12b-it` | Multilingual, 140+ languages | Medium | Very good |
| `hermes-2-pro-mistral-7b` | Function calling, JSON output | Fast | Good |
### Embeddings
| Model | Dimensions | Best For |
|---|---|---|
| `bge-small-en-v1.5` | 384 | Fast English embeddings |
| `bge-base-en-v1.5` | 768 | Balanced English |
| `bge-m3` | Variable | **Multilingual** (best for FR+EN) |
| `embeddinggemma-300m` | Variable | 100+ languages, state-of-the-art |
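As a sketch of the embedding call shape for `bge-m3` (a `text` array in, a `data` array of vectors out; `embedTexts` is an illustrative helper, and the inline `Env` typing stands in for `@cloudflare/workers-types`):

```typescript
interface Env {
  AI: { run(model: string, options: Record<string, unknown>): Promise<any> };
}

// Embeds a batch of strings with bge-m3; each result is a dense vector
// suitable for Vectorize upserts or queries.
export async function embedTexts(env: Env, texts: string[]): Promise<number[][]> {
  const result = await env.AI.run("@cf/baai/bge-m3", { text: texts });
  return result.data; // one vector per input string, in input order
}
```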
### Specialized
| Model | Use Case |
|---|---|
| `llama-guard-3-8b` | Content safety classification |
| `distilbert-sst-2-int8` | Sentiment analysis |
| `m2m100-1.2b` | Translation |
| `whisper-large-v3-turbo` | Speech-to-text |
| `aura-2-en` / `aura-2-es` | Text-to-speech |
| `flux-1-schnell` | Image generation |
| `llama-3.2-11b-vision-instruct` | Image understanding |
## Agent Integration Map
| Agent | Workers AI Use | Model |
|---|---|---|
| ContentManagementAgent | Description gen, SEO, alt text | llama-3.1-8b, vision |
| CustomerOperationsAgent | Sentiment, ticket routing | distilbert, llama-3.1-8b |
| InventoryPricingAgent | Demand forecasting context | llama-3.1-8b |
| MarketingAgent | Copy gen, image gen, A/B text | llama-3.1-8b, flux |
| OrderManagementAgent | Fraud detection text analysis | llama-3.1-8b |
| CityDescriptionAgent | Business descriptions | llama-3.1-8b (replaces OpenAI) |
| CityImageEnrichmentAgent | Alt text, image classification | vision model |
## Pricing
| Plan | Included | Overage |
|---|---|---|
| Free | 10,000 Neurons/day | N/A |
| Paid | 10,000 Neurons/day | $0.011/1K Neurons |
**Neuron costs** (approximate per operation):
- Text generation (8B model, 100 tokens): ~5-10 Neurons
- Embedding (single text): ~1-2 Neurons
- Image generation: ~50-100 Neurons
- Sentiment classification: ~1 Neuron
**Estimated cost**: $5-30/mo for moderate usage (most small tasks fit in the free tier).
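A quick arithmetic check of the per-operation figures above against the free tier; the daily volumes here are illustrative assumptions:

```typescript
// Rough daily Neuron budget check against the 10,000 Neurons/day free tier.
const FREE_NEURONS_PER_DAY = 10_000;

const dailyWorkload = [
  { task: "product descriptions", ops: 500, neuronsPerOp: 8 }, // ~5-10 Neurons each
  { task: "embeddings", ops: 2_000, neuronsPerOp: 1.5 },       // ~1-2 Neurons each
  { task: "sentiment checks", ops: 1_000, neuronsPerOp: 1 },   // ~1 Neuron each
];

const totalNeurons = dailyWorkload.reduce((sum, w) => sum + w.ops * w.neuronsPerOp, 0);
console.log(totalNeurons, totalNeurons <= FREE_NEURONS_PER_DAY ? "fits free tier" : "overage");
// 500*8 + 2000*1.5 + 1000*1 = 8000 Neurons/day → fits the free tier
```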
## Migration Strategy

### Phase 1: New Capabilities (no OpenAI replacement)
Add Workers AI for new features that don't exist yet:
- Content moderation in chat-worker
- Sentiment analysis for tickets
- Image alt text generation
### Phase 2: Supplementary Use
Use Workers AI alongside OpenAI:
- Edge-local tasks → Workers AI (latency-sensitive)
- Complex reasoning → OpenAI GPT-4 (quality-sensitive)
- Cost optimization → Workers AI for high-volume, low-complexity tasks
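The Phase 2 split can be expressed as a small routing helper; the task kinds and the 4,000-token threshold are illustrative assumptions, not measured cutoffs:

```typescript
type TaskKind = "moderation" | "sentiment" | "description" | "complex-reasoning";

// Routes a task to Workers AI (edge, cheap, low-latency) or OpenAI (quality-sensitive).
function pickProvider(kind: TaskKind, estimatedTokens: number): "workers-ai" | "openai" {
  if (kind === "complex-reasoning") return "openai"; // quality-sensitive reasoning
  if (estimatedTokens > 4_000) return "openai";      // long-context tasks
  return "workers-ai";                               // high-volume, latency-sensitive
}
```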
### Phase 3: Primary AI Provider
Evaluate replacing OpenAI for most tasks:
- Description generation → Llama 3.1 / Gemma 3
- Translation → M2M100
- SEO → Llama 3.1
- Keep OpenAI only for tasks requiring GPT-4 quality
## Estimated Impact
- **New capabilities**: Content moderation, sentiment analysis, image understanding
- **Latency**: 200-500ms → 10-50ms for edge inference
- **Cost**: Reduce OpenAI spend by 50-80% for routine tasks
- **Privacy**: Sensitive data stays on Cloudflare (no third-party API)
- **Effort**: 1-2 weeks per use case, can be rolled out incrementally