Migration Roadmap

Overview

A phased rollout plan for integrating the Cloudflare Workers data platform into Company Manager. Each phase delivers value on its own and builds on the previous one.

Prerequisites

[x] Cloudflare account with Workers Paid plan ($5/mo)
[x] Existing Workers infrastructure (4 Workers deployed)
[ ] Wrangler CLI installed and configured
[ ] API tokens for R2, KV, D1, AI access
[ ] R2 bucket created for media storage

Phase 1: Foundation (Weeks 1-4)

Goal: Immediate cost savings and performance wins with minimal risk.

1.1 Hyperdrive for Existing Workers (Week 1)

Effort: 1-2 days | Risk: Low | Impact: High

[ ] Create Hyperdrive config pointing to Neon PostgreSQL
[ ] Add Hyperdrive binding to all 4 existing Workers
[ ] Update Worker DB access to use `env.HYPERDRIVE.connectionString`
[ ] Verify connection pooling in `pg_stat_activity`
[ ] Configure cache TTL for read-heavy queries

Success criteria: Worker response times drop, Neon connection count stabilizes.
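
The binding switch in the checklist above can be kept reversible with a small resolver. This is a minimal sketch, not the project's actual code: `DATABASE_URL` is an assumed secret holding the direct Neon URL, and `resolveDbTarget` is a hypothetical helper name. Only `env.HYPERDRIVE.connectionString` comes from the checklist.

```typescript
// Minimal shape of the Hyperdrive binding as used here; the real binding
// exposes more fields than the connection string.
interface HyperdriveLike {
  connectionString: string;
}

// Hypothetical env shape: HYPERDRIVE is the binding from the checklist,
// DATABASE_URL is an assumed secret with the direct Neon connection URL.
interface DbEnv {
  HYPERDRIVE?: HyperdriveLike;
  DATABASE_URL?: string;
}

type DbTarget = { via: "hyperdrive" | "direct"; connectionString: string };

// Prefer the pooled Hyperdrive connection. Falling back to the direct URL
// means that removing the binding is enough to roll back.
function resolveDbTarget(env: DbEnv): DbTarget {
  if (env.HYPERDRIVE?.connectionString) {
    return { via: "hyperdrive", connectionString: env.HYPERDRIVE.connectionString };
  }
  if (env.DATABASE_URL) {
    return { via: "direct", connectionString: env.DATABASE_URL };
  }
  throw new Error("No database connection configured");
}
```

The returned `connectionString` would then be handed to the Postgres driver, for example `new Client({ connectionString })` with the `pg` package. Logging the `via` field is one way to confirm which path each Worker is using during rollout.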

1.2 R2 Media Storage Migration (Weeks 1-3)

Effort: 3-5 days | Risk: Low | Impact: High (cost)

[ ] Create R2 bucket (`company-manager-media`)
[ ] Generate S3-compatible API credentials
[ ] Update S3 client configuration (endpoint + credentials)
[ ] Enable dual-write (S3 + R2)
[ ] Run R2 Super Slurper to copy existing S3 data
[ ] Switch reads to R2 (with S3 fallback)
[ ] Verify all file operations work
[ ] Disable S3 writes; R2 becomes primary
[ ] Set up lifecycle rules for temp files

Success criteria: All file uploads/downloads work, S3 egress drops to zero.
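
The dual-write and read-with-fallback steps above could look roughly like this. It is a sketch under assumptions: `BlobStore` is a hypothetical adapter interface that both the S3 client and the R2 client would be wrapped in, and the choice to treat S3 as the source of truth during dual-write is one reasonable policy, not something the plan prescribes.

```typescript
// Hypothetical adapter interface; the S3 and R2 clients are assumed to be
// wrapped so that both expose this shape.
interface BlobStore {
  put(key: string, body: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
}

// Dual-write phase: S3 stays the source of truth, so an S3 failure rejects
// the upload, while an R2 failure is only reported for later backfill.
async function dualWrite(
  s3: BlobStore,
  r2: BlobStore,
  key: string,
  body: Uint8Array,
  onR2Error: (key: string, err: unknown) => void = () => {},
): Promise<void> {
  await s3.put(key, body);
  try {
    await r2.put(key, body);
  } catch (err) {
    onR2Error(key, err);
  }
}

// Read phase: try R2 first, then S3 for objects that were not copied yet.
async function readWithFallback(
  r2: BlobStore,
  s3: BlobStore,
  key: string,
): Promise<{ body: Uint8Array; source: "r2" | "s3" } | null> {
  const fromR2 = await r2.get(key).catch(() => null);
  if (fromR2) return { body: fromR2, source: "r2" };
  const fromS3 = await s3.get(key);
  return fromS3 ? { body: fromS3, source: "s3" } : null;
}
```

Counting how often `source` is `"s3"` gives a simple signal for when the copy is complete and S3 reads can be retired.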

1.3 KV Edge Caching (Weeks 2-4)

Effort: 3-5 days | Risk: Low | Impact: Medium

[ ] Create KV namespace (`CACHE`)
[ ] Implement cache-aside helper for tRPC routers
[ ] Cache Tier 1 data: tenant config, permissions, menu items, feature flags
[ ] Add cache invalidation on write paths
[ ] Add KV binding to existing Workers
[ ] Monitor cache hit rates

Success criteria: DB read count drops 30-50% for cached patterns, response times improve.
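
A cache-aside helper for the checklist above might be sketched as follows. `KvLike` declares only the subset of the KV binding this example needs, and `cached`, `invalidate`, and the `tenant:<id>:<kind>` key scheme are hypothetical names chosen for illustration.

```typescript
// Subset of the Workers KV binding used by this sketch.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  delete(key: string): Promise<void>;
}

// Cache-aside read: return the cached value if present, otherwise load it
// from the database and store it with a TTL.
async function cached<T>(
  kv: KvLike,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const hit = await kv.get(key);
  if (hit !== null) return JSON.parse(hit) as T;
  const fresh = await load();
  // KV does not accept expirationTtl values below 60 seconds.
  await kv.put(key, JSON.stringify(fresh), { expirationTtl: Math.max(60, ttlSeconds) });
  return fresh;
}

// Call from write paths so the next read repopulates the entry.
async function invalidate(kv: KvLike, key: string): Promise<void> {
  await kv.delete(key);
}

// Hypothetical key scheme: one entry per tenant and data kind.
const tenantKey = (tenantId: string, kind: string): string => `tenant:${tenantId}:${kind}`;
```

KV is eventually consistent, so a delete can take up to about a minute to reach every edge location. That fits the Tier 1 data listed above, which changes rarely, but it is worth keeping in mind before caching anything that must be read back immediately after a write.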

Phase 1 Milestone


Estimated monthly savings:
- S3 egress: -$30-100 (depending on traffic)
- DB connections: more stable, fewer cold starts
- Read latency: 50ms → 5ms for cached data
- Total CF cost: ~$5 (base plan) + ~$5-15 (KV/R2 usage) = ~$10-20
- Net savings: $20-80/month + significant performance improvement

Phase 2: New Capabilities (Weeks 5-10)

Goal: Add capabilities that don't exist today.

2.1 Vectorize + Workers AI Semantic Search (Weeks 5-7)

Effort: 1-2 weeks | Risk: Medium | Impact: High

[ ] Create search Worker (`search-worker`)
[ ] Configure Workers AI binding + Vectorize index
[ ] Choose embedding model (bge-m3 for multilingual FR+EN)
[ ] Build embedding pipeline for articles
[ ] Implement semantic search endpoint
[ ] Add "similar articles" recommendation
[ ] Build product embedding pipeline
[ ] Implement product recommendations
[ ] Integrate with existing tRPC search router (hybrid search)

Success criteria: Search returns semantically relevant results. "Similar articles" works.
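
The embedding pipeline and search endpoint above could be sketched along these lines. The interfaces are assumptions modelled on the Workers AI and Vectorize bindings and declare only the fields used here. The model id `@cf/baai/bge-m3` is the catalog name assumed for the bge-m3 model in the checklist, and the article shape is hypothetical.

```typescript
// Assumed shapes modelled on the Workers AI and Vectorize bindings.
interface EmbeddingAi {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}
interface VectorMatch {
  id: string;
  score: number;
}
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string>;
}
interface VectorIndexLike {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
  query(vector: number[], options: { topK: number }): Promise<{ matches: VectorMatch[] }>;
}

// Assumed catalog id for the multilingual model named in the checklist.
const EMBEDDING_MODEL = "@cf/baai/bge-m3";

async function embed(ai: EmbeddingAi, texts: string[]): Promise<number[][]> {
  const { data } = await ai.run(EMBEDDING_MODEL, { text: texts });
  return data;
}

// Indexing side: embed title plus body and upsert keyed by article id.
async function indexArticles(
  ai: EmbeddingAi,
  index: VectorIndexLike,
  articles: { id: string; title: string; body: string }[],
): Promise<void> {
  const vectors = await embed(ai, articles.map((a) => `${a.title}\n${a.body}`));
  await index.upsert(
    articles.map((a, i) => ({ id: a.id, values: vectors[i] ?? [], metadata: { title: a.title } })),
  );
}

// Query side: embed the query text and return the nearest article ids.
async function semanticSearch(
  ai: EmbeddingAi,
  index: VectorIndexLike,
  query: string,
  topK = 5,
): Promise<VectorMatch[]> {
  const [vector] = await embed(ai, [query]);
  if (!vector) throw new Error("Embedding model returned no vector");
  const { matches } = await index.query(vector, { topK });
  return matches;
}
```

A "similar articles" lookup can reuse `semanticSearch` with the article's own text as the query and then drop the article itself from the matches.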

2.2 Workers AI for Agent Tasks (Weeks 6-8)

Effort: 1 week per use case | Risk: Low | Impact: Medium

[ ] Add content moderation to chat-worker
[ ] Implement image alt text generation via vision model
[ ] Add sentiment analysis for customer tickets
[ ] Set up translation endpoint (FR ↔ EN)
[ ] Compare Workers AI quality vs. OpenAI for description generation
[ ] Create AI Gateway for monitoring and caching

Success criteria: New AI capabilities work. OpenAI costs decrease for migrated tasks.
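
As one illustration of the tasks above, sentiment analysis for customer tickets might be wrapped like this. Everything here is an assumption to verify before use: the model id `@cf/huggingface/distilbert-sst-2-int8`, the response shape (a list of label and score pairs), and the 0.75 confidence threshold, which is an arbitrary starting point.

```typescript
// Assumed shape of a Workers AI text-classification call.
interface ClassifierAi {
  run(model: string, input: { text: string }): Promise<{ label: string; score: number }[]>;
}

// Assumed catalog id for a small sentiment model; confirm before relying on it.
const SENTIMENT_MODEL = "@cf/huggingface/distilbert-sst-2-int8";

type TicketSentiment = { label: "positive" | "negative" | "uncertain"; score: number };

// Classify a ticket and refuse to guess when the model is not confident,
// so low-confidence tickets can be routed to a human instead.
async function classifyTicket(
  ai: ClassifierAi,
  text: string,
  minScore = 0.75,
): Promise<TicketSentiment> {
  const results = await ai.run(SENTIMENT_MODEL, { text });
  const best = [...results].sort((a, b) => b.score - a.score)[0];
  if (!best || best.score < minScore) {
    return { label: "uncertain", score: best ? best.score : 0 };
  }
  const label = best.label.toUpperCase() === "NEGATIVE" ? "negative" : "positive";
  return { label, score: best.score };
}
```

Keeping the model call behind a narrow function like this also makes the quality comparison against OpenAI easier, since both providers can be put behind the same signature.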

2.3 Cloudflare Queues (Weeks 8-10)

Effort: 2-3 weeks | Risk: Medium | Impact: High

[ ] Create queues (email, image-processing, analytics)
[ ] Build email sending pipeline with DLQ
[ ] Build image processing pipeline (R2 event → Queue → process)
[ ] Add AI agent task queue
[ ] Test retry behavior and DLQ handling
[ ] Migrate one cron job to Queue + Scheduled Worker
[ ] Monitor queue metrics in dashboard

Success criteria: Email delivery is reliable with retries. Image processing is async. One cron job replaced.
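
The retry behaviour for the email pipeline above could be handled in the consumer roughly as follows. `QueueMsg` and `QueueBatch` declare only the parts of the Queues consumer types used here. The `EmailJob` payload, the `send` callback, and the backoff schedule are assumptions for illustration.

```typescript
// Subset of the Queues consumer message and batch types used by this sketch.
interface QueueMsg<T> {
  readonly body: T;
  readonly attempts: number; // starts at 1 on first delivery
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}
interface QueueBatch<T> {
  readonly messages: readonly QueueMsg<T>[];
}

// Hypothetical payload for the email queue.
interface EmailJob {
  to: string;
  subject: string;
  html: string;
}

// Handle each message on its own so one failing email does not cause the
// whole batch to be redelivered.
async function handleEmailBatch(
  batch: QueueBatch<EmailJob>,
  send: (job: EmailJob) => Promise<void>,
): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await send(msg.body);
      msg.ack();
    } catch {
      // Exponential backoff: 30s, 60s, 120s, ... capped at 15 minutes.
      const delaySeconds = Math.min(900, 30 * Math.pow(2, msg.attempts - 1));
      msg.retry({ delaySeconds });
    }
  }
}
```

In a Worker this function would be called from the `queue(batch, env)` handler. The retry limit and the dead letter queue are not set in code: they come from the consumer's `max_retries` and `dead_letter_queue` settings in the Wrangler configuration, so a message that keeps failing ends up in the DLQ once the limit is reached.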

Phase 2 Milestone


New capabilities added:
- Semantic search across articles and products
- Product recommendations ("similar items")
- Content moderation in live chat
- Image alt text auto-generation
- Reliable email delivery with DLQ
- Async image processing pipeline

Phase 3: Infrastructure Evolution (Weeks 11-18)

Goal: Replace legacy patterns with modern CF equivalents.

3.1 Durable Objects Extensions (Weeks 11-13)

Effort: 1-2 weeks per DO class | Risk: Medium | Impact: Medium

[ ] POS session coordination DO
[ ] Real-time notification hub DO
[ ] Collaborative document editing DO (for canvas)
[ ] Test WebSocket hibernation patterns
[ ] Connect DOs to Hyperdrive for DB access

Success criteria: Real-time features work cross-instance. POS terminals stay in sync.
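
The fan-out at the core of the notification hub above can be sketched independently of the Durable Object class itself. `SocketLike` stands in for the WebSocket objects a DO holds, and `HubEvent` is a hypothetical payload shape.

```typescript
// Stand-in for the WebSocket objects held by a Durable Object.
interface SocketLike {
  send(data: string): void;
}

// Hypothetical event payload.
interface HubEvent {
  type: string;
  tenantId: string;
  payload: unknown;
}

// Serialize once, send to every connected socket, and keep going when a
// single socket fails (for example because the client already went away).
function broadcast(
  sockets: readonly SocketLike[],
  event: HubEvent,
): { delivered: number; failed: number } {
  const data = JSON.stringify(event);
  let delivered = 0;
  let failed = 0;
  for (const socket of sockets) {
    try {
      socket.send(data);
      delivered++;
    } catch {
      failed++;
    }
  }
  return { delivered, failed };
}
```

Inside a Durable Object that uses WebSocket hibernation, the socket list would typically come from `this.ctx.getWebSockets()`, with connections registered through `this.ctx.acceptWebSocket(ws)` so the object can be evicted from memory between messages.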

3.2 Cloudflare Workflows (Weeks 13-16)

Effort: 2-4 weeks | Risk: Medium | Impact: High

[ ] Build PrestaShop sync workflow (replace shell scripts)
[ ] Build AI decision workflow (approval + rollback)
[ ] Build data import workflow (CSV with progress)
[ ] Build tenant onboarding workflow (multi-day)
[ ] Build supplier order workflow (with human-in-the-loop)
[ ] Set up monitoring and alerting

Success criteria: Sync jobs are crash-proof. Progress is trackable. Approvals work.
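
What "crash-proof" means for the import workflow above can be shown with the step pattern. This sketch declares only the `step.do(name, callback)` part of the Workflows step API. The chunked import, the `importChunk` callback, and the step naming scheme are assumptions.

```typescript
// Subset of the Workflows step API: step.do persists the callback's result
// under the step name, so a restarted run skips steps that already finished.
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

// Import rows in chunks, one durable step per chunk. Returns the number of
// rows imported, which doubles as a progress counter.
async function importRows<Row>(
  step: StepLike,
  rows: Row[],
  chunkSize: number,
  importChunk: (chunk: Row[]) => Promise<number>,
): Promise<number> {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  let imported = 0;
  for (let start = 0; start < rows.length; start += chunkSize) {
    const chunk = rows.slice(start, start + chunkSize);
    const end = start + chunk.length - 1;
    // Step names must be identical on every run so completed chunks are
    // recognised and not executed again.
    imported += await step.do(`import rows ${start}-${end}`, () => importChunk(chunk));
  }
  return imported;
}
```

In a real workflow this body would live in the `run(event, step)` method of a class extending `WorkflowEntrypoint`. Step results are persisted, so each step should return a small serializable value such as a count rather than the imported rows themselves.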

3.3 Analytics Engine (Weeks 14-18)

Effort: 2-3 weeks | Risk: Low | Impact: High

[ ] Define analytics datasets (POS, page views, API, agents)
[ ] Instrument POS transactions to write data points
[ ] Instrument page views
[ ] Instrument agent actions
[ ] Build SQL API query service
[ ] Replace POS analytics SQL queries with AE queries
[ ] Build custom analytics dashboard

Success criteria: Analytics queries don't hit PostgreSQL. Dashboard loads 10x faster.
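
Instrumenting POS transactions and querying them back could look like this. `writeDataPoint` with `blobs`, `doubles`, and `indexes` is the Analytics Engine binding call. The `PosSale` shape, the column layout, and the dataset name used in the usage note are assumptions.

```typescript
// Subset of the Analytics Engine dataset binding.
interface AnalyticsDatasetLike {
  writeDataPoint(point: { blobs?: string[]; doubles?: number[]; indexes?: string[] }): void;
}

// Hypothetical POS sale event.
interface PosSale {
  tenantId: string;
  storeId: string;
  paymentMethod: string;
  totalCents: number;
  itemCount: number;
}

// Assumed column layout: index1 = tenant, blob1 = store, blob2 = payment
// method, double1 = total in cents, double2 = item count.
function recordPosSale(dataset: AnalyticsDatasetLike, sale: PosSale): void {
  dataset.writeDataPoint({
    indexes: [sale.tenantId], // a data point carries a single index
    blobs: [sale.storeId, sale.paymentMethod],
    doubles: [sale.totalCents, sale.itemCount],
  });
}

// Build the SQL API query for revenue per store. Inputs are validated
// because the SQL API takes plain SQL text, not bound parameters.
function revenueByStoreQuery(dataset: string, tenantId: string, days: number): string {
  if (!/^[A-Za-z0-9_]+$/.test(dataset)) throw new Error("Invalid dataset name");
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) throw new Error("Invalid tenant id");
  if (!Number.isInteger(days) || days <= 0) throw new Error("Invalid day count");
  return [
    "SELECT blob1 AS store_id,",
    "  SUM(_sample_interval * double1) / 100 AS revenue",
    `FROM ${dataset}`,
    `WHERE index1 = '${tenantId}' AND timestamp > NOW() - INTERVAL '${days}' DAY`,
    "GROUP BY store_id",
    "ORDER BY revenue DESC",
  ].join("\n");
}
```

A query such as `revenueByStoreQuery("pos_transactions", tenantId, 7)` would be sent as the body of a POST to the account's `analytics_engine/sql` API endpoint with a bearer token. Multiplying by `_sample_interval` keeps sums correct when Analytics Engine samples high-volume data.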

Phase 3 Milestone


Infrastructure modernized:
- Cron jobs → Durable Workflows (crash-proof)
- Bull/Redis → CF Queues (managed)
- POS analytics → Analytics Engine (10x faster)
- Real-time → Durable Objects (cross-instance)
- Total CF cost: ~$30-50/month
- Upstash Redis can be downgraded now and removed in Phase 4.3: -$10-30/month

Phase 4: Advanced (Weeks 19+)

Goal: Long-term strategic capabilities.

4.1 D1 Per-Tenant Edge Database (Weeks 19-22)

Effort: 3-4 weeks | Risk: High | Impact: High

[ ] Design D1 schema for hot data (products, config, permissions)
[ ] Build Neon → D1 sync pipeline
[ ] Create per-tenant D1 databases for active tenants
[ ] Add read path: Worker → D1 (with Neon fallback)
[ ] Test Time Travel recovery
[ ] Monitor sync lag and consistency
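
The read path with Neon fallback from the checklist above might be sketched as follows. `D1Like` declares only the `prepare().bind().first()` chain of the D1 binding. The `products` table, its columns, and the `loadFromNeon` callback are hypothetical.

```typescript
// Subset of the D1 binding used by this sketch.
interface D1StatementLike {
  bind(...values: unknown[]): D1StatementLike;
  first<T>(): Promise<T | null>;
}
interface D1Like {
  prepare(sql: string): D1StatementLike;
}

// Hypothetical row shape for the synced hot data.
interface ProductRow {
  id: string;
  name: string;
  price_cents: number;
}

// Read from the tenant's D1 database first. A miss or a D1 error falls
// through to Neon, so sync lag degrades latency rather than correctness.
async function getProduct(
  d1: D1Like,
  loadFromNeon: (id: string) => Promise<ProductRow | null>,
  id: string,
): Promise<{ row: ProductRow | null; source: "d1" | "neon" }> {
  try {
    const row = await d1
      .prepare("SELECT id, name, price_cents FROM products WHERE id = ?")
      .bind(id)
      .first<ProductRow>();
    if (row) return { row, source: "d1" };
  } catch {
    // D1 unavailable or schema not synced yet: use Neon below.
  }
  return { row: await loadFromNeon(id), source: "neon" };
}
```

Tracking the share of `"neon"` results per tenant gives a rough view of sync coverage alongside the sync lag monitoring item.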

4.2 Data Platform Lakehouse (Weeks 22+)

\1: 4-6 weeks | \1: Medium | \1: Medium

[ ] Set up Pipelines for event streaming
[ ] Define Iceberg tables (orders, products, activity)
[ ] Build ingestion from Workers
[ ] Configure R2 SQL query engine
[ ] Connect BI tool (Metabase/Superset)
[ ] Build platform-wide analytics (cross-tenant)

4.3 Full Queue Migration (Weeks 24+)

[ ] Migrate remaining Bull jobs to CF Queues
[ ] Migrate all 3 cron jobs to Workflows
[ ] Remove Upstash Redis dependency entirely
[ ] Remove Bull package dependency

Decision Matrix

When to Start Each Phase

Phase	Trigger	Dependencies
1 (Foundation)	Now	CF account + Workers plan
2 (Capabilities)	Phase 1 validated	R2 + KV working
3 (Evolution)	Business need	Phase 1-2 stable
4 (Advanced)	Scale demand	Phase 1-3 mature

Risk/Reward Matrix


High Reward ─────────────────────────────┐
│                                         │
│  Hyperdrive ●          Workflows ●      │
│  R2 ●                  Queues ●         │
│  KV ●          Analytics Engine ●       │
│                                         │
│  Vectorize ●   Durable Objects ●        │
│  Workers AI ●                           │
│                                         │
│              D1 ●    Data Platform ●    │
│                                         │
Low Reward ──────────────────────────────┘
Low Risk ──────────────────── High Risk

Environment Variables (New)


# apps/app/.env (additions)

# Cloudflare Account
CF_ACCOUNT_ID=
CF_API_TOKEN=

# R2 (S3-compatible)
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET_NAME=company-manager-media
R2_ENDPOINT=https://${CF_ACCOUNT_ID}.r2.cloudflarestorage.com

# KV
KV_NAMESPACE_ID=
KV_API_TOKEN=

# Hyperdrive
HYPERDRIVE_ID=

# Workers AI (for REST API access from Next.js)
CF_AI_API_TOKEN=

# Search Worker
SEARCH_WORKER_URL=https://search-worker.wd29.workers.dev

# Analytics Engine
AE_DATASET_TOKEN=
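
On the Next.js side, the R2 variables above could be validated at startup and turned into S3 client settings. This is a sketch: `loadR2Config` is a hypothetical helper. The `auto` region and the `<account id>.r2.cloudflarestorage.com` endpoint follow R2's S3-compatible API conventions.

```typescript
const REQUIRED_R2_KEYS = [
  "CF_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
] as const;

interface R2ClientConfig {
  region: "auto";
  endpoint: string;
  bucket: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

// Fail fast with a list of every missing variable instead of failing on
// the first upload.
function loadR2Config(env: Record<string, string | undefined>): R2ClientConfig {
  const missing = REQUIRED_R2_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const accountId = env["CF_ACCOUNT_ID"] as string;
  return {
    region: "auto",
    // R2_ENDPOINT is optional here; it is derived from the account id when
    // the .env loader does not expand ${CF_ACCOUNT_ID}.
    endpoint: env["R2_ENDPOINT"] || `https://${accountId}.r2.cloudflarestorage.com`,
    bucket: env["R2_BUCKET_NAME"] as string,
    credentials: {
      accessKeyId: env["R2_ACCESS_KEY_ID"] as string,
      secretAccessKey: env["R2_SECRET_ACCESS_KEY"] as string,
    },
  };
}
```

The `region`, `endpoint`, and `credentials` fields can be passed to `new S3Client(...)` from `@aws-sdk/client-s3`, with `bucket` supplied per command. Switching the endpoint back to AWS is then the whole R2 rollback.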

Monitoring & Observability

Tool	What to Monitor
CF Dashboard → Workers	Request count, error rate, CPU time
CF Dashboard → KV	Hit rate, operations, storage
CF Dashboard → R2	Storage, operations, egress (should be $0)
CF Dashboard → Queues	Queue depth, DLQ messages, consumer lag
CF Dashboard → D1	Queries, storage, read/write ops
Neon Dashboard	Connection count (should decrease with Hyperdrive)
Application logs	Cache hit/miss ratios, sync status

Rollback Plan

Each integration has an independent rollback path:

Integration	Rollback
Hyperdrive	Remove binding, Workers use direct connection
R2	Switch S3 client back to AWS endpoint
KV	Bypass cache, all reads go to DB
Vectorize	Disable semantic search, use keyword-only
Workers AI	Revert to OpenAI API calls
Queues	Switch back to Bull/Redis
Workflows	Re-enable cron scripts
D1	All reads go to Neon (no edge replica)
Analytics Engine	Revert to PostgreSQL analytics queries

No integration requires a hard cutover. All can run in parallel with the existing system during migration.