Migration Roadmap
Overview
Phased rollout plan for integrating the Cloudflare Workers data platform into Company Manager. Each phase is independently valuable and builds on the previous one.
Prerequisites
- [x] Cloudflare account with Workers Paid plan ($5/mo)
- [x] Existing Workers infrastructure (4 Workers deployed)
- [ ] Wrangler CLI installed and configured
- [ ] API tokens for R2, KV, D1, AI access
- [ ] R2 bucket created for media storage
Phase 1: Foundation (Weeks 1-4)
**Goal**: Immediate cost savings and performance wins with minimal risk.
1.1 Hyperdrive for Existing Workers (Week 1)
**Effort**: 1-2 days | **Risk**: Low | **Impact**: High
- [ ] Create Hyperdrive config pointing to Neon PostgreSQL
- [ ] Add Hyperdrive binding to all 4 existing Workers
- [ ] Update Worker DB access to use `env.HYPERDRIVE.connectionString`
- [ ] Verify connection pooling in `pg_stat_activity`
- [ ] Configure cache TTL for read-heavy queries
**Success criteria**: Worker response times drop, Neon connection count stabilizes.
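The step "Update Worker DB access to use `env.HYPERDRIVE.connectionString`" could look roughly like the sketch below. Only the `HYPERDRIVE` binding name and its `connectionString` property come from the checklist; `SqlClient`, `ClientFactory`, and `queryViaHyperdrive` are hypothetical names standing in for whatever Postgres driver the Workers already use.

```typescript
// Minimal shape of the Hyperdrive binding used here: only connectionString.
interface HyperdriveLike {
  connectionString: string;
}

interface WorkerEnv {
  HYPERDRIVE: HyperdriveLike;
}

// Hypothetical stand-in for a Postgres driver client.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
  end(): Promise<void>;
}

type ClientFactory = (connectionString: string) => SqlClient;

// Open a client against the Hyperdrive connection string instead of the
// direct Neon URL, run one query, and always release the client.
async function queryViaHyperdrive(
  env: WorkerEnv,
  makeClient: ClientFactory,
  sql: string,
  params: unknown[] = [],
): Promise<unknown[]> {
  const client = makeClient(env.HYPERDRIVE.connectionString);
  try {
    const result = await client.query(sql, params);
    return result.rows;
  } finally {
    await client.end();
  }
}
```

Because the only change is where the connection string comes from, the rollback is the reverse swap: pass the direct Neon URL to the same factory.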
1.2 R2 Media Storage Migration (Weeks 1-3)
**Effort**: 3-5 days | **Risk**: Low | **Impact**: High (cost)
- [ ] Create R2 bucket (`company-manager-media`)
- [ ] Generate S3-compatible API credentials
- [ ] Update S3 client configuration (endpoint + credentials)
- [ ] Enable dual-write (S3 + R2)
- [ ] Run R2 Super Slurper to copy existing S3 data
- [ ] Switch reads to R2 (with S3 fallback)
- [ ] Verify all file operations work
- [ ] Disable S3 writes so that R2 becomes primary
- [ ] Set up lifecycle rules for temp files
**Success criteria**: All file uploads/downloads work, S3 egress drops to zero.
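The dual-write and read-fallback steps above can be sketched as a thin wrapper around two stores. This is a minimal sketch, assuming both the existing S3 client and the R2 (S3-compatible) client are adapted to a small hypothetical `ObjectStore` interface; `MemoryStore` and `MigratingStore` are illustrative names, not part of any SDK.

```typescript
// Hypothetical minimal object-store interface that both clients are adapted to.
interface ObjectStore {
  put(key: string, body: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
}

// In-memory store, useful for tests and local runs.
class MemoryStore implements ObjectStore {
  private objects = new Map<string, Uint8Array>();

  async put(key: string, body: Uint8Array): Promise<void> {
    this.objects.set(key, body);
  }

  async get(key: string): Promise<Uint8Array | null> {
    return this.objects.get(key) ?? null;
  }
}

class MigratingStore implements ObjectStore {
  private r2: ObjectStore;
  private s3: ObjectStore;
  private dualWrite: boolean; // set to false once R2 is primary

  constructor(r2: ObjectStore, s3: ObjectStore, dualWrite: boolean) {
    this.r2 = r2;
    this.s3 = s3;
    this.dualWrite = dualWrite;
  }

  // Always write to R2; during the dual-write phase also write to S3.
  // Any failed write propagates to the caller.
  async put(key: string, body: Uint8Array): Promise<void> {
    await this.r2.put(key, body);
    if (this.dualWrite) await this.s3.put(key, body);
  }

  // Read from R2 first; fall back to S3 for objects not yet copied.
  async get(key: string): Promise<Uint8Array | null> {
    const fromR2 = await this.r2.get(key);
    if (fromR2 !== null) return fromR2;
    return this.s3.get(key);
  }
}
```

Flipping `dualWrite` to `false` corresponds to the "Disable S3 writes" step; the S3 read fallback can stay in place until the copy has been verified.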
1.3 KV Edge Caching (Weeks 2-4)
**Effort**: 3-5 days | **Risk**: Low | **Impact**: Medium
- [ ] Create KV namespace (`CACHE`)
- [ ] Implement cache-aside helper for TRPC routers
- [ ] Cache Tier 1 data: tenant config, permissions, menu items, feature flags
- [ ] Add cache invalidation on write paths
- [ ] Add KV binding to existing Workers
- [ ] Monitor cache hit rates
**Success criteria**: DB read count drops 30-50% for cached patterns, response times improve.
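A cache-aside helper for the steps above might look like the following sketch. `KvLike` mirrors the `get`/`put`/`delete` subset of the Workers KV binding; the key scheme in `tenantKey` and the helper names `cached` and `invalidate` are assumptions made for illustration.

```typescript
// Subset of the Workers KV binding surface this sketch relies on.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  delete(key: string): Promise<void>;
}

// Cache-aside read: return the cached value if present; otherwise load from
// the database, store the result with a TTL, and return it.
// Values must be JSON-serializable.
async function cached<T>(
  kv: KvLike,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const hit = await kv.get(key);
  if (hit !== null) return JSON.parse(hit) as T;
  const fresh = await load();
  await kv.put(key, JSON.stringify(fresh), { expirationTtl: ttlSeconds });
  return fresh;
}

// Call from write paths so the next read repopulates the cache.
async function invalidate(kv: KvLike, key: string): Promise<void> {
  await kv.delete(key);
}

// Hypothetical key scheme: one entry per tenant and kind of Tier 1 data.
function tenantKey(
  tenantId: string,
  kind: "config" | "permissions" | "menu" | "flags",
): string {
  return `tenant:${tenantId}:${kind}`;
}
```

Two properties of KV are worth keeping in mind when choosing what to cache: `expirationTtl` has a 60-second minimum, and KV is eventually consistent, so a delete may take a while to become visible in other locations. That makes it a better fit for tenant config and menu items than for data that must be fresh immediately after a write.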
Phase 1 Milestone
Estimated monthly savings:
- S3 egress: -$30-100 (depending on traffic)
- DB connections: more stable, fewer cold starts
- Read latency: 50ms → 5ms for cached data
- Total CF cost: ~$5 (base plan) + ~$5-15 (KV/R2 usage) = ~$10-20
- Net savings: $20-80/month + significant performance improvement
Phase 2: New Capabilities (Weeks 5-10)
**Goal**: Add capabilities that don't exist today.
2.1 Vectorize + Workers AI Semantic Search (Weeks 5-7)
**Effort**: 1-2 weeks | **Risk**: Medium | **Impact**: High
- [ ] Create search Worker (`search-worker`)
- [ ] Configure Workers AI binding + Vectorize index
- [ ] Choose embedding model (bge-m3 for multilingual FR+EN)
- [ ] Build embedding pipeline for articles
- [ ] Implement semantic search endpoint
- [ ] Add "similar articles" recommendation
- [ ] Build product embedding pipeline
- [ ] Implement product recommendations
- [ ] Integrate with existing TRPC search router (hybrid search)
**Success criteria**: Search returns semantically relevant results. "Similar articles" works.
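The roadmap does not say how keyword and semantic results are combined for hybrid search. One common option is reciprocal rank fusion (RRF), sketched below as an illustration rather than the chosen design. The inputs are ranked lists of document IDs, for example one from the existing TRPC keyword search and one from a Vectorize query; `fuseRankings` and the IDs are hypothetical.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) to a
// document's score, with rank starting at 1. k = 60 is a commonly used default.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties are broken by ID for a stable order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Usage with hypothetical article IDs:
const keywordHits = ["a", "b", "c"];
const semanticHits = ["b", "d", "a"];
const fusedHits = fuseRankings([keywordHits, semanticHits]);
// fusedHits is ["b", "a", "d", "c"]
```

RRF only needs ranks, not comparable scores, which avoids having to normalize keyword relevance against vector similarity.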
2.2 Workers AI for Agent Tasks (Weeks 6-8)
**Effort**: 1 week per use case | **Risk**: Low | **Impact**: Medium
- [ ] Add content moderation to chat-worker
- [ ] Implement image alt text generation via vision model
- [ ] Add sentiment analysis for customer tickets
- [ ] Set up translation endpoint (FR ↔ EN)
- [ ] Compare Workers AI quality vs. OpenAI for description generation
- [ ] Create AI Gateway for monitoring and caching
**Success criteria**: New AI capabilities work. OpenAI costs decrease for migrated tasks.
2.3 Cloudflare Queues (Weeks 8-10)
**Effort**: 2-3 weeks | **Risk**: Medium | **Impact**: High
- [ ] Create queues (email, image-processing, analytics)
- [ ] Build email sending pipeline with DLQ
- [ ] Build image processing pipeline (R2 event → Queue → process)
- [ ] Add AI agent task queue
- [ ] Test retry behavior and DLQ handling
- [ ] Migrate one cron job to Queue + Scheduled Worker
- [ ] Monitor queue metrics in dashboard
**Success criteria**: Email delivery is reliable with retries. Image processing is async. One cron job replaced.
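The retry behavior in the email pipeline could be handled per message, as in this sketch. `QueueMessageLike` mirrors the `body`, `attempts`, `ack()`, and `retry()` members of a Queues consumer message; `EmailJob`, `SendEmail`, and the backoff values are assumptions chosen for illustration.

```typescript
// Subset of the Queues consumer message surface this sketch relies on.
interface QueueMessageLike<T> {
  body: T;
  attempts: number; // delivery attempts so far, starting at 1
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}

// Hypothetical job payload and sender.
interface EmailJob {
  to: string;
  subject: string;
}

type SendEmail = (job: EmailJob) => Promise<void>;

// Exponential backoff: 30s, 60s, 120s, ... capped at 10 minutes.
function backoffSeconds(attempts: number): number {
  return Math.min(600, 30 * Math.pow(2, attempts - 1));
}

// Acknowledge each message individually so one failure does not force the
// whole batch to be redelivered.
async function handleEmailBatch(
  messages: QueueMessageLike<EmailJob>[],
  send: SendEmail,
): Promise<void> {
  for (const message of messages) {
    try {
      await send(message.body);
      message.ack();
    } catch {
      message.retry({ delaySeconds: backoffSeconds(message.attempts) });
    }
  }
}
```

The handler never routes to the DLQ itself. With a dead letter queue configured on the consumer, messages that exceed the configured maximum retries are moved there by the platform, which is the behavior the "Test retry behavior and DLQ handling" step would verify.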
Phase 2 Milestone
New capabilities added:
- Semantic search across articles and products
- Product recommendations ("similar items")
- Content moderation in live chat
- Image alt text auto-generation
- Reliable email delivery with DLQ
- Async image processing pipeline
Phase 3: Infrastructure Evolution (Weeks 11-18)
**Goal**: Replace legacy patterns with modern CF equivalents.
3.1 Durable Objects Extensions (Weeks 11-13)
**Effort**: 1-2 weeks per DO class | **Risk**: Medium | **Impact**: Medium
- [ ] POS session coordination DO
- [ ] Real-time notification hub DO
- [ ] Collaborative document editing DO (for canvas)
- [ ] Test WebSocket hibernation patterns
- [ ] Connect DOs to Hyperdrive for DB access
**Success criteria**: Real-time features work cross-instance. POS terminals stay in sync.
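The core of a notification hub or POS session coordinator is a single place that tracks connected sessions and fans events out to them. The sketch below shows only that logic in plain TypeScript; `SocketLike` and `NotificationHub` are hypothetical names, and wiring it into a Durable Object class (accepting WebSockets, hibernation) is left out.

```typescript
// Anything that can receive a text message, e.g. a WebSocket.
interface SocketLike {
  send(message: string): void;
}

class NotificationHub {
  private sessions = new Map<string, SocketLike>();

  join(sessionId: string, socket: SocketLike): void {
    this.sessions.set(sessionId, socket);
  }

  leave(sessionId: string): void {
    this.sessions.delete(sessionId);
  }

  count(): number {
    return this.sessions.size;
  }

  // Send an event to every session except the sender. Sockets whose send()
  // throws are treated as dead and removed. Returns the number delivered.
  broadcast(event: unknown, fromSessionId?: string): number {
    const payload = JSON.stringify(event);
    let delivered = 0;
    for (const [sessionId, socket] of Array.from(this.sessions.entries())) {
      if (sessionId === fromSessionId) continue;
      try {
        socket.send(payload);
        delivered++;
      } catch {
        this.sessions.delete(sessionId);
      }
    }
    return delivered;
  }
}
```

Inside a Durable Object, all clients for a given ID reach the same instance, which is what makes this kind of in-one-place fan-out work across application instances. With WebSocket hibernation, in-memory maps like `sessions` do not survive eviction, so the socket list would need to come from the Durable Object's own WebSocket tracking instead.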
3.2 Cloudflare Workflows (Weeks 13-16)
**Effort**: 2-4 weeks | **Risk**: Medium | **Impact**: High
- [ ] Build Prestashop sync workflow (replace shell scripts)
- [ ] Build AI decision workflow (approval + rollback)
- [ ] Build data import workflow (CSV with progress)
- [ ] Build tenant onboarding workflow (multi-day)
- [ ] Build supplier order workflow (with human-in-the-loop)
- [ ] Set up monitoring and alerting
**Success criteria**: Sync jobs are crash-proof. Progress is trackable. Approvals work.
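"Crash-proof" here comes from step-level persistence: a workflow is written as named steps, and a completed step's result is reused when the run resumes. The sketch below imitates that idea with an in-memory runner so the behavior can be seen locally. `StepLike` mirrors the `step.do(name, callback)` shape used by Cloudflare Workflows; `MemoStep`, `SyncDeps`, and `prestashopSync` are hypothetical and are not Cloudflare's implementation.

```typescript
// Shape of a workflow step runner: run a named unit of work once.
interface StepLike {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

// In-memory imitation of durable steps: results are stored by step name, so
// re-running the workflow skips steps that already completed.
class MemoStep implements StepLike {
  private completed: Map<string, unknown>;

  constructor(completed: Map<string, unknown>) {
    this.completed = completed;
  }

  async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.completed.has(name)) return this.completed.get(name) as T;
    const result = await fn();
    this.completed.set(name, result);
    return result;
  }
}

// Hypothetical dependencies of a Prestashop sync.
interface SyncDeps {
  fetchProducts(): Promise<string[]>;
  upsertProducts(ids: string[]): Promise<number>;
}

// Each stage is its own step, so a failure in the upsert does not repeat
// the fetch when the workflow resumes.
async function prestashopSync(step: StepLike, deps: SyncDeps): Promise<number> {
  const ids = await step.do("fetch products", () => deps.fetchProducts());
  return step.do("upsert products", () => deps.upsertProducts(ids));
}
```

Step callbacks should be idempotent where possible, since a step that fails partway through will be run again.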
3.3 Analytics Engine (Weeks 14-18)
**Effort**: 2-3 weeks | **Risk**: Low | **Impact**: High
- [ ] Define analytics datasets (POS, page views, API, agents)
- [ ] Instrument POS transactions to write data points
- [ ] Instrument page views
- [ ] Instrument agent actions
- [ ] Build SQL API query service
- [ ] Replace POS analytics SQL queries with AE queries
- [ ] Build custom analytics dashboard
**Success criteria**: Analytics queries don't hit PostgreSQL. Dashboard loads 10x faster.
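Instrumenting POS transactions amounts to mapping each transaction onto an Analytics Engine data point. The sketch below assumes a hypothetical `PosTransaction` shape; `DataPoint` and `DatasetLike` mirror the `writeDataPoint({ indexes, blobs, doubles })` call on an Analytics Engine dataset binding.

```typescript
// Shape of an Analytics Engine data point as used in this sketch.
interface DataPoint {
  indexes?: string[]; // sampling key; a single index is allowed
  blobs?: string[];   // string dimensions
  doubles?: number[]; // numeric measures
}

interface DatasetLike {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical POS transaction fields.
interface PosTransaction {
  tenantId: string;
  storeId: string;
  paymentMethod: string;
  totalCents: number;
  itemCount: number;
}

// Map a transaction to a data point. Field order matters: positions become
// column names when querying.
function toDataPoint(tx: PosTransaction): DataPoint {
  return {
    indexes: [tx.tenantId],
    blobs: [tx.storeId, tx.paymentMethod],
    doubles: [tx.totalCents / 100, tx.itemCount],
  };
}

function recordTransaction(dataset: DatasetLike, tx: PosTransaction): void {
  dataset.writeDataPoint(toDataPoint(tx));
}
```

With this layout, the SQL API would expose the fields as `index1`, `blob1`, `blob2`, `double1`, and `double2`, so keeping the mapping in one function helps the query service and the writers stay in agreement.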
Phase 3 Milestone
Infrastructure modernized:
- Cron jobs → Cloudflare Workflows (durable, crash-proof)
- Bull/Redis → CF Queues (managed)
- POS analytics → Analytics Engine (10x faster)
- Real-time → Durable Objects (cross-instance)
- Total CF cost: ~$30-50/month
- Upstash Redis can be downgraded/removed: -$10-30/month
Phase 4: Advanced (Weeks 19+)
**Goal**: Long-term strategic capabilities.
4.1 D1 Per-Tenant Edge Database (Weeks 19-22)
**Effort**: 3-4 weeks | **Risk**: High | **Impact**: High
- [ ] Design D1 schema for hot data (products, config, permissions)
- [ ] Build Neon → D1 sync pipeline
- [ ] Create per-tenant D1 databases for active tenants
- [ ] Add read path: Worker → D1 (with Neon fallback)
- [ ] Test Time Travel recovery
- [ ] Monitor sync lag and consistency
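The "Worker → D1 (with Neon fallback)" read path can be sketched as a function that tries the edge replica first and falls through to Neon on a miss or an error. `ProductRow`, `ProductLookup`, and `readProduct` are hypothetical; the two lookups would wrap the real D1 and Neon queries.

```typescript
// Hypothetical hot-data row.
interface ProductRow {
  id: string;
  name: string;
}

type ProductLookup = (tenantId: string, productId: string) => Promise<ProductRow | null>;

interface ReadResult {
  row: ProductRow | null;
  source: "d1" | "neon";
}

// Try the per-tenant D1 replica first. A miss (row not synced yet) or an
// error falls through to Neon, which remains the source of truth.
async function readProduct(
  fromD1: ProductLookup,
  fromNeon: ProductLookup,
  tenantId: string,
  productId: string,
): Promise<ReadResult> {
  try {
    const row = await fromD1(tenantId, productId);
    if (row !== null) return { row, source: "d1" };
  } catch {
    // D1 unavailable or query failed: fall back to Neon below.
  }
  const row = await fromNeon(tenantId, productId);
  return { row, source: "neon" };
}
```

Returning the `source` alongside the row gives a cheap signal for the monitoring step: a rising share of `"neon"` reads for rows that should be hot suggests sync lag.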
4.2 Data Platform Lakehouse (Weeks 22+)
\1: 4-6 weeks | \1: Medium | \1: Medium
- [ ] Set up Pipelines for event streaming
- [ ] Define Iceberg tables (orders, products, activity)
- [ ] Build ingestion from Workers
- [ ] Configure R2 SQL query engine
- [ ] Connect BI tool (Metabase/Superset)
- [ ] Build platform-wide analytics (cross-tenant)
4.3 Full Queue Migration (Weeks 24+)
- [ ] Migrate remaining Bull jobs to CF Queues
- [ ] Migrate all 3 cron jobs to Workflows
- [ ] Remove Upstash Redis dependency entirely
- [ ] Remove Bull package dependency
Decision Matrix
When to Start Each Phase
| Phase | Trigger | Dependencies |
|---|---|---|
| 1 (Foundation) | **Now** | CF account + Workers plan |
| 2 (Capabilities) | Phase 1 validated | R2 + KV working |
| 3 (Evolution) | Business need | Phase 1-2 stable |
| 4 (Advanced) | Scale demand | Phase 1-3 mature |
Risk/Reward Matrix
```
High Reward ─────────────────────────────────────┐
│                                                │
│   Hyperdrive ●          Workflows ●            │
│   R2 ●                  Queues ●               │
│   KV ●                  Analytics Engine ●     │
│                                                │
│   Vectorize ●           Durable Objects ●      │
│   Workers AI ●                                 │
│                                                │
│   D1 ●                  Data Platform ●        │
│                                                │
Low Reward ──────────────────────────────────────┘
    Low Risk ──────────────────────── High Risk
```
Environment Variables (New)
```bash
# apps/app/.env (additions)

# Cloudflare Account
CF_ACCOUNT_ID=
CF_API_TOKEN=

# R2 (S3-compatible)
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET_NAME=company-manager-media
R2_ENDPOINT=https://${CF_ACCOUNT_ID}.r2.cloudflarestorage.com

# KV
KV_NAMESPACE_ID=
KV_API_TOKEN=

# Hyperdrive
HYPERDRIVE_ID=

# Workers AI (for REST API access from Next.js)
CF_AI_API_TOKEN=

# Search Worker
SEARCH_WORKER_URL=https://search-worker.wd29.workers.dev

# Analytics Engine
AE_DATASET_TOKEN=
```
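Since most of these variables ship empty, a startup check that reports which ones are still unset can save debugging time. The sketch below is one way to do that; the choice of which variables count as required (`REQUIRED_ENV`) is an assumption for Phase 1, and `missingEnv` and `r2Endpoint` are hypothetical helpers.

```typescript
// Assumed minimum set for Phase 1 (R2 via the S3-compatible API).
const REQUIRED_ENV = [
  "CF_ACCOUNT_ID",
  "CF_API_TOKEN",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
] as const;

// Return the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Build the R2 endpoint explicitly, for loaders that do not expand
// ${CF_ACCOUNT_ID} inside .env values.
function r2Endpoint(accountId: string): string {
  return `https://${accountId}.r2.cloudflarestorage.com`;
}
```

At application startup this could be called as `missingEnv(process.env, REQUIRED_ENV)`, failing fast with the list of names if it is not empty.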
Monitoring & Observability
| Tool | What to Monitor |
|---|---|
| CF Dashboard → Workers | Request count, error rate, CPU time |
| CF Dashboard → KV | Hit rate, operations, storage |
| CF Dashboard → R2 | Storage, operations, egress (should be $0) |
| CF Dashboard → Queues | Queue depth, DLQ messages, consumer lag |
| CF Dashboard → D1 | Queries, storage, read/write ops |
| Neon Dashboard | Connection count (should decrease with Hyperdrive) |
| Application logs | Cache hit/miss ratios, sync status |
Rollback Plan
Each integration has an independent rollback path:
| Integration | Rollback |
|---|---|
| Hyperdrive | Remove binding, Workers use direct connection |
| R2 | Switch S3 client back to AWS endpoint |
| KV | Bypass cache, all reads go to DB |
| Vectorize | Disable semantic search, use keyword-only |
| Workers AI | Revert to OpenAI API calls |
| Queues | Switch back to Bull/Redis |
| Workflows | Re-enable cron scripts |
| D1 | All reads go to Neon (no edge replica) |
| Analytics Engine | Revert to PostgreSQL analytics queries |
No integration requires a hard cutover. All can run in parallel with the existing system during migration.