Quick Definition
Candidate Generation is the stage that retrieves and proposes a manageable set of candidate items for downstream ranking or processing. Analogy: a recruiter shortlists applicants before final interviews. Formally, it is a high-recall, low-latency retrieval layer that filters a very large universe down to a small set of candidates for scoring or action.
What is Candidate Generation?
Candidate Generation is the system or stage that produces a bounded set of potential items, entities, or actions from a very large corpus to be further ranked, scored, or processed. It emphasizes recall and throughput while keeping latency and cost controlled.
What it is NOT
- Not the final decision maker. It does not produce the final ranked result except in trivial systems.
- Not a full ranking model. It is typically lighter-weight and focused on coverage.
- Not a storage layer, though it often queries indexes or stores.
Key properties and constraints
- High recall within latency and cost constraints.
- Deterministic vs stochastic behaviors depending on business needs.
- Resource-efficient: uses indexes, precomputations, caches.
- Observable: telemetry must capture throughput, recall proxies, and failure rates.
- Secure and privacy-aware: must respect filters, access control, and PII constraints.
Where it fits in modern cloud/SRE workflows
- Deployed as a service or microservice (Kubernetes, serverless, or managed PaaS).
- Integrated with feature stores, vector indexes, search backends, and ranking pipelines.
- Part of CI/CD, observability, and incident response workflows.
- Often scaled separately from ranking due to different latency and compute profiles.
Text-only diagram description
- Input stream of user/context signals flows into Candidate Generation service.
- The service queries indexes and caches, runs lightweight models, applies filters, and returns ~N candidates to the Ranking service.
- The Ranking service scores candidates, applies business rules, and returns final results to the client.
- Telemetry and logs feed an observability plane; feature and item stores provide data; CI/CD pipelines manage deploys.
Candidate Generation in one sentence
Candidate Generation is the retrieval stage that selects a manageable set of potentially relevant items from a large corpus for downstream ranking and decisioning.
Candidate Generation vs related terms
| ID | Term | How it differs from Candidate Generation | Common confusion |
|---|---|---|---|
| T1 | Ranking | Focuses on ordering and scoring candidates | Often conflated with retrieval |
| T2 | Indexing | Prepares searchable structures for retrieval | Not the live decision process |
| T3 | Feature Store | Stores features used to score items | Not responsible for retrieval |
| T4 | Re-ranking | Lightweight ordering after main ranking | Mistaken as initial retrieval |
| T5 | Filtering | Removes items by hard rules | Filtering is a subset of candidate generation |
| T6 | Recall | Metric about coverage not a system | People call recall a service |
| T7 | Search | Broad user-facing retrieval experience | Candidate Generation can be internal |
| T8 | Recommendation | Full system including ranking and UX | Candidate Generation is one component |
| T9 | Exploration | Intentionally diverse sampling step | Confused with deterministic retrieval |
| T10 | Index Shard | Physical partition of index storage | Not the algorithmic layer |
Why does Candidate Generation matter?
Business impact
- Revenue: Better candidates increase conversion and engagement by improving the pool that ranking can optimize.
- Trust: Reduces irrelevant or unsafe suggestions, maintaining user trust.
- Risk: Poor candidate pools can amplify bias, privacy leaks, or regulatory non-compliance.
Engineering impact
- Incident reduction: Efficient retrieval reduces downstream overload and cascading failures.
- Velocity: Decouples retrieval from ranking so teams can iterate independently.
- Cost control: Appropriately tuned candidate generation limits expensive scoring work.
SRE framing
- SLIs/SLOs: Candidate availability, candidate latency, candidate recall proxy.
- Error budgets: Candidate generation failures can quickly burn budgets due to user-visible degradation.
- Toil: Manual re-tuning of heuristics is toil unless automated.
- On-call: Pager for severe failures like index unavailability or high-latency retrievals.
3–5 realistic “what breaks in production” examples
- Index corruption or mis-sharding causing near-zero candidates returned.
- Cache stampede during a promotional spike causing high latencies and downstream timeouts.
- Model or feature change reduces recall leading to lower revenue.
- Unintended filter or permission change hides relevant content, causing user complaints.
- Cost runaway when candidate set sizes are too large, increasing compute for ranking.
Where is Candidate Generation used?
| ID | Layer-Area | How Candidate Generation appears | Typical telemetry | Common tools |
|---|---|---|---|---|
| L1 | Edge-Network | Gateway-level routing to candidate services | Request latency and qps | Envoy Istio |
| L2 | Service-API | Microservice endpoint returning candidates | Latency errors throughput | gRPC HTTP servers |
| L3 | Application | Component in app logic creating candidate lists | Candidate count histogram | Application logs |
| L4 | Data-Infrastructure | Indexing and vector stores used by retrieval | Index size update rate | VectorDB search engine |
| L5 | Cloud-Infrastructure | Autoscaling groups and caches supporting retrieval | CPU memory autoscale events | K8s Autoscaler |
| L6 | Kubernetes | Pods running retrieval components | Pod restarts and OOMs | Kubelet Prometheus |
| L7 | Serverless-PaaS | On-demand candidate functions | Cold starts and invocation count | Managed functions |
| L8 | CI-CD | Tests and deploys for candidate generation code | Test pass rate deploy latency | CI pipelines |
| L9 | Observability | Dashboards and traces for retrieval | Trace spans and error rates | Tracing APM |
| L10 | Security | ACLs and filters applied during retrieval | Auth failures and access logs | IAM WAF |
When should you use Candidate Generation?
When it’s necessary
- Corpus size is large enough that scoring all items is infeasible.
- Low latency requirement prevents exhaustive evaluation.
- You need to decouple retrieval from ranking for scaling or team autonomy.
- You require pre-filtering for compliance and safety.
When it’s optional
- Small datasets where exhaustive scoring is cheap.
- Prototypes or experiments where getting a working pipeline quickly matters.
When NOT to use / overuse it
- Treating candidate generation as monolithic business logic can cause rigidity.
- Over-shortlisting may reduce diversity and introduce bias.
- Over-engineering retrieval early in product lifecycle wastes effort.
Decision checklist
- If the corpus exceeds ~10k items within a ~100ms latency budget -> use Candidate Generation.
- If recall is critical and the scoring budget is limited -> use multi-stage retrieval.
- If simple deterministic rules suffice and traffic is low -> consider ranking directly.
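As a rough illustration, the checklist can be encoded as a toy decision helper; the thresholds and field names here are illustrative assumptions, not normative values:

```python
from dataclasses import dataclass

@dataclass
class RetrievalContext:
    corpus_size: int           # number of items in the candidate universe
    latency_budget_ms: float   # end-to-end budget for retrieval plus ranking
    scoring_cost_limited: bool # whether exhaustive scoring would blow the budget

def needs_candidate_generation(ctx: RetrievalContext) -> bool:
    """Rough decision rule: add a dedicated retrieval stage when exhaustive
    scoring cannot fit the latency or cost budget."""
    if ctx.corpus_size > 10_000 and ctx.latency_budget_ms <= 100:
        return True
    return ctx.scoring_cost_limited

# Large corpus with a tight budget: use a retrieval stage.
print(needs_candidate_generation(RetrievalContext(1_000_000, 100, False)))  # True
```

In practice the thresholds come from measured scoring cost per item, not fixed constants.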
Maturity ladder
- Beginner: Rule-based retrieval + cache; static indexes; manual metrics.
- Intermediate: Hybrid retrieval with simple learned recall models; CI/CD for indexes; basic telemetry.
- Advanced: Learned retrieval (dense vectors + ANN), feature store integration, automated retraining, A/B, and safety checks.
How does Candidate Generation work?
Step-by-step
- Input collection: receive user context, item metadata, and system signals.
- Pre-filters: apply permission, locality, and safety filters.
- Query construction: translate intent into index queries or model inputs.
- Retrieval: call inverted indexes, ANN/vector DBs, or lightweight models.
- Merge and dedupe: combine multiple candidate sources and remove duplicates.
- Shortlist: limit to N candidates using scoring heuristics or thresholds.
- Post-filters: apply business rules and privacy checks.
- Emit: return candidates to ranking service; log telemetry and traces.
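The steps above can be sketched in miniature; the source callables, the (id, score, category) tuple shape, and the context dict are all illustrative assumptions:

```python
# Minimal sketch of the pipeline stages; names and data shapes are illustrative.
def generate_candidates(context, sources, max_candidates=50):
    allowed = context.get("allowed_categories", set())

    # Retrieval: query each source (index, ANN store, cache) for raw hits.
    raw = []
    for source in sources:
        raw.extend(source(context))

    # Merge, filter, and dedupe: keep the best score per item id.
    best = {}
    for item_id, score, category in raw:
        if allowed and category not in allowed:
            continue  # permission/safety filter
        if item_id not in best or score > best[item_id][1]:
            best[item_id] = (item_id, score, category)

    # Shortlist: cap to N candidates by heuristic score, then emit.
    ranked = sorted(best.values(), key=lambda c: c[1], reverse=True)
    return ranked[:max_candidates]

# Two toy sources returning (item_id, score, category) tuples.
hot_cache = lambda ctx: [("a", 0.9, "news"), ("b", 0.7, "sports")]
ann_index = lambda ctx: [("a", 0.8, "news"), ("c", 0.6, "news")]

shortlist = generate_candidates(
    {"allowed_categories": {"news"}}, [hot_cache, ann_index], max_candidates=2
)
print([c[0] for c in shortlist])  # ['a', 'c']
```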
Components and workflow
- Frontend API receives request.
- Candidate service orchestrates sub-retrievals.
- Feature metadata and stores enrich candidates.
- Cache layer returns hot lists.
- Indexes and vector stores provide retrieval.
- Observability captures metrics, logs, spans, and samples.
Data flow and lifecycle
- Items are ingested into indexes or stored raw.
- Index updates run via batch or streaming pipelines.
- Real-time signals augment context during query time.
- Candidate sets are ephemeral, traced per request, and stored short-term for diagnostics.
Edge cases and failure modes
- Cold start when features not available for new items.
- Stale indexes due to lag in ingestion pipeline.
- Partial outages of index shards causing uneven coverage.
- Permission mismatch hiding valid content.
Typical architecture patterns for Candidate Generation
- Multi-source hybrid retrieval – When: heterogeneous item types and features. – Use: combine lexical, collaborative, and content-based sources.
- ANN vector retrieval with re-rank – When: semantic search and embeddings needed. – Use: vector DB as first stage, ranking model downstream.
- Heuristic + cache pattern – When: highly repeatable queries and predictable hot items. – Use: precomputed shortlists cached at edge.
- Streaming incremental index updates – When: low-latency freshness is required. – Use: near-real-time ingestion with log-based replication.
- Feature-aware retrieval – When: retrieval must honor complex feature constraints. – Use: retrieval using feature-store-enriched keys or tags.
- Ensemble retrieval pipeline – When: maximize recall by combining many retrieval strategies. – Use: weighting and probabilistic merging then dedupe.
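The ANN-vector-retrieval pattern can be illustrated with a brute-force cosine scan standing in for a real ANN index (HNSW, IVF, and so on); this is a sketch for intuition, not a production retriever:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def vector_retrieve(query_vec, item_vecs, k=2):
    """First-stage semantic retrieval: in production an ANN index replaces
    this exhaustive scan to hit the latency budget at scale."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]

items = {"doc1": [1.0, 0.0], "doc2": [0.9, 0.1], "doc3": [0.0, 1.0]}
print([i for i, _ in vector_retrieve([1.0, 0.0], items, k=2)])  # ['doc1', 'doc2']
```

The downstream ranking model then re-scores this shortlist with richer features.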
Failure modes & mitigation
| ID | Failure mode | Symptom | Likely cause | Mitigation | Observability signal |
|---|---|---|---|---|---|
| F1 | Low recall | Too few relevant candidates | Index shard missing | Auto-rebuild shard and fallback | Candidate count drop |
| F2 | High latency | Slow responses | Cache miss storm | Add cache warming and rate limit | P95 latency increase |
| F3 | Partial outage | Degraded results for subset | Service dependency down | Circuit breaker and degrade gracefully | Error rate in traces |
| F4 | Stale data | Old items returned | Ingestion lag | Backfill and stream sync | Last ingest timestamp |
| F5 | Cost runaway | Excessive downstream compute | Candidate set too large | Enforce caps and sampling | Increased ranking CPU |
| F6 | Duplicates | Repeated candidates | Merge logic bug | Improve dedupe and keys | Duplicate count metric |
| F7 | Permission leak | Unauthorized items seen | ACL misconfiguration | Tighten policy and tests | Auth failure logs |
| F8 | Bias drift | Trending irrelevant items | Model drift or data shift | Retrain and apply guardrails | CTR or engagement shift |
| F9 | Cold start failure | No candidates for new item | Feature not materialized | Fallback heuristics | New-item miss rate |
| F10 | Data corruption | Invalid items returned | Bad ingest pipeline | Rollback and validate | Validation error logs |
Key Concepts, Keywords & Terminology for Candidate Generation
- Candidate — An item proposed for downstream scoring.
- Shortlist — The bounded list of candidates returned.
- Recall — Fraction of relevant items retrieved.
- Precision — Fraction of retrieved items that are relevant.
- Candidate Pool — Universe of possible items considered.
- Index — Data structure enabling fast retrieval.
- Inverted Index — Term to doc mapping used in search.
- ANN — Approximate Nearest Neighbor for vector retrieval.
- Vector Embedding — Numeric representation for semantic similarity.
- Feature Store — Centralized feature repository for serving.
- Cold Start — Lack of features for new item or user.
- Cache Stampede — Concurrent misses causing overload.
- Dedupe — Removal of duplicate candidates.
- Sharding — Partitioning index storage across nodes.
- Shard Rebalance — Moving shards across nodes.
- Bloom Filter — Probabilistic membership test to reduce lookups.
- Pre-filter — Hard rule applied before retrieval.
- Post-filter — Hard rule applied after retrieval.
- Heuristic — Rule-based shortlisting logic.
- Re-rank — Secondary ordering after initial ranking.
- Multi-stage Retrieval — Several sequential retrieval steps.
- Ensemble — Combination of multiple retrieval strategies.
- Telemetry — Metrics and logs emitted by service.
- Trace Span — Distributed trace unit for requests.
- Latency SLO — Target for response time.
- Error Budget — Allowance for errors during SLO period.
- Circuit Breaker — Safety mechanism to avoid cascading failures.
- Feature Drift — Changes in feature distribution over time.
- Model Drift — Performance degradation due to data changes.
- Sampling — Taking subset to reduce compute or bias.
- Cold Partition — Rarely accessed shard with cold caches.
- Warmup — Pre-loading caches or shortlists.
- Backfill — Recomputing features or indexes for lagging items.
- ACL — Access control list governing visibility.
- Fairness Guardrail — Constraint to prevent biased suggestions.
- Explainability — Ability to trace why a candidate was chosen.
- Observability — Ability to understand system health and behavior.
- Canary — Gradual deployment to a subset of traffic.
- Runbook — Step-by-step operational play for incidents.
- Ingestion Pipeline — Flow that adds or updates items in index.
- Feature Hydration — Enriching candidates with runtime features.
How to Measure Candidate Generation (Metrics, SLIs, SLOs)
| ID | Metric-SLI | What it tells you | How to measure | Starting target | Gotchas |
|---|---|---|---|---|---|
| M1 | Candidate availability | Whether candidates are returned | Percent requests with >=1 candidate | 99.9% | Edge-case queries may be valid zero |
| M2 | Candidate count | Size of shortlist per request | Histogram of candidate counts | 5–50, use-case dependent | Large variance hides issues |
| M3 | Retrieval latency | Time to get candidates | P50 P95 P99 request time | P95 < 50-150ms | Dependent on network and cache |
| M4 | Recall proxy | Proxy for true recall | Holdout tests or shadow experiments | See details below: M4 | Requires labeled relevance |
| M5 | Up-to-dateness | Freshness of candidate pool | Time since last index ingest | <1 min to hours, varies by use case | Data pipeline lag affects this |
| M6 | Error rate | Failures in retrieval service | 5xx or RPC errors percent | <0.1% | Transient errors can spike |
| M7 | Cache hit rate | Cache effectiveness | Ratio hits to total lookups | >80% for hot queries | Over-eviction harms performance |
| M8 | Duplicate rate | How many duplicates emitted | Percent candidates deduped | <1-2% | Merge bugs can hide duplicates |
| M9 | Cost per request | Downstream compute cost | CPU mem and request compute cost | Budget-based target | Requires infra cost mapping |
| M10 | Permission failures | Unauthorized candidate attempts | Auth failure count | 0 tolerated | Misconfigurations cause user impact |
Row Details
- M4: Use offline labeled data or canary band tests; compare candidate recall before ranking; instrument shadow ranking where ranking runs but not exposed.
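A recall proxy like M4 reduces to a recall@k computation over labeled data; a minimal sketch, assuming offline relevance labels are available:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Recall proxy: fraction of labeled-relevant items that appear in the
    top-k candidate set. Returns 0.0 when there are no labeled items."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# 2 of 3 labeled-relevant items appear in the top-4 shortlist.
print(round(recall_at_k(["a", "b", "c", "d"], {"a", "c", "z"}, k=4), 2))  # 0.67
```

In shadow experiments, `retrieved_ids` comes from the candidate stage under test and `relevant_ids` from holdout labels or the items the exposed system ultimately surfaced.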
Best tools to measure Candidate Generation
Tool — Prometheus
- What it measures for Candidate Generation: latency, counters, histograms, uptime.
- Best-fit environment: Kubernetes and cloud-native stacks.
- Setup outline:
- Export metrics from candidate service.
- Use histograms for latency and gauges for counts.
- Configure Prometheus scrape and retention.
- Strengths:
- Lightweight and highly adoptable.
- Native for cloud-native observability.
- Limitations:
- Long-term storage and analytics require extension.
- Not ideal for detailed trace-level analysis.
Tool — OpenTelemetry + Tracing Backend
- What it measures for Candidate Generation: distributed traces and spans per retrieval call.
- Best-fit environment: microservice architectures.
- Setup outline:
- Instrument candidate and index calls.
- Ensure context propagation.
- Collect spans and push to backend.
- Strengths:
- Deep request-level visibility.
- Correlates latency sources.
- Limitations:
- Sampling needed to limit costs.
- Requires careful instrumentation.
Tool — Vector DB (built-in telemetry)
- What it measures for Candidate Generation: query latency, index stats, recall heuristics.
- Best-fit environment: ANN-based retrieval.
- Setup outline:
- Enable query logging and index metrics.
- Export metrics via exporter.
- Strengths:
- Domain-specific telemetry for ANN.
- Helps tune recall-latency tradeoffs.
- Limitations:
- Metrics are vendor-specific and vary across products.
Tool — APM (Application Performance Monitoring)
- What it measures for Candidate Generation: service-level performance, database calls, slow queries.
- Best-fit environment: enterprise apps needing full-stack observability.
- Setup outline:
- Integrate agent in service.
- Define key transactions.
- Create dashboards for candidate flows.
- Strengths:
- End-to-end visibility and alerts.
- Easy adoption for teams.
- Limitations:
- Cost and sampling policies can be limiting.
Tool — Experimentation Platform / Feature Flagging
- What it measures for Candidate Generation: lift from retrieval changes via A/B tests.
- Best-fit environment: product teams validating candidate changes.
- Setup outline:
- Create experiment cohorts.
- Shadow serve alternative candidate sets.
- Measure engagement and downstream metrics.
- Strengths:
- Safe validation before full rollout.
- Controlled impact measurement.
- Limitations:
- Requires careful metric design and traffic allocation.
Recommended dashboards & alerts for Candidate Generation
Executive dashboard
- Panels:
- Overall availability and candidate availability trend.
- Top-line engagement uplift by candidate change A/B.
- Cost per request and trend.
- Why: high-level view for business and product stakeholders.
On-call dashboard
- Panels:
- P95/P99 latency for retrieval endpoints.
- Error rate and recent traces.
- Candidate count histogram and cache hit rate.
- Index health and last ingest timestamp.
- Why: actionable live signals for operators to triage.
Debug dashboard
- Panels:
- Recent sample traces with span breakdown.
- Top slow queries and expensive retrievals.
- Recent candidate lists sampled for debugging.
- Dedupe and permission failure logs.
- Why: deep troubleshooting for engineers.
Alerting guidance
- Page vs ticket:
- Page for candidate availability below threshold or P99 latency spike causing downstream timeouts.
- Ticket for moderate error-rate increases, slowdowns in ingestion, or cache hit rate degradation.
- Burn-rate guidance:
- If candidate availability drops causing user-visible failures, burn at high rate and escalate immediately.
- Noise reduction tactics:
- Deduplicate alerts by service and error type.
- Group alerts by shard or index to avoid noisy paging.
- Use suppression windows for known maintenance periods.
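The burn-rate math behind the guidance above can be sketched as follows; the 99.9% target is the example SLO from this document, not a universal default:

```python
def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Burn rate: how many times faster than 'sustainable' the error budget
    is being consumed. 1.0 means the budget lasts exactly the SLO window;
    values much greater than 1 mean page soon."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("SLO target must be below 100%")
    return observed_error_ratio / budget

# With a 99.9% candidate-availability SLO, a 1% failure ratio
# burns the budget 10x faster than sustainable.
print(round(burn_rate(0.01, 0.999), 2))  # 10.0
```

Multi-window alerting typically evaluates this ratio over a short window (fast burn, page) and a long window (slow burn, ticket).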
Implementation Guide (Step-by-step)
1) Prerequisites
- Defined SLIs for candidate availability and latency.
- Access to indexes/vector DB, or capacity to provide retrieval engines.
- Feature store or metadata store accessible.
- CI/CD for safe deploys and canary capability.
- Observability stack for metrics, logs, and traces.
2) Instrumentation plan
- Instrument candidate counts, latency, error rates, and cache metrics.
- Add spans around index calls and merges.
- Emit a request context ID for tracing across services.
3) Data collection
- Build an ingestion pipeline for index updates with schema validation.
- Implement change data capture for low-latency freshness.
- Provide batch backfills for existing items.
4) SLO design
- Choose SLIs and initial SLOs (e.g., candidate availability 99.9%, P95 latency < 150ms).
- Define error budget and escalation policies.
5) Dashboards
- Create Executive, On-call, and Debug dashboards as above.
- Add historical comparisons for feature changes.
6) Alerts & routing
- Page on major outages and critical SLO breaches.
- Open tickets for degradations and resurfacing regressions.
- Route to the owning team with a runbook link.
7) Runbooks & automation
- Document recovery steps: failover to cached lists, rebuild index shard, re-route traffic.
- Automate common fixes: cache purge, index reload script, scaling trigger.
8) Validation (load/chaos/game days)
- Perform load tests simulating realistic query patterns.
- Run chaos tests: kill index nodes, throttle the network, simulate cache failure.
- Run game days to exercise on-call and runbooks.
9) Continuous improvement
- Weekly review of candidate metrics and error budget.
- Automate retraining and A/B validation of retrieval models.
- Maintain a backlog of improvements prioritized by business impact.
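Step 4's candidate-availability SLI (metric M1) reduces to a simple ratio over per-request candidate counts; a minimal sketch:

```python
def candidate_availability(candidate_counts):
    """SLI: share of requests that returned at least one candidate.
    Feed this from the per-request candidate-count histogram.
    Treats an empty sample (no traffic) as healthy."""
    if not candidate_counts:
        return 1.0
    served = sum(1 for n in candidate_counts if n >= 1)
    return served / len(candidate_counts)

# 9 of 10 sampled requests returned candidates -> 0.9 availability.
print(candidate_availability([12, 30, 0, 5, 8, 22, 17, 3, 9, 40]))  # 0.9
```

Note the M1 gotcha from the metrics table: some queries legitimately return zero candidates, so exclude those from the denominator where the zero is expected.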
Pre-production checklist
- Unit and integration tests for retrieval logic.
- Load test against synthetic traffic that mimics production distribution.
- Canary release configuration and rollback plan.
- Baseline telemetry dashboards created.
Production readiness checklist
- SLOs set and alerts configured.
- Automatic scaling and health checks in place.
- Index recovery and backup procedures documented.
- Access control and privacy filters validated.
Incident checklist specific to Candidate Generation
- Assess user-visible impact and activate incident bridge.
- Check candidate availability, latency, and index health metrics.
- Switch to cached or degraded mode if available.
- Apply rollback or configuration fixes; engage index engineering for corruption.
- Postmortem and follow-up actions to prevent recurrence.
Use Cases of Candidate Generation
1) Personalized Feed – Context: Social platform serving millions. – Problem: Scanning all posts per user is infeasible. – Why helps: Produces relevant set for ranking to optimize engagement. – What to measure: Candidate availability, recall proxy, engagement lift. – Typical tools: Vector DB, feature store, cache layer.
2) Product Search – Context: E-commerce catalog of millions of SKUs. – Problem: Need sub-second search relevance across languages. – Why helps: Hybrid lexical and embedding retrieval increases recall fast. – What to measure: Retrieval latency, recall proxy, conversion rate. – Typical tools: Inverted index search engine, ANN, CDN cache.
3) Recommendations for New Users (Cold Start) – Context: New user with no history. – Problem: No personalized signals. – Why helps: Rule-based and popularity candidate generation provides reasonable defaults. – What to measure: New-user engagement and candidate diversity. – Typical tools: Heuristics, trending indices, category-based retrieval.
4) Fraud Detection Alert Candidates – Context: Transaction monitoring. – Problem: Must pick suspicious transactions to score more expensively. – Why helps: Filters bulk transactions and enables focused scoring. – What to measure: Candidate precision for true fraud and false positive rates. – Typical tools: Streaming filters, rules engine, feature store.
5) Ads Auction Preselection – Context: Real-time bidding needs eligible ad candidates. – Problem: Many advertisers but low latency selection needed. – Why helps: Pre-filters eligible ads and constraints to limit auction size. – What to measure: Eligibility match rate, latency, auction fill-rate. – Typical tools: In-memory eligibility stores, fast key-value stores.
6) Content Moderation Pipeline – Context: Automated content review. – Problem: Need to prioritize high-risk items for expensive human review. – Why helps: Shortlists items flagged by lightweight models for further scoring. – What to measure: Candidate hit rate and moderation accuracy. – Typical tools: Streaming classifiers, queues, human-in-loop tooling.
7) On-call Alert Triage – Context: Observability system triaging alerts. – Problem: Many alerts need prioritized attention. – Why helps: Candidate generation proposes likely incidents to route to responders. – What to measure: Precision of incident candidates, mean time to acknowledge. – Typical tools: Alert correlator, ML-based grouping.
8) API Rate Limiting Preselection – Context: DDoS or abusive clients. – Problem: Must preselect traffic flows for deeper inspection. – Why helps: Saves expensive checks by preselecting suspicious flows. – What to measure: True-positive inspection ratio and latency cost. – Typical tools: Edge filters, WAF.
Scenario Examples (Realistic, End-to-End)
Scenario #1 — Kubernetes: Real-time Recommendations at Scale
Context: Media platform serving personalized recommendations on Kubernetes.
Goal: Reduce end-to-end latency while preserving recall.
Why Candidate Generation matters here: Retrieval is the first stage; if it is slow or low-recall, ranking and UX suffer.
Architecture / workflow: Candidate service deployed as a k8s Deployment; vector DB as a StatefulSet; Redis cache; separate ranking service.
Step-by-step implementation:
- Build hybrid retrieval combining ANN and cached hot lists.
- Instrument Prometheus and OpenTelemetry.
- Add canary rollout via k8s and Istio traffic shifting.
- Implement a circuit breaker to fall back to cached lists.
What to measure: P95 retrieval latency, candidate availability, cache hit rate, recall proxy.
Tools to use and why: K8s for orchestration; Redis for caching; vector DB for ANN; Prometheus plus tracing.
Common pitfalls: Pod OOMs during spikes; cold shards after rollouts.
Validation: Load test with a production-like distribution and a chaos test killing an index pod.
Outcome: Stable P95 latency, reduced downstream CPU, improved engagement.
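The circuit-breaker fallback in the steps above might look like the following sketch; a production breaker would add half-open probing and time-based reset, which are omitted here:

```python
class CircuitBreaker:
    """Minimal failure-counting breaker: after `threshold` consecutive
    failures, skip the primary retriever and serve the cached fallback."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, primary, fallback):
        if self.failures >= self.threshold:
            return fallback()   # breaker open: degrade gracefully
        try:
            result = primary()
            self.failures = 0   # success resets the counter
            return result
        except Exception:
            self.failures += 1
            return fallback()

def flaky_retriever():
    raise TimeoutError("index shard unavailable")

cached_retriever = lambda: ["cached-1", "cached-2"]

breaker = CircuitBreaker(threshold=2)
for _ in range(3):
    result = breaker.call(flaky_retriever, cached_retriever)
print(result)  # ['cached-1', 'cached-2']
```

After two failures the breaker opens, so the third call never touches the failing index at all.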
Scenario #2 — Serverless/Managed-PaaS: Shortlists for Instant Apps
Context: News app using serverless functions for on-demand retrieval.
Goal: Keep cold starts low and provide fresh candidates.
Why Candidate Generation matters here: Serverless requires small, fast retrieval or cached precomputed shortlists.
Architecture / workflow: Edge CDN queries a serverless function, which queries a managed vector DB and cached lists.
Step-by-step implementation:
- Precompute hot shortlists into CDN edge cache.
- Use tiny serverless function for personalization signals and merging.
- Log traces to managed APM.
What to measure: Cold start rate, P95 latency, candidate freshness.
Tools to use and why: Managed function platform, CDN caching, vector DB.
Common pitfalls: Cache TTL misconfiguration causing staleness; cold starts causing latency spikes.
Validation: Synthetic load with bursts and measurement of cold starts.
Outcome: Lower P95 latency and cost due to cached precomputation.
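The precomputed-shortlist idea can be sketched as a TTL cache; real CDN or edge caches handle eviction and replication for you, so this is only for intuition (the TTL value is an illustrative assumption):

```python
import time

class TTLShortlistCache:
    """Precomputed shortlists with a freshness TTL. A stale or missing
    entry returns None, signalling the caller to recompute/fetch."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, shortlist)

    def put(self, key, shortlist, now=None):
        now = time.time() if now is None else now
        self._store[key] = (now + self.ttl, shortlist)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] < now:
            return None  # miss or stale: fall through to the function/DB
        return entry[1]

cache = TTLShortlistCache(ttl_seconds=60)
cache.put("front-page", ["a", "b"], now=0)
print(cache.get("front-page", now=30))   # ['a', 'b']  (fresh)
print(cache.get("front-page", now=120))  # None        (stale: recompute)
```

The TTL misconfiguration pitfall above is exactly this boundary: a TTL set too long serves stale shortlists, too short causes misses and cold starts.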
Scenario #3 — Incident-response/Postmortem: Missing Candidates in Production
Context: Sudden drop in click-throughs.
Goal: Diagnose whether a retrieval failure caused the drop.
Why Candidate Generation matters here: Retrieval failures shrink the candidate set, preventing ranking from showing relevant items.
Architecture / workflow: Candidate service logs, index health metrics, shadow ranking telemetry.
Step-by-step implementation:
- Triage: check candidate availability metric.
- Pull recent traces and index ingestion timestamps.
- Run backfill and restore index shard.
- Roll back the recent index change if implicated.
What to measure: Candidate availability, last ingest timestamp, recall proxy.
Tools to use and why: Tracing backend, index monitoring, CI rollback.
Common pitfalls: Lack of shadow testing made the cause unclear.
Validation: Re-run traffic in staging and validate candidate counts.
Outcome: Restored candidate returns; postmortem recommends guardrail tests.
Scenario #4 — Cost/Performance Trade-off: Large Candidate Pools vs Compute Cost
Context: Enterprise search with costly ranking.
Goal: Balance recall with ranking compute and cost.
Why Candidate Generation matters here: Candidate pool size directly affects ranking cost.
Architecture / workflow: Multi-stage retrieval with sampling and adaptive caps.
Step-by-step implementation:
- Measure cost per candidate scored.
- Implement dynamic cap: increase N for high-value queries.
- Use stratified sampling for low-value queries.
What to measure: Cost per request, conversion per candidate, candidate count.
Tools to use and why: Cost analytics, retrieval telemetry, experimentation platform.
Common pitfalls: A static cap hurting edge-user experiences.
Validation: A/B test performance and cost trade-offs.
Outcome: Reduced cost with minimal impact on conversions.
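A dynamic cap as described might be sketched like this; the `query_value` signal and the 20/200 bounds are illustrative assumptions, not recommended values:

```python
def candidate_cap(query_value: float, base_n: int = 20, max_n: int = 200) -> int:
    """Adaptive shortlist cap: spend more ranking compute on high-value
    queries. `query_value` in [0, 1] is an assumed upstream estimate
    (e.g., predicted revenue or engagement); out-of-range values clamp."""
    value = min(max(query_value, 0.0), 1.0)
    return base_n + int(value * (max_n - base_n))

print(candidate_cap(0.0))  # 20   low-value query: small, cheap shortlist
print(candidate_cap(1.0))  # 200  high-value query: maximize recall
```

Tuning comes from the A/B tests above: increase the bounds only where conversion per extra candidate still exceeds the marginal ranking cost.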
Common Mistakes, Anti-patterns, and Troubleshooting
Each item follows Symptom -> Root cause -> Fix.
- Symptom: Zero candidates returned for many users -> Root cause: Index shard offline -> Fix: Circuit-breaker to fallback caches and auto-rebuild shard.
- Symptom: Sudden latency spike -> Root cause: Cache stampede on warmup -> Fix: Add cache warming and request coalescing.
- Symptom: High duplicate candidates -> Root cause: Merge key bug -> Fix: Normalize keys and improve dedupe logic.
- Symptom: Low recall after deploy -> Root cause: Model or feature change -> Fix: Rollback and run shadow experiments.
- Symptom: Permission-protected items visible -> Root cause: ACL misconfiguration -> Fix: Tighten ACLs and add tests.
- Symptom: Rising cost from ranking -> Root cause: Candidate set size too large -> Fix: Implement caps and sampling.
- Symptom: Stale content displayed -> Root cause: Ingestion lag -> Fix: Monitor last ingest timestamp and expedite pipeline.
- Symptom: Noisy alerts -> Root cause: Improper alert thresholds -> Fix: Tune thresholds and use grouping/suppression.
- Symptom: Cannot reproduce bug in staging -> Root cause: Data drift and environment mismatch -> Fix: Use production-like snapshot testing.
- Symptom: Poor new-user experience -> Root cause: Cold-start logic missing -> Fix: Add popularity and category-based shortlists.
- Symptom: Frequent OOMs on pods -> Root cause: Unbounded retrieval results -> Fix: Enforce caps and streaming limits.
- Symptom: Inconsistent A/B results -> Root cause: Improper randomization or telemetry gaps -> Fix: Ensure consistent assignment and instrumentation.
- Symptom: Low cache hit rate -> Root cause: High cardinality keys or TTL too low -> Fix: Re-key caching and adjust TTL policy.
- Symptom: Biased candidate lists -> Root cause: Training data bias -> Fix: Introduce fairness guardrails and sampling.
- Symptom: Slow investigation -> Root cause: Missing traces and contextual logs -> Fix: Add tracing and request sampling.
- Symptom: Regressions after index rebuild -> Root cause: Missing validation tests -> Fix: Pre-deploy checks and canary validation.
- Symptom: Incomplete metrics -> Root cause: Lack of instrumentation for candidate counts -> Fix: Add counters and histograms.
- Symptom: Page floods during promotion -> Root cause: Hot lists not prepared -> Fix: Precompute and cache hot lists.
- Symptom: Alert storms due to correlated failures -> Root cause: Non-suppressed downstream errors -> Fix: Grouping by root cause and cascade suppression.
- Symptom: Unauthorized access attempts -> Root cause: Weak auth at service boundary -> Fix: Harden auth and audit logs.
- Symptom: High false positives in fraud candidates -> Root cause: Over-aggressive heuristics -> Fix: Tune thresholds and add human review.
- Symptom: Long rollback time -> Root cause: Manual rollback process -> Fix: Automate rollback and maintain immutable deploys.
- Symptom: Feature drift unnoticed -> Root cause: No drift detection -> Fix: Add feature distribution monitoring.
- Symptom: Debugging needs sample candidates -> Root cause: No sampled candidate logs -> Fix: Implement sampled candidate logging with privacy filters.
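Several fixes above mention request coalescing against cache stampedes; a minimal single-flight sketch (error propagation to waiting followers is omitted):

```python
import threading
import time

class SingleFlight:
    """Request coalescing: concurrent misses for the same key share one
    backend call instead of stampeding the backend."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fn):
        with self._lock:
            if key in self._inflight:
                event, holder = self._inflight[key]
                leader = False
            else:
                event, holder = threading.Event(), {}
                self._inflight[key] = (event, holder)
                leader = True
        if leader:
            holder["value"] = fn()      # only the leader hits the backend
            with self._lock:
                del self._inflight[key]
            event.set()                 # release any waiting followers
        else:
            event.wait()
        return holder["value"]

backend_calls = []
def rebuild_shortlist():
    backend_calls.append(1)
    time.sleep(0.1)                     # simulate a slow backend rebuild
    return ["candidate-1"]

sf = SingleFlight()
barrier = threading.Barrier(5)
results = []
def worker():
    barrier.wait()                      # release all threads at once: a stampede
    results.append(sf.do("hot-key", rebuild_shortlist))

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(backend_calls), results[0])
```

All five concurrent requests get the same shortlist, while the backend is (in the common case) hit exactly once.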
Observability pitfalls
- Missing traces for index calls.
- No candidate count metric.
- Lack of freshness timestamps.
- Sparse sampling causing blind spots.
- Aggregating metrics hiding per-shard problems.
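The last pitfall, aggregates hiding per-shard problems, is easy to demonstrate. A minimal pure-Python sketch of a per-shard latency recorder (the class and its simple nearest-rank percentile are illustrative, not a real metrics library): one cold shard serving a small share of traffic can show a 500 ms P95 while the global P95 still looks healthy.

```python
from collections import defaultdict

class ShardLatencyRecorder:
    """Illustrative sketch: record retrieval latencies per shard so dashboards
    can surface a single slow shard that a global aggregate would hide."""

    def __init__(self):
        self.samples = defaultdict(list)

    def observe(self, shard: str, latency_ms: float) -> None:
        self.samples[shard].append(latency_ms)

    @staticmethod
    def _p95(data):
        # Nearest-rank percentile; fine for a sketch, use your metrics
        # backend's histogram quantiles in production.
        data = sorted(data)
        idx = max(0, int(round(0.95 * len(data))) - 1)
        return data[idx]

    def p95(self, shard: str) -> float:
        return self._p95(self.samples[shard])

    def global_p95(self) -> float:
        return self._p95([x for xs in self.samples.values() for x in xs])
```

With nine shards at 10 ms and one shard at 500 ms taking about 2% of traffic, `global_p95()` reports 10 ms while `p95("s9")` reports 500 ms, which is exactly the blind spot the pitfall describes.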
Best Practices & Operating Model
Ownership and on-call
- Retrieval team owns candidate generation service, index health, and runbooks.
- On-call includes index engineers and service owners.
- Cross-team ownership for features touching ranking or indexing.
Runbooks vs playbooks
- Runbook: deterministic steps to recover a known failure (index rebuild, cache fallback).
- Playbook: higher-level guidance for novel incidents (diagnostic checklist, escalation path).
Safe deployments (canary/rollback)
- Canary on small traffic slices; validate candidate recall proxies and downstream metrics before full rollout.
- Automated rollback on SLO breaches.
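The rollback decision can be a pure function over canary and baseline metric windows. A hedged sketch, assuming the three guardrail metrics named above (recall proxy, candidate availability, P99 latency); the threshold values are hypothetical and should come from your actual SLOs:

```python
from dataclasses import dataclass

@dataclass
class MetricWindow:
    recall_proxy: float     # e.g. overlap@K with a trusted reference retrieval
    availability: float     # fraction of requests returning >= min candidates
    p99_latency_ms: float

def should_rollback(canary: MetricWindow, baseline: MetricWindow,
                    max_recall_drop: float = 0.02,        # hypothetical SLO
                    min_availability: float = 0.999,      # hypothetical SLO
                    max_latency_regression: float = 1.2   # hypothetical SLO
                    ) -> bool:
    """Return True if the canary breaches any guardrail vs. the baseline."""
    if canary.recall_proxy < baseline.recall_proxy - max_recall_drop:
        return True
    if canary.availability < min_availability:
        return True
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_regression:
        return True
    return False
```

Keeping the decision deterministic and side-effect-free makes it trivially testable in CI, which is what turns "automated rollback" from a slogan into a runbook step.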
Toil reduction and automation
- Automate index rebuilds, cache warmups, and candidate validation tests.
- Auto-scale based on query patterns and shard load.
Security basics
- Enforce ACLs and privacy filters at candidate generation.
- Audit access logs and mask PII in telemetry.
- Validate that fallbacks do not leak restricted content.
Weekly, monthly, and quarterly routines
- Weekly: Review candidate availability, error budget consumption, top slow queries.
- Monthly: Validate index freshness, run drift detection, and review A/B experiments impacting retrieval.
- Quarterly: Large model retrain and fairness audits.
What to review in postmortems related to Candidate Generation
- Timeline of candidate availability and latency.
- Changes to indexes or feature pipelines just before incident.
- Observability gaps that delayed detection.
- Remediation steps to prevent recurrence.
- Action items to add tests or automation.
Tooling & Integration Map for Candidate Generation
| ID | Category | What it does | Key integrations | Notes |
|---|---|---|---|---|
| I1 | Vector DB | ANN retrieval and similarity search | Feature store, ranking systems | See details below: I1 |
| I2 | Search Engine | Lexical and inverted index retrieval | CDN and app servers | See details below: I2 |
| I3 | Cache | Stores shortlists and hot queries | Edge CDN and Redis clusters | See details below: I3 |
| I4 | Feature Store | Provides features for retrieval decisions | Ranking and training pipelines | See details below: I4 |
| I5 | Tracing | Distributed traces for requests | APM and log systems | See details below: I5 |
| I6 | Metrics | Metrics storage and alerting | Dashboards and alert manager | See details below: I6 |
| I7 | CI/CD | Deploys retrieval service and indexes | Git, infra pipelines | See details below: I7 |
| I8 | Experimentation | A/B testing of candidate changes | Analytics and rollout systems | See details below: I8 |
| I9 | IAM | Access control for candidate visibility | Auth systems and audit logs | See details below: I9 |
| I10 | Orchestration | Run services and scale components | Kubernetes, serverless platforms | See details below: I10 |
Row Details
- I1: Vector DB details — Provides ANN query latency metrics and index rebuild tools; integrates with embedding pipelines; supports HNSW and PQ algorithms.
- I2: Search Engine details — Handles lexical retrieval, provides tokenization and BM25 scoring; integrates with indexing pipelines and content ingestion.
- I3: Cache details — In-memory caches used for hot lists and query caching; supports TTLs and invalidation hooks; often deployed at edge.
- I4: Feature Store details — Stores online features for serving; offers feature hydration APIs; integrates with offline training pipelines.
- I5: Tracing details — Captures spans for candidate retrieval, index calls, and merge; helps identify P95 contributors.
- I6: Metrics details — Stores SLIs and SLOs; used for alerting on candidate availability and latency.
- I7: CI/CD details — Automates index schema migrations and service rollouts; supports canary tests and rollback.
- I8: Experimentation details — Routes traffic between candidate generation variants; gathers user metrics for evaluation.
- I9: IAM details — Enforces which items are visible to which users; critical for privacy and compliance.
- I10: Orchestration details — Hosts retrieval services, manages resource limits, and autoscaling behavior.
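When the Vector DB (I1) and Search Engine (I2) rows both feed one shortlist, a merge layer has to deduplicate and cap the combined results. A minimal sketch of one common strategy, round-robin interleaving; this is an illustrative choice, not the only merge policy (score-based fusion such as reciprocal rank fusion is another):

```python
from itertools import zip_longest
from typing import List, Set

def merge_candidates(lexical: List[str], vector: List[str],
                     cap: int) -> List[str]:
    """Round-robin merge of two retrieval sources with dedupe and a hard cap.
    Interleaving keeps some diversity from each source instead of letting
    one source dominate the head of the shortlist."""
    seen: Set[str] = set()
    merged: List[str] = []
    for pair in zip_longest(lexical, vector):
        for item in pair:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
                if len(merged) >= cap:
                    return merged
    return merged
```

The cap here is the same boundary the ranking service expects (~N candidates), so the merge layer is a natural place to enforce it once for all sources.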
Frequently Asked Questions (FAQs)
What is the difference between Candidate Generation and Ranking?
Candidate Generation returns a shortlist; ranking orders and scores that shortlist to produce final results.
How many candidates should I return?
It varies; 5–50 is a typical range, depending on downstream scoring cost and latency budgets.
Is Candidate Generation always machine learning based?
No. It can be rule-based, heuristic, or ML-driven depending on maturity and needs.
How do I measure recall without labeled data?
Use shadow experiments, holdout sets, or offline labeling pipelines to approximate recall.
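One concrete recall proxy: treat a slower, exhaustive retrieval (for example, brute-force nearest neighbours run offline or in shadow) as the reference set and measure how much of it the fast path recovers. A minimal sketch; the function name and the choice of exhaustive retrieval as reference are illustrative assumptions:

```python
from typing import List

def recall_proxy_at_k(fast_results: List[str],
                      reference_results: List[str], k: int) -> float:
    """Approximate recall@k with no human labels: the top-k of an exhaustive
    retrieval stands in for ground truth, and we measure the overlap the
    fast retrieval path achieves against it."""
    ref = set(reference_results[:k])
    if not ref:
        return 0.0
    hits = sum(1 for item in fast_results[:k] if item in ref)
    return hits / len(ref)
```

Averaged over a sampled query log, this gives a trend line you can alert on, even though it is an overlap proxy rather than true recall.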
What latency targets are realistic?
A P95 of 50–200 ms for retrieval is common; the right target depends on product SLAs and downstream allowances.
How do you avoid privacy leaks in candidate generation?
Apply ACLs early, redact PII from logs, and ensure policy tests in CI/CD.
When should I use ANN vector retrieval?
When semantic similarity is required and lexical methods are insufficient.
How do I test candidate generation at scale?
Use production-like load testing with sampled requests and synthetic data distributions.
How to handle new items with no features?
Fallback to popularity, category rules, and quick feature backfill.
Should candidate generation be monolithic?
No. Prefer decoupled components: precomputation/indexing, retrieval service, and merge layer.
How to balance recall and cost?
Measure marginal value of additional candidates and use adaptive caps and sampling to optimize.
When do I page on-call for a candidate generation issue?
Page when candidate availability drops below SLO or P99 latency causes user-visible failures.
How to avoid bias in candidate generation?
Monitor distributional metrics, add fairness constraints, and use diverse training data.
How often should indexes be rebuilt?
It depends on freshness needs: anywhere from seconds (streaming) to hours or daily (batch).
What telemetry is essential?
Candidate count, availability, latency percentiles, cache hit rate, and ingestion timestamps.
Can candidate generation be serverless?
Yes for modest workloads; prepare for cold start and scale limits.
How do I validate a retrieval model offline?
Use held-out labeled queries and measure recall proxies and downstream simulated ranking impact.
Conclusion
Candidate Generation is the crucial retrieval layer that shapes the universe of items considered by downstream ranking and business logic. Properly designed candidate generation improves performance, reduces costs, and safeguards product quality and compliance.
Next 7 days plan
- Day 1: Instrument candidate count, availability, and latency metrics in prod.
- Day 2: Create On-call and Debug dashboards and link runbooks.
- Day 3: Implement or verify cache fallback and circuit breaker for retrieval.
- Day 4: Run a smoke load test and capture P95/P99 latency.
- Day 5–7: Add a canary experiment for a small retrieval improvement and monitor recall proxy and downstream metrics.
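Day 3's cache fallback and circuit breaker can be combined in one small component. A minimal sketch, assuming a callable retrieval path and a callable cached fallback; the threshold and cooldown values are hypothetical defaults:

```python
import time

class RetrievalBreaker:
    """Minimal circuit breaker: after `threshold` consecutive retrieval
    failures, serve the cached shortlist for `cooldown` seconds instead of
    hammering the failing index."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.open_until = 0.0

    def fetch(self, retrieve, cached_fallback, now=None):
        now = time.monotonic() if now is None else now
        if now < self.open_until:
            return cached_fallback()          # circuit open: cached shortlist
        try:
            result = retrieve()
            self.failures = 0                 # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = now + self.cooldown
            return cached_fallback()
```

Note the security caveat from earlier: the cached fallback must have passed the same ACL and privacy filters as the live path, or the fallback itself becomes a leak.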
Appendix — Candidate Generation Keyword Cluster (SEO)
Primary keywords
- Candidate Generation
- Retrieval layer
- Shortlist generation
- Multi-stage retrieval
- ANN retrieval
Secondary keywords
- Candidate recall
- Retrieval latency
- Retrieval architecture
- Vector search retrieval
- Indexing and retrieval
Long-tail questions
- How does candidate generation work in recommendation systems
- What is the difference between retrieval and ranking
- How to measure recall in candidate generation
- How to reduce candidate generation latency in Kubernetes
- Best practices for candidate generation in serverless apps
Related terminology
- Shortlist
- Recall proxy
- Feature store
- Vector DB
- Inverted index
- Cache warming
- Dedupe logic
- Shard rebalance
- Circuit breaker
- Cold start
- Warmup strategies
- Shadow experiments
- Canary rollout
- Error budget
- P95/P99 latency
- Candidate availability
- Ingestion pipeline
- ACL enforcement
- Fairness guardrail
- Telemetry and traces
- Observability plane
- Load testing
- Chaos testing
- Backfill
- Sampling strategy
- Cost per request
- Heuristic retrieval
- Hybrid retrieval
- Re-rank
- Ensemble retrieval
- Online feature hydration
- Query coalescing
- Cache eviction policy
- Index freshness
- Data drift detection
- Bias mitigation
- Privacy filters
- Production readiness
- Runbook automation