Production Set 07 // TAR Technical Reference
Predictive Coding 2.0 and TAR, from Da Silva Moore to GenAI review.
VERIFIED 21 APR 2026 // INDEPENDENT REFERENCE // NOT LEGAL ADVICE
Technology Assisted Review has a 14-year case law history and a clear technical evolution. Most practitioners who use the term ‘predictive coding’ in 2026 are actually describing Continuous Active Learning (CAL), the approach that replaced TAR 1.0. What follows is the technical and legal history, and where GenAI sits in that lineage.
Section 01 // Terminology
What TAR, CAL, and predictive coding actually mean
These terms are often used interchangeably, but they are technically distinct. TAR (Technology Assisted Review) is the umbrella term endorsed by the Sedona Conference. Within TAR there are two major sub-categories. Predictive coding, or TAR 1.0, refers to the first generation: a fixed seed set, one training run, a static classifier applied to the full corpus. CAL (Continuous Active Learning), or TAR 2.0, refers to the iterative approach: the classifier continuously retrains as reviewers code documents during the active review.
When vendors and courts say ‘predictive coding’ in 2026, they almost always mean TAR 2.0 / CAL, not the original TAR 1.0 fixed-seed-set approach. The terminology has drifted. GenAI review is a further evolution: an LLM relevance scorer replaces or augments the classical CAL classifier.
Section 02 // CAL Mechanics
How TAR 2.0 (CAL) actually works
The CAL workflow operates in iterative training cycles. In each cycle: the system selects a small batch of documents for attorney review, typically the unreviewed documents it currently ranks as most likely responsive (some tools also mix in documents the model is least certain how to classify); the attorney codes those documents; the model retrains on the cumulative coded set; relevance scores update across the corpus. The cycle repeats until the model stabilises, which is typically confirmed when the elusion rate on a random sample of low-scored documents falls below the target threshold.
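The cycle described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the scorer is a toy smoothed log-odds model over word presence, `attorney_code` is a hypothetical stand-in for human coding, the batch is simply the highest-scored unreviewed documents, and the stopping rule (a batch with no responsive documents) is a placeholder for the elusion test a real protocol would use.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(coded):
    """coded maps doc_id -> (text, label). Returns smoothed per-term log-odds weights."""
    pos, neg = Counter(), Counter()
    for text, label in coded.values():
        (pos if label else neg).update(set(tokenize(text)))
    n_pos = sum(1 for _, label in coded.values() if label)
    n_neg = len(coded) - n_pos
    return {
        t: math.log((pos[t] + 1) / (n_pos + 2)) - math.log((neg[t] + 1) / (n_neg + 2))
        for t in set(pos) | set(neg)
    }

def score(weights, text):
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))

def select_batch(weights, corpus, coded, batch_size):
    # Highest-scored unreviewed documents first; swap this function to change strategy.
    unreviewed = [d for d in corpus if d not in coded]
    unreviewed.sort(key=lambda d: score(weights, corpus[d]), reverse=True)
    return unreviewed[:batch_size]

def cal_review(corpus, attorney_code, seed_ids, batch_size=2, max_cycles=10):
    coded = {d: (corpus[d], attorney_code(d)) for d in seed_ids}
    for _ in range(max_cycles):
        weights = train(coded)                      # retrain on the cumulative coded set
        batch = select_batch(weights, corpus, coded, batch_size)
        if not batch:
            break
        labels = [attorney_code(d) for d in batch]  # attorney codes the batch
        for d, label in zip(batch, labels):
            coded[d] = (corpus[d], label)
        if not any(labels):                         # placeholder stopping rule (assumption)
            break
    return coded

# Toy corpus with hypothetical ground truth standing in for attorney decisions.
corpus = {
    1: "pricing decision october board approved",
    2: "lunch menu friday cafeteria",
    3: "pricing memo executive team decision",
    4: "parking permit renewal",
    5: "executive pricing approval email",
    6: "holiday party schedule",
    7: "board decision on pricing tiers",
    8: "cafeteria parking notice",
    9: "gym membership discount",
    10: "printer toner order",
    11: "friday lunch order",
    12: "travel reimbursement form",
}
truth = {d: d in {1, 3, 5, 7} for d in corpus}
coded = cal_review(corpus, lambda d: truth[d], seed_ids=[1, 2])
```

In this toy run the loop finds every responsive document without coding the whole corpus, which is the behaviour CAL aims for at scale. A production system would replace the scorer with a trained classifier and the stopping rule with a sampled elusion check.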
The theoretical advantage of CAL over TAR 1.0 is that the model's training set continuously improves in quality and coverage. Rolling productions (where new custodian data arrives during review) are handled naturally because the model simply scores the new documents and retrains as they are coded in the next cycle. In TAR 1.0, newly arrived data would typically require a new seed set and a new training run.
Grossman and Cormack's foundational 2011 paper in the Richmond Journal of Law and Technology demonstrated empirically that technology-assisted review can match or exceed exhaustive manual review on recall and precision, reported alongside F1, at far lower review effort. Their later comparative work, published in 2014, found that CAL generally reaches higher recall than fixed-seed TAR 1.0 protocols for the same review effort. The 95 percent confidence with plus-or-minus 5 percent margin target that has become the industry default for validation sampling is a convention of practice rather than a finding of either paper.
Section 03 // Validation
Statistical validation: what you are required to measure
| Metric | Definition | Typical target |
|---|---|---|
| Recall | Proportion of all responsive documents in corpus found by review | 75 percent+ (negotiated; often 80 percent+) |
| Precision | Proportion of documents marked responsive that are actually responsive | Negotiated; depends on issue |
| Elusion | Rate of responsive documents in the ‘not responsive’ bin | Less than 1 to 3 percent (negotiated) |
| F1 | Harmonic mean of precision and recall | Maximise; often 0.75+ |
| Sample size | Random sample of low-scored documents for elusion testing | 95 percent confidence, +/-5 percent margin |
Specific targets should be stipulated in the discovery protocol, ideally agreed with opposing counsel or approved by the court in advance. Hyles v. City of New York, 2016 WL 4077114 (S.D.N.Y. 2016), clarified that TAR is not mandatory: the court will not compel a responding party to use TAR even when it might be more efficient, but where TAR is used, cooperation in methodology design is strongly preferred.
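The metrics in the table above reduce to simple ratios over a confusion matrix, and the sample-size row follows from the standard formula for estimating a proportion. The sketch below assumes the four counts are known or have been estimated from validation samples; the function names are illustrative, and the sample-size formula uses the worst-case proportion of 0.5 with no finite-population correction.

```python
import math

def review_metrics(tp, fp, fn, tn):
    """tp/fp: marked responsive (correctly/incorrectly); fn/tn: marked not responsive.

    Assumes every denominator is non-zero.
    """
    recall = tp / (tp + fn)        # share of all responsive documents that were found
    precision = tp / (tp + fp)     # share of marked-responsive documents that are responsive
    f1 = 2 * precision * recall / (precision + recall)
    elusion = fn / (fn + tn)       # responsive rate in the 'not responsive' bin
    return {"recall": recall, "precision": precision, "f1": f1, "elusion": elusion}

def sample_size(margin=0.05, z=1.96, p=0.5):
    """Documents to sample for a given margin of error; z=1.96 is 95 percent confidence."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

metrics = review_metrics(tp=800, fp=200, fn=200, tn=8800)
n = sample_size(margin=0.05)  # 385
```

With these hypothetical counts, recall, precision, and F1 all come out at 0.80 and elusion at roughly 2.2 percent. A margin of 5 percentage points is wide relative to an elusion target of 1 to 3 percent, so it is worth checking how the required sample grows when `margin` is tightened.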
Section 04 // GenAI Difference
How GenAI review differs technically
In classical CAL, the relevance classifier is typically a support-vector machine or gradient-boosted tree trained on TF-IDF or similar bag-of-words document representations. The classifier learns which word patterns correlate with responsiveness from the coded seed documents. The weakness is that it fails on conceptual relevance: documents that are responsive but do not use the expected vocabulary.
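The vocabulary weakness is easy to see in a bag-of-words representation. The sketch below builds TF-IDF vectors by hand (using one common smoothed IDF variant, chosen here as an assumption) and compares a coded responsive example against two other documents with cosine similarity. It is a similarity demonstration rather than a trained classifier, but a classifier fed these features has the same blind spot: a document sharing no terms with the training examples carries no signal.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: count * idf[t] for t, count in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "pricing decision approved by executive team",        # coded responsive example
    "executive team approved the pricing decision",       # same vocabulary
    "leadership signed off on what we charge customers",  # same concept, different words
]
vecs = tfidf_vectors(docs)
sim_same_words = cosine(vecs[0], vecs[1])
sim_same_concept = cosine(vecs[0], vecs[2])
```

The second document scores high because it reuses the example's words; the third scores exactly zero because it shares none of them, even though a reviewer would treat it as describing the same event.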
GenAI review replaces or augments the classical classifier with an LLM (large language model) that has been pre-trained on vast text corpora and fine-tuned on legal documents. The LLM understands conceptual meaning, not just word-frequency patterns. The attorney writes a natural-language issue description (‘documents relating to the October 2024 pricing decision made by executive team members including...’); the LLM scores documents for relevance against that description semantically, not lexically.
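The scoring step can be sketched as a prompt template plus a parser for the model's reply. Everything here is hypothetical: the prompt wording, the 0 to 10 scale, and the `llm` callable (any function that takes a prompt string and returns text) are assumptions for illustration, not a real product's API. The `fake_llm` stub stands in for a model call so the example runs offline.

```python
import re

PROMPT_TEMPLATE = (
    "You are assisting with document review.\n"
    "Issue description: {issue}\n\n"
    "Document:\n{document}\n\n"
    "Rate the document's relevance to the issue from 0 (not relevant) to 10 "
    "(clearly relevant). Answer with the number only."
)

def build_prompt(issue, document):
    return PROMPT_TEMPLATE.format(issue=issue, document=document)

def parse_score(reply):
    """Extract the first number in the reply and scale it to 0.0-1.0; None if absent."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    return max(0.0, min(10.0, float(match.group()))) / 10.0

def score_documents(issue, documents, llm):
    # llm is any callable: prompt text in, reply text out (hypothetical interface).
    return {doc_id: parse_score(llm(build_prompt(issue, text)))
            for doc_id, text in documents.items()}

def fake_llm(prompt):
    # Offline stand-in for a real model call, used only to make the sketch runnable.
    return "8" if "charge customers" in prompt else "1"

issue = "Documents relating to the October 2024 pricing decision"
documents = {
    "A": "leadership signed off on what we charge customers",
    "B": "cafeteria lunch menu",
}
scores = score_documents(issue, documents, fake_llm)
```

Keeping the prompt template and parser as explicit, versioned artefacts makes it easier to record exactly which issue description and scoring scale were used, which matters when the methodology has to be explained later.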
The validation framework does not change: elusion testing, precision and recall measurement, and stipulated protocol remain required. What changes is the explanation of the methodology: instead of ‘we used a TF-IDF SVM classifier with n training cycles’, the producing party must explain which LLM, what issue prompt, and what validation steps were applied. EEOC v. Tesla (2024-2025) addressed this in the GenAI context: the court accepted the methodology subject to validation requirements and basic transparency about the LLM used.
Section 05 // Defensibility
Sedona Principle 6, FRCP 26(g), and the stipulated protocol
The defensibility framework for both TAR 2.0 and GenAI review rests on three pillars. First, Sedona Conference Principle 6: the producing party is best situated to choose the review method. Courts following Rio Tinto have generally not second-guessed methodology choices as long as the process was documented and validated. Second, FRCP 26(g) certification: the signing attorney certifies, after reasonable inquiry, that the disclosure is complete and correct. The inquiry duty extends to AI-generated results: the attorney must understand and validate what the AI produced. Third, the stipulated protocol: agreed in advance with opposing counsel or approved by the court, specifying the methodology, validation approach, and recall targets.
A Rule 502(d) order should accompany any AI-assisted privilege review to protect against inadvertent disclosure of privileged documents. See /privilege-review for the Judge Peck model 502(d) order language.
Section 06 // FAQ