Production Set 07 // TAR Technical Reference

Predictive Coding 2.0 and TAR, from Da Silva Moore to GenAI review.

VERIFIED 21 APR 2026 // INDEPENDENT REFERENCE // NOT LEGAL ADVICE

Technology Assisted Review (TAR) has a 14-year case law history and a clear technical evolution. Most practitioners who say ‘predictive coding’ in 2026 are actually describing Continuous Active Learning (CAL), the approach that replaced TAR 1.0. This reference covers the technical and legal history, and where GenAI sits in that lineage.

Section 01 // Terminology

What TAR, CAL, and predictive coding actually mean

These terms are often used interchangeably, but they are technically distinct. TAR (Technology Assisted Review) is the umbrella term endorsed by the Sedona Conference. Within TAR there are two major sub-categories. Predictive coding, or TAR 1.0, refers to the first generation: a fixed seed set, one training run, a static classifier applied to the full corpus. CAL (Continuous Active Learning), or TAR 2.0, refers to the iterative approach: the classifier continuously retrains as reviewers code documents during the active review.

When vendors and courts say ‘predictive coding’ in 2026, they almost always mean TAR 2.0 / CAL, not the original TAR 1.0 fixed-seed-set approach. The terminology has drifted. GenAI review is a further evolution: an LLM relevance scorer replaces or augments the classical CAL classifier.

Section 02 // CAL Mechanics

How TAR 2.0 (CAL) actually works

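The CAL workflow operates in iterative training cycles. In each cycle, the system selects a small batch of documents for attorney review, the attorney codes those documents, the model retrains on the cumulative coded set, and relevance scores update across the corpus. In the CAL protocol as Cormack and Grossman describe it, each batch consists of the documents currently scored as most likely responsive; prioritising the documents the model is least certain how to classify is the selection rule of simple active learning (SAL), a related but distinct protocol. The cycle repeats until new batches yield few responsive documents and the elusion rate, measured on a random sample of the remaining low-scored documents, falls below the target threshold.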
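The cycle described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor's implementation: the log-odds term model stands in for a production classifier, `attorney_codes` is a hypothetical callback representing the reviewer's coding decision, and the stopping check samples from the whole unreviewed remainder rather than a scored null set. The batch-selection rule is a parameter because protocols differ on it.

```python
import math
import random
from collections import Counter


def train(coded):
    """Fit a toy log-odds term model on (text, is_responsive) pairs."""
    pos, neg = Counter(), Counter()
    for text, responsive in coded:
        (pos if responsive else neg).update(set(text.lower().split()))
    n_pos = sum(1 for _, responsive in coded if responsive)
    n_neg = len(coded) - n_pos
    # Laplace-smoothed log-odds of each term appearing in responsive documents.
    return {
        w: math.log((pos[w] + 1) / (n_pos + 2)) - math.log((neg[w] + 1) / (n_neg + 2))
        for w in set(pos) | set(neg)
    }


def score(weights, text):
    """Map the summed term weights of a document to a 0..1 relevance score."""
    z = sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    z = max(-50.0, min(50.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def select_batch(weights, unreviewed, size, strategy):
    """Pick the next review batch: top-scored, or closest to 0.5 (least certain)."""
    if strategy == "uncertainty":
        return sorted(unreviewed, key=lambda d: abs(score(weights, d) - 0.5))[:size]
    return sorted(unreviewed, key=lambda d: score(weights, d), reverse=True)[:size]


def cal_review(corpus, attorney_codes, seed, batch_size=2, target_elusion=0.0,
               sample_size=50, strategy="relevance", max_cycles=100, rng_seed=0):
    """Run train / select / code cycles until the elusion check passes."""
    rng = random.Random(rng_seed)
    coded = [(d, attorney_codes(d)) for d in seed]
    unreviewed = [d for d in corpus if d not in seed]
    cycles = 0
    while unreviewed and cycles < max_cycles:
        cycles += 1
        weights = train(coded)  # retrain on the cumulative coded set
        batch = select_batch(weights, unreviewed, batch_size, strategy)
        coded.extend((d, attorney_codes(d)) for d in batch)
        unreviewed = [d for d in unreviewed if d not in batch]
        if not unreviewed:
            break
        # Elusion check: code a random sample of what is still unreviewed.
        sample = rng.sample(unreviewed, min(sample_size, len(unreviewed)))
        elusion = sum(attorney_codes(d) for d in sample) / len(sample)
        if elusion <= target_elusion:
            break
    return coded, unreviewed, cycles
```

Calling `cal_review(corpus, attorney_codes, seed=[...])` returns the coded pairs, the unreviewed remainder, and the number of cycles run. In a real matter the elusion sample would be drawn once the model has stabilised, and the threshold would come from the stipulated protocol.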

The theoretical advantage of CAL over TAR 1.0 is that the model's training set continuously improves in quality and coverage. Rolling productions (where new custodian data arrives during review) are handled naturally because the model simply retrains on the new data in the next cycle. In TAR 1.0, a new production would require a new seed set and a new training run.

Grossman and Cormack's foundational 2011 paper in the Richmond Journal of Law and Technology showed empirically that technology-assisted review can match or exceed exhaustive manual review on recall and precision at far lower review effort. Their later comparative work (Cormack and Grossman, SIGIR 2014) found that CAL reached higher recall than fixed-seed TAR 1.0 protocols at equivalent review effort on most of the collections tested. The F1 reporting convention used in that research, together with the 95 percent confidence, plus-or-minus 5 percent margin sampling target, has become the industry default validation standard.
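To make the 95 percent confidence, plus-or-minus 5 percent margin target concrete, the sketch below computes the sample size it implies using the standard formula for estimating a proportion. The function name and parameters are illustrative; the worst-case assumption p = 0.5 and the optional finite-population correction are conventional choices, not something the protocol language itself prescribes.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.95, margin=0.05, p=0.5, population=None):
    """Documents to sample to estimate a proportion within +/- margin.

    p is the assumed true proportion (0.5 is the most conservative choice).
    population, if given, applies the finite-population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)
```

With the defaults, `sample_size()` returns 385, and `sample_size(population=10_000)` returns 370. A tighter margin raises the figure quickly, which is one reason the margin is worth negotiating explicitly.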

Section 03 // Validation

Statistical validation: what you are required to measure

Metric	Definition	Target (Grossman-Cormack)
Recall	Proportion of all responsive documents in corpus found by review	75 percent+ (negotiated; often 80 percent+)
Precision	Proportion of documents marked responsive that are actually responsive	Negotiated; depends on issue
Elusion	Rate of responsive documents in the ‘not responsive’ bin	Less than 1 to 3 percent (negotiated)
F1	Harmonic mean of precision and recall	Maximise; often 0.75+
Sample size	Random sample of low-scored documents for elusion testing	95 percent confidence, +/-5 percent margin
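The metrics in the table follow directly from four counts. The sketch below is a minimal illustration with hypothetical names (`ReviewCounts`, `recall_from_elusion`); it assumes the counts come from a fully coded validation set, and it shows the common shortcut of estimating recall from an elusion sample when the null set has not been fully reviewed.

```python
from dataclasses import dataclass


@dataclass
class ReviewCounts:
    tp: int  # responsive and marked responsive
    fp: int  # not responsive but marked responsive
    fn: int  # responsive but left in the 'not responsive' bin
    tn: int  # not responsive and left in the 'not responsive' bin

    @property
    def recall(self):
        return self.tp / (self.tp + self.fn)

    @property
    def precision(self):
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self):
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)  # harmonic mean of precision and recall

    @property
    def elusion(self):
        return self.fn / (self.fn + self.tn)


def recall_from_elusion(found_responsive, null_set_size, sample_size, sample_responsive):
    """Point estimate of recall from an elusion sample of the null set."""
    elusion = sample_responsive / sample_size
    estimated_missed = elusion * null_set_size
    return found_responsive / (found_responsive + estimated_missed)
```

For example, `ReviewCounts(tp=800, fp=200, fn=200, tn=8800)` gives recall, precision, and F1 of 0.8 and an elusion rate of about 2.2 percent. The recall figure from `recall_from_elusion` is a point estimate only; a protocol would normally also state the confidence interval around it.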

Specific targets should be stipulated in the discovery protocol, ideally agreed with opposing counsel or approved by the court in advance. Hyles v. City of New York (2016 WL 4077114, S.D.N.Y. 2016) clarified that TAR is not mandatory: the court does not require TAR even when it might be more efficient, but where TAR is used, cooperation in methodology design is strongly preferred.

Section 04 // GenAI Difference

How GenAI review differs technically

In classical CAL, the relevance classifier is typically a support-vector machine or gradient-boosted tree trained on TF-IDF or similar bag-of-words document representations. The classifier learns which word patterns correlate with responsiveness from the attorney-coded documents. The weakness is that it can miss conceptually relevant documents: those that are responsive but do not use the expected vocabulary.
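The vocabulary problem is easy to see at the representation level. The sketch below is not a classifier; it builds plain TF-IDF vectors (with a smoothed IDF, chosen here for illustration) and uses cosine similarity as a stand-in for what any model built on those features can see. The example sentences are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenised]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


docs = [
    "pricing decision approved by executive team",  # coded responsive exemplar
    "executive team approved the pricing decision",  # same vocabulary
    "leadership signed off on the new rate card",    # same concept, different words
]
vectors = tfidf_vectors(docs)
same_words = cosine(vectors[0], vectors[1])
same_concept = cosine(vectors[0], vectors[2])
```

Here `same_words` is well above zero while `same_concept` is exactly 0.0, because the third sentence shares no terms with the exemplar. A bag-of-words model can only learn that such a document is responsive if a reviewer happens to code similar wording during training.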

GenAI review replaces or augments the classical classifier with an LLM (large language model) that has been pre-trained on vast text corpora and fine-tuned on legal documents. The LLM captures conceptual meaning, not just word-frequency patterns. The attorney writes a natural-language issue description (‘documents relating to the October 2024 pricing decision made by executive team members including...’); the LLM scores documents for relevance against that description semantically, not lexically.
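The shape of that workflow can be sketched without committing to any particular model. In the outline below, `llm_score` is a hypothetical callable supplied by the caller that takes a prompt and returns a number; no real LLM API is assumed, and the prompt wording is illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative prompt wording; a real protocol would record the exact text used.
PROMPT_TEMPLATE = (
    "Issue description:\n{issue}\n\n"
    "Document:\n{text}\n\n"
    "Rate how relevant the document is to the issue on a scale from 0 to 1."
)


def build_prompt(issue: str, text: str) -> str:
    """Combine the attorney's issue description with one document."""
    return PROMPT_TEMPLATE.format(issue=issue, text=text)


def rank_documents(
    issue: str,
    docs: Dict[str, str],
    llm_score: Callable[[str], float],  # hypothetical: prompt in, score out
) -> List[Tuple[str, float]]:
    """Score every document against the issue and sort by descending relevance."""
    scored = []
    for doc_id, text in docs.items():
        raw = llm_score(build_prompt(issue, text))
        scored.append((doc_id, min(1.0, max(0.0, float(raw)))))  # clamp to 0..1
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the model call behind a single function makes it straightforward to log which model and which issue prompt produced each score, which is the information a producing party would need when explaining the methodology.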

The validation framework does not change: elusion testing, precision and recall measurement, and stipulated protocol remain required. What changes is the explanation of the methodology: instead of ‘we used a TF-IDF SVM classifier with n training cycles’, the producing party must explain which LLM, what issue prompt, and what validation steps were applied. EEOC v. Tesla (2024) reflected this in the GenAI context: the court entered the parties' stipulated protocol allowing AI-assisted review subject to validation requirements and transparency principles.

SEDONA PRINCIPLE 6

‘Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.’

Section 05 // Defensibility

Sedona Principle 6, FRCP 26(g), and the stipulated protocol

The defensibility framework for both TAR 2.0 and GenAI review rests on three pillars. First, Sedona Conference Principle 6: the producing party is best situated to choose the review method. Courts following Rio Tinto have not second-guessed methodology choices as long as the process was documented and validated. Second, FRCP 26(g) certification: the certifying attorney must certify that the disclosure is complete and correct after reasonable inquiry. The inquiry duty extends to AI-generated results: the attorney must understand and validate what the AI produced. Third, the stipulated protocol: agreed in advance with opposing counsel or the court, specifying the methodology, validation approach, and recall targets.

A Rule 502(d) order should accompany any AI-assisted privilege review to protect against inadvertent disclosure of privileged documents. See /privilege-review for the Judge Peck model 502(d) order language.

Section 06 // FAQ

Frequently asked questions

How does predictive coding work?

Predictive coding (TAR 1.0) uses a machine learning classifier trained on attorney-coded seed documents to predict relevance for the remaining corpus. TAR 2.0 (Continuous Active Learning) continuously retrains the classifier as reviewers code documents, handling rolling productions and achieving higher recall. Modern GenAI review layers an LLM relevance scorer on top of CAL.

What is the Grossman-Cormack validation standard?

The standard commonly attributed to Grossman and Cormack reports F1 (the harmonic mean of precision and recall) and tests elusion on a random sample sized for 95 percent confidence with a plus-or-minus 5 percent margin. It traces to their research beginning with the 2011 Richmond Journal of Law and Technology paper, and is widely adopted as the baseline for TAR validation in litigation, though specific targets should be stipulated in the protocol.

How many documents need to be in the seed set for TAR?

For CAL, the seed set size is less critical than for TAR 1.0 because the model continuously retrains. A seed set of 200 to 500 attorney-coded documents is often sufficient to start the CAL process; the model improves through the review cycles. For GenAI review, the issue prompt replaces the seed set as the primary input.

Is TAR 2.0 admissible evidence?

TAR 2.0 is a production methodology, not evidence itself. The documents produced through TAR are admissible in the same way as manually reviewed documents. The methodology may be challenged through the discovery protocol, but courts since Da Silva Moore have consistently upheld TAR as a defensible production methodology when properly validated.

Can I use TAR on a very small corpus (under 10,000 documents)?

TAR is disproportionately expensive relative to manual review on very small corpora. For matters with fewer than 20,000 to 30,000 documents, manual review or enhanced keyword search is typically faster and less expensive. CAL requires a sufficient training population to achieve meaningful classifier accuracy. Under 10,000 documents, manual review is almost always the more proportionate choice under FRCP 26(b)(1).
