Production Set 02 // Definition + Taxonomy
What is agentic eDiscovery? A 2026 taxonomy.
VERIFIED 21 APR 2026 // INDEPENDENT REFERENCE // NOT LEGAL ADVICE
The phrase “AI eDiscovery” is a marketing envelope, not a technology. Inside the envelope there are five distinct technologies with five distinct defensibility profiles, five different procurement conversations, and five different cost curves. This page defines each one precisely.
Tier 01 // Baseline
Keyword and Boolean search
The baseline. Terms-and-connectors search uses exact keyword matches, proximity operators, and Boolean logic. It predates AI entirely and remains in active use because it is fast, transparent, and easy to document. Approximately 40 percent of matters still rely primarily on keyword search for initial culling, particularly for early case assessment and litigation hold scoping.
Keyword search fails when relevant documents do not use the expected terminology, when reviewers lack the domain knowledge to anticipate relevant terms, or when the corpus spans multiple languages or includes audio and video files. FRCP 26(g) requires the certifying attorney to conduct a “reasonable inquiry” before certifying a discovery response, so keyword design is not inherently defensible simply because it is transparent.
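A minimal sketch of terms-and-connectors matching, including the failure mode described above. The document IDs, sample texts, and the query itself are hypothetical, and the `within` helper is an illustrative stand-in for a platform's proximity operator rather than any vendor's syntax.

```python
import re

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains(tokens, term):
    """Exact keyword match on a single term."""
    return term.lower() in tokens

def within(tokens, term_a, term_b, distance):
    """Proximity operator: True if the two terms occur within `distance` words."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

def hits(documents):
    """Hypothetical query: (merger W/5 price) AND NOT newsletter."""
    results = []
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        if within(tokens, "merger", "price", 5) and not contains(tokens, "newsletter"):
            results.append(doc_id)
    return results

docs = {
    "DOC-001": "The merger closed at a price above our estimate.",
    "DOC-002": "Weekly newsletter: merger price rumours roundup.",
    "DOC-003": "We agreed on what to pay for the acquisition.",
}
print(hits(docs))  # ['DOC-001']
```

DOC-003 is about the same deal but uses neither query term, so the search misses it. That is the terminology gap the paragraph above describes.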
Tier 02 // Predictive Coding
TAR 1.0, predictive coding with a fixed seed set
TAR 1.0, commonly called predictive coding, was the first generation of machine-learning-based document review. Attorneys code a fixed seed set of documents (responsive / not responsive), a classifier trains on those coded documents, and the trained model then predicts relevance scores for the remaining corpus. The process runs once; the classifier is applied to the full population.
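The train-once workflow can be sketched as follows. This is an illustration under stated assumptions, not any platform's implementation: the classifier is a toy smoothed log-likelihood-ratio scorer, and the seed set, corpus, and document IDs are invented for the example.

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def train(seed_set):
    """Train once on the attorney-coded seed set: a list of (text, is_responsive)."""
    counts = {True: Counter(), False: Counter()}
    for text, label in seed_set:
        counts[label].update(tokens(text))
    vocab = set(counts[True]) | set(counts[False])
    total = {label: sum(c.values()) for label, c in counts.items()}
    # Per-term log-likelihood ratio with add-one smoothing.
    return {
        term: math.log((counts[True][term] + 1) / (total[True] + len(vocab)))
        - math.log((counts[False][term] + 1) / (total[False] + len(vocab)))
        for term in vocab
    }

def score(model, text):
    """Sum of term weights; higher means more likely responsive. Unseen terms add 0."""
    return sum(model.get(t, 0.0) for t in tokens(text))

# Hypothetical seed set coded by attorneys.
seed_set = [
    ("pricing agreement with competitor", True),
    ("discuss price fixing call", True),
    ("lunch menu for friday", False),
    ("parking garage closed friday", False),
]
model = train(seed_set)  # trained once; never updated afterwards

# The frozen model is applied to the rest of the population.
corpus = {
    "DOC-101": "follow up on competitor pricing call",
    "DOC-102": "friday lunch order",
}
ranked = sorted(corpus, key=lambda d: score(model, corpus[d]), reverse=True)
print(ranked)  # ['DOC-101', 'DOC-102']
```

Because `model` is never refitted, documents that arrive later with new vocabulary score near zero, which is one way to picture a seed set going stale.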
PULL QUOTE // 287 F.R.D. 182
‘The Court approves the use of predictive coding for this litigation. Predictive coding is an acceptable way to search for relevant ESI in appropriate cases.’
Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 193 (S.D.N.Y. 2012). Judge Andrew J. Peck.
Judge Andrew Peck's opinion in Da Silva Moore was the first judicial approval of TAR, establishing the core principle that process transparency, not document disclosure, is the defensibility standard. TAR 1.0 fell out of favour on large matters because the fixed seed set became stale as the corpus changed during rolling productions, requiring re-training and increasing cost.
Tier 03 // Industry Default
TAR 2.0, Continuous Active Learning (CAL)
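Continuous Active Learning (CAL) continuously retrains the classifier as reviewers code documents. Grossman and Cormack's landmark 2011 Richmond Journal of Law and Technology paper (“Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review”) made the empirical case for TAR over exhaustive manual review; their 2014 SIGIR evaluation of machine-learning protocols for TAR described and benchmarked the CAL protocol itself. At each iteration the model re-ranks the unreviewed documents and routes those it scores as most likely responsive to reviewers, so responsive documents surface early and the model keeps improving as coding proceeds.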
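The retrain-and-select loop can be sketched in a few lines. Everything here is an assumption made for illustration: the term-count scorer, the batch size, the sample corpus, and especially the stopping rule (stop after one batch with no responsive documents), which real protocols replace with statistical validation. The sketch picks the highest-scoring unreviewed documents each round (relevance feedback), which is one possible selection strategy.

```python
from collections import Counter

def tokens(text):
    return set(text.lower().split())

def fit(coded):
    """Term weight = responsive docs containing the term minus non-responsive ones."""
    weights = Counter()
    for text, responsive in coded:
        for term in tokens(text):
            weights[term] += 1 if responsive else -1
    return weights

def score(weights, text):
    return sum(weights[t] for t in tokens(text))

def cal_review(corpus, seed, reviewer, batch_size=2, stop_after_empty=1):
    """Retrain after every batch; review the highest-scoring unreviewed documents next."""
    coded = list(seed)
    unreviewed = dict(corpus)
    found, empty_batches = [], 0
    while unreviewed and empty_batches < stop_after_empty:
        weights = fit(coded)  # retrain on everything coded so far
        batch = sorted(unreviewed, key=lambda d: score(weights, unreviewed[d]),
                       reverse=True)[:batch_size]
        batch_hits = 0
        for doc_id in batch:
            text = unreviewed.pop(doc_id)
            label = reviewer(doc_id)  # human coding decision
            coded.append((text, label))
            if label:
                found.append(doc_id)
                batch_hits += 1
        empty_batches = empty_batches + 1 if batch_hits == 0 else 0
    return found, sorted(unreviewed)

corpus = {
    "D1": "rebate scheme for distributors",
    "D2": "distributors rebate approved by legal",
    "D3": "office party on friday",
    "D4": "legal sign off on discount scheme",
    "D5": "friday parking notice",
    "D6": "canteen menu update",
    "D7": "holiday rota for august",
    "D8": "printer toner order",
}
truth = {"D1", "D2", "D4"}  # stands in for the reviewers' judgment
seed = [("secret rebate scheme", True), ("friday canteen menu", False)]

found, left = cal_review(corpus, seed, reviewer=lambda d: d in truth)
print(found, left)  # ['D1', 'D2', 'D4'] ['D5', 'D6']
```

D4 never mentions "rebate", but it ranks first in the second round because the retrained model has learned "legal" and "scheme" from the documents coded in the first round.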
TAR 2.0 is the current industry default. Rio Tinto PLC v. Vale S.A. (306 F.R.D. 125, S.D.N.Y. 2015) reinforced its defensibility framework, with Judge Peck holding that TAR, including CAL, could proceed without disclosure of the seed set, provided the producing party documented its process adequately. In re Biomet (2013 WL 6405156, N.D. Ind. 2013) extended the proportionality analysis to justify cost-burden shifting.
Validation requirements for TAR 2.0 are well-established: elusion testing (reviewing a random sample of the documents predicted non-responsive to estimate the share of responsive documents left behind, the elusion rate), precision and recall measurement, and agreement on the target recall level before production. F1 scoring and a 95 percent confidence level with a plus-or-minus 5 percent margin of error, both associated with Grossman and Cormack's work, are the commonly cited targets, though specific thresholds should be stipulated in the discovery protocol.
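The validation arithmetic can be sketched as below. The formulas are the standard ones for a simple random sample; the document counts in the example are invented, and the recall figure is a point estimate only (a full protocol would also report a confidence interval around the elusion rate).

```python
import math

def sample_size(margin=0.05, z=1.96, p=0.5):
    """Simple random sample size for a proportion, without finite-population correction.
    z=1.96 corresponds to 95 percent confidence; p=0.5 is the worst case."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def precision(true_pos, false_pos):
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def elusion_rate(responsive_in_sample, sample_n):
    """Share of the sampled null set that reviewers coded responsive."""
    return responsive_in_sample / sample_n

def recall_from_elusion(found_responsive, elusion, null_set_size):
    """Point estimate: found / (found + estimated responsive documents left behind)."""
    missed = elusion * null_set_size
    return found_responsive / (found_responsive + missed)

print(sample_size())  # 385 documents for 95 percent confidence, +/- 5 percent

# Hypothetical matter: 9,000 responsive documents found; 100,000 documents in
# the null set; 5 of 500 sampled null-set documents turn out to be responsive.
e = elusion_rate(5, 500)
print(e, recall_from_elusion(9_000, e, 100_000))  # 0.01 0.9
```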
Tier 04 // GenAI Layer
GenAI review, LLM-scored relevance
Between 2023 and 2026, the major platforms added an LLM relevance scorer alongside, or in place of, the classical CAL classifier. The attorney writes a natural-language issue description; the LLM scores documents for relevance against that description. Relativity's aiR for Review, EverlawAI Assistant's Single Document Review, DISCO Cecilia's narrative intelligence, and Reveal Ask all implement this architecture.
The legal defensibility profile is substantially the same as TAR 2.0 under the existing case law framework: documented process, statistical sampling, and stipulated protocol. In EEOC v. Tesla (N.D. Cal. 2024-2025), the first public-record matter involving GenAI document review, the GenAI workflow was accepted subject to the same validation requirements. No new case-law category was created; existing defensibility doctrine applied.
What GenAI scoring adds over classical CAL: better handling of conceptual relevance (documents that are responsive but do not use the expected terms), faster convergence on complex multi-issue review sets, and natural-language prompt flexibility that allows issue-by-issue scoring in parallel. What it does not add: automatic per-document explainability (most platforms return a score, not a reasoning trace), reduced human reviewer requirement, or reduced validation obligation.
Tier 05 // Frontier
Agentic review, LLM agents reasoning across the corpus
Genuinely agentic eDiscovery is narrow in 2026. It describes LLM agent workflows that perform multi-step reasoning across a document corpus: retrieving related documents, scoring relevance and privilege in chained steps, identifying cross-custodian communication patterns, reconstructing narrative timelines, and producing per-document reasoning traces that are exportable for attorney review.
The closest current examples are Lighthouse AI's agentic retrieval workflows, the experimental Relativity aiR agent chain for case strategy, and Nuix Neo's agentic investigation workflows. Early open-source legal stacks built on LangChain and LlamaIndex are also in use in specialist shops. Full agentic review is not yet a mainstream procurement category; it is a feature set that distinguishes the advanced tier of the leading platforms.
The defensibility question for agentic review is not yet settled. The same Sedona Principle 6 / FRCP 26(g) framework applies, but the requirement for auditable reasoning traces is significantly more important when an agent, rather than a human reviewer, is making or influencing privilege and responsiveness determinations. Firms using agentic features should document the agent chain as thoroughly as the review protocol.
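One way to picture an auditable reasoning trace is a per-document log of each agent step, exported for attorney review. The schema below is purely hypothetical (field names, action labels, and the version string are assumptions for illustration); no platform's export format is implied.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TraceStep:
    step: int
    action: str      # e.g. "retrieve", "score_relevance", "score_privilege"
    inputs: list     # document IDs or prompts the step consumed
    output: str      # the determination or result of the step
    rationale: str   # the agent's stated reasoning, kept verbatim

@dataclass
class DocumentTrace:
    doc_id: str
    agent_version: str
    steps: list = field(default_factory=list)

    def record(self, action, inputs, output, rationale):
        """Append one step, numbering steps in the order they ran."""
        self.steps.append(
            TraceStep(len(self.steps) + 1, action, list(inputs), output, rationale)
        )

    def to_json(self):
        """Export the whole trace so it can be filed with the review protocol."""
        return json.dumps(asdict(self), indent=2)

trace = DocumentTrace("DOC-001", "agent-chain-v0 (hypothetical)")
trace.record("retrieve", ["DOC-001"], "2 related documents",
             "Same thread and custodian as DOC-001.")
trace.record("score_privilege", ["DOC-001"], "potentially privileged",
             "Message is addressed to in-house counsel and requests legal advice.")
exported = trace.to_json()
print(exported)
```

Recording the agent version alongside each trace matters because a determination may need to be explained after the underlying model or prompt chain has changed.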
Section 06 // Defensibility
The defensibility framework for all five tiers
Sedona Conference Principle 6 is the foundational statement: the responding party is best situated to evaluate the search and review methods appropriate to its circumstances. Courts following Rio Tinto and Biomet apply this principle by evaluating the process, not the outcome. The question is not whether every responsive document was found, but whether the producing party used a documented, validated, proportionate process.
Practical defensibility requirements for any AI-assisted review: (1) a written review protocol describing the methodology, agreed with opposing counsel or approved by the court in advance; (2) statistical sampling to validate recall and elusion rates; (3) a documented seed set or issue prompt; (4) quality-control coding with a log; (5) a Federal Rule of Evidence 502(d) order covering inadvertent privilege disclosures. See the full case-law reference.
Section 07 // Decision Matrix
Which tier do you actually need?
| Matter Profile | Recommended Tier | Key Reason |
|---|---|---|
| Single issue, under 50 GB, 3 reviewers | Keyword + Boolean | Proportionate; CAL overhead not justified |
| 500 GB, defined issue set, 8 reviewers | TAR 2.0 / CAL | Industry default; established defensibility |
| 5 TB, complex multi-issue, 25 reviewers | GenAI on CAL | LLM handles conceptual relevance; faster convergence |
| 50 TB, regulatory, cross-custodian patterns | GenAI + Agentic | Agent chains for cross-custodian reasoning; trace export required |
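The matrix can be read as a rule cascade, sketched below. The four rows come from the table; treating each row's data volume as a hard threshold, ignoring reviewer counts, and returning `None` for profiles that fall between rows are all assumptions of this sketch, not recommendations.

```python
def recommend_tier(size_gb, multi_issue=False, cross_custodian=False):
    """Map a matter profile to a tier, checking the most demanding row first.
    Returns None when the profile falls between the table's rows."""
    if size_gb >= 50_000 and cross_custodian:
        return "GenAI + Agentic"
    if size_gb >= 5_000 and multi_issue:
        return "GenAI on CAL"
    if size_gb >= 500:
        return "TAR 2.0 / CAL"
    if size_gb < 50 and not multi_issue:
        return "Keyword + Boolean"
    return None  # not covered by the matrix; needs case-by-case judgment

print(recommend_tier(30))                        # Keyword + Boolean
print(recommend_tier(500))                       # TAR 2.0 / CAL
print(recommend_tier(5_000, multi_issue=True))   # GenAI on CAL
print(recommend_tier(50_000, multi_issue=True, cross_custodian=True))  # GenAI + Agentic
print(recommend_tier(200))                       # None
```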
Section 08 // FAQ