The e-discovery world is abuzz about the new decision by Magistrate Judge Andrew J. Peck of the Southern District of New York regarding computer-assisted coding. The case is Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), slip op. (S.D.N.Y. Feb. 24, 2012). This is an issue of particular interest for Judge Peck, who also wrote an article on the topic in October 2011. While the issue goes far beyond class actions or insurance class actions, it certainly has the potential to affect such cases in a major way.
For those who haven’t heard of it, computer-assisted coding involves the use of special software to search through electronic documents (e-mails, attachments and other electronically stored data) and identify those relevant to a particular lawsuit. The way it works, as explained in the opinion, is that a senior lawyer representing the party responding to the document request reviews a small random sample, say a few thousand documents, drawn from a massive collection of, say, several million potentially relevant documents (the Moore case involved over 3 million emails, and it appears that was just the beginning). This sample might or might not first be narrowed through keyword searching. The reviewer scores the sampled documents for their degree of relevance to the document requests. The software, “trained” by the reviewer, then searches through the remainder of the documents and estimates their degree of relevance. A statistical confidence level is selected for the process, say 95%. Sometimes more than one round of this kind of review is needed to “train” the computer to the selected confidence level. Experienced lawyers do the review because they are better at identifying degrees of relevance. Documents whose relevance score falls below a cutoff selected by the parties are presumed to be irrelevant, and only the documents scoring above the cutoff are reviewed for privilege and produced. The idea is that this process identifies the relevant documents, with a sufficient level of confidence, at a substantially lower cost than the usual procedure of having junior associates or contract attorneys review documents one by one for relevance (as I did over a decade ago as a junior associate at Skadden Arps).
According to Judge Peck, statistical data shows that this new computer software, when properly “trained” by senior lawyers, is now more accurate in determining relevance than the old method of manual human review. It also makes possible review projects where the set of electronic data is so large that manual review would be impossible (or its cost unfathomable).
What Magistrate Judge Peck’s opinion seems to suggest is that this process works best where the protocol gives the party seeking the discovery complete access to all non-privileged documents reviewed by the senior reviewer, along with the manner in which they were tagged (for a particular level of relevance, or relevance as to a particular issue), so that the party seeking the discovery can be comfortable that the “training” of the computer system is being done fairly, and thus that the computer search conducted based on that “training” is appropriate. It seems to me that another approach that might be appropriate in some cases would be to let the party seeking the discovery do the review that “trains” the computer, so they are the ones identifying what they deem relevant. The producing party would first need to do a privilege review of the random sample (and probably exclude other particularly sensitive documents that do not fall within the scope of the requests), but the random sample would be only a relatively small number of documents to review for privilege. If the party seeking the discovery then reviews the non-privileged documents and “trains” the computer, surely it cannot complain that the training was not done properly. There may be obstacles that prevent using that approach in particular cases, but it strikes me as an option worth considering.
It also strikes me that this kind of system could be useful in identifying privileged documents, perhaps not to make final determinations, but to narrow what needs to be reviewed for that purpose, particularly where a clawback order is in place. If a computer can be trained to identify degrees of relevance with a high degree of certainty, it likely can be trained to identify privilege in a similar way.
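As a rough illustration of how such scores could narrow a privilege review, documents could be triaged by a privilege-likelihood score, with only the high scorers routed to attorneys for manual checking. The scores and document names below are invented; a real system would produce the scores from a model trained on attorney-tagged examples, as with relevance.

```python
# Sketch of narrowing a privilege review using classifier scores.
# Scores and document names are hypothetical, for illustration only.
def privilege_review_queue(scored_docs, cutoff=0.2):
    """Return only documents whose privilege-likelihood score meets the
    cutoff; the rest could be produced under a clawback order with far
    less manual checking."""
    return [doc for doc, score in scored_docs if score >= cutoff]

scored = [("email_001", 0.91), ("email_002", 0.03), ("email_003", 0.40)]
print(privilege_review_queue(scored))  # prints ['email_001', 'email_003']
```

A low cutoff would deliberately over-include, since the cost of missing a privileged document is much higher than the cost of an attorney glancing at a non-privileged one.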
Judge Peck’s decision makes a few other interesting points:
- Rule 26(g)(1)’s certification requirement applies only to initial disclosures, and does not require producing parties to certify that document productions are “complete.” He also notes that in cases involving massive amounts of data no lawyer could certify that a production is “complete.” Rather, the standard of compliance is one of proportionality under Rule 26(b)(2)(C). This rule requires federal courts to limit discovery where “the burden or expense of the proposed discovery outweighs its likely benefit, considering the needs of the case, the amount in controversy, the parties’ resources, the importance of the issues at stake in the action, and the importance of the discovery in resolving the issues.”
- Rule 702 and Daubert apply only to expert testimony and are inapplicable to the method used to search for documents in discovery. Thus, a computer-assisted coding procedure need not meet Daubert standards; it need only be a reasonable process consistent with the proportionality rule.
In my mind, the advent of this software, and the fact that it has the strong support of at least one judge on a court that handles many of our country’s most complex civil cases, is a great development for the cause of achieving what the very first rule of civil procedure (which Judge Peck cites) calls for: “the just, speedy and inexpensive determination of every action and proceeding.” Fed. R. Civ. P. 1. If this new technology (no doubt expensive now, but that won’t last forever) can cut the cost of litigation substantially and result in more cases being resolved on the merits (rather than by settlements driven by litigation costs), that will be a very positive development for our legal system and the legal profession. More complex cases can be tried if the parties spend less time and money on discovery. There will be fewer jobs for recent (and not-so-recent) law school graduates staring at a computer screen all day coding documents, but that is not what they thought they were going to school for anyway, and it is not particularly productive work for our economy.