Build strong predictive coding with the right PDF and OCR foundation

Predictive coding offers strong benefits to the legal industry, but without care in selecting the right document conversion solutions, agencies risk falling victim to GIGO.

At LegalTech this week, electronic discovery was again a major topic of conversation, but this year more of that conversation centered on predictive coding.

For those unfamiliar with predictive coding, it is a machine-learning technology that uses a combination of keyword search, natural language processing, data filtering and sampling techniques to automate portions of the document review process in legal discovery.  The goal of predictive coding is to reduce the number of electronic documents that humans must review to evaluate their relevance to a legal case.  It is particularly important within the legal industry given the pace at which the volume of electronically stored information has grown and continues to grow.

One comment from a panelist that stood out in a stream of tweets was, “95% of cases are too small for predictive coding.”

This quote, taken out of context, can be interpreted in more than one way.  For example, it could be interpreted in the same vein as the old adage that warns against the killing of an ant with a sledgehammer.  Or it simply could mean that predictive coding technology hasn’t yet matured to the point it’s suitable for the majority of discovery requests.

Whatever the intent of the comment was, I think it underscores an important point.  Technology needs to serve the business, not drive it.  Frequently, in our zeal to achieve greater productivity and efficiency through technology, we overlook the obvious and forget to stop and consider whether a particular solution makes solid business sense.

A related question, and one that also bears on electronic discovery, is whether some of the core document processing solutions that have been in place for some time may be failing the business by introducing problems into the discovery process that undermine productivity or drive unseen costs.

For example, PDF and OCR are both critical to the processing of ESI (electronically stored information) and significantly impact the success of newer technologies like predictive coding that are sensitive to the quality of the data contained in the document. They are the critical first steps in converting images and paper to searchable content, and if they aren’t built for the legal environment, they could introduce errors into the coding that make it worthless. If the technologies you use as the foundation for new solutions that promise to transform your business aren’t capable of supporting the needs of that solution and its users, you’re wasting resources at best, and damaging the productivity, effectiveness and reputation of your organization at worst.
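To make the point about data quality concrete, here is a minimal sketch in Python of how character-level OCR errors can hide documents from a simple keyword search, one of the techniques predictive coding builds on. Everything in it is an assumption for illustration: the sample documents, the `OCR_CONFUSIONS` table, and the helper names `simulate_poor_ocr` and `keyword_hits` are hypothetical and do not represent any particular OCR engine or review product.

```python
import re

# Hypothetical character confusions of the kind a low-quality OCR pass
# might produce (illustrative only, not measured from a real engine).
OCR_CONFUSIONS = [("rn", "m"), ("l", "1"), ("o", "0")]


def simulate_poor_ocr(text):
    """Apply each illustrative confusion to every occurrence in the text."""
    for correct, misread in OCR_CONFUSIONS:
        text = text.replace(correct, misread)
    return text


def keyword_hits(documents, keyword):
    """Return the indices of documents that contain the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [i for i, doc in enumerate(documents) if pattern.search(doc)]


# Made-up sample documents standing in for a small review set.
clean_docs = [
    "The seller agreed to the closing date in the contract.",
    "Lunch order for Friday.",
    "Please forward the signed contract to escrow.",
]
noisy_docs = [simulate_poor_ocr(doc) for doc in clean_docs]

clean_hits = keyword_hits(clean_docs, "contract")
noisy_hits = keyword_hits(noisy_docs, "contract")

print("Hits in clean text:", clean_hits)
print("Hits in noisy OCR text:", noisy_hits)
```

In this toy set, the search finds both relevant documents in the clean text and none in the degraded text, because "contract" has been misread as "c0ntract". Real OCR errors are less uniform than this, but the effect on any downstream review tool is the same in kind: documents it cannot read are documents it cannot rank.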

Are you evaluating eDiscovery solutions?  If so, I suggest it’s also time to evaluate your current PDF and OCR tools to ensure your efforts don’t become a victim of GIGO – garbage in, garbage out.

    I just downloaded all the emails for a couple of real estate transactions. Several printed out to 400 – 500 pages of documented email. If this were to go to a legal case, even for a small transaction it would mean many hours of reading through these for an attorney or paralegal. My suggestion is that even small transactions can get complicated quickly, so PDF or OCR reading solutions need to come of age quickly.

      Janice, that’s a great example of a very common scenario where the overhead of a predictive coding and/or eDiscovery system may be too great for the “small” document set. As you suggest, even small document sets can be very costly to review, particularly within the constraints of competitively priced legal fees. Your scenario is further complicated by the fact that the emails were printed as hardcopy (a common practice for smaller cases like this one), which renders them electronically unsearchable. You’re right about PDF and OCR being the right technologies to apply in these cases. I’d like to suggest, however, that PDF and OCR solutions have come of age. If you’re looking for a solution, let us help you. We have a PDF solution that makes it very easy to scan and convert printed pages into electronically searchable PDF/A files that comply with electronic court filing system format requirements. We even include sophisticated search and pattern searching capabilities to assist with review and redaction. Again, thank you for your thoughtful post.

