Docket Data: What Data Scientists Have to Tell Lawyers & Their Firms

Topics: Data Analytics, Law Firms, Legal Innovation


If you were a data scientist, what would you do with five years of federal court docket data? That’s what Thomson Reuters and CodeX, the Stanford Center for Legal Informatics, set out to discover when they jointly launched their Legal Tech Open Innovation Challenge.

The challenge involved making docket data, along with Thomson Reuters’ open-source company identification system, PermID, available to teams of “solvers” around the world. Programmers, legal professionals, entrepreneurs, data scientists and others were invited to use the data to develop an application that provides high-value analytics to legal practitioners. More than 450 solvers from around the world registered interest, showing that there is a large appetite for building new applications and a broad-based community with the capacity for this type of innovation.

The solutions submitted by the four winning solvers reflect the many ways data scientists look at legal data. They delivered some very interesting solutions that offer a taste of how data analytics might be integrated into legal practice in the not-too-distant future.

The winners of the top prizes were:

  • US-based Praescient Analytics took the top $20,000 prize for an application called iHammurabi. It uses docket data to build a model that predicts the likelihood of success of a motion to dismiss before a specific judge in federal civil court, based on key factors about the case.
  • Alexander Büse of Germany earned the second-place spot and $15,000 with a tool that enables users to uncover trends and patterns among a variety of entities involved in litigation matters.
  • Tied for third place, and awarded $10,000 each, were EPAM Systems, an international software engineering company in Belarus and Ukraine, and Muthukumarasamy Karthikeyan, an independent solver from India. EPAM’s entry used the provided data to build a prediction model to forecast docket outcomes, and built a mechanism using contextual search to find similar dockets. Karthikeyan’s program was configured to read about 100,000 records at once from a single file for greater efficiency.

What’s really interesting here is the range of geographies represented among the winners — and the fact that none came from a legal background. The winners’ responses to the challenge represent some interesting new ways of looking at legal data, untainted by too much familiarity with the US legal system.

Praescient’s submission was the most polished application: it looked like an actual product, even though it was limited in scope. The Praescient team decided to focus on predicting outcomes of motions to dismiss. That scope meant that their system could base predictions on information that existed at the time of the motion; in other words, prior to discovery or other procedural stages in litigation where more information about the merits of the case (and thus more complicating factors) becomes available. Based on the factors they were able to isolate, their application predicted the outcomes of motions to dismiss across all cases with 83% accuracy. For individual judges, accuracy ranged from 73% to 94%.
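
To make the difference between an overall figure and a per-judge range concrete, here is a minimal sketch of how both could be computed from a model’s predictions. It is not Praescient’s code: the function name `accuracy_by_judge`, the record layout and the sample rows are all hypothetical, made up purely for illustration.

```python
from collections import defaultdict


def accuracy_by_judge(records):
    """Return (overall_accuracy, {judge: accuracy}).

    records: iterable of (judge, predicted_granted, actually_granted).
    This record layout is an assumption, not the challenge's data format.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for judge, predicted, actual in records:
        totals[judge] += 1
        hits[judge] += int(predicted == actual)  # count correct predictions
    if not totals:
        raise ValueError("no records to score")
    overall = sum(hits.values()) / sum(totals.values())
    per_judge = {judge: hits[judge] / totals[judge] for judge in totals}
    return overall, per_judge


# Made-up sample rows: (judge, model's prediction, real outcome)
sample = [
    ("Judge A", True, True),
    ("Judge A", False, True),
    ("Judge B", True, True),
    ("Judge B", False, False),
]
overall, per_judge = accuracy_by_judge(sample)
print(overall, per_judge)
```

A breakdown like this shows why a single headline number can hide wide variation between judges, which is what the 73% to 94% range reflects.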


Büse’s submission was most interesting for the way it used data visualization to create a simple “map” of an entire case, using color-coding to isolate distinct stages of litigation. It also leveraged Thomson Reuters’ PermID company identifier to allow analysis of groups of litigants (for example, companies in a specific industry). The application could serve a number of purposes, including identifying trends in case filings and identifying the companies typically involved in certain types of cases. The visualization aspect, however, provides some intriguing possibilities for legal strategists. Each case was represented by a bar chart, with different-colored bands on the bar representing different types of pleading or action, such as a motion, a ruling or a filing of some kind. In this way, similarities between cases were exposed. This could be a useful tool for a law firm planning for a certain type of litigation, allowing the firm to “see” the typical course of that type of case and to understand the time frame involved.
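
The color-band idea can be sketched in a few lines. This is not Büse’s implementation; it only assumes that each docket entry carries a date and a type, and the `COLORS` mapping, the `case_bands` function and the sample case are all hypothetical. Each band starts at its entry and runs until the next one, so the band widths show how long each stage lasted.

```python
from datetime import date

# Hypothetical entry types and colors, chosen only for this illustration
COLORS = {"filing": "blue", "motion": "orange", "ruling": "green"}


def case_bands(entries):
    """Turn dated docket entries into (color, start_day, length_days) bands.

    entries: list of (date, entry_type) pairs, in any order.
    The final entry has nothing after it, so its band has length 0.
    """
    if not entries:
        return []
    ordered = sorted(entries)
    first_day = ordered[0][0]
    bands = []
    for i, (day, kind) in enumerate(ordered):
        next_day = ordered[i + 1][0] if i + 1 < len(ordered) else day
        color = COLORS.get(kind, "gray")  # unknown types fall back to gray
        bands.append((color, (day - first_day).days, (next_day - day).days))
    return bands


# Made-up case: a filing, then a motion, then a ruling
sample_case = [
    (date(2015, 1, 5), "filing"),
    (date(2015, 3, 2), "motion"),
    (date(2015, 4, 20), "ruling"),
]
bands = case_bands(sample_case)
print(bands)
```

Stacking the bands for many cases of the same type side by side is what would let a reader compare their typical course and duration at a glance.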

The biggest lesson from this experiment, however, is that innovation in the legal services industry can come from a wide variety of sources. The legal industry is becoming more data-intensive, and this challenge shows that there is an international community of innovators and data scientists ready to help build the next-generation applications it needs.