Motionize

Motionize, previously known as aiLanthus, is a document automation startup in the legal technology space, building software tools to improve productivity and client success for law practitioners. They leverage natural language processing and artificial intelligence to extract data and generate litigation documents in a process that saves lawyers 80% of billable time.

Problem

Most initial patent applications are approved by the US Patent and Trademark Office (USPTO). However, a 2014 Supreme Court decision drastically reduced the number of patents that Federal District Courts consider valid under §101 of the U.S. patent laws. Knowing whether a patent claim is too abstract to be upheld as eligible in court is critical for legal practitioners submitting a patent application. As a result, Motionize wanted to develop an algorithm to determine whether a patent will be ruled valid when challenged. We aimed to improve upon their algorithm for predicting the validity of a patent from the document text.

Methodology

This challenge was especially complex because the model needed to generalize across all domains, so it could not rely on technical terminology from any specific sector. We used natural language processing techniques throughout the project to develop our predictive model. We applied bag-of-words and legal vectorization techniques, the latter from the Law2Vec library, to turn the data set into numerical features and word embeddings. Because invalid patent records made up a large proportion of the data, we also used the SMOTE (Synthetic Minority Over-sampling Technique) oversampling method to correct the class imbalance. Finally, we built and tuned supervised machine learning models to classify patent documents, including random forests, logistic regression models, and LSTM neural networks.
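
To illustrate two of the preprocessing steps described above, here is a minimal sketch in plain Python of bag-of-words vectorization followed by SMOTE-style oversampling of the minority class. It is not the project's actual pipeline: the function names, the toy documents, their labels, and the choice of k are all assumptions made up for this example, and a real project would more likely rely on an established library implementation.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Map each document to a vector of token counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [x for x in minority if x is not base]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy corpus: 1 = upheld as valid (minority), 0 = ruled invalid.
docs = [
    "method for organizing data on a generic computer",
    "method for hedging financial risk",
    "system for managing a game of bingo",
    "method for exchanging financial obligations",
    "improved circuit for reducing memory latency",
    "improved sensor circuit for measuring motion",
]
labels = [0, 0, 0, 0, 1, 1]

vocab, X = bag_of_words(docs)
minority = [x for x, y in zip(X, labels) if y == 1]
n_needed = labels.count(0) - labels.count(1)

# Append synthetic minority rows until both classes are the same size.
X_balanced = X + smote(minority, n_needed, k=1)
y_balanced = labels + [1] * n_needed
```

Each synthetic row lies on the line segment between two real minority rows, so oversampling adds variety without simply duplicating records. In practice the oversampling should be applied to the training split only, so that synthetic rows never leak into the evaluation data.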

Results

We delivered an algorithm that used a logistic regression model with forward feature selection. Our pipeline improved on their baseline metrics by 20% in both precision and recall, and contributed to their final published results.
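
Forward feature selection can be sketched as a greedy loop: start with no features, repeatedly add the one feature that most improves a validation score, and stop when no addition helps. The example below is only an illustration under stated assumptions. The toy data set and the simple "predict positive if any selected feature is 1" rule are hypothetical stand-ins for the logistic regression model and the patent features, and the F1 score is used here as one way to combine precision and recall into a single number.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def forward_select(n_features, score_fn, max_features=None, min_gain=1e-9):
    """Greedy forward selection over feature indices 0..n_features-1.

    score_fn takes a list of feature indices and returns a score to maximize.
    """
    selected, best_score = [], float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every candidate feature added to the current selection.
        scored = [(score_fn(selected + [f]), f) for f in sorted(remaining)]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical toy data: three binary features per row, 1 = positive class.
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]


def toy_score(features):
    """F1 of a stand-in rule: predict 1 if any selected feature equals 1."""
    preds = [1 if any(row[f] == 1 for f in features) else 0 for row in rows]
    return precision_recall_f1(y, preds)[2]


selected, best = forward_select(n_features=3, score_fn=toy_score)
```

On this toy data the loop picks feature 0 first, then feature 1, and stops because adding the noisy feature 2 would lower the score. With a real model, `toy_score` would be replaced by a cross-validated metric so that the selection is not fitted to a single split.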

Semester

Spring 2020

Project Manager

Lucas Bandarkar