Indeed

Indeed is the world’s #1 employment website and attracts the highest traffic for job listings in the United States. It aggregates postings from thousands of websites, including job boards, staffing agencies, nationwide associations, and company career pages. By offering premium job advertising and resume-filtering features to companies, Indeed provides a two-sided recruitment service built on large-scale data ingestion.

Problem

By offering paid job advertisements to firms, Indeed is able to generate additional analytics for job posters and job seekers alike. These include the ability to search job postings and to track how often particular words occur over time as an indicator of job market trends. As the site grows, Indeed wants to keep giving firms and job hunters useful information about the employment market. To that end, Indeed worked with us on two goals: building real-time models that predict job market trends, and analyzing existing postings to determine what a typical job posting will look like in the future.

Methodology

We first tackled the goal of understanding and predicting job market trends. We built forecasting models with Prophet, an open-source time-series forecasting framework developed at Facebook, drawing on several data sources: IPUMS census records, stock market indicators, and Indeed’s proprietary data. As a simpler and more scalable alternative, we also implemented regression models that used the number of legislative bills passed in Congress together with economic health data. For the second goal, we scraped and analyzed textual data to extract the keywords that characterize a typical job in each sector over time. We built a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer to turn each posting into a weighted keyword vector, then trained k-Nearest Neighbors models on those vectors to assign untagged job postings to the correct industry.
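The TF-IDF and k-Nearest Neighbors step described above can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the toy postings, the sector labels, the function names, the smoothed IDF formula, and the choice of cosine similarity with k = 3 are all assumptions made for illustration. In practice a library such as scikit-learn (`TfidfVectorizer`, `KNeighborsClassifier`) would likely be used instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep purely alphabetic, whitespace-separated tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def fit_idf(token_lists):
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    # Sparse, L2-normalized TF-IDF vector; unseen terms are ignored.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(text, train_vecs, train_labels, idf, k=3):
    # Majority vote among the k most similar labeled postings.
    query = tfidf_vector(tokenize(text), idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled postings (toy data, not Indeed's).
postings = [
    ("registered nurse patient care hospital", "healthcare"),
    ("physician clinic patient treatment", "healthcare"),
    ("medical assistant hospital clinic", "healthcare"),
    ("software engineer python backend", "technology"),
    ("data engineer python cloud pipelines", "technology"),
    ("frontend developer javascript software", "technology"),
]

token_lists = [tokenize(text) for text, _ in postings]
train_labels = [label for _, label in postings]
idf = fit_idf(token_lists)
train_vecs = [tfidf_vector(tokens, idf) for tokens in token_lists]

# Classify an untagged posting into a sector.
print(knn_predict("hospital nurse for patient ward", train_vecs, train_labels, idf))
```

Because the vectors are normalized, the dot product equals cosine similarity, which keeps long and short postings comparable. The highest-weighted terms in each sector's vectors also give a rough view of that sector's characteristic keywords.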

Results

We delivered a real-time predictive econometric model that forecasts job market trends and predicts the change in employment in a given year with 99% accuracy. This will inform Indeed’s marketing and budgeting decisions, since the company can plan ahead to grow its base of job-seeking users and to anticipate increased demand from companies trying to fill openings. We also delivered a natural language processing-based classification model that extracts the most important keywords in a posting, clarifying the relationship between a job description and its industry. This pipeline was designed to classify jobs that lack a clear sector label with 87% accuracy, which in turn allows employment levels in each sector to be estimated more accurately.

Semester

Spring 2020

Project Manager

Amal Bhatnagar