In ways unthinkable just a few years ago, machine learning combines massive computing power with often very large amounts of data to train and deploy algorithms that can predict individual-level outcomes and analyze large volumes of text, such as public comments or journal articles. Machine learning, a form of artificial intelligence, can help identify malignant moles as well as a veteran dermatologist can, and it can play video games better than humans. While health, e-commerce and defense have been the largest users of machine learning, Abt Associates is exploring its use in the social policy arena.

Abt uses machine learning to do things that until recently would have been impossible, such as categorizing a vast amount of unstructured text into main themes or counting mosquito eggs from a smartphone picture using image recognition. Decision-makers can analyze the large amounts of data that result from a program evaluation in a variety of ways. For example, machine learning algorithms can dig through vast numbers of open-ended survey responses to help delineate the most important ideas or analyze spatial data to predict the onset of droughts. Such uses can cut the cost of program evaluations by analyzing data faster and more comprehensively than would be possible through purely human efforts. Machine learning also has the potential to change qualitative research by leaving time-consuming rote or repetitive tasks—such as reviewing and cataloging documents—to algorithms. That way researchers can spend more time on substantive thinking.


Abt uses large datasets, powerful computing and efficient programming to explore machine learning applications in the public policy domain and to transform social policymaking. We use “supervised learning” to predict outcomes and “unsupervised learning” to provide unique insights into a document collection of any size. In supervised learning, the target outcome is clear, such as graduating on time. In these cases, we can use machine learning to predict outcomes with high accuracy, and we can build recommender systems that suggest the best steps toward achieving a desired outcome. In unsupervised learning, the data have no predefined outcome, so the algorithm looks for patterns. Unsupervised learning can be particularly helpful in identifying main themes or opinions within a body of text, which can be anything from journal articles to social media posts.

Relevant Expertise

Career Pathways Descriptive and Analytical Study

Client: U.S. Department of Labor

In 2018, Abt received its first federal contract with a machine learning component. As part of this contract, Abt is exploring possible applications of machine learning to advance knowledge about career pathways. Abt will use machine learning to examine a large set of resumes to gather information on how individuals move through their careers. Abt also will analyze qualitative data from research reports to identify characteristics of career pathways programs. Additional possible applications include the analysis of research materials (e.g., interviews and notes) from Abt projects on career pathways to determine the main themes in the current body of research. Ideas for future use cases include web-based tools, such as an interactive chat bot built into a job center website, that address fundamental questions about career pathways training for job-seeking clients. Abt also may use machine learning to predict employment and wage outcomes.

Predicting Associate Degree Completion with High Accuracy

Client: Miami Dade College

Abt worked with Miami Dade College, one of the largest postsecondary institutions in the U.S., to use machine learning to predict students’ associate degree completion within four years of their first enrollment. Using only the administrative data that the college regularly collects, Abt developed and implemented a type of machine learning algorithm called gradient boosting to predict the degree completion outcomes of 300,000 students over several cohorts. Abt was able to accomplish this with up to 93 percent accuracy. The analysis also found that certain academic course performance indicators and baseline assessment characteristics are better predictors of completion than general demographic characteristics, such as race or gender.
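For readers unfamiliar with gradient boosting, the following is a minimal sketch of the general technique, not Abt's model. It assumes a toy dataset with two hypothetical features (a first-term GPA and an uninformative 0/1 group indicator) and boosts one-split "stumps" on the gradient of the log loss. All names, features, and values are invented for illustration; production work would typically rely on an established library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put at least one row on each side
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the negative gradient of log loss, which is (y - p)."""
    base_rate = sum(y) / len(y)  # assumes both classes are present
    f0 = math.log(base_rate / (1 - base_rate))
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, stumps, learning_rate

def predict_proba(model, row):
    f0, stumps, learning_rate = model
    return sigmoid(f0 + learning_rate * sum(stump_predict(s, row) for s in stumps))

# Hypothetical toy data: [first-term GPA, group indicator]; label 1 = completed.
X = [[3.6, 0], [3.2, 1], [2.9, 0], [3.8, 1],
     [2.1, 0], [1.8, 1], [2.4, 0], [2.0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_gradient_boosting(X, y)
predictions = [int(predict_proba(model, row) >= 0.5) for row in X]
features_used = {stump[0] for stump in model[1]}
```

In this toy setup the boosted stumps split only on the GPA column, because the group indicator carries no information about completion. That loosely mirrors the kind of comparison between academic and demographic predictors described above, though on invented data.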

Using Machine Learning for Targeted Nonresponse Follow-up in Large Community Surveys

Client: Los Angeles Public Health Department and Texas Department of Transportation

Abt uses machine learning to target important subgroups for nonresponse follow-up in mixed-mode surveys. In situations where initial recruitment has not met expectations for certain subgroups, Abt (like many other survey vendors) uses external data to assist with targeted nonresponse follow-up. The external data often are incomplete and sporadic. The Abt difference is in the use of regional data (e.g., Census information) and machine learning to improve targeting. Abt relies on a combination of random forest models and simple neural networks to make such predictions, both of which can work well with incomplete data. Our clients in California and Texas are benefitting from better insights into where we can target follow-up to increase responses from under-represented or high-priority groups. Recently, we were more than 80 percent effective at identifying a specific subgroup of interest to one client for targeted follow-up. These algorithms need direction from subject matter experts to be most effective, so we combine machine learning with our vast knowledge of survey design and implementation to reduce the need for nonresponse follow-up, which is critical to achieving client goals.

Career Pathways Intermediate Outcomes

Client: U.S. Department of Health and Human Services

A survey of participants in a large multi-site study of career pathways gathered responses to questions on schools attended and credentials awarded. Abt used the Naïve Bayes text classification algorithm, a supervised machine learning technique, to classify each of these responses into one of many known schools and known occupational categories, respectively. This work involved a combination of human and machine effort. A researcher manually identified many previously unknown schools (using web searches) and made a nearly exhaustive list of occupations. We used fuzzy text matching techniques along with manual inspection to classify some of the responses. Once we classified at least several responses into each category, we used machine learning to classify those responses not already manually classified. We identified classification errors and used them to refine the results with multiple waves of processing. This project illustrates how machine learning can greatly amplify the return on human effort.
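The classification step described above can be sketched as follows. This is a minimal multinomial Naïve Bayes classifier with add-one smoothing, written from scratch for illustration; it is not the project's code. The labeled responses, the category names, and the `NaiveBayesText` class are all hypothetical, and the fuzzy matching and manual review steps are not shown.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a free-text response and split it into word tokens."""
    return text.lower().replace(",", " ").replace(".", " ").split()

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # responses per category
        self.word_counts = defaultdict(Counter)   # word counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical responses that a researcher has already classified by hand.
labeled = [
    ("certified nursing assistant", "healthcare"),
    ("licensed practical nurse", "healthcare"),
    ("medical assistant certificate", "healthcare"),
    ("welding certificate", "manufacturing"),
    ("cnc machine operator", "manufacturing"),
    ("machine tool technology", "manufacturing"),
]
texts, labels = zip(*labeled)
classifier = NaiveBayesText().fit(texts, labels)

# Classify responses that were not manually labeled.
first = classifier.predict("nursing assistant program")
second = classifier.predict("welding technology")
```

Misclassified responses found on inspection can be corrected and added to the labeled set before refitting, which is one plausible way to run the multiple waves of processing described above.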

Comment Counts

Client: United States Environmental Protection Agency (EPA)

When the EPA wanted to understand citizen reaction to mining activities in Alaska, Abt gathered and arrayed more than 13,000 citizen comments in the form of handwritten notes, PDF files, emails, HTML and text files. Abt data scientists used natural language processing and machine learning techniques to analyze the data, identifying common themes and text, interpreting overall and thematic sentiment and geo-locating response types. Abt discovered that a majority of the public comments came from a single form letter or a variation of it. Groups would draft a letter for their members to submit as Bristol Bay public comments. Abt was able to de-duplicate the form letters quickly and efficiently, which enabled personnel to focus their attention on the 2,000 comments that included news articles, pictures, and recorded voice. Using natural language processing, data scientists were able to accomplish in a few days what would have taken several weeks to do manually.
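One common way to detect form letters and their variations is to compare overlapping word sequences ("shingles") between comments. The sketch below illustrates that idea with Jaccard similarity and a simple greedy grouping. It is an assumption about how such de-duplication could work, not a description of the method Abt used, and the sample comments and the 0.5 threshold are invented.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in a normalized comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_near_duplicates(comments, threshold=0.5):
    """Greedy grouping: a comment joins the first group whose
    representative (its first member) is similar enough."""
    groups = []  # list of (representative shingles, member indices)
    for idx, text in enumerate(comments):
        sh = shingles(text)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((sh, [idx]))
    return [members for _, members in groups]

# Invented sample comments: three variations of one form letter and one unique comment.
comments = [
    "I urge the EPA to protect Bristol Bay and its salmon fishery from large-scale mining.",
    "I urge the EPA to protect Bristol Bay and its salmon fishery from large-scale mining. Thank you.",
    "As a fisherman, I urge the EPA to protect Bristol Bay and its salmon fishery from large-scale mining.",
    "My family has worked in this region for decades and we support responsible development.",
]
groups = group_near_duplicates(comments)
unique_comment_indices = [members[0] for members in groups if len(members) == 1]
```

Groups with many members point to form letters, while single-member groups are the comments most likely to need individual review.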

President's Malaria Initiative (PMI) VectorLink

Client: United States Agency for International Development (USAID)

Abt is working with USAID to minimize the threat of malaria by using a variety of prevention and treatment techniques. To gain deeper insight into spray operations and more real-time information, Abt developed a variety of applications to combat the disease. Recognizing that stakeholders at various levels want to know where spray operations have occurred, Abt built a chat bot to answer questions about where, when and how many homes were sprayed with insecticide. The chat bot works both in settings with an internet connection and in unconnected locations. To get ahead of malaria, Abt also aggregated disparate data sources (entomological, topographic, atmospheric and demographic) to build machine learning models that estimate insecticide resistance at different sentinel sites. With this knowledge, teams can mobilize resources more quickly, deploy personnel more efficiently and study the impacts of their spray operations.

Zika Africa Indoor Residual Spraying Project (ZAP)

Client: United States Agency for International Development (USAID)

Under the ZAP initiative funded by USAID, Abt is working to prevent Zika outbreaks by implementing, monitoring and evaluating mosquito-control activities in Latin America and the Caribbean. An essential aspect of the project is entomological monitoring: understanding the current mosquito population to help anticipate what the future population will look like. This includes all stages of the mosquito life cycle: egg, larva, pupa and adult. Monitoring is time-consuming and resource-intensive. Teams of technicians need to count the mosquito eggs on a piece of paper, commonly referred to as ovitrap paper. This process becomes increasingly difficult when the team deploys hundreds of ovitrap papers across different locations and each paper can contain hundreds of eggs. Seeing the need for a more efficient system, data scientists and software engineers at Abt built a mobile application to count the eggs on these ovitrap papers. The application relies on open source technology and will be left behind after the project so that local governments can customize it.

Evaluation of the Medicare Care Choices Model: Predicting Survival at the End of Life

Client: Centers for Medicare & Medicaid Services (CMS)

Abt is conducting an evaluation of the Medicare Care Choices Model (MCCM).  The model provides eligible Medicare beneficiaries an opportunity to receive supportive services from participating hospices while continuing treatment for their terminal condition. The evaluation of MCCM will measure whether providing additional support and care coordination under this new model improves quality of care at the end of life and increases beneficiary and caregiver satisfaction while reducing Medicare expenditures. One challenge in this analysis is selecting a comparison group and identifying who from that comparison group would have enrolled in MCCM given the opportunity. We need to identify comparison beneficiaries who are as similar as possible to MCCM enrollees, including being at a similar point in the trajectory of their disease as the MCCM enrollees. To identify those similar points in trajectory, Abt is using machine learning to help predict survival among Medicare beneficiaries at the end of life. We use over 300 beneficiary characteristics, including demographics, measures of comorbidity (i.e., disease burden), health care utilization, functional status, and interactions across these characteristics. By capturing complex patterns in the data, machine learning can substantially improve our ability to predict survival and help us select an appropriate comparison group.

Contact Us

Sung-Woo Cho
Senior Associate/Scientist
Social & Economic Policy
Rockville, MD
(301) 347-5843

Machine Learning

Abt Associates is an engine for social impact, harnessing the power of data and grounded insight to bring people from vulnerability to security worldwide. We think boldly to deliver solutions that cross disciplines, methods, and geographies. Abt provides impact through research, evaluation, and program implementation in the fields of health, social and environmental policy, and international development.