It’s a buzzword with big benefits. In ways unthinkable just a few years ago, machine learning marries massive computing power with often very large amounts of data to develop and implement algorithms that can make predictions and analyze everything from telecom data to tweets. As a form of artificial intelligence, machine learning can help identify malignant moles as accurately as a veteran dermatologist and can play video games better than humans. While the health, e-commerce and defense sectors have been the largest users of machine learning, Abt Associates is exploring its use in the social policy arena.
Abt uses machine learning to do things that until recently would have been impossible, such as turning a vast amount of unstructured text into analyzable data in seconds or counting mosquito eggs from a smartphone picture using image recognition. Decision makers can analyze the large amounts of data that result from a program evaluation in a variety of ways. For example, machine learning can dig through vast numbers of open-ended survey responses to determine the main themes or analyze spatial data to predict the onset of droughts. Such uses can cut the cost of program evaluations by analyzing data faster and more comprehensively than would be possible through purely human efforts. Machine learning also has the potential to change qualitative research by leaving the time-consuming tasks of reviewing documents and cataloging them to algorithms.
Abt uses large datasets, powerful computing and efficient programming to explore machine learning applications in the public policy domain and to transform social policymaking. We use the technique in “supervised learning” to predict outcomes and in “unsupervised learning” to provide unique insights into a document collection of any size. In supervised learning, the target outcome is clear, such as graduating on time. In these cases, we can use machine learning to predict outcomes with high accuracy, and we build recommender systems to provide the best steps toward achieving a desired outcome. In unsupervised learning, the data have no preconceived outcome, so the algorithm looks for patterns. Unsupervised learning can be particularly helpful in identifying main themes or opinions within a body of text and can examine anything from journal articles to social media posts.
In 2017, Abt received its first federal contract with a machine learning component. As part of this contract, Abt will use machine learning to examine a large set of resumes and job postings to gather information on how individuals move through their career pathways. The work will employ machine learning to predict employment and wage outcomes using publicly available administrative data on individuals who have accessed services at an American Job Center. We will analyze existing research materials from Abt projects (e.g., interviews and notes) on career pathways to determine patterns in the language and to identify the main themes in the current body of research. Lastly, we will explore the development of machine learning-powered tools in the form of chat bots (online tools that simulate conversations with human users), which can address fundamental questions about career pathways training for job-seeking clients.
Abt worked with Miami Dade College, one of the largest postsecondary institutions in the U.S., to use machine learning to predict students’ associate degree completion within four years of their first enrollment. Using only the administrative data that the college regularly collects, Abt developed and implemented a type of machine learning algorithm called gradient boosting to predict the degree completion outcomes of 300,000 students over several cohorts. Abt was able to accomplish this with up to 93 percent accuracy. The analysis also found that certain academic course performance indicators and baseline assessment characteristics are better predictors of completion than general demographic characteristics, such as race or gender.
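For readers curious about what gradient boosting does, here is a minimal sketch in plain Python. It is not the model Abt built for the college: the two features (first-term GPA and credits attempted), the eight student records and the settings `n_rounds` and `learning_rate` are all invented for illustration. The idea is that each round fits a very small decision tree (a one-split "stump") to the errors left by the previous rounds, then adds a fraction of that tree's prediction to the running score.

```python
from typing import List, Tuple

# A stump is (feature index, threshold, value if <= threshold, value if > threshold).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Return the one-split tree that best fits the residuals (least squares)."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: no split at all
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:
                continue  # a threshold at the maximum value splits nothing off
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left)
            err += sum((r - rmean) ** 2 for r in right)
            if err < best_err:
                best_err = err
                best = (j, t, lmean, rmean)
    return best


def stump_value(stump: Stump, row: List[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gradient_boosting(X: List[List[float]], y: List[int],
                          n_rounds: int = 20,
                          learning_rate: float = 0.3) -> Model:
    """Boost stumps on squared-error loss; labels are 0 or 1."""
    base = sum(y) / len(y)          # start from the overall completion rate
    scores = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [yi - si for yi, si in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_completion(model: Model, row: List[float]) -> int:
    """Return 1 (predicted to complete) or 0 (predicted not to complete)."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * stump_value(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0


# Invented records: [first-term GPA, credits attempted in first term].
X = [[3.6, 15.0], [3.2, 12.0], [2.1, 6.0], [1.8, 9.0],
     [3.9, 12.0], [2.4, 6.0], [3.0, 15.0], [1.5, 3.0]]
y = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = completed within four years (made up)

model = fit_gradient_boosting(X, y)
print(predict_completion(model, [3.5, 12.0]))
```

A real analysis would hold out data to measure accuracy and would typically rely on an established open source library rather than hand-written trees; the sketch only shows the round-by-round error correction that gives the technique its name.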
Abt uses machine learning to target important subgroups for nonresponse follow-up in mixed-mode surveys. In situations where initial recruitment has not met expectations for certain subgroups, Abt (like many other survey vendors) uses external data to assist with targeted nonresponse follow-up. The external data often are incomplete and sporadic. The Abt difference is in the use of regional data (e.g., Census information) and machine learning to improve targeting. Abt relies on a combination of random forest models and simple neural networks to make such predictions, both of which can work well with incomplete data. Our clients in California and Texas are benefitting from better insights into where we can target follow-up to increase responses from under-represented or high-priority groups. Recently, we were more than 80 percent effective at identifying a specific subgroup of interest to one client for targeted follow-up. These algorithms need direction from subject matter experts to be most effective, so we combine machine learning with our vast knowledge of survey design and implementation to reduce the need for nonresponse follow-up, which is critical to achieving client goals.
A survey of participants in a large multi-site study of career pathways gathered responses to questions on schools attended and credentials awarded. Abt used the Naïve Bayes text classification algorithm, a supervised machine learning technique, to classify each of these responses into one of many known schools and known occupational categories, respectively. This work involved a combination of human and machine effort. A researcher manually identified many previously unknown schools (using web searches) and made a nearly exhaustive list of occupations. We used fuzzy text matching techniques along with manual inspection to classify some of the responses. Once we classified at least several responses into each category, we used machine learning to classify those responses not already manually classified. We identified classification errors and used them to refine the results with multiple waves of processing. This project illustrates how machine learning can greatly amplify the return on human effort.
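The two-stage workflow described above (fuzzy matching for near-misses, then a Naïve Bayes classifier trained on hand-labeled responses) can be sketched with Python's standard library. This is an illustration under stated assumptions, not the project's code: the school names, occupational categories, sample responses, the `cutoff` value and the smoothing constant `alpha` are all hypothetical.

```python
import difflib
import math
from collections import Counter, defaultdict


def match_known_school(response, known_schools, cutoff=0.8):
    """Fuzzy-match a free-text response to a known school name, or return None."""
    lookup = {name.lower(): name for name in known_schools}
    hits = difflib.get_close_matches(response.lower().strip(), list(lookup),
                                     n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def tokenize(text):
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical school list and manually classified responses.
known_schools = ["Lakeside Community College", "Riverbend Technical Institute"]
labeled = [
    ("certified nursing assistant", "healthcare"),
    ("medical assistant certificate", "healthcare"),
    ("licensed practical nurse", "healthcare"),
    ("welding certificate", "manufacturing"),
    ("cnc machining certificate", "manufacturing"),
    ("machine operator training", "manufacturing"),
    ("network support technician", "information technology"),
    ("computer support specialist", "information technology"),
    ("it help desk certificate", "information technology"),
]

classifier = NaiveBayesTextClassifier().fit(
    [text for text, _ in labeled], [label for _, label in labeled])

print(match_known_school("Lakeside Comunity Colege", known_schools))
print(classifier.predict("nursing assistant training"))
```

In practice, responses the classifier gets wrong would be corrected by a researcher and added back to the labeled set before the next wave of processing, which is the human-and-machine loop the project relied on.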
When the EPA wanted to understand citizen reaction to mining activities in Alaska, Abt gathered and arrayed more than 13,000 citizen comments in the form of handwritten notes, PDF files, emails, HTML and text files. Abt data scientists used natural language processing and machine learning techniques to analyze the data, identifying common themes and text, interpreting overall and thematic sentiment and geo-locating response types. Abt discovered that a majority of the public comments came from a single form letter or a variation of it. Groups would draft a letter for their members to submit as public comments on Bristol Bay. Abt was able to de-duplicate the form letters quickly and efficiently, which enabled personnel to focus their attention on the 2,000 comments that included news articles, pictures, and recorded voice. Using natural language processing, data scientists were able to accomplish in a few days what would have taken several weeks to do manually.
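One common way to spot form letters and their variations is to compare comments by the short word sequences ("shingles") they share. The sketch below shows that idea in plain Python; it is an assumption about how such de-duplication can work, not a description of the method Abt used. The sample comments, the three-word shingle size and the 0.6 similarity threshold are invented for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in a comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two comments have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(comments, threshold=0.6):
    """Greedily group comments that closely resemble a group's first member."""
    groups = []  # each entry: (shingles of first member, list of comment indexes)
    for index, text in enumerate(comments):
        current = shingles(text)
        for representative, members in groups:
            if jaccard(current, representative) >= threshold:
                members.append(index)
                break
        else:
            groups.append((current, [index]))
    return [members for _, members in groups]


# Invented comments: three variations of one form letter and one unique comment.
comments = [
    "I urge the agency to protect the watershed and the salmon fishery "
    "for future generations.",
    "I urge the agency to protect the watershed and the salmon fishery "
    "for future generations. Sincerely, a concerned resident.",
    "My family has fished here for forty years and I have attached photos "
    "of last summer's run.",
    "I strongly urge the agency to protect the watershed and the salmon "
    "fishery for future generations.",
]

print(group_near_duplicates(comments))
```

Groups with many members point to form letters, while single-member groups are the unique comments that deserve a closer human read. Comparing every comment against every group is fine for a few thousand documents; much larger collections usually call for hashing techniques that avoid pairwise comparison.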
Abt is working with USAID to minimize the threat of malaria by using a variety of prevention and treatment techniques. To gain deeper insight into spray operations and obtain more real-time information, Abt developed a variety of applications to combat the disease. Recognizing that stakeholders at various levels want to know where spray operations have occurred, Abt built a chat bot to answer questions about where, when and how many homes were sprayed with insecticide. The chat bot works both in settings with an internet connection and in unconnected locations. To get ahead of malaria, Abt aggregated disparate data sources--entomological, topographic, atmospheric and demographic--to build machine learning models to estimate insecticide resistance at different sentinel sites. With this knowledge, teams can mobilize resources more quickly, deploy personnel more efficiently and study the impacts of their spray operations.
Under the ZAP initiative funded by USAID, Abt is working to prevent Zika outbreaks by implementing, monitoring and evaluating mosquito-control activities in Latin America and the Caribbean. An essential aspect of the project is entomological monitoring: understanding the current mosquito population to help anticipate what the future population will look like. This includes all stages of the mosquito: egg, larva, pupa and adult. Monitoring is a time-consuming and resource-intensive process. Teams of technicians need to count the number of mosquito eggs on a piece of paper, commonly referred to as ovitrap paper. This process becomes increasingly difficult when the team deploys hundreds of ovitrap papers across different locations and when each paper can contain hundreds of eggs. Seeing the need for a more efficient system, data scientists and software engineers at Abt built a mobile application to count the number of eggs on these ovitrap papers. The application relied on open source technology and will be left behind after the project so that local governments can customize it.
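To give a sense of how software can count small dark objects in a photo, the sketch below uses a simple, classic approach: mark every pixel darker than a threshold, then count the connected groups of dark pixels. This is only an illustration and should not be read as how Abt's mobile application works. The tiny grayscale grid, the `threshold` of 100 and the `min_pixels` filter are all invented.

```python
from collections import deque


def count_dark_blobs(image, threshold=100, min_pixels=2):
    """Count connected groups of dark pixels in a grayscale grid (0 = black)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first search over dark pixels touching this one.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:  # ignore single-pixel specks
                count += 1
    return count


# Invented 6 x 7 "photo": light paper (230) with dark marks (values under 100).
paper = [
    [230, 230, 230, 230, 230, 230, 230],
    [230,  40,  35, 230, 230, 230, 230],
    [230,  38, 230, 230,  50,  45, 230],
    [230, 230, 230, 230,  42, 230, 230],
    [230, 230,  60, 230, 230, 230, 230],
    [230, 230, 230, 230, 230,  30,  33],
]

print(count_dark_blobs(paper))
```

A known limitation of this approach is that eggs touching one another merge into a single group and are undercounted, which is one reason real tools tend to layer more sophisticated image recognition on top of simple thresholding.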
Abt Associates is an engine for social impact, harnessing the power of data and grounded insight to bring people from vulnerability to security worldwide. We think boldly to deliver solutions that cross disciplines, methods, and geographies. Abt provides impact through research, evaluation, and program implementation in the fields of health, social and environmental policy, and international development.