Position: Associate Professor, School of Information
Director, Data-Intensive Development Lab
Co-Director, Center for Effective Global Action
University of California, Berkeley
My work focuses on using novel data and methods to better understand the economic lives of the poor. Most active projects are based in developing and conflict-affected countries.
Blumenstock, JE (2020). Machine learning can help get COVID-19 aid to those who need it most, Nature, 581 (7807)
COVID-19's spread and lockdowns in low-income countries are leaving hundreds of millions of poor and vulnerable people without work or income. The United Nations World Food Programme has warned of devastating famines - 265 million people in low- and middle-income countries are projected to suffer from acute hunger by the end of the year. Big data and artificial intelligence can help.
Blumenstock, JE, Callen, M, and Ghani, T (2018). Why Do Defaults Affect Behavior? Experimental Evidence from Afghanistan, American Economic Review, 108 (10), 2868-2901 [pdf]
We report on an experiment examining why default options impact behavior. By randomly assigning employees to different varieties of a salary-linked savings account, we find that default enrollment increases participation by 40 percentage points -- an effect equivalent to providing a 50% matching incentive. We then use a series of experimental interventions to differentiate between explanations for the default effect, which we conclude is driven largely by present-biased preferences and the cognitive cost of thinking through different savings scenarios. Default assignment also changes employees' saving habits, and makes them more likely to actively decide to save after the study concludes.
Park, PS, Blumenstock, JE, and Macy, MW (2018). The strength of long-range ties in population-scale social networks, Science, 362(6421), 1410-1413 [pdf]
Long-range connections that span large social networks are widely assumed to be weak, comprised of sporadic and emotionally distant relationships. However, researchers historically have lacked the population-scale network data needed to verify the predicted weakness. Using data from eleven culturally diverse population-scale networks on four continents -- encompassing 56M Twitter users and 58M mobile phone subscribers -- we find long-range ties are nearly as strong as social ties embedded within a small circle of friends. These high bandwidth connections have important implications for diffusion and social integration.
Blumenstock, JE (2018). Don't forget people in the use of big data for development, Nature, 561 (7722), 170-172 [pdf]
Aid organizations, researchers and private companies are looking for ways to leverage the 'data revolution' to transform international development. In the rush to find technological solutions to complex global problems, however, there's a danger that we get distracted by the technology and lose track of the deeper issues that are unique to each local context... The CEO of a popular big-data platform recently described data science as "a blend of Red-Bull-fueled hacking and espresso-inspired statistics." In my view, the successful use of big data in development will require a data science that is considerably more humble than this version that has captured the popular imagination.
Blumenstock, JE (2016). Fighting Poverty with Data, Science, 353(6301), 753-754 [pdf]
Policy-makers in the world's poorest countries are often forced to make decisions based on limited data. Consider Angola, which recently conducted its first postcolonial census. In the 44 years that elapsed between the prior census and the recent one, the country's population grew from 5.6 million to 24.3 million, and the country experienced a protracted civil war that displaced millions of citizens. In situations where reliable survey data are missing or out of date, a novel line of research combines big data and machine learning to offer promising alternatives.
Blumenstock, JE, Cadamuro, G, On, R (2015). Predicting Poverty and Wealth from Mobile Phone Metadata, Science, 350(6264), 1073-1076 [pdf]
Accurate and timely estimates of population characteristics are a critical input to social and economic research and policy. We show that an individual's past history of phone use can be used to infer his or her socioeconomic status, and that the predicted attributes of millions of individuals can in turn be used to accurately reconstruct the distribution of wealth of an entire nation, or to infer the asset distribution of micro-regions comprised of just a few households. In resource-constrained environments where censuses and household surveys are rare, this creates an option for gathering localized and timely information at a fraction of the cost of traditional methods.
Working Papers / Active Projects
Migration and the Value of Social Networks - joint with Guanghua Chi, Xu Tan (Revise and Resubmit, Review of Economic Studies)
What is the value of a social network? Prior work suggests two distinct mechanisms that have historically been difficult to differentiate: as a conduit of information, and as a source of social and economic support. We use a rich 'digital trace' dataset to link the migration decisions of millions of individuals to the topological structure of their social networks. We find that migrants systematically prefer 'interconnected' networks (where friends have common friends) to 'expansive' networks (where friends are well connected). A micro-founded model of network-based social capital helps explain this preference: migrants derive more utility from networks that are structured to facilitate social support than from networks that efficiently transmit information.
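The distinction between 'interconnected' and 'expansive' networks can be made concrete with a small sketch. The two measures below are illustrative proxies only, not the definitions used in the paper: `interconnectedness` is assumed to be the share of a person's friend pairs who are themselves friends, and `expansiveness` is assumed to be the number of distinct friends-of-friends. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def interconnectedness(graph, person):
    """Share of the person's friend pairs that are themselves linked.

    Illustrative proxy for 'friends have common friends'.
    """
    friends = graph[person]
    pairs = list(combinations(sorted(friends), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def expansiveness(graph, person):
    """Number of distinct friends-of-friends, excluding the person and direct friends.

    Illustrative proxy for 'friends are well connected'.
    """
    friends = graph[person]
    reach = set()
    for friend in friends:
        reach |= graph[friend]
    return len(reach - friends - {person})


# Hypothetical undirected friendship graph, stored as adjacency sets.
graph = {
    "ana": {"ben", "cal", "dee"},
    "ben": {"ana", "cal"},
    "cal": {"ana", "ben"},
    "dee": {"ana", "eli", "fay"},
    "eli": {"dee"},
    "fay": {"dee"},
}

print(interconnectedness(graph, "ana"))
print(expansiveness(graph, "ana"))
```

In this toy graph, ben and cal form a closed triangle with ana (the interconnected part), while dee opens a path to eli and fay (the expansive part).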
Many decisions that once were made by humans are now made using algorithms. These algorithms are typically designed with a single, profit-related objective in mind: Loan approval algorithms are designed to maximize profit, smart phone apps are optimized for engagement, and news feeds are optimized for clicks. However, these decisions have side effects: irresponsible payday loans, addictive apps, and fake news can harm individuals and society. This project develops and tests a new paradigm for prioritizing the social impact of an algorithmic decision from the start, rather than as an afterthought. The key insight is to leverage recent advances in machine learning -- which make it possible to predict who will benefit from a decision and how -- to design algorithms that balance those predicted benefits alongside traditional profit-related objectives.
Global Micro-Estimates of Wealth and Poverty - joint with Guanghua Chi
Many critical business and policy decisions, from strategic investments to the allocation of humanitarian aid, rely on data about the geographic distribution of wealth and poverty. Here, we develop the first global micro-estimates of wealth and poverty, which cover the entire populated surface of the earth at 2.4km² resolution. The estimates are built by applying machine learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, topographic maps, as well as aggregated and de-identified connectivity data from Facebook. The estimates, which are trained and calibrated using nationally-representative household survey data from 50 countries, are more accurate than state-of-the-art methods that rely solely on satellite imagery. We validate the accuracy of the high-resolution poverty maps using two independent sources of data: population census data from 15 countries and poverty scorecard data collected by a large nonprofit organization in Kenya. We also provide confidence intervals for each micro-estimate to facilitate responsible downstream use. Working paper available by request.
Manipulation-Proof Machine Learning - joint with Daniel Björkegren and Samsun Knight
An increasing number of decisions are guided by machine learning algorithms. An individual's behavior is typically used as input to an estimator that determines future decisions. But when an estimator is used to allocate resources, individuals may strategically alter their behavior to achieve a desired outcome. This paper develops a new class of estimators that are stable under manipulation, even when the decision rule is fully transparent. We explicitly model the costs of manipulating different behaviors, and identify decision rules that are stable in equilibrium. Through a large field experiment in Kenya, we show that decision rules estimated with our strategy-robust method outperform those based on standard supervised learning approaches. Working paper available by request.
How Do Firms Respond to Insecurity? Evidence from Afghan Phone Records - joint with Tarek Ghani, Sylvan Herskowitz, Ethan B. Kapstein, Thomas Scherer, and Ott Toomet
We provide new evidence on how insecurity affects firm behavior by linking data on violent conflict in Afghanistan to geo-stamped corporate mobile phone records. We begin by developing a method for observing firm location choice with phone data, and validate these measurements using independent sources of administrative and survey data. Next, we show that deadly terrorist attacks reduce the presence of firms in targeted districts by 4-6%. The effect includes both an increase in the local exit of existing firms following attacks and a decrease in new firm entry. We find large negative spillovers from attacks in provincial capitals on firm presence in nearby rural districts. After violence, employees in provincial capitals are 33% more likely to move to Kabul and 15% more likely to leave for another province.
Scalable Methods for Discovering Latent Structure in Societal-Scale Data - joint with Sham Kakade
The proliferation of digital devices has created an unparalleled opportunity to observe, model, and understand the changing structure of social networks in developing and conflict-affected states. However, current state-of-the-art computational methods used to analyse such data are notoriously ill-suited to answer basic, fundamental questions in the social science and policy arena. While many new, provably efficient algorithms for community detection have been recently developed, these methods have several key limitations: they rarely scale to real-world datasets consisting of millions of interconnected actors; they are not applicable to dynamic contexts where network structure evolves over time; and they are almost never validated. This project adapts recent algorithmic advances in theoretical computer science to build scalable tools capable of reliably discovering hidden structure in societal-scale network data.
(Machine) Learning what Governments Value - joint with Daniel Björkegren and Samsun Knight
The rationale behind targeting criteria is not always clear. We combine program eligibility criteria with recent advances in machine learning heterogeneous treatment effects to infer a policymaker's preferences over households and outcomes. Our method can be used to better understand and articulate the allocation of social programs. We find that for Mexico's PROGRESA anti-poverty program, government allocations are consistent with a consumption value of 2.03 pesos for each day of child school attendance and 2.64 pesos for each child sick day. Allocations imply welfare weights that place 16.9% more value on the median household for each additional household member, 8% more value if indigenous, and 0.6% less value for each additional year of education of the household head. Alternate eligibility criteria could have marginally improved average health and schooling outcomes at a small cost to average consumption outcomes. Working paper available by request.
Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan - joint with Emily Aiken
Recent papers demonstrate that non-traditional data, from mobile phones and other digital sensors, can be used to roughly estimate the wealth of individual subscribers. This paper asks a question more directly relevant to development policy: Can non-traditional data be used to more efficiently target humanitarian aid? By combining rich survey data from a "big push" anti-poverty program in Afghanistan with detailed mobile phone logs from program beneficiaries, we study the extent to which machine learning methods can accurately differentiate ultra-poor households eligible for the relief program from other poor households deemed ineligible. We show that supervised learning methods leveraging mobile phone data can identify ultra-poor households nearly as accurately as standard survey-based measures of poverty, including expenditures and wealth. We discuss the implications and limitations of these methods for targeting extreme poverty in marginalized populations.
Score-Based Classifiers for Welfare-Aware Machine Learning - joint with Esther Rolf, Max Simchowitz, Sarah Dean, Lydia Liu, Daniel Björkegren and Moritz Hardt
While real-world decisions involve many competing objectives, algorithmic decisions are often evaluated with a single objective function. In this paper, we study algorithmic policies which explicitly trade off between a private objective (such as profit) and a public objective (such as social welfare). We analyze a natural class of policies which trace an empirical Pareto frontier based on learned scores, and focus on how such decisions can be made in noisy or data-limited regimes. Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies. We then present empirical results in two different contexts -- online content recommendation and sustainable abalone fisheries -- to underscore the generality of our approach to a wide range of practical decisions. Taken together, these results shed light on inherent trade-offs in using machine learning for decisions that impact social welfare.
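One simple way to picture a score-based policy that trades off a private and a public objective is sketched below. It is a minimal illustration under stated assumptions, not the paper's method: each candidate decision is assumed to carry a learned profit score and a learned welfare score, and the policy accepts a candidate when a weighted sum of the two is positive. The weight `alpha`, the function names, and the toy scores are all hypothetical.

```python
def select(profit, welfare, alpha):
    """Accept item i when (1 - alpha) * profit[i] + alpha * welfare[i] > 0.

    alpha = 0 considers profit only; alpha = 1 considers welfare only.
    """
    return [(1 - alpha) * p + alpha * w > 0 for p, w in zip(profit, welfare)]


def frontier(profit, welfare, alphas):
    """Sweep alpha and record (alpha, total profit, total welfare) for each policy."""
    points = []
    for alpha in alphas:
        chosen = select(profit, welfare, alpha)
        total_profit = sum(p for p, c in zip(profit, chosen) if c)
        total_welfare = sum(w for w, c in zip(welfare, chosen) if c)
        points.append((alpha, total_profit, total_welfare))
    return points


# Hypothetical learned scores for four candidate decisions.
profit = [2.0, 1.0, -1.0, -2.0]
welfare = [-1.0, 1.0, 2.0, -1.0]

for alpha, total_profit, total_welfare in frontier(profit, welfare, [0.0, 0.5, 1.0]):
    print(alpha, total_profit, total_welfare)
```

Sweeping `alpha` from 0 to 1 moves the policy from profit-only to welfare-only selection, and the recorded totals trace out an empirical trade-off curve for the toy data. With noisy scores, the curve computed this way may differ from the true one, which is the regime the abstract highlights.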
Biased Updating Creates Overconfidence and Increases Default Risk: Evidence from Sports Betting in Kenya - joint with Matthew Olckers
We develop a model to show how biased updating can lead to persistent overconfidence in one's ability, and highlight the negative welfare implications of this overconfidence. We validate key assumptions and predictions of this model using a unique dataset that captures rich details on the gambling decisions of over 50,000 Kenyans. The data show that gamblers react asymmetrically to (exogenous variation in) wins and losses. The bias in the learning process causes gamblers to increase betting expenditures over time. Exogenous increases in betting expenditures cause gamblers to take out high-interest loans, thus creating scope for persistent debt traps. Working paper available by request.
The Impact of Mobile Phones: Experimental Evidence from the Random Assignment of New Cell Towers - joint with Niall Keleher, Arman Rezaee, Erin Troland
We present experimental evidence on the economic impacts of mobile phone access. Our results are based on a randomized control trial in the Philippines, through which 14 isolated and previously unconnected villages were randomly assigned to either receive or not receive a new cellphone tower. Following a pre-analysis plan, we find that the introduction of mobile phones had large and significant impacts on household income and expenditure, particularly for wage workers. Mobile phone access also increased social connections within and between communities. However, there are no consistent impacts on market access, informedness, or subjective well-being. In post-specified analysis, we find suggestive evidence that the improved economic conditions are driven by increases in migration, remittances, and self-employment. Working paper available by request.
How Important are the Yellow Pages? Experimental Evidence from Tanzania - joint with Brian Dillon and Jenny Aker
Mobile phones reduce the cost of communicating with existing social contacts, but do not eliminate frictions in forming new relationships. We report the findings of a two-sided randomized control trial in central Tanzania, centered on the production and distribution of a "yellow pages" phone directory with contact information for local enterprises. Enterprises randomly assigned to be listed in the directory receive more business calls, make more use of mobile money, and employ more workers. There is evidence of positive spillovers, as both listed and unlisted enterprises in treatment villages experience significant increases in sales relative to a pure control group. Households randomly assigned to receive copies of the directory make greater use of their phones for farming, are more likely to rent land and hire labor, have lower rates of crop failure, and sell crops for weakly higher prices. Willingness-to-pay to be listed in future directories is significantly higher for treated enterprises.
Violence and Financial Decisions: Evidence from Mobile Money in Afghanistan - joint with Michael Callen and Tarek Ghani
We provide evidence that violence changes the financial decisions people make. Exploiting the quasi-random timing of several thousand violent incidents in Afghanistan, we show that individuals who are exposed to violence retain more cash and are less likely to adopt and use mobile money, a new financial technology. This effect is corroborated using three independent sources of data: (i) the universe of mobile money transactions in Afghanistan; (ii) high-frequency data from a randomized experiment designed to increase mobile money adoption; and (iii) a behavioral lab-in-the-field experiment with experienced mobile money users. Collectively, the evidence highlights an economic cost of violence that operates through individual beliefs, which is large enough to impede the development of formal financial systems in conflict settings.