On the Econometrics of Matching

Al Roth, in his 2012 Nobel speech, noted that matching markets are some of the most important markets we take part in: they can determine what schools we go to, what jobs we get, and maybe who we marry. Al Roth and Lloyd Shapley, who jointly won the 2012 Nobel Prize in Economics, both worked extensively on the fundamental problem of market design in such markets. Most notably, their research has led to many improvements in the National Resident Matching Program in the US, and to the creation of a program that matches kidney donors to patients.

Early work in this area was mainly concerned with developing the theoretical tools to understand the allocation mechanism in such markets. In fact, the economic theory of matching models has been around for more than five decades. However, it is only recently that interest has surged in taking these theoretical matching models to the data. One reason is the easier availability of datasets observed at the level of the match: men and women matching with spouses, students matching with schools and colleges, residents matching with hospital residency programs, and many more. This has posed a new set of questions and challenges for econometricians and empirical economists alike. The econometric challenge lies in finding conditions, given the features and limitations of the dataset, under which we can formulate credible strategies to estimate agents' preferences over matches.

Simply put, a matching market is a two-sided market with disjoint sets of agents on the two sides. Agents on both sides have a say in forming a match or remaining unmatched according to some innate preferences. These matching preferences are usually what we want to estimate from the data. As empiricists, we assume that the match allocations observed in the data are generated in equilibrium, or according to some stability criterion.

Broadly, this literature on matching can be classified into two strands: one where transfers or prices play a role as a mechanism to clear markets (transferable utility models, or TU), and the other where transfers do not play a role (non-transferable utility models, or NTU). Each of these can be further classified based on how many matches agents on each side are allowed to make. For example, in a school choice problem, one school can match with multiple students but one student can match with at most one school. This is called one-to-many matching. In a traditional marriage market, one man can match with at most one woman and vice versa. This is called one-to-one matching. We can also have the case where, say, an upstream firm can choose to match with many downstream firms and vice versa. This is called many-to-many matching. Depending on the setting of our application (whether it is TU or NTU, and how many matches an agent can form), the matching model can have different implications for the number of stable allocations, whether the stable allocations are efficient, and so on. For concreteness, let us look at the questions and challenges posed to empirical research in a couple of these models.

Edwin Long – The Babylonian Marriage Market

Marriage Markets

A widely studied case of matching in the empirical literature is that of the marriage market. A major question often posed in this literature is how a policy or technology shock affects matching patterns or the agents’ preferences over matches. For example, what is the impact of improved birth control technologies and/or abortion laws on the matching patterns?

To study such questions, an empirical strategy could be to estimate and compare the preferences of men and women before and after the policy/technology shock. This is not always straightforward. Marriage markets have usually been modelled as a TU one-to-one matching model. This model has some advantages: Shapley and Shubik (1972) show that the stable allocation in this model is unique and efficient, and we can use these implications to estimate the preferences of agents. However, transfers here can be non-monetary and determined as a result of intra-household bargaining. Transfers are therefore best thought of as endogenous and unobserved market-clearing prices, and dealing with them can make the estimation problem challenging.

In the data we observe who marries whom, along with observed characteristics of the spouses. Becker (1973) suggested that if men and women matched on a single dimension, say years of schooling, then provided our spouse's level of education is a complement to our own, we should observe positive assortative matching. That is, men with high (low) levels of schooling match with women with high (low) levels of schooling. The reality, however, is more complicated. In the data, we observe many more types of matches. This is because individuals tend to match on more than just one observed characteristic of the spouse. Moreover, there can also be many unobservable characteristics that go into the match consideration. How we parameterize the preferences of the agents in terms of the observed and unobserved covariates will determine how credibly we can identify the preference parameters. Fox (2010), Choo and Siow (2006), Galichon and Salanié (2015), and Sinha (2015) all give different sets of conditions to identify and estimate the preference parameters.
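Becker's prediction can be illustrated with a small numerical sketch. The schooling values and the surplus function x*y below are hypothetical, chosen only because a product is a simple form of complementarity; the sketch also assumes equal numbers of men and women and no option to stay single. It relies on the TU result mentioned above, that the stable allocation maximizes total surplus, and finds that allocation by brute force.

```python
from itertools import permutations


def optimal_assignment(surplus):
    """Brute-force the one-to-one assignment that maximizes total surplus.

    surplus[i][j] is the joint surplus when man i marries woman j.
    Returns (match, total), where match[i] is the index of man i's wife.
    """
    n = len(surplus)
    best_total, best_match = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(surplus[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_match = total, perm
    return best_match, total if best_match is None else best_total


# Hypothetical years of schooling, sorted from low to high.
schooling_men = [10, 12, 16]
schooling_women = [9, 13, 17]

# Complements: joint surplus x * y (assumed functional form).
complements = [[x * y for y in schooling_women] for x in schooling_men]
pam_match, pam_total = optimal_assignment(complements)

# Substitutes: joint surplus -x * y (assumed functional form).
substitutes = [[-x * y for y in schooling_women] for x in schooling_men]
nam_match, _ = optimal_assignment(substitutes)

print(pam_match)  # each man is paired with the woman of the same rank
print(nam_match)  # ranks are reversed
```

With complements, the surplus-maximizing match pairs the least-schooled man with the least-schooled woman and so on up the ranking; with substitutes the ranking is reversed. Once a second characteristic or an unobserved taste shock enters the surplus, no such clean sorting is guaranteed, which is the complication the paragraph above describes.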

School Choice

Another widely studied matching problem is the allocation of students across schools. Monetary transfers are usually precluded in these problems, so school choice has been modelled using NTU one-to-many matching. In such a market, there need not be a unique stable match allocation. Moreover, these stable match allocations may not be efficient. Thus, for the estimation of preference parameters, we can only rely on the moment inequalities that are implied by the condition of “stable matching.” In fact, most empirical studies have considered the matching process when the match allocation is centralized, i.e. a social planner allocates students to schools based on students’ reported rankings subject to the capacity constraints of the schools, making it in effect a one-sided problem.
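The non-uniqueness of stable allocations without transfers can be seen in a very small example. The sketch below uses a hypothetical market with two students and two schools, each school having a single seat (so it is a one-to-one special case of the one-to-many problem); the preference lists are invented for illustration. It enumerates every possible allocation and keeps those with no blocking pair.

```python
from itertools import permutations

# Hypothetical preference lists, most preferred first.
student_prefs = {"s1": ["A", "B"], "s2": ["B", "A"]}
school_prefs = {"A": ["s2", "s1"], "B": ["s1", "s2"]}


def is_stable(match):
    """Return True if the allocation (student -> school) has no blocking pair."""
    holder = {school: student for student, school in match.items()}
    for student, prefs in student_prefs.items():
        for school in prefs:
            if school == match[student]:
                break  # every remaining school is worse than the current one
            # The student prefers this school; does the school prefer the student?
            ranking = school_prefs[school]
            if ranking.index(student) < ranking.index(holder[school]):
                return False
    return True


students = list(student_prefs)
schools = list(school_prefs)
all_matches = [dict(zip(students, p)) for p in permutations(schools)]
stable_matches = [m for m in all_matches if is_stable(m)]

for m in stable_matches:
    print(m)
```

Both allocations are stable here: one gives each student their first choice, the other gives each school its first choice. Observing one of them in the data therefore does not pin down which equilibrium was selected, which is one reason estimation falls back on inequalities implied by stability.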

Here, an important theoretical and empirical question of interest has been to find a stable match allocation that maximizes the net welfare of the market. At the centre of this debate have been two allocation mechanisms: the Boston Mechanism (BM), a system in which students (or their parents) have an incentive to misreport their preferences, and the Deferred Acceptance (DA) algorithm, where truth-telling is a weakly dominant strategy. Theoretically, it has been argued that the BM can give an unfair advantage to students with sophisticated parents, whereas DA is strategy-proof. Based on such theoretical arguments, the Boston School Committee voted in 2005 to replace the BM with a DA mechanism.
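The difference in incentives between the two mechanisms can be made concrete with a minimal sketch. The three students, three single-seat schools, preference lists, and priority orders below are all hypothetical, and the two functions are simplified textbook versions of student-proposing DA and the BM, not the procedures used by any actual school district.

```python
def deferred_acceptance(prefs, priorities, capacity):
    """Student-proposing deferred acceptance; returns student -> school."""
    next_choice = {s: 0 for s in prefs}
    held = {c: [] for c in priorities}  # tentative admits at each school
    free = list(prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue  # list exhausted: student stays unmatched
        c = prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        held[c].sort(key=priorities[c].index)
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())  # reject the lowest-priority student
    return {s: c for c, admitted in held.items() for s in admitted}


def boston_mechanism(prefs, priorities, capacity):
    """Boston mechanism: round-k acceptances of k-th choices are final."""
    assigned, seats = {}, dict(capacity)
    for k in range(max(len(p) for p in prefs.values())):
        applicants = {}
        for s, p in prefs.items():
            if s not in assigned and k < len(p):
                applicants.setdefault(p[k], []).append(s)
        for c, apps in applicants.items():
            apps.sort(key=priorities[c].index)
            for s in apps[:seats[c]]:
                assigned[s] = c
            seats[c] -= min(seats[c], len(apps))
    return assigned


# Hypothetical market: i2 truly prefers A to B to C.
true_prefs = {"i1": ["A", "B", "C"], "i2": ["A", "B", "C"], "i3": ["B", "A", "C"]}
misreport = dict(true_prefs, i2=["B", "A", "C"])  # i2 ranks B first instead
priorities = {"A": ["i1", "i2", "i3"], "B": ["i2", "i3", "i1"], "C": ["i1", "i2", "i3"]}
capacity = {"A": 1, "B": 1, "C": 1}

bm_truth = boston_mechanism(true_prefs, priorities, capacity)
bm_lie = boston_mechanism(misreport, priorities, capacity)
da_truth = deferred_acceptance(true_prefs, priorities, capacity)
da_lie = deferred_acceptance(misreport, priorities, capacity)

print("BM:", bm_truth["i2"], "->", bm_lie["i2"])
print("DA:", da_truth["i2"], "->", da_lie["i2"])
```

Under the BM, student i2 ends up at the last choice C when reporting truthfully but secures B by ranking it first, because seats filled in the first round are never reopened. Under DA, i2 receives B either way in this example, so this particular misreport brings no gain. A single example cannot prove strategy-proofness; it only illustrates the kind of manipulation that the theoretical argument rules out.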

To test this empirically, we need to understand what drives these match allocations. In other words, we need to find a way of estimating agent preferences. We can then use these estimates to perform counterfactual analyses that involve computing the welfare gains or losses under different equilibria in a given market. In fact, He (2014) shows that even under the DA algorithm, students with naïve parents enjoy a utility gain only if the true population has a sufficiently small percentage of naïve parents, and sophisticated parents always lose. This suggests a more mixed verdict that does not always favour the DA mechanism.

What is Next?

The empirical literature on matching has made many leaps in recent years, but much remains to be done. The focus has primarily been on one-to-one matching models. These models have so far considered matching on observed and perhaps unobserved characteristics, which are exogenously given. In reality, however, the covariates that agents match on may be endogenous. For instance, our marital prospects might affect our decision to invest in our human capital. What implications can this have for estimating the preference parameters in such models?

Apart from a few exceptions, we still need to make methodological advances in estimating decentralized one-to-many and many-to-many matching models. That is, there is scope to study these models with the supply side preferences endogenised. In the school choice example, there are markets where the match allocation is not centralized. Here modelling the supply side becomes important, so we can study it as a true two-sided matching problem.

Finally, it should be noted that most of the empirical literature assumes frictionless matching. One way of interpreting this is to say that searching for a match is costless. However, there is a large theoretical literature on search models. A fruitful area for future research is empirical work on matching models that incorporate search frictions.

by Shruti Sinha