A Brief Introduction to Automated Valuation Models: How We Estimate Home Prices
To understand real estate economics, one should first understand AVMs (Automated Valuation Models): it is essential to be able to value homes objectively, which is what AVMs do, before you try to explain what causes those values to change, which is what real estate economics struggles to achieve. There are essentially three types of single-family AVMs: 1) hedonic, 2) comparable sales, and 3) mark-to-market. Hedonic models estimate a home’s value directly from its features. Historically these models used linear regression, but they have become much more sophisticated with the emergence of machine learning techniques, which make it far easier to capture the complex factors that contribute to property value.
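To make the hedonic idea concrete, here is a minimal sketch; the data file, feature names, and choice of a gradient-boosted model are illustrative assumptions rather than a description of any particular production AVM.

```python
# Minimal hedonic AVM sketch: predict value directly from property features.
# File name, feature names, and model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

sales = pd.read_csv("sales.csv")  # hypothetical table of closed sales
features = ["sqft", "beds", "baths", "lot_size", "year_built",
            "latitude", "longitude"]

hedonic = GradientBoostingRegressor()
hedonic.fit(sales[features], sales["sale_price"])

# Value a subject property from its characteristics alone
subject = pd.DataFrame([{
    "sqft": 1850, "beds": 3, "baths": 2, "lot_size": 6000,
    "year_built": 1978, "latitude": 33.45, "longitude": -112.07,
}])
estimate = hedonic.predict(subject[features])[0]
```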
Comparable sales models imitate the behavior of an appraiser: they find recent, nearby sales of similar homes and adjust those sale prices for differences from the subject property, using either a separate hedonic model or standard adjustments gleaned from appraisals. Comparable sales models have the advantage of accounting for factors that are not captured in the property data: homes that are similar in the ways the data lets us see are likely to be similar in other ways as well. Prior to the emergence of machine learning techniques, comparable sales models had the best performance.
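A comparable-sales model might be sketched roughly as follows; the six-month lookback, the flat per-square-foot adjustment, and the column names are stand-in assumptions for what would, in practice, be appraisal-derived or hedonically estimated adjustments.

```python
# Minimal comparable-sales sketch: find recent, nearby sales and adjust
# their prices toward the subject property. All column names and the flat
# $75-per-square-foot adjustment are illustrative assumptions.
import pandas as pd

def comp_sales_value(subject, sales, n_comps=5, dollars_per_sqft=75):
    # Keep sales from roughly the last six months
    recent = sales[sales["sale_date"] >= subject["as_of_date"] - pd.Timedelta(days=180)].copy()
    # Rank candidates by simple geographic distance to the subject
    recent["dist"] = ((recent["latitude"] - subject["latitude"]) ** 2 +
                      (recent["longitude"] - subject["longitude"]) ** 2) ** 0.5
    comps = recent.nsmallest(n_comps, "dist")
    # Adjust each comp's price for its size difference from the subject
    adjusted = comps["sale_price"] + (subject["sqft"] - comps["sqft"]) * dollars_per_sqft
    return adjusted.median()
```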
Mark-to-market models use a home price index, an idea I discussed in a previous post, together with a prior sale to value a property by moving the old sale “into the present”: they take the historical sale price and multiply it by the current value of the index divided by the value of the index at the time of sale. These models generally perform worse than the other two, but they are the most robust to missing data, since they require nothing more than the time and price of the prior sale. As I noted in that earlier post, these indices also have value in and of themselves.
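The calculation itself fits in a few lines; the index values below are invented purely for illustration.

```python
# Mark-to-market sketch: roll a prior sale forward with a home price index.
def mark_to_market(prior_price, index_at_sale, index_today):
    return prior_price * index_today / index_at_sale

# A home that sold for $300,000 when the index stood at 180, valued today
# with the index at 225, is marked to 300,000 * 225 / 180 = $375,000.
estimate = mark_to_market(300_000, 180, 225)
```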
Once you have a sense of the main methods for producing AVMs, it is important to understand how they are evaluated. AVMs are judged by a variety of metrics, the most unusual of which are PPE_10 and PPE_5. (PPE stands for proportional percentage error, and it measures whether the estimate falls within a certain percentage of the sale price; PPE_10 is the frequency, given as a percentage, with which the model gets within 10% of the sale price, and PPE_5, of course, measures how often the estimate is within 5%.) Others include root mean squared error, root median squared error, median absolute percentage error, and total PPE greater than 20%. PPE greater than 20% can be decomposed into two metrics: one measuring misses on the high side, where the model value was more than 20% greater than the observed price, and the other measuring misses on the low side. These are generally called the PPE greater than 20% right tail and left tail, respectively.
MAPE, the mean absolute percentage error, is frequently used to judge the accuracy of AVMs because it accounts for variability in price levels across geographies; if you did not divide the error by the home price, you would find that AVMs were magically more accurate where values are lower. While other metrics are also used, these are the most important.
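Here is a sketch of how these metrics could be computed from paired estimates and observed sale prices; the function and its conventions are illustrative, not an industry-standard implementation.

```python
# Sketch of the evaluation metrics described above. A positive signed error
# means the model value exceeded the observed sale price.
import numpy as np

def avm_metrics(estimates, prices):
    estimates, prices = np.asarray(estimates, float), np.asarray(prices, float)
    err = (estimates - prices) / prices      # signed percentage error
    abs_err = np.abs(err)
    return {
        "PPE_10": np.mean(abs_err <= 0.10) * 100,          # % of estimates within 10%
        "PPE_5": np.mean(abs_err <= 0.05) * 100,           # % of estimates within 5%
        "MAPE": np.mean(abs_err) * 100,                    # mean absolute percentage error
        "median_APE": np.median(abs_err) * 100,            # median absolute percentage error
        "PPE_gt20_right_tail": np.mean(err > 0.20) * 100,  # overvalued by more than 20%
        "PPE_gt20_left_tail": np.mean(err < -0.20) * 100,  # undervalued by more than 20%
    }
```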
Another unique aspect of evaluating AVMs is assessing performance on different price quantiles: Frequently an AVM can do exceedingly well on properties in the mid-range of prices but perform much worse on very low-end or very high-end homes. It is also important to see how a model performs geographically, by looking at plots of percentage error on maps. This can help to determine if there are certain geographical features that a model is failing to take into account.
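A price-band breakdown can be produced in a few lines; the number of bands and the column names below are assumptions for illustration.

```python
# Sketch: break accuracy out by price band to see whether the model degrades
# at the low and high ends of the market.
import numpy as np
import pandas as pd

def mape_by_price_band(estimates, prices, n_bands=5):
    df = pd.DataFrame({"estimate": estimates, "price": prices})
    df["ape"] = np.abs(df["estimate"] - df["price"]) / df["price"]
    df["band"] = pd.qcut(df["price"], n_bands, labels=False)  # price quantile bins
    return df.groupby("band")["ape"].mean() * 100
```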
AVMs perform differently depending on the information available to them. Specifically, they show very different performance on so-called on-market and off-market properties, with on-market referring to properties currently listed for sale on an MLS and off-market referring to those that are not currently for sale. On-market AVMs have the advantage of being able to use the realtor’s listing price as well as a fully updated set of property characteristics. A listing price simplifies the problem considerably: instead of figuring out what a property will sell for from scratch, the model merely needs to determine whether the likely sale price will be higher or lower than the listing price. Indeed, most properties sell for within 10% of the listing price, so an AVM that anchors itself to the listing price is likely to have a fairly high PPE_10 from that fact alone.
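As a rough illustration of that anchoring, an on-market model might predict the sale-to-list ratio rather than the price itself; the feature names and model choice below are assumptions.

```python
# Sketch of an on-market AVM that anchors to the listing price: predict the
# sale-to-list ratio, then multiply it back into the listing price.
from sklearn.ensemble import GradientBoostingRegressor

FEATURES = ["list_price", "sqft", "beds", "baths", "days_on_market"]

def fit_on_market_avm(listings):
    ratio = listings["sale_price"] / listings["list_price"]  # sale-to-list ratio
    model = GradientBoostingRegressor()
    model.fit(listings[FEATURES], ratio)
    return model

def predict_on_market(model, subjects):
    # Predicted ratio times the listing price gives the value estimate
    return model.predict(subjects[FEATURES]) * subjects["list_price"].values
```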
Judging on-market AVM performance creates a whole host of problems. For starters, a machine learning model that was able to observe a sale during training might predict that sale price exactly, thanks to a problem common in modelling: in-sample overfitting. You can think of many machine learning techniques as clever students who vary in how well they handle new problems but who remember every problem they have seen on the homework. It is therefore important, much more so than in the past, to evaluate these models on transactions that were not used to train them. Consequently, on-market tests require the participants to be honest about which transactions they have seen.
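One simple safeguard is a holdout of transactions the models never saw; the date-based split below is just one illustrative way to construct it, with assumed column names.

```python
# Sketch: hold out transactions by date so models are scored only on sales
# that closed after a cutoff they could not have seen in training.
import pandas as pd

def temporal_split(sales, cutoff="2024-01-01"):
    cutoff = pd.Timestamp(cutoff)
    train = sales[sales["sale_date"] < cutoff]
    holdout = sales[sales["sale_date"] >= cutoff]  # score models only on these
    return train, holdout
```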
Perhaps more importantly, on-market tests of AVM performance negate the advantages that come from having access to more diverse sources of data: once a realtor has given everyone an updated view of the home through the MLS listing, access to information like permit data, photographs of the home, or a recent appraisal matters less. Yet someone trying to value a property that has not sold in a long time would want to know how much lift these alternative data sources can provide.
This suggests that perhaps AVM providers should compete in the off-market space; after all, if you can build an excellent off-market AVM then you can obviously build an excellent on-market one. More importantly, off-market AVMs tend to be more useful. They allow you to detect fraud, value real estate portfolios, and study real estate trends without the need for a realtor to visit the property.
One way to test off-market AVMs would be to have providers submit estimates for every property in the country and then wait to see how well those estimates predicted the sale prices of the subset that sells, using only estimates submitted before those properties were listed. The problem, however, is that there can be meaningful market movements during the time a property sits on the market, and any property that sells unusually quickly is likely selling at a discount or a premium, which makes it a poor measure of AVM performance. So while this method looks promising, it has its own problems.
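Such a backtest might be scored along these lines; the join key and column names are assumptions, and the snippet deliberately ignores the timing problems just described.

```python
# Sketch of the off-market backtest: pair each eventual sale with the
# estimate submitted before the property was listed, then score only those.
import pandas as pd

def off_market_backtest(estimates, sales):
    merged = sales.merge(estimates, on="property_id")
    # Keep only estimates produced before the listing appeared on the market
    merged = merged[merged["estimate_date"] < merged["listing_date"]]
    ape = (merged["estimate"] - merged["sale_price"]).abs() / merged["sale_price"]
    return ape.mean() * 100  # MAPE over the subset that actually sold
```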
Another method would be to have competitors submit their models and data to a third party that runs them and ensures that no current MLS data is used to inform the model about the subject property (comparable sales models might still be allowed to use MLS data to find comps). Of course, this involves a transfer of technology and data that is far too labor-intensive to be practical.
One counterintuitive way to address these problems would be to have AVM providers submit their estimates in bulk and then judge them by how well they anticipate initial listing prices, making certain that each estimate was produced shortly before the listing. Of course, sale prices are the actual goal, but this would eliminate a number of the issues that arise from waiting for properties to sell. An AVM that consistently matched realtors’ opinions would be quite a good one. The only problem here is that realtors will sometimes list a property above its true market value at the insistence of a client, or well below it in order to start a bidding war.
All approaches to competitive AVM testing have their pluses and minuses. Limiting ourselves to on-market testing, however, is not the answer. It is the off-market space that truly matters. Given the complexity of AVMs, we may never come to a simple set of standards for evaluating them, but we can certainly establish principles that are better than those that currently govern the industry.
Read more: AI in real estate, Part I