The $500mm+ Debacle at Zillow Offers – What Went Wrong with the AI Models?

Zillow, an online real estate marketplace, recently shuttered its Zillow Offers business after its iBuying algorithms failed. A flawed property-valuation algorithm led the company to write down the estimated value of the houses it purchased in Q3 and Q4 by more than $500 million. Zillow has officially announced $304 million in Q3 losses and expects to reduce its workforce by 25% over the coming quarters to absorb the impact on its business. One analyst has estimated that as many as two-thirds of the homes Zillow purchased are currently valued below what Zillow paid for them.

The event has once again raised concerns about the reliability of AI models. Was the problem specific to Zillow, or is it a limitation of AI models in general? Can AI be relied upon to make major business decisions or market predictions? And how did competitors like Opendoor and Offerpad weather the abrupt changes in the housing market while Zillow missed the mark?

Put simply, Zillow’s algorithms overestimated the value of the homes the company purchased. At the same time, Zillow was aggressively expanding its purchasing program, acquiring more homes in the last two quarters than it had in the prior two years. Because the cost of holding empty houses while hoping for a price recovery is very high, the company has been forced to sell large volumes of houses below their purchase price. Bloomberg has reported that Zillow is currently attempting to sell 7,000 houses in order to recoup $2.8 billion.

Failing to catch a change in market conditions

We don’t know exactly why Zillow’s models overestimated the value of the homes. Looking back at the timeline of events, however, it appears that when the housing market cooled down, Zillow’s algorithms were not adjusted accordingly: they continued to assume that the market was still hot and overestimated home prices. In machine learning (ML), this kind of problem is known as “concept drift,” and it appears to be at the heart of the problem with Zillow Offers.

Machine learning models often assume that the past equals the future, but in the real world that is generally not the case. This is especially true when predicting a rapidly shifting value, or one subject to shocks such as sudden changes in purchasing behavior during a global pandemic.

For example, one significant market change that would have skewed results is that Zillow was unable to get houses renovated and resold fast enough, since contractors were in short supply during COVID-19. It’s not clear whether Zillow’s models accounted for this factor accurately. Another possibility is that the company was buying in areas that had experienced sustained price increases in 2020 and early 2021, driven by the increased desirability of suburban and rural settings with lower population density. By early summer 2021, however, wider vaccine availability may have reduced the urgency of buying in those areas, allowing prices to stabilize or decline while the algorithm continued to expect increases.

What is clear is that the algorithms did not accurately account for the relationship between the target variable (the price of the house) and the input variables (e.g., number of bedrooms, number of bathrooms, square footage, home condition). Home prices fell even when the input variables stayed the same, but the models were not updated to reflect the new relationships.
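To make this failure mode concrete, here is a minimal, hypothetical sketch of concept drift at work: a pricing model trained when the market was hot keeps overestimating once the relationship between inputs and sale prices shifts. All numbers are illustrative and are not Zillow’s actual data or model.

```python
# Minimal illustration of concept drift (hypothetical numbers, not Zillow's data):
# the mapping from an input (square footage) to the target (sale price) changes,
# but the model keeps predicting with the old, "hot market" relationship.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
sqft = rng.uniform(800, 3500, size=1000).reshape(-1, 1)

# Hot market: roughly $250 per square foot, plus noise.
price_hot = 250 * sqft.ravel() + rng.normal(0, 20_000, size=1000)
# Cooled market: same houses, but price per square foot has dropped.
price_cool = 225 * sqft.ravel() + rng.normal(0, 20_000, size=1000)

model = LinearRegression().fit(sqft, price_hot)  # trained before the shift
preds = model.predict(sqft)

# Identical inputs, changed target relationship: the error becomes a systematic bias.
print(f"mean error, hot market:    ${np.mean(preds - price_hot):>9,.0f}")
print(f"mean error, cooled market: ${np.mean(preds - price_cool):>9,.0f}")
```

On the cooled-market data, the model overestimates every home by tens of thousands of dollars on average even though the inputs are unchanged; only retraining on recent sales would correct it.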

Key steps to avoid the dangers of AI model drift

So how could this situation have been avoided? A key piece of the solution lies in better tools for monitoring and maintaining the quality of AI models. Such tools can automatically alert data science teams when there is drift or performance degradation, support root-cause analysis, and inform model updates with humans-in-the-loop. (My colleagues explore the various types of drift and what can be done to account for them in “Drift in Machine Learning.”)

In the context of Zillow Offers, it would have been useful to measure drift (or changes) in model accuracy, model outputs, and model inputs on an ongoing basis, using a monitoring tool to detect potential model issues. The sketch after the list below illustrates what these three checks might look like.

  • Model accuracy. As the market cooled and house sale prices started falling in certain zip codes, one would expect the accuracy of the Zillow Offers model to degrade in those geographies, i.e., the prices estimated by the model would be consistently higher than the actual sale prices. Identifying this degradation in model accuracy could have prompted a timely model update.
  • Model outputs. The model outputs (estimated house prices) may have exhibited upward trends over time. Understanding the root causes of why estimated prices were trending higher, particularly where the model was wrong (i.e., had lower accuracy), would have been useful for debugging the model.
  • Model inputs. Examining changes in model input distributions could also have surfaced areas of concern. For example, an input tracking changes in average home prices in a neighborhood over time could have revealed that the market was cooling. This information could have prompted action, e.g., placing greater weight on the most recent data and retraining the model to reflect the changed market conditions.
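As an illustration, here is a hedged sketch of the three checks above, built on generic pandas/SciPy primitives rather than any particular monitoring product. The column names (`zip_code`, `predicted_price`, `sale_price`, `date`) and thresholds are hypothetical placeholders, not Zillow’s actual schema.

```python
# A sketch of the three monitoring checks described above. Column names and
# thresholds are hypothetical placeholders chosen for illustration.
import pandas as pd
from scipy import stats

def accuracy_drift_by_zip(df: pd.DataFrame, threshold: float = 0.05) -> pd.Series:
    """Flag zip codes where the model systematically overestimates sale prices.

    Computes the mean signed percentage error per zip code; values above
    `threshold` suggest the model is still pricing for a hotter market.
    """
    signed_error = (df["predicted_price"] - df["sale_price"]) / df["sale_price"]
    per_zip = signed_error.groupby(df["zip_code"]).mean()
    return per_zip[per_zip > threshold]

def output_trend(df: pd.DataFrame) -> float:
    """Estimate the time trend in model outputs (dollars per day) via a linear
    fit. A persistently positive slope means estimated prices keep climbing
    and should be reconciled against actual market movement."""
    days = (df["date"] - df["date"].min()).dt.days
    slope, _, _, p_value, _ = stats.linregress(days, df["predicted_price"])
    return slope if p_value < 0.05 else 0.0

def input_drift_detected(reference: pd.Series, current: pd.Series) -> bool:
    """Compare an input feature's recent distribution against its
    training-time distribution with a two-sample Kolmogorov-Smirnov test.
    True means the distributions differ enough to alert the team."""
    _, p_value = stats.ks_2samp(reference, current)
    return p_value < 0.01
```

In production, checks like these would run on a schedule, alert the data science team when thresholds are crossed, and feed the root-cause analysis and retraining workflow described above.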

Carefully managed AI can still be effective for investment initiatives

In our view, fluctuations like the recent ones in the housing market can still be managed and accurately accounted for by AI models. Zillow’s competitors Opendoor and Offerpad appear to have used AI models that detected the cooling housing market and reacted appropriately, pricing their offers more accurately. It is likely that these companies put such processes and tools in place as guardrails (Opendoor started its iBuyer program in 2014).

In conclusion, AI models can be updated to account for concept drift when they are built correctly and when humans are part of the process for supervision and mitigation. The pandemic affected all types of consumer behavior, including shopping, banking, travel, and housing, and yet in many cases AI models were able to keep pace. For AI and ML models to produce profitable outcomes, especially high-stakes models like Zillow’s, it is crucial to have serious AI governance supported by tools for monitoring and debugging, with qualified humans-in-the-loop to adjust to the major market shifts that can arise during unexpected events.

About the Author

Anupam Datta is Co-Founder, President, and Chief Scientist of TruEra. He is also Professor of Electrical and Computer Engineering and (by courtesy) Computer Science at Carnegie Mellon University. His research focuses on enabling real-world complex systems to be accountable for their behavior, especially as they pertain to privacy, fairness, and security. His work has helped create foundations and tools for accountable data-driven systems. Datta serves as lead PI of a large NSF project on Accountable Decision Systems, on the Steering Committees of the Conference on Fairness, Accountability, and Transparency in socio-technical systems and the IEEE Computer Security Foundations Symposium, and as an Editor-in-Chief of Foundations and Trends in Privacy and Security. He obtained Ph.D. and M.S. degrees from Stanford University and a B.Tech. from IIT Kharagpur, all in Computer Science.
