A Problem with Hospital Ratings—and How to Fix It

Until recently, Chicago’s Rush University Medical Center boasted the maximum of 5 stars in the Centers for Medicare and Medicaid Services (CMS) hospital rating system. Data used to compute the July 2018 rating indicated that the hospital had improved in many areas, so it came as a shock when hospital administrators got a preview of the new ratings and Rush had dropped to 3 stars, according to Chicago Booth’s Dan Adelman.

Even a hospital that improves in every single metric can experience a rating drop, says Adelman. This indicates a problem with the current CMS system—and he suggests a way to address it.

The CMS rating system organizes hundreds of hospital metrics into seven categories: mortality, safety of care, readmission, patient experience, effectiveness of care, timeliness of care, and efficient use of medical imaging. It then uses what statisticians call a latent variable model, which gives weight to metrics that are statistically correlated but not necessarily indicative of a hospital’s performance.

The latent variable model assumes that in each category, a single factor that cannot be observed directly drives the performance measures. If a few metrics in a category are correlated, the model attributes that correlation to the latent variable and gives those metrics more weight when computing the hospital’s score.
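To see how correlation alone can steer the weights, consider a minimal sketch. It is not the CMS model: it swaps in a simpler stand-in, the leading principal component of the correlation matrix, and the three metrics and their values are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def correlation_weights(columns, iterations=200):
    """Weights from the leading eigenvector of the correlation matrix.

    Power iteration is used as a stand-in for fitting a one-factor
    latent variable model (an assumption of this sketch).
    """
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    loadings = [abs(x) for x in v]
    total = sum(loadings)
    return [x / total for x in loadings]

# Hypothetical scores for six hospitals on three metrics in one category.
metric_a = [1, 2, 3, 4, 5, 6]
metric_b = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]  # moves closely with metric_a
metric_c = [3, 1, 4, 1, 5, 2]              # largely unrelated to the others

weights = correlation_weights([metric_a, metric_b, metric_c])
print([round(w, 3) for w in weights])
```

The two metrics that move together end up with most of the weight, regardless of how many patients each one affects or how much it matters clinically.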

For example, a hospital’s Patient Safety and Adverse Events Composite, known as PSI-90, takes into account a number of factors including hospital mistakes, patient falls, and infection rates. Until recently, the PSI-90 had been given the most weight in performance measures, but thanks to stronger correlations in the data used to calculate the July 2018 ratings, a new factor was given more weight: complications from knee and hip surgeries.

The problem is that these surgeries affect far fewer patients and might not be applicable to all hospitals, yet the knee and hip surgeries became a big factor by which all hospital systems were rated.
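A small arithmetic sketch shows how a shift in weights can lower a score even when a hospital improves on every measure. The scores and weights below are hypothetical and chosen only to make the effect visible; they are not CMS figures.

```python
def composite(scores, weights):
    """Weighted sum of measure scores; weights are assumed to sum to 1."""
    return sum(scores[name] * weights[name] for name in scores)

# Hypothetical standardized scores (higher is better) for one hospital.
before = {"psi_90": 0.90, "hip_knee": 0.30}
after = {"psi_90": 0.92, "hip_knee": 0.35}  # better on both measures

# Hypothetical weights: the emphasis flips between rating periods.
weights_before = {"psi_90": 0.8, "hip_knee": 0.2}
weights_after = {"psi_90": 0.2, "hip_knee": 0.8}

old_score = composite(before, weights_before)
new_score = composite(after, weights_after)    # weights re-estimated
fixed_score = composite(after, weights_before)  # weights held fixed

print(f"old: {old_score:.3f}  new: {new_score:.3f}  fixed weights: {fixed_score:.3f}")
```

With the weights held fixed, the improved hospital scores higher. With the emphasis moved to its weaker measure, the same improved hospital scores lower.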

Hospitals can view their individual ratings before the CMS releases them to the public, and the uproar over the new results led the CMS to delay publication until February. The agency also modified the ratings so that PSI-90 now dominates again. Rush was bumped from 3 stars to 4.

Adelman argues that ratings shifts from small changes in correlations result in “knife-edge” instability that renders the evaluation system meaningless for patients who might rely on it when choosing a facility for their care. Hospitals, which use the ratings to negotiate with insurance companies for payments, cannot determine where to focus efforts toward improving. The ratings also affect a hospital’s reputation, which in turn affects patient volume and payor mix (an industry term that refers to the distribution of more-profitable patients, who use private insurance, and less-profitable ones, on public insurance). And when patients are attracted to hospitals that rate higher but have worse outcomes, that hurts the overall health of people in an area.

“It’s like developing a grading scheme for school,” Adelman says. “The teacher gives the grading scale out at the beginning of the semester and tells everyone the weights for attendance, quizzes, papers, and tests. But this is like going through the semester and then telling everyone where the weight is at the end based on how the students perform. And every semester that might change.”

A benefit of the CMS rating model, says Adelman, is that it doesn’t require anyone at the CMS or its affiliates to manually determine the weight of each metric, a step that could introduce bias and opinions. Rather, the model chooses how to weight each metric, and additional metrics are easily integrated.

Adelman argues that the same could be accomplished with a model he has created that relies not on correlation but on patient representation and the measurement of hospitals against best performers. In his model, each hospital gets its own unique weights. A measure that affects more people is given more weight.

Efficient Frontier Hospital Ratings

Click here to view ratings for 3,720 US hospitals scored according to the Efficient Frontier system. Compare hospitals based on size, teaching status, and socioeconomic rating; submit questions or comments about the system; and view additional resources used in developing the ratings.

To weight particular measures for Hospital A, Adelman’s model compares it to other hospitals that are more efficient and better performing along key dimensions. These hospitals are combined to create a “virtual hospital” that sits between Hospital A and an ideal hospital that achieves the maximum performance along every measure. The virtual hospital thus dominates Hospital A. The idea is reminiscent of portfolio optimization in finance, in which investors seek combinations of investments on the efficient frontier, the set of portfolios that offer the highest expected return for a given level of risk. Rather than combine investments that are measured by risk and return, Adelman’s model combines hospitals that perform most efficiently on the basis of factors such as mortality and readmissions. The model then finds the measure weights that maximize Hospital A’s score while bringing that score as close as possible to the virtual hospital’s score under the same weights.
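The notion of a virtual hospital that dominates Hospital A can be sketched in a few lines. This is an illustration under stated assumptions, not the model from the paper: the measure values are hypothetical, and the mixing proportions are picked by hand, whereas the model described above determines the combination and the weights by optimization.

```python
def combine(peers, mix):
    """Convex combination of peer hospitals; mix is nonnegative and sums to 1."""
    assert all(m >= 0 for m in mix) and abs(sum(mix) - 1.0) < 1e-9
    n = len(peers[0])
    return [sum(m * p[i] for m, p in zip(mix, peers)) for i in range(n)]

def dominates(x, y):
    """True if x is at least as good as y on every measure and better on one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def score(measures, weights):
    """Weighted sum of measures under a given set of measure weights."""
    return sum(m * w for m, w in zip(measures, weights))

# Hypothetical performance on two measures, scaled so higher is better.
hospital_a = [0.70, 0.60]
peer_b = [0.95, 0.55]  # stronger on the first measure only
peer_c = [0.65, 0.90]  # stronger on the second measure only

# Hand-picked 50/50 blend of the two peers (an assumption of this sketch).
virtual = combine([peer_b, peer_c], [0.5, 0.5])

print("virtual hospital:", [round(v, 3) for v in virtual])
print("dominates Hospital A:", dominates(virtual, hospital_a))
```

Neither peer dominates Hospital A on its own, but the blend does. Under any nonnegative measure weights, the virtual hospital therefore scores at least as high as Hospital A, which is what makes it a usable benchmark.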

This model eliminates the possibility that a hospital would receive a lower rating even if it improves in all metrics, according to Adelman. “Measure weights obey desirable structural properties under reasonable conditions, including that scores improve when hospitals improve, and that better-performing hospitals score higher,” he says.

Adelman warns that his model is designed to combat the problems with the latent variable model but does not address other concerns in hospital rankings—including those related to underlying measures, steps in the methodology, or the ratings system itself.

Works Cited

Dan Adelman, “An Efficient Frontier Approach to Scoring and Ranking Hospital Performance,” Operations Research, forthcoming.
