Learn more about the Efficient Frontier Hospital Rating System
The Efficient Frontier Hospital Rating System scores more than 3,700 individual hospitals using a method that addresses some of the limitations underlying the U.S. Centers for Medicare and Medicaid Services’ Hospital Compare ratings. A detailed example showing how the system works, along with a searchable EFHRS database, is available via Chicago Booth Review.
Imagine Hospital A. In Adelman’s model, its weights are determined by comparing it with other hospitals that are more efficient and perform better along key dimensions such as mortality and readmissions. The model combines these efficient hospitals into a virtual hospital that sits between Hospital A and an ideal hospital achieving the maximum performance on every measure. It then finds measure weights that score Hospital A as close as possible to this virtual hospital under the same weights, while ensuring that measures affecting more people are weighted more heavily.
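The paper’s exact formulation is beyond the scope of this article, but the flavor of the calculation can be sketched as a small linear program in the spirit of data envelopment analysis: maximize Hospital A’s weighted score subject to no hospital scoring above 1 under the same weights, with higher-volume measures constrained to carry larger weights. Everything below, from the scores to the volume ordering, is a hypothetical illustration, not the EFHRS itself.

```python
# A minimal DEA-style sketch (hypothetical data, not Adelman's formulation):
# choose nonnegative weights w that score Hospital A as favorably as
# possible while no hospital, under those same weights, scores above 1.
import numpy as np
from scipy.optimize import linprog

# Hypothetical quality scores (higher = better) on three measures
# for five hospitals; row 0 is "Hospital A."
scores = np.array([
    [0.70, 0.80, 0.60],
    [0.90, 0.85, 0.70],
    [0.60, 0.95, 0.80],
    [0.85, 0.70, 0.90],
    [0.75, 0.75, 0.75],
])
# Assumed ordering of measures by patient volume: measure 2 affects the
# most patients, then measure 0, then measure 1.
volume_order = [2, 0, 1]

a = 0  # index of the hospital being rated

# linprog minimizes, so negate Hospital A's scores to maximize its score.
c = -scores[a]

# Every hospital's weighted score must be at most 1 under the same weights.
A_ub = scores.copy()
b_ub = np.ones(len(scores))

# Volume-ordering constraints: the weight on a higher-volume measure must
# be at least the weight on the next measure down (w_lo - w_hi <= 0).
for hi, lo in zip(volume_order, volume_order[1:]):
    row = np.zeros(scores.shape[1])
    row[lo] = 1.0
    row[hi] = -1.0
    A_ub = np.vstack([A_ub, row])
    b_ub = np.append(b_ub, 0.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * scores.shape[1])
print("Hospital A's score under its most favorable weights:", -res.fun)
print("Measure weights:", res.x)
```

In this sketch, a hospital already on the efficient frontier can reach a score of 1, while a hospital dominated by a combination of efficient peers cannot, no matter which admissible weights it is granted.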
In this approach, which essentially provides a stable answer key, hospitals that are improving could still see their ratings drop if the national volumes of patients affected by different measures shift dramatically relative to one another. “However, infinitesimal shifts would result in only infinitesimal shifts in hospital scores (a result of math programming sensitivity analysis), not dramatic shifts in the scores and measure weights as we see in the LVM [latent variable model] approach with respect to correlations,” Adelman writes. “Thus, our approach enjoys substantially greater stability properties.”
Problem No. 2: Small-data issues
Adelman’s efficient frontier model addresses only the shortcomings of the latent variable model; other research suggests that issues with hospital data, risk adjustment, and methodology also affect the accuracy of the CMS rating system. Here, too, the small-data problem at small hospitals creates challenges.
The CMS website offers information about hospital heart-attack mortality rates, but some hospitals treat far fewer heart-attack patients than others, so a small hospital’s rating can swing on just one or two heart-attack deaths. To address this, the CMS adjusts the data with the goal of making comparisons fairer; the outcome, however, is that small hospitals look much safer than they actually are.
The problem—say Edward I. George, Paul R. Rosenbaum, and Jeffrey H. Silber of the University of Pennsylvania; Chicago Booth’s Veronika Ročková; and INSEAD’s Ville A. Satopää—is that the model doesn’t take into account hospital characteristics such as volume or the procedures its staff can perform. When a hospital has little heart-attack mortality data, the CMS simply estimates its rate to be closer to the national average. In 2007, of all the hospitals rated, large and small, almost 100 percent were classified as “no different than the national rate.” The next year, none was worse than average, and nine were better than average.
“For any one small hospital, there is not much data to contradict that prediction,” the researchers write. But, they ask, when the CMS model claims that its mortality rate is close to the national average, “is this a discovery or an assumption?”
To find out, the researchers analyzed data from Medicare billing records for 377,615 patients treated for heart attacks at 4,289 hospitals between July 2009 and the end of 2011. This analysis suggests the actual heart-attack-mortality rate is 12 percent at large hospitals and 28 percent at small hospitals. The CMS model adjusts the rate to 13 percent at large hospitals and 23 percent at small ones. It tries to compensate for the lack of data from small hospitals by borrowing information from large ones, Ročková says. “This would only work if the small and large hospitals were comparable in terms of their performance. The data, however, speaks to the contrary.”
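The mechanics of that borrowing can be illustrated with a toy shrinkage estimator, which blends each hospital’s observed rate with the national average and weights the average more heavily when a hospital has few cases. The numbers below are invented for illustration and are not the CMS’s actual model.

```python
# A toy shrinkage estimator (assumed numbers, not CMS's actual model)
# showing how pulling small-sample rates toward the national average can
# mask poor performance at low-volume hospitals.
NATIONAL_RATE = 0.13   # assumed national 30-day mortality rate
PRIOR_STRENGTH = 50    # pseudo-cases pulling estimates toward the average

def shrunken_rate(deaths: int, cases: int) -> float:
    """Blend a hospital's observed rate with the national rate, weighting
    the national rate more heavily when the hospital has few cases."""
    return (deaths + PRIOR_STRENGTH * NATIONAL_RATE) / (cases + PRIOR_STRENGTH)

# A large hospital: 500 cases, 60 deaths (observed rate 12 percent).
print(shrunken_rate(60, 500))  # ~0.121, close to the observed rate
# A small hospital: 10 cases, 3 deaths (observed rate 30 percent).
print(shrunken_rate(3, 10))    # ~0.158, pulled far toward the average
```

The small hospital’s estimate lands near the national average no matter what its handful of cases actually showed, which is exactly the assumption-versus-discovery question the researchers raise.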
It would be more reasonable, the researchers argue, to borrow information from hospitals of similar size. They do this, and also take into account hospital volume (number of patients), nurse-to-bed ratio, and technological capability, particularly the ability to perform percutaneous coronary intervention (PCI), better known as angioplasty, which improves blood flow to the heart.
In the researchers’ proposed expanded model, “hospital characteristics that generally indicate better mortality (say PCI or increased volume) can be utilized to direct patients away from specific hospitals that do not perform PCI and have small volume,” they write. “If patients instead utilized the HC [Hospital Compare tool in the CMS] model, which does not include hospital characteristics, they would not be directed away from these hospitals. While there may be some small hospitals with excellent outcomes despite not performing PCI, the vast majority of such hospitals perform worse than those larger hospitals that do perform PCI.”
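A rough sketch of the general idea, though not the researchers’ actual specification, is a mortality model that includes hospital characteristics as predictors. The data below are simulated, and every coefficient is an assumption chosen for illustration.

```python
# A hedged sketch: logistic regression of mortality on hospital
# characteristics (simulated data; assumed effects, not the paper's model).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000  # simulated patients

# Hypothetical hospital-level covariates attached to each patient.
log_volume = rng.normal(5.0, 1.0, n)    # log of annual heart-attack volume
nurse_to_bed = rng.normal(1.2, 0.3, n)  # nurse-to-bed ratio
performs_pci = rng.integers(0, 2, n)    # 1 if the hospital performs PCI

# Simulate mortality so that volume, staffing, and PCI capability all
# lower risk, mirroring the direction the researchers describe.
logit = 0.5 - 0.3 * log_volume - 0.4 * nurse_to_bed - 0.6 * performs_pci
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([log_volume, nurse_to_bed, performs_pci]))
model = sm.Logit(died, X).fit(disp=0)
print(model.params)  # fitted effects of the hospital characteristics
```

With characteristics in the model, a low-volume hospital without PCI gets a prediction driven by hospitals like it, rather than being shrunk toward a national average dominated by large PCI centers.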
Problem No. 3: The underlying data
Thus, research suggests at least two problems with how the CMS ratings are compiled, and another research project indicates there are some issues with the data that are fed into the ratings. Analysis Group’s Christopher Ody, Chicago Booth PhD candidate Lucy Msall, and Harvard’s Leemore S. Dafny, David C. Grabowski, and David M. Cutler highlight an issue with readmissions data, one of the seven measures used to inform the algorithm behind the CMS star system.
The researchers’ study isn’t about hospital ratings. Rather, it looks at another program administered by the CMS, the Affordable Care Act’s Hospital Readmissions Reduction Program (HRRP). Taking a value-based-care approach, the HRRP penalizes hospitals with higher-than-expected 30-day readmission rates, on the premise that hospitals could do a better job of avoiding readmissions.
Prior to the HRRP’s implementation in October 2012, the government reimbursed hospitals for Medicare-covered patients on the basis of the kind of care provided. But once the HRRP went into effect, hospitals with high readmissions for heart attacks, heart failure, and pneumonia were docked up to 1 percent of reimbursements. The maximum penalty increased annually, reaching 3 percent in 2015. Findings from several studies suggest that the plan has worked, noting that readmission rates declined not only for the targeted conditions but for others as well.
However, Ody, Msall, Dafny, Grabowski, and Cutler probe this conclusion by looking at what goes into the readmission rates, which are risk adjusted to account for the incoming health of a patient. The sicker a patient is upon her first hospital admission, the greater the likelihood she will be readmitted. In an attempt to be fair, and not have sick patients hurt hospitals’ ratings or readmission statistics, the CMS considers patient data on age, sex, and comorbidities (the simultaneous presence of two or more chronic problems) from diagnoses in the year before hospitalization.
A patient arriving at a hospital may have a severe cough, high blood pressure, diabetes, and other medical issues. Hospital staff can note these health issues, and others, on the patient’s chart. When it comes time to send the information to the CMS, as part of submitting Medicare claims, staff electronically submit codes that indicate symptoms or illnesses. The CMS uses these codes to make its risk adjustments.
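A toy calculation shows how those codes feed risk adjustment: each recorded comorbidity bumps a patient’s expected readmission risk, and a hospital is judged by its observed readmissions relative to the sum of those expectations. All the weights below are assumptions for illustration, not the CMS’s actual risk model.

```python
# A toy risk-adjustment calculation (assumed weights, not CMS's model).

# Hypothetical risk bumps for a few comorbidity codes.
RISK_WEIGHTS = {"diabetes": 0.04, "heart_failure": 0.08, "copd": 0.05}
BASELINE_RISK = 0.15  # assumed baseline 30-day readmission probability

def expected_risk(age: int, codes: list[str]) -> float:
    """Baseline risk plus additive bumps for age and recorded comorbidities."""
    risk = BASELINE_RISK + 0.002 * max(age - 65, 0)
    risk += sum(RISK_WEIGHTS.get(c, 0.0) for c in codes)
    return min(risk, 1.0)

# The same patient looks very different depending on how many of her
# conditions make it onto the claim.
print(expected_risk(70, ["diabetes"]))                           # 0.20
print(expected_risk(70, ["diabetes", "heart_failure", "copd"]))  # 0.33

# A hospital's risk-adjusted performance compares observed readmissions
# with the sum of expected risks; undercoded claims shrink that expected
# total and make the hospital look worse than it is.
observed_readmissions = 30
expected_total = 25.0  # hypothetical sum of expected risks over admissions
print("observed/expected:", observed_readmissions / expected_total)
```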
About the same time that the ACA’s program went into effect, the CMS made a change to these electronic-transaction standards that hospitals use to submit Medicare claims, the researchers point out. Prior to the readmissions penalty program, hospitals could include a maximum of 10 patient-diagnosis codes in their submissions. Even if the patient had dozens of other symptoms or illnesses, the hospital staff could electronically add no more than 10 codes.
But coincidentally, just as the HRRP began, the CMS changed the rules and allowed for up to 25 diagnosis codes, which helped doctors paint a more accurate picture of a patient’s health. “We document that around January 2011, the share of inpatient claims with nine or 10 diagnoses plummeted and the share with 11 or more rose sharply,” the researchers write. Prior to the rule change, more than 80 percent of submissions had nine or 10 diagnosis codes. After the change, 15 percent had nine or 10 codes, while 70 percent of submissions had 11 or more. There was little change in the number of submissions with eight or fewer codes. Rather, doctors included more codes and better indicated all the health issues patients presented.
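The tabulation behind that finding can be mimicked in a few lines: bucket claims by diagnosis-code count and compare the shares before and after the change. The handful of claims below is made up for illustration, not drawn from the Medicare data.

```python
# A minimal sketch (fabricated example claims, not the actual Medicare data)
# of bucketing claims by diagnosis-code count before and after the change.
import pandas as pd

claims = pd.DataFrame({
    "period":    ["before"] * 5 + ["after"] * 5,
    "num_codes": [9, 10, 10, 8, 9, 14, 11, 9, 25, 18],
})

def bucket(n: int) -> str:
    if n <= 8:
        return "8 or fewer"
    if n <= 10:
        return "9 or 10"
    return "11 or more"

claims["bucket"] = claims["num_codes"].map(bucket)

# Share of claims in each bucket, by period.
shares = (claims.groupby("period")["bucket"]
                .value_counts(normalize=True)
                .rename("share"))
print(shares)
```

Run over the real claims, this kind of tabulation is what produced the shares the researchers report.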
The CMS didn’t take this into account when evaluating the effect of the HRRP—and the diagnosis-code change may account for about half of the supposed progress made by hospitals in reducing readmissions, the researchers write. The additional codes revealed that many patients were sicker than they would have looked previously. While about half of the overall decline in readmissions may have been due to hospitals doing a better job, the other half resulted from more complete coding that better captured the health of incoming patients, the researchers conclude.
They note that the program may have unfairly penalized certain hospitals, including ones that treat poorer and less-healthy patients, who are readmitted more frequently. Say two people, one affluent and one poor, both had heart attacks and went to two different hospitals. Doctors at each hospital would have entered no more than 10 data points indicating what was wrong with their patients, so that in the system, the patients looked similar. In fact, though, there was more wrong with the poor patient, who was more likely to be readmitted.
“Pay-for-performance schemes expose participants to the risk of unstable funding, in ways that may seem unfair or contrary to other social goals,” the researchers write. “In the case of the HRRP, the program was found to have initially penalized hospitals that cared predominantly for patients of low socioeconomic status—hospitals that are more likely to be safety-net providers already operating on tight budgets.”
The rule change addressed this problem in part, but it remains an issue: hospital staff are still limited in how many codes they can input, even if the cap is higher than it was before. Even as hospitals serving a poorer, sicker population submit more data on the health of their patients, they are still more likely to have high readmission rates and be penalized, the researchers say. And the incoming health of patients doesn’t necessarily reflect the quality of hospital care.
Readmission rates may similarly affect the number of stars a hospital receives from the CMS, and they illustrate how incomplete data and analysis can skew ratings. Patients comparing hospitals on the CMS website see “unplanned hospital visits” as one of the seven categories they can use for evaluation. Under that, hospitals are scored on readmissions for heart issues, pneumonia, hip and knee replacements, colonoscopies, and more. These data affect hospital reimbursement, but also how patients and insurance companies view a hospital.
Changing the system?
The CMS generally releases hospital ratings twice a year. When it issued ratings in February 2019, however, 15 months after the previous ones, it announced that it would be taking public stakeholder comments on potential changes to the rating system, an indication that there could be a chance to correct some of the problems in the methodology.
The CMS’s announcement suggested the latent variable model could be on the chopping block, potentially to be replaced with “an explicit approach (such as an average of measure scores) to group score calculation.” Other potential changes included assigning hospitals to peer groups, modifying the frequency of ratings releases, and developing a tool that would allow users to modify ratings according to their preferred measures.
But there’s no guarantee that the latent variable model will be scrapped or significantly changed. As for the other problems researchers have identified, Ročková, for one, says that although CMS representatives have shown interest in her team’s proposed model, it has not yet been incorporated into the agency’s rating system.
Adelman argues that there should be a moratorium on all hospital ratings during the pandemic. Even poorly rated hospitals are full of medical staff—many demoralized by equipment shortages, furloughs, and pay cuts—working tirelessly and risking their own lives to save the lives of others. Because of this, and because there are no measures related to COVID-19 responsiveness or preparedness, publicly rating hospitals at this time is not appropriate, Adelman says.
The CMS, through a spokesperson, says that it will go through “appropriate rulemaking” for any changes to the star-rating methodology.
“The agency, with its vast network of partners in health-care delivery and on behalf of people with Medicare benefits, patients, and their families, most certainly celebrates and appreciates the amazing work that medical staff (and many others) have been doing,” reads a statement from the CMS, adding that it “has responded by offering unprecedented waivers and flexibilities to remove barriers, expand telehealth, and allow all providers, and especially hospitals, to focus on patient care.”
The CMS is assessing how COVID-19 has impacted data reporting, according to the agency. But for now, the current ratings stand. In spite of its flaws, for the foreseeable future, the CMS rating system will continue to drive patient decisions, shape hospital budgets, and influence public policy.