Health Price Transparency: Methodology


Our review of hospitals is based on the Texas Department of State Health Services’ (DSHS) annual survey of acute and psychiatric hospitals (as of June 2021). The resulting list includes the names and addresses of 644 hospitals statewide.

Starting from this list, we used Google searches to locate each hospital’s website. We found websites for 95% of the hospitals on the list. A handful of listed hospitals were duplicates, had merged with another hospital, or had closed since the DSHS list was published.

We then attempted to locate the hospital pricing transparency data on each website. According to the Centers for Medicare & Medicaid Services (CMS), the law requires that the data be “displayed prominently on a publicly available website and in a prominent manner that clearly identifies the hospital location with which the information is associated.” Nevertheless, we encountered a number of challenges in locating the hospital pricing data.

First, there was no consistent place where hospitals housed the data on their sites. Some hospitals put it on pages devoted to financial and insurance information, some put it on pages dedicated to “pricing transparency,” while others buried it more deeply in the structure of their sites. In these latter cases, we often had to use the search function on their sites to locate the data, using search terms like “Pricing Transparency” and “Chargemaster.”

A second challenge involved large hospital systems. Some systems posted the data for all of their hospitals in one place, with a separate link for each hospital, while others required users to visit each hospital’s website to download its data.

Third, many websites made it difficult or cumbersome to download the data. In a handful of cases, the links listed on the site did not work. For some of these hospitals, we were able to download the data by inspecting the page’s HTML source and following the file link found there. For 29 other hospitals, the data were searchable on a third-party website called the Hospital Pricing Index but could not readily be downloaded from that site. We were able to retrieve the data through the Hospital Pricing Index site’s API. However, no public documentation was available on how to access this API, and a layperson would not be able to use it to download the data.
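The HTML inspection step described above can be partly automated. The sketch below is an illustration only and is not necessarily the procedure used in this review: it assumes a page’s HTML has already been saved as a string, and it flags links that either point to a file with one of the file types listed later in this methodology or mention the search terms “Pricing Transparency” or “Chargemaster.” The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumptions: file types and search terms taken from this methodology.
DATA_EXTENSIONS = (".csv", ".xlsx", ".json", ".txt", ".xml", ".pdf")
KEYWORDS = ("pricing transparency", "chargemaster")


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_price_file_links(html, base_url):
    """Return absolute URLs of links that look like price transparency files."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    candidates = []
    for href, text in collector.links:
        path = urlparse(href).path.lower()
        # Treat hyphens and underscores in URLs as spaces before keyword matching.
        haystack = f"{href} {text}".lower().replace("-", " ").replace("_", " ")
        if path.endswith(DATA_EXTENSIONS) or any(k in haystack for k in KEYWORDS):
            candidates.append(urljoin(base_url, href))
    return candidates


# Hypothetical page fragment for illustration.
page = (
    '<a href="/files/standard-charges.csv">Download</a>'
    '<a href="/about">About us</a>'
    '<a href="/billing/chargemaster">Chargemaster</a>'
)
print(find_price_file_links(page, "https://hospital.example.org/billing/"))
```

Running this prints the absolute URLs of the CSV link and the chargemaster link, but not the “About us” link. A keyword match only produces candidates; each one still needs a human check, since a “Pricing Transparency” link may lead to another page rather than to a file.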

Another factor affecting data availability was the period the data covered. For most hospitals, the data covered 2020 or 2021 (we collected the data in August 2021). For a few hospitals, however, the posted data covered 2019 or 2018.

Once we had collected all available data, three researchers at January Advisors each conducted an initial review of 50 hospitals (150 in total) to examine differences in data fields and formatting. We also spoke with experts from Health Cost Labs and the Kaiser Family Foundation about the known challenges of working with these data nationwide. After this initial review, we developed a data taxonomy that classified each hospital’s data by availability, file type, procedure codes, and formatting.

The main taxonomy classifies each hospital’s data by its availability and level of detail. Key questions we asked included the following:

  • Were we able to locate a website for the hospital?
  • Was any data available for download on the website?
  • Could the data be downloaded easily? (For example, data on the Hospital Pricing Index could not.)
  • Are insurance-specific negotiated rates included in the data, along with discounted/cash rates and minimum and maximum rates?
  • Does the insurance-specific information appear to be complete?
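One way to record the answers to these questions consistently across reviewers is as one structured record per hospital. The sketch below is a hypothetical illustration: the class name, field names, and the helper that lists unmet checks are assumptions, not the taxonomy’s actual schema, and the sketch does not assign any compliance designation.

```python
from dataclasses import dataclass, fields


@dataclass
class AvailabilityReview:
    """Hypothetical record of the main-taxonomy answers for one hospital."""

    hospital: str
    website_found: bool          # Were we able to locate a website?
    data_posted: bool            # Was any data available for download?
    easily_downloadable: bool    # Could the data be downloaded easily?
    negotiated_rates: bool       # Insurance-specific negotiated rates included?
    cash_rates: bool             # Discounted/cash rates included?
    min_max_rates: bool          # Minimum and maximum rates included?
    rates_complete: bool         # Does the insurance-specific information look complete?


def unmet_checks(review):
    """Return the names of the yes/no questions answered 'no' for this hospital."""
    return [
        f.name
        for f in fields(review)
        if isinstance(getattr(review, f.name), bool) and not getattr(review, f.name)
    ]


example = AvailabilityReview(
    hospital="Example Hospital",
    website_found=True,
    data_posted=True,
    easily_downloadable=False,
    negotiated_rates=True,
    cash_rates=True,
    min_max_rates=False,
    rates_complete=False,
)
print(unmet_checks(example))  # ['easily_downloadable', 'min_max_rates', 'rates_complete']
```

Keeping every answer as an explicit yes/no field makes it easy to compare two reviewers’ records for the same hospital field by field.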

Other key questions we examined to classify the data included:

  • What is the file type of the data? Options here included .csv, .xlsx, .json, .txt, .xml, and .pdf.
  • What types of hospital service codes are available in the data? In particular, we noted whether CPT, HCPCS, DRG, or CMG codes were present.
  • How are the hospital service codes formatted? Some files had a separate column for each code type, while others were formatted long, with a single column for all codes.
  • How are the insurance-specific rates formatted? Here we were interested in whether the data was wide (one column for each insurer rate) or long (two columns with one for the insurer name and another for the rate), as well as the format of the rates (raw numbers, percentages, or some other format).
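To illustrate the wide and long layouts and the different rate formats, here is a minimal sketch using Python’s standard csv module. The sample file, its rate values, the payer names “Payer A” and “Payer B,” and the function names are hypothetical, and the sketch assumes every column not named as an identifier holds one insurer’s rate.

```python
import csv
import io


def wide_to_long(rows, id_columns):
    """Reshape wide rows (one column per insurer) into long rows (payer, rate).

    Assumes every column not in id_columns holds one insurer's rate.
    Blank rates are skipped.
    """
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col in id_columns or not value:
                continue
            long_rows.append({**ids, "payer": col, "rate": value})
    return long_rows


def parse_rate(text):
    """Classify a rate as a raw number, a percentage, or some other format."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        if cleaned.endswith("%"):
            return ("percent", float(cleaned[:-1]))
        return ("number", float(cleaned))
    except ValueError:
        return ("other", text.strip())


# Hypothetical wide file: one row per service, one column per insurer.
WIDE_CSV = """code,description,Payer A,Payer B
99213,Office visit,$120.50,85%
70450,CT head,"1,050.00",
"""

rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
for long_row in wide_to_long(rows, ["code", "description"]):
    print(long_row["code"], long_row["payer"], parse_rate(long_row["rate"]))
# 99213 Payer A ('number', 120.5)
# 99213 Payer B ('percent', 85.0)
# 70450 Payer A ('number', 1050.0)
```

The long layout makes insurers comparable across hospitals because the insurer name becomes a value rather than a column header; the rate classification keeps percentages from being mistaken for dollar amounts.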

We then applied this taxonomy to the 150 hospitals we initially reviewed, communicating throughout this re-review to refine the classification system and ensure consistency across reviewers. Finally, we applied the revised methodology to the remaining datasets.

“Mostly compliant” is the highest designation we assigned to any entity. Because we do not have legal authority to request and verify certain additional data from the organizations we reviewed, we cannot affirm whether any entity is “fully” compliant. A “mostly compliant” designation does not indicate that we identified any deficiencies or a known lack of compliance on the entity’s part.

After this review process, we did two things. First, we developed a proposed statewide standard for these data that would improve transparency going forward; it is detailed in the Recommendations tab of this dashboard. Second, for the hospitals we found to be mostly compliant, we cleaned each hospital’s data into a standardized format, merged it with the data from the other mostly compliant hospitals, and present the results in the Pricing Data tab of this dashboard.
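The cleaning and merging step can be sketched as mapping each hospital’s own column names onto one shared schema and then stacking the results. This is a minimal illustration under stated assumptions: the standard column names, the per-hospital column maps, and the sample rows are hypothetical and do not reflect the actual schema behind the Pricing Data tab.

```python
# Hypothetical standard schema shared by all hospitals after cleaning.
STANDARD_COLUMNS = ["hospital", "code", "description", "payer", "rate"]


def standardize(rows, hospital, column_map):
    """Rename one hospital's columns to the standard schema.

    column_map maps the hospital's own column names to standard names.
    Unmapped columns are dropped; standard fields with no source become None.
    """
    standardized = []
    for row in rows:
        renamed = {column_map[k]: v for k, v in row.items() if k in column_map}
        renamed["hospital"] = hospital
        standardized.append({col: renamed.get(col) for col in STANDARD_COLUMNS})
    return standardized


def merge_hospitals(datasets):
    """Stack standardized rows from several hospitals into one table.

    datasets is a list of (hospital name, rows, column_map) tuples.
    """
    merged = []
    for hospital, rows, column_map in datasets:
        merged.extend(standardize(rows, hospital, column_map))
    return merged


# Hypothetical long-format rows from two hospitals with different headers.
hospital_a = [{"CPT": "99213", "Desc": "Office visit", "Payer": "Payer A", "Negotiated": "120.50"}]
hospital_b = [{"Code": "99213", "Service": "Office visit", "Insurer": "Payer B", "Rate": "110.00", "Notes": "n/a"}]

merged = merge_hospitals([
    ("Hospital A", hospital_a, {"CPT": "code", "Desc": "description", "Payer": "payer", "Negotiated": "rate"}),
    ("Hospital B", hospital_b, {"Code": "code", "Service": "description", "Insurer": "payer", "Rate": "rate"}),
])
for row in merged:
    print(row)
```

Every merged row has the same five keys in the same order, and Hospital B’s unmapped “Notes” column is dropped. Writing one explicit column map per hospital keeps the cleaning decisions reviewable.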