PROPOSED DATA STANDARDS
Our review of the currently available hospital pricing data in Texas makes clear that hospitals follow no standard way of reporting this information. A clear data standard could encourage and improve reporting and give researchers cleaner data for analysis.
First, we have created a proposed standard spreadsheet for reporting the necessary data fields. It is based on our review of the data as well as our attempts to standardize the data from the ~120 hospitals with full datasets.
We propose the following fields (columns) in the data:
- Date of the data: The date or year the data reflects.
- Hospital system name: The name of the hospital system or group. This should be standardized so that every hospital within the same system reports an identical system or group name.
- Facility name: Name of the hospital or facility.
- URL/website where data is posted: Hospitals should be required to state where the data is housed on their website. This field would be a URL or web address where the data can be downloaded.
- Unique charge/service description: All hospitals with data have this field. It should be unique to each service detailed in the table. We acknowledge that descriptions will differ across hospitals, but the field is important for understanding which service each price in the dataset refers to.
- CPT code: The CPT code associated with the service (if available or applicable).
- DRG code: The DRG code associated with the service (if available or applicable).
- Setting: Some hospitals distinguish between inpatient and outpatient pricing in their data. This field should have three options: Inpatient, Outpatient, and Not specified.
- Payer Name: This field should detail the payer or insurer name. Examples include Aetna, Blue Cross Blue Shield, and Cigna. Ideally, HHS could provide a standard insurer name file that hospitals could use to harmonize their data against.
- Plan type information: Hospitals often negotiate different prices for services with the same insurer based on the plan (for instance, an HMO plan vs. a PPO plan). This field would detail the plan type and any other information related to the insurer.
- Payer-specific price: The price negotiated for the service in “Unique charge description” with the specific payer listed in “Payer Name”/“Plan type.”
- Gross price: The overall gross price for the service in “Unique charge description.” For a given service, this field will be identical across payers.
- Cash discounted price: The overall discounted price for patients paying in cash for the service in “Unique charge description.” For a given service, this field will be identical across payers.
- Minimum payer-negotiated price: The minimum insurer-negotiated rate for the service in “Unique charge description” across all payers. For a given service, this field will be identical across payers.
- Maximum payer-negotiated price: The maximum insurer-negotiated rate for the service in “Unique charge description” across all payers. For a given service, this field will be identical across payers.
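As a rough illustration, the field list above could be encoded as a header check that hospitals or reviewers run against a file. This is a minimal sketch: the snake_case column names and the function names are hypothetical choices made for the example, not names defined by the proposed standard.

```python
# Hypothetical snake_case column names for the fifteen proposed fields.
# These names are illustrative assumptions, not part of the standard itself.
PROPOSED_COLUMNS = [
    "date",
    "hospital_system_name",
    "facility_name",
    "data_url",
    "charge_description",
    "cpt_code",
    "drg_code",
    "setting",
    "payer_name",
    "plan_type",
    "payer_specific_price",
    "gross_price",
    "cash_discounted_price",
    "min_negotiated_price",
    "max_negotiated_price",
]

# The three options proposed for the Setting field.
ALLOWED_SETTINGS = {"Inpatient", "Outpatient", "Not specified"}


def check_header(header):
    """Return (missing, unexpected) column names for a file's header row."""
    found = [name.strip() for name in header]
    missing = [c for c in PROPOSED_COLUMNS if c not in found]
    unexpected = [c for c in found if c not in PROPOSED_COLUMNS]
    return missing, unexpected


def check_setting(value):
    """Return True if the value is one of the three proposed Setting options."""
    return value in ALLOWED_SETTINGS
```

A file whose header yields two empty lists contains exactly the proposed columns. Anything listed under `missing` points to a field the hospital still needs to report.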
Our proposed data standard can be viewed here. It combines codes formatted wide (a separate column for each code type), settings and payers formatted long (a separate row for each setting and payer-plan combination), and prices formatted wide (separate columns for the payer-specific, gross, cash, minimum, and maximum prices). We believe this format balances the needs of researchers using the data with the need to give hospitals clarity about which columns to include, while also making it easier for data and non-data people alike to check whether the data is complete and in compliance.
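To make the long-by-payer layout concrete, the sketch below builds three rows for one service (one row per payer-plan) and checks the consistency rules stated in the field definitions: the gross, cash, minimum, and maximum prices repeat on every row, and the minimum and maximum match the payer-specific prices. The facility, service, prices, and column names are all made up for illustration, and grouping rows by setting is our assumption.

```python
from collections import defaultdict

# Made-up values for one service at one facility; service-level prices repeat.
base = {
    "facility_name": "Example Hospital",
    "charge_description": "Example service A",
    "setting": "Outpatient",
    "gross_price": 250.0,
    "cash_discounted_price": 150.0,
    "min_negotiated_price": 100.0,
    "max_negotiated_price": 120.0,
}
# Long format: one row per payer-plan combination.
rows = [
    {**base, "payer_name": "Aetna", "plan_type": "HMO", "payer_specific_price": 100.0},
    {**base, "payer_name": "Aetna", "plan_type": "PPO", "payer_specific_price": 120.0},
    {**base, "payer_name": "Cigna", "plan_type": "PPO", "payer_specific_price": 110.0},
]

SERVICE_LEVEL_COLUMNS = [
    "gross_price",
    "cash_discounted_price",
    "min_negotiated_price",
    "max_negotiated_price",
]


def find_price_problems(rows):
    """Return (service key, message) pairs for rows that break the rules."""
    groups = defaultdict(list)
    for row in rows:
        # Assumption: a service is identified by facility, description, setting.
        key = (row["facility_name"], row["charge_description"], row["setting"])
        groups[key].append(row)

    problems = []
    for key, group in groups.items():
        for col in SERVICE_LEVEL_COLUMNS:
            if len({r[col] for r in group}) > 1:
                problems.append((key, f"{col} differs across payers"))
        negotiated = [r["payer_specific_price"] for r in group]
        if group[0]["min_negotiated_price"] != min(negotiated):
            problems.append((key, "min_negotiated_price is not the lowest payer price"))
        if group[0]["max_negotiated_price"] != max(negotiated):
            problems.append((key, "max_negotiated_price is not the highest payer price"))
    return problems


print(find_price_problems(rows))
```

Because each payer-plan occupies its own row, a completeness check of this kind needs only a grouping step rather than knowledge of hospital-specific column names.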
In addition, we recommend that hospitals provide these data as a CSV or TXT file with a delimiter (comma, tab, or pipe) separating columns. These formats are relatively lightweight and can be opened by anyone. Excel files are acceptable but require proprietary software to access. They also encourage formatting meant to beautify the data (e.g., merged columns or multiple sheets) that makes the data more difficult to read into statistical software packages for analysis. JSON files are also an option, but many data novices may find this format difficult to access and understand.
We also suggest that the representation of missing data be standardized across hospitals. If data is intentionally missing or there is no valid value, either enter “NA” or leave the cell blank.
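One practical benefit of delimited text is that it can be read with a few lines of code in most languages. The sketch below uses Python's standard `csv` module and guesses which of the three suggested delimiters a file uses by counting occurrences in the header line. The counting heuristic, the function names, and the sample text are assumptions made for illustration.

```python
import csv
import io

# The three delimiters suggested above: comma, tab, pipe.
CANDIDATE_DELIMITERS = [",", "\t", "|"]


def guess_delimiter(header_line):
    """Pick the candidate delimiter that appears most often in the header."""
    # Ties (including no delimiter at all) fall back to the comma.
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def read_delimited(text):
    """Parse delimited text into a list of dicts keyed by column name."""
    lines = text.splitlines()
    if not lines:
        return []
    delimiter = guess_delimiter(lines[0])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up pipe-delimited sample with hypothetical column names.
sample = (
    "facility_name|payer_name|payer_specific_price\n"
    "Example Hospital|Aetna|100.00\n"
)
records = read_delimited(sample)
print(records[0]["payer_name"])
```

The same function reads comma- and tab-delimited files without changes, which is harder to achieve with spreadsheets that use merged cells or multiple sheets.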
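With a single convention for missing values, analysts can handle them with one rule instead of hospital-specific cleanup. A minimal sketch, assuming price cells contain either a plain number, “NA”, or nothing (the function name is hypothetical):

```python
# The two missing-value conventions suggested above: "NA" or a blank cell.
MISSING_MARKERS = {"", "NA"}


def parse_price(cell):
    """Return the price as a float, or None if the cell is marked missing."""
    value = cell.strip()
    if value in MISSING_MARKERS:
        return None
    # Assumes a plain number; any other text raises ValueError so that
    # unexpected entries are surfaced rather than silently dropped.
    return float(value)


print([parse_price(c) for c in ["125.50", "NA", ""]])
```

Any other marker a hospital might use for missing data would raise an error under this rule, which makes departures from the convention easy to spot.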
