Building and loading a Healthcare Data Warehouse requires careful planning and the selection of an appropriate architecture. Depending on your organization, you may elect to store Clinical, Administrative, and even Operational data. Over the years, we've developed a number of guidelines for how to properly establish a Health Data Warehouse. Below are the key steps we follow.
Data Extraction from Sources
Data extraction starts with identifying the data sources. Data acquired from physicians is loaded into staging tables, which are then loaded into the working interface tables.
The Interface Tables offer a standardized format for the data acquisition process; they are mostly in third normal form and are not a place for end users to run queries. Once the data is loaded into the interface table store area, there is no need to go back to the original data sources.
The Interface Tables data store can also be used to resolve workflow issues. For example, a new doctor is added to the system, and his patient encounters arrive before his demographics can be entered. The interface tables store area offers a place to hold the encounters while they wait for the dependent data to arrive.
Data objects in the Interface Tables data store may include, but are not limited to, the following (a minimal load sketch follows the list):
- Source tables – these objects hold data from the source systems.
- Decode tables – these objects hold the master list to coordinate integration of multiple sources.
- Lookup tables – these objects are reference tables for codes and may be cached into memory with the ETL tool.
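To make the flow concrete, here is a minimal sketch of a staging-to-interface load, assuming hypothetical table names (`stg_encounters`, `int_encounters`) and a decode table that resolves source-system patient IDs to an integrated ID:

```python
# Minimal sketch of a staging -> interface-table load; table and column
# names are assumptions for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_encounters (src_system TEXT, src_patient_id TEXT, encounter_date TEXT);
    CREATE TABLE decode_patient (src_system TEXT, src_patient_id TEXT, master_patient_id TEXT);
    CREATE TABLE int_encounters (master_patient_id TEXT, encounter_date TEXT);
""")
conn.execute("INSERT INTO stg_encounters VALUES ('EHR_A', 'P100', '2015-06-01')")
conn.execute("INSERT INTO decode_patient VALUES ('EHR_A', 'P100', 'MPI-001')")

# Load the interface table, resolving source IDs through the decode table.
conn.execute("""
    INSERT INTO int_encounters (master_patient_id, encounter_date)
    SELECT d.master_patient_id, s.encounter_date
    FROM stg_encounters s
    JOIN decode_patient d
      ON d.src_system = s.src_system AND d.src_patient_id = s.src_patient_id
""")
print(conn.execute("SELECT * FROM int_encounters").fetchall())
# [('MPI-001', '2015-06-01')]
```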
Unique Patient Identification
As affiliations form and healthcare data accumulates from many systems that reference the same person with different identifiers or names, a single master view of person data becomes imperative to ensure the unique identification of patients and trust in the data. An Enterprise Master Patient Index (EMPI) or MDM solution acts as a single point of reference for information about a patient, clinician, payer, or other healthcare entity within the data warehouse. This is especially important when aggregating data from different providers.
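As an illustration, below is a minimal sketch of an EMPI-style crosswalk lookup, assuming a hypothetical in-memory index; a production EMPI would typically add probabilistic matching on demographics:

```python
# Minimal EMPI crosswalk sketch; the index contents are assumptions.
EMPI_INDEX = {
    ("EHR_A", "P100"): "MPI-001",
    ("BILLING", "44-2310"): "MPI-001",   # same person, different source ID
    ("EHR_B", "X9"): "MPI-002",
}

def resolve_master_id(src_system: str, src_patient_id: str) -> str:
    """Return the enterprise master patient ID for a source-system identifier."""
    try:
        return EMPI_INDEX[(src_system, src_patient_id)]
    except KeyError:
        # Unmatched records would be routed to data stewardship for review.
        raise LookupError(f"No master ID for {src_system}/{src_patient_id}")

assert resolve_master_id("EHR_A", "P100") == resolve_master_id("BILLING", "44-2310")
```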
Data Integration / Transformation
Data integration is provided through the ETL tools and their interfaces with the MDM tools. For example, integration requires translation services to resolve ICD-9 to ICD-10 code references. Today's code sets may include ICD-9, ICD-10, SNOMED-CT, CPT, and LOINC. We will store both the original value and the integrated value in the HDW; mapping the codes will be performed by a Terminology Service.
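A minimal sketch of what such a warehouse row might look like, with assumed field names, is shown below; the ICD-9 code 250.00 maps to ICD-10 E11.9:

```python
# Minimal sketch of an HDW fact row carrying both the original source code
# and the integrated (standardized) code; field names are assumptions.
from dataclasses import dataclass

@dataclass
class DiagnosisFact:
    master_patient_id: str
    original_code: str        # value as received from the source system
    original_code_set: str    # e.g. "ICD-9"
    integrated_code: str      # value after terminology mapping
    integrated_code_set: str  # e.g. "ICD-10"

row = DiagnosisFact("MPI-001", "250.00", "ICD-9", "E11.9", "ICD-10")
print(row)
```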
We need the ability to enforce data integrity, so we have architected a centralized data code repository. For example, the repository can store multiple gender domain value code sets: one code set may have a gender domain containing only a male value, a female value, or unknown, while another gender code set may have 12 domain values. We can extend the data model through flex fields to store the additional gender code values in the fact tables.
The intake layer also needs the ability to combine values into one value, such as consolidating the North Hillsdale and South Hillsdale data values into a single "Hillsdale" value.
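The sketch below illustrates both ideas under assumed mapping contents: an extended gender code set is collapsed to a canonical three-value set (with the source value preserved in a flex field), and the Hillsdale values are consolidated:

```python
# Minimal transformation sketch driven by a centralized code repository;
# the mapping tables are illustrative assumptions.
CANONICAL_GENDER = {"M": "M", "F": "F"}          # everything else -> "U"
LOCATION_MERGE = {"North Hillsdale": "Hillsdale",
                  "South Hillsdale": "Hillsdale"}

def transform(record: dict) -> dict:
    out = dict(record)
    out["gender_flex"] = record["gender"]                      # preserve source value
    out["gender"] = CANONICAL_GENDER.get(record["gender"], "U")
    out["city"] = LOCATION_MERGE.get(record["city"], record["city"])
    return out

print(transform({"gender": "NB", "city": "North Hillsdale"}))
# {'gender': 'U', 'city': 'Hillsdale', 'gender_flex': 'NB'}
```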
The ETL tool will also provide the ability to enforce business rules. For example, a hospital application may have a business rule that, in order for a patient to be admitted, the patient must have an approved physician.
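A minimal sketch of such a rule check, assuming a hypothetical set of approved physician IDs, might look like this:

```python
# Minimal business-rule check in the ETL flow; the approved list and record
# structure are assumptions.
APPROVED_PHYSICIANS = {"DR-17", "DR-42"}

def validate_admission(admission: dict, errors: list) -> bool:
    """Reject admissions whose attending physician is not approved."""
    if admission.get("physician_id") not in APPROVED_PHYSICIANS:
        errors.append({"record": admission, "rule": "physician must be approved"})
        return False
    return True

errors = []
validate_admission({"patient_id": "MPI-001", "physician_id": "DR-99"}, errors)
print(errors)   # one rejected record with the violated rule
```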
Data Parsing and Standardization
Sometimes data elements need a standard format, such as a phone number or Social Security Number (SSN). For example, an SSN has the format 999-99-9999. The SSN format carries intelligence: historically, the first three digits identified the region of the country where the SSN was first obtained.
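A minimal parsing-and-standardization sketch for the SSN format might look like this:

```python
# Minimal sketch: normalize raw SSN input to the standard 999-99-9999 form.
import re

def standardize_ssn(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)          # strip everything but digits
    if len(digits) != 9:
        raise ValueError(f"invalid SSN: {raw!r}")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(standardize_ssn("078 05 1120"))   # -> 078-05-1120
```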
Range checking is used to see whether data values fit within a boundary of values. For example, a birth date may be checked to verify that the person is no more than 200 years old. People often leave off the first two digits of a birth year; for example, a person born in 1959 might say they were born in '59.
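A minimal range-check sketch, including the two-digit-year correction (the cutoff logic is an assumption), might look like this:

```python
# Minimal sketch: expand two-digit birth years, then range-check the result.
from datetime import date

def normalize_birth_year(year: int, today: date = date(2015, 1, 1)) -> int:
    if year < 100:                           # two-digit year, e.g. "59"
        year += 1900 if year > today.year % 100 else 2000
    if not (today.year - 200 <= year <= today.year):
        raise ValueError(f"birth year {year} outside plausible range")
    return year

print(normalize_birth_year(59))    # -> 1959
print(normalize_birth_year(1959))  # -> 1959
```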
Sparseness and Frequency Distribution
A sparseness check evaluates the percentage of data elements actually populated with meaningful data values. We also check that the system is using null values correctly: a null represents a missing value, yet some systems mistakenly insert a zero or a blank in its place.
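A minimal sparseness-check sketch that also flags zeros and blanks standing in for true nulls:

```python
# Minimal sketch: measure how populated a column is and count likely
# "fake nulls" (zeros and blanks inserted where null was meant).
def sparseness(values):
    meaningful = [v for v in values if v not in (None, "", 0)]
    suspicious = sum(1 for v in values if v in ("", 0))
    return len(meaningful) / len(values), suspicious

ratio, fake_nulls = sparseness([72, 68, None, 0, "", 81])
print(f"populated: {ratio:.0%}, suspected fake nulls: {fake_nulls}")
# populated: 50%, suspected fake nulls: 2
```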
Unique Value Checking
In a relational database, every table that is normalized to at least first normal form needs a primary key. A primary key enforces uniqueness.
We also check that a column is being populated from the best source. Often a column could be populated from multiple sources, but one source is usually better than the others. This best source is often referred to as the Golden Source.
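A minimal sketch of both checks, with an assumed source ranking for the golden-source rule:

```python
# Minimal sketch: detect duplicate candidate-key values, then pick the
# golden-source row per key; the ranking is an assumption (clinical over billing).
from collections import Counter

rows = [
    {"patient_id": "MPI-001", "dob": "1959-03-02", "source": "EHR_A"},
    {"patient_id": "MPI-002", "dob": "1948-11-19", "source": "BILLING"},
    {"patient_id": "MPI-001", "dob": "1959-03-02", "source": "BILLING"},
]

dupes = [k for k, n in Counter(r["patient_id"] for r in rows).items() if n > 1]
print("duplicate keys:", dupes)   # ['MPI-001']

SOURCE_RANK = {"EHR_A": 0, "BILLING": 1}
best = {}
for r in sorted(rows, key=lambda r: SOURCE_RANK[r["source"]]):
    best.setdefault(r["patient_id"], r)    # first (best-ranked) row wins
print(best["MPI-001"]["source"])  # EHR_A
```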
Derived Values and Computations
- If a data value is being derived, check to make sure the calculation is correct (a recomputation sketch follows this list).
- Validate the hierarchies utilized to roll up or aggregate data values.
- When additional information is appended to a data element, check that the enrichment process is correct.
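A minimal recomputation sketch for a derived value (the BMI formula and tolerance are stated assumptions):

```python
# Minimal sketch: validate a stored derived value by recomputing it.
def check_bmi(weight_kg: float, height_m: float, stored_bmi: float,
              tol: float = 0.1) -> bool:
    expected = weight_kg / height_m ** 2
    return abs(expected - stored_bmi) <= tol

print(check_bmi(70.0, 1.75, 22.9))   # True  (70 / 1.75^2 = 22.86)
print(check_bmi(70.0, 1.75, 25.0))   # False -> flag for review
```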
Master Data Management Interaction
MDM systems may be in place to support a Master Patient / Provider Index, an Enterprise Terminology Service, or Units of Measure. The ETL process will interface with MDM systems by calling them through an API or other programmatic interface. For example, a source system provides an ICD-9 code, while the DW is standardized on ICD-10 codes. An ETL process will pass the ICD-9 code to the Enterprise Terminology Service with a parameter to return the corresponding ICD-10 code, and will then store both in the data warehouse.
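A minimal sketch of this ETL step, where `map_code` is a stand-in for whatever API the Enterprise Terminology Service actually exposes:

```python
# Minimal sketch of the ETL call into a terminology service; the mapping
# table here is a stub for illustration only.
def map_code(code: str, source_set: str, target_set: str) -> str:
    gem = {("250.00", "ICD-9", "ICD-10"): "E11.9"}
    return gem[(code, source_set, target_set)]

def load_diagnosis(source_code: str) -> dict:
    integrated = map_code(source_code, "ICD-9", "ICD-10")
    # Store both the original and the integrated code in the warehouse row.
    return {"original_code": source_code, "integrated_code": integrated}

print(load_diagnosis("250.00"))
# {'original_code': '250.00', 'integrated_code': 'E11.9'}
```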
Exception and Error Handling
Exception and error handling will occur at multiple levels depending on the implementation, from reading the source file to loading the data into a target. Error handling can be set up for business-rule checking, data conversion exceptions, loading to the target, and so on. Errors that arise during operations can be captured and reported, or the load can be allowed to proceed, based on business rules.
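A minimal row-level sketch: failed conversions are captured in an error table rather than aborting the load (the structure is an assumption):

```python
# Minimal sketch: capture bad rows in an error table and let the load proceed.
from datetime import date

error_table = []

def safe_load(rows, convert, load):
    for row in rows:
        try:
            load(convert(row))
        except Exception as exc:              # conversion or load failure
            error_table.append({"row": row, "error": str(exc)})

target = []
safe_load(["1959-03-02", "not-a-date"],
          convert=date.fromisoformat, load=target.append)
print(len(target), "loaded;", len(error_table), "rejected")
# 1 loaded; 1 rejected
```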
Data Quality and Profiling
The quality of the information in the warehouse must be high for users to meet their reporting needs. Data profiling analyzes the source data for quality, and the results of the analysis can be converted into business rules for proper ETL handling by source.
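A minimal profiling sketch for a single source column; the statistics it gathers are the kind that get converted into ETL rules:

```python
# Minimal sketch: profile one column for null rate, cardinality, and
# dominant values; thresholds for resulting rules are assumptions.
from collections import Counter

def profile(values):
    non_null = [v for v in values if v is not None]
    return {
        "null_pct": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

print(profile(["M", "F", "F", None, "U", "F"]))
# {'null_pct': 0.166..., 'distinct': 3, 'top_values': [('F', 3), ('M', 1), ('U', 1)]}
```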
Data governance is a mixture of technology, business processes, and procedures, administered by a single authoritative group responsible for creating and ensuring adherence to data standards. For example, the data governance process would establish and maintain cooperation between lines of business to set standards for how common data elements and metrics will be defined, propagated, owned, and enforced throughout the organization.
ETL tools can be used to enforce data governance by implementing metrics and data quality checks, including data integrity and business rules.
Metadata – Business, Operational, Technical and Process
Metadata is very important for the management and operation of an enterprise data warehouse. In today's environment, there are legal requirements that reinforce the importance of metadata.

Business metadata describes the business meaning of data. It includes business definitions of the objects and metrics, hierarchies, business rules, and aggregation rules, along with sample data and contact information for follow-up questions. The business metadata will be exposed to the applications and the end users.

Operational metadata stores information about who accessed what and when. This information is important not only for legal requirements but also for the design of the data warehouse itself. For example, we can identify that a particular data mart is not being utilized. Should we redesign the application or the data mart?

Technical metadata describes the data structures and formats, such as table types, data types, indexes, and the partitioning method, as well as the location of the data elements. With technical metadata, version control of database structures is possible.

Process metadata describes the data input process. It includes data cleansing rules, source-to-target maps, transformation rules, validation rules, and integration rules. The process metadata will provide source-to-data-mart data lineage, and the solution will provide data lineage charts (a sketch of such a record closes this section).

After the data is processed in the healthcare data warehouse, the data is ready for transfer into the presentation layer (data marts or cubes).
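To ground the process-metadata discussion, here is a minimal sketch of a source-to-target map record that could back a data lineage chart; the field names are assumptions:

```python
# Minimal sketch of a process-metadata record supporting source-to-target lineage.
from dataclasses import dataclass, field

@dataclass
class SourceTargetMap:
    source: str                        # e.g. "stg_encounters.src_patient_id"
    target: str                        # e.g. "int_encounters.master_patient_id"
    transformation: str                # rule applied along the way
    validations: list = field(default_factory=list)

lineage = [
    SourceTargetMap("stg_encounters.src_patient_id",
                    "int_encounters.master_patient_id",
                    "EMPI lookup", ["not null", "unique per encounter"]),
]
for step in lineage:
    print(f"{step.source} -> {step.target} via {step.transformation}")
```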