Impact of COVID-19 on data warehouse architecture
As COVID-19 (coronavirus) rages across our planet, all eyes turn to a trusted source of data: the Johns Hopkins COVID-19 Global Update. It highlights the critical need for timely access to data and analytics to drive decision making.
In particular, out-of-hospital care settings like skilled nursing facilities, assisted or independent living facilities, life plan communities, home health care, and hospice agencies have seen significant impacts on their clinical operations due to COVID-19. They need access to clinical data and actionable insights in order to plan and provide quality care while being efficient.
Clinical healthcare data can be broadly sorted into two groups: structured and unstructured. Let’s examine the impact of each when selecting the right system architecture for storing and retrieving data.
Structured clinical data
Vitals are a good example of structured data. An Electronic Health Record (EHR) system typically records temperature, pulse rate, respiratory rate, and blood pressure. Normal body temperature for a healthy adult ranges from 97.8 degrees F to 99 degrees F, and it can be measured orally, by skin, by ear, and by other methods. Both the range of the values and the methods of obtaining them are known in advance, which is what makes this data structured. More generally, structured data comes from a limited set of questions with a limited set of possible answers, or from menu items that are selected repeatedly. The result is clearly defined data types whose patterns make them easily searchable.
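As a rough illustration of what "known in advance" means, here is a minimal Python sketch of a temperature reading with a fixed set of measurement methods and the normal range quoted above. The class, field names, and resident IDs are hypothetical and are not taken from any particular EHR.

```python
from dataclasses import dataclass
from enum import Enum


class TempMethod(Enum):
    """Closed list of measurement methods (illustrative, not exhaustive)."""
    ORAL = "oral"
    SKIN = "skin"
    EAR = "ear"


# Normal adult range quoted in the text, in degrees Fahrenheit.
NORMAL_TEMP_F = (97.8, 99.0)


@dataclass(frozen=True)
class TemperatureReading:
    resident_id: str      # hypothetical identifier
    value_f: float        # numeric value with a known unit
    method: TempMethod    # one of a fixed set of choices

    def is_normal(self) -> bool:
        low, high = NORMAL_TEMP_F
        return low <= self.value_f <= high


readings = [
    TemperatureReading("r-001", 98.6, TempMethod.ORAL),
    TemperatureReading("r-002", 101.2, TempMethod.EAR),
]

# Because every field has a defined type and range, filtering is trivial.
elevated = [r.resident_id for r in readings if r.value_f > NORMAL_TEMP_F[1]]
print(elevated)
```

Every field here has a defined type and a bounded set of values, which is why such records map cleanly onto rows and columns.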
Unstructured clinical data
One definition of unstructured data is just about “everything else.” Examples of unstructured data include text, images, and social media content. The data itself has some internal structure, but that structure cannot easily be pre-defined by data models or schemas. In an EHR, examples include radiology images and nurses’ progress notes, although progress notes could be considered either semi-structured or unstructured. With COVID-19, it is important to understand the content of those progress notes in conjunction with structured data like vitals, because the structured data alone may not tell the entire story.
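To make the distinction concrete, the following Python sketch treats a progress note as a semi-structured JSON document: a few predictable fields wrapped around free text. The field names, the sample note, and the term list are invented for illustration, and a production system would likely use clinical NLP rather than simple keyword matching.

```python
import json

# A hypothetical progress note: some fixed fields plus free text.
note_json = """
{
  "resident_id": "r-001",
  "author_role": "RN",
  "recorded_at": "2020-04-02T08:15:00",
  "text": "Resident reports dry cough and loss of smell. Temp 98.6 F, eating well."
}
"""

# Illustrative search terms only; not clinical guidance.
SYMPTOM_TERMS = ("cough", "shortness of breath", "loss of smell")


def symptoms_mentioned(note: dict) -> list:
    """Return the search terms that appear in the note's free text."""
    text = note.get("text", "").lower()
    return [term for term in SYMPTOM_TERMS if term in text]


note = json.loads(note_json)
found = symptoms_mentioned(note)
print(found)
```

In this made-up note the recorded temperature is within the normal range, yet the free text still mentions symptoms, which is the kind of gap that structured vitals alone would miss.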
One challenge faced by healthcare organizations is that the amount of unstructured data is growing, or is expected to grow, significantly. This trend will test the scalability limits of current databases. So, what is the optimal architecture for storing, retrieving, and analyzing data? Let’s examine some options.
Choices for data warehouse architecture: SQL or NoSQL?
Relational Database Management Systems (RDBMS) typically use some form of Structured Query Language (SQL), with data organized in rows and columns. SQL databases provide atomicity, consistency, isolation, and durability (ACID) transactions that guarantee data validity even in the event of failures and errors. However, scalability and cost have become major challenges for SQL-based databases (DBs) as the volume of data continues to grow exponentially. The usual way to scale a SQL server is to add CPU and memory to the machine that hosts the data layer, i.e. vertical scaling.
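The atomicity guarantee can be seen in a small sketch using Python's built-in sqlite3 module. The table, column names, and CHECK bounds are assumptions made for this example; the point is only that when one statement in a transaction fails, none of the transaction's changes are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vitals (
        resident_id TEXT NOT NULL,
        temp_f      REAL NOT NULL CHECK (temp_f BETWEEN 80 AND 115)
    )
    """
)

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO vitals VALUES (?, ?)", ("r-001", 98.6))
        # A data-entry slip (986.0 instead of 98.6) violates the CHECK.
        conn.execute("INSERT INTO vitals VALUES (?, ?)", ("r-002", 986.0))
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the first insert, is undone

row_count = conn.execute("SELECT COUNT(*) FROM vitals").fetchone()[0]
print(row_count)
```

Even the valid first insert is rolled back, so the table never holds a half-applied batch.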
While there is no question that a lot of work has gone into scaling SQL, NoSQL (often read as “Not only SQL”) technologies claim to provide a more robust path to scale.
NoSQL DBs offer several advantages that make them a valuable option for scaling the data layer. They achieve scalability through distributed systems and offer flexible data schemas (structured, unstructured, and semi-structured) that make it easier to change the shape of the data quickly. Furthermore, NoSQL DBs typically trade some data consistency for availability, which provides a practical path to scalable DBs on distributed systems. The system grows through horizontal scaling, so capacity and cost can be added incrementally as needed. Examples of NoSQL DBs are Apache Cassandra and Azure Cosmos DB. These are distributed, scalable DBs that can provide eventual consistency, favoring availability over consistency in the presence of a network partition (CAP theorem).
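Horizontal scaling rests on spreading records across nodes by key. The Python sketch below shows the basic idea with a simple hash-and-modulo placement. It is a deliberate simplification with hypothetical node names, not how Cassandra or Cosmos DB actually partition data; real systems use more elaborate schemes so that adding a node moves only a fraction of the keys.

```python
import hashlib


def node_for(key: str, nodes: list) -> str:
    """Pick a node for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster; adding capacity means adding entries to this list.
nodes = ["node-a", "node-b", "node-c"]

placement = {
    resident_id: node_for(resident_id, nodes)
    for resident_id in ("r-001", "r-002", "r-003", "r-004")
}

for resident_id, node in placement.items():
    print(resident_id, "->", node)
```

Because the hash is deterministic, any client can compute where a record lives without consulting a central coordinator.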
A typical enterprise will have some form of SQL DBs already installed and in use. As the amount of unstructured data grows, organizations should consider adding a NoSQL layer for scalability. So the answer is not SQL or NoSQL; it’s both.
At MatrixCare, we continue to invest in and evolve our data infrastructure using the architectural methods described here. We offer two categories of products: MyData and MyAnalytics. MyData offers customers secure, HIPAA-compliant access to their own data, and allows them to mine it for insights. MyAnalytics offers dashboards and reports focused on key metrics and insights related to their out-of-hospital care network – census, readmissions, quality measures, clinical events (e.g. falls), financial AR, and five-star ratings.
Disclaimer: we are not endorsing this information for accuracy or validity of the content. We encourage you, as appropriate, to verify clinical and regulatory content with your own trusted sources.