How to Find Relevant Healthcare Big Data

Summary

Whether an organization is deciding what data to archive, what to include in a data warehouse, what to use in developing predictive analytics, or what to feed an analytics product, data relevance and validity are critical to realizing ROI on health IT investments.

Big Data is being touted as a panacea for current challenges as government reforms push healthcare organizations to improve quality of care while decreasing costs. Words like volume, velocity, variety and veracity are often used to describe Big Data. In the purest business sense, however, Big Data is primarily about validity and value. In healthcare in particular, Big Data can easily turn into big confusion or big mistakes, especially if the data being queried does not relate to the question being asked.

How do we measure the value of data?
The size of the data is often less critical than the ability to identify specific data points that, when appropriately aggregated and manipulated, deliver valuable, meaningful and actionable insights. Knowing what to do with those insights, and then acting on them, is the key to unlocking value, big or otherwise. To realize a return on investment (ROI) on data, organizations must make process and operational changes that improve efficiency, quality or both. However, before healthcare administrators or clinicians will take action, they need to trust that their decisions are based on accurate information.

How do we separate usable data from noise?
Data in and of itself is not valuable, and more data does not always mean better insight. In fact, collecting vast amounts of superfluous data can make deriving insights more complicated. Every data set contains noise that makes the signal harder to interpret, and the bigger the data, the louder the noise. But what is noise? A data set is made up of data points that are related to each other in some way. Although the data points are related, individual data points may not be relevant to the question at hand. Noise is the term for those points: the portion of a data set that does not bear on the question being asked.
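To make the signal/noise distinction concrete, here is a minimal Python sketch. It assumes a hypothetical record layout (a `DataPoint` with `patient_id`, `kind` and `value`) and a question that names the kinds of data relevant to it. Every point describes the same patients, yet only some of them bear on the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    patient_id: str
    kind: str      # what was recorded, e.g. "hba1c", "allergy", "insurance_plan"
    value: object


def split_signal_noise(points, relevant_kinds):
    """Partition related data points into signal (relevant to the question) and noise."""
    signal = [p for p in points if p.kind in relevant_kinds]
    noise = [p for p in points if p.kind not in relevant_kinds]
    return signal, noise


# Every point describes the same two patients, but the hypothetical question
# "How well is blood glucose controlled?" only needs HbA1c results.
points = [
    DataPoint("p1", "hba1c", 7.2),
    DataPoint("p1", "allergy", "penicillin"),
    DataPoint("p2", "hba1c", 8.9),
    DataPoint("p2", "insurance_plan", "PPO"),
]
signal, noise = split_signal_noise(points, {"hba1c"})
print([p.value for p in signal])  # [7.2, 8.9]
print(len(noise))                 # 2
```

The same point can be signal for one question and noise for another: the allergy entries would matter for a medication-safety question. Relevance belongs to the question, not to the data.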

Identifying relevant data
Data is usable and trustworthy to the extent that it is valid, that is, able to accurately represent a phenomenon, whether processes of care, treatment regimens, caregiver activity or patient symptoms. Data points are therefore usable only within a specific context. The context of the data (who collected it, where, when and how) is relevant in selecting the most valid data for analysis. Given the breadth and complexity of patient care and healthcare operations, it is no surprise that many organizations do not truly know what kind of data, or how much of it, they need to archive or collect in order to generate actionable insights from their information.
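One way to keep context attached to data is to store who, where, when and how alongside each value and filter on it before analysis. The sketch below is illustrative only: the `Observation` fields, the role, setting and method labels, and the `select_valid` helper are hypothetical, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    patient_id: str
    name: str           # what was measured, e.g. "systolic_bp"
    value: float
    recorded_by: str    # who: e.g. "nurse", "patient"
    setting: str        # where: e.g. "clinic", "home"
    recorded_on: date   # when
    method: str         # how: e.g. "automated_cuff", "self_report"


def select_valid(observations, name, start, end, recorders, settings, methods):
    """Keep only observations of `name` whose context fits the question."""
    return [
        o for o in observations
        if o.name == name
        and start <= o.recorded_on <= end
        and o.recorded_by in recorders
        and o.setting in settings
        and o.method in methods
    ]


observations = [
    Observation("p1", "systolic_bp", 138.0, "nurse", "clinic", date(2018, 1, 9), "automated_cuff"),
    Observation("p1", "systolic_bp", 121.0, "patient", "home", date(2018, 2, 3), "self_report"),
    Observation("p1", "systolic_bp", 145.0, "nurse", "clinic", date(2018, 4, 12), "automated_cuff"),
]

# Hypothetical question: in-clinic blood pressure control during Q1 2018.
clinic_q1 = select_valid(
    observations, "systolic_bp",
    start=date(2018, 1, 1), end=date(2018, 3, 31),
    recorders={"nurse", "physician"}, settings={"clinic"}, methods={"automated_cuff"},
)
print([o.value for o in clinic_q1])  # [138.0]
```

Which settings and methods count as valid is a clinical judgment about the question, not something the code can decide; the point is that such a filter can only be applied if the context was captured in the first place.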

It takes years of experience in the industry to understand patient care processes and operations. This kind of knowledge and insight is critical to understanding the context of healthcare data (the what, where, when and how). A greater understanding of the data's context enables one to provide better inputs, which lead to better outputs. Better outputs lead to better questions and answers, and better answers lead to knowledge and understanding.

In this and future posts, I’ll share some of the least-considered ‘contexts’ for the most common healthcare data sets. These perspectives have made a difference in hundreds of database designs, healthcare analyses and improvement initiatives.

Diagnoses/Problems
Diagnoses and problems are regularly used to identify cohorts of patients. To understand how a disease process is affecting a patient population, start by identifying the patients with the diagnosis of interest. Sounds simple, right? It may or may not be, depending on the context of the question being asked. Below are seven considerations that are useful when determining which diagnosis and problem data points to include.

Where – Diagnoses can be found in encounters, orders, medical histories and problem lists, to name a few. Each of these locations represents a slightly different context for the information, so be sure to know which source is most relevant to the question being asked.
Status – Whether a condition is chronic or acute may be relevant, depending on the question. There is a big difference between evaluating quality of care for patients with acute bronchitis and for those with chronic bronchitis (often associated with COPD).
Specificity – The specificity of the diagnosis may be an important consideration. For example, when analyzing diabetic populations, is it appropriate to include patients with gestational diabetes in the cohort?
Temporality – The initial diagnosis date may be relevant, as may the date the problem was resolved, depending on the time frame of interest. This is often important in determining which patients to include in the look-back period of a reporting year.
Order – Whether a diagnosis is the primary or a secondary diagnosis assigned to an encounter may also be important. This is especially true when using diagnoses from claims data.
Reason for visit – The reason for visit may provide relevant information, in addition to the encounter diagnosis, when forecasting future utilization.
Procedures and orders – Diagnoses associated with physician procedures or orders may provide valuable information for identifying differential diagnosis patterns or trends.
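Several of these considerations can be expressed as explicit filters when building a cohort. The following Python sketch is a simplified illustration, not a production cohort definition: the `DiagnosisRecord` layout, the source labels and the `build_cohort` helper are hypothetical, and the ICD-10-CM prefixes (E10/E11 for type 1/type 2 diabetes, O24 for diabetes in pregnancy, O24.4 for gestational diabetes) are illustrative and should be checked against the current code set. Status, reason for visit and order-linked diagnoses are not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class DiagnosisRecord:
    patient_id: str
    icd10: str                # diagnosis code, e.g. "E11.9"
    source: str               # where: "encounter", "problem_list", "history", "claim", ...
    onset: date               # initial diagnosis date
    resolved: Optional[date]  # resolution date, or None if still active
    rank: int                 # 1 = primary diagnosis on the encounter/claim, 2+ = secondary


def build_cohort(
    records: Iterable[DiagnosisRecord],
    *,
    include_prefixes: tuple,
    exclude_prefixes: tuple,
    sources: set,
    period_start: date,
    period_end: date,
    primary_only: bool = False,
) -> set:
    """Return the IDs of patients with a qualifying diagnosis (hypothetical rules)."""
    cohort = set()
    for r in records:
        # Where: only use the sources chosen for this question.
        if r.source not in sources:
            continue
        # Specificity: must match an included code family and no excluded one.
        if not r.icd10.startswith(include_prefixes) or r.icd10.startswith(exclude_prefixes):
            continue
        # Temporality: the problem must be active at some point in the period.
        if r.onset > period_end or (r.resolved is not None and r.resolved < period_start):
            continue
        # Order: optionally require the primary diagnosis.
        if primary_only and r.rank != 1:
            continue
        cohort.add(r.patient_id)
    return cohort


records = [
    DiagnosisRecord("p1", "E11.9", "problem_list", date(2015, 3, 1), None, 1),
    DiagnosisRecord("p2", "O24.410", "encounter", date(2018, 2, 10), date(2018, 9, 1), 1),  # gestational
    DiagnosisRecord("p3", "E11.65", "history", date(2010, 5, 20), None, 1),  # history only
    DiagnosisRecord("p4", "E10.9", "encounter", date(2019, 1, 15), None, 1),  # after the period
    DiagnosisRecord("p5", "E11.9", "claim", date(2017, 6, 1), None, 2),       # secondary on claim
]

rules = dict(
    include_prefixes=("E10", "E11", "O24"),
    exclude_prefixes=("O24.4",),   # leave out gestational diabetes
    sources={"problem_list", "encounter", "claim"},
    period_start=date(2018, 1, 1),
    period_end=date(2018, 12, 31),
)

print(sorted(build_cohort(records, **rules)))                     # ['p1', 'p5']
print(sorted(build_cohort(records, **rules, primary_only=True)))  # ['p1']
```

Making each choice a named parameter keeps the cohort definition visible and reviewable, so analysts can see exactly how source, specificity, temporality and order shaped the result.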

We’ll continue to explore the contexts of the data sets that offer the biggest bang for your buck. Whether an organization is deciding what data to archive, what to include in a data warehouse, what to use in developing predictive analytics, or what to feed an analytics product, data relevance and validity are critical to realizing ROI on health IT investments.

CETA and Harmony Healthcare IT solve complex data challenges on a regular basis. If you need best practice recommendations or consultation to define a data strategy, please contact us.

Guest Blog submitted by Michelle Currie, RN, MSN, CPHIMS, CPHQ

Michelle is a registered nurse with more than 15 years of experience in Clinical Informatics. She received her BSN from the University of Michigan and her MSN in Clinical Informatics from the University of California at San Francisco. In addition to serving as Chief Data Officer at CETA, Michelle volunteers her time as an annual Conference Presentation Reviewer, Mentor and Moderator for HIMSS.

Editor’s Note: This blog has been updated from an earlier version posted on June 25th, 2015.

August 2, 2018 / March 10, 2025
