Big Data is being touted as the panacea that will solve current challenges as government reforms create pressure to improve the quality of healthcare while decreasing costs. Words like volume, velocity, variety and veracity are often used to talk about Big Data. However, in the purest business sense, Big Data is primarily about validity and value. In healthcare in particular, one should be mindful that Big Data can very well turn into big confusion or big mistakes, especially if the data being queried does not relate to the question being asked.

How do we measure the value of data?

The size of the data is often not as critical as one’s ability to identify specific data points that, when appropriately aggregated and manipulated, deliver valuable, meaningful, and actionable insights. Knowing what to do with this knowledge, and then acting on it, is the key to unlocking value, big or otherwise. Process and operational changes that improve efficiency, quality or both must be made in order to realize a return on investment (ROI) on data. However, before healthcare administrators or clinicians will take action, they need to trust that the decisions they’re making are based on accurate information.

How do we separate usable data from noise?

Data in and of itself is not valuable, and more does not always equal better. In fact, collecting vast amounts of superfluous data can make deriving insights more complicated. In any given data set, there is noise that makes interpreting the signal difficult. The bigger the data, the louder the noise.

But what is noise? Noise is how we describe the data within a data set that does not bear on the question being asked. A data set is made up of data points that are related to each other in some way. Although the data points are related to each other, individual data points themselves may not be relevant to the question at hand.
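
To make the idea of noise concrete, here is a minimal Python sketch. It assumes a toy data set in which every field name, patient ID and value is hypothetical and chosen only for illustration. It shows that the same data point can be signal for one question and noise for another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One observation in a data set (all field names here are hypothetical)."""
    patient_id: str
    field: str
    value: float


def split_signal_from_noise(points, relevant_fields):
    """Partition data points by whether they bear on the question at hand."""
    signal = [p for p in points if p.field in relevant_fields]
    noise = [p for p in points if p.field not in relevant_fields]
    return signal, noise


# All four points describe the same two patients, so they are related,
# but each question makes a different subset relevant.
points = [
    DataPoint("p1", "hba1c_percent", 8.1),
    DataPoint("p1", "length_of_stay_days", 4.0),
    DataPoint("p2", "hba1c_percent", 6.4),
    DataPoint("p2", "length_of_stay_days", 2.0),
]

# Question A (hypothetical): how well controlled is blood glucose?
glucose_signal, glucose_noise = split_signal_from_noise(points, {"hba1c_percent"})

# Question B (hypothetical): how long do patients stay in the hospital?
stay_signal, stay_noise = split_signal_from_noise(points, {"length_of_stay_days"})

print([p.value for p in glucose_signal])  # [8.1, 6.4]
print([p.value for p in stay_signal])     # [4.0, 2.0]
```

In this sketch the noise for Question A is exactly the signal for Question B, which is why relevance has to be judged against a specific question rather than against the data set as a whole.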

Identifying relevant data

Data is usable and trustworthy based on its validity, meaning its ability to accurately represent a phenomenon, be it processes of care, treatment regimens, caregiver activity or patient symptoms. Therefore, data points are usable only within a specific context. The context of the data (who collected it, where it was collected, when it was collected, and how it was collected) is all relevant in selecting the most valid data for analysis.

Given the breadth and complexity of patient care and healthcare operations, it is no surprise that many organizations do not truly know what kind of data, or how much of it, they need to archive or collect in order to generate actionable insights from their information. It takes years of experience working in the industry to understand patient care processes and operations. This type of knowledge and insight is critical to understanding the context of healthcare data (the what, where, when and how).

A greater understanding of the data context enables one to provide better inputs, which leads to better outputs. Better outputs lead to better questions and answers. Better answers lead to knowledge and understanding.

In future posts, I’ll share some of the least considered ‘contexts’ for the most common healthcare data sets. These perspectives have made a difference in hundreds of database designs, healthcare analyses and improvement initiatives.

Diagnoses/Problems

Diagnoses and problems are regularly used to identify cohorts of patients. To understand how a disease process is impacting a patient population, start by identifying the patients with the diagnosis of interest. Sounds simple, right? It may or may not be, depending on the context of the question being asked. Below are several considerations that are useful when determining the suitability of data points to include with diagnoses and problems.

Where – diagnoses can be found in encounters, orders, medical histories, and problem lists, to name a few.
Each of these locations represents a slightly different context for this information. Be sure to know which source is the most relevant to the question being asked.

Status, whether chronic or acute, may be relevant depending on the question. There is a big difference between evaluating quality of care for patients with acute bronchitis versus those with chronic bronchitis (often associated with COPD).

Specificity of diagnosis may be an important consideration. For example, when analyzing diabetic populations, is it appropriate to include gestational diabetics in the cohort?

Temporality – the initial diagnosis date may be relevant, as well as the date the problem was resolved, depending on the time frame of interest. This is often important in determining which patients to include in the look-back period of a reporting year.

Order – whether a diagnosis is the primary or a secondary diagnosis assigned to an encounter may also be an important consideration. This is especially true when using diagnoses from claims data.

Reason for Visit may provide relevant information in addition to the encounter diagnosis when forecasting future utilization.

Diagnoses associated with physician procedures or orders may provide valuable information for identifying differential diagnosis patterns or trends.

We’ll continue to explore the different contexts of data sets that offer the biggest bang for your buck. Whether deciding what data to archive, what to include in a data warehouse, what to use in developing predictive analytics, or what to feed an analytics product, data relevance and validity are critical to realizing ROI for health IT investments.

CETA and Harmony Healthcare IT solve complex data challenges on a regular basis. If you need best practice recommendations or consultation to define a data strategy, please contact us.

Guest Blog submitted by Michelle Currie, RN, MSN, CPHIMS, CPHQ

Michelle is a registered nurse with more than 15 years of experience in Clinical Informatics.
She received her BSN from the University of Michigan and her MSN in Clinical Informatics from the University of California at San Francisco. In addition to serving as Chief Data Officer at CETA, Michelle volunteers her time as an annual Conference Presentation Reviewer, Mentor and Moderator for HIMSS.

Editor’s Note: This blog has been updated from an earlier version posted on June 25th, 2015.