Race and ethnicity data — helping bridge the gap in health equity

The life sciences industry needs robust real-world data. For that data to reflect real-world populations, you need race and ethnicity data.

March 2024 | 5-minute read

Catherine Anderson, Senior Vice President of Health Equity Strategy, UnitedHealth Group

Lou Brooks, Senior Vice President of Real-World Data and Analytics, Optum Life Sciences

Dr. Brian Solow, Chief Medical Officer, Optum Life Sciences

Why self-reported data are important and barriers to getting them

Race and ethnicity (R&E) data are important metrics to understand discrepancies in health outcomes within certain populations. While examining these discrepancies can help providers, payers and life sciences organizations develop solutions that address disparities, R&E data are generally considered inadequate across the industry.

This is largely due to gaps in the data or to imputation methodologies that attempt to infer race and ethnicity from proxy sources. These issues have plagued the industry to the point that even understanding racial inequities in health is challenging. As a result, stakeholders across the health care industry are seeking more accurate sources of race and ethnicity data, either through their own novel collection efforts or by combining many sources.

To that end, Optum Life Sciences recently added self-reported race and ethnicity data to our real-world claims data set, Clinformatics® Data Mart (CDM). The incorporation of these data in CDM presents an opportunity for life sciences researchers to do work that’ll help lead to better care and better solutions for all types of populations. 

The buzz surrounding self-reported data collection

Self-reported race and ethnicity data reflect information collected directly from individuals. They represent an individual’s understanding of their racial and ethnic background. But self-reported data are constrained by how the industry has collected that information.

For instance, some collection methods limit accuracy because patients may not have a choice that aligns with how they see themselves. On health care forms, the data fields matching their race and/or ethnicity may not be available, leaving patients with limited choices they can identify with. Often, the question is presented as “optional for demographic purposes only,” so patients just skip it. As a result, what patients can provide about themselves in the health care setting isn’t always as thorough as it could be.

That said, self-reported data are typically considered to be more reliable than imputed data. Self-reported R&E data are lifted directly from what patients and providers fill out in real-world health records. That’s why there’s been an industry-wide push for improving collection of these data, though it hasn’t been without its challenges.

Why collecting race and ethnicity data is a tough task

If giving personal information is voluntary and patients don’t know why it's important, it’s easy to understand why they don’t offer it. The industry has to build trust with patients and demonstrate that the information they share will be used for good and not for potential harm.

And the industry hasn’t done a good enough job of explaining how the data it collects will lead to improved care. Whenever possible, it’s important to spell out that this information leads to better research and therefore improved care, which enables us to advance medicine in general.

The challenges with existing data imputation methodology

While self-reported data are preferred in the industry, many sources still employ various data imputation methodologies. But that’s not always straightforward either. Both government and non-governmental organizations are testing improvements to imputation models and trying to correct for multiracial and multiethnic realities.

Many neighborhoods have a very diverse representation of ethnic backgrounds and races. So, one of the big challenges for imputation methods is that people don't cluster in distinct areas anymore. In some geographic areas, the discriminatory practice of redlining was a very clear policy. Ironically, this practice helps maintain the credibility of some data imputation models because it limited where people could live. But then results become more skewed because the imputation models themselves cannot be compared geographically. And as people become more diverse through marriage, imputation models incorporating surnames may make these methods even more problematic.
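
To make the geography point concrete, here is a minimal sketch of how a proxy-based imputation might combine a surname signal with a neighborhood signal. It is not a description of any specific production model. It assumes the two proxies are independent given group membership and applies Bayes' rule. The function name `combine_proxies`, the group labels and every probability below are hypothetical and chosen only for illustration.

```python
def combine_proxies(p_given_surname, p_given_area, p_overall):
    """Combine two proxy signals into one estimate per group.

    Assumes surname and area are conditionally independent given group, so
    P(group | surname, area) is proportional to
    P(group | surname) * P(group | area) / P(group).
    """
    unnormalized = {
        group: p_given_surname[group] * p_given_area[group] / p_overall[group]
        for group in p_overall
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


# Hypothetical inputs: two groups, equally common overall.
overall = {"Group A": 0.5, "Group B": 0.5}
surname_signal = {"Group A": 0.6, "Group B": 0.4}

# A highly segregated neighborhood versus an evenly mixed one.
segregated_area = {"Group A": 0.9, "Group B": 0.1}
mixed_area = {"Group A": 0.5, "Group B": 0.5}

segregated_estimate = combine_proxies(surname_signal, segregated_area, overall)
mixed_estimate = combine_proxies(surname_signal, mixed_area, overall)

print(segregated_estimate)  # geography sharpens the estimate
print(mixed_estimate)       # geography adds nothing beyond the surname signal
```

In the segregated case the neighborhood pushes the estimate strongly toward one group. In the mixed case the estimate falls back to the surname signal alone, which itself weakens as surnames become less tied to a single group.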

Using race and ethnicity data and the questions they help answer

Race and ethnicity data combined with real-world clinical and cost details are crucial to understand inequities at every point in the patient journey, from clinical trial opportunities to access to therapies. 

More diversity in clinical trials can help resolve inequity in the health care system. Reviewing race and ethnicity data can help researchers make sure trials more closely match the U.S. population affected by the condition under study.
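
One simple way to review how closely a trial matches the affected population is to compare each group's share of enrollment with its share of that population. The sketch below is illustrative only: the function name `representation_ratio`, the group labels and all counts and shares are hypothetical, and a real analysis would draw the population shares from an epidemiological source for the condition under study.

```python
def representation_ratio(trial_counts, population_shares):
    """Return enrollment share divided by population share for each group.

    A ratio near 1.0 suggests proportional representation, below 1.0
    suggests under-representation and above 1.0 over-representation.
    """
    total_enrolled = sum(trial_counts.values())
    return {
        group: (trial_counts.get(group, 0) / total_enrolled) / share
        for group, share in population_shares.items()
    }


# Hypothetical enrollment counts and affected-population shares.
trial_counts = {"Group A": 700, "Group B": 200, "Group C": 100}
population_shares = {"Group A": 0.5, "Group B": 0.3, "Group C": 0.2}

ratios = representation_ratio(trial_counts, population_shares)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

A check like this depends on reliable race and ethnicity fields in both the trial data and the reference population, which is where self-reported data can help.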

And we can also look at real-world data (RWD) to see what's happening post-market. Observing how race and ethnicity and social determinants of health (SDOH) data intersect with efficacy and access to treatment is crucial for life sciences companies. These types of analyses can help inform treatment guidelines, prescribing decisions and other activities that promote equity.

For example, a recent news report highlighted a big gap between Black and white stroke victims. Researchers discovered that Black Americans suffer strokes at far younger ages than white patients. The gap is nearly a decade.1 The difference in health outcomes may be due to limited access to healthy foods and medical care in some communities.2

In addition, Black patients are less likely to receive tissue plasminogen activator (tPA), have longer door-to-imaging times and are less likely to receive endovascular thrombectomy (EVT) than white patients, all of which may contribute to the higher post-stroke mortality rate in Black patients.3 The study concluded that increased use of the right therapeutics in minority groups can improve stroke care and outcomes.

The impact of race and ethnicity data today and in the future

In many instances, race and ethnicity are credible, foundational data elements that are largely missing. Without that foundational information, we are left making assumptions, good and bad, in clinical design and at other decision points, and those assumptions likely carry unintended bias. That may create less-than-ideal outcomes and, in some instances, may actually cause unintended harm.

The team at Optum Life Sciences is committed to helping life sciences companies use RWD to answer the toughest of questions — and the incorporation of self-reported race and ethnicity data in our CDM claims data set will help researchers do just that.

But the industry's interest in achieving health equity isn't stopping at improved race and ethnicity data reporting. As populations continue to diversify, the industry must continue to look for opportunities to use data sets that contain social determinants of health data, including factors such as education level, income and more.

As data sources evolve and collection methods improve, there will be opportunities to use these data to do many positive things: improve quality of life, predict and avoid chronic illnesses or get the right patients on the right therapy sooner. Start moving the needle by looking for innovative ways to incorporate R&E data in your research today.



  1. USA Today. There’s a big gap between Black and white stroke victims. It’s a major health concern. Published January 10, 2024. Accessed January 19, 2024.
  2. Ibid.
  3. Metcalf D, Zhang D. Racial and ethnic disparities in the usage and outcomes of ischemic stroke treatment in the United States. Journal of Stroke and Cerebrovascular Diseases. 2023; 32(12).