Real-World Evidence Sources for Drug Safety: Registries and Claims Data

When a new drug hits the market, the clinical trials that got it approved only tell part of the story. Those trials usually involve a few thousand patients over a couple of years. But real people - millions of them - take the drug for years, sometimes decades. They have other health conditions, take other medications, and live different lifestyles. That’s where real-world evidence comes in. It’s not theory. It’s data from actual use. And two of the most powerful sources? Patient registries and claims data.

What Exactly Is Real-World Evidence?

Real-world evidence (RWE) is clinical evidence drawn from real-world data (RWD). That means information collected outside the tightly controlled environment of clinical trials. It’s what happens when drugs are used in everyday practice. The U.S. Food and Drug Administration (FDA) officially defined RWE in 2018 as evidence about how a medical product is used and how safe or effective it really is - based on data from routine healthcare settings. This isn’t new. The FDA has been using this kind of data since the 1980s. But since 2016, with the passage of the 21st Century Cures Act, it’s become a formal part of drug safety monitoring. In fact, between 2017 and 2021, the FDA approved 12 drugs or new uses where RWE played a direct role - five of them relying on claims data or registries.

How Registries Work: The Deep Dive

Registries are structured databases that collect detailed information about patients with specific conditions or those using particular drugs. Think of them as medical diaries, but on a massive scale. There are two main types: disease registries and product registries. Disease registries track patients with conditions like cystic fibrosis, cancer, or diabetes. Product registries follow patients who take a specific medication - say, a new biologic for rheumatoid arthritis.

What’s in them? A lot. Beyond basic demographics, registries include lab results, imaging reports, treatment details, side effects, and even patient-reported symptoms. For example, the Cystic Fibrosis Foundation Patient Registry helped uncover safety signals for ivacaftor, a drug that works brilliantly for patients with certain genetic mutations. Those signals surfaced only because the registry captured enough detail to spot them. Clinical trials missed them because the mutations were too rare to show up in their small groups.

Registries vary in size. Some are run by a single hospital and include just a few hundred patients. Others, like the Surveillance, Epidemiology, and End Results (SEER) cancer registry, cover nearly half the U.S. population. According to a 2021 study, registries offer 37% more detail on long-term outcomes than claims data alone. But they’re expensive and hard to maintain. Setting one up can cost $1.2 million to $2.5 million and take 18 to 24 months. Annual upkeep? $300,000 to $600,000. And about 35% of academic registries shut down within five years due to funding gaps.

Claims Data: The Power of Scale

Claims data is different. It’s not collected for research - it’s generated when doctors bill insurance. Every time a patient gets a prescription filled, visits the ER, or has an MRI, that’s recorded in a claims database. It includes diagnosis codes (ICD-10), procedure codes (CPT), drug codes (NDC), and dates of service. It’s not glamorous, but it’s massive.

IBM MarketScan tracks 200 million lives, and Optum Clinformatics covers 100 million. Truven Health MarketScan, sometimes cited separately at 150 million, is the same database family under its pre-IBM name, so the two MarketScan figures should not be added together. The Medicare database alone has over 60 million beneficiaries with 15+ years of continuous records. That’s why the FDA used it in 2015 to study entacapone - analyzing 1.2 million Medicare patients over five years - and found no increased heart risk. In 2014, the agency looked at 850,000 patients on olmesartan to check for cardiovascular issues. No red flags.

Claims data is nearly perfect for tracking how often people use healthcare. For inpatient visits, completeness hits 95-98%. But it’s weak on clinical detail. Only 45-60% of lab values are recorded. Patient-reported symptoms? Almost never. And coding errors? They’re common. The Agency for Healthcare Research and Quality (AHRQ) estimates a 15-20% error rate in diagnosis codes. That’s why a 2022 study found a 22% false positive rate for safety signals from claims data alone - meaning a lot of alarms turned out to be noise.

Registries vs. Claims Data: A Side-by-Side Look

Comparison of Registries and Claims Data for Drug Safety Monitoring
Feature	Registries	Claims Data
Population Size	1,000-50,000 patients	Millions (e.g., 300M+ via FDA Sentinel)
Clinical Detail	High (87% lab value completeness)	Low (45-60% lab value completeness)
Longitudinal Coverage	Variable (often 5-10 years)	Excellent (15+ years for Medicare)
Data Completeness	68-92%	95-98% for utilization
Cost to Build	$1.2M-$2.5M upfront	Minimal (uses existing billing systems)
Best For	Rare diseases, detailed outcomes, long-term safety	Large populations, rare adverse events, cost patterns

For rare side effects - say, one in 10,000 patients - claims data needs about a million records to spot them reliably. Registries can do it with half that, thanks to higher data quality. But claims data wins when you need to track something across decades. The FDA’s Sentinel Initiative, which links 11 health systems and 3 claims processors, now monitors 300 million patient records. Data at that scale can also support label decisions: palbociclib got an expanded approval in 2019 after claims and EHR data supported its use in new patient groups.
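The arithmetic behind estimates like these can be sketched with a simple binomial model. This is an illustration under stated assumptions, not the method behind the figures above: it assumes each patient independently experiences the event with probability 1 in X, that only a fraction `completeness` of events is actually captured in the data, and that "detecting" means observing at least `min_events` recorded cases with a chosen probability. The function name and parameters are hypothetical.

```python
import math


def patients_needed(one_in_x, completeness=1.0, min_events=1, power=0.95):
    """Smallest n such that P(at least min_events recorded events) >= power.

    Assumes independent patients, true event risk 1/one_in_x, and that each
    event is recorded with probability `completeness` (all assumptions).
    """
    p = completeness / one_in_x  # chance a patient yields a recorded event

    def prob_at_least(n):
        # 1 - P(fewer than min_events) under Binomial(n, p)
        below = sum(
            math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min_events)
        )
        return 1 - below

    # Double an upper bound until it is large enough, then binary search.
    hi = 1
    while prob_at_least(hi) < power:
        hi *= 2
    lo = hi // 2
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(mid) >= power:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Example: a 1-in-10,000 event in a source that captures 95% of events.
needed = patients_needed(10_000, completeness=0.95)
```

With complete data and `min_events=1`, the result sits close to the familiar "rule of three" (about 3 × X patients, so roughly 30,000 here). That is only the number needed to see a single case. Showing that the risk is elevated above a background rate takes far more patients, which is one reason practical figures run much higher.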

The Hybrid Approach: Why Combining Them Works Best

Neither source is perfect alone. Registries are detailed but small. Claims data is huge but shallow. The smartest move? Use both together.

The International Council for Harmonisation (ICH) released new guidance in June 2023 that specifically recommends combining registry and claims data. Why? Because when you cross-check them, false positives drop by 40%. A 2024 study in JAMA Network Open showed AI tools that blend both data types cut false safety signals by 28%. That’s huge. It means fewer unnecessary drug warnings and faster action when real risks appear.

Take the case of tacrolimus. In 2021, the European Medicines Agency approved a new use for the transplant drug based on data from the Scientific Registry of Transplant Patients - a registry that tracked outcomes over 10 years. But they also checked claims data to confirm the drug wasn’t being overused or misprescribed. That dual approach gave regulators confidence.

What’s Changing in 2026?

The landscape is moving fast. In January 2024, the FDA released draft guidance saying registry data must meet at least 80% completeness for key variables to be accepted. That’s a big step toward standardization. The FDA’s 2023-2027 RWE Action Plan commits to creating 5-7 new statistical methods for claims data by 2025 - especially to fix problems like “immortal time bias,” where follow-up time during which a patient could not, by design, have had the event gets counted as time on the drug. That makes drugs look safer than they are. Correcting for it is reported to reduce bias by 35-50%.
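As a rough illustration of what a completeness check on key variables could look like, the sketch below computes the share of non-missing values per variable and compares it with an 80% threshold. The record layout, the field names, and the `completeness_report` helper are assumptions made for this example; the guidance’s own definition of completeness may differ.

```python
def completeness_report(records, key_variables, threshold=0.80):
    """Return {variable: (fraction_present, meets_threshold)}.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    report = {}
    total = len(records)
    for var in key_variables:
        present = sum(1 for r in records if r.get(var) not in (None, ""))
        fraction = present / total if total else 0.0
        report[var] = (fraction, fraction >= threshold)
    return report


# Hypothetical registry rows (made-up values, for illustration only).
rows = [
    {"patient_id": 1, "diagnosis_date": "2021-03-01", "egfr": 72},
    {"patient_id": 2, "diagnosis_date": "2020-11-15", "egfr": None},
    {"patient_id": 3, "diagnosis_date": "", "egfr": 55},
    {"patient_id": 4, "diagnosis_date": "2022-01-09", "egfr": None},
    {"patient_id": 5, "diagnosis_date": "2019-07-30", "egfr": 80},
]
report = completeness_report(rows, ["diagnosis_date", "egfr"])
```

In this toy data, `diagnosis_date` is present in four of five rows and passes, while `egfr` is present in three of five and fails. A real check would also need to decide what counts as a “key variable” for the question being asked.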

Meanwhile, the European Union’s Darwin EU network expanded in late 2023 to cover 120 million citizens across 15 countries. Novartis is piloting wearable data from smartwatches - heart rate, activity levels - and linking it to claims records to monitor heart failure drug safety. The FDA’s REAL program, launched in 2023, is building standardized registries for 20 rare diseases by 2026. Why? Because traditional trials can’t capture enough patients. Registries can.

Why This Matters for Patients and Doctors

This isn’t just about regulators. It’s about real people. When a drug is approved, we assume it’s safe. But safety doesn’t end at approval - it’s an ongoing conversation. Registries help us understand if a drug works for older patients, pregnant women, or those with kidney disease - groups often left out of trials. Claims data tells us if people are taking it correctly, if it’s causing unexpected hospital visits, or if it’s too expensive to sustain.

For doctors, this means better tools to make decisions. If a patient has a rare mutation and you’re considering a new targeted therapy, a registry might show you how others with the same mutation responded over five years. If a patient is on five medications and ends up in the ER, claims data can reveal patterns - maybe a drug interaction you didn’t see coming.

And for patients? It means safer drugs, faster. When a rare side effect shows up, regulators can act before hundreds more people are harmed. That’s the goal.

What’s Next?

The future of drug safety isn’t in labs. It’s in the data we already collect - in billing systems, in clinic visits, in patient-reported logs. The challenge isn’t gathering more data. It’s connecting it, cleaning it, and trusting it. Registries and claims data aren’t perfect. But together, they’re becoming the most powerful tool we have to keep drugs safe after they leave the trial and enter the real world.

Can claims data prove that a drug causes a side effect?

Claims data alone can’t prove causation. It can only show a pattern - like more heart attacks in people taking Drug X. That’s a signal, not proof. To confirm causation, you need clinical review, lab results, and often, registry data. The FDA often uses claims data to flag potential risks, then digs deeper with medical records or registry studies to confirm.
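To make the idea of a “signal” concrete, here is a minimal sketch of one common screening statistic: the incidence rate ratio with an approximate 95% confidence interval computed on the log scale. The counts are invented and the function name is hypothetical. Real signal detection also has to deal with confounding, which this sketch does not attempt.

```python
import math


def rate_ratio(events_exposed, person_years_exposed,
               events_comparator, person_years_comparator, z=1.96):
    """Incidence rate ratio with a Wald confidence interval on the log scale."""
    rr = (events_exposed / person_years_exposed) / (
        events_comparator / person_years_comparator
    )
    # Standard error of log(RR) for Poisson counts: sqrt(1/a + 1/b).
    se = math.sqrt(1 / events_exposed + 1 / events_comparator)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Made-up counts: heart attacks among users of Drug X vs. a comparator drug.
rr, lower, upper = rate_ratio(90, 20_000, 45, 20_000)
signal = lower > 1.0  # flag only if the whole interval sits above 1
```

An interval that sits entirely above 1 flags a pattern worth reviewing. On its own it says nothing about causation.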

Are patient registries only for rare diseases?

No. While registries are especially useful for rare diseases - because trials can’t include enough patients - they’re also used for common conditions like diabetes, cancer, and heart failure. For example, the SEER registry covers nearly half of the U.S. population. Registries help identify long-term side effects, treatment patterns, and survival rates across large populations.

Why do some registries shut down after a few years?

Most academic registries rely on grants or institutional funding. When that funding runs out - and many grants last only 3-5 years - they can’t afford staff, data entry, or software updates. Some are also voluntary, so if participation drops, the data becomes unreliable. Sustainable registries usually have industry backing or government support.

How accurate are diagnosis codes in claims data?

Diagnosis codes (ICD-10) in claims data have an estimated 15-20% error rate. Doctors may code for symptoms instead of diagnoses, or use generic codes to save time. Billing pressure can lead to overcoding or undercoding. That’s why researchers use algorithms to flag inconsistencies and combine claims with EHR data to improve accuracy.
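One widely used style of algorithm requires a diagnosis code to appear on more than one date before a patient is counted as a case, which filters out one-off rule-out or miscoded claims. The sketch below assumes a simplified claim tuple of (patient ID, ICD-10 code, service date) and a 30-day minimum gap. The threshold and all names are illustrative choices, not a validated case definition.

```python
from collections import defaultdict
from datetime import date


def confirmed_cases(claims, code_prefix, min_gap_days=30):
    """Patients with >= 2 claims matching code_prefix at least min_gap_days apart."""
    dates_by_patient = defaultdict(list)
    for patient_id, icd10, service_date in claims:
        if icd10.startswith(code_prefix):
            dates_by_patient[patient_id].append(service_date)
    cases = set()
    for patient_id, dates in dates_by_patient.items():
        # The earliest and latest matching claims give the widest gap.
        if (max(dates) - min(dates)).days >= min_gap_days:
            cases.add(patient_id)
    return cases


# Hypothetical claims; E11 is the ICD-10 category for type 2 diabetes.
claims = [
    ("A", "E11.9", date(2023, 1, 10)),
    ("A", "E11.65", date(2023, 3, 2)),
    ("B", "E11.9", date(2023, 5, 5)),   # single claim: not confirmed
    ("C", "E11.9", date(2023, 6, 1)),
    ("C", "E11.9", date(2023, 6, 3)),   # two claims, but too close together
    ("D", "I10", date(2023, 2, 1)),     # different condition (hypertension)
]
cases = confirmed_cases(claims, "E11")
```

Only patient A qualifies here. Stricter rules trade missed true cases for fewer false ones, so thresholds like these are usually checked against medical records or EHR data before being trusted.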

Is real-world evidence accepted by regulators worldwide?

Yes. The FDA, EMA, and other global agencies now routinely accept RWE for safety monitoring and even some approvals. The EU’s Darwin EU network, launched in 2021, connects 32 databases across 15 countries. Japan’s PMDA and Health Canada also use RWE. Regulatory acceptance has grown sharply since 2017, with over 100 RWE submissions reviewed by the FDA in 2022 alone.

March 18 2026
Tony Newman