When a new drug hits the market, the clinical trial data that got it approved is just the beginning. Real-world evidence (RWE) is what comes next - the ongoing, real-life look at how drugs behave outside the controlled environment of a trial. Two of the most powerful sources for this kind of evidence are registries and claims data. Together, they help regulators, doctors, and drug makers spot safety problems that clinical trials simply can’t catch.
What Is Real-World Evidence and Why Does It Matter?
Real-world evidence isn’t from labs or tightly monitored studies. It’s pulled from everyday healthcare settings: hospitals, pharmacies, insurance claims, and patient tracking systems. The U.S. Food and Drug Administration (FDA) started formally using RWE for drug safety decisions around 2017, and since then, it’s played a role in approving 12 drugs or new uses - five of them based directly on claims or registry data.
Why does this matter? Clinical trials usually involve a few thousand patients over a couple of years. Real-world data can track hundreds of thousands - even millions - of people over decades. That’s how you find rare side effects, like a heart rhythm problem that only shows up in 1 in 10,000 users. Trials are usually too small to catch those. Registries and claims data have a far better chance.
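The arithmetic behind that point can be sketched in a few lines. This is an illustrative calculation only, assuming each patient independently has a 1-in-10,000 chance of the event; the function name and the cohort sizes are hypothetical and not taken from any study cited here.

```python
def prob_at_least_one_case(n_patients: int, event_rate: float) -> float:
    """Chance of observing one or more events among n independent patients."""
    return 1.0 - (1.0 - event_rate) ** n_patients


RATE = 1 / 10_000  # the 1-in-10,000 side effect from the text

# Hypothetical cohort sizes: a typical trial vs. two real-world data sources.
for cohort in (3_000, 100_000, 1_000_000):
    p = prob_at_least_one_case(cohort, RATE)
    print(f"{cohort:>9,} patients: {p:.1%} chance of seeing at least one case")
```

Under these assumptions, a 3,000-patient trial has only about a one-in-four chance of seeing even a single case, while a cohort of a million is all but certain to see some. Seeing one case is still a long way from detecting a signal, so the real gap is larger than this sketch suggests.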
How Registries Work: Deep, Detailed, But Smaller
Disease registries are databases that collect detailed, structured information about patients with specific conditions. Think of them as long-term patient diaries, kept by doctors and researchers. A registry for cystic fibrosis, for example, tracks everything: lung function tests, which medications are used, hospital visits, and even how patients feel day to day.
These registries are gold for safety monitoring because they capture clinical details that claims data can’t. Lab results, imaging reports, genetic data - all of it gets recorded. According to a 2021 study, registries offer 37% more detail on long-term outcomes than claims data alone.
The FDA approved pembrolizumab’s expanded use in 2017 partly because of registry data from an expanded access program. The Scientific Registry of Transplant Recipients helped confirm the safety of tacrolimus in 2021 by tracking thousands of transplant patients over time. The Cystic Fibrosis Foundation’s registry even found a safety signal for ivacaftor in patients with rare gene mutations - something the original trial didn’t catch because those mutations were too uncommon.
But there’s a catch. Registries are expensive and slow to build. Setting one up can cost $1.2 million to $2.5 million and take two years. Most have only a few thousand to 50,000 patients. And participation isn’t mandatory - only 60% to 80% of eligible patients join. That introduces selection bias. If sicker or more tech-savvy patients are more likely to sign up, the data might not reflect the full population.
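To see how voluntary enrollment can distort a registry, here is a toy calculation. Every number in it is made up for illustration (group sizes, join probabilities, event rates); the only thing borrowed from the text is the idea that sicker patients may be more likely to sign up, with overall participation landing in the 60-80% range.

```python
# Hypothetical eligible population, split by disease severity.
groups = [
    {"name": "severe", "size": 300, "join_prob": 0.90, "event_rate": 0.10},
    {"name": "mild",   "size": 700, "join_prob": 0.60, "event_rate": 0.02},
]


def true_event_rate(groups):
    """Event rate if every eligible patient were observed."""
    events = sum(g["size"] * g["event_rate"] for g in groups)
    return events / sum(g["size"] for g in groups)


def registry_event_rate(groups):
    """Event rate among the patients who actually enroll."""
    enrolled = sum(g["size"] * g["join_prob"] for g in groups)
    events = sum(g["size"] * g["join_prob"] * g["event_rate"] for g in groups)
    return events / enrolled


participation = sum(g["size"] * g["join_prob"] for g in groups) / sum(g["size"] for g in groups)
print(f"participation: {participation:.0%}")
print(f"true rate:     {true_event_rate(groups):.2%}")
print(f"registry rate: {registry_event_rate(groups):.2%}")
```

With these invented inputs the registry overstates the event rate, simply because severe patients are over-represented among those who joined. Flip the join probabilities and it would understate it instead.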
Claims Data: The Power of Scale
Claims data is what insurance companies and government programs like Medicare collect every time someone visits a doctor, gets a prescription, or is admitted to the hospital. It includes diagnosis codes (ICD-10), procedure codes (CPT), drug codes (NDC), and billing information. It’s not clinical - it’s administrative. But it’s huge.
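As a rough picture of what "administrative, not clinical" means, here is a minimal sketch of a single claim line. The class and field names are assumptions made for illustration and do not follow any real payer's schema; the drug code is a placeholder, not a real NDC.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClaimLine:
    """One billed service. Field names are illustrative, not a real payer schema."""
    patient_id: str
    service_date: date
    icd10_diagnosis: str            # e.g. "E11.9", type 2 diabetes without complications
    cpt_procedure: Optional[str]    # e.g. "99213", an office visit; None on pharmacy claims
    ndc_drug: Optional[str]         # placeholder value below, not a real NDC code
    billed_amount_usd: float


claims = [
    ClaimLine("p001", date(2023, 3, 1), "E11.9", "99213", None, 140.00),
    ClaimLine("p001", date(2023, 3, 2), "E11.9", None, "00000-0000-01", 25.50),
    ClaimLine("p002", date(2023, 4, 9), "I10", "99213", None, 140.00),
]


def drugs_dispensed(claim_lines, patient_id):
    """Return the distinct drug codes billed for one patient, sorted."""
    return sorted({c.ndc_drug for c in claim_lines
                   if c.patient_id == patient_id and c.ndc_drug is not None})


print(drugs_dispensed(claims, "p001"))
```

Notice what the record has no place for: a blood sugar reading, a symptom, a note from the physician. The claim says a visit and a prescription were billed, and nothing about how the patient was doing.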
IBM MarketScan covers 200 million lives. Optum has 100 million. Truven Health has 150 million. Medicare claims alone go back 15+ years for each beneficiary. That’s more than enough to track long-term side effects like kidney damage or cancer risk after a drug has been on the market for a decade.
The FDA used Medicare claims data in 2015 to study entacapone, a Parkinson’s drug, and found no increased cardiovascular risk in 1.2 million patients. In 2014, it reviewed olmesartan (Benicar) in 850,000 diabetic patients to check for heart-related side effects. Both studies relied entirely on claims data.
Claims data is also how palbociclib got its expanded approval in 2019. The FDA looked at claims, electronic health records, and safety reports together to confirm the drug was safe for new patient groups.
But here’s the downside: claims data doesn’t tell you how a patient felt. It doesn’t include blood pressure readings, weight changes, or symptoms reported by the patient. Lab values are missing in 40-55% of records. Diagnosis codes can be wrong - up to 20% of the time, according to the Agency for Healthcare Research and Quality (AHRQ). And if a patient sees a specialist outside the network, that visit might not show up at all.
Comparing Registries and Claims Data Side by Side
| Feature | Registries | Claims Data |
|---|---|---|
| Population Size | 1,000 - 50,000 patients | 100 million - 300 million patients |
| Data Completeness (Lab Values) | 87% | 52% |
| Longitudinal Coverage | 5-15 years (on average) | 15+ years (Medicare) |
| Clinical Detail | High - includes imaging, symptoms, genetics | Low - only codes and billing info |
| Cost to Establish | $1.2M - $2.5M | Minimal (uses existing systems) |
| False Signal Rate | Low (due to rich context) | High - up to 22% require clinical review |
| Best For | Rare diseases, genetic conditions, long-term outcomes | Common drugs, large populations, rare adverse events |
The numbers tell a clear story: if you need depth, go with registries. If you need breadth, claims data wins. But neither is perfect alone.
Why Combining Them Works Best
Registries give you the clinical context. Claims data gives you the population scale. Together, they cancel out each other’s weaknesses.
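A minimal sketch of what combining the two can look like in practice, assuming both sources already share a pseudonymous patient key. That is a big assumption: real linkage usually involves probabilistic matching and data-use agreements. All names and values below are hypothetical.

```python
# Registry side: rich clinical detail for a small number of patients.
registry = {
    "p001": {"egfr": 58, "genotype": "variant A"},
    "p002": {"egfr": 91, "genotype": "variant B"},
}

# Claims side: coded events for a much larger population.
claims = [
    {"patient_key": "p001", "icd10": "I48.91"},  # atrial fibrillation, unspecified
    {"patient_key": "p003", "icd10": "I48.91"},
]


def link_records(registry, claims):
    """Attach registry detail to each claim where the patient key matches."""
    linked, unmatched = [], []
    for claim in claims:
        detail = registry.get(claim["patient_key"])
        if detail is None:
            unmatched.append(claim)          # signal seen in claims, no clinical context
        else:
            linked.append({**claim, **detail})
    return linked, unmatched


linked, unmatched = link_records(registry, claims)
print(f"{len(linked)} claim(s) with clinical context, {len(unmatched)} without")
```

The linked rows are where a reviewer can ask whether a coded event makes clinical sense. The unmatched rows are a reminder that the registry covers only a slice of the claims population.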
The International Council for Harmonisation (ICH) released new guidance in June 2023 saying exactly that: use both. When you combine them, false positive safety signals drop by 40%. That means fewer unnecessary drug warnings and fewer wasted investigations.
Take the FDA’s Sentinel Initiative. It’s not just one database. It connects 11 healthcare systems and 3 claims processors to monitor 300 million patient records. That’s the gold standard - using claims for volume and registries for detail when needed.
Even the European Medicines Agency (EMA) is moving this way. Its Darwin EU network, launched in 2021, now pulls data from 32 healthcare databases across 15 countries. It’s not just about one country - it’s about cross-border safety monitoring. By October 2023, it covered 120 million EU citizens.
Challenges and Pitfalls
It’s not all smooth sailing. Data standardization is the biggest headache. One hospital codes diabetes as E11, an ICD-10 code. Another still uses 250.9 from the older ICD-9 system. Getting them to match takes 40-60% of a project’s time and budget.
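Here is a toy version of that harmonization step, using the two diabetes codes mentioned above. The lookup table is a hand-written illustration, not an official crosswalk, and the function name is hypothetical; a real project would rely on maintained mapping files and clinical review.

```python
# Toy lookup: (coding system, code prefix) -> shared concept label.
# Hand-written for illustration; NOT an official ICD crosswalk.
CONCEPT_BY_CODE_PREFIX = {
    ("ICD-10", "E11"): "diabetes",
    ("ICD-9", "250"): "diabetes",
}


def to_concept(system, code):
    """Map a source-specific diagnosis code to a shared concept, or None if unknown."""
    normalized = code.strip().upper()
    for (known_system, prefix), concept in CONCEPT_BY_CODE_PREFIX.items():
        if system == known_system and normalized.startswith(prefix):
            return concept
    return None


records = [("ICD-10", "E11"), ("ICD-9", "250.9"), ("ICD-10", "I10")]
for system, code in records:
    print(system, code, "->", to_concept(system, code))
```

The two diabetes codes land on the same label, and the code the table does not know about comes back as `None` so it can be routed to manual review rather than silently dropped.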
Privacy laws like HIPAA in the U.S. and GDPR in Europe add layers of complexity. You can’t just grab patient records. Data must be de-identified, encrypted, and handled under strict protocols.
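One small piece of that handling can be sketched as follows: replacing the patient identifier with a keyed hash and dropping direct identifiers. This is an illustration only. It is pseudonymization, which on its own does not satisfy HIPAA or GDPR de-identification requirements, and the field names and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (stable for a given key)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_direct_identifiers(record, secret_key,
                             identifier_fields=("name", "address", "phone")):
    """Drop direct identifiers and swap patient_id for a pseudonymous key."""
    cleaned = {k: v for k, v in record.items()
               if k not in identifier_fields and k != "patient_id"}
    cleaned["patient_key"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Hypothetical key; in practice it would come from a secrets manager, never source code.
KEY = b"example-key-do-not-use"

raw = {"patient_id": "A1", "name": "Jane Doe", "phone": "555-0100", "icd10": "E11.9"}
print(strip_direct_identifiers(raw, KEY))
```

Using a keyed hash rather than a plain one means someone without the key cannot rebuild the mapping by hashing a list of known patient IDs, while the same patient still gets the same key across datasets.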
And then there’s bias. Claims data misses people without insurance. Registries miss people who don’t volunteer. Both can skew results. The FDA’s 2022 guidance specifically warns about “immortal time bias” - a statistical trap where a stretch of follow-up during which a patient could not, by definition, have had the outcome (for example, the wait before a first prescription) gets counted as time on the drug, making the drug look safer than it is. Using the right methods can cut that bias by 35-50%.
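Immortal time bias is easier to see with numbers. The sketch below uses three invented patients and compares a naive analysis, which credits the wait before the first prescription to the drug, with one that counts only time actually on the drug. The data and variable names are hypothetical.

```python
# Each tuple: (years waiting before first prescription, years on drug, had_event).
# Patients had to survive the waiting period to be treated at all,
# so no event could be attributed to the drug during that time.
treated = [
    (1.0, 2.0, True),
    (0.5, 3.5, False),
    (1.5, 1.5, True),
]

events = sum(1 for _, _, had_event in treated if had_event)

# Naive: the whole follow-up is counted as exposed person-time.
naive_person_years = sum(wait + on_drug for wait, on_drug, _ in treated)

# Corrected: only time after the first prescription counts as exposed.
correct_person_years = sum(on_drug for _, on_drug, _ in treated)

naive_rate = events / naive_person_years
correct_rate = events / correct_person_years

print(f"naive rate:     {naive_rate:.3f} events per person-year")
print(f"corrected rate: {correct_rate:.3f} events per person-year")
```

The same two events are spread over more person-years in the naive version, so the drug looks safer than it should. Handling the waiting period as unexposed time is one common correction; it is shown here as an example, not as the method the FDA guidance prescribes.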
Finally, sustainability. About 35% of academic registries shut down within five years. They run out of funding. Without ongoing support, valuable long-term data disappears.
The Future: AI, Wearables, and Standardization
Things are changing fast. In January 2024, the FDA released draft guidance requiring at least 80% data completeness for key variables in registry studies. That’s a big step toward quality control.
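A completeness check of that kind is simple to express. The sketch below assumes "completeness" means the share of records with a non-missing value for a variable, which is one plausible reading rather than the guidance's own definition. The field names and records are hypothetical.

```python
def completeness(records, field):
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def variables_below_threshold(records, key_fields, threshold=0.80):
    """Return {field: completeness} for key variables that miss the threshold."""
    scores = {f: completeness(records, f) for f in key_fields}
    return {f: score for f, score in scores.items() if score < threshold}


# Hypothetical registry extract: five patients, two key lab variables.
records = [
    {"hba1c": 7.1, "egfr": 62},
    {"hba1c": 6.8, "egfr": None},
    {"hba1c": 8.0, "egfr": 75},
    {"hba1c": None, "egfr": None},
    {"hba1c": 7.4, "egfr": 88},
]

print(variables_below_threshold(records, ["hba1c", "egfr"]))
```

Here `hba1c` is filled in for four of five patients and just meets the 80% bar, while `egfr` is filled in for three of five and gets flagged.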
The FDA’s 2023-2027 RWE plan includes building five to seven new analytical standards for claims data by 2025. That means better tools to detect signals, filter noise, and avoid false alarms.
And now, some companies are adding wearables. Novartis started using smartwatches to track heart rate and activity in patients on Entresto - a heart failure drug. Combining that real-time data with claims records gave them a more complete picture of safety.
AI is also stepping in. A 2024 study in JAMA Network Open showed AI-powered signal detection cut false positives by 28%. It doesn’t replace humans - it helps them focus on the real red flags.
The FDA’s REAL program, launched in 2023, is trying to standardize registry data for 20 priority diseases by 2026. The first focus? Rare diseases. Why? Because traditional trials can’t recruit enough patients. Registries are the only way to monitor safety for these groups.
What This Means for Patients and Providers
For patients, this means drugs are being monitored more closely than ever. If a side effect emerges years after approval, regulators can act faster because they’re not waiting for a few hundred reports - they’re seeing patterns in millions of records.
For doctors, RWE helps answer real questions: Is this drug safe for my elderly patient with kidney disease? Does it interact with their other meds? The answers are no longer just based on trial results from healthy 30-year-olds.
And for the system as a whole, it’s a win. RWE is cheaper than running new trials. It’s faster than waiting for adverse events to pile up. And it’s more accurate than relying on voluntary reports.
The global market for RWE is projected to hit $10.7 billion by 2030. Pharmaceutical companies are now spending 8-12% of their pharmacovigilance budgets on it - up from just 3-5% in 2017. That’s not a fad. It’s the new normal.
What’s the difference between real-world data (RWD) and real-world evidence (RWE)?
Real-world data (RWD) is the raw information collected from everyday sources like electronic health records, insurance claims, and patient registries. Real-world evidence (RWE) is the clinical insight you get after analyzing that data. Think of RWD as the ingredients and RWE as the finished meal.
Can claims data prove that a drug causes a side effect?
Not alone. Claims data can show a pattern - like more heart attacks in patients taking Drug X. But it can’t prove cause and effect. That’s why regulators always look for supporting evidence: clinical reviews, lab data, or registry input. Claims data flags a possible problem; other data confirms it.
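For a sense of what "showing a pattern" looks like, here is a crude risk ratio computed from invented counts. The numbers and the function name are hypothetical, and the calculation is unadjusted: it ignores age, comorbidities, and every other difference between the two groups, which is exactly why it cannot establish causation.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Crude (unadjusted) ratio of event risk in exposed vs. unexposed patients."""
    exposed_risk = exposed_events / exposed_total
    unexposed_risk = unexposed_events / unexposed_total
    return exposed_risk / unexposed_risk


# Hypothetical claims counts: heart attacks among patients on and off Drug X.
rr = risk_ratio(exposed_events=120, exposed_total=40_000,
                unexposed_events=200, unexposed_total=100_000)

print(f"crude risk ratio: {rr:.2f}")
```

A ratio above 1 is a flag worth investigating, not a verdict. Whether it holds up depends on the clinical review, lab data, or registry input described above.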
Why do registries cost so much to set up?
Because they require trained staff to collect, verify, and enter detailed clinical data - not just codes. You need doctors to input lab results, nurses to follow up with patients, IT systems to store imaging and genetic data, and quality controls to ensure accuracy. It’s labor-intensive and requires long-term funding.
Are registries only used for rare diseases?
No. While they’re especially valuable for rare diseases - where trials are too small - they’re also used for cancer, autoimmune diseases, and transplant patients. The key is when you need detailed, long-term clinical tracking, not just billing info.
How does the FDA use registry data in drug approvals?
The FDA uses registry data to support supplemental approvals - like expanding a drug’s use to new patient groups or confirming long-term safety. For example, registry data helped approve pembrolizumab for more cancer types and tacrolimus for new transplant patients. It doesn’t replace trials, but it adds critical real-world proof.
Is claims data reliable for tracking drug side effects in older adults?
Yes - especially because Medicare claims cover 15+ years of data for seniors. That’s longer than most clinical trials. But it’s only reliable if you account for comorbidities and drug interactions. Older patients often take multiple medications, and claims data can miss those unless carefully analyzed.
Can patients opt out of having their data used in registries or claims systems?
In claims data, patients usually can’t opt out - it’s part of billing. But in voluntary registries, participation is always optional. Patients must give informed consent before joining. Their data is anonymized, and they can leave at any time.
What’s the biggest limitation of using claims data for drug safety?
The biggest limitation is missing clinical context. Claims tell you a patient was diagnosed with diabetes and got a prescription - but not their blood sugar levels, diet, or symptoms. Without that, it’s hard to tell if a side effect is real or just a coincidence.
David McKie
February 23, 2026 AT 21:00
Let me tell you something - this whole RWE thing is a glorified dumpster fire wrapped in a PowerPoint presentation. Registries? Cost $2 million to build and only 60% of patients bother to join? That’s not data - that’s wishful thinking with a spreadsheet. And claims data? HA. You think ICD-10 codes are accurate? I’ve seen a diabetic patient coded as ‘unspecified hyperglycemia’ because the nurse was on break and the EHR auto-filled the first option. This isn’t science. It’s statistical theater. And don’t even get me started on AI ‘signal detection’ - it’s just a fancy algorithm screaming ‘FIRE!’ every time two patients took the same drug and sneezed on Tuesday.
Stephen Archbold
February 24, 2026 AT 06:29
honestly tho - i’ve worked in pharma analytics for 8 years and i can say this: claims data is messy but it’s the only thing we got at scale. registries are beautiful but they’re like fancy museums - cool to look at, but who’s gonna visit? i once saw a registry for rare liver disease with 27 patients. 27. out of 400k. meanwhile, medicare claims showed 12k patients on the same drug. yeah, the data’s rough - but when you combine it with real clinician notes? boom. you see patterns. also, typo in ‘entacapone’ lol - but you get the point. this stuff saves lives. even if it’s ugly.
kirti juneja
February 25, 2026 AT 06:43
As someone from India where healthcare access is a lottery, I can’t help but laugh at the ‘millions of lives’ in claims data. Who’s included? The 5% with private insurance? What about the 95% paying out of pocket? We don’t have EHRs. We don’t have Medicare. We have moms walking 10km with a child’s lab report tied to their sari. Registries? We dream of registries. But here’s the truth: if you’re not counting the uncounted, you’re not monitoring safety - you’re just chasing metrics. RWE isn’t magic. It’s privilege wrapped in data. And until we include the invisible patients? We’re just rearranging deck chairs on the Titanic.
Haley Gumm
February 26, 2026 AT 04:40
Okay, but let’s be real - the FDA’s using RWE because it’s cheaper than running new trials. And honestly? That’s fine. But the moment you start saying ‘claims data proves causation’? That’s when things go sideways. I’ve seen a drug get pulled because claims showed ‘more strokes’ - turns out, the stroke patients were all in one county with bad air quality and a new power plant. No one checked. Just ran the numbers. RWE is a tool. Not a crystal ball. And we’re treating it like one. Big mistake.
Natanya Green
February 26, 2026 AT 12:41
OMG I JUST REALIZED THIS IS A FULL ON CONSPIRACY!!!
Registries cost millions? Claims data has 20% wrong codes? AI is ‘cutting false positives’?!!
WHAT IF THE WHOLE SYSTEM IS DESIGNED TO HIDE DRUG SIDE EFFECTS??
Think about it - if they only use data from people who can afford insurance or volunteer for registries… then the people who get hurt the most? The poor? The elderly? The undocumented? THEY’RE NOT IN THE DATA!!
So… the drugs are ‘safe’… because we’re not counting the people who die in silence??
I’m not even mad. I’m just… disappointed. Like… how many people have to die before we fix this??
And why isn’t anyone talking about this??
Maranda Najar
February 28, 2026 AT 06:36
While I appreciate the structural overview presented herein, I must respectfully contend that the fundamental epistemological foundation of real-world evidence remains gravely compromised by systemic selection bias, confounding variables of unprecedented magnitude, and the ontological impossibility of deriving causal inference from administrative datasets devoid of clinical granularity. The very notion that claims data - inherently transactional, fragmented, and non-standardized - can serve as a proxy for clinical outcomes is not merely flawed; it is epistemologically incoherent. The FDA’s reliance on such data constitutes a dangerous conflation of correlation with causation, and the proliferation of AI-driven signal detection algorithms merely masks this epistemic collapse under a veneer of computational sophistication. In sum: we are not advancing pharmacovigilance. We are automating negligence.
Sanjaykumar Rabari
February 28, 2026 AT 16:40
Kenzie Goode
March 1, 2026 AT 12:42
I think the most beautiful thing here is how registries and claims data complement each other - like two different languages telling the same story. One gives you the heartbeat, the other the heartbeat’s echo. Neither alone is perfect, but together? They’re the closest we’ve ever gotten to seeing the whole picture. And honestly? That’s kind of hopeful. We’re not trying to be perfect. We’re just trying to be better. Slowly. Messily. But together.