Electronic Health Record Data/PCORnet Common Data Model

The PCORnet Common Data Model (CDM) is a specification that defines a standard organization and representation of data for the PCORnet Distributed Research Network (DRN), and it is a key component of the DRN infrastructure. PCORnet developed the DRN to be a “…functional distributed research network that facilitates multi-site patient–centered research across the Clinical Research Networks (CRNs) and other interested contributors. The distributed network will enable the conduct of observational research and clinical trials while allowing each participating organization to maintain physical and operational control over its data” [Data Standards, Security, and Network Infrastructure Task Force (DSSNI charter), 2014].

For rich detail on the current common data model specification, please visit or download
the latest CDM specification:


Informatics for Integrating Biology and the Bedside (i2b2) is a user-friendly, self-service tool that leverages existing data for cohort identification, retrospective data analysis, feasibility studies, and hypothesis generation.

Researchers can query de-identified CDM data from GPC sites using the GPC i2b2 tool. Beyond raw counts, you can also visualize results with demographic, medication, and diagnosis breakdowns and compare two population sets.

GPC i2b2 is available to qualified faculty members from GPC institutions. Affiliate investigators, who can include staff, students, residents, fellows, and postdocs, can gain access with sponsorship from a qualified faculty member. Request access

Enhanced Data - Mortality Status, Tumor Table and Natural Language Processing

Mortality is a key outcome for clinical research, yet most healthcare data sets include mortality data only when the patient died in a hospital setting. Mortality data can inform research and outcomes analysis, and it can help clinical trial teams avoid contacting the families of deceased individuals about clinical trials and studies.

GPC recognizes the value of mortality data and integrates Social Security Administration Death Master File data with the GPC site CDMs. An additional data source is available for the University of Missouri data set.

  • Death data from the EHR (death as disposition) – A patient's death is recorded as a disposition in their EHR patient record.
  • SSA DMF – The Death Master File (DMF) from the Social Security Administration (SSA) is compiled from internal SSA records of deceased persons who held Social Security numbers and whose deaths were reported to the SSA.
  • Obituary data – Sourced from funeral homes, newspapers, and other online obituary sources.
  • Enriched site-level data with Tumor Table Linkage – In addition to the standard CDM tables, GPC is spearheading the effort to integrate specialty cancer data. We have linked cancer-specific data from the North American Association of Central Cancer Registries (NAACCR) to populate a tumor table in each site CDM, which has been used to support proposals. The table includes high-quality data in structured fields for demographic, clinical, and treatment observations.
  • Natural Language Processing (NLP) – GPC is committed to standardizing the extraction and population of textual data. Our top-ranked NLP development team specializes in clinical text extraction and tailors, tests, validates, and refines pipelines to support NLP deployment.
  • Geocoding – All GPC sites have geocoded patients’ addresses, which can be linked to multiple publicly available, community-level social determinants of health data sets. Using ZIP+4 information, we have also geocoded all Medicare and Medicaid beneficiaries obtained for the GROUSE project and linked them to a curated set of American Community Survey variables, Rural-Urban Commuting Area codes, the Area Deprivation Index, the Bird Index, and others.
  • Clinical Observable Data – Multiple GPC partners have extracted an extensive list of structured clinical observable data from source EHR systems, including but not limited to flowsheet data and patient-reported outcomes.

GPC Reusable Observable Unified Study Environment (GROUSE)

To understand all types of care a patient receives, without being restricted to specific health systems, GPC created the GPC Reusable Observable Unified Study Environment (GROUSE), a unique de-identified data resource that merges Medicare and Medicaid claims with electronic health records from all 13 GPC sites.