JobDescription.org


Data Manager


Data Managers in clinical research design and maintain the electronic data capture systems used in clinical trials, establish data quality standards, manage data validation programming, and oversee the process of cleaning and locking trial databases prior to statistical analysis. They work closely with biostatistics, clinical operations, and regulatory teams to ensure that study data is complete, accurate, and submission-ready.

Role at a glance

Typical education
Bachelor's degree in a life science, computer science, health informatics, or statistics
Typical experience
Not specified
Key certifications
CCDM (Certified Clinical Data Manager)
Top employer types
Biotechnology companies, Contract Research Organizations (CROs), pharmaceutical companies
Growth outlook
Sustained demand driven by escalating FDA expectations for CDISC-compliant data submissions
AI impact (through 2030)
Augmentation — AI-assisted query generation and automated anomaly detection reduce manual cleaning workloads, shifting the role toward overseeing AI-flagged patterns and resolving complex issues algorithms cannot handle.

Duties and responsibilities

  • Design and build electronic data capture (EDC) databases in Medidata Rave, Oracle InForm, or Veeva Vault based on protocol specifications
  • Develop data management plans (DMPs) defining data collection standards, validation rules, and quality control procedures
  • Create edit checks and validation programs to detect out-of-range values, inconsistencies, and missing data during trial conduct
  • Lead the user acceptance testing (UAT) process for EDC databases, coordinating sign-off from clinical, medical, and biostatistics teams
  • Manage ongoing data cleaning: generate query listings, route queries to sites, and track resolution through database lock
  • Execute data reconciliation between EDC, external lab data, eCOA, and safety databases
  • Prepare and execute the database lock process: final query resolution, missing data reconciliation, data audit trail review
  • Coordinate with biostatistics on SDTM dataset preparation and ADaM derivations for regulatory submissions
  • Develop coding conventions and manage medical coding in MedDRA and WHO Drug dictionaries
  • Support regulatory inspections: prepare data management documentation, respond to health authority data questions, and manage audit trail requests
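The edit-check duty above can be sketched in miniature. Real edit checks run inside the EDC platform itself (e.g., as Rave custom functions), but the logic is the same as this hypothetical Python/pandas illustration — the field names, ranges, and query messages here are invented for the example:

```python
import pandas as pd

# Hypothetical CRF extract; column names are illustrative only.
records = pd.DataFrame({
    "subject":  ["101-001", "101-002", "101-003"],
    "visit":    ["Week 4", "Week 4", "Week 4"],
    "sbp":      [122, 210, None],  # systolic BP, mmHg
    "ae_start": ["2024-03-01", "2024-05-10", "2024-04-02"],
    "ae_end":   ["2024-03-05", "2024-05-01", "2024-04-09"],
})

queries = []
for _, r in records.iterrows():
    # Out-of-range check: flag implausible systolic BP values.
    if pd.notna(r["sbp"]) and not (60 <= r["sbp"] <= 200):
        queries.append((r["subject"], "SBP out of expected range (60-200)"))
    # Missing-data check on a required field.
    if pd.isna(r["sbp"]):
        queries.append((r["subject"], "SBP missing"))
    # Consistency check: AE end date must not precede start date.
    # ISO 8601 date strings compare correctly as plain strings.
    if r["ae_end"] < r["ae_start"]:
        queries.append((r["subject"], "AE end date before start date"))

for subj, msg in queries:
    print(f"{subj}: {msg}")
```

The three check types shown — range, missing, and cross-field consistency — correspond to the categories an edit check specification typically covers; each fired check becomes a query routed to the site.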

Overview

Data Managers are responsible for the integrity of clinical trial data from first entry through regulatory submission. They build the EDC systems that sites use to enter data, write the validation rules that catch errors as they happen, manage the cleaning process that resolves discrepancies, and execute the database lock that hands clean data to biostatistics.

EDC database development is the foundational technical work. Building a clinical database for a complex Phase III trial involves translating a 150-page protocol into structured forms, field definitions, controlled terminology, and edit checks. Design decisions made during database build — how a date field is formatted, whether a required field has a missing data code option, how adverse event severity is coded — affect data quality for the entire two-to-four-year life of the study.

Data cleaning is the ongoing daily work during trial conduct. Edit checks flag anomalous values automatically; queries go to sites for resolution; responses come back and the query closes or requires further discussion. In a 50-site global study, the query workload can run into hundreds of open items at any point. Data managers who keep query aging under control — most queries resolved within 30 days of opening — arrive at database lock with weeks of work rather than months of work remaining.
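Keeping query aging under control starts with measuring it. A minimal sketch of that tracking, assuming a hypothetical open-query listing (the field names are illustrative, not any EDC platform's export format):

```python
import pandas as pd

# Hypothetical open-query listing exported from the EDC.
open_queries = pd.DataFrame({
    "query_id": ["Q-1001", "Q-1002", "Q-1003", "Q-1004"],
    "site":     ["101", "101", "205", "317"],
    "opened":   pd.to_datetime(["2024-05-01", "2024-06-20",
                                "2024-04-15", "2024-06-28"]),
})

as_of = pd.Timestamp("2024-07-01")
open_queries["age_days"] = (as_of - open_queries["opened"]).dt.days

# The 30-day aging target from the text: which queries are overdue?
overdue = open_queries[open_queries["age_days"] > 30]
print(overdue[["query_id", "site", "age_days"]])

# Worst aging per site helps target follow-up with site coordinators.
print(open_queries.groupby("site")["age_days"].max())
```

Reviewing a listing like this weekly — and escalating the sites with the oldest queries — is what keeps the lock-preparation backlog measured in weeks rather than months.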

Database lock preparation brings together data management, biostatistics, clinical operations, and medical monitoring to confirm that every significant data issue has been resolved, all external data sources have been reconciled, and the audit trail is complete. On complex programs, this process involves formal sign-off meetings and may span several weeks. The locked database then goes to biostatistics for the primary analysis that drives the regulatory submission.

Qualifications

Education:

  • Bachelor's degree in a life science, computer science, health informatics, or statistics
  • Master's degree in clinical research, health informatics, or biostatistics is valued for senior DM roles
  • CCDM (Certified Clinical Data Manager) through SCDM is the primary professional certification for this field

Technical skills:

  • EDC platforms: Medidata Rave (most widely used; Rave Designer and Rave Architect skills are distinct), Oracle InForm, Veeva Vault EDC
  • CDISC standards: CDASH for data collection design, SDTM for submission datasets, ADaM for analysis datasets
  • Medical coding: MedDRA hierarchy and coding conventions, WHO Drug dictionary, coding review processes
  • SAS: DATA step and PROC SQL for data cleaning and SDTM conversion (expected for senior and lead roles)
  • Python: pandas and data validation scripting increasingly requested as an alternative or complement to SAS
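The CDASH-to-SDTM relationship in the list above can be shown with a toy mapping. This is a drastic simplification — real conversions cover dozens of variables, controlled terminology, and derivations per the SDTM Implementation Guide — and the raw column names are hypothetical CDASH-style examples:

```python
import pandas as pd

# Hypothetical CDASH-style collected demographics data.
raw = pd.DataFrame({
    "SUBJID":  ["001", "002"],
    "BRTHDAT": ["1980-02-11", "1975-09-30"],
    "SEX":     ["M", "F"],
    "SITEID":  ["101", "205"],
})

STUDYID = "ABC-123"

# Simplified mapping into the SDTM DM (Demographics) domain.
dm = pd.DataFrame({
    "STUDYID": STUDYID,
    "DOMAIN":  "DM",
    # USUBJID must be unique across the entire submission,
    # so it is derived from study, site, and subject IDs.
    "USUBJID": STUDYID + "-" + raw["SITEID"] + "-" + raw["SUBJID"],
    "SITEID":  raw["SITEID"],
    "BRTHDTC": raw["BRTHDAT"],  # ISO 8601 dates carry over directly
    "SEX":     raw["SEX"],
})
print(dm)
```

The point of CDASH is visible even in this toy: because the collected fields already use standard names and ISO 8601 dates, the SDTM mapping is mostly renaming and derivation rather than restructuring.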

Regulatory knowledge:

  • ICH E6(R3): data management requirements and essential documents for GCP compliance
  • FDA 21 CFR Part 11: electronic records and electronic signatures requirements for EDC validation
  • FDA Technical Conformance Guide for CDISC data standards in NDA/BLA submissions
  • Data Management Plan development and sponsor SOP compliance

Soft skills:

  • Methodical attention to detail in validation logic — an edit check that fires incorrectly generates unnecessary queries and erodes site trust
  • Cross-functional communication: data managers field questions from clinical, medical, biostatistics, and regulatory simultaneously
  • Project management: managing query aging, database lock timelines, and UAT schedules across multiple studies

Career outlook

Clinical Data Managers occupy a specialized but essential niche in the clinical research workforce. Every clinical trial requires data management infrastructure, and the complexity of that infrastructure has grown substantially as regulatory data standards (CDISC), electronic patient-reported outcomes (eCOA), and decentralized trial data streams have multiplied the sources of data that must be integrated into a clean, submission-ready database.

The FDA's escalating expectations for CDISC-compliant data submissions in all NDA and BLA applications have created sustained demand for data managers who understand SDTM and ADaM standards. Companies that have historically run trials with non-standardized data structures are being required to convert and resubmit, creating remediation work alongside new trial design work.

AI is entering the data quality space in meaningful ways. Automated anomaly detection and AI-assisted query generation are reducing the manual workload for routine data cleaning. These tools are changing the role rather than eliminating it — experienced data managers shift from executing routine query cycles to overseeing AI-flagged patterns and making judgment calls about complex data issues that algorithms cannot resolve.

Decentralized trial data integration represents both a challenge and a growth area. When participants wear biosensors, complete eCOA instruments on tablets, and have blood drawn by at-home nurses who upload results to external systems, integrating all of those data streams into the master EDC requires the kind of data architecture and reconciliation work that data managers are uniquely positioned to do.

For career advancement, Lead Data Manager and Data Management Lead roles carry team management responsibility and earn $100K–$125K. Data Management Director and Head of Data Management at mid-size biotechs or CROs earn $130K–$175K. Data managers with SDTM programming experience can also move into biostatistics programming roles with significant salary uplift.

Sample cover letter

Dear Hiring Manager,

I'm applying for the Data Manager position at [Company/CRO]. I've been a clinical data manager at [Company] for four years, managing EDC design, data cleaning, and database lock on three Phase II oncology studies — two using Medidata Rave and one using Oracle InForm.

On my most recent study I served as the primary DM from protocol finalization through database lock. I built the Rave database from the annotated CRF, developed the DMP, programmed 140 edit checks, led UAT with the clinical and biostatistics teams, and managed the query workflow for 32 sites through a 24-month enrollment period. We hit database lock 10 days ahead of the milestone date committed to the regulatory team, with fewer than 20 outstanding minor queries at lock versus the 80+ that had been typical on prior programs.

The improvement came from tightening the UAT process. On previous studies, edit checks that fired incorrectly weren't caught until they generated site queries. I added a manual spot-check layer in UAT specifically for edit check false-positive rates, which identified 12 checks that needed logic adjustment before we went live. The result was a meaningfully lower query burden on sites and faster resolution.

I hold the CCDM credential and I've been working through the CDISC SDTM training modules to build submission standards knowledge for late-phase work. Your Phase IIb-to-III transition program looks like the right context to apply that training.

Thank you for your consideration.

[Your Name]

Frequently asked questions

What is a database lock in clinical trials?
Database lock is the formal process by which the EDC system is made read-only after data cleaning is complete and all outstanding queries are resolved. Once the database is locked, no further modifications can be made without a documented re-opening procedure. The locked database is the data set transferred to biostatistics for the primary statistical analysis. Database lock is a major trial milestone, and the quality of data management throughout the trial determines how clean and fast that process is.
What is a Data Management Plan?
A Data Management Plan (DMP) is a document created at study startup that defines how data will be collected, validated, cleaned, and transferred throughout the trial. It specifies the EDC system, edit check logic, coding conventions, query workflows, external data handling, and database lock procedures. The DMP is a required essential document under ICH E6; it is reviewed by the sponsor and may be examined by health authorities during inspections.
What technical skills does a clinical Data Manager need?
EDC configuration in at least one major platform (Medidata Rave, Oracle InForm, or Veeva Vault EDC) is the core technical requirement. SAS or Python programming for data cleaning and SDTM conversion is increasingly expected at senior levels. CDASH and SDTM data standards knowledge is essential for late-phase programs with NDA/BLA submission requirements. MedDRA and WHO Drug coding experience is standard for any DM role in pharma.
How is the Data Manager role affected by AI tools?
AI-driven anomaly detection is being applied to clinical data quality monitoring, flagging unusual patterns that might indicate data entry errors or site-level issues faster than traditional edit checks. Natural language processing tools are also being used for adverse event narrative coding. These tools reduce manual query generation for routine issues and shift the data manager's focus toward investigating complex data patterns that algorithms surface but cannot resolve on their own.
What is CDASH and why does it matter?
CDASH (Clinical Data Acquisition Standards Harmonization) is the CDISC standard that defines how clinical trial data should be collected at the site level — the field names, formats, and structures that make data readily convertible to SDTM for regulatory submission. FDA expects NDA and BLA submissions to include SDTM-compliant datasets, and building to CDASH from the start makes SDTM conversion more efficient. Data managers who understand CDASH design trials that are easier to submit.