
Data Standards in SAS Clinical Data Integration

SAS Clinical Data Integration helps users implement CDISC data standards in several ways. It is built on SAS Data Integration Studio, with SAS Clinical Standards Toolkit integrated into it. The Toolkit provides metadata about the CDISC data standards and controlled terminology, as well as tools to check the compliance of study domains with the data standard. Within the user interface of SAS Clinical Data Integration, users can import data standards, which come directly from SAS Clinical Standards Toolkit. Several versions of the SDTM, ADaM, and SEND data standards are available for import. A data standard that has been imported into SAS Clinical Data Integration contains domain templates, which hold all of the metadata about each domain.

A domain template includes all of the columns possible for the domain, along with each column's label, length, and format. Each column also has its own metadata, such as whether it is Required, Expected, or Permissible, whether it is a key variable, its XML data type, and any associated controlled terminology codelist. Each domain also has metadata defined at the domain level: its structure, its title, the file name of its archive file (a SAS V5 transport file), and its key variables. When a user creates a new Study in SAS Clinical Data Integration, they choose which data standard(s) to associate with the Study.
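To make the shape of this metadata concrete, here is a minimal Python sketch of a domain template. It is only an illustration: the class names, field names, and column lengths are assumptions made for this example and are not how SAS Clinical Data Integration or SAS Clinical Standards Toolkit stores metadata internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnTemplate:
    """Column-level metadata (hypothetical field names)."""
    name: str
    label: str
    length: int                       # illustrative lengths, not from a standard
    core: str                         # "Required", "Expected", or "Permissible"
    xml_data_type: str                # e.g. "text" or "integer"
    is_key: bool = False
    sas_format: Optional[str] = None
    codelist: Optional[str] = None    # associated controlled terminology codelist


@dataclass
class DomainTemplate:
    """Domain-level metadata plus the columns it may contain."""
    name: str
    title: str
    structure: str
    archive_file: str                 # SAS V5 transport file name
    columns: List[ColumnTemplate] = field(default_factory=list)

    @property
    def key_variables(self) -> List[str]:
        # Key variables are the columns flagged as keys, in column order.
        return [c.name for c in self.columns if c.is_key]


dm = DomainTemplate(
    name="DM",
    title="Demographics",
    structure="One record per subject",
    archive_file="dm.xpt",
    columns=[
        ColumnTemplate("STUDYID", "Study Identifier", 20, "Required", "text", is_key=True),
        ColumnTemplate("USUBJID", "Unique Subject Identifier", 40, "Required", "text", is_key=True),
        ColumnTemplate("SEX", "Sex", 2, "Required", "text", codelist="SEX"),
        ColumnTemplate("AGE", "Age", 8, "Expected", "integer"),
    ],
)

print(dm.key_variables)  # prints: ['STUDYID', 'USUBJID']
```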

The user can then create new Standard Domains within the study by choosing which domains from the associated data standard(s) to create. At the time they are created, these domains are metadata objects within SAS Clinical Data Integration and are an exact copy of the domain templates from the data standard, including all metadata for the domains and their columns. From there, users create SAS Clinical Data Integration jobs that populate those domain instances by transforming the source data into the structure the domains require.
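The idea of a study domain starting life as an exact copy of its template can be sketched in a few lines of Python. This is a conceptual illustration only: the dictionary layout, the `create_study_domain` helper, and the study identifier are all hypothetical, and the sketch assumes that the study's copy can be changed later without affecting the template.

```python
import copy

# Hypothetical domain template, as it might be held in an imported data standard.
ae_template = {
    "name": "AE",
    "title": "Adverse Events",
    "columns": [
        {"name": "USUBJID", "core": "Required"},
        {"name": "AETERM", "core": "Required"},
        {"name": "AESEV", "core": "Permissible"},
    ],
}


def create_study_domain(template, study_id):
    """Return an independent copy of the template, tagged with the owning study."""
    domain = copy.deepcopy(template)   # copies nested column metadata too
    domain["study"] = study_id
    return domain


study_ae = create_study_domain(ae_template, "STUDY-001")

# Changing the study's copy leaves the template in the data standard untouched.
study_ae["columns"].pop()
print(len(ae_template["columns"]), len(study_ae["columns"]))  # prints: 3 2
```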

Another piece of the submission puzzle is the define.xml document that accompanies the submission. This document describes the study, all of the data sets being submitted, their structure, code lists used, computational algorithms, comments, value-level metadata, and more. SAS Clinical Data Integration has a transformation that can create the define.xml using the metadata from the study, its domains, and the Controlled Terminology Package associated with the study.
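As a rough picture of how domain metadata can be turned into define.xml content, the following Python sketch builds one simplified element from a domain description. It is not the SAS transformation and does not produce a schema-valid define.xml: the element and attribute names are loosely modeled on those used in CDISC ODM, the OID naming scheme is invented for this example, and the input dictionary is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical study domain metadata.
domain = {
    "name": "DM",
    "title": "Demographics",
    "columns": [
        {"name": "STUDYID", "core": "Required"},
        {"name": "USUBJID", "core": "Required"},
        {"name": "AGE", "core": "Expected"},
    ],
}


def build_item_group(domain):
    """Build a simplified ItemGroupDef element describing one domain."""
    group = ET.Element(
        "ItemGroupDef",
        {"OID": f"IG.{domain['name']}", "Name": domain["name"], "Repeating": "No"},
    )
    # Simplified: the description is stored as plain text for brevity.
    ET.SubElement(group, "Description").text = domain["title"]
    for order, col in enumerate(domain["columns"], start=1):
        ET.SubElement(
            group,
            "ItemRef",
            {
                "ItemOID": f"IT.{domain['name']}.{col['name']}",
                "OrderNumber": str(order),
                # Assumption: only Required columns are flagged as mandatory.
                "Mandatory": "Yes" if col["core"] == "Required" else "No",
            },
        )
    return group


group = build_item_group(domain)
print(ET.tostring(group, encoding="unicode"))
```

The point of the sketch is that everything in the output is derived from metadata already attached to the study and its domains, which is why a transformation can generate the document without further input for those parts.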

Limitations of CDISC Data Standards

Most pharmaceutical companies have adopted the CDISC SDTM and ADaM data standards for submission of clinical study data to regulatory agencies. While these standards have come a long way in providing a standard format for submission data, the CDISC data models as they are published do not typically fit the data for a given clinical study perfectly. For example, in the SDTM model, each column within a domain is given a designation by CDISC as Required, Expected, or Permissible. CDISC provides a way to report much of the data that it finds to be commonly collected, with the understanding that some things may not apply to all studies.
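The core designation can guide how a domain is tailored for a particular study. The Python sketch below drops columns that a study did not collect while refusing to drop a Required column. The `tailor_domain` helper and the column dictionaries are hypothetical, and the rule it enforces is a simplification of the guidance in the SDTM implementation guide.

```python
def tailor_domain(columns, not_collected):
    """Drop columns the study did not collect, but never a Required one."""
    kept = []
    for col in columns:
        if col["name"] in not_collected:
            if col["core"] == "Required":
                raise ValueError(f"{col['name']} is Required and cannot be removed")
            continue  # skip an Expected or Permissible column that is not needed
        kept.append(col)
    return kept


# Hypothetical subset of an AE domain.
ae_columns = [
    {"name": "USUBJID", "core": "Required"},
    {"name": "AETERM", "core": "Required"},
    {"name": "AESEV", "core": "Permissible"},
]

tailored = tailor_domain(ae_columns, {"AESEV"})
print([c["name"] for c in tailored])  # prints: ['USUBJID', 'AETERM']
```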

So, for any given study, a company is likely to remove some of the Permissible and/or Expected columns from its domains. Many companies had already implemented some kind of internal data standards before CDISC began releasing its own, and it is common for companies to use a hybrid of the CDISC standards and their internal, company-specific data standards.

Most companies also find that the data they have collected for their studies do not fit perfectly into the existing CDISC data models. In this case, the company will often need to create additional variables within existing domains or create new custom domains altogether. Different therapeutic areas often collect unique data. While CDISC has an initiative to develop Therapeutic Area data standards, these are still in development, and many companies have established their own standard domain templates for the therapeutic area-specific data they collect.

In the define.xml file created by the SAS Clinical Data Integration transformation, the source, algorithm, and comment values are not populated for domain columns by default. These values cannot be anticipated in a way that would allow CDISC or SAS to set standard values, and they are likely to be different in every study.
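Because these values have to be supplied study by study, it can help to list the columns where they are still empty before generating define.xml. The Python sketch below shows one way to do that. The field names, the `missing_define_metadata` helper, and the example values are assumptions made for illustration; which fields a given column actually needs depends on the study (an algorithm, for example, normally applies only to derived columns), so the function simply reports what is unset.

```python
DEFINE_FIELDS = ("source", "algorithm", "comment")


def missing_define_metadata(columns):
    """Map each column name to the define.xml fields that are still empty."""
    gaps = {}
    for col in columns:
        empty = [f for f in DEFINE_FIELDS if not col.get(f)]
        if empty:
            gaps[col["name"]] = empty
    return gaps


# Hypothetical column metadata for a DM domain, partly filled in.
dm_columns = [
    {
        "name": "USUBJID",
        "source": "Derived",
        "algorithm": "Concatenate STUDYID, SITEID, and SUBJID",
        "comment": "Unique across the submission",
    },
    {"name": "AGE", "source": "Derived"},
    {"name": "SEX"},
]

gaps = missing_define_metadata(dm_columns)
for name, fields in gaps.items():
    print(name, "is missing:", ", ".join(fields))
```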