Recent discussions within the R Consortium Submission Working Group have highlighted challenges in handling dates between SAS and Unix systems. In addition, during the CDISC Pilot Data update at the PHUSE Test Data Factory, using R’s Unix time format caused dates (TRTSDT, TRTEDT, and ADT in adlbc.xpt) to be mistakenly shifted forward by 10 years.
This blog post explores the differences in date handling between SAS and R, focusing on epoch discrepancies and data types. It also discusses key considerations for conversion tools to ensure accurate date conversions and maintain data integrity.
Epoch in R and SAS
An epoch is a reference point from which time is measured. In computing, it’s often used as the starting point for a system’s time-keeping, allowing for the easy calculation of time intervals. Different systems use different epochs, which can lead to confusion if not properly managed.
In SAS, the epoch is midnight on January 1, 1960 (01JAN1960:00:00:00).
R uses the Unix epoch, which begins at midnight on January 1, 1970 (01JAN1970:00:00:00).
For example, a date with a value of 19725 corresponds to 02JAN2014 in SAS but to 03JAN2024 in R. The two epochs are 3653 days apart, which is 10 years including three leap days.
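The effect of the two epochs can be checked with a few lines of base R. This is a minimal sketch that is not part of the original analysis, and the variable names are illustrative. It interprets the same stored number under each origin and computes the offset between the two epochs.

```r
# Reference points used by each system
sas_origin  <- as.Date("1960-01-01")  # SAS epoch
unix_origin <- as.Date("1970-01-01")  # Unix epoch, used by R

value <- 19725  # the same stored number of days

# Interpret the number as days since each epoch
as_sas  <- as.Date(value, origin = sas_origin)
as_unix <- as.Date(value, origin = unix_origin)

# Number of days separating the two epochs
offset_days <- as.numeric(unix_origin - sas_origin)

print(as_sas)       # the date SAS would display for this value
print(as_unix)      # the date R would display for this value
print(offset_days)  # 3653
```

Printing both interpretations side by side is a quick way to spot a value that was written with one epoch and read with the other.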
Date Differences Between R and SAS
Another important aspect to consider is the difference in data types for numeric dates in R and SAS.
SAS: SAS only supports numeric and character data types. Therefore, numeric dates in SAS need a format applied to be human-readable.
R: In contrast, R has specific data types for dates, which allow numeric dates to be rendered with a date format directly in data frames. The data types in R are: “Date” for dates, “hms” (from the hms package) or “difftime” for time, and “POSIXct” or “POSIXlt” (both inheriting from “POSIXt”) for datetime.
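The following base R sketch shows that these types are still numbers underneath, with a class attribute that controls how they are displayed. The example values are illustrative, and the hms class is left out because it requires an additional package.

```r
# A Date is a number of days since 1970-01-01 with a class attribute
d <- as.Date("2024-01-03")
class(d)        # "Date"
unclass(d)      # 19725

# A POSIXct datetime is a number of seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("2024-01-03 00:00:00", tz = "UTC")
class(dt)       # "POSIXct" "POSIXt"
as.numeric(dt)  # 1704240000

# A difftime stores a duration together with its units
tm <- as.difftime(90, units = "mins")
class(tm)                       # "difftime"
as.numeric(tm, units = "secs")  # 5400
```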
These differences can lead to discrepancies if not properly accounted for during data conversion, potentially resulting in a date shift of 10 years when transferring data between SAS and R.
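When raw numeric values have to be moved by hand, the epoch offset must be applied explicitly. The helpers below are a hypothetical sketch in base R, not functions from haven or any other package. They assume that SAS dates are counted in days and SAS datetimes in seconds since the SAS epoch, and they treat datetimes as UTC, which is an assumption because SAS datetimes carry no time zone.

```r
# Days and seconds between the SAS epoch (1960-01-01) and the Unix epoch (1970-01-01)
SAS_UNIX_OFFSET_DAYS <- as.numeric(as.Date("1970-01-01") - as.Date("1960-01-01"))
SAS_UNIX_OFFSET_SECS <- SAS_UNIX_OFFSET_DAYS * 86400

# Hypothetical helper: raw SAS date value (days since 1960) -> R Date
sas_date_to_r <- function(x) {
  as.Date(x - SAS_UNIX_OFFSET_DAYS, origin = "1970-01-01")
}

# Hypothetical helper: R Date -> raw SAS date value
r_date_to_sas <- function(d) {
  as.numeric(d) + SAS_UNIX_OFFSET_DAYS
}

# Hypothetical helper: raw SAS datetime value (seconds since 1960) -> POSIXct
# Assumption: the datetime is interpreted as UTC
sas_datetime_to_r <- function(x) {
  as.POSIXct(x - SAS_UNIX_OFFSET_SECS, origin = "1970-01-01", tz = "UTC")
}

sas_value <- 19725
r_date <- sas_date_to_r(sas_value)
r_date_to_sas(r_date) == sas_value  # TRUE: the round trip preserves the value
```

In practice a conversion package should do this work, but the arithmetic makes clear where a shift of about 10 years comes from when the offset is skipped.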
Considerations for Conversion
To illustrate how date conversions between SAS and R can be managed, let’s look at the haven package in R. This package facilitates the conversion of data between SAS and R.
From SAS to R: Numeric variables with a date, time, or datetime format are identified as such and converted to the corresponding R types (Date, hms, or POSIXct).
Figure: Detection of date, time, and datetime variables from SAS by the haven package
From R to SAS: R date and datetime variables are converted to SAS numeric dates. By default, a format is added to these dates to make them human-readable. This format can be customized by changing the variable’s format attribute or by using the xportr package to pull the format from metadata and apply it to the corresponding variable.
Figure: Detection of date, time, and datetime variables from R by the haven package
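A round trip through a transport file is one way to check this behavior. The sketch below assumes the haven package is installed and uses its write_xpt() and read_xpt() functions. The dataset, its values, and the file name are made up for illustration.

```r
library(haven)

# Small illustrative dataset; the values are invented for this sketch
adsl <- data.frame(
  USUBJID = c("SUBJ-001", "SUBJ-002"),
  TRTSDT  = as.Date(c("2014-01-02", "2014-07-02"))
)

# R to SAS: write a transport file; SAS to R: read it back
path <- file.path(tempdir(), "adsl.xpt")
write_xpt(adsl, path)
back <- read_xpt(path)

# The variable is expected to come back as a Date with unchanged values
class(back$TRTSDT)
all(as.numeric(back$TRTSDT) == as.numeric(adsl$TRTSDT))
```

If the class is not Date or the values differ, the date format was probably lost somewhere along the way and the raw numbers are being read against the wrong epoch.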
Dealing with Date Interoperability in Dataset-JSON
Dataset-JSON v1.1 could potentially be used in future data exchanges between SAS and R. Given the differing approaches to handling dates across programming languages, the challenge lies in managing dates within a language-agnostic data exchange format.
What is Dataset-JSON? Dataset-JSON is a dataset exchange standard that uses JSON. It is designed to meet regulatory submission requirements and other dataset exchange scenarios. It has the potential to replace XPT as the default format for clinical and device data submissions to regulatory authorities.
To address interoperability issues across programming languages, options include storing numeric dates as ISO 8601 strings with a targetdatatype attribute to differentiate between character and numeric dates, or associating the epoch with the numeric date values. Refer to the Dataset-JSON specification for the latest date handling specifications. These approaches ensure that conversion tools receive sufficient information to handle dates correctly.
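The ISO 8601 option can be sketched in base R as follows. This is an illustration of the idea only, not the Dataset-JSON API: the metadata list, its field names, and its values are assumptions made for this example, and the specification remains the authority on the actual attributes.

```r
# Numeric date as stored in R (days since 1970-01-01)
adt_num <- 19725

# Serialize: numeric date -> ISO 8601 string, independent of any epoch
adt_iso <- format(as.Date(adt_num, origin = "1970-01-01"), "%Y-%m-%d")

# Illustrative column metadata; field names and values are assumptions
adt_meta <- list(name = "ADT", dataType = "date", targetdatatype = "integer")

# Deserialize: each reader restores a numeric date in its own epoch
r_value   <- as.numeric(as.Date(adt_iso))                          # days since 1970
sas_value <- as.numeric(as.Date(adt_iso) - as.Date("1960-01-01"))  # days since 1960

print(adt_iso)
print(c(r = r_value, sas = sas_value))
```

Because the exchanged value is a calendar date rather than a count of days, neither side needs to know the other's epoch.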
To help ensure the correct handling of dates in Dataset-JSON, you can use the datasetjson R package, which is built to read and write datasets in the CDISC Dataset-JSON format.
Conclusion
The 10-year date shift between R and SAS can be avoided by using appropriate conversion tools such as xportr and haven. To ensure correct date conversions, verify that date and datetime variables have the correct data type in R and a date format applied in SAS. By paying attention to these details, you can maintain data integrity and avoid significant discrepancies during data conversion.
Last updated: 2024-12-13 13:36:40.755842
Citation
@online{piraux2024,
author = {Piraux, Céline},
title = {Unix Versus {SAS} {Time}},
date = {2024-07-16},
url = {https://pharmaverse.github.io/blog/posts/2024-07-08_unix_vs_sas_date.../unix_vs_sas_datetime.html},
langid = {en}
}