Imputes the partial date portion of a --DTC variable based on user input.
Usage
impute_dtc_dt(
dtc,
highest_imputation = "n",
date_imputation = "first",
min_dates = NULL,
max_dates = NULL,
preserve = FALSE
)
Arguments
- dtc
-
The --DTC date to impute
A character date is expected in a format like yyyy-mm-dd or yyyy-mm-ddThh:mm:ss. Trailing components can be omitted and - is a valid "missing" value for any component.
- Permitted values
a character date variable
- Default value
none
- highest_imputation
-
Highest imputation level
The highest_imputation argument controls which components of the --DTC value are imputed if they are missing. All components up to the specified level are imputed.
If a component at a higher level than the highest imputation level is missing, NA is returned. For example, for highest_imputation = "D", "2020" results in NA because the month is missing.
If "n" (none, lowest level) is specified, no imputation is performed, i.e., if any component is missing, NA is returned.
If "Y" (year, highest level) is specified, date_imputation must be "first" or "last" and min_dates or max_dates must be specified respectively. Otherwise, an error is thrown.
- Permitted values
"Y" (year, highest level), "M" (month), "D" (day), "n" (none, lowest level)
- Default value
"n"
- date_imputation
-
The value to impute the day/month when a datepart is missing.
A character value is expected.
The "first" and "last" keywords allow imputation to the first/last day/month. They can also be used to impute the year if used in conjunction with the min_dates or max_dates arguments. An example of this is given in the Examples section.
When highest_imputation is "M" or "D", the "mid" keyword can also be specified to impute missing components to the middle of the possible range:
If both month and day are missing, they are imputed as "06-30" (middle of the year).
If only day is missing, it is imputed as "15" (middle of the month).
"<dd>" can be specified only if highest_imputation = "D". Missing days are imputed by the specified day, e.g. "10" for the 10th day of the month. The specified day should be valid for all months as otherwise an error might be issued. For example, date_imputation = "30" results in an invalid date of "2024-02-30" for the partial date "2024-02".
"<mm>-<dd>" can be specified only if highest_imputation is "M", e.g. "06-15" for the 15th of June.
- Permitted values
a keyword, i.e. "first", "mid", "last", or "<mm>-<dd>" or "<dd>"
- Default value
"first"
- min_dates
-
Minimum dates
A list of dates is expected. It is ensured that the imputed date is not before any of the specified dates, e.g., that the imputed adverse event start date is not before the first treatment date. Only dates which are in the range of possible dates of the dtc value are considered. The possible dates are defined by the missing parts of the dtc date (see example below). This ensures that the non-missing parts of the dtc date are not changed. A date or date-time object is expected. For example
impute_dtc_dtm(
"2020-11",
min_dates = list(
ymd_hms("2020-12-06T12:12:12"),
ymd_hms("2020-11-11T11:11:11")
),
highest_imputation = "M"
)
returns "2020-11-11T11:11:11" because the possible dates for "2020-11" range from "2020-11-01T00:00:00" to "2020-11-30T23:59:59". Therefore "2020-12-06T12:12:12" is ignored. Returning "2020-12-06T12:12:12" would have changed the month although it is not missing (in the dtc date).
- Permitted values
a list of dates, e.g. list(ymd_hms("2021-07-01T04:03:01"), ymd_hms("2022-05-12T13:57:23"))
- Default value
NULL
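The range rule described for min_dates can be sketched in base R at the date level. This is only an illustration of the idea, not admiral's implementation; possible_range() is a hypothetical helper introduced here, and the candidate dates mirror the ones used in the description above.

```r
# Hypothetical helper (not part of admiral): first and last calendar day
# consistent with a partial "yyyy-mm" value.
possible_range <- function(year, month) {
  start <- as.Date(sprintf("%04d-%02d-01", year, month))
  # First day of the following month minus one day is the last day of this month
  end <- seq(start, by = "month", length.out = 2)[2] - 1
  list(start = start, end = end)
}

rng <- possible_range(2020, 11)
min_candidates <- as.Date(c("2020-12-06", "2020-11-11"))

# Only minimum dates inside the possible range are considered
in_range <- min_candidates[min_candidates >= rng$start & min_candidates <= rng$end]

# Start from the "first" imputation and move forward to the latest in-range minimum
imputed <- max(c(rng$start, in_range))
print(imputed) # 2020-11-11
```

The December candidate is dropped because using it would change the month, which is not missing in the partial date.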
- max_dates
-
Maximum dates
A list of dates is expected. It is ensured that the imputed date is not after any of the specified dates, e.g., that the imputed date is not after the data cut off date. Only dates which are in the range of possible dates are considered. A date or date-time object is expected.
- Permitted values
a list of dates, e.g.
list(ymd_hms("2021-07-01T04:03:01"), ymd_hms("2022-05-12T13:57:23"))
- Default value
NULL
- preserve
-
Preserve day if month is missing and day is present
For example, "2019---07" would return "2019-06-07" if preserve = TRUE (and date_imputation = "mid").
- Permitted values
TRUE, FALSE
- Default value
FALSE
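A side-by-side call makes the difference visible. This is a short illustration assuming the admiral package is installed; highest_imputation = "M" is chosen here so that the missing month may be imputed, and the expected values in the comments are the ones stated in this documentation.

```r
library(admiral)

# Day 07 is known, the month is missing
dtc <- "2019---07"

# Default (preserve = FALSE): the known day is overwritten along with the month
overwritten <- impute_dtc_dt(
  dtc,
  highest_imputation = "M",
  date_imputation = "mid"
)

# preserve = TRUE: only the missing month is imputed, day 07 is kept
kept <- impute_dtc_dt(
  dtc,
  highest_imputation = "M",
  date_imputation = "mid",
  preserve = TRUE
)

print(overwritten) # expected: "2019-06-30"
print(kept)        # expected: "2019-06-07"
```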
Details
This is a vector-oriented helper and is not usually called directly on a data
frame with %>%.
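Because the function takes a vector and returns a vector, it can still be applied to a single column of a data frame. The sketch below assumes the admiral package is installed; the data frame and the variable names USUBJID, AESTDTC, ASTDT_CHAR and ASTDT are illustrative only.

```r
library(admiral)

# Illustrative adverse event data with one complete and one partial start date
ae <- data.frame(
  USUBJID = c("01", "02"),
  AESTDTC = c("2019-07-18", "2019-02"),
  stringsAsFactors = FALSE
)

# Apply the vector function to the column; the result is a character vector
ae$ASTDT_CHAR <- impute_dtc_dt(ae$AESTDTC, highest_imputation = "M")

# Convert the imputed character dates to Date objects
ae$ASTDT <- as.Date(ae$ASTDT_CHAR)

print(ae)
```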
Additionally, the function will throw an error if imputation rules cause an invalid datetime (e.g. "2020-02-31") to be generated. In this case, the user should adjust the imputation rules.
See also
Date/Time Computation Functions that return a vector:
compute_age_years(),
compute_dtf(),
compute_duration(),
compute_tmf(),
convert_date_to_dtm(),
convert_dtc_to_dt(),
convert_dtc_to_dtm(),
convert_xxtpt_to_hours(),
impute_dtc_dtm()
Examples
library(lubridate)
dates <- c(
"2019-07-18T15:25:40",
"2019-07-18T15:25",
"2019-07-18T15",
"2019-07-18",
"2019-02",
"2019",
"2019",
"2019---07",
""
)
# No date imputation (highest_imputation defaults to "n")
impute_dtc_dt(dtc = dates)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" NA
#> [6] NA NA NA NA
# Impute to first day/month if date is partial
impute_dtc_dt(
dtc = dates,
highest_imputation = "M"
)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-01"
#> [6] "2019-01-01" "2019-01-01" "2019-01-01" NA
# Same as above
impute_dtc_dt(
dtc = dates,
highest_imputation = "M",
date_imputation = "01-01"
)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-01"
#> [6] "2019-01-01" "2019-01-01" "2019-01-01" NA
# Impute to last day/month if date is partial
impute_dtc_dt(
dtc = dates,
highest_imputation = "M",
date_imputation = "last"
)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-28"
#> [6] "2019-12-31" "2019-12-31" "2019-12-31" NA
# Impute to mid day/month if date is partial
impute_dtc_dt(
dtc = dates,
highest_imputation = "M",
date_imputation = "mid"
)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-15"
#> [6] "2019-06-30" "2019-06-30" "2019-06-30" NA
# Impute to a given day of the month if only day is missing
impute_dtc_dt(
dtc = dates,
highest_imputation = "D",
date_imputation = "10"
)
#> [1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-10"
#> [6] NA NA NA NA
# Impute a date and ensure that the imputed date is not before a list of
# minimum dates
impute_dtc_dt(
"2020-12",
min_dates = list(
ymd("2020-12-06"),
ymd("2020-11-11")
),
highest_imputation = "M"
)
#> [1] "2020-12-06"
# Impute completely missing dates (only possible if min_dates or max_dates is specified)
impute_dtc_dt(
c("2020-12", NA_character_),
min_dates = list(
ymd("2020-12-06", "2020-01-01"),
ymd("2020-11-11", NA)
),
highest_imputation = "Y"
)
#> [1] "2020-12-06" "2020-01-01"
