metacore and metatools 0.2.0 are live. Read package release information and hear from the new maintainer.
Categories: Metadata, ADaM, Community
Author: Liam Hobby
Published: August 4, 2025
{metacore} and {metatools} have a new package maintainer
Hi, everyone! I’m Liam and I’m excited to announce that I have taken over as package maintainer for both {metacore} and {metatools} from Christina Fillmore. I work at GSK as a clinical programmer and I am coming to the end of my second year in the industry. This is my first experience working within the open-source world, but I am a regular user of pharmaverse packages and am keen to get more involved with the community.
Christina remains on hand as a mentor, and I’d like to thank both her and Ben Straub for their continued support before we dive into the details of {metacore}/{metatools} 0.2.0.
What’s new in metacore?
The goal of version 0.2.0 was to clarify the distinction between an imported Metacore spec, containing information about multiple datasets, and a subsetted spec containing information about just a single dataset (as achieved via metacore::select_dataset()).
We received a number of questions and issues from users who were passing a Metacore object containing metadata for multiple datasets to {metatools} functions that were designed to take a single, subsetted specification. When developing datasets, the typical workflow is to work on one dataset at a time, so subsetting the Metacore object is the logical thing to do. The problem was that {metatools} functions were inconsistent: some accepted metadata for multiple datasets and others did not.
Now, Metacore objects holding multiple datasets and those holding a single dataset are programmatically distinct, with the single-dataset object implemented as a subclass of Metacore called “DatasetMeta”.
From the user's perspective there is one key change: {metatools} functions now require a metadata object for a single dataset. Their API has been harmonised to accept only subsetted Metacore objects (created via metacore::select_dataset()).
The print statements of both combined and subsetted Metacore objects have been refined to better illustrate the differences between them and provide more helpful information to the user.
library(dplyr)
── Metacore object contains metadata for 5 datasets ────────────────────────────
→ ADSL (Subject-Level Analysis Dataset)
→ ADADAS (ADAS-Cog Analysis)
→ ADLBC (Analysis Dataset Lab Blood Chemistry)
→ ADTTE (AE Time To 1st Derm. Event Analysis)
→ ADAE (Adverse Events Analysis Dataset)
ℹ To use the Metacore object with metatools package, first subset a dataset using `metacore::select_dataset()`
The metacore::select_dataset() function is now explicit about what is being selected. Passing a Metacore object that has not been subsetted to a {metatools} function now raises an error:
Error in `verify_DatasetMeta()`:
! The object supplied to the argument `metacore` is not a subsetted
Metacore object. Use `metacore::select_dataset()` to subset metadata for the
required dataset.
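To avoid this error, subset the spec before passing it to {metatools}. The following is a minimal sketch, assuming the example specification `pilot_ADaM.rda` that ships with {metacore} (loading it creates an object named `metacore`); the name `adsl_spec` is purely illustrative.

```r
library(metacore)

# Load the example spec bundled with {metacore}. This creates an object
# called `metacore` holding metadata for several ADaM datasets.
load(metacore_example("pilot_ADaM.rda"))

# Subset to a single dataset before handing the spec to {metatools}.
adsl_spec <- select_dataset(metacore, "ADSL")

# Per the release notes, the subsetted object is a subclass of Metacore
# called "DatasetMeta", so both checks are expected to hold.
inherits(adsl_spec, "DatasetMeta")
inherits(adsl_spec, "Metacore")
```

Subsetting once and reusing `adsl_spec` keeps each later {metatools} call tied to a single dataset.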
Related: soft deprecation of dataset_name in metatools
Additionally, the argument dataset_name has been soft-deprecated across all functions in GitHub - pharmaverse/metatools {metatools}. While the argument is still available and will not break existing code, using it will now issue a warning. This change encourages users to adopt the preferred workflow, creating a subsetted Metacore object, and improves performance by avoiding repeated subsetting operations each time these functions are called.
The full list of affected functions is included below. The dataset_name argument will remain available for at least one year from the release date of 0.2.0 before being fully removed.
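As a rough illustration of the migration, the sketch below contrasts the soft-deprecated pattern with the preferred one. It uses metatools::order_cols() as a representative function and a toy two-column tibble in place of a real ADSL dataset; both choices are assumptions made for the example, as is the bundled example spec `pilot_ADaM.rda`.

```r
library(metacore)
library(metatools)
library(tibble)

# Example spec bundled with {metacore}; creates an object named `metacore`.
load(metacore_example("pilot_ADaM.rda"))

# Toy input standing in for a real ADSL dataset (illustrative values only).
adsl <- tibble(
  AGE = c(63, 71),
  USUBJID = c("01-701-1015", "01-701-1028")
)

# Soft-deprecated pattern: full spec plus `dataset_name`.
# Per the release notes this still runs in 0.2.0 but issues a warning.
# order_cols(adsl, metacore, dataset_name = "ADSL")

# Preferred pattern: subset once, then reuse the subsetted spec.
adsl_spec <- select_dataset(metacore, "ADSL")
adsl_ordered <- order_cols(adsl, adsl_spec)
```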
create_var_from_codelist()
metatools::create_var_from_codelist() now optionally allows the user to specify a codelist from which the new column should be generated. This is useful in situations like the one below, where the user is trying to derive PARAM from PARAMCD but the codelist for the out_var (PARAM) does not contain the values of PARAMCD.

| ID    | Order | Code                     | Decode                   |
|-------|-------|--------------------------|--------------------------|
| PARAM | 1     | Alanine Aminotransferase | Alanine Aminotransferase |
| PARAM | 2     | Bilirubin                | Bilirubin                |
| PARAM | 3     | Creatine                 | Creatine                 |
Example of default usage not providing the correct result:
# A tibble: 3 × 2
PARAMCD PARAM
<chr> <chr>
1 ALB <NA>
2 ALP <NA>
3 ALT <NA>
By default, metatools::create_var_from_codelist() takes the codelist of the out_var as input. The user can now override this default with a specific codelist (in this case the codelist for PARAMCD) to achieve the desired result.
# A tibble: 3 × 2
PARAMCD PARAM
<chr> <chr>
1 ALB Albumin (g/L)
2 ALP Alkaline Phosphatase (U/L)
3 ALT Alanine Aminotransferase (U/L)
This function also provides a new strict option which, when set to TRUE (the default), issues a warning listing any values in the input column that do not appear in the codelist.
Warning: In `create_var_from_codelist()`: The following values present in the input
dataset are not present in the codelist: DUMMY1 and DUMMY2
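The custom-codelist usage described above might look like the following. This is a sketch, assuming the bundled example spec `pilot_ADaM.rda` contains a PARAMCD codelist for ADLBC and that the input column holds codes rather than decodes (hence `decode_to_code = FALSE`); the object names `adlb`, `adlb_spec`, and `adlb_param` are illustrative.

```r
library(metacore)
library(metatools)
library(tibble)

# Example spec bundled with {metacore}; creates an object named `metacore`.
load(metacore_example("pilot_ADaM.rda"))
adlb_spec <- select_dataset(metacore, "ADLBC")

# Toy input containing only the parameter codes.
adlb <- tibble(PARAMCD = c("ALB", "ALP", "ALT"))

# Override the default (the PARAM codelist) with the PARAMCD codelist,
# translating from the code column to the decode column.
adlb_param <- create_var_from_codelist(
  adlb,
  adlb_spec,
  input_var = PARAMCD,
  out_var = PARAM,
  codelist = get_control_term(adlb_spec, PARAMCD),
  decode_to_code = FALSE
)
```

With the default `strict = TRUE`, any input value missing from the supplied codelist would trigger the warning shown above.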
create_cat_var()
metatools::create_cat_var() has been updated so that users can choose to create the new variable from either the code or the decode column of the controlled terminology. Previously, a codelist set up like the one below would be evaluated from the code column only, leaving the “years” text out of the new variable.
Example of a codelist for AGEGR2

| ID     | Name               | Data Type | Order | Code  | Decode      |
|--------|--------------------|-----------|-------|-------|-------------|
| AGEGR2 | Pooled Age Group 2 | text      | 1     | <35   | <35 years   |
| AGEGR2 | Pooled Age Group 2 | text      | 2     | 35-49 | 35-49 years |
| AGEGR2 | Pooled Age Group 2 | text      | 3     | >= 50 | >= 50 years |
Now, specifying the option create_from_decode = TRUE will allow you to create the variable based on the text in the decode column. If you are also using this option to create a numeric coded variable (in this case AGEGR2N), ensure your controlled terminology (CT) is set up so that the decode columns match.
# A tibble: 5 × 4
USUBJID AGE AGEGR2 AGEGR2N
<chr> <dbl> <chr> <dbl>
1 01-701-1015 63 18-64 years 1
2 01-701-1023 64 18-64 years 1
3 01-701-1028 71 65-80 years 2
4 01-701-1033 74 65-80 years 2
5 01-701-1034 77 65-80 years 2
This function now also has a strict option, TRUE by default, which issues a warning if there are values in the reference column that do not fit into the categories in the controlled terminology. This can be disabled with strict = FALSE.
Warning: There are 2 observations in AGE that do not fit into the provided categories
for AGEGR2. Please check your controlled terminology.
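For orientation, a basic call might look like the sketch below. It uses the AGEGR1/AGEGR1N codelists from the bundled example spec `pilot_ADaM.rda` rather than the AGEGR2 codelist shown above, because the example spec is not assumed to contain AGEGR2. The toy input data and object names are illustrative.

```r
library(metacore)
library(metatools)
library(tibble)

# Example spec bundled with {metacore}; creates an object named `metacore`.
load(metacore_example("pilot_ADaM.rda"))
adsl_spec <- select_dataset(metacore, "ADSL")

# Toy input standing in for a real ADSL dataset (illustrative values only).
adsl <- tibble(
  USUBJID = c("01-701-1015", "01-701-1028", "01-701-1034"),
  AGE = c(63, 71, 77)
)

# Derive the character and numeric grouping variables from the codelists.
# With a codelist whose decode column carries the display text, adding
# `create_from_decode = TRUE` would build the variable from that column.
adsl_cat <- create_cat_var(
  adsl,
  adsl_spec,
  ref_var = AGE,
  grp_var = AGEGR1,
  num_grp_var = AGEGR1N
)
```

Setting `strict = FALSE` in the same call would silence the warning about observations that fall outside the defined categories.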
Summary of Other Changes
Fixed a bug where the presence of variables with VLM in the value_spec table would prevent variables of the same name in different datasets being populated in the value_spec table.
metatools::build_from_derived() adds new options for the keep parameter that allow users to derive either all or only prerequisite columns from source datasets. Thanks to Matt Bearham for this amendment!
metatools::combine_supp() now adds the label found in QLABEL to the QNAM columns that are derived from supplementary datasets. Thanks to Bill Denney for this amendment!
metatools::check_variables() now provides a strict option that will issue a warning rather than throw an error when strict = FALSE.
{cli} output is now used across both packages and messaging for various functions has been improved.
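As one illustration from the list above, the new strict option of metatools::check_variables() could be used as sketched here. The deliberately incomplete toy dataset and the object names are assumptions for the example, as is the bundled example spec `pilot_ADaM.rda`.

```r
library(metacore)
library(metatools)
library(tibble)

# Example spec bundled with {metacore}; creates an object named `metacore`.
load(metacore_example("pilot_ADaM.rda"))
adsl_spec <- select_dataset(metacore, "ADSL")

# Toy dataset that is missing most of the variables listed in the spec.
adsl_partial <- tibble(USUBJID = "01-701-1015")

# Per the release notes, strict = FALSE reports mismatches as a warning
# rather than stopping with an error, so the pipeline can continue.
check_variables(adsl_partial, adsl_spec, strict = FALSE)
```

This can be handy while a dataset is still under construction; the default behaviour remains the safer choice for final checks.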
What’s next?
The next step for both packages will be working through and closing out issues from the backlog, updating the examples and vignettes, and improving the user experience via more informative messaging.
For {metacore}, there has been some interest in a UI to help users write custom specification readers for specs not in the standard P21 format, so this will be explored as well.
I hope to release the next update towards the end of the year, looking at an approximately 6-monthly release schedule going forward. Until then I encourage people to explore some of the new features and provide feedback on the changes through GitHub at the links below: