Undergraduate University Statistics Report using pharmaverseadam data

As part of my placement year as a Data Sciences Industrial Placement student in Biostatistics at Roche in the UK, I was required to produce a “Business Project” and present it to the entire Data Sciences department. I decided to use pharmaverseadam to design a brand-new R training, “trainStats”, for junior Biostatisticians, since the package includes realistic ADaM datasets that are ideal for statistical analyses. For maximum efficiency, I tied my business project with a quantitative project report, due August 2024, for my undergraduate degree in Mathematics, Operational Research and Statistics at Cardiff University.

The quantitative project report investigates statistical analyses on preliminary clinical trial data using the R Studio software as instructed by the trainStats program I have authored, to help ease new Biostatisticians in the industry. The software was built considering the needs of people who are new to the industry and are keen to pursue a career in Biostatistics.

I had a smooth experience with the {pharmaverseadam} package all throughout my business and university project. I was introduced to the package by Ross Farrugia while looking for open-source data to analyze for my project. The package was very easy to read and use, with excellent documentation on the pharmaverseadam website. As I was planning to share aggregated outputs (such as tables, listings and graphs) from clinical datasets externally to the university, even using historical clinical data was not allowed since external use of confidential data did not align with Roche’s data privacy principles.

Throughout the trainStats documentation, I have primarily used the adoe_ophtha ADaM dataset (containing ophthalmology safety data) from {pharmaverseadam} to allow for a variety of exploratory statistical analyses ranging from producing boxplots of the spread of data by visit day, computing standard deviation and confidence intervals for endpoints, as well as programming linear regression models and patient profiles. As adoe_ophtha contains visit day, active arm and endpoint data, it was ideal to use for training purposes. In addition, I did use the adsl dataset too, to encourage trainStats users to join and merge datasets, taking into account patient demographics such as age. Here is a snippet of the code I wrote to generate bar charts by active arm for the “Central Subfield Thickness” endpoint:

# For Central Subfield Thickness
adoe_CST$ARM <- factor(adoe_CST$ARM, levels = c("Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"))

ggplot(data = subset(adoe_CST, ARM != "Screen Failure"), aes(x = ARM, y = AVAL)) +
  geom_bar(stat = "identity") +
  xlab("Active Arm") +
  ylab("Central Subfield Thickness / um") +
  ylim(0, max(adoe_CST$AVAL))

Below, is another example of the code I wrote to produce a boxplot displaying the analysis value of Central Subfield Thickness by patient visit days:

adoe_CST <- adoe_ophtha %>%
  filter(PARAMCD == "SCSUBTH")

adoe_DR <- adoe_ophtha %>%
  filter(PARAMCD == "SDRSSR")

# Boxplots for each visit day
boxplot(AVAL ~ AVISITN,
  data = adoe_CST,
  main = "Different boxplots for each visit day",
  xlab = "Visit Number",
  ylab = "Central Subfield Thickness/ um",
  col = "orange",
  border = "brown"
)

As you can see above, both of these code snapshots display the importance of clear logic and reasoning whilst coding by implementing strong data visualization techniques such as commenting. The code is simple and I personally found that using the {pharmaverseadam} package to produce various plots was very straightforward. The objective of trainStats was to help users familiarise themselves with ADaM datasets and my favorite element of the package was that the format of both synthetic ADaM datasets were incredibly similar to that of a true clinical trial ADaM for a study in Ophthalmology.

To further develop and improve the {pharmaverseadam} package, I believe including more endpoints in the adoe_ophtha dataset would be invaluable for future application and statistical analyses. Often ADOE datasets have several endpoints but the adoe_ophtha dataset only included 2 clinical parameters, namely “Central Subfield Thickness” and “Diabetic Retinopathy Severity Scale”. In addition, since the data is synthetic and randomly generated, the outputs had no significant correlations or trends from a statistical perspective in terms of disease progression or measures of central tendencies. Although, in this case, the emphasis was on understanding logic and reasoning whilst programming the statistical outputs, I experienced difficulties analysing the data quantitatively in my university report due to the high variation in data. Going forward, if there is a method to simulate the data less randomly, then that may be more useful for future dummy analyses on {pharmaverseadam} data.

Overall, my experience of using the {pharmaverseadam} package for the first time was excellent. The package was convenient to use in R Studio, and clearly formatted for multi-purpose use. I would definitely recommend using {pharmaverseadam} to all users in the industry, who are required to produce a piece of project work or any analyses/summary for external use, or even those keen to publicly publish articles and papers in their areas within pharma to the wider community, in a safe and responsible manner regarding external use of data. I would like to thank Ross Farrugia for introducing me to the package, and especially Edoardo Mancini for talking me through the package and supporting me throughout the business project and university report.

Last updated

2025-08-13 17:01:11.202972

Details

Source, Session info

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{parashar2024,
  author = {Parashar, Syon},
  title = {Undergraduate {University} {Statistics} {Report} Using
    Pharmaverseadam Data},
  date = {2024-09-16},
  url = {https://pharmaverse.github.io/blog/posts/2024-09-16_university_undergraduate_report/university_undergraduate_report.html},
  langid = {en}
}

For attribution, please cite this work as:

Parashar, Syon. 2024. “Undergraduate University Statistics Report Using Pharmaverseadam Data.” September 16, 2024. https://pharmaverse.github.io/blog/posts/2024-09-16_university_undergraduate_report/university_undergraduate_report.html.