Here we will walk through how to update _envsetup.yml to meet your needs. The configuration is currently set up to address:

  1. paths
  2. autos

PATHS

This adds envsetup:paths to your search path which contains all of the relevant objects needed to point to different directories in your environment.

Level 1 of config: the execution environment (e.g. dev, qa, or prod)

Scripts typically execute in different environments depending on your workflow. Here we have a workflow where multiple developers write scripts in dev, the scripts move to qa for quality checks and sign-off, and then move to prod, where they are executed for delivery.

default:

dev:

qa:

prod:

Level 2 of config: paths and autos

Each execution environment might need a slightly different configuration. This structure allows us to tailor the configuration to each environment.

default:
  paths:
  autos:

dev:
  paths:
  autos:

qa:
  paths:
  autos:

prod:
  paths:
  autos:

Level 3 of config: configure the environment

This is best illustrated with an example. For this example, we will focus on setting up one environment, the default configuration.

If you wish to have different configurations based on your environment, you would need to expand this to fit your needs.

default:
  paths:
  autos:
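
If you did want environment-specific settings, any value placed under a named environment overrides the default when that configuration is requested from config::get(). As a rough sketch (the key and paths below are purely illustrative):

default:
  paths:
    output: '/demo/DEV/username/project1/output'

prod:
  paths:
    output: '/demo/PROD/project1/output'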

A working example

First, for a project we’ll call project1, we will need to read in data, write out results, and save the script for future reference. So we need an object pointing to each of these locations, and we add the data, output and programs objects to our config.

default:
  paths:
    data: "/demo/DEV/username/project1/data"
    output: "/demo/DEV/username/project1/output"
    programs: "/demo/DEV/username/project1/programs"

A working example is even better, so let’s create a temporary directory and store this config file as _envsetup.yml.

library(envsetup)
#> 
#> Attaching package: 'envsetup'
#> The following object is masked from 'package:base':
#> 
#>     library

# create temporary directory
dir <- fs::file_temp()
dir.create(dir)
config_path <- file.path(dir, "_envsetup.yml")

# write a config file to it
file_conn <- file(config_path)
writeLines(
"default:
  paths:
    data: '/demo/DEV/username/project1/data'
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'", file_conn)
close(file_conn)

We can then call rprofile(), passing in this configuration.

# Set up the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths

We now have data, output and programs available to us in our search path within envsetup:paths. Let’s take a look:

objects("envsetup:paths")
#> [1] "auto_stored_envsetup_config" "data"                       
#> [3] "output"                      "programs"

data
#> [1] "/demo/DEV/username/project1/data"
output
#> [1] "/demo/DEV/username/project1/output"
programs
#> [1] "/demo/DEV/username/project1/programs"

Alright!

Now let’s go one step further and imagine a programmer; we’ll call her Tidy McVerse. Miss McVerse needs to read in some data, and this data is in the development area when she starts programming.

This is great! We already have the object data that points to “/demo/DEV/username/project1/data”.

Halfway through programming, the data was deemed production ready and moved from “/demo/DEV/username/project1/data” to “/demo/PROD/project1/data”. Miss McVerse should not need to change her programs now; she needs a way to read data that is smarter than the average bear.

The same object she uses to read in the data should work if the data is in “/demo/DEV/username/project1/data” or “/demo/PROD/project1/data”.

Let’s create a config to keep Tidy McVerse happy and focused on the results, not data locations.

Here we have a configuration where we execute some R code to build a list of our possible data sources; see the config package for details.

default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

Once again, we have a working example if you would like to code along. We will overwrite the previous config file with our new config.

file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '",dir,"/demo/DEV/username/project1/data', PROD = '",dir,"/demo/PROD/project1/data')
    output: '",dir,"/demo/DEV/username/project1/output'
    programs: '",dir,"/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'"
  ), file_conn)
close(file_conn)

Now we can set up the project again.

# Set up the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths

We have data, output and programs available to us in our search path within envsetup:paths, but data is now a named list with two locations. We also now have envsetup_environ, which we will explain in more detail later; just accept that it exists for now.

objects("envsetup:paths")
#> [1] "auto_stored_envsetup_config" "data"                       
#> [3] "envsetup_environ"            "output"                     
#> [5] "programs"

data
#> $DEV
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/DEV/username/project1/data"
#> 
#> $PROD
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/PROD/project1/data"
output
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/DEV/username/project1/output"
programs
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/DEV/username/project1/programs"
envsetup_environ
#> [1] "DEV"

We can use envsetup::read_path() to help us find where the data we would like to read is located.

Let’s create the directories in our temporary folder structure …

dir.create(file.path(dir, "/demo/DEV/username/project1/data"), recursive = TRUE)
dir.create(file.path(dir, "/demo/PROD/project1/data"), recursive = TRUE)

… and add mtcars to the PROD data directory, “/demo/PROD/project1/data”.

saveRDS(mtcars, file.path(dir, "/demo/PROD/project1/data/mtcars.RDS"))

Now we can use read_path(), passing in the path object data, to find where to read mtcars.RDS. The data is only in PROD, so the function returns the path to mtcars.RDS in PROD.

read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpEUQIRc/filebd6230314354/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/PROD/project1/data/mtcars.RDS"

Let’s keep going!

What if the data was in DEV and PROD?

Let’s save the same data to DEV …

saveRDS(mtcars, file.path(dir, "/demo/DEV/username/project1/data/mtcars.RDS"))

… and see what read_path() returns.

read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpEUQIRc/filebd6230314354/demo/DEV/username/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/DEV/username/project1/data/mtcars.RDS"

We see the path to DEV now instead of the path to PROD.

To explain this, we will now talk about envsetup_environ, which we set in the config earlier.

When an object holds multiple paths, as data does here, envsetup_environ controls which of those paths are checked. It acts as an index: wherever the environment name is found in the list, only that location through to the end of the list will be checked for data.

In the example below, we set envsetup_environ = 'DEV'. DEV is the first element in our data list, so all locations are checked in order until the file is found or the list is exhausted.

default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
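
As a conceptual sketch of that rule (this is not envsetup's internal code, just the behaviour described above, and it assumes ENVSETUP_ENVIRON has been set by the config):

# start at the list element named by ENVSETUP_ENVIRON and keep that
# location plus everything after it as candidate read locations
hierarchy <- list(
  DEV  = "/demo/DEV/username/project1/data",
  PROD = "/demo/PROD/project1/data"
)
start <- match(Sys.getenv("ENVSETUP_ENVIRON"), names(hierarchy))
hierarchy[start:length(hierarchy)]  # checked in order until the file is found

With ENVSETUP_ENVIRON set to 'DEV', both locations are candidates; set to 'PROD', only the PROD location is.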

Let’s now add an execution environment for PROD. We cannot simply change envsetup_environ from DEV to PROD, or DEV would no longer work. We need to add a prod configuration; otherwise the default is used.

default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'

So we will write this new config out …

# overwrite the config file in the temporary directory previously set up
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '",dir,"/demo/DEV/username/project1/data', PROD = '",dir,"/demo/PROD/project1/data')
    output: '",dir,"/demo/DEV/username/project1/output'
    programs: '",dir,"/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'"
  ), file_conn)
close(file_conn)

… and use it to set up the project again with our new configuration.

# setup the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths

Let’s check that envsetup_environ is now PROD.

envsetup_environ
#> [1] "DEV"

What! It isn’t PROD.

We must tell config::get() to use the prod configuration.

envsetup_config <- config::get(file = config_path, config = "prod")

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths

envsetup_environ
#> [1] "PROD"

Now let’s see what has changed when we call read_path() for mtcars.RDS using the prod configuration.

read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpEUQIRc/filebd6230314354/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpEUQIRc/filebd6230314354/demo/PROD/project1/data/mtcars.RDS"

We see the path to PROD, even though the data exists in both DEV and PROD. This is because the lookup was indexed starting at the PROD location, which is the last element in data, so only this location was checked and DEV was excluded.

Miss McVerse no longer needs to think about where her data is in the workflow, and can use read_path(data, ...) to determine the correct path.
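
For example, she could read in the data without worrying about which environment it currently lives in (a minimal sketch using base readRDS(); the object name is ours):

# read mtcars from whichever location read_path() resolves
mtcars_in <- readRDS(read_path(data, "mtcars.RDS"))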

We can apply the same steps to update our configuration for output and programs to account for PROD as well.
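
As a sketch of what that expansion might look like, assuming the PROD output and programs directories live under /demo/PROD/project1/ (those two PROD paths are illustrative and not part of the working example above):

default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: !expr list(DEV = '/demo/DEV/username/project1/output', PROD = '/demo/PROD/project1/output')
    programs: !expr list(DEV = '/demo/DEV/username/project1/programs', PROD = '/demo/PROD/project1/programs')
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'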

AUTOS

This adds multiple environments to your search path, each of which contains objects that are automatically sourced.

default:
  autos:

A working example

So let’s go back to Tidy McVerse. She has created a custom, one-off function and stored it in /demo/DEV/username/project1/script_library.

We will add this path to the autos config.

default:
  autos:
    dev_script_library: '/demo/DEV/username/project1/script_library'

Let’s look at a working example. We will create the directory, place a script into the folder …

# create the temp directory
dir <- fs::file_temp()
dir.create(dir)
dir.create(file.path(dir, "/demo/DEV/username/project1/script_library"), recursive = TRUE)

# write a function to the folder
file_conn <- file(file.path(dir, "/demo/DEV/username/project1/script_library/test.R"))
writeLines(
"test <- function(){print('test')}", file_conn)
close(file_conn)

# write the config
config_path <- file.path(dir, "_envsetup.yml")
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'"
  ), file_conn)
close(file_conn)

… and call rprofile() passing in this config file.

envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
#> Attaching functions from /tmp/RtmpEUQIRc/filebd627438468c/demo/DEV/username/project1/script_library to autos:dev_script_library

Now we can see autos:dev_script_library was added to the search path.

search()
#>  [1] ".GlobalEnv"               "autos:dev_script_library"
#>  [3] "package:envsetup"         "envsetup:paths"          
#>  [5] "package:stats"            "package:graphics"        
#>  [7] "package:grDevices"        "package:utils"           
#>  [9] "package:datasets"         "package:methods"         
#> [11] "Autoloads"                "tools:callr"             
#> [13] "package:base"

test() is available within this environment, and we can execute this function without sourcing it.

objects("autos:dev_script_library")
#> [1] "test"

test()
#> [1] "test"

Why on earth would we need this?

Just as with our previous data example, these scripts can be in multiple locations during their qualification lifecycle.

So let’s say Tidy McVerse’s friend, Sir Purrr, has a function that is useful for others in this specific project, but it is already in prod. Miss McVerse would like to use her function in dev and Sir Purrr’s function in prod.

To illustrate this, let’s add the prod script library to our config …

default:
  autos:
    dev_script_library: '/demo/DEV/username/project1/script_library'
    prod_script_library: '/demo/PROD/project1/script_library'

… create the PROD directory and add Sir Purrr’s function to PROD.

dir.create(file.path(dir, "/demo/PROD/project1/script_library"), recursive = TRUE)

# write a function to the folder
file_conn <- file(file.path(dir, "/demo/PROD/project1/script_library/test2.R"))
writeLines(
"test2 <- function(){print('test2')}", file_conn)
close(file_conn)

Then we can overwrite our _envsetup.yml …

# write the config
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir,"/demo/PROD/project1/script_library'"
  ), file_conn)
close(file_conn)

… and set up the project again with our new configuration.

envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
#> Attaching functions from /tmp/RtmpEUQIRc/filebd627438468c/demo/PROD/project1/script_library to autos:prod_script_library
#> Attaching functions from /tmp/RtmpEUQIRc/filebd627438468c/demo/DEV/username/project1/script_library to autos:dev_script_library

Now we can see autos:prod_script_library was added to the search path, the functions test() and test2() are both available, and we can execute them without sourcing.

search()
#>  [1] ".GlobalEnv"                "autos:dev_script_library" 
#>  [3] "autos:prod_script_library" "package:envsetup"         
#>  [5] "envsetup:paths"            "package:stats"            
#>  [7] "package:graphics"          "package:grDevices"        
#>  [9] "package:utils"             "package:datasets"         
#> [11] "package:methods"           "Autoloads"                
#> [13] "tools:callr"               "package:base"

objects("autos:prod_script_library")
#> [1] "test2"

test()
#> [1] "test"
test2()
#> [1] "test2"

We can keep going and create different configurations for each execution environment, similar to what we did for PATHS above.

For example, we would not want to source any functions from dev when executing in prod. The configuration below is one way to handle this situation: blank out the dev script location when executing in prod.

# write the config
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir,"/demo/PROD/project1/script_library'
prod:
  autos:
    dev_script_library: ''"
  ), file_conn)
close(file_conn)

envsetup_config <- config::get(file = config_path, config = "prod")

rprofile(envsetup_config)
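
If you are coding along, one way to check which autos environments remain attached after this call is shown below; the result will depend on your session, so it is not printed here.

# list any autos environments currently on the search path
grep("^autos:", search(), value = TRUE)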