Advanced Path Resolution
Sometimes data moves between environments during development, or you need to check multiple locations for files. This guide shows you how to set up dynamic path resolution that adapts to your workflow.
The Problem: Moving Data
Imagine this scenario with our friend Tidy McVerse:
- She starts programming with data in development:
/demo/DEV/username/project1/data
- Halfway through, the data becomes production-ready and moves to:
/demo/PROD/project1/data
- Her code should work without changes, regardless of where the data lives
Working Example Setup
library(envsetup)
# Create temporary directory structure
dir <- fs::file_temp()
dir.create(dir)
config_path <- file.path(dir, "_envsetup.yml")
# Write configuration with multiple data paths
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '", dir, "/demo/DEV/username/project1/data', PROD = '", dir, "/demo/PROD/project1/data')
    output: '", dir, "/demo/DEV/username/project1/output'
    programs: '", dir, "/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'"
  ), file_conn)
close(file_conn)
# Load and apply configuration
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Assigned paths to __callr_data__Assigned paths to R_GlobalEnv
Understanding the Configuration
Let’s examine what we now have available:
# See all configured objects
ls(envsetup_environment)
#> character(0)
# Data is now a named list with multiple locations
get_path(data)
#> $DEV
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/DEV/username/project1/data"
#>
#> $PROD
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/PROD/project1/data"
get_path(output)
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/DEV/username/project1/output"
get_path(programs)
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/DEV/username/project1/programs"
get_path(envsetup_environ)
#> [1] "DEV"
Using read_path() for Smart File Location
The read_path() function searches through your path list to find files:
# Create the directory structure
dir.create(file.path(dir, "demo/DEV/username/project1/data"), recursive = TRUE)
dir.create(file.path(dir, "demo/PROD/project1/data"), recursive = TRUE)
# Add data only to PROD location
saveRDS(mtcars, file.path(dir, "demo/PROD/project1/data/mtcars.RDS"))
# read_path() finds the file in PROD
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/PROD/project1/data/mtcars.RDS"
Path Search Order
When data exists in multiple locations, read_path() follows the search order, which is the order of the entries in the path list:
# Add the same data to DEV location
saveRDS(mtcars, file.path(dir, "demo/DEV/username/project1/data/mtcars.RDS"))
# Now read_path() returns DEV location (first in search order)
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/DEV/username/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/DEV/username/project1/data/mtcars.RDS"
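The lookup behavior shown above can be approximated in base R. This is a minimal sketch, not envsetup's implementation: find_in_paths() is a hypothetical helper introduced here for illustration, and the directory layout is a throwaway created under tempfile(). It checks each entry of a named list in order and returns the first location where the file exists.

```r
# Hypothetical helper (not part of envsetup): return the first location in
# `paths` that contains `filename`, checking entries in list order.
find_in_paths <- function(paths, filename) {
  for (p in paths) {
    candidate <- file.path(p, filename)
    if (file.exists(candidate)) {
      return(candidate)
    }
  }
  stop("'", filename, "' not found in any configured path", call. = FALSE)
}

# Throwaway DEV and PROD directories for the demonstration
root <- tempfile("paths_demo_")
paths <- list(
  DEV  = file.path(root, "DEV"),
  PROD = file.path(root, "PROD")
)
for (p in paths) dir.create(p, recursive = TRUE)

# The file exists only in PROD, so the search falls through to PROD
saveRDS(mtcars, file.path(paths$PROD, "mtcars.RDS"))
only_prod <- find_in_paths(paths, "mtcars.RDS")

# Once DEV also has the file, DEV wins because it is listed first
saveRDS(mtcars, file.path(paths$DEV, "mtcars.RDS"))
both <- find_in_paths(paths, "mtcars.RDS")
```

Because the order of the list entries is the search order, listing DEV before PROD is what makes the development copy take precedence.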
Controlling Search Order with envsetup_environ
The envsetup_environ variable controls which paths are searched:
- DEV: Searches DEV first, then PROD
- PROD: Searches only PROD (skips DEV)
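One way to picture the two rules above is as a filter applied to the path list before the search starts. The sketch below is an assumption about how such a filter could work, not code taken from envsetup: paths_for_env() is a hypothetical helper, and returning the list unfiltered for an unknown environment name is a choice made here for illustration only.

```r
# Hypothetical helper (not part of envsetup): keep the entry named `env`
# and everything listed after it, dropping earlier environments.
paths_for_env <- function(paths, env) {
  start <- match(env, names(paths))
  if (is.na(start)) {
    return(paths)  # assumption: an unknown environment means no filtering
  }
  paths[start:length(paths)]
}

data_paths <- list(
  DEV  = "/demo/DEV/username/project1/data",
  PROD = "/demo/PROD/project1/data"
)

dev_search  <- names(paths_for_env(data_paths, "DEV"))   # DEV, then PROD
prod_search <- names(paths_for_env(data_paths, "PROD"))  # PROD only
```

Under this model, adding a further environment between DEV and PROD in the list would need no change to the helper.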
Environment-Specific Path Resolution
Let’s add a production configuration that changes the search behavior:
# Update config to include prod environment
file_conn <- file(config_path)
writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '", dir, "/demo/DEV/username/project1/data', PROD = '", dir, "/demo/PROD/project1/data')
    output: '", dir, "/demo/DEV/username/project1/output'
    programs: '", dir, "/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'"
  ), file_conn)
close(file_conn)
# Load production configuration
envsetup_config <- config::get(file = config_path, config = "prod")
rprofile(envsetup_config)
#> Assigned paths to __callr_data__Assigned paths to R_GlobalEnv
# Check the environment setting
get_path(envsetup_environ)
#> [1] "PROD"
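The prod block above only overrides envsetup_environ; the data, output, and programs paths still come from default, because the config package merges a named configuration on top of default. The sketch below isolates that behavior with a simplified file. The file contents and the /demo/data path are placeholders made up for this illustration, and plain strings stand in for the !expr entries used above.

```r
# Simplified stand-in for _envsetup.yml (placeholder values, no !expr)
cfg_file <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  paths:",
  "    data: '/demo/data'",
  "    envsetup_environ: 'DEV'",
  "prod:",
  "  paths:",
  "    envsetup_environ: 'PROD'"
), cfg_file)

# Without `config`, config::get() uses R_CONFIG_ACTIVE, falling back to "default"
dev_cfg  <- config::get(file = cfg_file, config = "default")
prod_cfg <- config::get(file = cfg_file, config = "prod")

dev_env   <- dev_cfg$paths$envsetup_environ
prod_env  <- prod_cfg$paths$envsetup_environ
prod_data <- prod_cfg$paths$data  # not set under prod, inherited from default
```

Setting the R_CONFIG_ACTIVE environment variable to prod before calling config::get() is an alternative to passing config = "prod" explicitly.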
Production Path Resolution
With the production configuration, path resolution behavior changes:
# In production, read_path() returns PROD location even though DEV exists
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpBB8Rks/file1e3d58cd99c1/demo/PROD/project1/data/mtcars.RDS"
Benefits of Dynamic Paths
- Workflow Flexibility: Code works as data moves through environments
- Environment Awareness: Different search strategies per environment
- Fallback Logic: Automatic fallback to alternative locations
- Code Stability: No code changes needed when paths change