Getting started with Cohort2Trajectory

Introduction

The Cohort2Trajectory package creates patient-level medical trajectories. It takes a target cohort of any size from your OMOP CDM instance, together with state cohorts that populate the target cohort's observation time. The output is a set of patient-level trajectories, described by your input state cohorts, with no void time intervals. These trajectories can then be used for further analysis, modeling, and visualization.

Example with SampleCohort2Trajectory

Initiating the database connection

The package relies heavily on the OMOP CDM, so a database connection is required. The connection is opened with DBI and passed to the CDMConnector package, which creates the CDM reference used by Cohort2Trajectory. You can configure the connection either by reading credentials from a .Renviron file or by writing them explicitly in your script.
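If the credentials come from .Renviron, it can help to fail early when one of the variables read by the connection script that follows is missing. R reads a .Renviron file from the current directory or the home directory at startup (one `NAME=value` per line), and `readRenviron()` reloads it in a running session. The sketch below is plain base R; `requiredEnvVars`, `missingEnvVars()` and `checkEnvVars()` are hypothetical helpers, not part of Cohort2Trajectory or CDMConnector, and the variable names simply mirror the ones used in this vignette.

```r
# Hypothetical list: the environment variables read with Sys.getenv()
# in the connection script of this vignette.
requiredEnvVars <- c(
  "DB_USERNAME", "DB_PASSWORD", "DB_HOST", "DB_NAME", "DB_PORT",
  "OHDSI_CDM", "OHDSI_VOCAB", "OHDSI_RESULTS", "OHDSI_WRITE"
)

# Return the subset of `vars` that are unset or empty.
# Sys.getenv() returns "" for unset variables by default.
missingEnvVars <- function(vars) {
  vars[Sys.getenv(vars) == ""]
}

# Stop with an informative message if any variable is missing.
checkEnvVars <- function(vars = requiredEnvVars) {
  missing <- missingEnvVars(vars)
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "),
         ". Add them to .Renviron and restart R or call readRenviron().",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling `checkEnvVars()` right before `DBI::dbConnect()` turns a confusing connection error into a message that names the missing settings.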

user <- Sys.getenv("DB_USERNAME")
pw <- Sys.getenv("DB_PASSWORD")
port <- Sys.getenv("DB_PORT")

cdmSchema <-
  Sys.getenv("OHDSI_CDM") # Schema containing the OMOP Common Data Model tables
cdmVocabSchema <-
  Sys.getenv("OHDSI_VOCAB") # Schema containing the OMOP CDM vocabulary tables
cdmResultsSchema <-
  Sys.getenv("OHDSI_RESULTS") # Schema containing the "cohort" table (optional)
writeSchema <-
  Sys.getenv("OHDSI_WRITE") # Schema for temporary tables, which are deleted afterwards
writePrefix <- "c2t_" # Prefix for tables written to writeSchema

db <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = Sys.getenv("DB_NAME"),
  host = Sys.getenv("DB_HOST"),
  user = user,
  password = pw,
  port = port
)

cdm <- CDMConnector::cdm_from_con(
  con = db,
  cdm_schema = cdmSchema,
  achilles_schema = cdmResultsSchema,
  write_schema = c(schema = writeSchema, prefix = writePrefix)
)

For this example, we use the synthetic Eunomia "GiBleed" database instead.

db <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomia_dir("GiBleed"))

# The synthetic Eunomia database has no predefined cohorts, so we create a dummy
# cohort table: cohort 1 is the target cohort, cohorts 2 and 3 are state cohorts
cohorts <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123),
  cohort_definition_id = c(1, 2, 3, 2, 1, 2, 3),
  cohort_start_date = as.Date(c("1970-01-01", "1970-02-18", "1970-03-02", "1970-04-01",
                                "1968-01-05", "1968-02-01", "1968-03-12")),
  cohort_end_date = as.Date(c("1971-01-01", "1970-02-28", "1970-03-05", "1970-04-15",
                              "1969-01-01", "1968-02-04", "1968-03-19"))
)

cdm <- CDMConnector::cdm_from_con(
  con = db,
  cdm_name = "eunomia",
  cdm_schema = "main",
  write_schema = "main"
)
cdm <- omopgenerics::insertTable(cdm, name = "cohort", table = cohorts)
cdm$cohort <- omopgenerics::newCohortTable(cdm$cohort)
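Because the state cohorts populate the observation time of the target cohort, a quick sanity check on the dummy table is that every state record (cohorts 2 and 3) falls inside the same subject's target record (cohort 1). The sketch below runs on the local data frame, recreated here so it runs on its own, rather than on the database. `statesWithinTarget()` is a hypothetical helper, and it assumes at most one target record per subject, as in this example.

```r
# Dummy cohort table from the example: cohort 1 is the target, 2 and 3 are states
cohorts <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123),
  cohort_definition_id = c(1, 2, 3, 2, 1, 2, 3),
  cohort_start_date = as.Date(c("1970-01-01", "1970-02-18", "1970-03-02", "1970-04-01",
                                "1968-01-05", "1968-02-01", "1968-03-12")),
  cohort_end_date = as.Date(c("1971-01-01", "1970-02-28", "1970-03-05", "1970-04-15",
                              "1969-01-01", "1968-02-04", "1968-03-19"))
)

# Hypothetical helper: TRUE for each state record lying within the subject's
# target record. match() uses the first target record found for each subject.
statesWithinTarget <- function(cohorts, targetId, stateIds) {
  target <- cohorts[cohorts$cohort_definition_id == targetId, ]
  states <- cohorts[cohorts$cohort_definition_id %in% stateIds, ]
  idx <- match(states$subject_id, target$subject_id)
  # Subjects without a target record (idx is NA) are reported as FALSE
  !is.na(idx) &
    states$cohort_start_date >= target$cohort_start_date[idx] &
    states$cohort_end_date <= target$cohort_end_date[idx]
}

inside <- statesWithinTarget(cohorts, targetId = 1, stateIds = c(2, 3))
all(inside)  # TRUE for this dummy table
```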

Study configuration

studyEnv <- cohort2TrajectoryConfiguration(
  baseUrl = NULL,
  studyName = "SampleCohort2Trajectory",
  pathToStudy = getwd(),
  atlasTargetCohort = 1, # The id of the target cohort
  atlasStateCohorts = c(2, 3), # The ids of the state cohorts
  stateCohortLabels = c("test_state1", "test_state2"), # Labels, in the same order as the ids
  stateCohortMandatory = c("test_state2"),
  stateCohortAbsorbing = c("test_state2"),
  outOfCohortAllowed = FALSE,
  trajectoryType = "Discrete",
  lengthOfStay = 30, # Length of each discrete period, in days
  stateSelectionType = "Priority",
  stateCohortPriorityOrder = c("test_state1", "test_state2"),
  runSavedStudy = FALSE,
  useCDM = TRUE,
  batchSize = 10
)

A warning is expected here because the example study already exists:

Warning message:
Study name already in use, consider renaming!
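With `trajectoryType = "Discrete"`, `lengthOfStay = 30` and `stateSelectionType = "Priority"`, the results at the end of this page consist of 30-day periods, each labelled with a single state. The base-R sketch below illustrates that idea; it is not the package's implementation. `assignDiscreteStates()` is a hypothetical helper, and it assumes that a state counts as present in a period if any of its records overlaps the period, and that the first label in `stateCohortPriorityOrder` has the highest priority (which agrees with the example output). The start date is taken from the results table rather than derived, and absorbing and mandatory states, START/EXIT rows and batching are not modelled.

```r
# Hypothetical helper (not part of Cohort2Trajectory): split the time from
# `start` into `nPeriods` windows of `lengthOfStay` days and label each window
# with the highest-priority state that overlaps it (NA if none does).
assignDiscreteStates <- function(states, start, lengthOfStay, nPeriods, priorityOrder) {
  windowStart <- start + (seq_len(nPeriods) - 1) * lengthOfStay
  windowEnd <- windowStart + lengthOfStay
  labels <- vapply(seq_len(nPeriods), function(i) {
    # A state record overlaps the window [windowStart, windowEnd)
    overlaps <- states$cohort_start_date < windowEnd[i] &
      states$cohort_end_date >= windowStart[i]
    present <- unique(states$state_label[overlaps])
    # Earlier entries in priorityOrder win; indexing an empty vector gives NA
    priorityOrder[priorityOrder %in% present][1]
  }, character(1))
  data.frame(state_label = labels,
             state_start_date = windowStart,
             state_end_date = windowEnd)
}

# State records of subject 6 from the dummy cohort table
# (cohort 2 = test_state1, cohort 3 = test_state2)
states6 <- data.frame(
  state_label = c("test_state1", "test_state2", "test_state1"),
  cohort_start_date = as.Date(c("1970-02-18", "1970-03-02", "1970-04-01")),
  cohort_end_date = as.Date(c("1970-02-28", "1970-03-05", "1970-04-15"))
)

assignDiscreteStates(states6, start = as.Date("1970-02-18"), lengthOfStay = 30,
                     nPeriods = 2, priorityOrder = c("test_state1", "test_state2"))
```

For subject 6 both windows (1970-02-18 to 1970-03-20 and 1970-03-20 to 1970-04-19) are labelled test_state1, matching the two test_state1 rows in the results, even though test_state2 also occurs in the first window.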

Importing data

To import and preprocess the data, the following function is used:

getDataForStudy(cdm = cdm, studyEnv = studyEnv)

The cleaned imported data are saved in the SampleCohort2Trajectory/Data folder. Expected output:

> getDataForStudy(cdm = cdm,studyEnv = studyEnv)
✔ Importing data ... [64ms]
✔ Get cohort data success!
✔ Data cleaning completed!
ℹ Saved cleaned data /Git/Cohort2Trajectory/SampleCohort2Trajectory/Data/importedDataCleaned_1.csv
✔ Cleaning data ... [170ms]

Creating trajectories

To create trajectories, the following function is used:

createTrajectories(cdm = cdm, runSavedStudy = FALSE, studyEnv = studyEnv)

Expected output:

ℹ Creating batch 1!!!
ℹ Saved trajectory dataframe: /Git/Cohort2Trajectory/SampleCohort2Trajectory/Data/patientDataDiscrete.csv
ℹ Saving trajectories to the specified temp schema ...
ℹ Trajectories saved to the specified temp schema!
ℹ Saved settings to: /Git/Cohort2Trajectory/SampleCohort2Trajectory/Settings/trajectorySettings.csv
✔ Trajectories generated!

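The created trajectories are saved as patientDataDiscrete.csv in the SampleCohort2Trajectory/Data folder, and the trajectory settings are saved in the SampleCohort2Trajectory/Settings folder.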

Results

Contents of patientDataDiscrete.csv
subject_id	state_label	state_start_date	state_end_date	time_in_cohort	seq_ordinal	gender_concept_id	age	state_id
6	START	1970-02-18	1970-02-18	0.0000000	1	8532	6.136	1
6	test_state1	1970-02-18	1970-03-20	0.0000000	1	8532	6.136	2
6	test_state1	1970-03-20	1970-04-19	0.0821355	2	8532	6.218	2
6	EXIT	1970-04-20	1970-04-20	0.0821355	1	8532	6.303	4
123	START	1968-02-01	1968-02-01	0.0000000	1	8507	17.807	1
123	test_state1	1968-02-01	1968-03-02	0.0000000	1	8507	17.807	2
123	test_state2	1968-03-02	1968-04-01	0.0821355	1	8507	17.889	3
123	EXIT	1968-04-02	1968-04-02	0.0821355	1	8507	17.974	4
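The saved file can be read back into R to check each patient's state sequence and the gaps between consecutive rows. In practice you would read it with something like `read.csv(file.path("SampleCohort2Trajectory", "Data", "patientDataDiscrete.csv"))` and convert the date columns with `as.Date()`; here the rows of the table above are typed in so the sketch runs on its own. `gapDays()` is a hypothetical helper, and the code assumes the rows are already ordered within each patient, as in the file.

```r
# Rows of patientDataDiscrete.csv shown above (only the columns used here)
trajectories <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123, 123),
  state_label = c("START", "test_state1", "test_state1", "EXIT",
                  "START", "test_state1", "test_state2", "EXIT"),
  state_start_date = as.Date(c("1970-02-18", "1970-02-18", "1970-03-20", "1970-04-20",
                               "1968-02-01", "1968-02-01", "1968-03-02", "1968-04-02")),
  state_end_date = as.Date(c("1970-02-18", "1970-03-20", "1970-04-19", "1970-04-20",
                             "1968-02-01", "1968-03-02", "1968-04-01", "1968-04-02"))
)

# State sequence per patient, named by subject_id
sequences <- vapply(split(trajectories$state_label, trajectories$subject_id),
                    paste, character(1), collapse = " -> ")

# Hypothetical helper: days between each row's start and the previous row's end
gapDays <- function(d) {
  as.numeric(d$state_start_date[-1] - d$state_end_date[-nrow(d)])
}
gaps <- lapply(split(trajectories, trajectories$subject_id), gapDays)

sequences
gaps
```

For both patients the gaps are 0, 0 and 1 day: consecutive states share a boundary date, and only the EXIT row starts one day after the last state ends.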

To disconnect:

DBI::dbDisconnect(db)