Getting started
Introduction
The Cohort2Trajectory package creates medical trajectories for patients. It accepts a target cohort of any size from your OMOP CDM instance and lets you provide state cohorts, which populate the observation time of the target cohort as trajectories. The package outputs patient-level trajectories without empty time intervals, described by your input state cohorts. These trajectories can later be used for further analysis, modeling and visualization.
Example with SampleCohort2Trajectory
Initiating the database connection
The package relies heavily on the OMOP CDM, so a database connection must be initiated. The CDMConnector package is used to establish the database connection for running Cohort2Trajectory. You can configure the connection either by reading credentials from a .Renviron file or by writing them explicitly in your script.
user <- Sys.getenv("DB_USERNAME")
pw <- Sys.getenv("DB_PASSWORD")
server <- stringr::str_c(Sys.getenv("DB_HOST"), "/", Sys.getenv("DB_NAME"))
port <- Sys.getenv("DB_PORT")
cdmSchema <-
Sys.getenv("OHDSI_CDM") # Schema which contains the OHDSI Common Data Model
cdmVocabSchema <-
Sys.getenv("OHDSI_VOCAB") # Schema which contains the OHDSI Common Data Model vocabulary tables.
cdmResultsSchema <-
Sys.getenv("OHDSI_RESULTS") # Schema which contains "cohort" table (is not mandatory)
writeSchema <-
Sys.getenv("OHDSI_WRITE") # Schema for temporary tables, will be deleted
writePrefix <- "c2t_"
# Open the PostgreSQL connection, reusing the credentials read above
db <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = Sys.getenv("DB_NAME"),
  host = Sys.getenv("DB_HOST"),
  user = user,
  password = pw,
  port = port
)
# Build the CDM reference on top of the connection
cdm <- CDMConnector::cdm_from_con(
  con = db,
  cdm_schema = cdmSchema,
  achilles_schema = cdmResultsSchema,
  write_schema = c(schema = writeSchema, prefix = writePrefix)
)
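Before connecting, it can help to confirm that the variables read from .Renviron are actually set, since `Sys.getenv()` silently returns an empty string for a missing name. The following is a minimal sketch in base R; `missingEnvVars` is a hypothetical helper (not part of Cohort2Trajectory or CDMConnector), and the list of required names is simply taken from the script above.

```r
# Example .Renviron entries (placeholder values, adjust to your setup):
#   DB_HOST=localhost
#   DB_PORT=5432

# Hypothetical helper: return the names of environment variables that are unset or empty.
missingEnvVars <- function(varNames) {
  values <- Sys.getenv(varNames, unset = "")
  varNames[!nzchar(values)]
}

# Variable names used by the connection script above
required <- c("DB_USERNAME", "DB_PASSWORD", "DB_HOST", "DB_NAME", "DB_PORT",
              "OHDSI_CDM", "OHDSI_WRITE")

unsetVars <- missingEnvVars(required)
if (length(unsetVars) > 0) {
  message("Unset environment variables: ", paste(unsetVars, collapse = ", "))
}
```

Running this check first makes a misconfigured .Renviron file visible before the database driver reports a less specific connection error.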
For the purpose of the example, let us use a synthetic database.
db <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomia_dir("GiBleed"))
# The synthetic Eunomia database does not have defined cohorts, so let us create a dummy table
cohorts <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123),
  cohort_definition_id = c(1, 2, 3, 2, 1, 2, 3),
  cohort_start_date = c(
    as.Date("1970-01-01"),
    as.Date("1970-02-18"),
    as.Date("1970-03-02"),
    as.Date("1970-04-01"),
    as.Date("1968-01-05"),
    as.Date("1968-02-01"),
    as.Date("1968-03-12")
  ),
  cohort_end_date = c(
    as.Date("1971-01-01"),
    as.Date("1970-02-28"),
    as.Date("1970-03-05"),
    as.Date("1970-04-15"),
    as.Date("1969-01-01"),
    as.Date("1968-02-04"),
    as.Date("1968-03-19")
  )
)
cdm <- CDMConnector::cdm_from_con(
con = db,
cdm_name = "eunomia",
cdm_schema = "main",
write_schema = "main"
)
cdm <- omopgenerics::insertTable(cdm, cohorts, name = "cohort")
cdm$cohort <- omopgenerics::newCohortTable(cdm$cohort)
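A hand-written cohort table is easy to get wrong, so a quick check before inserting it can save time. The sketch below rebuilds the same dummy table so that it runs on its own and verifies the four cohort columns, missing values and date order. `checkCohortTable` is a hypothetical helper written for this illustration, not a function of Cohort2Trajectory or omopgenerics.

```r
# Rebuild the dummy cohort table from the example so the check is self-contained
cohorts <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123),
  cohort_definition_id = c(1, 2, 3, 2, 1, 2, 3),
  cohort_start_date = as.Date(c("1970-01-01", "1970-02-18", "1970-03-02", "1970-04-01",
                                "1968-01-05", "1968-02-01", "1968-03-12")),
  cohort_end_date = as.Date(c("1971-01-01", "1970-02-28", "1970-03-05", "1970-04-15",
                              "1969-01-01", "1968-02-04", "1968-03-19"))
)

# Hypothetical helper: stop with an error if the table is malformed
checkCohortTable <- function(df) {
  requiredCols <- c("subject_id", "cohort_definition_id",
                    "cohort_start_date", "cohort_end_date")
  stopifnot(all(requiredCols %in% names(df)))                  # all columns present
  stopifnot(all(stats::complete.cases(df[requiredCols])))      # no missing values
  stopifnot(all(df$cohort_start_date <= df$cohort_end_date))   # start not after end
  invisible(TRUE)
}

checkCohortTable(cohorts)

# Number of rows per cohort id (1 is the target cohort, 2 and 3 are state cohorts here)
table(cohorts$cohort_definition_id)
```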
Study configuration
studyEnv <- cohort2TrajectoryConfiguration(
baseUrl = NULL,
studyName = "SampleCohort2Trajectory",
pathToStudy = getwd(),
atlasTargetCohort = 1, # The id of the target cohort
atlasStateCohorts = c(2, 3), # The ids of the state cohorts
stateCohortLabels = c("test_state1", "test_state2"),
stateCohortMandatory = c("test_state2"),
stateCohortAbsorbing = c("test_state2"),
outOfCohortAllowed = FALSE,
trajectoryType = "Discrete",
lengthOfStay = 30,
stateSelectionType = "Priority",
stateCohortPriorityOrder = c("test_state1", "test_state2"),
runSavedStudy = FALSE,
useCDM = TRUE,
batchSize = 10
)
A warning is expected as output because the study used as an example already exists:
Warning message:
Study name already in use, consider renaming!
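To make the `lengthOfStay = 30` setting more concrete, the sketch below works through the date arithmetic for fixed 30-day windows. It rests on an assumption read off the Results table in this document, where consecutive state rows start 30 days apart: that `lengthOfStay` is a window length in days for the "Discrete" trajectory type. It illustrates the arithmetic only and is not the package's internal algorithm.

```r
lengthOfStay <- 30  # days, as set in the study configuration

# First state cohort entry of subject 6 in the dummy cohort table
firstStateStart <- as.Date("1970-02-18")

# Boundaries of consecutive fixed-length windows: start, start + 30, start + 60
windowBounds <- firstStateStart + lengthOfStay * 0:2
print(windowBounds)
# "1970-02-18" "1970-03-20" "1970-04-19"

# Length of one window in years, assuming a 365.25-day year
windowYears <- round(lengthOfStay / 365.25, 7)
print(windowYears)
# 0.0821355
```

Under that assumption, the boundaries match the state_start_date and state_end_date values shown for subject 6, and the window length in years matches the time_in_cohort value of the second window.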
Importing data
To import and preprocess the data, the following function is used:
getDataForStudy(cdm = cdm, studyEnv = studyEnv)
The cleaned imported data is saved in the /SampleCohort2Trajectory/Data folder. Expected output:
> getDataForStudy(cdm = cdm, studyEnv = studyEnv)
✔ Importing data ... [64ms]
✔ Get cohort data success!
ℹ Saved cleaned data /Git/Cohort2Trajectory/SampleCohort2Trajectory/Data/importedDataCleaned_1.csv
✔ Data cleaning completed!
✔ Cleaning data ... [170ms]
Creating trajectories
To create trajectories, the following function is used:
createTrajectories(cdm = cdm, runSavedStudy = FALSE, studyEnv = studyEnv)
Expected output:
ℹ Creating batch 1!!!
ℹ Saved trajectory dataframe: /Git/Cohort2Trajectory/SampleCohort2Trajectory/Data/patientDataDiscrete.csv
ℹ Saving trajectories to the specified temp schema ...
ℹ Trajectories saved to the specified temp schema!
ℹ Saved settings to: /Git/Cohort2Trajectory/SampleCohort2Trajectory/Settings/trajectorySettings.csv
✔ Trajectories generated!
The created trajectories are in the /SampleCohort2Trajectory/Data folder.
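The saved trajectories can be read back into R for inspection. The sketch below assumes the file sits at pathToStudy/studyName/Data/patientDataDiscrete.csv, which is inferred from the log output above and the `pathToStudy = getwd()` setting. `readTrajectories` is a hypothetical helper, not a package function, and it returns `NULL` instead of failing when the file is absent.

```r
# Hypothetical helper: read a trajectory CSV if it exists, otherwise return NULL
readTrajectories <- function(path) {
  if (!file.exists(path)) {
    message("No trajectory file found at: ", path)
    return(NULL)
  }
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Assumed location, based on the study configuration and the log output
trajectoryFile <- file.path(getwd(), "SampleCohort2Trajectory", "Data",
                            "patientDataDiscrete.csv")

trajectories <- readTrajectories(trajectoryFile)
if (!is.null(trajectories)) {
  str(trajectories)  # column names and types of the saved trajectories
}
```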
Results
subject_id | state_label | state_start_date | state_end_date | time_in_cohort | seq_ordinal | gender_concept_id | age | state_id |
---|---|---|---|---|---|---|---|---|
6 | START | 1970-02-18 | 1970-02-18 | 0.0000000 | 1 | 8532 | 6.136 | 1 |
6 | test_state1 | 1970-02-18 | 1970-03-20 | 0.0000000 | 1 | 8532 | 6.136 | 2 |
6 | test_state1 | 1970-03-20 | 1970-04-19 | 0.0821355 | 2 | 8532 | 6.218 | 2 |
6 | EXIT | 1970-04-20 | 1970-04-20 | 0.0821355 | 1 | 8532 | 6.303 | 4 |
123 | START | 1968-02-01 | 1968-02-01 | 0.0000000 | 1 | 8507 | 17.807 | 1 |
123 | test_state1 | 1968-02-01 | 1968-03-02 | 0.0000000 | 1 | 8507 | 17.807 | 2 |
123 | test_state2 | 1968-03-02 | 1968-04-01 | 0.0821355 | 1 | 8507 | 17.889 | 3 |
123 | EXIT | 1968-04-02 | 1968-04-02 | 0.0821355 | 1 | 8507 | 17.974 | 4 |
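As one example of the further analysis mentioned in the introduction, the state sequences in the table above can be turned into transition counts. This is a minimal sketch in base R that copies only the subject_id and state_label columns from the table and assumes the rows are in chronological order within each subject, as displayed. It is an illustration, not functionality provided by the package.

```r
# State sequences copied from the results table (other columns omitted)
trajectories <- data.frame(
  subject_id = c(6, 6, 6, 6, 123, 123, 123, 123),
  state_label = c("START", "test_state1", "test_state1", "EXIT",
                  "START", "test_state1", "test_state2", "EXIT"),
  stringsAsFactors = FALSE
)

# Pair each state with the next state of the same subject
perSubject <- split(trajectories, trajectories$subject_id)
transitions <- do.call(rbind, lapply(perSubject, function(d) {
  data.frame(
    from = head(d$state_label, -1),  # every state except the last
    to = tail(d$state_label, -1),    # every state except the first
    stringsAsFactors = FALSE
  )
}))

# Cross-tabulate transitions: rows are origin states, columns are destination states
transitionCounts <- table(transitions$from, transitions$to)
print(transitionCounts)
```

With the two example subjects this yields six transitions in total, two of which are START to test_state1.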
To disconnect:
DBI::dbDisconnect(db)