Discover n-grams in data extracted by the CohortContrast function
Source: R/phase_mining.R
Discovers n-grams in the data extracted by the CohortContrast function.
Value
A data frame of enriched n-grams, or NULL if no significant n-grams are found. The returned table contains one row per detected n-gram with cluster assignment, concept identifiers and names, observed and expected counts, over-representation statistics, person counts, timing summaries, and p-values used to flag significant patterns.
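Because the return value may be either a table or NULL, downstream code usually needs to check for both. The following is a minimal sketch of that pattern, not part of the package: `summariseNgrams` is a hypothetical helper, the toy data frame contains invented values that only mimic a few of the returned columns, and it assumes that `TRUE` in `observed_test` and `persons_test` marks an n-gram that passed the corresponding test.

```r
# Toy stand-in for the table returned by nGramDiscovery(); all values are invented.
ngrams <- data.frame(
  ngram_label          = c("A + B", "C + D", "E + F"),
  ngram_cluster        = c(1L, 1L, 2L),
  observed_ngram_count = c(156L, 40L, 322L),
  observed_test        = c(TRUE, FALSE, TRUE),
  persons_test         = c(TRUE, FALSE, FALSE),
  stringsAsFactors     = FALSE
)

# Hypothetical helper: keep n-grams flagged by both tests, most frequent first.
summariseNgrams <- function(ngrams) {
  # nGramDiscovery() is documented to return NULL when nothing is significant.
  if (is.null(ngrams)) {
    message("No significant n-grams found.")
    return(invisible(NULL))
  }
  # Assumption: TRUE means the n-gram passed the test; NA is treated as not passed.
  keep <- ngrams$observed_test %in% TRUE & ngrams$persons_test %in% TRUE
  flagged <- ngrams[keep, , drop = FALSE]
  flagged <- flagged[order(-flagged$observed_ngram_count), , drop = FALSE]
  flagged[, c("ngram_label", "ngram_cluster", "observed_ngram_count")]
}

flagged <- summariseNgrams(ngrams)
print(flagged)
```

With a real result, the same helper could be called as `summariseNgrams(nGramDiscovery(study))`; requiring both flags is a deliberately conservative choice made for this illustration only.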
Examples
# \donttest{
if (requireNamespace("nanoparquet", quietly = TRUE) &&
    requireNamespace("Matrix", quietly = TRUE) &&
    requireNamespace("vegan", quietly = TRUE) &&
    requireNamespace("cluster", quietly = TRUE)) {
  studyDir <- system.file("example", "st", package = "CohortContrast")
  study <- loadCohortContrastStudy("lc500", pathToResults = studyDir)
  ngrams <- nGramDiscovery(study)
  head(ngrams)
}
#> ! Running n-gram discovery...
#> ! Running n = 2
#> ! Running n = 3
#> ! Running n = 4
#> ! Running n = 5
#> ✔ Running n-gram discovery.
#> ! Running n-gram clustering...
#> ✔ Running n-gram clustering.
#> # A tibble: 6 × 15
#>   ngram_label           ngram_cluster ngram_ids ngram_names observed_ngram_count
#>   <chr>                         <int> <list>    <list>                     <int>
#> 1 Death + Death Certif…             1 <chr [2]> <chr [2]>                    156
#> 2 Death + Palliative c…             1 <chr [2]> <chr [2]>                    185
#> 3 Death + Death                     1 <chr [2]> <chr [2]>                    186
#> 4 Radiotherapy + Chemo…             2 <chr [2]> <chr [2]>                    322
#> 5 Bronchoscopy + Needl…             3 <chr [2]> <chr [2]>                    228
#> 6 Malignant tumor of l…             3 <chr [2]> <chr [2]>                    276
#> # ℹ 10 more variables: expected_ngram_count <dbl>, overrep_ratio <dbl>,
#> #   n_unique_persons <int>, unique_persons_ids <list>, ngram_time_diff <dbl>,
#> #   ngram_time <dbl>, observed_count_p_value <dbl>, observed_test <lgl>,
#> #   persons_count_p_value <dbl>, persons_test <lgl>
# }