Data science practice 3
Q1. Compile the ICU cohort in Practice 2 Q8 from the PostgreSQL database mimiciv
.
Below is an outline of steps.
Q1.1
Connect to database mimiciv
. We are going to use username postgres
with password postgres
to access the mimiciv
database.
# Load configuration settings first
#
# Connect to the database using the configuration settings
(con <- dbConnect(RPostgreSQL::PostgreSQL(),
dbname = dbname,
user = user,
password = password))
Q1.2
List all schemas in the mimiciv
database.
dbGetQuery(con, "SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA")
List tables in the mimiciv
database:
dbListTables(con)
List schemas and tables in the mimiciv
database (bash command).
psql -U postgres -d mimiciv -c "\dt *."
Q1.3
Connect to the icustays table. Note how to use Id()
to specify the schema containing the table.
icustays_tble <- tbl(con, Id(schema = "mimic_icu", table = "icustays")) %>%
print(width = Inf)
Q1.4
Connect to the patients table.
patients_tble <- tbl(con, Id(schema = "mimic_core", table = "patients")) %>%
print(width = Inf)
Q1.5
Connect to the admissions table.
admissions_tble <- tbl(con, Id(schema = "mimic_core", table = "admissions")) %>%
print(width = Inf)
Q1.6
Connect to the mimic_labevents_icu
table.
labevents_tble <- tbl(con, Id(schema = "public",
table = "mimic_labevents_icu")) %>%
print(width = Inf)
Q1.7
Connect to mimic_chartevents_icu
table.
chartevents_tble <- tbl(con, Id(schema = "public",
table = "mimic_chartevents_icu")) %>%
print(width = Inf)
Q1.8
Put things together. Using one chain of pipes %>%
to perform following data wrangling steps: (i) keep only the first ICU stay of each unique patient, (ii) merge in admissions and patients tables, (iii) keep adults only (age at admission >= 18), (iv) merge in the labevents and chartevents tables, (v) display the SQL query, (vi) collect SQL query result into memory as a tibble, (vii) create an indicator for 30-day mortality, (viii) save the final tibble to an icu_cohort.rds
R data file in the mimiciv_shiny
folder.
# make a directory mimiciv_shiny
if (!dir.exists("mimiciv_shiny")) {
dir.create("mimiciv_shiny")
}
which(duplicated(labevents_tble %>%
select(subject_id, hadm_id) %>%
collect()) == TRUE) %>% length
which(duplicated(chartevents_tble %>%
select(subject_id, hadm_id) %>%
collect()) == TRUE) %>% length
After a quick check, we found that there are some patients have more than one record at a single time point (duplicated resords). Thus, we need to only keep one record
icustays_tble %>%
# keep only the first ICU stay of each unique patient
group_by(subject_id) %>%
filter(rank(intime) == 1) %>%
ungroup() %>%
# merge in admissions and patients tables
left_join(patients_tble, by = c("subject_id")) %>%
left_join(admissions_tble, by = c("subject_id", "hadm_id")) %>%
# keep adults only (age at admission >= 18)
mutate(age_at_adm = year(admittime) - anchor_year + anchor_age) %>%
filter(age_at_adm >= 18) %>%
# merge in the labevents and chartevents tables
left_join(labevents_tble, by = c("subject_id", "hadm_id")) %>%
left_join(chartevents_tble, by = c("subject_id", "hadm_id")) %>%
# display the SQL query
show_query() %>%
# collect SQL query result into memory as a tibble
collect() %>%
# delete duplicate row
group_by(subject_id) %>%
slice_head(n = 1) %>%
ungroup() %>%
# create an indicator for 30-day mortality
mutate(death_30 = ifelse((deathtime - admittime)/(60*24) <= 30,
"Yes", "No")) %>%
mutate_at(vars(death_30),
function(x){ifelse(is.na(x) == TRUE, "No", x)}) %>%
# save the final tibble
saveRDS(file = "./mimiciv_shiny/icu_cohort.rds")
Close database connection and clear workspace.
dbDisconnect(con)
rm(list = ls())
Q2. Shiny app
Develop a Shiny app for exploring the ICU cohort data created in Q1. The app should reside in the mimiciv_shiny
folder. The app should provide easy access to the graphical and numerical summaries of variables (demographics, lab measurements, vitals) in the ICU cohort.
solution: Please refer to: https://zianzhuang.shinyapps.io/mimiciv_shiny/