Q1. Linux Shell Commands

Q1.1

This exercise (and later ones in this course) use the MIMIC-IV data, a freely accessible critical care database developed by the MIT Lab for Computational Physiology. Follow the instructions at https://mimic-iv.mit.edu/docs/access/ to (1) complete the CITI Data or Specimens Only Research course and (2) obtain the PhysioNet credential for using the MIMIC-IV data. Display the verification links to your completion report and completion certificate here. (Hint: The CITI training takes a couple of hours and the PhysioNet credentialing takes a couple of days; do not leave it to the last minute.)

solution: The verification links to the completion report and completion certificate.

Q1.2

The /usr/203b-data/mimic-iv/ folder on the teaching server contains data sets from MIMIC-IV. Refer to https://mimic-iv.mit.edu/docs/datasets/ for details of the data files.

ls -l /usr/203b-data/mimic-iv

Please do not put these data files into Git; they are big. Do not copy them into your directory. Do not decompress the gz data files. Doing so creates unnecessarily large files on storage and is not a big-data-friendly practice. Just read from the data folder /usr/203b-data/mimic-iv directly in the following exercises.

Use Bash commands to answer the following questions.

solution: Done.

Q1.3

Display the contents in the folders core, hosp, icu. What are the functionalities of the bash commands zcat, zless, zmore, and zgrep?

solution:

ls -l /usr/203b-data/mimic-iv/core
ls -l /usr/203b-data/mimic-iv/hosp
ls -l /usr/203b-data/mimic-iv/icu

What each command does:

  • zcat: writes the decompressed contents of a compressed file to standard output; the file on disk stays compressed.
  • zmore: a filter that pages through compressed or plain text files one screenful at a time, using more.
  • zless: works the same way as zmore, except that the decompressed output is displayed by less, which also allows backward scrolling and searching.
  • zgrep: searches for a pattern in a file, whether or not it is compressed, by running grep on the decompressed contents.
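
A minimal sketch of these commands in action, assuming a Linux system with gzip installed. The file demo.csv.gz and its contents are made up for illustration and are not part of MIMIC-IV; zless and zmore are interactive pagers, so they appear only as comments.

```bash
# Create a small compressed file in a temporary directory (illustrative data only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'id,type\n1,URGENT\n2,ELECTIVE\n3,URGENT\n' | gzip > "$tmpdir/demo.csv.gz"

# zcat: write the decompressed contents to standard output; the .gz file is unchanged.
first_line=$(zcat "$tmpdir/demo.csv.gz" | head -n 1)
echo "header: $first_line"

# zgrep: search inside the compressed file; -c counts the matching lines.
urgent_count=$(zgrep -c 'URGENT' "$tmpdir/demo.csv.gz")
echo "lines containing URGENT: $urgent_count"

# zless / zmore: page through the file interactively (press q to quit).
# zless "$tmpdir/demo.csv.gz"
# zmore "$tmpdir/demo.csv.gz"
```

In every case the compressed file is read as a stream, so no decompressed copy is written to disk.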

Q1.4

What’s the output of the following bash script?

```bash
for datafile in /usr/203b-data/mimic-iv/core/*.gz
  do
    ls -l $datafile
  done
```

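solution: The script loops over every .gz file in the core folder and prints its long-format listing (permissions, owner, size, modification time, and path), one file per line.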

Display the number of lines in each data file using a similar loop.

solution:

for datafile in /usr/203b-data/mimic-iv/core/*.gz
  do
    ls -l "$datafile"
    echo "the number of lines:"
    zcat "$datafile" | awk 'END { print NR }'
  done
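
The same count can also be obtained with wc -l instead of awk. The sketch below, assuming gzip on Linux, runs the loop over two made-up files in a temporary directory (one.csv.gz and two.csv.gz are hypothetical names), so it can be tried without access to the teaching server.

```bash
# Build two tiny compressed files to loop over (illustrative data only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'a\nb\nc\n' | gzip > "$tmpdir/one.csv.gz"
printf 'x\ny\n' | gzip > "$tmpdir/two.csv.gz"

total=0
for datafile in "$tmpdir"/*.gz
do
  # Arithmetic expansion strips any padding that wc may print around the number.
  n=$(( $(zcat "$datafile" | wc -l) ))
  echo "$(basename "$datafile"): $n lines"
  total=$(( total + n ))
done
echo "total: $total lines"
```

Both approaches count the header line of a CSV file as one line.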

Q1.5

Display the first few lines of admissions.csv.gz. How many rows are in this data file? How many unique patients (identified by subject_id) are in this data file? What are the possible values taken by each of the variable admission_type, admission_location, insurance, language, marital_status, and ethnicity? Also report the count for each unique value of these variables. (Hint: combine Linux commands zcat, head/tail, awk, uniq, wc, and so on.)

solution:

# First five lines: the header plus four data rows
zcat /usr/203b-data/mimic-iv/core/admissions.csv.gz | head -n 5
echo "the number of rows (header line included):"
zcat /usr/203b-data/mimic-iv/core/admissions.csv.gz |
awk 'END { print NR }'
echo "the number of unique patients: (colname row excluded)"
# Drop the header before sorting; after sort it is no longer the first line
zcat /usr/203b-data/mimic-iv/core/admissions.csv.gz |
tail -n +2 | awk -F ',' '{ print $1 }' | sort | uniq |
awk 'END { print NR }'
# Columns 6, 7, 9, 10, 11, 12: admission_type, admission_location,
# insurance, language, marital_status, ethnicity
for i in 6 7 9 10 11 12
do
  echo "---------------------------"
  # Print the column name taken from the header line
  zcat /usr/203b-data/mimic-iv/core/admissions.csv.gz |
  awk -F ',' -v i="$i" 'NR == 1 { printf "%-19s~%-20s\n", $i,
  "(count & values (* NULL/NA included))"; exit }'
  # Count each distinct value, skipping the header line
  zcat /usr/203b-data/mimic-iv/core/admissions.csv.gz |
  awk -F ',' -v i="$i" 'NR > 1 { print $i }' | sort | uniq -c
done
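
As a cross-check of the counting logic, here is a small sketch on made-up data, assuming gzip and awk on Linux. The file sample.csv.gz and its three columns are hypothetical and only mimic the layout of admissions.csv.gz. It also tallies values with an awk associative array, an alternative to sort | uniq -c that reads the data once.

```bash
# Build a tiny compressed CSV with a header line (illustrative data only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat <<'EOF' | gzip > "$tmpdir/sample.csv.gz"
subject_id,admission_type,insurance
10,URGENT,Medicare
10,ELECTIVE,Medicare
11,URGENT,Other
12,URGENT,
EOF

# Number of data rows: skip the header with tail -n +2, then count lines.
n_rows=$(zcat "$tmpdir/sample.csv.gz" | tail -n +2 | awk 'END { print NR }')

# Number of unique patients: distinct values of column 1, header excluded.
n_patients=$(zcat "$tmpdir/sample.csv.gz" | tail -n +2 |
  awk -F ',' '{ print $1 }' | sort -u | awk 'END { print NR }')

# Frequency of each admission_type (column 2), tallied in an awk array.
# awk visits array keys in no fixed order, so the result is sorted by value name.
type_counts=$(zcat "$tmpdir/sample.csv.gz" |
  awk -F ',' 'NR > 1 { count[$2]++ } END { for (v in count) print count[v], v }' |
  sort -k2)

echo "rows: $n_rows, unique patients: $n_patients"
echo "$type_counts"
```

Splitting on commas is only safe when no field contains a quoted comma; that is an assumption of this sketch as well as of the loop above.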

Q3. More fun with Linux

Try these commands in Bash and interpret the results: cal, cal 2021, cal 9 1752 (anything unusual?), date, hostname, arch, uname -a, uptime, who am i, who, w, id, last | head, echo {con,pre}{sent,fer}{s,ed}, time sleep 5, history | tail.

solution:

cal

cal displays the calendar of the current month.

cal 2021

cal 2021 displays the calendar for all twelve months of 2021.

cal 9 1752

cal 9 1752 displays an apparently incomplete calendar for September 1752: 2 September is followed directly by 14 September. Reason: the Kingdom of Great Britain adopted the Gregorian calendar in September 1752, and the 11 days from 3 to 13 September were dropped to make the adjustment. [wiki]

date

date prints the current date and time in the system's default time zone.

hostname

hostname prints the host name of the machine, here the teaching server.

arch

arch prints the machine's hardware architecture (the same value as uname -m).

uname -a

uname -a prints all available system information: the kernel name, network host name, kernel release and version, machine hardware name, and operating system.

uptime

uptime returns information about how long your system has been running together with the current time, number of users with running sessions, and the system load averages for the past 1, 5, and 15 minutes.

who am i

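who am i prints the who entry for the current terminal only: the user name, the terminal, the login time, and the host the session came from (equivalent to who -m). To print just the user name, use whoami.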

who

who displays account information: user login name, user’s terminal, time of login as well as the host the user is logged in from.

w

w displays information about currently logged in users and what each user is doing.

id

id prints the real and effective user ID (UID) and group ID (GID) of the current user, together with the groups the user belongs to.

last | head

last | head displays the 10 most recent login records. last reads the login history from /var/log/wtmp and lists sessions newest first; head keeps the first 10 lines of that output.

echo {con,pre}{sent,fer}{s,ed}

echo {con,pre}{sent,fer}{s,ed} demonstrates brace expansion: Bash builds every string that takes one element from each brace group, in left-to-right order, giving 2 × 2 × 2 = 8 words: consents consented confers confered presents presented prefers prefered.
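
To check the number of generated words, the short sketch below captures the expansion and counts it. Brace expansion is a Bash feature rather than part of POSIX sh, so the command is passed to bash explicitly; the variable names expanded and n_words are chosen here for illustration.

```bash
# Run the expansion under bash so it works even if the calling shell is plain sh.
expanded=$(bash -c 'echo {con,pre}{sent,fer}{s,ed}')
echo "$expanded"

# Count the words: print one per line (unquoted on purpose, to split on spaces).
n_words=$(printf '%s\n' $expanded | awk 'END { print NR }')
echo "number of words: $n_words"
```

The rightmost brace group varies fastest, which is why the two endings s and ed alternate in the output.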

time sleep 5

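time sleep 5 runs sleep 5, which pauses for 5 seconds, and then reports how long the command took: the real (wall-clock) time is about 5 seconds, while the user and sys CPU times are close to zero because the process is idle while it sleeps.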

set -o history
echo "zza"
history | tail

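history | tail shows the last 10 commands in the shell's history list. set -o history turns on history recording first, because it is off by default in non-interactive shells.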