\(MOSAIC_{data}\)

Block 0: Introduction to R, Part 3

Oliver Nakoinz, Lizzie Scholtus, Néhémie Strupler

2022-07-11
License: CC BY-SA 4.0

Problem

plot(test_df[, 2:3],
     main  = "Testplot",
     lines(test_df[, 2:3])
points(test_df[1:3, 7:3],
       pch = 17,
       col = "redd",
       cex = 3)
text(test_df[, 2:3],
       labels = test_df[, 1],
       pos    = 4
       cex    = 1)

Answer

plot(test_df[, 2:3],
     main  = "Testplot")
lines(test_df[, 2:3])
points(test_df[1:3, 2:3],
       pch = 17,
       col = "red",
       cex = 3)
text(test_df[, 2:3],
       labels = test_df[, 1],
       pos    = 4,
       cex    = 1)

What is control flow?

Control flow

For

For loops repeat code a fixed number of times and use an internal iteration variable.

for (i in 1:5){
    print(paste("Loop no.: ", i))
}
## [1] "Loop no.:  1"
## [1] "Loop no.:  2"
## [1] "Loop no.:  3"
## [1] "Loop no.:  4"
## [1] "Loop no.:  5"
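
A for loop can also walk over the elements of an existing vector. The following is a minimal sketch; the vector heights and the result vector doubled are hypothetical names with invented values, used only for illustration.

```r
# Hypothetical example data (values invented for this sketch)
heights <- c(5.0, 6.0, 8.2)

# Pre-allocate the result vector, then fill it element by element
doubled <- numeric(length(heights))
for (i in seq_along(heights)) {
    doubled[i] <- heights[i] * 2
}
print(doubled)
## [1] 10.0 12.0 16.4
```

seq_along(heights) is used instead of 1:length(heights) because it also behaves correctly for an empty vector.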

While

While loops repeat code as long as a condition holds. Iteration variables, if required, have to be set up and updated manually.

Analyze Code!

i <- 1
while(i < 5){
    print(paste("Loop no.: ", i))
    i <- i + 1
}
## [1] "Loop no.:  1"
## [1] "Loop no.:  2"
## [1] "Loop no.:  3"
## [1] "Loop no.:  4"

While

The iteration variable and the variable used in the condition need not be the same.

i <- 1
a <- 0
while(a < 5000){
    a <- 2^i
    print(paste("Loop no.: ", i))
    i <- i + 1
}
## [1] "Loop no.:  1"
## [1] "Loop no.:  2"
## [1] "Loop no.:  3"
## [1] "Loop no.:  4"
## [1] "Loop no.:  5"
## [1] "Loop no.:  6"
## [1] "Loop no.:  7"
## [1] "Loop no.:  8"
## [1] "Loop no.:  9"
## [1] "Loop no.:  10"
## [1] "Loop no.:  11"
## [1] "Loop no.:  12"
## [1] "Loop no.:  13"
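
Because a while loop only stops when its condition becomes FALSE, a safety cap is a common precaution. The sketch below counts doublings until a value exceeds a limit; limit, max_iter, value and n are hypothetical names chosen for this illustration.

```r
# Count how many doublings are needed before 'value' exceeds 'limit'.
# All names and numbers here are assumptions made for this sketch.
limit    <- 5000
max_iter <- 100      # safety cap so the loop cannot run forever
value    <- 1
n        <- 0
while (value <= limit && n < max_iter) {
    value <- value * 2
    n     <- n + 1
}
print(n)
## [1] 13
```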

If

If allows for conditional code. The condition must evaluate to a single logical value and can use comparison operators: ==, !=, <, <=, >, >=

Terms can be combined with and (&) and or (|); inside if, the scalar forms && and || are usually preferred.

if(3 == 4){print("abc")}
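
A short sketch of combined conditions, assuming two invented measurements height and width: && and || compare single values and suit if, while & and | work element-wise on whole vectors.

```r
# Hypothetical measurements of a single object (values invented)
height <- 5.2
width  <- 17.0

# && combines two single logical values
if (height > 5 && width < 20) {
    print("tall and narrow")
}
## [1] "tall and narrow"

# & compares element by element and returns a logical vector
heights  <- c(5.0, 6.0, 8.2)
in_range <- heights > 5 & heights < 8
print(in_range)
## [1] FALSE  TRUE FALSE
```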

Repeat

Repeat repeats code until we leave the loop with break; next skips the rest of the current iteration.

i <- 1
repeat{
    print(paste("Loop no.: ", i))
    i <- i + 1
    if(i < 5){next}                # skip the value output for small i
    print(paste("Value: ", 2^i))
    if(i > 10){break}              # leave the loop
}
## [1] "Loop no.:  1"
## [1] "Loop no.:  2"
## [1] "Loop no.:  3"
## [1] "Loop no.:  4"
## [1] "Value:  32"
## [1] "Loop no.:  5"
## [1] "Value:  64"
## [1] "Loop no.:  6"
## [1] "Value:  128"
## [1] "Loop no.:  7"
## [1] "Value:  256"
## [1] "Loop no.:  8"
## [1] "Value:  512"
## [1] "Loop no.:  9"
## [1] "Value:  1024"
## [1] "Loop no.:  10"
## [1] "Value:  2048"

If Else

If Else tests a condition. If the condition is TRUE, code 1 is executed; if it is FALSE, code 2 is executed.

if(3 == 4){
    print("abc")
} else {
    print("deff")
}
## [1] "deff"
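
Several alternatives can be chained with else if. The sketch below uses arbitrary thresholds and hypothetical names (h, size_class, labels); ifelse is the vectorised counterpart for whole vectors.

```r
# Classify a single value; the thresholds 6 and 8 are arbitrary
h <- 6.0
if (h < 6) {
    size_class <- "low"
} else if (h < 8) {
    size_class <- "medium"
} else {
    size_class <- "high"
}
print(size_class)
## [1] "medium"

# ifelse() applies the test to every element of a vector
labels <- ifelse(c(5.0, 6.0, 8.2) > 5.5, "above", "below")
print(labels)
## [1] "below" "above" "above"
```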

Conditions

Write your own functions

Functions are containers, shortcuts, or names for pieces of code. Values can be passed to a function as arguments.

Analyze Code!

add <- function(a, b){
    c <- a + b
    return(c)
}
add(3, 5)
## [1] 8
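
Arguments can have default values, and a function can check its input. This is a minimal sketch; scale_to is a hypothetical helper invented for this illustration.

```r
# 'scale_to' is a hypothetical helper; 'factor' defaults to 2
scale_to <- function(x, factor = 2) {
    stopifnot(is.numeric(x))   # fail early on non-numeric input
    x * factor                 # the last evaluated expression is returned
}
scale_to(3)
## [1] 6
scale_to(3, factor = 10)
## [1] 30
```

An explicit return() as in add above is equivalent here; it is mainly needed to leave a function early.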

Vector-based methods: using vectors

x <- c(2, 4, 1, 5)
sqrt(x)
## [1] 1.414214 2.000000 1.000000 2.236068
mean(x)
## [1] 3
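
To make the contrast with loops concrete, the following sketch squares the same vector x once with a for loop and once with a vectorised operator; squares_loop and squares_vec are hypothetical names.

```r
x <- c(2, 4, 1, 5)

# Loop version: square every element one by one
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
    squares_loop[i] <- x[i]^2
}

# Vectorised version: the operator is applied to all elements at once
squares_vec <- x^2

identical(squares_loop, squares_vec)
## [1] TRUE
```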

Vector-based methods: apply and co.

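apply is a function that runs another function on every row or column of a matrix or data frame. apply is often more concise than an explicit loop, but not necessarily faster; dedicated vectorised functions such as rowMeans and colMeans are usually the fastest option.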

Analyze Code!

df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(5, 4, 3, 2, 1),
                 c = c(3, 5, 3, 7, 5))
apply(df,
      1,         # 1 for row; or 2 for column indexing
      mean)
## [1] 3.000000 3.666667 3.000000 4.333333 3.666667
apply(df,
      2,
      mean)
##   a   b   c 
## 3.0 3.0 4.6
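
The same summaries can be obtained in other ways. The sketch below compares apply with sapply and with the base R functions colMeans and rowMeans on the data frame from above; the result names are hypothetical.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(5, 4, 3, 2, 1),
                 c = c(3, 5, 3, 7, 5))

col_means_apply  <- apply(df, 2, mean)   # column-wise via apply
col_means_sapply <- sapply(df, mean)     # a data frame is a list of columns
col_means_fast   <- colMeans(df)         # dedicated function

row_means_apply  <- apply(df, 1, mean)   # row-wise via apply
row_means_fast   <- rowMeans(df)         # dedicated function

all.equal(col_means_apply, col_means_fast)
## [1] TRUE
```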

Which approach do you prefer?

Tidyverse

The tidyverse is a philosophy and a style of data science within the R ecosystem, initiated by Hadley Wickham (now at RStudio). It comprises a set of R packages, some of which are bundled in the meta-package tidyverse. The following slides are mainly based on Wickham/Grolemund (2017): http://r4ds.had.co.nz/.

Tidyverse

library("tidyverse")

Tidy data

Readr

readr (Wickham/Hester 2021) reads data into tibbles.

library(readr)
ceramics <- readr::read_delim("../2data/22archdata/keramik.csv",
    delim = ";",            # columns are separated by semicolons
    escape_double = FALSE,
    trim_ws = TRUE)         # strip whitespace around fields
head(ceramics)
id  Hoehe  Breite  Rechts  Hoch  Grab        Dat    Motive           Beifunde_ker_etc                Beifund_metall                           gemeinsamMit
1   5.0    17.0    324     7209  Brandgrab   Lt B   1,0,0,0,1,0,1,0  Flasche H23                     NA                                       NA
2   5.0    18.2    316     7209  Körpergrab  Lt C1  0,1,0,1,0,0,1,1  Schale S, Flasche S, Flasche S  Hiebmesser, Bronzeanhänger               NA
3   6.0    19.1    NA      NA    NA          NA     1,1,0,0,0,0,0,0  NA                              NA                                       2
4   5.2    17.0    394     7187  Körpergrab  Lt B   1,0,0,0,0,0,0,1  Flasche H50.2, Schale, Quarzit  Pfeilspitze                              NA
5   8.2    17.4    350     7186  Körpergrab  LtB/C  0,0,0,1,0,0,0,0  Flasche H29, Schüssel, Schale   Knotenring, Knotenring, Armring, Messer  NA
6   8.0    19.5    316     7209  Körpergrab  Lt C1  0,1,0,1,0,0,0,1  Flasche S H50.5                 NA                                       NA
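
For trying out read_delim without the course data, a small semicolon-separated file can be written to a temporary location first. This is only a sketch: tmp and toy are hypothetical names, and the three columns repeat the first two rows of the output above.

```r
library(readr)

# Write a tiny semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;Hoehe;Breite",
             "1;5.0;17.0",
             "2;5.0;18.2"),
           tmp)

# Read it back; the result is a tibble
toy <- readr::read_delim(tmp,
                         delim   = ";",
                         trim_ws = TRUE)
class(toy)
```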

Tibbles

Pipe & Dplyr

The pipe %>% (Bache/Wickham 2020) forwards the output of one function to the next function, where it is used as the first argument.

library(magrittr)
ceramics %>%
    dplyr::filter(Breite < 20) %>%
    dplyr::select(id,
                  Hoehe,
                  Breite,
                  Motive) -> ceramics2
head(ceramics2)
id  Hoehe  Breite  Motive
1   5.0    17.0    1,0,0,0,1,0,1,0
2   5.0    18.2    0,1,0,1,0,0,1,1
3   6.0    19.1    1,1,0,0,0,0,0,0
4   5.2    17.0    1,0,0,0,0,0,0,1
5   8.2    17.4    0,0,0,1,0,0,0,0
6   8.0    19.5    0,1,0,1,0,0,0,1
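
What the pipe does can be seen on a small numeric example. The sketch below writes the same calculation once as nested calls and once as a pipeline; nested and piped are hypothetical names.

```r
library(magrittr)

# Nested calls are read from the inside out
nested <- sum(sqrt(c(4, 9, 16)))

# The pipeline is read from left to right: take the vector,
# then take square roots, then sum them
piped <- c(4, 9, 16) %>%
    sqrt() %>%
    sum()

print(piped)
## [1] 9
identical(nested, piped)
## [1] TRUE
```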

Pipe & Dplyr (old approach)

library(magrittr)
ceramics2 <- ceramics %>%
    dplyr::filter(Breite < 20) %>%
    dplyr::select(id,
                  Hoehe,
                  Breite,
                  Motive)

Pipe & Dplyr

ceramics %>%
    dplyr::mutate(g = Hoehe > 10) %>%
    dplyr::rename(Groesse = g) %>%
    dplyr::group_by(Groesse) %>%
    dplyr::summarise(m = mean(Breite))
Groesse  m
FALSE    18.03333
TRUE     25.31667
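
The same chain of verbs can be tried on a small data frame. This sketch uses the Hoehe and Breite values of the six rows printed earlier; toy, tall, mean_width and the threshold 6 are assumptions made for this illustration.

```r
library(dplyr)

# Six rows copied from the head(ceramics) output, two columns only
toy <- data.frame(Hoehe  = c(5.0, 5.0, 6.0, 5.2, 8.2, 8.0),
                  Breite = c(17.0, 18.2, 19.1, 17.0, 17.4, 19.5))

toy_summary <- toy %>%
    dplyr::mutate(tall = Hoehe > 6) %>%          # add a logical column
    dplyr::group_by(tall) %>%                    # one group per value of tall
    dplyr::summarise(n          = dplyr::n(),    # rows per group
                     mean_width = mean(Breite))  # mean width per group
toy_summary
```

summarise returns one row per group, here one for tall == FALSE and one for tall == TRUE.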

Tidyr

tidyr (Wickham 2021) tidies messy data, for example by splitting one column into several.

library(tidyr)
ceramics %>%
    tidyr::separate(Motive,
                    into = c("m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"),
                    sep = ",") -> ceramics3
head(ceramics3)
id  Hoehe  Breite  Rechts  Hoch  Grab        Dat    m1  m2  m3  m4  m5  m6  m7  m8  Beifunde_ker_etc                Beifund_metall                           gemeinsamMit
1   5.0    17.0    324     7209  Brandgrab   Lt B   1   0   0   0   1   0   1   0   Flasche H23                     NA                                       NA
2   5.0    18.2    316     7209  Körpergrab  Lt C1  0   1   0   1   0   0   1   1   Schale S, Flasche S, Flasche S  Hiebmesser, Bronzeanhänger               NA
3   6.0    19.1    NA      NA    NA          NA     1   1   0   0   0   0   0   0   NA                              NA                                       2
4   5.2    17.0    394     7187  Körpergrab  Lt B   1   0   0   0   0   0   0   1   Flasche H50.2, Schale, Quarzit  Pfeilspitze                              NA
5   8.2    17.4    350     7186  Körpergrab  LtB/C  0   0   0   1   0   0   0   0   Flasche H29, Schüssel, Schale   Knotenring, Knotenring, Armring, Messer  NA
6   8.0    19.5    316     7209  Körpergrab  Lt C1  0   1   0   1   0   0   0   1   Flasche S H50.5                 NA                                       NA

Tidyr

Analyze Code!

ceramics3 %>%
    tidyr::gather(key   = "key",
                  value = "value",
                  -id) -> ceramics4
head(ceramics4)
id  key    value
1   Hoehe  5
2   Hoehe  5
3   Hoehe  6
4   Hoehe  5.2
5   Hoehe  8.2
6   Hoehe  8

Tidyr

ceramics4 %>%
    tidyr::spread(key     = "key",
                  value   = "value",
                  convert = TRUE) -> ceramics5
head(ceramics5)
id  Beifund_metall                           Beifunde_ker_etc                Breite  Dat    gemeinsamMit  Grab        Hoch  Hoehe  m1  m2  m3  m4  m5  m6  m7  m8  Rechts
1   NA                                       Flasche H23                     17.0    Lt B   NA            Brandgrab   7209  5.0    1   0   0   0   1   0   1   0   324
2   Hiebmesser, Bronzeanhänger               Schale S, Flasche S, Flasche S  18.2    Lt C1  NA            Körpergrab  7209  5.0    0   1   0   1   0   0   1   1   316
3   NA                                       NA                              19.1    NA     2             NA          NA    6.0    1   1   0   0   0   0   0   0   NA
4   Pfeilspitze                              Flasche H50.2, Schale, Quarzit  17.0    Lt B   NA            Körpergrab  7187  5.2    1   0   0   0   0   0   0   1   394
5   Knotenring, Knotenring, Armring, Messer  Flasche H29, Schüssel, Schale   17.4    LtB/C  NA            Körpergrab  7186  8.2    0   0   0   1   0   0   0   0   350
6   NA                                       Flasche S H50.5                 19.5    Lt C1  NA            Körpergrab  7209  8.0    0   1   0   1   0   0   0   1   316
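
gather and spread still work, but current tidyr versions recommend pivot_longer and pivot_wider for the same reshaping. The sketch below shows the round trip on a small data frame; wide, long and back are hypothetical names, and the values repeat three rows of the output above.

```r
library(tidyr)

# Small wide table: one row per object, one column per measurement
wide <- data.frame(id     = 1:3,
                   Hoehe  = c(5.0, 5.0, 6.0),
                   Breite = c(17.0, 18.2, 19.1))

# Wide to long: one row per id and measurement (counterpart of gather)
long <- tidyr::pivot_longer(wide,
                            cols      = -id,
                            names_to  = "key",
                            values_to = "value")

# Long to wide again (counterpart of spread)
back <- tidyr::pivot_wider(long,
                           names_from  = "key",
                           values_from = "value")
dim(long)
## [1] 6 3
```

Unlike gather, pivot_longer keeps the rows of one id together, so the row order of the long table differs.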

ggplot2

ggplot2 (Wickham 2016) makes plots according to the “Grammar of Graphics”: data + geometry = graphic

library(ggplot2)
ceramics3 %>%
    ggplot2::ggplot() +
    geom_point(mapping = aes(x     = Breite,
                             y     = Hoehe,
                             color = Dat))
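
A minimal sketch of the data + geometry idea on a small data frame. toy and p_toy are hypothetical names; the values repeat three rows of the ceramics output shown earlier. Aesthetics inside aes are mapped to columns, while constants such as a fixed point size go outside aes.

```r
library(ggplot2)

# Three rows copied from the head(ceramics) output
toy <- data.frame(Breite = c(17.0, 18.2, 19.1),
                  Hoehe  = c(5.0, 5.0, 6.0),
                  Dat    = c("Lt B", "Lt C1", NA))

p_toy <- ggplot(data = toy) +
    geom_point(mapping = aes(x     = Breite,   # mapped to columns
                             y     = Hoehe,
                             color = Dat),
               size = 3)                       # fixed constant, not mapped

# print(p_toy) draws the plot on the active graphics device
```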

ggplot2

Plots can be assigned to an object.

library(ggplot2)
ceramics3 %>%
    ggplot2::ggplot() -> p

ggplot2

p +
    geom_density(mapping = aes(x = Breite)) +
    geom_point(mapping   = aes(x = Breite,
                               y = Hoehe,
                               color = Dat)) +
    geom_dotplot(mapping = aes(x = Breite)) +
    ggtitle("Braubacher Stempelkeramik") +
    theme_minimal()

ggplot2

library(ggplot2)
ceramics %>%
    tidyr::separate(Motive,
                    into = c("m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"),
                    sep = ",") %>%
    dplyr::mutate(g = Hoehe > 10) %>%
    ggplot2::ggplot() -> p2

p2 + geom_point(mapping = aes(x     = Breite,
                              y     = Hoehe,
                              color = g,
                              shape = m3),
                size = 4) +    # a constant size is set outside aes()
    facet_wrap( ~ Dat)

Exercise

We are done for today!

References

Bache/Wickham 2020: S. M. Bache/H. Wickham, Magrittr: A forward-pipe operator for R (2020). https://cran.r-project.org/package=magrittr.
Wickham 2016: H. Wickham, ggplot2: Elegant graphics for data analysis (2016). https://ggplot2.tidyverse.org.
Wickham 2021: H. Wickham, Tidyr: Tidy messy data (2021). https://cran.r-project.org/package=tidyr.
Wickham/Grolemund 2017: H. Wickham/G. Grolemund, R for data science (2017). http://r4ds.had.co.nz/.
Wickham/Hester 2021: H. Wickham/J. Hester, Readr: Read rectangular text data (2021). https://cran.r-project.org/package=readr.