\(MOSAIC_{data}\)

Block 2: Conceptual database modelling

Oliver Nakoinz

2022-07-11
License: CC BY-SA 4.0

Introduction

How can we create data structures that map the relevant aspects of my observations and which data types are required for this purpose?

Factors of data modelling

Level of measurement

Level Operations Example
Nominal identity (=) gender
Ordinal + comparison of size (=,>) scores
Interval + differences (=,>,+,-) years
Ratio + ratios (=,>,+,-,*,/) object length

Stevens (1946)

Data types in C

Data types in DBMS: SQL

Data types in DBMS: SQLite

Data types in DBMS: MySQL

Data types in DBMS: Access

Data Type Usage Size
Short Text Alphanumeric data Up to 255 characters.
Long Text Large amounts of alphanumeric data: sentences and paragraphs. See The Memo data type is now called “Long Text” for more information on the Long Text details. Up to about 1 gigabyte (GB), but controls to display a long text are limited to the first 64,000 characters.
Number Numeric data. 1, 2, 4, 8, or 16 bytes.
Large Number Numeric data. 8 bytes.
Date/Time Dates and times. 8 bytes.
Date/Time Extended Dates and times. Encoded string of 42 bytes
Currency Monetary data, stored with 4 decimal places of precision. 8 bytes.
AutoNumber Unique value generated by Access for each new record. 4 bytes (16 bytes for ReplicationID).
Yes/No Boolean (true/false) data; Access stores the numeric value zero (0) for false, and -1 for true. 1 byte.
OLE Object Pictures, graphs, or other ActiveX objects from another Windows-based application. Up to about 2 GB.
Hyperlink A link address to a document or file on the Internet, on an intranet, on a local area network (LAN), or on your local computer Up to 8,192 (each part of a Hyperlink data type can contain up to 2048 characters).
Attachment You can attach files such as pictures, documents, spreadsheets, or charts; each Attachment field can contain an unlimited number of attachments per record, up to the storage limit of the size of a database file. Note, the Attachment data type isn’t available in MDB file formats. Up to about 2 GB.
Calculated You can create an expression that uses data from one or more fields. You can designate different result data types from the expression. Note, the Calculated data type isn’t available in MDB file formats. Dependent on the data type of the Result Type property. Short Text data type result can have up to 243 characters. Long Text, Number, Yes/No, and Date/Time should match their respective data types.
Lookup Wizard The Lookup Wizard entry in the Data Type column in Design view is not actually a data type. When you choose this entry, a wizard starts to help you define either a simple or complex lookup field. A simple lookup field uses the contents of another table or a value list to validate the contents of a single value per row. A complex lookup field allows you to store multiple values of the same data type in each row. Dependent on the data type of the lookup field.

Geodata

Geodata

Geodata: Spatial Reference System Identifier (SRID)

https://epsg.org/search/by-name

Temporal data in SQL

Zeitdaten in Datenbanken

Archaeological chronological data

Archaeological chronological data: SHKR

Archaeological chronological data: SHKR

Archaeological chronological data:

Entity Relationship Digram (ER Diagram, ERD)

ER Diagramm

ER Diagramm R

library(magrittr)

mound <- data.frame(
    mound_id = c(1,2,3),
    x  = c(19,23,25),
    y  = c(55,52,53),
    comment = c("yyyy","yyyy","yyyy")
)

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    size  = c(53,46,67,54,44),
    comment = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10),
    grave_id = c(1,2,3,4,5,1,2,3,4,5),
    type_id  = c(1,2,1,3,4,3,3,1,2,4),
    comment = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

datamodel <- dm::dm(mound, grave, artefact, type_thes)

datamodel %>%
    dm::dm_add_pk(table = mound,     columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, columns = type_id) -> datamodel
datamodel %>%
    dm::dm_add_fk(table = grave,    columns = mound_id, ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, columns = grave_id, ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact, columns = type_id,  ref_table = type_thes) -> datamodel

datamodel %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = type_thes) %>%  
    dm::dm_draw(view_type = "all")

Connection types

Software

UML Unified Modeling Language

UML Unified Modeling Language

SQL

CREATE TABLE mound (
    mound_id integer PRIMARY KEY,
    x numeric,
    y numeric,
    comment text
);

CREATE TABLE grave (
    grave_id integer PRIMARY KEY,
    mound_id numeric,
    size numeric,
    comment text,
    FOREIGN KEY(mound_id) REFERENCES mound(mound_id)
);

CREATE TABLE artefact (
    artefact_id integer PRIMARY KEY,
    grave_id integer,
    type_id integer,
    comment text,
    FOREIGN KEY(grave_id) REFERENCES grave(grave_id),
    FOREIGN KEY(type_id) REFERENCES type_thes(type_id)
);

CREATE TABLE type_thes (
    type_id integer PRIMARY KEY,
    typ text,
    descr text,
    comment text
);

RSQLite: create database

#library(DBI)
#library(RSQLite)
db_file <- "../2data/22archdata/cemDB.sqlite"
if (file.exists(db_file)) {
  file.remove(db_file)
}
## [1] TRUE
cemetery <- DBI::dbConnect(RSQLite::SQLite(), db_file)
DBI::dbDisconnect(cemetery)
unlink("../2data/22archdata/cemDB.sqlite")

RSQLite: write dataframes to database

cemetery <- DBI::dbConnect(RSQLite::SQLite(), "../2data/22archdata/cemDB.sqlite")
DBI::dbWriteTable(cemetery, "mound", mound)
DBI::dbWriteTable(cemetery, "grave", grave)
DBI::dbWriteTable(cemetery, "artefact", artefact)
DBI::dbWriteTable(cemetery, "type_thes", type_thes)
DBI::dbListTables(cemetery)
## [1] "artefact"  "grave"     "mound"     "type_thes"

Datamodel

cem_dm <- dm::dm_from_src(cemetery)
cem_dm
## ── Table source ────────────────────────────────────────────────────────────────
## src:  sqlite 3.30.1 [../2data/22archdata/cemDB.sqlite]
## ── Metadata ────────────────────────────────────────────────────────────────────
## Tables: `artefact`, `grave`, `mound`, `type_thes`
## Columns: 16
## Primary keys: 0
## Foreign keys: 0

RSQLite: create database

db_file <- "../2data/22archdata/cemDB2.sqlite"
if (file.exists(db_file)) {
  file.remove(db_file)
}
## [1] TRUE
cemetery2 <- DBI::dbConnect(RSQLite::SQLite(), db_file)
DBI::dbGetQuery(cemetery2, 'CREATE TABLE mound (
    mound_id integer,
    x numeric,
    y numeric,
    comment text,
    PRIMARY KEY (mound_id)
);')

|| || || ||

DBI::dbGetQuery(cemetery2, 'CREATE TABLE grave (
    grave_id integer PRIMARY KEY,
    mound_id numeric,
    size numeric,
    comment text,
    FOREIGN KEY(mound_id) REFERENCES mound(mound_id)
);')

|| || || ||

DBI::dbGetQuery(cemetery2, 'CREATE TABLE artefact (
    artefact_id integer PRIMARY KEY,
    grave_id integer,
    type_id integer,
    comment text,
    FOREIGN KEY(grave_id) REFERENCES grave(grave_id),
    FOREIGN KEY(type_id) REFERENCES type_thes(type_id)
);')

|| || || ||

DBI::dbGetQuery(cemetery2, 'CREATE TABLE type_thes (
    type_id integer PRIMARY KEY,
    type text,
    descr text,
    comment text
);')

|| || || ||

RSQLite: append data

DBI::dbWriteTable(cemetery2, "mound", mound,         append = T)
DBI::dbWriteTable(cemetery2, "grave", grave,         append = T)
DBI::dbWriteTable(cemetery2, "artefact", artefact,   append = T)
DBI::dbWriteTable(cemetery2, "type_thes", type_thes, append = T)

RSQLite: add keys

cem_dm <- dm::dm_from_src(cemetery2)
cem_dm %>%
    dm::dm_add_pk(table = mound,     columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, columns = type_id)  %>%
    dm::dm_add_fk(table = grave,     columns = mound_id, ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact,  columns = grave_id, ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact,  columns = type_id,  ref_table = type_thes) -> cem_dm

RSQLite: plot ERD

cem_dm %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = type_thes) %>%  
    dm::dm_draw(view_type = "all")

RSQLite: disconect database

DBI::dbDisconnect(cemetery2)

Break and/or Exercise

Relationale Datenbank

A relation is a kind of table conprising attributes/columns and tuples/rows.

Database normalization

The normalization of a database aims to

0. Normal form (UNF)

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10),
    chronstring = c("fortME_Vorg>EZ>Lt>FLT>LtA_1.0",
                    "fortME_Vorg>EZ>Lt>FLT>LtA_0.5",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_0.7",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ>Lt>_0.5",
                    "fortNE_Vorg>EZ>Jastorf_1.0"),
    grave_id = c(1,2,3,4,5,1,2,3,4,5),
    mound_id = c(1,1,2,3,3,1,1,2,3,3),
    type_id  = c(1,2,1,3,4,3,3,1,2,4),
    mound_x  = c(34,34,31,23,23,34,34,31,23,23),
    mound_y  = c(346,346,223,346,346,346,346,223,346,346),
    grave_size  = c(53,46,67,54,44,53,46,67,54,44),
    rc_sample  = c("-","no 1","-","-","-","-","-","no 2","-","no 3, no 4"),
    type     = c("type 1","type 2","type 3","type 4","type 2","type 3","type 4","type 3","type 4","type 2"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)
dm::dm(artefact) %>%
    dm::dm_add_pk(table   = artefact,     
                  columns = artefact_id)    %>% 
    dm::dm_set_colors(steelblue4 = artefact) %>%  
    dm::dm_draw(view_type = "all")

1. Normal form (1NF, atomic attributes)

1NF ist satisfied, if the relation contains only attributes of atomic values.

fortME_Vorg>EZ>Lt>FLT>LtA_1.0 does not satisfy the 1NF

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10,10),
    chronsys = c("fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortNE",
                    "fortNE"),
    chronstring = c("LtA",
                    "LtA",
                    "LtB",
                    "FLT",
                    "FLT",
                    "LtB",
                    "EZ",
                    "EZ",
                    "Lt",
                    "Jastorf",
                    "Jastorf"),
    chronprom = c(1.0,
                  0.5,
                  1.0,
                  1.0,
                  0.7,
                  1.0,
                  1.0,
                  1.0,
                  0.5,
                  1.0,
                  1.0),
    grave_id = c(1,2,3,4,5,1,2,3,4,5,5),
    mound_id = c(1,1,2,3,3,1,1,2,3,3,3),
    type_id  = c(1,2,1,3,4,3,3,1,2,4,4),
    mound_x  = c(34,34,31,23,23,34,34,31,23,23,23),
    mound_y  = c(346,346,223,346,346,346,346,223,346,346,246),
    grave_size  = c(53,46,67,54,44,53,46,67,54,44,44),
    rc_sample  = c("-","no 1","-","-","-","-","-","no 2","-","no 3","no 4"),
    type     = c("type 1","type 2","type 3","type 4","type 2","type 3","type 4","type 3","type 4","type 2","type 2"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)
dm::dm(artefact) %>%
    dm::dm_add_pk(table   = artefact,     
                  columns = artefact_id)    %>% 
    dm::dm_set_colors(steelblue4 = artefact) %>%  
    dm::dm_draw(view_type = "all")

2. Normal form (2NF, functional dependency from primary key)

2NF ist satisfied, if the relation satisfies the 1NF and every non-key attribute depends on a primary key.

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10,10),
    chronsys = c("fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortME",
                    "fortNE",
                    "fortNE"),
    chronstring = c("LtA",
                    "LtA",
                    "LtB",
                    "FLT",
                    "FLT",
                    "LtB",
                    "EZ",
                    "EZ",
                    "Lt",
                    "Jastorf",
                    "Jastorf"),
    chronprom = c(1.0,
                  0.5,
                  1.0,
                  1.0,
                  0.7,
                  1.0,
                  1.0,
                  1.0,
                  0.5,
                  1.0,
                  1.0),
    grave_id = c(1,2,3,4,5,1,2,3,4,5,5),
    mound_id = c(1,1,2,3,3,1,1,2,3,3,5),
    type_id  = c(1,2,1,3,4,3,3,1,2,4,4),
    rc_sample  = c("-","no 1","-","-","-","-","-","no 2","-","no 3","no 4"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    grave_size  = c(53,46,67,54,44),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

mound <- data.frame(
    mound_id = c(1,2,3),
    x  = c(19,23,25),
    y  = c(55,52,53),
    comment = c("yyyy","yyyy","yyyy")
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

dm::dm(artefact, mound, grave, type_thes) %>%
    dm::dm_add_pk(table = mound,     
                  columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     
                  columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  
                  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, 
                  columns = type_id)    %>% 
    dm::dm_add_fk(table = grave,    
                  columns = mound_id, 
                  ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = grave_id, 
                  ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = type_id,  
                  ref_table = type_thes)  %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = type_thes) %>%  
    dm::dm_draw(view_type = "all")

3. Normal form (3NF, no transitional dependencies)

3NF ist satisfied, if the relation satisfies the 2NF and every non-key attribute depends not transitive on a primary key. Non-key attributes do not allow to predict other non-key attributes.

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10,10),
    chronstring_id = c(5,5,6,4,4,6,2,2,3,5,5),
    chronprob = c(1.0,
                  0.5,
                  1.0,
                  1.0,
                  0.7,
                  1.0,
                  1.0,
                  1.0,
                  0.5,
                  1.0,
                  1.0),
    grave_id = c(1,2,3,4,5,1,2,3,4,5,5),
    type_id  = c(1,2,1,3,4,3,3,1,2,4,4),
    rc_sample  = c("-","no 1","-","-","-","-","-","no 2","-","no 3","no 4"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

chronsys <- data.frame(
    chronsys_id   = c(1,2),
    chronsys_type = c("fortME","fortNE"))

chronstring <- data.frame(
    chronstring_id   = c(1,2,3,4,5,6,7),
    chronsys_id      = c(1,1,1,1,1,1,2),
    chronstring_type = c("Vorg",
                        "EZ",
                        "Lt",
                        "FLT",
                        "LtA",
                        "LtB",
                        "Jastorf"))

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    grave_size  = c(53,46,67,54,44),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

mound <- data.frame(
    mound_id = c(1,2,3),
    x  = c(19,23,25),
    y  = c(55,52,53),
    comment = c("yyyy","yyyy","yyyy")
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

dm::dm(artefact, mound, grave, type_thes, chronsys, chronstring) %>%
    dm::dm_add_pk(table = mound,     
                  columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     
                  columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  
                  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, 
                  columns = type_id)    %>% 
    dm::dm_add_pk(table = chronsys, 
                  columns = chronsys_id)    %>%
    dm::dm_add_pk(table = chronstring, 
                  columns = chronstring_id)    %>%
    dm::dm_add_fk(table = grave,    
                  columns = mound_id, 
                  ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = grave_id, 
                  ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = type_id,  
                  ref_table = type_thes)  %>%  
    dm::dm_add_fk(table    = chronstring, 
                  columns   = chronsys_id,  
                  ref_table = chronsys)  %>%  
    dm::dm_add_fk(table    = artefact, 
                  columns   = chronstring_id,  
                  ref_table = chronstring)  %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = c(type_thes, chronsys), tan2 = c(chronstring)) %>%  
    dm::dm_draw(view_type = "all")

4. Normal form (4NF, no multivalued dependencies)

4NF ist satisfied, if the relation satisfies the 3NF (and BCNF) and does only contain trivial multivalued dependencies. In one relation maximal one 1:n or m:n relationships with keys are allowed.

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10),
    grave_id = c(1,2,3,4,5,1,2,3,4,5),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

dates <- data.frame(
    dates_id       = c(1,2,3,4,5,6,7,8,9,10),
    artefact_id    = c(1,2,3,4,5,6,7,8,9,10),
    chronstring_id = c(5,5,6,4,4,6,2,2,3,5),
    chronprob = c(1.0,
                  0.5,
                  1.0,
                  1.0,
                  0.7,
                  1.0,
                  1.0,
                  1.0,
                  0.5,
                  1.0)
)

rc_sample <- data.frame(
    sample_id = c(1,2,3,4),
    artefact_id = c(2,8,10,10))

chronsys <- data.frame(
    chronsys_id   = c(1,2),
    chronsys_type = c("fortME","fortNE"))

chronstring <- data.frame(
    chronstring_id   = c(1,2,3,4,5,6,7),
    chronsys_id      = c(1,1,1,1,1,1,2),
    chronstring_type = c("Vorg",
                        "EZ",
                        "Lt",
                        "FLT",
                        "LtA",
                        "LtB",
                        "Jastorf"))

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    grave_size  = c(53,46,67,54,44),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

mound <- data.frame(
    mound_id = c(1,2,3),
    x  = c(19,23,25),
    y  = c(55,52,53),
    comment = c("yyyy","yyyy","yyyy")
)

types <- data.frame(
    types_id     = c(1,2,3,4,5,6,7,8,9,10),
    artefact_id  = c(1,2,3,4,5,6,7,8,9,10),
    type_id      = c(1,2,1,3,4,3,3,1,2,4)
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

dm::dm(artefact, mound, grave, type_thes, chronsys, chronstring, rc_sample, dates, types) %>%
    dm::dm_add_pk(table = mound,     
                  columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     
                  columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  
                  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, 
                  columns = type_id)    %>% 
    dm::dm_add_pk(table = chronsys, 
                  columns = chronsys_id)    %>%
    dm::dm_add_pk(table = chronstring, 
                  columns = chronstring_id)    %>%
    dm::dm_add_pk(table = rc_sample, 
                  columns = sample_id)    %>%
    dm::dm_add_pk(table = dates, 
                  columns = dates_id)    %>%
    dm::dm_add_pk(table = types, 
                  columns = types_id)    %>%
    dm::dm_add_fk(table = types,    
                  columns = artefact_id, 
                  ref_table = artefact)  %>%
    dm::dm_add_fk(table = types,    
                  columns = type_id, 
                  ref_table = type_thes)  %>%
    dm::dm_add_fk(table = rc_sample,    
                  columns = artefact_id, 
                  ref_table = artefact)  %>%
    dm::dm_add_fk(table = grave,    
                  columns = mound_id, 
                  ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = grave_id, 
                  ref_table = grave)  %>%
    dm::dm_add_fk(table = dates, 
                  columns   = artefact_id,  
                  ref_table = artefact)  %>%  
    dm::dm_add_fk(table     = chronstring, 
                  columns   = chronsys_id,  
                  ref_table = chronsys)  %>%  
    dm::dm_add_fk(table     = dates, 
                  columns   = chronstring_id,  
                  ref_table = chronstring)  %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = c(type_thes, chronsys), tan2 = c(chronstring), yellow4 = c(rc_sample), thistle4 = c(types, dates)) %>%  
    dm::dm_draw(view_type = "all")

Normalization

https://en.wikipedia.org/wiki/Database_normalization

Joins

Joins:

Joins: Datamodel

datamodel %>%
    dm::dm_join_to_tbl(grave, mound, join = dm::left_join) 
grave_id mound_id grave_size grave.comment x y mound.comment
1 1 53 xxxx 19 55 yyyy
2 1 46 xxxx 19 55 yyyy
3 2 67 xxxx 23 52 yyyy
4 3 54 xxxx 25 53 yyyy
5 3 44 xxxx 25 53 yyyy

Joins: SQL

sqldf::sqldf('SELECT grave.*, mound.* 
             FROM grave INNER JOIN mound 
             ON grave.mound_id = mound.mound_id')
grave_id mound_id grave_size comment mound_id x y comment
1 1 53 xxxx 1 19 55 yyyy
2 1 46 xxxx 1 19 55 yyyy
3 2 67 xxxx 2 23 52 yyyy
4 3 54 xxxx 3 25 53 yyyy
5 3 44 xxxx 3 25 53 yyyy

Joins: R

merge(x = grave, 
      y = mound,
      by.x = "mound_id", 
      by.y = "mound_id",
      all.x = T,
      all.y = T)
mound_id grave_id grave_size comment.x x y comment.y
1 1 53 xxxx 19 55 yyyy
1 2 46 xxxx 19 55 yyyy
2 3 67 xxxx 23 52 yyyy
3 4 54 xxxx 25 53 yyyy
3 5 44 xxxx 25 53 yyyy

Denormalization

Denormalization is the process of reverting normalization in order to increase performance and to enhance the usability of the database.

Denormalization: combine attributes (Chronology)

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10),
    grave_id = c(1,2,3,4,5,1,2,3,4,5),
    type_id  = c(1,2,1,3,4,3,3,1,2,4),
    chronstring = c("fortME_Vorg>EZ>Lt>FLT>LtA_1.0",
                    "fortME_Vorg>EZ>Lt>FLT>LtA_0.5",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_0.7",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ>Lt>_0.5",
                    "fortME_Vorg>EZ>Lt>FLT>LtA_1.0"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

rc_sample <- data.frame(
    sample_id = c(1,2,3,4),
    artefact_id = c(2,8,10,10))

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    grave_size  = c(53,46,67,54,44),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

mound <- data.frame(
    mound_id = c(1,2,3),
    x  = c(19,23,25),
    y  = c(55,52,53),
    comment = c("yyyy","yyyy","yyyy")
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

dm::dm(artefact, mound, grave, type_thes, rc_sample) %>%
    dm::dm_add_pk(table = mound,     
                  columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     
                  columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  
                  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, 
                  columns = type_id)    %>% 
    dm::dm_add_pk(table = rc_sample, 
                  columns = sample_id)    %>%
    dm::dm_add_fk(table = rc_sample,    
                  columns = artefact_id, 
                  ref_table = artefact)  %>%
    dm::dm_add_fk(table = grave,    
                  columns = mound_id, 
                  ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = grave_id, 
                  ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = type_id,  
                  ref_table = type_thes)  %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = type_thes, yellow4 = c(rc_sample)) %>%  
    dm::dm_draw(view_type = "all")

Denormalization: combine attributes (Geodata)

WKT format is a text based GIS standard: Point (377672.7 5559859.1)

artefact <- data.frame(
    artefact_id = c(1,2,3,4,5,6,7,8,9,10),
    grave_id = c(1,2,3,4,5,1,2,3,4,5),
    type_id  = c(1,2,1,3,4,3,3,1,2,4),
    chronstring = c("fortME_Vorg>EZ>Lt>FLT>LtA_1.0",
                    "fortME_Vorg>EZ>Lt>FLT>LtA_0.5",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_1.0",
                    "fortME_Vorg>EZ>Lt>FLT_0.7",
                    "fortME_Vorg>EZ>Lt>FLT>LtB_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ_1.0",
                    "fortME_Vorg>EZ>Lt>_0.5",
                    "fortME_Vorg>EZ>Lt>FLT>LtA_1.0"),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx","xxxx")
)

rc_sample <- data.frame(
    sample_id = c(1,2,3,4),
    artefact_id = c(2,8,10,10))

grave <- data.frame(
    grave_id = c(1,2,3,4,5),
    mound_id = c(1,1,2,3,3),
    grave_size  = c(53,46,67,54,44),
    comment  = c("xxxx","xxxx","xxxx","xxxx","xxxx")
)

mound <- data.frame(
    mound_id = c(1,2,3),
    wkt = c("Point (19 55)","Point (23 52)","Point (25 53)"),
    comment = c("yyyy","yyyy","yyyy")
)

type_thes <- data.frame(
    type_id = c(1,2,3,4),
    type = c("type 1","type 2","type 3","type 4"),
    descr  = c("type 1: xxxxxxxx","type 2: xxxxxxxx","type 3: xxxxxxxx","type 4: xxxxxxxx"),
    comment = c(1,2,3,4)
)

dm::dm(artefact, mound, grave, type_thes, rc_sample) %>%
    dm::dm_add_pk(table = mound,     
                  columns = mound_id) %>%
    dm::dm_add_pk(table = grave,     
                  columns = grave_id) %>%
    dm::dm_add_pk(table = artefact,  
                  columns = artefact_id) %>%
    dm::dm_add_pk(table = type_thes, 
                  columns = type_id)    %>% 
    dm::dm_add_pk(table = rc_sample, 
                  columns = sample_id)    %>%
    dm::dm_add_fk(table = rc_sample,    
                  columns = artefact_id, 
                  ref_table = artefact)  %>%
    dm::dm_add_fk(table = grave,    
                  columns = mound_id, 
                  ref_table = mound)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = grave_id, 
                  ref_table = grave)  %>%
    dm::dm_add_fk(table = artefact, 
                  columns = type_id,  
                  ref_table = type_thes)  %>%  
    dm::dm_set_colors(violetred3 = c(grave, mound), steelblue4 = artefact, grey = type_thes, yellow4 = c(rc_sample)) %>%  
    dm::dm_draw(view_type = "all")

Workflows of developing relational databases

Workflow a

  1. Collect ideas
  2. Define tables (ERD)
  3. Normalize
  4. Denormalize
  5. Test

Workflow b

  1. Collect ideas
  2. Define categories and features (lists)
  3. Define relationships
  4. Define tables (ERD, switch perspective)
  5. Consider dependencies (balance normalization and denormalization)
  6. Test

NoSQL

Key-Value-Datenbanken

id key value
1 color red
2 color green
3 shape circle
4 shape circle

Object database

Graph database

Graph database

Exercise

Inspiration for archaeological data from Bassenheim

Break

References

Stevens 1946: S. S. Stevens, On the theory of scales of measurement. Science 103, 2684, 1946, 677–680. DOI: https://doi.org/10.1126/science.103.2684.677.