1 - Review and Extension of Multiple Regression

$\color{#d19d00}\rule{552px}{1px}$

Review

$\small y$: metric response variable
$\small x_1, \dots, x_p$: metric / categorical predictors

$\begin{rcases}\\\\\end{rcases}$ Regression equation for person $\small i$: $\small y_i = \beta_0 + \beta_1 \cdot x_{i1} + \beta_2 \cdot x_{i2} + \dots + \beta_p \cdot x_{ip} + \epsilon_i$

Matrix notation for person $\small i$

$\small y_{\color{red}i} = \begin{bmatrix} 1 & x_{i1} & \dots & x_{ip} \end{bmatrix} \cdot \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_p \end{bmatrix} + \epsilon_i = x_i^T \beta + \epsilon_i$, where the $\small x$-vector is $\small x_i = \begin{bmatrix} 1 \\ x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix}$ and the transposed $\small x$-vector is $\small x_i^{\color{red}T} = \begin{bmatrix} 1 & x_{i1} & \dots & x_{ip} \end{bmatrix}$

The $\small x$-vector must be transposed so that it can be multiplied with the $\small\beta$-vector
The $\small 1$ in the first position of the $\small x$-vector is the factor that gets multiplied with $\small\beta_0$ (the intercept)
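
The single-person notation can be checked numerically. The following is a minimal sketch in Python with NumPy. The coefficient and predictor values are made up for illustration and do not come from these notes.

```python
import numpy as np

# Hypothetical values: p = 2 predictors for one person i
beta = np.array([1.0, 0.5, -2.0])   # beta_0, beta_1, beta_2
x_i1, x_i2 = 4.0, 3.0

# x-vector with a leading 1 so that beta_0 enters the product
x_i = np.array([1.0, x_i1, x_i2])

# x_i^T beta: the systematic part of y_i (error term left out)
linear_predictor = x_i @ beta

# The same quantity written out term by term
written_out = beta[0] + beta[1] * x_i1 + beta[2] * x_i2

print(linear_predictor, written_out)
```

Both expressions give the same number, which is the point of the leading $\small 1$: without it, $\small\beta_0$ would have nothing to be multiplied with.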

Matrix notation for all $\small n$ persons

$\small y = X \cdot \beta + \epsilon \space \implies \begin{bmatrix} y_1 \\ \vdots \\ \vdots \\ \vdots \\ \vdots \\ y_{\color{red}n} \end{bmatrix} = \begin{bmatrix} 1 \space x_{11} \space … \space x_{1\color{green}p} \\ 1 \space x_{21} \space … \space x_{2\color{green}p} \\ 1 \space x_{31} \space … \space x_{3\color{green}p} \\ \vdots \\ \vdots \\ 1 \space x_{\color{red}n\color{default}1} \space … \space x_{\color{red}n\color{green}p} \end{bmatrix} \cdot \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_{\color{green}p} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \vdots \\ \vdots \\ \vdots \\ \vdots \\ \epsilon_{\color{red}n} \end{bmatrix}$

Dimensions (rows $\small\times$ columns) of each vector and matrix:

$\small y$: $\small n \times 1$
$\small X$: $\small n \times (p + 1)$
$\small\beta$: $\small (p + 1) \times 1$
$\small\epsilon$: $\small n \times 1$

$\begin{rcases}\\\\\\\\\end{rcases}$ $\small y$, $\small X$ and $\small\epsilon$ have as many rows as there are persons ($\small\color{red} n$); the $\small\beta$-vector has one row per predictor ($\small\color{green} p$) plus one for the intercept $\small\beta_0$
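
The dimensions can be verified with a small simulation. This is only a sketch: the sample size, the coefficients and the random data are hypothetical, and column vectors are stored as two-dimensional NumPy arrays so that the shapes match the notation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2                                   # hypothetical: 6 persons, 2 predictors

predictors = rng.normal(size=(n, p))          # n x p raw predictor values
X = np.column_stack([np.ones(n), predictors]) # prepend the column of ones: n x (p + 1)

beta = np.array([[1.0], [0.5], [-2.0]])       # (p + 1) x 1, made-up coefficients
epsilon = rng.normal(scale=0.1, size=(n, 1))  # n x 1 errors

y = X @ beta + epsilon                        # n x 1 responses

print(X.shape, beta.shape, epsilon.shape, y.shape)
# (6, 3) (3, 1) (6, 1) (6, 1)
```

The product `X @ beta` is only defined because the number of columns of `X` equals the number of rows of `beta`, both being $\small p + 1$.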

Matrix algebra

Addition	$\small\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \color{red}+ \color{default} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 + 4 \\ 2 + 5 \\ 3 + 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$	only if the vectors have the same format (same length)
Multiplication	$\small\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \color{red}\cdot\color{default} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 + 2 \cdot 2 \\ 3 \cdot 1 + 4 \cdot 2 \\ 5 \cdot 1 + 6 \cdot 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \\ 17 \end{bmatrix}$	only if the left factor has as many columns as the right factor has rows
Identity matrix and multiplication by a scalar	$\small 5 \cdot I = \color{red}5 \cdot \color{default}\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 & 0 & 0 \\ 0 & 0 & 5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 \\ 0 & 0 & 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 5 \end{bmatrix}$	• multiplying a matrix by $\small I$ leaves that matrix unchanged (like $\small \cdot 1$)
• multiplying $\small I$ by a scalar / number turns every $\small 1$ on the diagonal into that scalar
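
The three operations above can be reproduced in a few lines. The sketch below uses Python with NumPy and the numbers from the examples; the matrix `M` is an arbitrary extra matrix introduced only to illustrate multiplication by the identity matrix.

```python
import numpy as np

# Addition: element-wise, both vectors must have the same length
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
vector_sum = a + b                       # [5 7 9]

# Multiplication: a 3 x 2 matrix times a vector of length 2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
v = np.array([1, 2])
product = A @ v                          # [ 5 11 17]

# Identity matrix times a scalar: every 1 on the diagonal becomes 5
I = np.eye(6)
scaled_identity = 5 * I

# Identity matrix times another matrix: nothing changes
M = np.arange(36).reshape(6, 6)          # arbitrary 6 x 6 matrix for illustration
unchanged = I @ M

print(vector_sum, product)
print(np.array_equal(unchanged, M))
```

Note that `*` between NumPy arrays is element-wise, so the matrix product has to be written with `@`.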

Assumptions

Assumption	Meaning	Violation	Check	Remedy
Linearity	errors have mean $\small 0$ ($\small E(\epsilon) = 0$); points scatter randomly around the line	relevant predictors left out of the model; non-linear relationships	scatter plot; residual plot	add further (non-linear) predictors to the model
Homoscedasticity	homogeneity of variance: the same error variance at every value of $\small x$; errors uncorrelated with one another	variance depends on the $\small x$ values; temporally / spatially clustered data	scatter plot; residual plot	quantile regression; mixed models
$\small X$ has full rank	$\small n \geq p + 1$ (at least as many persons as coefficients); no perfectly correlated predictors (linear combinations)	multicollinearity (the same information in several variables)	model cannot be estimated	choose predictors sensibly
Normality	normally distributed errors at every value of $\small x$; overall normally distributed errors: $\small \epsilon \sim N(0,\space \sigma^2 \cdot \text{I})$; normally distributed $\small y$ values at every value of $\small x$	a different distributional form	histogram; QQ plot	generalised linear models; Box-Cox transformation of $\small y$
No measurement error	predictors measured without error	biased coefficient estimates; always present with latent constructs	not directly visible	validate measurement instruments; models for latent variables
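
Two of these checks are easy to try out numerically. The sketch below, in Python with NumPy, uses simulated data with made-up coefficients. It shows how a perfectly collinear predictor lowers the rank of $\small X$ and how the residuals used in a residual plot are obtained from a least-squares fit. It illustrates the idea and is not a complete diagnostic workflow.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                        # hypothetical sample size
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])     # n x (p + 1) with p = 2

# Full rank: the rank equals the number of columns
rank_ok = np.linalg.matrix_rank(X)

# Adding a predictor that is a linear combination of the others
# adds a column but no new information, so the rank does not grow
X_bad = np.column_stack([X, 2 * x1 - x2])
rank_bad = np.linalg.matrix_rank(X_bad)

print(rank_ok, X.shape[1])                    # 3 3  -> full rank
print(rank_bad, X_bad.shape[1])               # 3 4  -> rank deficient

# Simulate y under the model and fit by least squares
beta_true = np.array([1.0, 0.5, -2.0])        # made-up coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta_hat
residuals = y - fitted                        # plot these against `fitted`
```

With an intercept in the model, the residuals of a least-squares fit average to zero by construction, so their mean alone says nothing about linearity or homoscedasticity. What matters is whether a plot of `residuals` against `fitted` shows a pattern or a changing spread.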