% Information geometry & adaptive assessment
% JJ Vie
% \includegraphics[height=1cm]{figures/inria.png}
---
handout: true
header-includes:
  - \usepackage{bm}
  - \usepackage{booktabs}
  - \usepackage{tikz}
  - \DeclareMathOperator\logit{logit}
  - \newcommand\mycite[3]{\textcolor{blue}{#1} “#2”.~#3.}
---
\centering
\begin{tikzpicture}[xshift=-14cm,xscale=0.7]
\draw (6,0) -- node[above,label={below:Lyon}] {Bachelor} (9,0);
\draw (9,1) -- node[above,label={below:Paris}] {Master(s)} (14.5,1);
\draw (14,0) -- node[above,label={below:Paris-Saclay}] {PhD} (16.5,0);
\draw (17.5,1) -- node[above,label={below:Tokyo \& Kyoto}] {Postdoc} (19.5,1);
\only<2>{\draw[red] (13.5,-1) -- node[above,label={below:Cachan}] {Agrégation ?!} (14.5,-1);}
\draw[->] (6,-2) -- (19.5,-2) node[above] {time};
\end{tikzpicture}
#
\centering
\centering
\centering
We want to assess your skills in some domain,
by asking you to complete some tasks.
\centering
\begin{tabular}{rlcccc} \toprule
& & \multicolumn{4}{c}{Knowledge components} \\ \cmidrule(lr){3-6}
& & \textbf{form} & \textbf{mail} & \textbf{copy} & \textbf{url} \\ \midrule
T1 & Send a mail & \textbf{form} & \textbf{mail} & & \\
T2 & Fill a form & \textbf{form} & & & \\
T3 & Share a link & & & \textbf{copy} & \textbf{url} \\
T4 & Type a URL & \textbf{form} & & & \textbf{url} \\ \bottomrule
\end{tabular}
\raggedright \def\correct{\textcolor{green!50!black}{Correct!}} \def\incorrect{\textcolor{red}{Incorrect.}}
\pause
We administer task 1. \correct{}
$\Rightarrow$ \textbf{form} & \textbf{mail}: mastered. Task 2 brings little information.
\pause
We administer task 4. \incorrect{}
$\Rightarrow$ \textbf{url} seems not to be mastered. Task 3 will bring little information.
You seem to master form & mail but not url.
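The inference above can be checked by brute force. A minimal sketch, assuming a noiseless conjunctive model (a task is solved if and only if all of its skills are mastered; this rule is our assumption, not stated on the slide): enumerate the $2^4$ skill profiles and keep those consistent with the two answers.

```python
from itertools import product

SKILLS = ["form", "mail", "copy", "url"]
# Skills required by each task, taken from the table.
TASKS = {
    "T1": {"form", "mail"},
    "T2": {"form"},
    "T3": {"copy", "url"},
    "T4": {"form", "url"},
}


def solves(profile, task):
    """Assumed noiseless conjunctive rule: success iff every required skill is mastered."""
    return all(profile[s] for s in TASKS[task])


def consistent_profiles(observations):
    """All profiles in {0, 1}^K that agree with every (task, outcome) observation."""
    profiles = []
    for bits in product([0, 1], repeat=len(SKILLS)):
        profile = dict(zip(SKILLS, bits))
        if all(solves(profile, t) == outcome for t, outcome in observations):
            profiles.append(profile)
    return profiles


# Task 1 correct, task 4 incorrect.
remaining = consistent_profiles([("T1", True), ("T4", False)])
for p in remaining:
    print(p)
```

Every remaining profile succeeds on task 2 and fails on task 3, so neither question can change the diagnosis; only \textbf{copy} stays undetermined.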
Trying to find a \alert{target} in $\{0, 1\}^K$, where $K$ is the number of skills.
Maximum entropy: uniform distribution
\alert{Minimizing} the expected entropy
But the support has size $O(2^K)$: what can we do when $K$ is large?
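To make the entropy criterion concrete, here is a sketch on the same four tasks. The noisy conjunctive response model and its `SLIP` and `GUESS` values are hypothetical choices; the prior is uniform (maximum entropy), and the next task is the one that minimizes the expected posterior entropy.

```python
import math
from itertools import product

SKILLS = ["form", "mail", "copy", "url"]
TASKS = {  # skills required by each task, from the table
    "T1": {"form", "mail"},
    "T2": {"form"},
    "T3": {"copy", "url"},
    "T4": {"form", "url"},
}
SLIP, GUESS = 0.1, 0.2  # hypothetical: P(fail | mastered), P(success | not mastered)

# The support: all 2^K skill profiles.
PROFILES = [dict(zip(SKILLS, bits)) for bits in product([0, 1], repeat=len(SKILLS))]


def p_success(profile, task):
    """Noisy conjunctive model: success is likely only if all required skills are mastered."""
    mastered = all(profile[s] for s in TASKS[task])
    return 1 - SLIP if mastered else GUESS


def entropy(dist):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(w * math.log2(w) for w in dist if w > 0)


def update(prior, task, outcome):
    """Bayes rule over profiles; returns (posterior, probability of this outcome)."""
    joint = [
        w * (p_success(prof, task) if outcome else 1 - p_success(prof, task))
        for w, prof in zip(prior, PROFILES)
    ]
    z = sum(joint)
    return [j / z for j in joint], z


def expected_entropy(prior, task):
    """Posterior entropy averaged over both possible outcomes of `task`."""
    total = 0.0
    for outcome in (True, False):
        posterior, prob = update(prior, task, outcome)
        total += prob * entropy(posterior)
    return total


def best_task(prior, candidates):
    return min(candidates, key=lambda t: expected_entropy(prior, t))


prior = [1 / len(PROFILES)] * len(PROFILES)  # maximum entropy: uniform distribution
print(entropy(prior), best_task(prior, list(TASKS)))
```

The lists `prior` and `PROFILES` have $2^K$ entries, which is exactly what becomes intractable when $K$ is large.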
\centering
5 domains, 16 competencies, 800 skills: what should we do?
\centering \includegraphics[width=0.5\linewidth]{figures/digcomp.png}
Before: B2i.
Now: Pix.
Pix replaces B2i for high school students (JO, September 1\textsuperscript{st}, 2019).
Some companies use it to measure the impact of their training programs.
\centering \Large
In the French village of Montrésor,
what street is crossing Perrières street?
\vspace{1cm} \pause
\normalsize $\rightarrow$ can get skill \@rechercheInfo3
Assess your level adaptively
Know your strong and weak points
Recommend tutorials
Few questions to certify a rough estimate
“This person is level 4 in safety.”
“What should I learn next?”
Optimizing human learning
Rasch (1960)
Ask the question that maximizes the expected number of validated/invalidated skills:
\centering
$\textnormal{Maximize } p(\textnormal{success}) \, N_{\textnormal{validated}} + (1 - p(\textnormal{success})) \, N_{\textnormal{invalidated}}$
\raggedright
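A minimal sketch of this selection rule, under assumptions that are ours rather than Pix's: the skills of one topic are ordered by difficulty level $1, \dots, L$; a success on a level-$\ell$ question validates every still-unknown skill of level $\le \ell$, a failure invalidates every still-unknown skill of level $\ge \ell$; and $p(\textnormal{success})$ comes from a Rasch model whose difficulty equals the level. This is not the actual Pix code.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def expected_gain(level, theta, unknown):
    """p(success) * N_validated + (1 - p(success)) * N_invalidated for a question of this level."""
    p = sigmoid(theta - level)  # hypothetical Rasch model, difficulty = level
    n_validated = sum(1 for l in unknown if l <= level)    # success validates easier skills
    n_invalidated = sum(1 for l in unknown if l >= level)  # failure invalidates harder skills
    return p * n_validated + (1 - p) * n_invalidated


def next_level(theta, unknown):
    """Level of the next question, among skills whose status is still unknown."""
    return max(sorted(unknown), key=lambda level: expected_gain(level, theta, unknown))


print(next_level(theta=3.0, unknown={1, 2, 3, 4, 5}))
```

For a learner estimated in the middle of the scale, the rule asks the middle level, where either answer settles three skills.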
The code is on GitHub (AGPLv3 license), in JavaScript.
\small \fullcite{Vie2017PIX}
In language learning, people from different countries have different difficulties.
Black points are items; the red point is the user.
\centering \includegraphics[width=0.5\linewidth]{figures/mirt-here.png}
\centering \includegraphics[width=\linewidth]{figures/inter1.jpg}
\centering \includegraphics[width=\linewidth]{figures/inter2.jpg}
\centering
\centering
We want to maximize the likelihood $\Rightarrow \max_\theta LL = \max_\theta \log p(X | \theta)$
Find the zeros of the gradient, or follow its direction:
\[\nabla_\theta LL = \frac{\partial LL}{\partial \theta}\]
Property (fun fact): $\mathbb{E}_{p(X | \theta)} \nabla_\theta LL = 0$
\pause
If $\operatorname{Var}_{p(X | \theta)} (\nabla_\theta LL)$ is low, the observation is \alert{useless}. This variance is the Fisher information $\mathcal{F}(\theta)$.
\pause
Another index for choosing a question:
\[KL(\theta) = \int_{B(\theta, c/\sqrt{n})} KL(\theta \| \theta_0) \, d\theta_0 = \int_{B(\theta, c/\sqrt{n})} \mathbb{E}_{p(X|\theta_0)} \log \frac{p(X|\theta_0)}{p(X|\theta)} \, d\theta_0\]
Let’s take the Rasch model $p(X_j = 1 | \theta) = \sigma(\theta - d_j) = p_j$
$\nabla_\theta LL = X_j - p_j$
$\mathcal{F}(\theta) = - \frac{\partial^2 LL}{\partial \theta^2} = p_j (1 - p_j)$
Hence the item of maximum Fisher information is the one whose success probability is \alert{closest to $1/2$} under the current maximum likelihood estimate.
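A sketch of both criteria under the Rasch model. `rasch_mle` runs Newton's method with the gradient $X_j - p_j$ and the Fisher information $p_j(1 - p_j)$; `kl_index` reads $\theta$ as the current estimate and integrates $KL(\theta \| \theta_0)$ over $\theta_0$ with a midpoint rule. The constant `c = 3`, the value `n = 100` and the item bank are arbitrary illustrative values.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def rasch_mle(difficulties, outcomes, theta=0.0, n_iter=50):
    """Newton-Raphson on LL(theta): gradient sum(X_j - p_j), Fisher information sum p_j (1 - p_j).
    Needs at least one success and one failure, otherwise the MLE is infinite."""
    for _ in range(n_iter):
        ps = [sigmoid(theta - d) for d in difficulties]
        grad = sum(x - p for x, p in zip(outcomes, ps))
        info = sum(p * (1 - p) for p in ps)
        theta += grad / info
    return theta


def fisher(theta, d):
    p = sigmoid(theta - d)
    return p * (1 - p)


def kl_index(theta, d, n, c=3.0, steps=200):
    """Midpoint-rule integral over theta0 in [theta - c/sqrt(n), theta + c/sqrt(n)]
    of KL(p(.|theta0) || p(.|theta)) for one Bernoulli item."""
    delta = c / math.sqrt(n)
    p = sigmoid(theta - d)
    h = 2 * delta / steps
    total = 0.0
    for i in range(steps):
        p0 = sigmoid(theta - delta + (i + 0.5) * h - d)
        kl = p0 * math.log(p0 / p) + (1 - p0) * math.log((1 - p0) / (1 - p))
        total += kl * h
    return total


answered_d, answered_x = [-1.0, 0.0, 1.0], [1, 1, 0]
theta_hat = rasch_mle(answered_d, answered_x)
bank = [-2.0, 0.0, 1.0, 3.0]  # hypothetical difficulties of the remaining items
best_fisher = max(bank, key=lambda d: fisher(theta_hat, d))
best_kl = max(bank, key=lambda d: kl_index(theta_hat, d, n=100))
print(theta_hat, best_fisher, best_kl)
```

With a narrow window, $KL(\theta \| \theta_0) \approx \frac12 \mathcal{F}(\theta)(\theta_0 - \theta)^2$, so both rules pick the item whose difficulty is closest to $\hat\theta$; they may disagree when the window is wide, early in the test.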
\pause
Other rewards & policies have been considered:
How to model \alert{pairwise interactions} with \alert{side information}?
Learn a 1-dim \alert{bias} for each feature (each user, item, etc.)
Learn a 1-dim \alert{bias} and a $k$-dim \alert{embedding} for each feature
If you know user $i$ attempted item $j$ on \alert{mobile} (not desktop)
How to model it?
$y$: score of event “user $i$ correctly solves item $j$”
\pause
\small \vspace{-3mm} \(y = \theta_i + e_j + \alert{w_{\textnormal{mobile}}} + \langle \bm{v_{\textnormal{user $i$}}}, \bm{v_{\textnormal{item $j$}}} \rangle + \langle \bm{v_{\textnormal{user $i$}}}, \alert{\bm{v_{\textnormal{mobile}}}} \rangle + \langle \bm{v_{\textnormal{item $j$}}}, \alert{\bm{v_{\textnormal{mobile}}}} \rangle\)
\centering
\centering
Learn bias \alert{$w_k$} and embedding \alert{$\bm{v_k}$} for each feature $k$ such that: \(\logit p(\bm{x}) = \mu + \underbrace{\sum_{k = 1}^N \alert{w_k} x_k}_{\textnormal{logistic regression}} + \underbrace{\sum_{1 \leq k < l \leq N} x_k x_l \langle \alert{\bm{v_k}}, \alert{\bm{v_l}} \rangle}_{\textnormal{pairwise interactions}}\)
Multidimensional item response theory, $\logit p(\bm{x}) = \langle \bm{u_i}, \bm{v_j} \rangle + e_j$, is a particular case.
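A minimal NumPy sketch of the factorization machine formula, with random (untrained) parameters and a hypothetical one-hot layout: features 0-1 are users, 2-4 are items, and 5 is the mobile indicator. It computes the pairwise term both naively and with the standard $O(Nk)$ rewriting $\sum_{k<l} x_k x_l \langle \bm{v_k}, \bm{v_l} \rangle = \frac12 \sum_f \big[(\sum_k v_{kf} x_k)^2 - \sum_k v_{kf}^2 x_k^2\big]$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, DIM = 6, 3                  # hypothetical: 6 one-hot features, embeddings of size 3
mu = 0.1                       # global bias
w = rng.normal(size=N)         # one bias w_k per feature
V = rng.normal(size=(N, DIM))  # one embedding v_k per feature (row k)


def fm_logit_naive(x):
    """mu + sum_k w_k x_k + sum_{k<l} x_k x_l <v_k, v_l>, with an explicit double loop."""
    total = mu + w @ x
    for k in range(N):
        for l in range(k + 1, N):
            total += x[k] * x[l] * (V[k] @ V[l])
    return total


def fm_logit(x):
    """Same quantity in O(N * DIM), using the rewriting of the pairwise sum."""
    xv = x @ V  # for each dimension f: sum_k v_kf x_k
    pairwise = 0.5 * (np.sum(xv ** 2) - np.sum((x ** 2) @ (V ** 2)))
    return mu + w @ x + pairwise


# User 0 attempts item 3 on mobile (feature 5).
x = np.array([1, 0, 0, 1, 0, 1], dtype=float)
print(fm_logit(x), fm_logit_naive(x))
```

With only a user and an item active, the logit reduces to $\mu + w_{\textnormal{user}} + w_{\textnormal{item}} + \langle \bm{v_{\textnormal{user}}}, \bm{v_{\textnormal{item}}} \rangle$: the multidimensional IRT form plus a global and a user bias.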
\small \fullcite{rendle2012factorization}
\normalsize Use temporal features
\small \fullcite{KTM2019}
Simple assumptions:
Uncertainty increases with time (e.g., Brownian motion)
The more you fail, the more you learn
Students forget exponentially
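One hypothetical way to encode these assumptions as features, in the spirit of DAS3H's time-window counters: for one student and one skill, count past attempts and successes within windows of increasing length and take $\log(1 + \textnormal{count})$. Counting attempts, failures included, lets practice itself raise the estimate, and old events leaving the short windows is a coarse stand-in for forgetting. The window lengths are illustrative, not the values used in the paper.

```python
import math

# Hypothetical window lengths, in seconds: 1 hour, 1 day, 1 week, 30 days.
WINDOWS = [3600, 86_400, 7 * 86_400, 30 * 86_400]


def window_features(history, now):
    """history: (timestamp, correct) pairs for one student and one skill, all before `now`.
    Returns [log(1 + successes), log(1 + attempts)] for each window, in order."""
    features = []
    for w in WINDOWS:
        recent = [correct for t, correct in history if now - t <= w]
        features.append(math.log1p(sum(recent)))  # successes in the window
        features.append(math.log1p(len(recent)))  # attempts, failures included
    return features


history = [(0, True), (100_000, False), (200_000, True)]
print(window_features(history, now=200_060))
```

These features would then enter a logistic model next to the user, item and skill biases.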
\centering
\small\raggedright \mycite{Benoît Choffin, Fabrice Popineau, Yolaine Bourda, and Jill-Jênn Vie (2019)}{DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills}{Best Paper Award at EDM 2019}
The term “information geometry” was coined by Shunichi Amari (RIKEN)
Fisher information defines a \alert{Riemannian metric} on probability distributions
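For a single Rasch item, the Fisher metric $ds^2 = p(1-p)\, d\theta^2$ can be integrated in closed form: since $dp/d\theta = p(1-p)$, we get $ds = dp/\sqrt{p(1-p)}$, and the substitution $p = \sin^2 \varphi$ gives the distance $2\,|\arcsin\sqrt{p} - \arcsin\sqrt{q}|$. The sketch below checks this numerically.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def fisher_rao_bernoulli(p, q):
    """Closed-form geodesic distance between Bernoulli(p) and Bernoulli(q) for the Fisher metric."""
    return 2 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))


def metric_length(theta_a, theta_b, d=0.0, steps=10_000):
    """Length of the path theta_a -> theta_b: integral of sqrt(F(theta)) = sqrt(p (1 - p)) (midpoint rule)."""
    h = (theta_b - theta_a) / steps
    total = 0.0
    for i in range(steps):
        p = sigmoid(theta_a + (i + 0.5) * h - d)
        total += math.sqrt(p * (1 - p)) * abs(h)
    return total


print(metric_length(-1.0, 2.0), fisher_rao_bernoulli(sigmoid(-1.0), sigmoid(2.0)))
```

The two extreme distributions $p = 0$ and $p = 1$ are at distance $\pi$, even though they are infinitely far apart in $\theta$.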
\vspace{1cm}
Questions?