Jill-Jênn Vie

Researcher at Inria

Real-world example: certifying digital skills

We want to assess your skills in some domain,
by asking you to complete some tasks.

\centering
\begin{tabular}{rlcccc}
\toprule
& & \multicolumn{4}{c}{Knowledge components} \\
& & \textbf{form} & \textbf{mail} & \textbf{copy} & \textbf{url} \\
\midrule
T1 & Send a mail & \textbf{form} & \textbf{mail} & & \\
T2 & Fill a form & \textbf{form} & & & \\
T3 & Share a link & & & \textbf{copy} & \textbf{url} \\
T4 & Type a URL & \textbf{form} & & & \textbf{url} \\
\bottomrule
\end{tabular}

We administer task 1. \correct{}
$\Rightarrow$ \textbf{form} \& \textbf{mail}: mastered. Task 2 would bring little information.


We administer task 4. \incorrect{}
$\Rightarrow$ \textbf{url} seems unmastered. Task 3 would bring little information.


You seem to have mastered form \& mail, but not url.

Discrete adaptive assessments

Trying to find a \alert{target} in $\{0, 1\}^K$, where $K$ is the number of skills.

Maximum entropy: uniform distribution

\alert{Minimizing} the expected entropy
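
As a sketch of what this criterion computes (the symbols $\pi$, $q$, $r$ and $s$ are notation assumed here, not taken from the slides): let $\pi$ be the current distribution over skill profiles $s \in \{0, 1\}^K$, and let $r \in \{0, 1\}$ be the outcome of asking question $q$. The next question is one that minimizes
\[\mathbb{E}_r\left[H(\pi \mid q, r)\right] = \sum_{r \in \{0, 1\}} p(r \mid q)\, H\big(\pi(\cdot \mid q, r)\big), \qquad H(\pi) = -\sum_{s} \pi(s) \log \pi(s).\]
Under the uniform distribution on $K$ skills, $H(\pi) = K \log 2$, which is the largest value the entropy can take.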

But the support has size $2^K$: what can we do when $K$ is large?

Structure on the assessed domain (prerequisite graph)


Digital competencies curriculum DIGCOMP 2.0

5 domains, 16 competencies, 800 skills, what should we do?

Certification of digital skills

Before: B2i.



Pix replaces B2i for high school students (JO September 1\textsuperscript{st} 2019).

Some companies use it to measure the impact of their training programs.

An example of Pix challenge

\centering \Large In the French village of Montrésor,
which street crosses Perrières street?

\normalsize $\rightarrow$ can get skill \@rechercheInfo3

Different tests, different objectives

Placement tests (self-assessment, low stake)

Assess your level adaptively
Know your strong and weak points
Recommend tutorials

Certification tests (high stake)

A few questions to certify a rough estimate
“This person is level 4 in safety.”

Progression tests

“What should I learn next?”
Optimizing human learning

Continuous adaptive assessments

Rasch (1960)

Rasch model

\[p(\textnormal{success}) = \Pr(R_{ij} = 1) = \sigma(\alert{\theta_i} - \alert{d_j}).\]
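
A worked instance, with values chosen purely for illustration and assuming $\sigma$ is the logistic function: a learner of ability $\theta_i = 1$ facing an item of difficulty $d_j = 0$ succeeds with probability
\[\sigma(1 - 0) = \frac{1}{1 + e^{-1}} \approx 0.73,\]
while an item whose difficulty equals the learner's ability ($d_j = \theta_i$) gives $\sigma(0) = 0.5$.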


Adaptive assessment

Combining discrete and Rasch

Ask the question that maximizes the expected number of validated/invalidated skills:


$\textnormal{Maximize } p(\textnormal{success})\, N_{\textnormal{validated}} + (1 - p(\textnormal{success}))\, N_{\textnormal{invalidated}}$
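
For instance, with hypothetical numbers: if a question is answered correctly with probability $0.7$, and a correct answer would validate 3 skills while an incorrect answer would invalidate 2, its score is
\[0.7 \times 3 + 0.3 \times 2 = 2.7.\]
A question answered correctly with probability $0.5$ that settles 4 skills either way scores $0.5 \times 4 + 0.5 \times 4 = 4$, so it would be asked first.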


Code is on GitHub (AGPLv3 license) in JavaScript

Are we really unidimensional?

In language learning, people from different countries have different difficulties.

Continuous multivariate:

Rasch (item response theory)

\[\Pr(R_{ij} = 1) = \sigma(\alert{\theta} - \alert{d}).\]

Multidimensional item response theory

\[\Pr(R_{ij} = 1) = \sigma(\langle \bm{a}, \alert{\bm{\theta}} \rangle + b).\]
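
A small numerical illustration, reading $\bm{a}$ and $b$ as the parameters of item $j$ and $\bm{\theta}$ as the ability vector of user $i$ (an assumption about the dropped indices), with made-up values in two dimensions: for $\bm{a} = (1, 0.5)$, $\bm{\theta} = (0.4, -0.2)$ and $b = -0.3$,
\[\langle \bm{a}, \bm{\theta} \rangle + b = 1 \times 0.4 + 0.5 \times (-0.2) - 0.3 = 0, \qquad \sigma(0) = 0.5.\]
In one dimension, taking $a = 1$ and $b = -d$ recovers the Rasch model.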

Example of multidimensional adaptive assessment

\[\Pr(R_{ij} = 1) = \sigma(\langle \bm{a}, \alert{\bm{\theta}} \rangle + b).\]

Black points are items; the red point is the user.

Interpreting the components


Prior and posterior, given that item $(1, 1)$ is answered correctly


What information?

We want to maximize the likelihood $\Rightarrow \max LL = \max \log p(X|\theta)$

Find the zeroes, or go in the direction of the gradient:

\[\nabla_\theta LL = \frac{\partial LL}{\partial \theta}\]
Property (fun fact): $\mathbb{E}_{p(X|\theta)} \nabla_\theta LL = 0$
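
A short justification of this property, assuming $X$ is discrete and that differentiation and summation can be exchanged:
\[\mathbb{E}_{p(X|\theta)} \nabla_\theta LL = \sum_x p(x|\theta)\, \frac{\nabla_\theta\, p(x|\theta)}{p(x|\theta)} = \nabla_\theta \sum_x p(x|\theta) = \nabla_\theta 1 = 0.\]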


If $Var_{p(X|\theta)} (\nabla_\theta LL)$ is low, the observation is \alert{useless}.
\[\mathcal{F}(\theta) = Var_{p(X|\theta)} (\nabla_\theta LL) = -\mathbb{E}_{p(X|\theta)} \frac{\partial^2 LL}{\partial \theta^2}\]


Another index for choosing a question:

\[KL(\theta) = \int_{B(\theta, c/\sqrt{n})} KL(\theta||\theta_0) = \int_{B(\theta, c/\sqrt{n})} \mathbb{E}_{p(X|\theta_0)} \log \frac{P(X|\theta_0)}{P(X|\theta)}\]

A toy example

Let’s take the Rasch model $p(X_j = 1|\theta) = \sigma(\theta - d_j) = p_j$

$\nabla_\theta LL = X_j - p_j$

$\mathcal{F}(\theta) = - \frac{\partial^2 LL}{\partial \theta^2} = p_j (1 - p_j)$
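
These two lines follow from the log-likelihood of a single binary response $X_j \in \{0, 1\}$. As a sketch, assuming $\sigma$ is the logistic function, so that $\partial p_j / \partial \theta = p_j (1 - p_j)$:
\[LL = X_j \log p_j + (1 - X_j) \log (1 - p_j)\]
\[\frac{\partial LL}{\partial \theta} = X_j (1 - p_j) - (1 - X_j)\, p_j = X_j - p_j, \qquad \frac{\partial^2 LL}{\partial \theta^2} = - p_j (1 - p_j).\]
The second derivative does not depend on $X_j$, so taking the expectation leaves it unchanged.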

Which means the item of maximum Fisher information is the one of probability \alert{closest to $1/2$}, given the current maximum likelihood estimate.
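
For illustration, with made-up difficulties: if the current estimate is $\theta = 0$ and three items have difficulties $d_1 = -1$, $d_2 = 0$ and $d_3 = 2$, then
\[p_1 \approx 0.73, \quad p_2 = 0.5, \quad p_3 \approx 0.12, \qquad p_1 (1 - p_1) \approx 0.197, \quad p_2 (1 - p_2) = 0.25, \quad p_3 (1 - p_3) \approx 0.105.\]
The second item, whose difficulty matches the estimate, carries the most Fisher information and would be asked next.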


Other rewards and policies have been considered:

Here comes a new challenger

How to model \alert{pairwise interactions} with \alert{side information}?

Logistic Regression

Learn a 1-dim \alert{bias} for each feature (each user, item, etc.)

Factorization Machines

Learn a 1-dim \alert{bias} and a $k$-dim \alert{embedding} for each feature

How to model pairwise interactions with side information?

Suppose you know that user $i$ attempted item $j$ on \alert{mobile} (not desktop).
How can this be modeled?

$y$: score of the event “user $i$ correctly solves item $j$”


\[y = \theta_i + e_j\]

Multidimensional IRT (similar to collaborative filtering)

\[y = \theta_i + e_j + \langle \bm{v_{\textnormal{user $i$}}}, \bm{v_{\textnormal{item $j$}}} \rangle\]


With side information

\small \vspace{-3mm} \(y = \theta_i + e_j + \alert{w_{\textnormal{mobile}}} + \langle \bm{v_{\textnormal{user $i$}}}, \bm{v_{\textnormal{item $j$}}} \rangle + \langle \bm{v_{\textnormal{user $i$}}}, \alert{\bm{v_{\textnormal{mobile}}}} \rangle + \langle \bm{v_{\textnormal{item $j$}}}, \alert{\bm{v_{\textnormal{mobile}}}} \rangle\)

Graphically: logistic regression


Graphically: factorization machines


Formally: factorization machines

Learn bias \alert{$w_k$} and embedding \alert{$\bm{v_k}$} for each feature $k$ such that: \(\logit p(\bm{x}) = \mu + \underbrace{\sum_{k = 1}^N \alert{w_k} x_k}_{\textnormal{logistic regression}} + \underbrace{\sum_{1 \leq k < l \leq N} x_k x_l \langle \alert{\bm{v_k}}, \alert{\bm{v_l}} \rangle}_{\textnormal{pairwise interactions}}\)

Multidimensional item response theory: $\logit p(\bm{x}) = \langle \bm{u_i}, \bm{v_j} \rangle + e_j$
is a particular case.
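
To see why, consider a sparse encoding of one observation (an assumption about how the features are laid out): $x_k = 1$ for the feature of user $i$ and for the feature of item $j$, and $x_k = 0$ for every other feature. Only two bias terms and one pairwise term survive in the factorization machine:
\[\logit p(\bm{x}) = \mu + w_{\textnormal{user $i$}} + w_{\textnormal{item $j$}} + \langle \bm{v_{\textnormal{user $i$}}}, \bm{v_{\textnormal{item $j$}}} \rangle.\]
Setting $\mu = 0$ and $w_{\textnormal{user $i$}} = 0$, and renaming the remaining bias $e_j$ and the two embeddings $\bm{u_i}$ and $\bm{v_j}$, gives the multidimensional item response theory form. Activating a third feature such as mobile adds its bias and its two pairwise terms with the user and the item.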

\normalsize Use temporal features

Learners evolve over time!

Simple assumptions:

Forgetting model



Thank you!

“Information geometry” was coined by Shunichi Amari (RIKEN)

Fisher information defines a \alert{Riemannian metric} on probability distributions

