% Variational factorization machines applied to anime/manga recommendations\newline and preference elicitation
% Jill-Jênn Vie¹ \and Hisashi Kashima¹²
% ¹ RIKEN Center for AIP, Tokyo\newline ² Kyoto University

---
handout: true
theme: Frankfurt
section-titles: false
header-includes:
  - \usepackage{bm}
  - \usepackage{multicol,booktabs}
  - \usepackage{algorithm,algpseudocode,algorithmicx}
  - \DeclareMathOperator\logit{logit}
  - \def\Lb{\mathcal{L}_b}
  - \def\ReLU{\textnormal{ReLU}}
---
\includegraphics[width=\linewidth]{figures/france7.jpg}
\begin{tabular}{ccccc}
& \includegraphics[height=2.5cm]{figures/1.jpg} & \includegraphics[height=2.5cm]{figures/2.jpg} & \includegraphics[height=2.5cm]{figures/3.jpg} & \includegraphics[height=2.5cm]{figures/4.jpg}\\
Satoshi & ? & 5 & 2 & ?\\
Kasumi & 4 & 1 & ? & 5\\
Takeshi & 3 & 3 & 1 & 4\\
Joy & 5 & ? & 2 & ?
\end{tabular}
\begin{tabular}{ccccc}
& \includegraphics[height=2.5cm]{figures/1.jpg} & \includegraphics[height=2.5cm]{figures/2.jpg} & \includegraphics[height=2.5cm]{figures/3.jpg} & \includegraphics[height=2.5cm]{figures/4.jpg}\\
Satoshi & \alert{3} & 5 & 2 & \alert{2}\\
Kasumi & 4 & 1 & \alert{4} & 5\\
Takeshi & 3 & 3 & 1 & 4\\
Joy & 5 & \alert{2} & 2 & \alert{5}
\end{tabular}
\begin{center}
\begin{tabular}{ccc} \toprule
Kasumi & \alert{like} & \emph{Zootopia}\\
Kasumi & \alert{favorite} & \emph{Porco Rosso}\\
Satoshi & \alert{favorite} & \emph{Tokikake}\\
Satoshi & \alert{dislike} & \emph{The Martian}\\ \bottomrule
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{ccc} \toprule
Kasumi & \alert{\only<1>{?}\only<2>{favorite}} & \emph{The Martian}\\
Satoshi & \alert{\only<1>{?}\only<2>{like}} & \emph{Zootopia}\\ \bottomrule
\end{tabular}
\end{center}
\includegraphics{figures/svd2black.jpg}
Approximate the $n \times m$ rating matrix $R$ by a low-rank product $UV^T$
\begin{block}{Fit} Learn $U$ and $V$ to \alert{minimize} $ {||R - UV^T||}_2^2 + \lambda \cdot \textnormal{regularization} $ \end{block}
\pause
\begin{block}{Predict: Will user Satoshi like item Naruto?} Just compute $\alert{U_{\textnormal{Satoshi}}} \cdot \alert{V_{\textnormal{Naruto}}}$ and you will know! \end{block}
\begin{algorithm}[H] Randomly initialize $U$ and $V$ \begin{algorithmic} \Repeat \State Fix users $U$, learn items $V$ to minimize error + regularization \State Fix items $V$, learn users $U$ \Until {convergence} \end{algorithmic} \caption{Alternating Least Squares with Weighted Regularization} \label{als-wr} \end{algorithm}
\fullcite{zhou2008large}
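Below is a minimal NumPy sketch of ALS-WR on a small dense rating matrix (NaN marking unrated entries); the toy setting and names are mine, and it assumes every user and item has at least one rating.

```python
import numpy as np

def als_wr(R, rank=2, lam=0.1, n_iters=20, seed=0):
    """ALS with weighted-lambda regularization; R is dense, NaN = unrated."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    mask = ~np.isnan(R)
    R0 = np.nan_to_num(R)
    for _ in range(n_iters):
        for i in range(n):       # fix V: one ridge regression per user
            Vi = V[mask[i]]      # embeddings of the items user i rated
            A = Vi.T @ Vi + lam * len(Vi) * np.eye(rank)
            U[i] = np.linalg.solve(A, Vi.T @ R0[i, mask[i]])
        for j in range(m):       # fix U: one ridge regression per item
            Uj = U[mask[:, j]]   # embeddings of the users who rated item j
            A = Uj.T @ Uj + lam * len(Uj) * np.eye(rank)
            V[j] = np.linalg.solve(A, Uj.T @ R0[mask[:, j], j])
    return U, V
```

Predicting is then a dot product, e.g. `U[0] @ V[3]` for the first user and the fourth item.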
\includegraphics{figures/embed38.pdf}
\pause
\alert{N. B.} – FMA does not mean Fullmetal Alchemist
FMA means Factorization Machine
If no ratings are available for a work $j$
$\Rightarrow$ its vector $V_j$ cannot be learned :-(
$\Rightarrow$ no way to distinguish between unrated works.
\includegraphics{figures/archifinal.pdf}
From the reviewers: \bigskip
\pause
Two models individually are existing methods, but this work presents a novel fusion method called \alert{Steins gate} to integrate results given by two models.
\fullcite{BALSE2017}
Observed: student $i$ got question $j$ correct or incorrect
Predict: will student $i$ get question $j$ correct?
\centering \includegraphics[width=0.49\linewidth]{figures/embedding1.png}
Learn abilities $\theta_i$ for each user $i$
Learn easiness $e_j$ for each item $j$ such that:
\(\begin{aligned}
Pr(\textnormal{User $i$ Item $j$ OK}) & = \sigma(\theta_i + e_j)\\
\logit Pr(\textnormal{User $i$ Item $j$ OK}) & = \theta_i + e_j
\end{aligned}\)
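A tiny numeric sketch of this rule (the abilities and easiness values are made-up toy numbers):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.array([0.8, -0.3])    # abilities theta_i (toy values)
e = np.array([0.5, -1.2, 0.1])   # easiness e_j (toy values)

p = sigmoid(theta[0] + e[2])     # Pr(user 0 answers item 2 correctly)
print(round(p, 3))               # 0.711
```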
\pause
Learn $\alert{\bm{w}}$ such that $\logit Pr(\bm{x}) = \langle \alert{\bm{w}}, \bm{x} \rangle$
Usually with L2 regularization: ${\|\bm{w}\|}_2^2$ penalty $\leftrightarrow$ Gaussian prior
\centering
\[Pr(\textnormal{User $i$ Item $j$ OK}) = \sigma(\langle \bm{w}, \bm{x} \rangle) = \sigma(\theta_i + e_j)\]

If you know user $i$ attempted item $j$ on \alert{mobile} (not desktop)
How to model it?
$y$: outcome of the event “user $i$ correctly solves item $j$”
\pause
\centering \input{tables/show-swf}
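For illustration, one way to build this sparse encoding and fit the logistic regression with scikit-learn; the dimensions and outcomes below are toy values, not the real data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

n_users, n_items, n_devices = 3, 4, 2
events = [(0, 1, 0), (0, 2, 1), (1, 1, 1), (2, 3, 0)]  # (user, item, device)
y = np.array([1, 0, 1, 1])                             # toy outcomes

# Each event activates three features: its user, its item, its device
rows, cols = [], []
for r, (i, j, d) in enumerate(events):
    for c in (i, n_users + j, n_users + n_items + d):
        rows.append(r)
        cols.append(c)
X = csr_matrix((np.ones(len(rows)), (rows, cols)),
               shape=(len(events), n_users + n_items + n_devices))

clf = LogisticRegression(C=1.0)  # L2 penalty <-> Gaussian prior on w
clf.fit(X, y)
```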
\centering
Learn bias \alert{$w_k$} and embedding \alert{$\bm{v_k}$} for each feature $k$ such that: \(\logit p(\bm{x}) = \mu + \underbrace{\sum_{k = 1}^N \alert{w_k} x_k}_{\textnormal{logistic regression}} + \underbrace{\sum_{1 \leq k < l \leq N} x_k x_l \langle \alert{\bm{v_k}}, \alert{\bm{v_l}} \rangle}_{\textnormal{pairwise interactions}}\)
\fullcite{rendle2012factorization}
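A NumPy sketch of this prediction rule, with $\bm{x}$ dense for readability (real inputs are sparse) and using Rendle's $O(Nd)$ rewriting of the pairwise term:

```python
import numpy as np

def fm_logit(x, mu, w, V):
    """logit p(x) = mu + <w, x> + sum_{k<l} x_k x_l <v_k, v_l>,
    where the pairwise sum equals (||V^T x||^2 - sum_k x_k^2 ||v_k||^2) / 2."""
    Vx = V.T @ x                  # shape (d,): sum_k x_k v_k
    pairwise = 0.5 * (Vx @ Vx - (x ** 2) @ (V ** 2).sum(axis=1))
    return mu + w @ x + pairwise
```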
Take a batch $(\bm{X}_B, y_B)$ and update the parameters so that the loss on this batch decreases.
\begin{algorithm}[H] \begin{algorithmic} \For {batch $\bm{X}_B, y_B$} \For {$k$ feature involved in this batch $\bm{X}_B$} \State Update $w_k, \bm{v}_k$ to decrease loss estimate $\mathcal{L}$ on $\bm{X}_B$ \EndFor \EndFor \end{algorithmic} \caption{SGD} \label{algo-vfm} \end{algorithm}
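A didactic NumPy version of this loop for classification, with hand-derived gradients of the logistic loss; regularization is omitted and the names are mine.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sgd_epoch(X, y, mu, w, V, lr=0.01, batch_size=2):
    """One pass over (X, y); mutates w and V in place, returns all params."""
    for start in range(0, len(y), batch_size):
        for x, t in zip(X[start:start + batch_size], y[start:start + batch_size]):
            Vx = V.T @ x
            logit = mu + w @ x + 0.5 * (Vx @ Vx - (x ** 2) @ (V ** 2).sum(1))
            g = sigmoid(logit) - t                # dLoss/dlogit for log loss
            mu -= lr * g                          # bias gradient: 1
            w -= lr * g * x                       # linear gradient: x_k
            V -= lr * g * (np.outer(x, Vx) - (x ** 2)[:, None] * V)
    return mu, w, V
```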
\begin{algorithm}[H] Prior on every $V$ \begin{algorithmic} \For {each iteration} \State Sample hyperparameters from posterior using MCMC \State Sample weights $w$ \State Sample vectors $V$ \State Sample predictions $y$ \EndFor \end{algorithmic} \caption{MCMC implementation of FMs} \label{mcmc-fm} \end{algorithm}
MCMC requires many samples to converge
\fullcite{rendle2012factorization}
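For intuition only, a generic random-walk Metropolis sampler over the FM parameters, reusing `fm_logit` from the sketch above; libFM actually uses blocked Gibbs sampling with conjugate updates, which mixes far better than this.

```python
import numpy as np

def log_posterior(params, X, y, prior_var=1.0):
    """Bernoulli log-likelihood plus isotropic Gaussian log-prior."""
    mu, w, V = params
    logits = np.array([fm_logit(x, mu, w, V) for x in X])
    log_lik = np.sum(y * logits - np.logaddexp(0, logits))
    return log_lik - (mu ** 2 + (w ** 2).sum() + (V ** 2).sum()) / (2 * prior_var)

def metropolis(X, y, params, n_samples=1000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    samples, lp = [], log_posterior(params, X, y)
    for _ in range(n_samples):
        prop = tuple(p + step * rng.normal(size=np.shape(p)) for p in params)
        lp_prop = log_posterior(prop, X, y)
        if np.log(rng.random()) < lp_prop - lp:   # accept or reject
            params, lp = prop, lp_prop
        samples.append(params)
    return samples  # predict by averaging sigmoid(fm_logit(...)) over samples
```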
Learn layers \alert{$W^{(\ell)}$} and \alert{$b^{(\ell)}$} such that: \(\begin{aligned}[c] \bm{a}^{0}(\bm{x}) & = (\alert{\bm{v_{\texttt{user}}}}, \alert{\bm{v_{\texttt{item}}}}, \alert{\bm{v_{\texttt{device}}}}, \ldots)\\ \bm{a}^{(\ell + 1)}(\bm{x}) & = \ReLU(\alert{W^{(\ell)}} \bm{a}^{(\ell)}(\bm{x}) + \alert{\bm{b}^{(\ell)}}) \quad \ell = 0, \ldots, L - 1\\ y_{DNN}(\bm{x}) & = \ReLU(\alert{W^{(L)}} \bm{a}^{(L)}(\bm{x}) + \alert{\bm{b}^{(L)}}) \end{aligned}\)
\[\logit p(\bm{x}) = y_{FM}(\bm{x}) + y_{DNN}(\bm{x})\]

\fullcite{guo2017deepfm}
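A dense NumPy sketch of this combination, reusing `fm_logit` from above; it assumes every event activates the same number of features (e.g. one user, one item, one device), so that $\bm{a}^{0}$ has a fixed size.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def deepfm_logit(x, mu, w, V, layers):
    """layers: list of (W_l, b_l) pairs; the last layer has output size 1."""
    a = V[x > 0].ravel()       # a^0: concatenated embeddings of active features
    for W_l, b_l in layers:    # ReLU layers, as in the equations above
        a = relu(W_l @ a + b_l)
    return fm_logit(x, mu, w, V) + a[0]   # y_FM(x) + y_DNN(x)
```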
\fullcite{Settles2018}
\centering \input{tables/duolingo}
Available at \url{http://sharedtask.duolingo.com}
\centering
\begin{tabular}{cccc} \toprule
Rank & Team & Algo & AUC\\ \midrule
1 & SanaLabs & RNN + GBDT & .857\\
2 & singsound & RNN & .854\\
2 & NYU & GBDT & .854\\
4 & CECL & LR + L1 (13M feat.) & .843\\
5 & TMU & RNN & .839\\ \midrule
7 (off) & JJV & Bayesian FM & .822\\
8 (off) & JJV & DeepFM & .814\\
10 & JJV & DeepFM & .809\\ \midrule
15 & Duolingo & LR & .771\\ \bottomrule
\end{tabular}
\raggedright \fullcite{Duolingo2018}
Approximate the true posterior with a simpler distribution (a Gaussian)
chosen to minimize the KL divergence
Increase the ELBO $\Rightarrow$ increase the objective (the ELBO is a lower bound on the log marginal likelihood)
\begin{algorithm}[H] \begin{algorithmic} \For {batch $\bm{X}_B, y_B$} \For {$k$ feature involved in this batch $\bm{X}_B$} \State Sample $w_k \sim q(w_k)$, $\bm{v}_k \sim q(\bm{v}_k)$ \EndFor \For {$k$ feature involved in this batch $\bm{X}_B$} \State Update each* parameter $\theta$ to increase an unbiased ELBO estimate $\Lb$ on this batch $\bm{X}_B$ \EndFor \EndFor \end{algorithmic} \caption{Variational Training (SGVB) of FMs} \label{algo-vfm} \end{algorithm}
* Here, $\theta \in \{\mu_k^w, \sigma_k^w, \bm{\mu}_k^v, \bm{\sigma}_k^v\}$
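A TensorFlow sketch of one such update, with the reparameterization trick and the closed-form Gaussian KL; names are illustrative and the FM bias is omitted, see `vfm.py` in the repository for the actual implementation.

```python
import tensorflow as tf

def elbo_step(X_b, y_b, mu_w, rho_w, mu_v, rho_v, optimizer, n_total):
    """One SGVB update; parameters are tf.Variable, X_b dense (B, N),
    y_b float labels in {0., 1.}, e.g. optimizer = tf.keras.optimizers.Adam()."""
    with tf.GradientTape() as tape:
        # Reparameterization: theta = mu + softplus(rho) * eps, eps ~ N(0, I)
        sig_w, sig_v = tf.nn.softplus(rho_w), tf.nn.softplus(rho_v)
        w = mu_w + sig_w * tf.random.normal(tf.shape(mu_w))
        V = mu_v + sig_v * tf.random.normal(tf.shape(mu_v))
        Vx = tf.matmul(X_b, V)
        logits = (tf.linalg.matvec(X_b, w)
                  + 0.5 * (tf.reduce_sum(Vx ** 2, axis=1)
                           - tf.reduce_sum(tf.matmul(X_b ** 2, V ** 2), axis=1)))
        nll = tf.reduce_sum(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y_b, logits=logits))
        # Closed-form KL(q || N(0, I)) for diagonal Gaussians
        kl = sum(tf.reduce_sum(0.5 * (sig ** 2 + mu ** 2 - 1) - tf.math.log(sig))
                 for mu, sig in [(mu_w, sig_w), (mu_v, sig_v)])
        frac = tf.cast(tf.shape(X_b)[0], tf.float32) / n_total
        loss = nll + frac * kl                    # negative mini-batch ELBO
    variables = [mu_w, rho_w, mu_v, rho_v]
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```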
\centering \input{tables/datasets-movies}
\scriptsize \begin{table}[h] \centering \input{tables/table-sep-4-forced} \caption{Results on all datasets for the regression task.} \label{results} \end{table}
\tiny \begin{table} \centering \input{tables/table-sep-5b} \caption{Results on the Fraction dataset for the classification task.} \label{results-class} \end{table}
\vspace{1cm}
VFM is implemented in TensorFlow.
Feel free to try it on GitHub (\texttt{vfm.py}):

\url{github.com/jilljenn/vae}
\vspace{1cm}
Please try Mangaki:
\url{mangaki.fr}