% Using Ratings & Posters\newline for Anime & Manga Recommendations
% Jill-Jênn Vie
% RIKEN Center for Advanced Intelligence Project (Tokyo)\newline Mangaki (Paris)

---
header-includes:
  - \usepackage{tikz}
  - \usepackage{array}
  - \usepackage{icomma}
  - \usepackage{multicol,booktabs}
  - \def\R{\mathcal{R}}
handout: true
---
- 2006: Preparatory classes (MP track) at Lycée Thiers
- 2008: Auditor at ENS de Lyon
- 2010: Normalien at ENS de Cachan
- 2012: Master Parisien de Recherche Informatique
- 2013: Failed a master's degree in mathematics
- 2014: Agrégation in mathematics
- 2016: PhD in computer science at Université Paris-Saclay
- 2017: Postdoc in Tokyo
\centering
\begin{tabular}{ccc} \toprule
\multicolumn{2}{c}{$X$} & $y$\\ \cmidrule{1-2}
\texttt{user\_id} & \texttt{work\_id} & \texttt{rating}\\ \midrule
24 & 823 & like\\
12 & 823 & dislike\\
12 & 25 & favorite\\
\ldots & \ldots & \ldots\\ \bottomrule
\end{tabular}
\pause
\centering
\begin{tabular}{ccc} \toprule
\multicolumn{2}{c}{$X$} & $\hat{y}$\\ \cmidrule{1-2}
\texttt{user\_id} & \texttt{work\_id} & \texttt{rating}\\ \midrule
24 & 25 & \only<2>{?}\only<3>{\alert{disliked}}\\
12 & 42 & \only<2>{?}\only<3>{\alert{liked}}\\ \bottomrule
\end{tabular}
If I predict $\hat{y}_i$ for each of the $n$ user-work pairs in the test set,
while the truth is $y^*_i$:
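
A common way to score such predictions is the root mean squared error (RMSE). It is assumed here as the metric, since the text does not name one:

\[\text{RMSE}(\hat{y}, y^*) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_i - y^*_i)^2}\]

Lower is better, and a value of 0 means every test rating was predicted exactly.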
\
\pause
research.mangaki.fr
- Content-based (features for movies: directors, genre, etc.)
- Collaborative filtering (solely based on ratings)
- Hybrid (combine those two)
If $R'$ is the $N \times M$ matrix whose rows are $\frac{\R_u}{\|\R_u\|}$, we can get the $N \times N$ score matrix by computing $R' R'^T$. Each entry is the cosine similarity between two users.
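
The score matrix can be computed in a few lines. This is a minimal NumPy sketch, assuming a small made-up rating matrix `R` in which 0 stands for a missing rating; the toy values and variable names are illustrative only.

```python
import numpy as np

# Hypothetical toy ratings: 3 users x 4 works, 0 = not rated
R = np.array([
    [1.0, 0.0, -1.0, 2.0],
    [2.0, 0.0, -2.0, 4.0],   # same direction as user 0
    [0.0, 3.0, 0.0, 0.0],    # shares no rated work with users 0 and 1
])

# Divide each row by its Euclidean norm to get R'
norms = np.linalg.norm(R, axis=1, keepdims=True)
R_prime = R / norms

# N x N score matrix: entry (u, v) is the cosine of the angle between users u and v
scores = R_prime @ R_prime.T
```

Users 0 and 1 get the highest possible score because their rows point in the same direction, while user 2 is orthogonal to both.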
\vspace{-7mm}
\[R = \left(\begin{array}{c} \R_1\\ \R_2\\ \vdots\\ \R_N \end{array}\right) = \raisebox{-1cm}{\begin{tikzpicture} \draw (0,0) rectangle (2.5,2); \end{tikzpicture}} = \raisebox{-1cm}{\begin{tikzpicture} \draw (0,0) rectangle ++(1,2); \draw node at (0.5,1) {$C$}; \draw (1.1,1) rectangle ++(2.5,1); \draw node at (2.35,1.5) {$P$}; \end{tikzpicture}}\]

\[\text{$R$: 2k users $\times$ 15k works} \iff \left\{\begin{array}{l} \text{$C$: 2k users $\times$ \alert{20 profiles}}\\ \text{$P$: \alert{20 profiles} $\times$ 15k works}\\ \end{array}\right.\]

$\R_\text{Bob}$ is a linear combination of profiles $P_1$, $P_2$, etc.
\pause
\begin{tabular}{@{}lccc@{}}
If $P$ & $P_1$: adventure & $P_2$: romance & $P_3$: plot twist\\
And $C_u$ & $0.2$ & $-0.5$ & $0.6$\\
\end{tabular}
$\Rightarrow$ $u$ \alert{somewhat likes} adventure, \alert{hates} romance, \alert{loves} plot twists.
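
To see what "linear combination of profiles" means numerically, here is a small sketch that reuses the three coefficients above. The profile matrix `P` is hypothetical: it describes two made-up works, one that is adventure with a plot twist and one that is romance with a plot twist.

```python
import numpy as np

# Coefficients of user u on (adventure, romance, plot twist), from the table above
C_u = np.array([0.2, -0.5, 0.6])

# Hypothetical profiles: 3 profiles x 2 works
P = np.array([
    [1.0, 0.0],   # adventure:  work 0 only
    [0.0, 1.0],   # romance:    work 1 only
    [1.0, 1.0],   # plot twist: both works
])

# Predicted ratings of u: one weighted sum of the profile rows per work
r_u = C_u @ P
```

Work 0 scores $0.2 + 0.6 = 0.8$ and work 1 scores $-0.5 + 0.6 = 0.1$, so the plot twist barely rescues the romance for this user.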
\vspace{2mm}
\pause
$R = (U \cdot \Sigma)V^T$ where $U : N \times r$ and $V : M \times r$ have orthonormal columns and $\Sigma : r \times r$ is diagonal, with singular values in decreasing order.
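
The decomposition can be checked with `numpy.linalg.svd`. The sketch below assumes a synthetic, fully observed matrix of rank 2 (real rating matrices have missing entries, which plain SVD does not handle), and the names `C` and `P` follow the $R = CP$ notation of these slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, r = 6, 5, 2

# Synthetic rank-r "rating" matrix with no missing entries (an assumption)
R = rng.normal(size=(N, r)) @ rng.normal(size=(r, M))

# Thin SVD: s holds the singular values in decreasing order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep the r leading components
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]

C = U_r * s_r      # U Sigma: scales column k of U by the k-th singular value
P = Vt_r           # V^T: one profile per row
R_hat = C @ P      # rank-r reconstruction of R
```

Because `R` has rank 2 by construction, `R_hat` matches `R` up to rounding; on real data the truncation would only give an approximation.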
\alert{Closer} points mean similar taste.

You will \alert{like} movies that are \alert{in your direction}.
$R$ ratings, $C$ coefficients, $P$ profiles ($F$ features).
$R = CP = CF^T \Rightarrow r_{ij} \simeq \hat{r}_{ij} \triangleq C_i \cdot F_j$.
SVD minimizes $\sum_{i, j}~(r_{ij} - C_i \cdot F_j)^2$ (deterministic)
\pause
ALS minimizes $\sum_{i, j \textnormal{\alert{ known}}}~(r_{ij} - C_i \cdot F_j)^2$
\pause
\alert<6>{ALS-WR} minimizes $\sum_{i, j \textnormal{\alert{ known}}}~(r_{ij} - C_i \cdot F_j)^2 + \lambda \left(\sum_i \alert<6>{N_i} \|C_i\|^2 + \sum_j \alert<6>{M_j} \|F_j\|^2\right)$
where $N_i$ is the number of items that user $i$ rated and $M_j$ is the number of users who rated item $j$.
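
With $F$ fixed, the ALS-WR objective is a ridge regression for each user, solved by $(F_i^T F_i + \lambda N_i I)\, C_i = F_i^T r_i$ over the items that user rated; the same holds for items with $C$ fixed. The sketch below alternates these two steps. It is an illustrative NumPy implementation under assumed toy data, not the code used in Mangaki, and the function names `als_wr` and `objective` are hypothetical.

```python
import numpy as np

def als_wr(R, known, rank=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares with weighted-lambda regularization (sketch)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    C = rng.normal(scale=0.1, size=(n_users, rank))
    F = rng.normal(scale=0.1, size=(n_items, rank))
    I = np.eye(rank)
    for _ in range(n_iters):
        # Fix F: one small ridge regression per user i
        for i in range(n_users):
            rated = known[i]              # boolean mask over items
            N_i = rated.sum()
            if N_i == 0:
                continue                  # no ratings: keep current C[i]
            F_i = F[rated]
            C[i] = np.linalg.solve(F_i.T @ F_i + lam * N_i * I,
                                   F_i.T @ R[i, rated])
        # Fix C: one small ridge regression per item j
        for j in range(n_items):
            raters = known[:, j]          # boolean mask over users
            M_j = raters.sum()
            if M_j == 0:
                continue                  # never rated: keep current F[j]
            C_j = C[raters]
            F[j] = np.linalg.solve(C_j.T @ C_j + lam * M_j * I,
                                   C_j.T @ R[raters, j])
    return C, F

def objective(R, known, C, F, lam):
    """Squared error on known entries plus the weighted penalty."""
    err = ((R - C @ F.T)[known] ** 2).sum()
    reg = lam * ((known.sum(axis=1) * (C ** 2).sum(axis=1)).sum()
                 + (known.sum(axis=0) * (F ** 2).sum(axis=1)).sum())
    return err + reg

# Hypothetical toy data: rank-2 ratings with about 70% of entries observed
rng = np.random.default_rng(1)
R_true = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))
known = rng.random(R_true.shape) < 0.7
C, F = als_wr(R_true, known, rank=2, lam=0.01)
```

Each update minimizes the objective exactly over one block of variables, so the objective never increases from one iteration to the next.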
\pause
WALS by TensorFlow™ minimizes \(\sum_{i, j} w_{ij} \cdot (r_{ij} - C_i \cdot F_j)^2 + \lambda \left(\sum_i \|C_i\|^2 + \sum_j \|F_j\|^2\right)\)
where $w_{ij}$ measures how much you can trust rating $r_{ij}$.
\pause
$R = CP$
\pause
$T$: matrix of 15,000 works $\times$ 502 tags ($t_{jk}$: whether tag $k$ appears in item $j$)
\pause
\pause
\noindent where $N_i$ is the number of items rated by user $i$.
We would like to do:
\[\hat{r}_{ij}^{\text{BALSE}} = \begin{cases} \hat{r}_{ij}^{\text{ALS}} & \text{if item $j$ was rated at least $\gamma$ times}\\ \hat{r}_{ij}^{\text{LASSO}} & \text{otherwise} \end{cases}\]

But we can't. Why? \pause \alert{Not differentiable!}
\[\hat{r}_{ij}^{\text{BALSE}} = \alert{\sigma(\beta(R_j - \gamma))} \hat{r}_{ij}^{\text{ALS}} + \left(1 - \alert{\sigma(\beta(R_j - \gamma))}\right) \hat{r}_{ij}^{\text{LASSO}}\]

\noindent where $R_j$ denotes how many times item $j$ was rated (written $M_j$ in the ALS-WR objective), $\sigma$ is the sigmoid function, and $\beta$ and $\gamma$ are learned by stochastic gradient descent.
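
The blend is easy to check numerically. The sketch below assumes the ALS and LASSO predictions are already available and uses fixed, hypothetical values for $\beta$ and $\gamma$ instead of learning them; `balse_blend` is an illustrative name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def balse_blend(r_als, r_lasso, n_ratings, beta, gamma):
    """Soft switch between two predictors, driven by how often each item was rated."""
    gate = sigmoid(beta * (n_ratings - gamma))   # in (0, 1), smooth in beta and gamma
    return gate * r_als + (1.0 - gate) * r_lasso

# Hypothetical predictions for three items rated 0, 3 and 100 times
r_als = np.array([1.5, 1.5, 1.5])
r_lasso = np.array([0.5, 0.5, 0.5])
n_ratings = np.array([0.0, 3.0, 100.0])

blended = balse_blend(r_als, r_lasso, n_ratings, beta=1.0, gamma=3.0)
```

The unrated item stays close to the LASSO prediction, the item rated exactly $\gamma$ times gets the average of both, and the popular item is essentially the ALS prediction. Since the gate is smooth, gradients with respect to $\beta$ and $\gamma$ exist everywhere.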
\pause
\centering
We call this gate the \alert{Steins;Gate}.
\pause
We call this model \alert{BALSE}.
\raggedright
\small Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario
\normalsize \alert{github.com/mangaki/balse} (PDF on arXiv, front page of Hacker News)