
Running example: the modern problem of predicting movie ratings.

  • $n_u$ = number of users
  • $n_m$ = number of movies
  • $r(i,j)=1$ if user $j$ has rated movie $i$
  • $y^{(i,j)}$ = rating given by user $j$ to movie $i$ (defined only if $r(i,j)=1$)
  • $\theta^{(j)}$ = parameter vector for user $j$
  • $x^{(i)}$ = feature vector for movie $i$
  • $m^{(j)}$ = number of movies rated by user $j$
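As a concrete sketch of this notation, the ratings can be held in two matrices (the names `Y` and `R` and the toy numbers are my own, not from the notes):

```python
import numpy as np

# Hypothetical toy data: n_m = 4 movies, n_u = 3 users.
# Y[i, j] = rating given by user j to movie i; 0 marks "not rated".
Y = np.array([[5, 4, 0],
              [0, 0, 5],
              [4, 0, 0],
              [0, 3, 4]])
# R[i, j] = 1 if user j has rated movie i, else 0.
R = (Y > 0).astype(int)

n_m, n_u = Y.shape      # number of movies, number of users
m = R.sum(axis=0)       # m^{(j)}: number of movies rated by each user
```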

Content Based Recommendations

To learn $\theta^{(j)}$ (the parameter vector for user $j$):

$$\min_{\theta^{(j)}}\frac{1}{2m^{(j)}}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2m^{(j)}}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$

To learn all parameters $\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}$: because $m^{(j)}$ is a constant for each user, dividing by it does not change the minimizer, so the $\frac{1}{m^{(j)}}$ factor is dropped from the optimization objective:

$$\min_{\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}}\frac12\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$
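This objective can be evaluated compactly with matrices. A minimal sketch (the names `cost_theta`, `X`, `Theta` and the row-wise layout are my own, not from the notes):

```python
import numpy as np

def cost_theta(Theta, X, Y, R, lam):
    """Regularized squared-error cost over all users' parameters.

    X:     (n_m, n+1) rows are x^{(i)}, with intercept x_0 = 1 in column 0
    Theta: (n_u, n+1) rows are theta^{(j)}
    Y:     (n_m, n_u) ratings; R: (n_m, n_u) indicator r(i, j)
    """
    err = (X @ Theta.T - Y) * R          # zero out unrated (i, j) pairs
    reg = (lam / 2) * np.sum(Theta[:, 1:] ** 2)   # theta_0 not regularized
    return 0.5 * np.sum(err ** 2) + reg
```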

Gradient descent update:

$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_k^{(i)}\quad(\text{for }k=0)$$

$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_k^{(i)}+\lambda\theta_k^{(j)}\right)\quad(\text{for }k\neq0)$$
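Both update rules can be done for all users at once; a vectorized sketch under the same layout assumptions as above (function name is hypothetical):

```python
import numpy as np

def theta_gradient_step(Theta, X, Y, R, alpha, lam):
    """One gradient-descent step on every theta^{(j)}.

    Column 0 of Theta corresponds to k = 0 (the intercept) and is
    left unregularized, matching the two update rules above.
    """
    err = (X @ Theta.T - Y) * R          # (n_m, n_u) prediction errors
    grad = err.T @ X + lam * Theta       # sum over rated movies + reg
    grad[:, 0] -= lam * Theta[:, 0]      # undo regularization for k = 0
    return Theta - alpha * grad
```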

Collaborative Filtering

Given $\theta^{(1)},\dots,\theta^{(n_u)}$, to learn $x^{(i)}$:

$$\min_{x^{(i)}}\frac12\sum_{j:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{k=1}^n\left(x_k^{(i)}\right)^2$$

Given $\theta^{(1)},\dots,\theta^{(n_u)}$, to learn $x^{(1)},\dots,x^{(n_m)}$:

$$\min_{x^{(1)},x^{(2)},\dots,x^{(n_m)}}\frac12\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n\left(x_k^{(i)}\right)^2$$
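In matrix form this feature-side objective is symmetric to the user-side one: the error term is identical, only the regularizer switches to the features. A sketch (function name `cost_x` is my own):

```python
import numpy as np

def cost_x(X, Theta, Y, R, lam):
    """Cost for learning all feature vectors x^{(i)}, given Theta.

    In collaborative filtering there is no x_0 = 1 intercept,
    so every component of X is regularized.
    """
    err = (X @ Theta.T - Y) * R
    return 0.5 * np.sum(err ** 2) + (lam / 2) * np.sum(X ** 2)
```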

Combination of both Content Based Recommendations and Collaborative Filtering

Given $x^{(1)},x^{(2)},\dots,x^{(n_m)}$, estimate $\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}$:

$$\min_{\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}}\frac12\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$

Given $\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}$, estimate $x^{(1)},x^{(2)},\dots,x^{(n_m)}$:

Note: in collaborative filtering the intercept convention $x_0=1$ is dropped and the sums run from $k=1$, because if a constant feature is genuinely needed, the algorithm can learn it on its own. Consequently $\theta^{(j)}\in\mathbb{R}^n$ and $x^{(i)}\in\mathbb{R}^n$.

$$\min_{x^{(1)},x^{(2)},\dots,x^{(n_m)}}\frac12\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n\left(x_k^{(i)}\right)^2$$

Minimizing over $x^{(1)},\dots,x^{(n_m)}$ and $\theta^{(1)},\dots,\theta^{(n_u)}$ simultaneously:

$$J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)})=\frac12\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n\left(x_k^{(i)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$
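The joint cost $J$ simply combines the shared error term with both regularizers; a sketch (function name `cofi_cost` is my own):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint collaborative-filtering cost J over features and parameters."""
    err = (X @ Theta.T - Y) * R          # only rated (i, j) pairs count
    return (0.5 * np.sum(err ** 2)
            + (lam / 2) * np.sum(X ** 2)
            + (lam / 2) * np.sum(Theta ** 2))
```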

Summary

  1. Initialize $x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}$ to small random values.
  2. Minimize $J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm). E.g. for every $j=1,\dots,n_u$ and $i=1,\dots,n_m$:

$$x_k^{(i)}:=x_k^{(i)}-\alpha\left(\sum_{j:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)\theta_k^{(j)}+\lambda x_k^{(i)}\right)$$

$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_k^{(i)}+\lambda\theta_k^{(j)}\right)$$

  3. For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^Tx$.
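The three steps can be sketched end to end on toy data (the data, the variable names, and the step size are my own; the updates are the vectorized form of the element-wise rules above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                    # number of latent features
Y = np.array([[5, 4, 0],
              [0, 0, 5],
              [4, 0, 0],
              [0, 3, 4]], dtype=float)
R = (Y > 0).astype(float)
n_m, n_u = Y.shape

# 1. Initialize features and parameters to small random values.
X = 0.1 * rng.standard_normal((n_m, n))
Theta = 0.1 * rng.standard_normal((n_u, n))

alpha, lam = 0.01, 0.1
# 2. Minimize J by simultaneous gradient descent on X and Theta.
for _ in range(5000):
    err = (X @ Theta.T - Y) * R
    X_grad = err @ Theta + lam * X
    Theta_grad = err.T @ X + lam * Theta
    X, Theta = X - alpha * X_grad, Theta - alpha * Theta_grad

# 3. Predict user j's rating of movie i as theta^{(j)T} x^{(i)}.
predictions = X @ Theta.T
```

After training, the unrated entries of `predictions` are the recommendations: scores for movies the user has not seen.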

For each product $i$, we learn a feature vector $x^{(i)}\in\mathbb{R}^n$.

How to find movies related to movie $i$?

The 5 most similar movies to movie $i$: find the 5 movies $j$ with the smallest $\lVert x^{(i)}-x^{(j)}\rVert$.
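This nearest-neighbor search is a one-liner over the learned feature matrix; a sketch (function name `most_similar` is my own):

```python
import numpy as np

def most_similar(X, i, k=5):
    """Indices of the k movies whose feature vectors are closest
    (in Euclidean distance) to movie i's feature vector."""
    d = np.linalg.norm(X - X[i], axis=1)   # ||x^{(i)} - x^{(j)}|| for all j
    d[i] = np.inf                          # exclude the movie itself
    return np.argsort(d)[:k]
```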