e.g. modern problem of predicting movie ratings.
- nu=No.users
- nm=No.movies
- r(i,j)=1 if user j has rated movie i
- y(i,j)= rating given by user j to movie i (defined on if r(i,j)=1)
- θ(j)= parameter vector for user j
- x(i)= feature vector for user i
- m(j)=No. of movies rated by user j
Content Based Recommendations
To learn θ(j) (parameter for user j):
θ(j)min2m(j)1i:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2m(j)λk=1∑n(θk(j))2
To learn θ(1),θ(2),...,θ(nu) :
Because m(j) is a constant, so we can just ignore it in the optimization algorithm.
(Optimization algorithm)
θ(1),θ(2),...,θ(nu)min21j=1∑nui:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λj=1∑nuk=1∑n(θk(j))2
Gradient descent update:
θk(j):=θk(j)−αi:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))xk(i)\mbox(fork=0)
θk(j):=θk(j)−α(i:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))xk(i)+λθk(j))\mbox(fork=0)
Given θ(0),...,θ(nu), to learn x(i) :
x(i)min21i:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λk=1∑n(xk(i))2
Given θ(0),...,θ(nu), to learn x(i),...,x(nm) :
x(1),x(2),...,x(nm)min21i=1∑nmi:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λj=1∑nmk=1∑n(xk(i))2
Combination of both Content Based Recommendations and Collaborative Filtering
Given x(1),x(2),...,x(nm), estimate θ(1),θ(2),...,θ(nu):
θ(1),θ(2),...,θ(nu)min21j=1∑nui:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λj=1∑nuk=1∑n(θk(j))2
Given θ(1),θ(2),...,θ(nu), estimatex(1),x(2),...,x(nm):
Notes: Here should start from θ(1) , because we want the algorithm to find the constant if indeed need! Besides, θ(i)∈Rn, x(i)∈Rn.
x(1),x(2),...,x(nm)min21i=1∑nmi:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λj=1∑nmk=1∑n(xk(i))2
Minimizing and simultaneously:
J(x(1),x(2),...,x(nm),θ(1),θ(2),...,θ(nu))=21(i,j):r(i,j)=1∑((θ(j))Tx(i)−y(i,j))2+2λj=1∑nmk=1∑n(xk(i))2+2λj=1∑nuk=1∑n(θk(j))2
- Initialize x(1),x(2),...,x(nm),θ(1),θ(2),...,θ(nu) to small random values.
- Minimize J(x(1),x(2),...,x(nm),θ(1),θ(2),...,θ(nu)) using gradient descent (or an advanced optimization algorithm). E.g. for every j=1,...,nu,i=1,...,nm:
xk(i):=xk(i)−α(j:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))θk(j)−λxk(i))
θk(j):=θk(j)−α(i:r(i,j)=1∑((θ(j))Tx(i)−y(i,j))xk(i)−λθk(j))
- For a user with parameters and a movie with (learned) features x, predict a star rating of θTx.
For each product i, we learn a feature vector x(i)∈Rn.
How to find movies related to movie?
5 most similar movies to movie :
Find the 5 movies j with the smallest ∣∣x(i)−x(j)∣∣.