Content-based recommendations with Poisson factorization
Joanna Misztal

Recommender systems
• Recommender Systems (RSs) are software tools and techniques that provide suggestions for items likely to be of use to a user, taking into account:
– the user's preferences
– constraints

Netflix Prize
• Predict user ratings for films, based on previous ratings, without any other information about the users or films
• Training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies
• US$1,000,000 grand prize, awarded in 2009 (the 2010 sequel was cancelled due to a lawsuit)

Recommender systems
• Collaborative filtering:
– recommends items that were liked by users with similar preferences
– based on the rating history
• Content-based:
– recommends items similar to those that the user liked in the past
– based on the items' attributes

Content-based recommender systems
• Build a model of the user's preferences based on the features of the items they rated
• User's interest in an object: match the user's profile against the item's attributes

Content-based recommender systems
• Advantages:
– user independence (from other users)
– transparency (recommendations can be explained)
– new-item recommendation
• Disadvantages:
– limited content analysis
– over-specialization (no unexpected results)
– new-user problem

Collaborative filtering – neighbourhood approach
• User-based approach:
– User Eric has to decide whether or not to rent the movie "Titanic", which he has not yet seen.
– He knows that Lucy has very similar tastes when it comes to movies, as both of them hated "The Matrix" and loved "Forrest Gump", so he asks her opinion on this movie.
– On the other hand, Eric finds out that he and Diane have different tastes (Diane likes action movies while he does not), so he discards her opinion or considers the opposite in his decision.

Collaborative filtering – neighbourhood approach
• Item-based approach:
– Instead of consulting his peers, Eric determines whether the movie "Titanic" is right for him by considering the movies that he has already seen.
– He notices that people who have rated this movie have given similar ratings to the movies "Forrest Gump" and "Wall-E".
– Since Eric liked these two movies, he concludes that he will also like "Titanic".

Matrix factorization

      D1  D2  D3  D4
U1     5   3   -   1
U2     4   -   -   1
U3     1   1   -   5
U4     1   -   -   4
U5     -   1   5   4

• Task: fill in the missing entries
• Assume the ratings are explained by a small number of latent features (fewer than the numbers of users and items)

Matrix factorization
• R of size |U| x |D| – the matrix of ratings
• We want to discover K latent features
• For K = 2: P (a |U| x K matrix) and Q (a |D| x K matrix), such that R ≈ P × Qᵀ
• Each row of P – the strength of a user's features
• Each row of Q – the strength of an item's features
• Non-negative MF:
– all elements of P and Q are > 0
– the factors have an intuitive meaning

Matrix factorization
• Gradient descent (a numpy sketch follows the SVD slide below):
– Initialize P and Q with some values
– Iteratively minimize the difference between P × Qᵀ and R
– Error at the current values: e_ud = r_ud − Σ_k p_uk q_dk
– Update rules (learning rate α, regularization β):
p_uk ← p_uk + α (2 e_ud q_dk − β p_uk)
q_dk ← q_dk + α (2 e_ud p_uk − β q_dk)
– Overall error: E = Σ_(u,d observed) e_ud² + (β/2) Σ_u,k p_uk² + (β/2) Σ_d,k q_dk²

Singular Value Decomposition
• Find lower-dimensional features that represent the concepts in the data
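A minimal numpy sketch of the gradient-descent factorization above, run on the 5 x 4 ratings matrix from the slides (the function name and the hyperparameter values alpha, beta and steps are illustrative choices, not from the slides):

import numpy as np

def factorize(R, K=2, steps=5000, alpha=0.002, beta=0.02):
    """Approximate R by P @ Q.T via gradient descent on the observed entries.

    R     : ratings matrix, with 0 marking a missing entry
    K     : number of latent features
    alpha : learning rate
    beta  : L2 regularization strength
    """
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    P = rng.random((n_users, K))  # user feature strengths
    Q = rng.random((n_items, K))  # item feature strengths
    for _ in range(steps):
        for u, d in zip(*R.nonzero()):  # iterate over observed ratings only
            e = R[u, d] - P[u] @ Q[d]   # error of the current prediction
            # simultaneous application of the slide's two update rules
            P[u], Q[d] = (P[u] + alpha * (2 * e * Q[d] - beta * P[u]),
                          Q[d] + alpha * (2 * e * P[u] - beta * Q[d]))
    return P, Q

# The 5 x 4 example from the slides; 0 stands for the '-' (unknown) entries.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
P, Q = factorize(R)
print(np.round(P @ Q.T, 2))  # the filled-in matrix

Note that plain gradient descent can drive entries of P and Q negative; the non-negative variant would additionally clip or re-parameterize the factors.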
Collaborative topic Poisson factorization (CTPF)
• Identification of latent article topics
• Readers' topic preferences
• How can documents on one topic be interesting to users of other topics?
• Massive, sparse, long-tailed data
• Solves the 'cold start' problem
• Organizes articles according to their topics

Collaborative topic Poisson factorization (CTPF)
[Figure: articles and users' preferences connected through shared latent topics]

A case study: EM paper impact
• "Maximum likelihood from incomplete data via the EM algorithm" (1977)
• Black bars – the topics that the EM paper is about
• Red bars – the preferences of the readers who have the EM paper in their libraries
• CTPF has uncovered the interdisciplinary impact of the EM paper

Algorithm basics
• Poisson factorization generative process:
– latent feature initialization
– observed feature initialization
– rating of unread documents
• Approximate posterior inference:
– finding the latent features, given the observations
– posterior approximation by variational inference
– coordinate ascent algorithm

CTPF – model
• Document model: W (word counts), a D (documents) x V (words) matrix, explained by latent attributes
• User preference model: R (user ratings), a U (users) x D (documents) matrix, explained by latent preferences
• r_ud = 1 if user u consulted document d, 0 otherwise

CTPF – Poisson factorization
• Word counts are Poisson-distributed: w_dv ~ Poisson, with a rate built from the document's latent attributes
• Ratings are Poisson-distributed: r_ud ~ Poisson, with a rate built from the user's latent preferences
• The latent variables get conjugate Gamma priors with shape/rate hyperparameters such as (a, b) and (c, d)

Poisson distribution
• Discrete probability distribution: P(X = k) = λ^k e^(−λ) / k!, for k = 0, 1, 2, ...; both the mean and the variance equal λ

CTPF – latent features
• θ (topic intensities) – a D (documents) x K (topics) matrix of the document model
• β (word intensities) – a V (words) x K (topics) matrix of the document model
• η (topic preferences) – a U (users) x K (topics) matrix of the user preference model
• ε (topic offsets) – a D (documents) x K (topics) matrix correcting a document's topics for its readership
[Figure: W (word counts) is generated from θ and β; R (user ratings) is generated from η, θ and ε]

CTPF – generative process
• Document model:
– word intensities: β_vk ~ Gamma(a, b)
– topic intensities: θ_dk ~ Gamma(c, d)
– word counts: w_dv ~ Poisson(θ_d · β_v)
• User preference model:
– topic preferences: η_uk ~ Gamma(e, f)
– topic offsets: ε_dk ~ Gamma(g, h)
– ratings: r_ud ~ Poisson(η_u · (θ_d + ε_d))

Recommending old and new documents
• In-matrix documents: rated by at least one user
• Out-matrix documents: new to the system
• Scoring a user's unread documents:
– no reader data (out-matrix) – the prediction depends on the topics only: E[r_ud] = η_u · θ_d
– both reader and article data (in-matrix) – also use the topic offsets: E[r_ud] = η_u · (θ_d + ε_d)

Approximate posterior inference
[Figure: graphical model relating the observations W and R to the latent variables θ, β, η and ε]
• Posterior approximation by variational inference
• Coordinate ascent algorithm – iterate over:
– the non-zero document-word counts
– the non-zero user-document ratings

Variational inference
• Approximate the posterior density with a (simpler) density with new parameters, q(z_1:m | ν)
• Find the parameters ν that minimize the KL divergence to the posterior
• Use q to predict future data

Auxiliary variables
• For each word count w_dv, add K latent variables (integers) z_dv,1, ..., z_dv,K with w_dv = Σ_k z_dv,k
• For each observed rating r_ud – K latent variables y_ud,1, ..., y_ud,K with r_ud = Σ_k y_ud,k
• Auxiliary variables are only needed for the non-zero counts and ratings

Variational family
• Defined over independent latent variables (mean-field): Gamma factors for θ, β, η and ε, and multinomial factors φ_dv and ξ_ud for the auxiliary variables z_dv and y_ud

Optimal coordinate updates
• Iteratively optimize each variational parameter, holding the others fixed
• Set the variational parameter equal to the expected natural parameter – an expectation of a function of the other random variables and the observations
• Example update, for the word auxiliary variables: φ_dv,k ∝ exp{ E[log θ_dk] + E[log β_vk] }
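A minimal numpy/scipy sketch of this example update, assuming Gamma variational factors whose shape and rate arrays are stored as theta_shp/theta_rte and beta_shp/beta_rte (hypothetical names). It uses the identity E[log x] = digamma(shape) − log(rate) for x ~ Gamma(shape, rate):

import numpy as np
from scipy.special import digamma

def expected_log_gamma(shape, rate):
    """E[log x] for x ~ Gamma(shape, rate): digamma(shape) - log(rate)."""
    return digamma(shape) - np.log(rate)

def phi_update(theta_shp, theta_rte, beta_shp, beta_rte, d, v):
    """Multinomial update for the auxiliary variables z_dv of one word count:
    phi_dv,k is proportional to exp{E[log theta_dk] + E[log beta_vk]},
    normalized over the K topics."""
    log_phi = (expected_log_gamma(theta_shp[d], theta_rte[d])
               + expected_log_gamma(beta_shp[v], beta_rte[v]))
    log_phi -= log_phi.max()  # shift by the max for numerical stability
    phi = np.exp(log_phi)
    return phi / phi.sum()

# Tiny example: D=3 documents, V=5 words, K=2 topics, random Gamma parameters.
rng = np.random.default_rng(0)
theta_shp, theta_rte = rng.random((3, 2)) + 1, rng.random((3, 2)) + 1
beta_shp, beta_rte = rng.random((5, 2)) + 1, rng.random((5, 2)) + 1
print(phi_update(theta_shp, theta_rte, beta_shp, beta_rte, d=0, v=2))

The same pattern gives the update for ξ_ud; the Gamma factors are then updated in closed form from these multinomial parameters, as in the algorithm below.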
Coordinate ascent algorithm
Initialize the topics β_1:K and the topic intensities θ_1:D (use LDA).
Repeat until convergence:
1. For each word count w_dv > 0, set φ_dv to the expected conditional parameter of z_dv.
2. For each rating r_ud > 0, set ξ_ud to the expected conditional parameter of y_ud.
3. For each document d and each k, update the block of variational topic intensities θ_dk to their expected conditional parameters. Perform similar block updates for β_vk, η_uk and ε_dk, in sequence.

Latent Dirichlet allocation
• Document-topic model:
– W – observable (the words)
– Z – the topic assignments of the words in the documents
– M – the number of documents
– N – the number of words in a document

Empirical results
• Predictive approach to evaluating model fit
• Comparing the predictive accuracy of CTPF to CTR
• Datasets:
– Mendeley dataset of scientific articles – a binary matrix of 80,000 users and 260,000 articles with 5 million observations
– arXiv – a matrix of 120,297 users and 825,707 articles, with 43 million observations
• Competing methods (topics and topic intensities initialized with LDA):
– CTPF
– Decoupled Poisson Factorization
– Content Only (CTPF)
– Ratings Only (Poisson factorization)
– CTR

Evaluation
• Test set: 20% of the ratings and 1% of the documents in each data set
• Validation set: 1% of the ratings (20% for arXiv)
• Testing: generate the top M recommendations for each user – the items with the highest predictive score under each method (a small scoring sketch follows the conclusions)

Comparison
[Figure: predictive performance of the competing methods on both datasets]

Top recommendations
[Figure: example top recommendations]

Conclusions
• Combines the text of the article with user behavior data
• Cold start: new articles are recommended based on their text
• Popular articles: recommended based on their readership
• Organizes the documents by their topics
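As referenced in the Evaluation slide, a minimal numpy sketch of producing the top-M recommendations from fitted CTPF parameters (all variable names are hypothetical; eta_u, theta and epsilon stand for posterior expectations taken from the fitted Gamma factors):

import numpy as np

def top_m(eta_u, theta, epsilon, seen, M=10, in_matrix=None):
    """Rank documents for one user by their expected rating.

    eta_u     : (K,) expected topic preferences of the user
    theta     : (D, K) expected topic intensities of all documents
    epsilon   : (D, K) expected topic offsets
    seen      : set of document ids the user has already consulted
    in_matrix : (D,) boolean mask; False = cold-start document, scored
                from its topics only (no offset is available yet)
    """
    D = theta.shape[0]
    if in_matrix is None:
        in_matrix = np.ones(D, dtype=bool)
    # E[r_ud] = eta_u . (theta_d + eps_d), dropping eps_d for new documents
    scores = (theta + np.where(in_matrix[:, None], epsilon, 0.0)) @ eta_u
    scores[list(seen)] = -np.inf  # never re-recommend consulted documents
    return np.argsort(-scores)[:M]

# Tiny example: 6 documents, 3 topics, the last two documents are brand new.
rng = np.random.default_rng(1)
theta, epsilon, eta_u = rng.random((6, 3)), rng.random((6, 3)), rng.random(3)
print(top_m(eta_u, theta, epsilon, seen={0, 2}, M=3,
            in_matrix=np.array([True, True, True, True, False, False])))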