Click-Through Rate Prediction: Top-5 Solution for the Avazu Contest
Dmitry Efimov
April 21, 2015

Outline
- Provided data
- Likelihood features
- FTRL-Proximal Batch algorithm
- Factorization Machines
- Final results

Provided data
Basic features come in six layers:
- Device layer: id, model, type
- Time layer: day, hour
- Site layer: id, domain, category
- Connection layer: ip, type
- Banner layer: position, C1, C14–C21
- Application layer: id, domain, category

Notations
- X: m × n design matrix, with m_train = 40,428,967, m_test = 4,577,464, n = 23
- y: binary target vector of size m
- x_j: column j of X; x_i: row i of X
- σ(z) = 1 / (1 + e^(−z)): the sigmoid function

Evaluation
Logarithmic loss for y_i ∈ {0, 1}:
    L = −(1/m) · Σ_{i=1..m} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ],
where ŷ_i is the prediction for sample i. Equivalently, for y_i ∈ {−1, 1}:
    L = (1/m) · Σ_{i=1..m} log(1 + e^(−y_i p_i)),
where p_i is the raw score from the model and ŷ_i = σ(p_i) for all i ∈ {1, …, m}.

Feature engineering
- Blocks: combinations of two or more features
- Counts: number of samples for each feature value
- Counts unique: number of distinct values of one feature for a fixed value of another feature
- Likelihoods: estimates θ_t = P(y_i | x_ij = t), the values minimizing L over θ_t
- Others

Feature engineering (algorithm 1)
[Figure: matrices X, Z, Z_t and A_t illustrating Steps 1–5 of the grouping procedure.]
For each group of rows I = {i_1, …, i_r} of Z that share identical values on the selected columns J:
    c_{i_1} = … = c_{i_r} = r                                  (group count)
    p_{i_1} = … = p_{i_r} = (y_{i_1} + … + y_{i_r}) / r        (group mean target)
    b_{i_1} = … = b_{i_r} = t                                  (group index)
    u_{i_1} = … = u_{i_r} = T(A_t)                             (number of distinct rows of A_t)
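A minimal sketch of these per-group statistics in plain Python (function and variable names are my own, not from the talk):

```python
from collections import defaultdict

def group_features(rows, y, J, K):
    """Compute algorithm-1 style group statistics. rows: list of sample
    tuples; y: binary targets; J, K: disjoint lists of column indices.
    Returns per-sample lists: c (group count), p (group mean target),
    b (group index t), u = T(A_t) (distinct K-rows inside the group)."""
    groups = defaultdict(list)              # J-key -> row indices i1..ir
    for i, row in enumerate(rows):
        groups[tuple(row[j] for j in J)].append(i)
    m = len(rows)
    c, p, b, u = [0] * m, [0.0] * m, [0] * m, [0] * m
    for t, idx in enumerate(groups.values()):
        r = len(idx)
        mean = sum(y[i] for i in idx) / r
        distinct = len({tuple(rows[i][k] for k in K) for i in idx})
        for i in idx:
            c[i], p[i], b[i], u[i] = r, mean, t, distinct
    return c, p, b, u
```

In a real run the rows would be the 40M-sample design matrix, so a hashed or chunked implementation would be needed; this only illustrates the definitions.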
In pseudocode:

    function SplitByRows(M):
        return the partition {M_t} of the rows of M such that each M_t consists of identical rows

    Require: J = {j_1, …, j_s} ⊂ {1, …, n}, K = {k_1, …, k_q} ⊂ {1, …, n} \ J
    Z ← (x_ij), i ∈ {1, …, m}, j ∈ J
    for I = {i_1, …, i_r} ∈ SplitByRows(Z) do
        c_{i_1} = … = c_{i_r} = r
        p_{i_1} = … = p_{i_r} = (y_{i_1} + … + y_{i_r}) / r
        b_{i_1} = … = b_{i_r} = t
        A_t = (x_ik), i ∈ I, k ∈ K
        T(A_t) = size(SplitByRows(A_t))
        u_{i_1} = … = u_{i_r} = T(A_t)

Feature engineering (algorithm 2)

    Require: parameter α > 0
    J ← (j_1, …, j_s) ⊂ {1, …, n}, an increasing sequence
    V ← (v_1, …, v_l) ⊂ {1, …, s}, v_1 < s
    f_i ← (y_1 + … + y_m) / m, ∀i ∈ {1, …, m}
    for v ∈ V do
        J_v = {j_1, …, j_v}, Z = (x_ij), i ∈ {1, …, m}, j ∈ J_v
        for I = {i_1, …, i_r} ∈ SplitByRows(Z) do
            c_{i_1} = … = c_{i_r} = r
            p_{i_1} = … = p_{i_r} = (y_{i_1} + … + y_{i_r}) / r
        w = σ(−c + α)                       (weight vector)
        f_i = (1 − w_i) · f_i + w_i · p_i, ∀i ∈ {1, …, m}

FTRL-Proximal model
Weight update:
    w_{i+1} = argmin_w [ Σ_{r=1..i} g_r · w + (1/2) Σ_{r=1..i} τ_r ||w − w_r||₂² + λ_1 ||w||₁ ]
            = argmin_w [ w · Σ_{r=1..i} (g_r − τ_r w_r) + (1/2) ||w||₂² Σ_{r=1..i} τ_r + λ_1 ||w||₁ + const ],
where
    Σ_{r=1..i} τ_rj = (β + √(Σ_{r=1..i} g_rj²)) / α + λ_2,   j ∈ {1, …, N};
λ_1, λ_2, α, β are parameters, τ_r = (τ_r1, …, τ_rN) are the per-coordinate learning rates, and g_r is the gradient vector at step r.

FTRL-Proximal Batch model

    Require: parameters α, β, λ_1, λ_2
    z_j ← 0 and n_j ← 0, ∀j ∈ {1, …, N}
    for i = 1 to m do
        receive sample vector x_i and let J = {j | x_ij ≠ 0}
        for j ∈ J do
            w_j = 0                                                    if |z_j| ≤ λ_1
            w_j = −(z_j − sign(z_j) λ_1) / ((β + √n_j)/α + λ_2)        otherwise
        predict ŷ_i = σ(x_i · w) using the w_j, and observe y_i
        if y_i ∈ {0, 1} then
            for j ∈ J do
                g_j = ŷ_i − y_i            (gradient direction of the loss w.r.t.
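A self-contained sketch of one such per-coordinate FTRL-Proximal pass in plain Python (hyperparameter values and names here are illustrative, not the talk's settings); unlabeled test rows are scored but skipped in the update, as in the pseudocode:

```python
import math

def ftrl_batch(samples, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, n_features=2**20):
    """One pass of a per-coordinate FTRL-Proximal update with lazy weights.
    samples yields (xi, yi): xi is a list of active (binary) feature indices,
    yi is 0/1 for labeled rows or None for unlabeled (test) rows."""
    z = [0.0] * n_features      # accumulated adjusted gradients
    n = [0.0] * n_features      # accumulated squared gradients
    preds = []
    for xi, yi in samples:
        # Lazily materialize weights for the active coordinates only.
        w = {}
        for j in xi:
            if abs(z[j]) <= l1:
                w[j] = 0.0      # L1 sparsity: small |z| gives an exact zero weight
            else:
                w[j] = -(z[j] - math.copysign(l1, z[j])) / ((beta + math.sqrt(n[j])) / alpha + l2)
        raw = max(min(sum(w.values()), 35.0), -35.0)   # clip for numeric safety
        p = 1.0 / (1.0 + math.exp(-raw))
        preds.append(p)
        if yi is not None:      # update only on labeled rows
            g = p - yi          # gradient of logloss w.r.t. the raw score
            for j in xi:
                sigma = (math.sqrt(n[j] + g * g) - math.sqrt(n[j])) / alpha
                z[j] += g - sigma * w[j]
                n[j] += g * g
    return preds, z, n
```

Because weights are reconstructed from (z, n) on demand, only the two accumulator vectors need to be kept in memory between rows.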
                w_j)
                τ_j = (√(n_j + g_j²) − √n_j) / α
                z_j ← z_j + g_j − τ_j · w_j
                n_j ← n_j + g_j²

Performance

    Description                                                          Leaderboard score
    dataset sorted by app id, site id, banner pos, count1, day, hour     0.3844277
    dataset sorted by app domain, site domain, count1, day, hour         0.3835289
    dataset sorted by person, day, hour                                  0.3844345
    dataset sorted by day, hour, with 1 iteration                        0.3871982
    dataset sorted by day, hour, with 2 iterations                       0.3880423

Factorization Machine (FM)
Second-order polynomial regression:
    ŷ = σ( Σ_{j=1..n−1} Σ_{k=j+1..n} w_jk · x_j x_k )
Low-rank approximation (FM):
    ŷ = σ( Σ_{j=1..n−1} Σ_{k=j+1..n} (v_j1, …, v_jH) · (v_k1, …, v_kH) · x_j x_k ),
where H is the number of latent factors.

Factorization Machine for a categorical dataset
Assign a set of latent factors to every level–feature pair:
    ŷ_i = σ( (2/n) · Σ_{j=1..n−1} Σ_{k=j+1..n} (w_{x_ij,k,1}, …, w_{x_ij,k,H}) · (w_{x_ik,j,1}, …, w_{x_ik,j,H}) )
Add a regularization term:
    L_reg = L + (λ/2) ||w||₂²
The gradient direction:
    g_{x_ij,k,h} = ∂L_reg / ∂w_{x_ij,k,h} = −(2/n) · y_i e^(−y_i p_i) / (1 + e^(−y_i p_i)) · w_{x_ik,j,h} + λ · w_{x_ij,k,h}
Learning-rate schedule:
    τ_{x_ij,k,h} ← τ_{x_ij,k,h} + g_{x_ij,k,h}²
Weight update:
    w_{x_ij,k,h} ← w_{x_ij,k,h} − α · g_{x_ij,k,h} / √τ_{x_ij,k,h}

Ensembling

    Model    Description                                                        Leaderboard score
    ftrlb1   dataset sorted by app id, site id, banner pos, count1, day, hour   0.3844277
    ftrlb2   dataset sorted by app domain, site domain, count1, day, hour       0.3835289
    ftrlb3   dataset sorted by person, day, hour                                0.3844345
    fm       factorization machine                                              0.3818004
    ens      fm^0.6 · ftrlb1^0.1 · ftrlb2^0.2 · ftrlb3^0.1                      0.3810447

Final results

    Place   Team                  Leaderboard score   Difference from 1st place
    1       4 Idiots              0.3791384           —
    2       Owen                  0.3803652           0.32%
    3       Random Walker         0.3806351           0.40%
    4       Julian de Wit         0.3810307           0.50%
    5       Dmitry Efimov         0.3810447           0.50%
    6       Marios and Abhishek   0.3828641           0.98%
    7       Jose A. Guerrero      0.3829448           1.00%

Future work
- apply the batching idea to the Factorization Machine algorithm
- find a better sorting for the FTRL-Proximal Batch algorithm
- find an algorithm that selects a good sorting without a cross-validation procedure

References
H. Brendan McMahan et al. "Ad click prediction: a view from the trenches." In KDD, Chicago, Illinois, USA, August 2013.
Wei-Sheng Chin et al. "A learning-rate schedule for stochastic gradient methods to matrix factorization." In PAKDD, 2015.
Michael Jahrer et al. "Ensemble of collaborative filtering and feature engineered models for click through rate prediction." In KDD Cup, 2012.
Steffen Rendle. "Social network and click-through prediction with factorization machines." In KDD Cup, 2012.

Thank you! Questions?
Dmitry Efimov
[email protected]
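For concreteness, the blended submission ens = fm^0.6 · ftrlb1^0.1 · ftrlb2^0.2 · ftrlb3^0.1 from the ensembling table is a weighted geometric mean of the models' prediction vectors; a minimal sketch (function name hypothetical):

```python
import math

def geometric_blend(preds, weights):
    """Weighted geometric mean of prediction vectors:
    ens_i = prod_k preds[k][i] ** weights[k]; weights should sum to 1.
    Computed in log space, with a floor to guard against log(0)."""
    out = []
    for row in zip(*preds):
        log_mix = sum(w * math.log(max(p, 1e-15)) for w, p in zip(weights, row))
        out.append(math.exp(log_mix))
    return out
```

Usage for the blend above would look like `ens = geometric_blend([fm, ftrlb1, ftrlb2, ftrlb3], [0.6, 0.1, 0.2, 0.1])`; the geometric mean tends to work well for logloss because it averages the models' raw scores rather than their probabilities.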