Alexander Spangher

The Background.
● Comments are currently hand-moderated, and are thus opened on only 15% of articles.
● Rudimentary ranking systems are employed.
● Personalization is starting to become a new field of experimentation.

The Task(s).
● A classifier system would allow the New York Times to process a higher volume of comments.
● A regressor/ranker could be used to filter better comments up further.
● A locality/projection model could introduce personalized ranking, which might bring even further engagement.

Current approach - “Otter”
[Diagram: User history, input text, Profanity analysis, . . . , Flesch reading scores → log classifier → output]

Alternative Approach 1 - MLP
[Diagram: Engineered set → 17 input → 50 hidden1 → 50 hidden2 → softmax output]

Alternative Approach 2 - Word Hashing and MLP
[Diagram: Word Hashing, Derived features → 17,000 input → 5,000 hidden1 → 5,000 hidden2 → output]

Comparison
Training set: 8,628
Validation: 1,702
Test: 1,702
(50 accept/50 reject split)

Alternative Approach 3 - Doc2Vec
[Diagram: Doc2Vec → 300 → 5000 → 5000]

Gensim’s Doc2Vec Implementation
Pros:
● Fast, simple to use and runs in Python.
Cons:
● Not online or scalable.
● Unclear whether Doc2Vec beats other methods of text comprehension.

Scaling Doc2Vec
● I implemented two custom methods to build the internal vocabulary tree:
○ add_labels()
○ delete_labels()
● Tested and will submit a pull request.

Evaluating Doc2Vec…?

Currently:
[Figure: Home page]
[Figure: Article page]

Currently using Collaborative Topic Modeling
CTM [ 0.02, 0.5, 0, 0, … , .01 ]
u [ 0.9, 0.01, 0.2, … , .05 ]

What does this mean?
[Figure: scatter of points marked x]

Recommendations
[Figure: scatter of points marked x, with Recommended Articles]

Doc2Vec Substitution
[Diagram labels: document label, Words, Feature Space, User Label]

Methodology 1
1. Derive article scores using Doc2Vec
2. Derive user scores by averaging the scores of the articles each user has read

Methodology 2
1. Derive article scores using Doc2Vec
2. Derive user scores using Doc2Vec

Methodology 2: Step 1

Methodology 2: Step 2
[Diagram label: User id]

Results

Conclusion, Future steps
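
Sketch: Methodology 1
The two steps of Methodology 1 can be illustrated with a minimal sketch. It assumes the article vectors have already been produced by Doc2Vec; the toy 3-dimensional values, the article ids, and the function names below are all hypothetical. Ranking unread articles by cosine similarity is also an assumption, since the slides do not say which similarity measure was used.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(article_vecs, read_ids, k=2):
    """Step 2 of Methodology 1: average the vectors of the read articles
    to get a user vector, then rank unread articles by similarity to it."""
    user_vec = mean_vector([article_vecs[i] for i in read_ids])
    unread = [i for i in article_vecs if i not in read_ids]
    ranked = sorted(
        unread,
        key=lambda i: cosine(user_vec, article_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical stand-ins for Doc2Vec article vectors (step 1 of Methodology 1).
ARTICLE_VECS = {
    "politics-1": [0.9, 0.1, 0.0],
    "politics-2": [0.8, 0.2, 0.1],
    "politics-3": [0.7, 0.3, 0.0],
    "sports-1": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # A user who has read two politics articles.
    print(recommend(ARTICLE_VECS, ["politics-1", "politics-2"], k=1))
```

In this toy setup the user vector sits close to the politics articles, so the remaining politics article ranks above the sports article. Methodology 2 differs in that the user vector is learned by Doc2Vec itself instead of being averaged afterwards.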