Alexander Spangher

The Background.
● Comments are currently hand-moderated,
and are thus opened on only 15% of articles.
● Rudimentary ranking systems are employed.
● Personalization is starting to become a new
field of experimentation.
The Task(s).
● A classifier system would allow the New York Times to
process a higher volume of comments.
● A regressor/ranker could be used to rank better
comments higher.
● A locality/projection model could introduce
personalized ranking, which might bring even further
engagement.
Current approach - “Otter”
● Inputs: user history, input text
● Features: profanity analysis, …, Flesch reading scores
● Model: log classifier → output
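One of the features named above is a Flesch reading score. As a rough illustration only, the sketch below computes the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The syllable counter is a crude vowel-group heuristic chosen for this sketch, and the function names are hypothetical; none of this is taken from the "Otter" system itself.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of consecutive-vowel groups (heuristic, an assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(len(groups), 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = flesch_reading_ease("The cat sat.")
dense = flesch_reading_ease(
    "Sophisticated moderation necessitates considerable deliberation."
)
```

A score like this would be one column of the feature vector fed to the classifier, next to the profanity and user-history features.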
Alternative Approach 1 - MLP
Engineered set → 17 input → 50 hidden1 → 50 hidden2 → softmax output
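A minimal sketch of the forward pass for a network of this shape (17 inputs, two hidden layers of 50 units, softmax output). Several details are assumptions made for illustration: the tanh activation, the two output classes (accept/reject), the random initial weights, and the placeholder feature vector. Training is omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j] holds the incoming weights of output unit j."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Hidden layers use tanh (an assumption); the last layer feeds the softmax."""
    h = x
    for layer in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, layer)]
    return softmax(dense(h, layers[-1]))


rng = random.Random(0)
sizes = [17, 50, 50, 2]  # 17 engineered features, 50 + 50 hidden units, 2 classes (assumed)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
features = [rng.random() for _ in range(17)]  # placeholder feature vector
probs = forward(features, layers)  # class probabilities under the random weights
```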
Alternative Approach 2 - Word Hashing and MLP
Word hashing + derived features → 17,000 input → 5,000 hidden1 → 5,000 hidden2 → output
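The slides do not say which word-hashing scheme was used. The sketch below assumes the letter-trigram variant: each word is wrapped in boundary markers, split into overlapping three-character pieces, and each piece is counted in a bucket of a fixed-width vector. The default width of 17,000 mirrors the input size on the slide; whether the derived features share that width is not stated, and the md5-based bucketing and function names are hypothetical choices for this example.

```python
import hashlib


def letter_trigrams(word):
    """Split '#word#' into overlapping 3-character pieces."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def bucket(token, dim):
    """Deterministic bucket index (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dim


def hash_text(text, dim=17000):
    """Count letter trigrams of every whitespace-separated word into a dim-wide vector."""
    vec = [0] * dim
    for word in text.split():
        for tri in letter_trigrams(word):
            vec[bucket(tri, dim)] += 1
    return vec


vector = hash_text("good comment")
```

Because the vector width is fixed in advance, unseen words still map to existing buckets, at the cost of occasional collisions.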
Comparison
Training set: 8,628
Validation: 1,702
Test: 1,702
(50% accept / 50% reject split)
Alternative Approach 3 - Doc2Vec
Doc2Vec → 300 → 5,000 → 5,000
Gensim’s Doc2Vec Implementation
Pros:
● Fast, simple to use and runs in Python.
Cons:
● Not online or scalable.
● Unclear whether Doc2Vec beats other
methods of text comprehension.
Scaling Doc2Vec
● I implemented two custom methods to build
the internal vocabulary tree:
○ add_labels()
○ delete_labels()
● Tested and will submit a pull request.
Evaluating Doc2Vec…?
Currently:
Home page:
Article page:
Currently using Collaborative Topic Modeling
CTM: [ 0.02, 0.5, 0, 0, … , .01 ]
u: [ 0.9, 0.01, 0.2, … , .05 ]
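In collaborative topic modeling, a user's predicted interest in an article is the inner product of the user vector and the article vector. The sketch below ranks articles that way. It assumes that the first vector on the slide is an article vector and that u is the user vector; the short vectors and article names are made up for illustration and are not the values from the slide.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rank_articles(user_vec, article_vecs):
    """Return article names sorted by descending predicted interest."""
    scored = [(dot(user_vec, vec), name) for name, vec in article_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical 3-topic vectors, for illustration only.
user = [0.9, 0.1, 0.0]
articles = {
    "article-a": [0.8, 0.1, 0.1],
    "article-b": [0.0, 0.2, 0.8],
    "article-c": [0.5, 0.5, 0.0],
}
ranking = rank_articles(user, articles)
```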
What does this mean?
[Figure: scatter plot of points (x)]
Recommendations
[Figure: scatter plot of points (x)]
Recommended Articles:
Doc2Vec Substitution
Document label + words → feature space ← user label
Methodology 1
1. Derive article scores using Doc2Vec
2. Derive user scores by averaging the scores of the
articles each user has read
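The two steps of Methodology 1 can be sketched as follows, assuming the article vectors have already been produced by Doc2Vec. The user vector is the element-wise mean of the vectors of the articles the user has read, and unread articles are ranked by similarity to it. Cosine similarity, the function names, and the 2-dimensional vectors are assumptions for this example; the slides do not name a similarity measure.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(read_ids, article_vecs, top_n=1):
    """Rank unread articles by similarity to the averaged user vector."""
    user_vec = mean_vector([article_vecs[i] for i in read_ids])
    candidates = [i for i in article_vecs if i not in read_ids]
    candidates.sort(key=lambda i: cosine(user_vec, article_vecs[i]), reverse=True)
    return candidates[:top_n]


# Hypothetical article vectors standing in for Doc2Vec output.
article_vecs = {
    "politics-1": [1.0, 0.0],
    "politics-2": [0.9, 0.1],
    "politics-3": [0.8, 0.2],
    "sports-1": [0.0, 1.0],
}
picks = recommend(["politics-1", "politics-2"], article_vecs)
```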
Methodology 2
1. Derive article scores using Doc2Vec
2. Derive user scores using Doc2Vec
Methodology 2: Step 1
Methodology 2: Step 2
User id
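The slides do not spell out how user ids enter Doc2Vec in Step 2. One possible reading, sketched here purely as an assumption, is that each article a user has read becomes a training document labeled with that user's id, so that a Doc2Vec-style trainer would learn one vector per user. `TaggedDoc` is a hypothetical stand-in for whatever labeled-document type the Doc2Vec implementation expects, and only the corpus preparation is shown, not the training.

```python
from collections import namedtuple

# Hypothetical stand-in for a labeled training document.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def user_corpus(articles, reads):
    """Build one labeled document per (user, read article) pair.

    articles: {article_id: [token, ...]}
    reads:    {user_id: [article_id, ...]}
    """
    docs = []
    for user_id, article_ids in reads.items():
        for article_id in article_ids:
            docs.append(TaggedDoc(words=articles[article_id], tags=[user_id]))
    return docs


# Hypothetical toy data, for illustration only.
articles = {"a1": ["senate", "vote"], "a2": ["playoff", "game"]}
reads = {"u1": ["a1", "a2"], "u2": ["a2"]}
docs = user_corpus(articles, reads)
```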
Results
Conclusion, Future steps