How to Protect Your Privacy Using Search Engines?
Albin Petit∗, Thomas Cerqueus, Sonia Ben Mokhtar†, Lionel Brunie
Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, F-69621, France
[email protected]

Harald Kosch
Universität Passau, Innstrasse 43, 94032 Passau, Germany
[email protected]
Abstract
Our goal is to design a solution that helps people protect their privacy while using a Web search engine. Indeed, through the millions of queries they send to search engines every day, users disclose a lot of information (e.g., interests, political and religious orientation, health status). These data, stored for each user in a profile, can be used by the search engine for other purposes (for instance, commercial ones) without any agreement from the user.
Contribution  Based on [2], we propose a new solution that allows a user to protect her privacy while using a search engine. Our solution combines a new, efficient unlinkability protocol with a new, accurate indistinguishability protocol: unlinkability is ensured by anonymizing the query, while indistinguishability is achieved by obfuscating it. Results demonstrate that our solution offers strong privacy protection (fewer queries are de-anonymized than with state-of-the-art solutions) while remaining efficient (low latency and high throughput) and accurate (few irrelevant results introduced).
Categories and Subject Descriptors K.4.1 [Computers and Society]: Public Policy Issues—Privacy
General Terms Security, Algorithms
Keywords Privacy, Web search, Unlinkability, Indistinguishability
Introduction
Querying search engines (e.g., Google, Bing, Yahoo!) is by far the most frequent activity performed by online users. As a direct consequence, search engines use all these queries to profile users. User profiles help improve the service (e.g., through recommendations or result ranking), but, as part of their business model, search engines also monetize them through targeted advertising. However, these user profiles may contain sensitive information (e.g., interests, political and religious orientation, health status) that could be used for other purposes without the users' agreement.
Solutions already exist to protect users' privacy in the context of Web search. They consist either in hiding the identity of the user (unlinkability) or in obfuscating the content of the query (indistinguishability). Solutions of the first kind usually hide the user's identity by querying search engines through anonymous networks, which prevents search engines from aggregating queries into a user profile. In solutions of the second kind, noise is introduced into the query to distort the user profile created by the search engine; as a consequence, this profile is not accurate (i.e., the user's interests cannot be retrieved from it). Nevertheless, these solutions are not fully satisfactory: [1] shows that a significant proportion of queries can be either de-anonymized (i.e., the requester's identity is retrieved) or cleaned (i.e., the deliberately introduced noise is eliminated). Besides, as shown on the poster, a straightforward combination of unlinkability and indistinguishability solutions still fails to protect users in a satisfactory manner.
∗ PhD student.
† Presenter.
Query anonymization  A query is composed of two types of information: the identity of the requester and the content of the query. Query anonymization consists in removing information about the identity of the requester (basically, her IP address). To do so, the user's query is sent through two servers: the first one, the receiver, knows the identity of the user, while the second one, the issuer, knows the content of the request. The user ciphers her request with the public key of the issuer and sends the ciphertext to the receiver, which forwards the message to the issuer. The issuer then deciphers the message (with its private key) and sends the user's query to the search engine. The issuer ciphers the results returned by the search engine and sends this ciphertext back to the receiver, which forwards it to the user. Finally, the user retrieves the results of her query by deciphering the message.
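To make the flow concrete, here is a minimal, runnable Python sketch of this exchange. The XOR cipher is a toy stand-in for the real public-key and symmetric encryption, and the names (receiver, issuer, user_search, k_Q) are illustrative; only the choreography matters: the receiver sees who queries but only ciphertext, while the issuer sees the query but not its sender. The per-query key k_Q anticipates the detailed flow shown on the poster.

import random

def xor(key, text):
    # toy cipher standing in for both the asymmetric and symmetric legs
    return "".join(chr(ord(c) ^ key) for c in text)

ISSUER_PRIVATE_KEY = 42  # the issuer's key pair, collapsed into one number

def issuer(blob, search_engine):
    plaintext = xor(ISSUER_PRIVATE_KEY, blob)   # decipher with the private key
    query, user_key = plaintext.rsplit("|", 1)  # recover the query and k_Q
    results = search_engine(query)              # plaintext query, no identity
    return xor(int(user_key), results)          # encrypt the results with k_Q

def receiver(user_id, blob, search_engine):
    # sees user_id and opaque blobs only; cannot read the query or the results
    return user_id, issuer(blob, search_engine)

def user_search(query, search_engine):
    k_q = random.randint(1, 255)                      # fresh per-query key
    blob = xor(ISSUER_PRIVATE_KEY, f"{query}|{k_q}")  # "RSA" to the issuer
    _, encrypted = receiver("alice", blob, search_engine)
    return xor(k_q, encrypted)                        # decipher with k_Q

print(user_search("flu symptoms", lambda q: "results for " + q))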
Query obfuscation  This step consists in misleading the search engine about the requester's identity, to protect queries against de-anonymization. This is done by combining the user's query with multiple fake queries separated by the logical OR operator. Nevertheless, this protection can easily be broken if the fake queries are not appropriately generated. To solve this issue, fake queries are generated based on previous real queries made by all users in the system; for privacy reasons, these previous queries are published by the issuer in an aggregated format.
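A sketch of this obfuscation step, where the issuer's aggregated history is simplified to a flat list of frequent past queries (in the actual system it is published as a co-occurrence matrix, as described on the poster):

import random

def obfuscate(query, aggregated_history, k=3):
    # fake queries are drawn from real past traffic, so they are plausible
    fakes = random.sample(aggregated_history, k)
    parts = fakes + [query]
    random.shuffle(parts)            # hide the position of the real query
    return " OR ".join("(" + p + ")" for p in parts)

history = ["cheap flights paris", "python tutorial",
           "weather lyon", "best pizza recipe"]
print(obfuscate("flu symptoms", history, k=2))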
References
[1] S. T. Peddinti and N. Saxena, "Web search query privacy: Evaluating query obfuscation and anonymizing networks," Journal of Computer Security, vol. 22, no. 1, pp. 155-199, 2014.
[2] A. Petit, S. Ben Mokhtar, L. Brunie, and H. Kosch, "Towards efficient and accurate privacy preserving Web search," in Proc. of the 9th Workshop on Middleware for Next Generation Internet, 2014.
How to Protect Your Privacy Using Search Engines?
PEAS: Private, Efficient and Accurate Web Search

Albin Petit, Thomas Cerqueus, Sonia Ben Mokhtar, and Lionel Brunie
Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, F-69621, France

Harald Kosch
Universität Passau, Innstrasse 43, 94032 Passau, Germany

PROBLEM & MOTIVATION
Querying search engines (e.g., Google, Bing, Yahoo!) is by far the most frequent
activity performed by online users for efficiently retrieving content from the
tremendously increasing amount of data populating the Web. As a direct
consequence, search engines are likely to gather, store and possibly leak sensitive
information about individual users (e.g., interests, political and religious
orientations, health status). In this context, developing solutions to enable private
Web search is becoming absolutely crucial.
PRIVATE WEB SEARCH UNDER ATTACK
Existing private Web search solutions fall into three families:

UNLINKABILITY: "Hides the identity of the requester" (e.g., by relaying queries through an onion-routing network).
GOOPIR: "Hides users' queries by aggregating k fake queries" with the initial one.
TRACKMENOT: "Periodically sends fake queries" on the user's behalf.

Attacks based on a similarity metric. An adversary who maintains a profile PU of each user's past queries can link a new query q back to its requester by computing sim(q, PU) for every user: ❶ take the normalized dot product between q and each query of PU, ❷ sort the resulting similarities, and ❸ aggregate them with exponential smoothing into the final score sim(q, PU). The query is attributed to the user whose profile maximizes this score.
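The following Python sketch shows how such an attack could be mounted, assuming bag-of-words query vectors; the exact formula of the metric evaluated in [1] may differ in its details.

import math

def cosine(u, v):
    # ❶ normalized dot product between two bag-of-words keyword vectors
    dot = sum(u[w] * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def sim(query_vec, profile, alpha=0.5):
    sims = sorted((cosine(query_vec, q) for q in profile), reverse=True)  # ❷
    score = 0.0
    for s in reversed(sims):                     # ❸ exponential smoothing,
        score = alpha * s + (1 - alpha) * score  # weighting the best match most
    return score

def deanonymize(query_vec, user_profiles):
    # attribute the query to the most similar user profile
    return max(user_profiles, key=lambda u: sim(query_vec, user_profiles[u]))

profiles = {"alice": [{"flu": 1, "symptoms": 1}, {"fever": 1}],
            "bob": [{"python": 1, "tutorial": 1}]}
print(deanonymize({"flu": 1, "fever": 1}, profiles))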
CONTRIBUTION

The first part of our solution is a privacy-preserving proxy that divides the user's identity and her query between two non-colluding servers: the Receiver and the Issuer. As such, the Receiver knows the identity of the requester without learning her interests, while the Issuer reads queries without knowing their provenance.

However, unlinkability solutions are not enough to protect users' privacy. To solve this issue, we combine the privacy proxy with an obfuscation mechanism that misleads the adversary about the requester's identity (by combining user queries with multiple fake queries using the OR operator). We propose to generate fake queries based on an aggregated history of past requests gathered from the users of our system (i.e., the Group Profile).

OVERVIEW OF THE CONTRIBUTION

Privacy Proxy

The User ciphers (RSA encryption), with the Issuer's public key, her query Q together with a new symmetric cryptographic key kQ that she generates. This ciphertext is sent to the Receiver (❶). The Receiver then forwards the message to the Issuer, along with a unique identifier xQ that it assigns to the query (❷). The Issuer uses its private key to retrieve the plaintext and submits the initial query Q to the Search Engine (❸). Upon receiving the results (❹), the Issuer encrypts (AES encryption), with the user's key kQ, the Search Engine's answer R, and returns the ciphertext along with the identifier xQ to the Receiver (❺). Finally, the Receiver retrieves the user's identity from the identifier and sends the encrypted results to the User (❻). The latter uses her key kQ to retrieve the results of her query Q.
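Below is a sketch of one such round trip with real primitives, using the Python cryptography package (RSA-OAEP for the request leg, AES-GCM for the response leg). The message formats (the query || kQ concatenation and the nonce || ciphertext layout) are assumptions made for this illustration, not the protocol's actual wire format.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
issuer_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# ❶ User: generate a fresh symmetric key k_Q and RSA-encrypt (query || k_Q)
#    under the issuer's public key.
k_q = AESGCM.generate_key(bit_length=128)                  # 16 bytes
blob = issuer_key.public_key().encrypt(b"flu symptoms" + k_q, OAEP)

# ❷ Receiver: assign an identifier x_Q to correlate request and response;
#    the user's identity never leaves this table.
pending = {}
x_q = os.urandom(8).hex()
pending[x_q] = "alice"

# ❸/❹ Issuer: decrypt the blob, split query and key, query the engine.
plain = issuer_key.decrypt(blob, OAEP)
query, k = plain[:-16], plain[-16:]
results = b"results for " + query          # stands for the search engine call

# ❺ Issuer: AES-encrypt the results under k_Q; x_Q travels alongside.
nonce = os.urandom(12)
response = nonce + AESGCM(k).encrypt(nonce, results, None)

# ❻ Receiver: map x_Q back to the user, who decrypts with her k_Q.
assert pending.pop(x_q) == "alice"
print(AESGCM(k_q).decrypt(response[:12], response[12:], None))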
Group Profile

The Issuer periodically publishes, in a privacy-preserving way, a Group Profile (❶). To do so, it aggregates users' queries in a co-occurrence matrix. The goal of the obfuscation step is to mislead the adversary with fake queries; consequently, users retrieve potential past queries from this matrix by first computing a co-occurrence graph (❷) and then extracting all of its maximal cliques (❸).

[Figure: a co-occurrence matrix and the co-occurrence graph extracted from it, with its maximal cliques highlighted.]
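A sketch of steps ❷ and ❸, assuming a toy co-occurrence matrix and using networkx to enumerate the maximal cliques of the induced graph (keywords and counts are illustrative):

import networkx as nx

# toy co-occurrence matrix published by the issuer: how often two keywords
# appeared together in past queries
cooc = {("flu", "symptoms"): 11, ("flu", "fever"): 4,
        ("symptoms", "fever"): 5, ("paris", "flights"): 10}

graph = nx.Graph()                          # ❷ build the co-occurrence graph
for (a, b), count in cooc.items():
    if count > 0:
        graph.add_edge(a, b, weight=count)

cliques = list(nx.find_cliques(graph))      # ❸ enumerate all maximal cliques
print(cliques)  # e.g. [['flu', 'symptoms', 'fever'], ['paris', 'flights']]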
Obfuscation

The user generates k fake queries whose corresponding graphs are either (i) a maximal clique, (ii) a non-maximal clique, or (iii) a non-complete graph, using keywords that have not been used by the user. These fake queries must have the same characteristics as the initial query (i.e., number of keywords and usage frequency). Finally, the obfuscated query is created by aggregating all queries, separated by a logical OR operator (Query OR FQ1 OR … OR FQk). The Issuer forwards the obfuscated query Q' to the Search Engine and filters the returned results R', so that the user only receives the results R of her initial query. A sketch of these two steps follows.
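This sketch assumes the maximal cliques computed above as the pool of fake queries; the clique sampling and the keyword-overlap filter are simplifications of the actual mechanisms (frequency matching, in particular, is omitted):

import random

def fake_query(cliques, n_keywords):
    candidates = [c for c in cliques if len(c) >= n_keywords]
    return " ".join(random.sample(random.choice(candidates), n_keywords))

def obfuscate(query, cliques, k=3):
    n = len(query.split())
    parts = [fake_query(cliques, n) for _ in range(k)] + [query]
    random.shuffle(parts)                  # hide the real query's position
    return " OR ".join("(" + p + ")" for p in parts)

def filter_results(results, query):
    # keep only the results R that concern the initial query, dropping fake hits
    keywords = set(query.split())
    return [r for r in results if keywords & set(r.split())]

q = "flu fever"
cliques = [["flu", "symptoms", "fever"], ["paris", "flights"]]
print(obfuscate(q, cliques, k=2))
print(filter_results(["flu vaccine info", "cheap paris flights"], q))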
RESULTS

Privacy: "Low percentage of de-anonymized queries."
Performance: "[Throughput] 3 times higher than onion routing."
Accuracy: "In 95% of cases, 80% of the expected results are returned."

CONTACTS

Albin PETIT: [email protected]
Sonia BEN MOKHTAR: [email protected]
Lionel BRUNIE: [email protected]
Harald KOSCH: [email protected]