How to secure Internet
An end-to-end approach
Mattias Wingstedt
Copyright © 2010 by Mattias Wingstedt
Illustrations Copyright © 2010 by Karin Holstensson
Some rights reserved.
This book is licensed under a Creative Commons Attribution license:
http://creativecommons.org/licenses/by/3.0/legalcode
Version 1.0
Design by Peter Lindgren
This first edition was finished and published December 2010,
and is available online at https://starttls.se/documents/
Please point to this URL every time you quote or mention this book.
This White Paper has been financed by .SE (Swedish foundation for
Internet Infrastructure).
.SE’s mission is to act for a positive development of Internet in
Sweden. This project has been financed by .SE’s Internet fund. The
Internet fund supports the development of Internet by financing
independent projects.
Contents
Introduction
TLS@DNS-SEC
DNS-SEC
Warning messages
Outsourcing secure services
STARTTLS for HTTP
Forward security
Three party handshake
SRV-records
Finding a user’s public key
Store both email address and public key
User key management
Key management application
Design of the system
Implementation
Initial upload of public master key
The settings file
International characters
Open access control
Group service
Group address
Groups that are members of other groups
Service broker
Using the service broker
URL to named services
Implementation of the service broker
The service records
Connecting to the service broker
Distributed database
NoSQL
Distributed database service
Distributed application
Sandbox
API
Getting there
Appendix A - Cryptography primer
Symmetric-key ciphers
Public-key cipher
Forward security with Diffie-Hellman
Appendix B - FAQ
Introduction
Internet is wonderful - the first truly global communication system with
no long-distance charges. It is especially good for open information.
Hyperlinks mean that it doesn’t matter where, or how, the information
resides. Search engines make it easy to find information. It is easy to
combine services provided by different vendors.
Standards make it easy to develop web applications, as well as
mobile apps that get their intelligence from the cloud. It is easy to
develop a first version, and then let the service scale as it attracts
more users. A service that proves to be valuable can be run in the
cloud and scale to hundreds of millions, or even billions, of users. It
has never been as easy to create and deliver software.
However the Internet is not as good at handling private
information, or information that should only be shared
with a limited number of people. To share such
information we must often use a service that is its
own self-contained world. We have to stick with
the search engine provided; it is not possible
to choose the best one. Each user must have an
account on the same service, so we acquire more
logins and passwords all the time. It is hard to
hyperlink between such services since we must
first ensure that everyone who sees the link also
has permission to access the linked page.
We are also often forced to share the
information with more people than we want. The
information may be intercepted by people who
use the same WiFi network as well as intelligence
agencies that monitor traffic that passes borders. We
must also trust the vendor that provides the service, and
the employees that work there.
This white paper looks at ways we can improve this situation. First we look at how to make it easier and cheaper to deploy
TLS-encryption. If all cloud-based services used TLS-encryption we
would not have to worry about people on the same WiFi network,
nor whether we are in a country with friendly intelligence agencies.
We would only have to trust the organization running the service,
no matter from which country in the world we use it.
Then we will look at how public key cryptography can be used
for user authentication, providing an alternative to passwords. This
is extended with an open access control system that lets us authenticate groups as well as individual users. Using this access control
system it becomes much easier to share information with a limited
set of users. Our friends do not have to create user accounts
on each cloud service – it is enough to belong to
the same access control group.
We can also use this authentication
system to grant access to programs as well
as users. That way we can let a search engine
index private information. The search engine
itself can then use the access control system
to ensure that the search results are only shown
to people that belong to the correct access control
group.
After looking at how to improve cloud-based
services we look at how to build applications where users
connect directly to each other, putting the smarts in the
end points rather than in the cloud. This is called
end-to-end security and is the gold standard for any
solution to secure military or commercial communication. We look at how to make it easy
to build end-to-end secure applications, and
for users to host such services on their own
computers.
To run a service for our friends we must
ensure that our computer stays on all the time,
or the service will be unavailable. This is in stark
contrast with a cloud-based service where professionals ensure that the service is available
24/7. To solve this problem we look at distributed NoSQL databases. Using such a database
for storage the service will be available
even when some computers are turned off.
It is enough that just one computer stays
online.
With a distributed database as a base
we can start building peer-to-peer applications
that are end-to-end secure. Instead of running the
application in the cloud we run it on the users’ computers and
smartphones.
Finally we look at how to make it easy to create such peer-to-peer
applications, using a JavaScript API. Such an API makes it possible
to run an application in different environments, the same way it is
possible to use any web browser to view a web page. We aim at running such an application in two ways:
1. As a normal web application, running on a web browser and a
web server
2. As an end-to-end secure peer-to-peer application, running in
a special browser made for such applications
Thus it is up to the user to decide how she wants to run the application. Some users want applications that are easy to install
and run on their web site or intranet. Other users want
something that is end-to-end secure. Using the
JavaScript API both categories of users can use the
same application.
I hope to show that it is possible to build
new Internet standards based on end-to-end
secure communications, without having to
sacrifice ease of use or rapid development
of applications. In so doing we will also
improve cloud-based services, making
it easier to share information no matter
where and how we decide to store it.
Much of this paper concerns cryptography,
since it is the basis of creating secure
communication. Appendix A explains the
cryptography terms used in the rest of the paper.
If you have further questions please don’t
hesitate to contact us at [email protected]
TLS@DNS-SEC
The first problem to solve to communicate securely is to ensure
that you really are connecting to the right person, or in the case
of Internet services, the right server. Let’s say we want to make a
secure connection to a website to fetch the URL https://starttls.se.
To connect to a web server we do two things:
1. Use DNS to find the IP address of starttls.se
2. Make a TCP connection to port 443 of the computer with that
IP address
The problem facing the designers of the TLS protocol, the secure
protocol used by HTTPS, is that neither of these two operations is
secure. An attacker can change the IP address returned by DNS.
Or she can change the routing so that our TCP connection actually
reaches another computer.
Therefore they created a method so that the web server can
identify itself, in our case proving that it is indeed starttls.se. The
method uses public-key cryptography. The web server gives us a
certificate. The certificate contains the server’s public key, a valid-to
date and the domain name. It is digitally signed with the private key of
a certificate authority.
To validate the certificate the web browser must:
1. Check that the certificate contains the correct domain name,
starttls.se in our case
2. Check the valid-to date of the certificate
3. Check the signature against a list of public keys from
certificate authorities it trusts
If the certificate is valid the web browser knows the server’s public
key. Now the server has to prove that it has the corresponding
private key. If it passes this last test the web browser knows it has
connected to the right server.
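The three validation checks can be sketched in Python as follows. This is a minimal illustration, not how any particular browser does it: the Certificate class, the function names and the signature_ok callback are all assumptions made for the example, and the real signature check is left to a cryptography library. The final step, where the server proves that it holds the private key, is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified certificate with the fields named in the text."""
    domain: str        # domain name the certificate was issued for
    valid_to: date     # last day the certificate is valid
    public_key: bytes  # the server's public key
    signature: bytes   # signature made by a certificate authority


# Assumed callback: returns True if `cert` was signed by the CA key given.
SignatureCheck = Callable[[bytes, Certificate], bool]


def validate_certificate(
    cert: Certificate,
    expected_domain: str,
    today: date,
    trusted_ca_keys: Iterable[bytes],
    signature_ok: SignatureCheck,
) -> bool:
    # 1. The certificate must name the domain we wanted to reach.
    if cert.domain != expected_domain:
        return False
    # 2. The valid-to date must not have passed.
    if today > cert.valid_to:
        return False
    # 3. The signature must match one of the trusted CA public keys.
    return any(signature_ok(ca_key, cert) for ca_key in trusted_ca_keys)
```

A caller would pass the domain name taken from the URL, not anything supplied by the server, as expected_domain.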
So we have a system to create secure connections, what is the
problem?
Well there are actually quite a few problems. I will describe
what I think is the worst problem.
Using this system it costs extra to communicate securely, since
you have to pay good money to a certificate authority to buy the
certificate.
The cost in itself is not the worst problem. The worst problem is that quite a few people wanted to communicate somewhat
securely without paying extra for the certificate. They deemed
DNS and routing to be secure enough, but wanted some protection
against eavesdropping. To cater to these people developers of web
browsers and other software made a certificate error into a warning
message that can be dismissed by a user.
Users are very good at dismissing warning messages.
Basically the system relies on users being educated enough to
know when it is safe to dismiss a security warning, and when the
warning is important. In practice this makes the system much less
secure, since few users are interested enough in the intricacies of
certificate warnings.
However there is a silver lining to this story. Obviously there
is a great demand for a lower cost, and simpler, system that can
enable secure connections.
DNS-SEC
Fortunately things have progressed since the TLS protocol was
designed. One of the key problems was that we could not trust in
the ­information provided by DNS. That is no longer the case. The
new DNS-SEC standard makes the DNS system secure. DNS-SEC itself
uses public key encryption to ensure that information in the DNS
system is correct. Using DNS-SEC we can trust the IP-address we get
from DNS.
However we still need to ensure that we actually connected to
the right server. An attacker can still change the routing, so we end
up connecting to another computer.
The good news is that we can use DNS-SEC for that part as well.
We just need to put information into DNS to identify the server’s
public key.
Connecting to a server would then look like this:
1. Get the IP-address and information about the public key for
starttls.se from DNS using DNS-SEC
2. Connect to port 443 on the IP-address using TCP
3. Use the information from DNS to verify the public key of the
server
Warning messages
It is imperative that we do not repeat the mistakes of the past. Just
using DNS-SEC to replace certificates with information in DNS is not
enough. We must make it into a connection error if we cannot
verify the server. The connection must fail with an error message.
When you configure a server to use TLS@DNS-SEC that should
mean that it is not allowed to connect to it using any non-secure
method.
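The verification step, and the rule that a failed check must end the connection, can be sketched as below. The paper does not say in what form the key information is stored in DNS; this example assumes, purely for illustration, that DNS publishes a SHA-256 fingerprint of the server's public key. The DNS-SEC lookup and the TLS connection themselves are left out, and all names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


class ServerVerificationError(ConnectionError):
    """Raised when the server cannot be verified; there is no way to dismiss it."""


def key_fingerprint(public_key: bytes) -> str:
    # Assumed DNS format: hex-encoded SHA-256 digest of the public key.
    return hashlib.sha256(public_key).hexdigest()


def verify_server_key(dns_fingerprint: Optional[str], presented_key: bytes) -> None:
    """Return silently on success, raise on any failure (never just warn)."""
    if dns_fingerprint is None:
        raise ServerVerificationError("no key information found in DNS")
    actual = key_fingerprint(presented_key)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(actual, dns_fingerprint.lower()):
        raise ServerVerificationError("server key does not match the DNS record")
```

Because the failure is an exception derived from ConnectionError, calling code has no warning to click through; it either has a verified connection or no connection.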
Outsourcing secure services
At the moment it is easy for one web server to handle any number
of web sites that use unsecure HTTP. It only needs one IP address to
do so. However to use HTTPS we need one IP address per certificate.
This makes it harder, and more expensive, to run secure web sites
for lots of customers.
Once we put the information used to verify a server in DNS
we will get rid of these limits. It suddenly becomes as easy to run a
HTTPS site as an HTTP site. It should become standard for web hosting providers
to offer HTTPS as well as HTTP.
STARTTLS for HTTP
One problem with the move to TLS@DNS-SEC is that old web browsers will still need a valid certificate, or they will show a warning
message. That might mean that we must wait to take full advantage
of TLS@DNS-SEC until all web browsers support the new standard.
But we can kick-start the process. The first secure implementation of most Internet protocols used a separate protocol, just like
HTTP and HTTPS. We got SMTPS for secure SMTP and IMAPS for
secure IMAP. Later they incorporated secure connections as an
extension to the old protocol. Now it is possible to run secure SMTP
and IMAP by using the command STARTTLS. We no longer require
two different protocols, nor do you need to run a secure service on a
separate port.
Unfortunately HTTP is not as suited for a STARTTLS command.
Therefore we are stuck with HTTP and HTTPS. However implementing STARTTLS for HTTP would be one way to kick-start TLS@DNS-SEC.
A web browser that supports TLS@DNS-SEC could check in DNS
for information whether the web server supports STARTTLS. If the
web server supports STARTTLS the web browser creates a secure connection to the web server, using
HTTP and STARTTLS rather than HTTPS.
Users who use an old web browser would still get an unsecure
HTTP connection. But they would not get any warning messages.
Thus using STARTTLS with HTTP would be the preferred way for
everyone who wants to provide better security but does not want to
buy a certificate, or get the extra IP address that is needed.
Users who want to take advantage of the improved security
of such a web site would just have to change to a web browser that
supports TLS@DNS-SEC and STARTTLS for HTTP.
Forward security
As I describe in the Cryptography primer in Appendix A the Diffie-Hellman
algorithm can be used to create a session key that is safe
even from someone who has access to our private key. Unfortunately
using Diffie-Hellman key exchange is not mandatory in TLS, though
it is available as an extension. For new services we must ensure that
this extension is mandatory, and that it is a connection failure to
connect without using a Diffie-Hellman key exchange.
Three party handshake
Using TLS we will always expose the public key of the server we connect to. This is usually not a big deal, since we are already disclosing
as much information simply by making the connection. If you
connect to the web server that handles https://starttls.se it doesn’t
really matter that you also get the public key.
Later in this paper we are going to use TLS to connect between
actual users. If we use a normal TLS handshake we will expose the
identity of the user we connect to. To avoid this we will need to support a three-party handshake, with these three parties:
1. The service
2. The user we are connecting to
3. The user that is connecting
At first we do a normal TLS handshake to the service, using its public
key. This public key will be exposed to anyone eavesdropping on the
conversation.
After we have done this TLS handshake we have an encrypted
connection. Within this connection we will do a second handshake,
this time using the public key of the user we are connecting to. In
this second handshake the user that is connecting will also provide
a client certificate, to identify herself.
Note that this handshake is not enough to make the connection
anonymous to anyone doing large-scale surveillance of the network.
For true protection against someone finding out who we are communicating with it is necessary to use an anonymous network like
TOR.
SRV-records
I will describe a few new services in this document. However I do
not want to create new domain names, like userlookup.starttls.se or
servicebroker.starttls.se, to run them. Instead I think just using the
short starttls.se would do just fine.
Fortunately there is already a solution to this, the SRV
DNS-record. The SRV record lets us address a service, rather than
a computer. Using it we don’t need to have domain names like
userlookup.starttls.se or servicebroker.starttls.se. Instead we can
use the userlookup or servicebroker service for starttls.se, and run
them on separate computers if we want to. Unfortunately SRV-records
have not seen widespread adoption.
One of the advantages of creating a new standard is that we can
make use of SRV-records mandatory. Since we have SRV records we
do not need to assign any new port number for the services. Instead
we should always find each service using an SRV-record.
I would even go further and put some configuration
information about the services in DNS. Most of the services I describe
use HTTP, so it would be really nice to put the configuration of the path
of the service in DNS. That would make it possible to run this new
service on an existing web service. For example, on one web server
the userlookup service might run at:
https://starttls.se/find-user.php?email=[user email]
While another stand-alone user metadata server could use
https://userdb.starttls.se/[user email]
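The SRV-record lookup discussed in this chapter can be sketched as follows. The query name follows the usual SRV naming pattern of _service._protocol.domain. The record selection is deliberately simplified: it picks the lowest priority and, within that, the highest weight, where a full implementation would make a weighted random choice. The URL template with the literal [user email] placeholder mirrors the two example URLs in the text; storing such a template in DNS is the paper's proposal, and the function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence
from urllib.parse import quote


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share among records with equal priority
    port: int
    target: str    # host name of the computer running the service


def srv_query_name(service: str, domain: str, proto: str = "tcp") -> str:
    """Build the DNS name to query, e.g. _userlookup._tcp.starttls.se."""
    return f"_{service}._{proto}.{domain}"


def pick_target(records: Sequence[SrvRecord]) -> SrvRecord:
    """Simplified choice: lowest priority, then highest weight."""
    if not records:
        raise LookupError("no SRV records found for the service")
    return min(records, key=lambda r: (r.priority, -r.weight))


def lookup_url(template: str, email: str) -> str:
    """Fill a path template (assumed to be published in DNS) with an address."""
    return template.replace("[user email]", quote(email, safe="@"))
```

With this, the same client code can reach a PHP script on an existing web server or a stand-alone server, since only the records in DNS differ.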
Finding a user’s public key
The last chapter described how TLS creates secure connections,
using public-key cryptography. We would like to use the same
cryptography to authenticate users. That way we can replace all
our passwords with just one public-private key pair. The problem is
finding a user’s public key. We need an electronic phone book with
public keys rather than phone numbers.
I suggest that we associate a public key with an email address.
The advantage is that we can keep using email addresses in the
user interface, while the public keys and associated information
is hidden from view. Just as we only need to remember the
domain name, not a public key, to reach a secure web site.
However if we turn the email address into a universal
password we get another problem — it is the owner of
the domain that has control over email addresses. She
can at any time reassign an email address. If we are
not careful the owner of the domain may be able to
impersonate anyone who uses the domain.
This is the right behavior for a company. If an
employee leaves a company the company should
still be able to use accounts that have been created
with her corporate email address.
But this is not right for private email. If I change network provider I might lose my email address. However that
does not mean that a new customer who gets my old email
address should be able to impersonate me or access accounts that
I created. The system administrator at my email provider should
definitely not be able to impersonate me.
Store both email address and public key
To solve the problem that a user can lose her email address we must
ensure that the user is always in control of her own private key. Then
it becomes impossible for the email provider to impersonate a user.
But it will still be possible for the email provider to change the
public key associated with an email address.
Therefore we must always store both the
public key and the email address the first time
we encounter a user. If the public key changes we
should treat the email address as belonging to a new
user, and not give access to old accounts.
As users we do not want to worry about the
public key. For us it is hard enough to remember email
addresses. So in the user interface of an application we
only use the email address. But the application itself must
store both the email address and the public key.
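A minimal sketch of this rule in Python: accounts are looked up by the pair of email address and public key, so a changed key automatically maps to a fresh account. The AccountStore class and its fields are made up for the example, and it assumes that the caller has already verified that the user holds the matching private key.

```python
class AccountStore:
    """Hypothetical account store keyed on (email address, public key)."""

    def __init__(self) -> None:
        self._accounts: dict = {}

    def account_for(self, email: str, public_key: bytes) -> dict:
        # The same email with a different key is treated as a new user,
        # so it never gets access to the old account.
        key = (email, public_key)
        if key not in self._accounts:
            self._accounts[key] = {"email": email, "data": []}
        return self._accounts[key]
```

The user interface would only ever show account["email"]; the public key stays inside the application.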
User key management
Having just one private-public key pair is not enough. As with
physical keys private keys can be lost, stolen or copied. Besides
many users use several computing devices, perhaps a stationary computer, a laptop and a smartphone. You might need to
login from every device, but having many copies of your private key
makes it less secure.
A private key can be protected with a password. But can we
expect users to appreciate typing in a secure password every time
they want to visit a website from their smartphone?
Therefore it would be nice if you could have several keys, and
disable the key on the smartphone if it is stolen.
This can be achieved with certificates. Unlike the certificates
used in TLS these certificates are issued by the user herself. The user
has a master public-private key pair. This master key pair is only
used to create certificates that verify other key pairs. The master
private key is rarely used and can therefore be stored in a safe way.
It need not even be stored on a computer, but could be stored on a
USB-memory or even on paper.
Each computer and device the user uses has its own public-private
key pair, and a certificate to prove that it is valid. If a device
is lost, or a private key is stolen, the master private key can be used
to create a revocation message. This message simply states that the
lost or stolen key is no longer valid.
Key management application
A system like this needs a user friendly key management application. The key management application creates the master public-private
key pair, as well as the certificates for other devices.
Each device creates its own public-private key pair. The public
key is then retrieved from the device and opened in the key management application. The key management application creates a
certificate, signing it with the private master key. Finally the certificate is stored on the device, thereby giving it authorization to login
as the user. For safety a copy of the certificate is kept by the key
management application.
If the user loses a device, or suspects that someone has been
able to get the private key from a device, the key management
application can revoke the certificate for that device. This can be
uploaded automatically to the public key catalog service.
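How a service might decide whether to accept a device certificate can be sketched as below. This is only an illustration under stated assumptions: the DeviceCertificate and KeyCatalog classes are hypothetical, the signed message layout (device name, a zero byte, device public key) is invented for the example, and the signature check itself is passed in as a function so that a real cryptography library can supply it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class DeviceCertificate:
    device_name: str
    device_public_key: bytes
    signature: bytes  # made with the user's private master key


@dataclass
class KeyCatalog:
    """One user's entry: the master public key plus revoked device keys."""
    master_public_key: bytes
    # Assumed callback: verify(public_key, message, signature) -> bool
    verify: Callable[[bytes, bytes, bytes], bool]
    revoked: set = field(default_factory=set)

    def revoke(self, cert: DeviceCertificate) -> None:
        # A revocation simply states that this device key is no longer valid.
        self.revoked.add(cert.device_public_key)

    def accepts(self, cert: DeviceCertificate) -> bool:
        if cert.device_public_key in self.revoked:
            return False
        # Hypothetical message layout covered by the master key's signature.
        message = cert.device_name.encode() + b"\0" + cert.device_public_key
        return self.verify(self.master_public_key, message, cert.signature)
```

Revoking the certificate of a stolen phone leaves the laptop's certificate, and the master key itself, untouched.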
Design of the system
The system should provide the following information, as well as an
API for updating it:
1. The user’s current public master key
2. The user’s current settings file
3. A revocation list of certificates that are no longer valid
4. A signed chain of old public master keys
Each file, except the public master key, needs to be signed to prove
that the information comes from the user.
The settings file has many uses. In this paper we will use it to
provide the address to the user’s service broker, described in a later
chapter. Note that neither the settings file nor any other file contains
personal information. The purpose is to provide more functionality
to an email address, not to reveal information about the person the
email address belongs to.
The revocation list is used to handle situations where you lose
a private key, perhaps because a mobile phone is stolen. The revocation list contains certificates that are no longer valid.
The fourth file is necessary to change the master key. Remember, a change of master key might look as if someone has hijacked
the email address and is trying to impersonate the user. To show
that a change is legitimate we need to sign the new public master
key with the old private master key. If a user changes master key
several times we get a chain of such signatures, proving that each
change was legitimate.
Implementation
The service is rather simple; all that is necessary is to provide a
directory with some files. I suggest that we use the HTTP protocol,
and the HTTP GET method to fetch the directory, and the files. The
directory is returned as an XML document with the same schema as
used by the PROPFIND method of WebDAV. Each file described above
has its own fixed file name.
We do of course use TLS@DNS-SEC to provide a secure connection to the service.
The user should be able to update the information. In order to
do so she must authenticate herself with a client certificate in the
TLS connection, using her private master key. Then she can change
any of the files using HTTP PUT, and delete files with HTTP DELETE.
The service must also ensure that the files the user uploads have been
correctly signed.
We find the service using SRV-records, as described in the
previous chapter.
Initial upload of public master key
To change any of the files a user needs to authenticate herself using
her private master key. But to do that the service must know the
public master key. How do we upload the public master key the first
time?
The answer is that we must use another system for the initial
upload of the public master key. The exact details will differ between
implementations.
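The signed chain of old public master keys, described under Design of the system, lets an application that stored an older key check that the current key is a legitimate successor. A minimal sketch, assuming a hypothetical KeyChange record per link and a caller-supplied verify function standing in for a real signature check:

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class KeyChange:
    new_public_key: bytes
    signature: bytes  # new_public_key signed with the previous private master key


def current_key_is_legitimate(
    stored_key: bytes,
    current_key: bytes,
    chain: Sequence[KeyChange],
    verify: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Walk the chain, oldest link first, starting from the key we stored."""
    trusted = stored_key
    for change in chain:
        # Only advance when the link is signed by the key we trust right now.
        # Links older than our stored key, or forged links, are skipped.
        if verify(trusted, change.new_public_key, change.signature):
            trusted = change.new_public_key
    return trusted == current_key
```

If the function returns False the application should treat the email address as belonging to a new user.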
The settings file
The settings file is an XML document that can be used to personalize
settings for this user. We will use it to store the address of the user’s
service broker. However the settings file is intended as a general
way to implement per user settings. It could for example be used to
provide a PGP key for sending encrypted emails. Email clients that
support reading the settings file could then automatically encrypt
messages using PGP to recipients who use PGP.
International characters
All files and protocols described here should use UTF-8 to encode
international characters. Furthermore all domain names and
email addresses should be encoded in readable form. The domain
name linköping.se should be encoded as linköping.se, not
xn--linkping-q4a.se.
At the moment international characters in domain names
are causing problems. It is unclear in which context you can
use a readable domain name, and in which context you have
to use the encoded form. As an example, sometimes the URL
http://linköping.se works. Other applications do not support it and
you have to use http://xn--linkping-q4a.se.
Therefore it is very important to specify that new services must
support the readable form of domain names. The domain name
should always be stored in this form. An application should encode
the domain name just before sending a DNS query. It should also
decode the domain name to human readable form just after receiving an answer from DNS. That way there will not be any leakage of
encoded domain names for users to worry about.
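In Python the conversion at the DNS boundary can be sketched with the built-in idna codec. The two function names are made up for the example; everything else in the application would keep the readable form.

```python
def to_dns_form(domain: str) -> str:
    """Encode a readable domain name just before sending a DNS query."""
    return domain.encode("idna").decode("ascii")


def to_readable_form(domain: str) -> str:
    """Decode a name from a DNS answer back to human readable form."""
    return domain.encode("ascii").decode("idna")
```

For example, to_dns_form("linköping.se") gives "xn--linkping-q4a.se", and to_readable_form reverses it.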
Open access control
Most cloud-based services have their own access control system.
That works very well within a service. But it makes it much harder to
get services that cooperate, since it is not always easy to integrate access
control systems. With an open access control system it would be
very easy to perform such integrations.
The basis for this access control system is that the user identifies herself using her private key. The service then checks whether
she has permission to connect. We do not rely on passwords; we
only need to know someone’s email address to grant access to her.
The simplest version of the open access control system is that
each user maintains a list of friends, with their email address and
public key. It is important that we do not forget to store the public
key, to be safe in case the email address is hijacked. If the public
key has changed we should not grant access until we verify that it
has changed for a legitimate reason. Perhaps your friend’s hard disk
crashed.
However for larger organizations this is not sufficient. For
example, it is easier to publish pictures to every member of the book
club if someone keeps an up-to-date membership list. We want to
handle groups of users, and make it easy to grant permission to
every member of a group.
One simple way to do this is to define a file-format for such a
group, and then make it easy to share and perhaps automatically
update such group membership files.
Group service
A more scalable solution is to create a service that handles groups.
Such a group needs its own public-private key pair. Therefore the
group service needs the same functionality as the user service,
described in the previous chapter.
The group service also needs the following new operations:
1. List the group membership
2. Create a membership certificate
3. List URLs that the group has access to
4. List groups this group is a member of
5. Receive updates of URLs that the group has access to
6. Receive updates of group memberships
Groups can have either an open or closed membership list. A group
with an open membership list lets members and services see who is
a member. A group with a closed membership list does not disclose
who is a member.
How you login with a group membership differs depending on
whether the group has an open or closed membership list. For a group
with an open membership list you just login as yourself. The service
can check your membership against the open list. For groups with
a closed membership list the service cannot find out that you are
a member. Instead you have to prove that you are a member, with
a signed membership certificate. One of the group services is to
provide such membership certificates.
In some cases you may want to login to a service anonymously,
without disclosing your email address. To handle those cases the
group service can issue an anonymous member certificate.
As you can see the user is responsible for choosing how to login.
If she does not login using the correct group membership certificate
she may be denied access. How will the user know how to login?
Each group must maintain a list of URLs that the group has
access to. This list is always available to members. Using this list the
user’s applications can determine how to login.
To maintain this list the group service can be informed when
the group has been granted access to a URL, or when the access has been
revoked.
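The difference between logging in via an open and a closed group can be sketched from the service's point of view. The Group class, the may_login function and the verify_certificate callback are assumptions made for this example; for a closed group the service never sees the membership list and relies only on the certificate.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Group:
    address: str
    open_membership: bool
    # Only visible to the service when the membership list is open.
    members: set = field(default_factory=set)


def may_login(
    group: Group,
    user_email: str,
    certificate: Optional[bytes],
    verify_certificate: Callable[[Group, bytes], bool],
) -> bool:
    if group.open_membership:
        # Open list: the user logs in as herself and the service checks the list.
        return user_email in group.members
    # Closed list: the user must present a signed membership certificate.
    return certificate is not None and verify_certificate(group, certificate)
```

An anonymous member certificate would fit the same closed-group branch, since that branch never consults the email address.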
Group address
To make it simple to use groups each group must have a human
readable address. I suggest that we use email addresses for group
addresses. Then a group will look like a mailing list. Though we do
not have to support sending email to group members.
Using an email address as the group address works well for
groups maintained by an organization. However it would be nice
for a user to be able to define groups herself. Then we need an
address that contains both an email address and a group name.
One way to encode such a group address would be group@user@domain.
Such user-owned groups must be implemented using the
service broker that is described in the next chapter.
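Parsing the two address forms could look like the sketch below. The GroupAddress type and the function name are hypothetical, and the group@user@domain form is only the encoding suggested above, not an established standard.

```python
from typing import NamedTuple, Optional


class GroupAddress(NamedTuple):
    group: str
    owner: Optional[str]  # email address of the owning user, if user-owned
    domain: str


def parse_group_address(address: str) -> GroupAddress:
    parts = address.split("@")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a valid group address: {address!r}")
    if len(parts) == 2:
        # group@domain: a group maintained by an organization.
        return GroupAddress(parts[0], None, parts[1])
    # group@user@domain: a group defined by an individual user.
    group, user, domain = parts
    return GroupAddress(group, f"{user}@{domain}", domain)
```

An application can use the owner field to decide whether to contact the organization's group service or the owning user's service broker.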
Groups that are members of other groups
Just having simple groups is not powerful enough
to model access control for larger
organizations. We need to have
groups that belong to other groups.
If these groups have closed membership
lists it will be a bit complicated to login:
Let’s say the user is a member
of the group “Book club”. The
group “Book club” is in turn a
member of the group “National
Club”. The group “National Club”
has permission to use a service.
To gain access to the service
the user must prove that she is a
member of the group “National
Club”. However she cannot get a
membership certificate directly,
instead she must first get a membership certificate for the group “Book club”.
Using this certificate she can connect to the group service for
the group “National Club”, and get a certificate that proves she is an
indirect member of the group “National Club”. Using this certificate
she can connect to the service.
For this to work it is of course necessary for the user to know
that the group “Book club” is a member of the group “National
Club”. Therefore each group must maintain a list of which groups it
is a member of. To maintain the list the group service needs to be
informed when the group has been added as a member of another
group, or when the membership has been revoked.
It is just as important to inform users about group memberships.
Therefore we also need a service that can be used to inform a user
about new or revoked group memberships. Such a service for users
should be implemented using the service broker described in the
next chapter.
Service broker
[Figure: Bob’s laptop and Clyde’s phone contact Alice’s service broker. One of them asks “Where is Alice’s photo album?” and gets the answer “It is here:”.]
We have a way to find a user’s public key, as well as a signed settings
file. Does that mean that we can create secure end-to-end services
between users?
Yes it does. Just put the address of a service in the settings file.
Then use public-key cryptography to ensure that we connect to the
right user.
One catch though — the information in the settings file is
public. Everyone can see which services you provide.
I would like to be able to provide a service to a few friends,
without having to tell anyone else that the service even exists.
My solution to this problem is to create a service broker service.
You connect and identify yourself to my service broker, and it tells
you about the services I provide, as well as how to connect to each
service. But if you are not on my friends list the service broker will
not give you any information at all.
Best of all, if we design the service broker well, it will
be very easy to create new services. That provides an
infrastructure for anyone who has a good idea and
wants to create a new application that lets users
communicate and share data.
[Figure: the service broker tells a visitor it does not recognize “I do not know you. Go away!”]
Using the service broker
The service broker should perform four functions:
1. Provide information about unnamed services
2. Provide information about named services
3. Act as a mailbox for services that are not online at the
moment
4. Act as a gateway to connect to services
Unnamed services are services that a user can only have one copy of,
or that have a natural default. Examples would be email or instant
messaging. To find such a service you only search for the service id.
A service id should be a domain name or a URL. That way it is easy
to invent new services without needing a central registry of service
ids. The only address that is needed to connect to such a service is
the user’s email address.
Named services are services that a user may run many copies
of. Examples are personal web sites, group chats, version control
repositories or remote file systems. To connect to a named service
we need a URL that contains both the user’s email address and
a path to the service.
Some, perhaps most, services will be running on computers
or devices that are not turned on all the time. When the service is
offline we can let the service broker act as an answering machine.
The service broker will forward any stored messages when the service
comes online.
The fourth function is to provide a gateway to connect to an
actual service. Some services will be independent of the service
broker: you connect to them by creating a new encrypted TLS
connection to an address provided by the service broker. Other
services may be running on laptops, mobile devices and home
computers that are behind firewalls and therefore impossible to connect
to directly. In such cases the service broker can tunnel a connection
to the service.
Revealing the IP address of a laptop or mobile device will
disclose information about your current location, something you
may not always want to do. Using the service broker as a gateway is a
way to hide the IP address from the person that is connecting to the
service.
Note that this does not protect against someone doing
surveillance of the network itself. To protect against that you need to
use an anonymous network like TOR.
URL to named services
The URL needs to contain the following information:
1. The email address of the user
2. Path to the service record
3. Extra information to send on to the service itself
As an example, let’s examine this URL:
sb://[email protected]/computers/loke/web:starttls/version/2.0/index.jsp?d=starttls.se
[email protected] is my email address, which leads you to my
service broker. /computers/loke/web is the path to the service
record. Before getting the service record we do not know anything
about the service, or how to interpret the rest of the URL.
This particular service record is for a web server, which we will
connect to using HTTPS. The rest of the URL is interpreted as a path
to fetch the object /starttls/version/2.0/index.jsp?d=starttls.se
from the web server.
For every service that can be used in a service record we need
to define exactly how the last part of the URL is to be interpreted.
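Splitting such a URL into its three parts can be sketched as follows. The sketch assumes, based only on the example above, that the first colon after the address separates the path to the service record from the extra information. The names ServiceUrl and parse_service_url, and the address alice@example.org, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceUrl:
    email: str        # leads to the user's service broker
    record_path: str  # path to the service record
    extra: str        # passed on to the service itself


def parse_service_url(url: str) -> ServiceUrl:
    scheme, separator, rest = url.partition("://")
    if scheme != "sb" or not separator:
        raise ValueError("not a service broker URL")
    email, _, tail = rest.partition("/")
    if "@" not in email:
        raise ValueError("the address part must be an email address")
    # Assumption: the first colon ends the path to the service record.
    record_path, _, extra = ("/" + tail).partition(":")
    return ServiceUrl(email, record_path, "/" + extra if extra else "")


parsed = parse_service_url(
    "sb://alice@example.org/computers/loke/web:"
    "starttls/version/2.0/index.jsp?d=starttls.se"
)
print(parsed.email)
print(parsed.record_path)
print(parsed.extra)
```

Only the first two parts can be interpreted before the service record has been fetched. The third part is kept as an opaque string until the application knows what kind of service it is talking to.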
A drawback of this system is that there is no way to know
which application you should use for a certain URL. The actual
application depends on the service record, so it is not enough to just look
at the URL. Therefore an application that implements service record
URLs needs to provide good error messages to guide the user in case
the URL points to a service record the application cannot handle.
Implementation of the service broker
The service broker uses the HTTP protocol. It has five functions:
1. List services
2. Fetch a service record
3. Connect to a service
4. Get an initial session key for connecting to a service
5. Send a message to a service
The service broker is not intended to run as a web application on
an existing web server. Rather it is intended to be implemented as
a stand-alone application that uses an embedded HTTP server. It is
important that it is easy to use the service broker, since it is intended
to be run by normal users, not system administrators. The reason to
use the HTTP protocol is to build on a well-known protocol rather than
invent a new protocol.
The service records
A service record is a dynamically generated XML document containing the following information:
1. The domain name or URL that defines the service
2. The current status of the service
3. Information about how to connect to the service. The
following are possible:
   1. A domain name
   2. An IP address, port and key fingerprint
   3. A TOR hidden service
   4. Connection through the service broker
The service record does not have to be signed. Rather we make
sure that it is valid by authenticating the connection to the service
broker.
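To make the three pieces of information concrete, here is a sketch of reading a service record. The paper does not define the XML format, so every element and attribute name below (service-record, service, status, connect, method) is an assumption made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are not defined by the paper.
EXAMPLE_RECORD = """
<service-record>
  <service>http://example.org/services/web</service>
  <status>online</status>
  <connect method="address" host="192.0.2.10" port="443"
           fingerprint="ab:cd:ef"/>
</service-record>
"""


def read_service_record(xml_text):
    """Extract service id, status and connection details from a record."""
    root = ET.fromstring(xml_text.strip())
    connect = root.find("connect")
    details = {k: v for k, v in connect.attrib.items() if k != "method"}
    return {
        "service": root.findtext("service"),
        "status": root.findtext("status"),
        "method": connect.get("method"),
        "details": details,
    }


record = read_service_record(EXAMPLE_RECORD)
print(record["service"], record["status"], record["method"])
```

The record carries no signature of its own. As described above, it is trusted because it was fetched over an authenticated connection to the service broker.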
Connecting to the service broker
We get the address of the service broker from the user’s settings file.
The address is either a URL, in which case we use TLS@DNS-SEC to
connect, or a TOR hidden service, for users who want
to be anonymous.
Usually both the service broker and the user that connects
identify themselves using their public keys. However we do not want
to reveal the identity of either user to an eavesdropper. Therefore we
need to use the three party TLS-handshake, as described in the
TLS@DNS-SEC chapter.
The assumption is that the service broker is used for services
which are only available to a limited audience. Therefore the user
that connects needs to identify herself. However it is possible to use
it for anonymous services as well. In that case the user that connects
does not have to identify herself, though existing Internet protocols,
such as HTTP, are probably sufficient to build anonymous services.
Distributed database
With the service broker we can build applications with secure
end-to-end communication between two users. But how do we build
larger systems?
Cloud-based services run on a large number of computers, hosted at various data centers. However even more
computers are used to run the web browsers that are
needed to access a cloud-based service. We have to figure
out how we can run a service on these computers, without
needing the servers.
The web has been hugely successful for developing
applications. Our goal is to make it as simple to develop
services running on the new infrastructure, as it is to
develop a web application today. So let’s look at the design
of a web service. It is built using three layers:
1. Web browser layer, which uses HTML and JavaScript to run
the user interface
2. Web application layer, which is programmed in a language
such as PHP, Python, Java or .NET, and runs on a web server
3. SQL database that stores the application data
The first two layers are simple to use with another architecture.
Instead of connecting to a remote web server we could just run a
local web server on our computer.
However the third layer is more complicated. A SQL database needs
to be centralized. That means that only one computer could be used
to run the SQL database. If the user who runs the SQL database loses
her network connection the service will be down.
That may be acceptable for a small service that is used by
a few friends. But the solution does not scale. Once the group
that uses the service grows to hundreds or thousands of
users one computer will not be enough to run the database.
One reason why we cannot run the SQL database
on several computers is that SQL databases are based on
transactional integrity. That basically means that every
change to the SQL database is done in a certain order.
Alice and Bob may attempt to change information in the
database at exactly the same time. However the database
will always decide that either Alice or Bob was first, and
perform the changes in order.
Let’s say we have a project management application and
Alice and Bob both decide to change the name of the project
“Internet stuff” at the same time. The database decides
to perform Alice’s change first. She changes the name to
“Important Internet stuff”. Then the database tries to
perform Bob’s change. Bob tried to change the name
from “Internet stuff” to “Internet things”. However
the database will detect that the name has already been
changed by Alice. Therefore Bob’s transaction will
fail, and he must review the new name before he can
attempt to change it.
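The rename example can be sketched as a compare-and-set check. This is a simplification of what a real SQL transaction does, and the names Project, rename and ConflictError are hypothetical: a change is only accepted if the name is still the one the user saw when she started editing.

```python
class ConflictError(Exception):
    """Raised when the data changed after the user last read it."""


class Project:
    def __init__(self, name):
        self.name = name

    def rename(self, seen_name, new_name):
        # Reject the change if someone else got there first.
        if self.name != seen_name:
            raise ConflictError(f"name is now {self.name!r}, please review")
        self.name = new_name


project = Project("Internet stuff")
project.rename("Internet stuff", "Important Internet stuff")  # Alice
try:
    project.rename("Internet stuff", "Internet things")       # Bob
except ConflictError as error:
    print("Bob's change failed:", error)
```

The conflict is detected before Bob's change is stored, which requires a single place that decides the order of all changes.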
NoSQL
Fortunately SQL databases are not the only way to
store data. Cloud-based services that scale to hundreds of millions of users have found SQL databases
to be too limited. It is simply too hard to build a SQL
database that is large enough to host all data for such a
service.
Therefore a number of NoSQL databases have been created
that solve the problem of running databases on many computers.
NoSQL databases are scalable, meaning that it is always
possible to store more data, or handle more users, by running
the database on more computers.
One of the NoSQL databases, CouchDB, is very
interesting for our purpose. CouchDB does not have centralized
transactions, as a SQL database does. That makes
it much easier to run the database on several computers.
However it also means that the data may not always be in
sync.
Let’s say Alice and Bob are using a project management
application built using CouchDB. Alice is in California while
Bob is in Sweden. Therefore Alice is connected to the US database
server while Bob is connected to the European database
server. Now when they try to change the name of the project
both Alice and Bob succeed.
For a little while the project will be known as “Important
Internet stuff” in the US and “Internet things” in Europe.
However, since the database servers are in contact with each other they
will soon discover that they both contain changes to the
same name. Then the databases will choose to keep only one
of the changes. Let’s say Alice wins again, and the project will be
named “Important Internet stuff” in Europe as well as in the US.
The database still remembers the conflict, and how it was
resolved. It also remembers that the project might have been
named “Internet things”, if it had decided to keep Bob’s
change. Bob is notified that his change was retroactively
overruled. If he still insists his name was the best one he
can change it again.
The difference between transactions in SQL and
CouchDB is that in SQL all conflicts are resolved before
the change is performed. In CouchDB the conflict is
resolved after the change is performed.
The SQL approach is simpler to use, since the database never needs
to be changed retroactively. However
this simplicity comes at the cost of not being able to scale.
Since we need a distributed database we have to accept the
more complex retroactive conflict resolution. We must be
aware that data can change retroactively, and build applications
that notify users when this happens.
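Resolving a conflict after the fact can be sketched like this. It is a simplified model and not CouchDB's actual algorithm: the Document class and the ordering rule in merge are assumptions. The important property is that every server applies the same deterministic rule, so all of them pick the same winner without talking to a central coordinator, and the losing change is remembered.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    value: str
    author: str
    conflicts: list = field(default_factory=list)  # overruled changes


def merge(a, b):
    """Pick a winner the same way on every server and keep the loser."""
    # Assumed rule for illustration: lowest (value, author) pair wins.
    winner, loser = sorted([a, b], key=lambda d: (d.value, d.author))
    return Document(winner.value, winner.author,
                    conflicts=[(loser.author, loser.value)])


us = Document("Important Internet stuff", "Alice")
europe = Document("Internet things", "Bob")

merged = merge(us, europe)
print(merged.value)
for author, value in merged.conflicts:
    print(f"notify {author}: your change {value!r} was overruled")
```

The list of conflicts is what lets the application notify Bob that his change was overruled.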
Distributed database service
Using a NoSQL database such as CouchDB we can create the
foundation for running large applications on top of users’ computers.
Just as the BitTorrent protocol can be used to share large
files between users, a NoSQL database can be used to share
structured data suitable for more complex applications than
file sharing.
Distributed application
Now we have a database that can be used to build a distributed
application. Is that enough?
It could be enough. We can certainly build an app that uses the
Service Broker protocol to find a NoSQL database. However we have
not yet reached our goal to make it as simple to build an application
as it is to build a web application today.
One of the great aspects of a web application is that users do not
have to install it. Since the application is run in one central location,
changes to the application will be visible to users immediately.
However we want to get rid of the web server. Therefore we
must distribute the application. By definition this means that
users need to install it on their computers. We want to make
this process as easy as possible. Perhaps we should even let
the application update itself automatically. Then it would act
almost as a web application. As soon as the developer updates
the application the users get the new version.
That would be great. However there is one problem. It
is not always safe to install and run applications.
When the developer updates a web application the
update only affects her web server. If she makes mistakes
it is only the web server that is vulnerable.
But if the application is downloaded to other
users their computers become vulnerable. Therefore
we need to verify that an application does not contain
malicious code, or dangerous bugs, before we
distribute it. This means that normal application
development can never be as fast and convenient as
web development.
But wait a minute. A large part of a web
application consists of HTML and JavaScript that run
on the user’s computer. How come it is safe to download
JavaScript to a web browser, but not applications?
Sandbox
The reason is that the JavaScript is executed in a sandbox in the web
browser. The sandbox provides a very limited environment with
strict rules. As an example, a JavaScript program running in the
web browser cannot read files from the computer’s hard drive. Thus
JavaScript cannot be used to snoop around on your computer.
So it is possible to run part of a web application in a sandbox.
Does this mean that we could run the whole application in the
sandbox?
Actually it does. Instead of just writing part of the application
in JavaScript we could write the whole application in JavaScript, and
run it using a suitable sandbox. That way it would be safe for a user
to download the application, which would mean that we can treat it
as a web application, and download new versions automatically.
API
To build such an application we need a suitable JavaScript API. We
should start with the normal APIs available in HTML5, and extend them
with the following functions:
1. Access data stored in the NoSQL database
2. Get events when data is changed in the NoSQL database
We need to use access control to limit who gets access to the
data stored in the NoSQL database. However that access control
functionality can be kept outside of this API. The application itself
need not be aware of the access control, it is happy just reading and
writing information to the NoSQL database. Some other part of the
system handles sharing information with other users.
The application runs in a special web browser that implements
the new API. The special web browser handles connections to other
users’ Service Brokers, as well as access control.
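The shape of the two API functions can be sketched as below. The proposed API is a JavaScript API. The sketch is written in Python only to show the idea, and the names SharedStore, get, put and on_change are hypothetical. Access control and replication are deliberately absent, since the text places them outside the API.

```python
class SharedStore:
    """Hypothetical stand-in for the data API seen by an application."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def get(self, key, default=None):
        """Function 1: access data stored in the database."""
        return self._data.get(key, default)

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        # Function 2: tell the application that data has changed.
        for listener in self._listeners:
            listener(key, old, value)

    def on_change(self, listener):
        self._listeners.append(listener)


store = SharedStore()
store.on_change(lambda key, old, new: print(f"{key}: {old!r} -> {new!r}"))
store.put("project/name", "Internet stuff")
store.put("project/name", "Important Internet stuff")
```

In a real deployment a change event could also be triggered by another user's change arriving from the distributed database, which is how an application would learn about retroactive changes.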
Getting there
At first there will be a catch-22 situation. In order to develop an
application using this new API there need to be users that can use
it. But it is quite a hassle to create private-public key pairs, learn
about key management and set up a personal service broker.
There need to be good applications before someone goes through
all that.
Is there any way to jump start development of such
applications?
Well, the JavaScript API described above does not involve any
public keys or service brokers. The special web browser handles all
that. The application can actually run in a different environment, as
long as it has access to the new JavaScript API.
One way to run such an application is to run it on a web server,
as a normal web application. In such a deployment the JavaScript
API is implemented using AJAX. The NoSQL database runs on the web
server.
The new API makes it possible to build applications that can be
used both on today’s Internet and with the new infrastructure
described in this white paper. It is even possible to run the same
application both as a web application and using the new
infrastructure at the same time. In such a hybrid deployment some users
use a normal web browser and access the application via the web.
Other users use the new special web browser, and connect through a
service broker.
Appendix A —
Cryptography primer
Symmetric-key ciphers
Symmetric-key ciphers are the workhorse of any system using
cryptography. A good way to understand a symmetric-key cipher is
to imagine a deck of cards. By shuffling the deck the cards will be in
random order, even if the deck was ordered when you started. Thus
every time you play a card game it will be different; you cannot rely
on the ace of spades always being the fifth card drawn.
Now imagine you could remember exactly how you shuffled the
cards. Then you could redo the shuffle, but backwards, and get the
deck back to its original order.
This is basically how a symmetric-key cipher works, even
though it rearranges bits in a binary message rather than cards in
a deck. It uses mathematical operations to shuffle the message into
what looks like a random mess.
Now in order to remember how to shuffle, or encrypt, a
message a computer cipher uses a binary key. The key controls
which mathematical operations to apply to the message. Using a
different key will create a totally different random mess. Only by
using the same key is it possible to get back the original message.
The key itself is just a binary number; all numbers are valid.
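The card deck analogy can be turned into a toy program. The sketch below is not a secure cipher and should never be used as one. It only shows a key deciding how a message is shuffled, and the same key undoing the shuffle. The function names are made up for this example.

```python
import random


def shuffle_order(key, length):
    """The key decides exactly how the positions are shuffled."""
    order = list(range(length))
    random.Random(key).shuffle(order)
    return order


def encrypt(message, key):
    order = shuffle_order(key, len(message))
    return "".join(message[source] for source in order)


def decrypt(scrambled, key):
    # Redo the same shuffle, then put every letter back where it came from.
    order = shuffle_order(key, len(scrambled))
    original = [""] * len(scrambled)
    for position, source in enumerate(order):
        original[source] = scrambled[position]
    return "".join(original)


secret = encrypt("the ace of spades", key=1234)
print(secret)
print(decrypt(secret, key=1234))
```

Real ciphers substitute and mix bits in many rounds instead of merely moving letters around, but the role of the key is the same.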
The size of the key determines how many different keys there are, and
thus how many different encrypted versions of a message we can create.
With an 8 bit key a given message can only be encrypted in 256
different ways. Even if the encrypted
message looked like a random mess, it would be easy to try every
combination. Therefore someone could find the original message
without knowing the key.
To be safe we use a key that is at least 128 bits. That means
that there are 2^128, or
340,282,366,920,938,463,463,374,607,431,768,211,456, possible
combinations. A 40, 56 or 64 bit key is not
considered safe. Such a message could be decrypted today by trying
all combinations.
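Trying every key of an 8 bit cipher takes only a few lines. The XOR cipher below is a toy chosen for brevity, not something the paper proposes, and the attacker is assumed to know how the message starts.

```python
def xor_cipher(data: bytes, key: int) -> bytes:
    """Toy cipher with an 8 bit key; applying it twice decrypts."""
    return bytes(byte ^ key for byte in data)


def brute_force(ciphertext: bytes, looks_right):
    """Try all 256 keys and keep those that give a plausible message."""
    return [key for key in range(256)
            if looks_right(xor_cipher(ciphertext, key))]


secret = xor_cipher(b"meet me at noon", key=0x5A)
found = brute_force(secret, lambda text: text.startswith(b"meet"))
print(found, xor_cipher(secret, found[0]))

# The same search over a 128 bit key space is out of reach:
print(2 ** 8, 2 ** 128)
```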
Symmetric-key ciphers are very good for protecting data you
want to store. You provide a password that is turned into a binary
number. That number is used as a key to encrypt a file, or perhaps
your entire hard disk. As long as you choose a password that is not
easy to guess your data should be safe. Perhaps even from you —
always make sure you remember your encryption password. If you
use a good encryption program there is simply no way to get the
data back if you forget the password.
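Turning a password into a key can be sketched with PBKDF2 from the Python standard library. The paper does not name a method, so PBKDF2, the salt handling and the iteration count here are assumptions for illustration.

```python
import hashlib
import os


def key_from_password(password: str, salt: bytes, bits: int = 128) -> bytes:
    """Derive a binary key of the requested size from a password."""
    return hashlib.pbkdf2_hmac(
        "sha256",                  # underlying hash function
        password.encode("utf-8"),
        salt,                      # stored next to the encrypted data
        100_000,                   # iterations, slows down guessing
        dklen=bits // 8,
    )


salt = os.urandom(16)
key = key_from_password("correct horse battery staple", salt)
print(len(key) * 8, "bit key:", key.hex())
```

The same password and salt always give the same key, which is why forgetting the password means losing the data.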
Public-key cipher
A symmetric-key cipher would also work for communication,
providing you share a good password with everyone you want
to communicate with. That is right, for everyone you want to
communicate with you would need a separate password. And you
would also have to come up with that password in a safe way, so that
no one can intercept the password, since everyone that knows the
password can eavesdrop on the communication. Basically you would
have to meet and come up with the password in a safe place where
you cannot be overheard.
Thus only using symmetric-key encryption is very good for
plots for spy novels, but not really useful for securing Internet
communication. I am not sure if e-commerce would have taken off,
if you first had to go to the local Amazon office to come up with a
safe password to be able to communicate securely with Amazon’s web
site...
One solution to the communication problem is to use a public-key
cipher. A public-key cipher does not use a single key, but a key pair.
Unlike a symmetric key it is not possible to use any binary number as
a key. Instead the key pair has to be generated, using random data as
input. When the key pair has been generated you get two keys, let’s call
them A and B. Now the remarkable thing is that a message encrypted
with the key A can only be decrypted with key B. A message encrypted
with the key B can only be decrypted with the key A.
Since we use random input every time we generate a key-pair
we get a different one. When you create your own key pair you can
be certain that it is unique. Due to this process it is not possible to
turn a password into a public-key. Instead of remembering a password you need to store the key-pair as a computer file.
Now using a key-pair, rather than just a straightforward
key, seems a bit complicated. Why go through all the hassle with
computer files instead of passwords?
As the name may imply the power of a public-key cipher comes
to light when you use one of the keys as a public key and the other as
a private key. You give the public key to anyone that might want to
contact you. You make sure only you have access to the private key.
Now what happens if someone encrypts a message with your
public key? First of all, since the message cannot be decrypted using
the same key you do not have to worry about the other people that
also have your public key. They cannot decrypt the message. Actually
the message can only be decrypted with the other key in the
key-pair: your private key.
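The behaviour of a key pair can be tried out with a toy example. The numbers below form a textbook-sized RSA key pair built from the primes 61 and 53. They are far too small to be secure and are assumptions chosen only so the arithmetic is easy to follow.

```python
# Toy key pair (much too small to be secure).
MODULUS = 61 * 53      # 3233, part of both keys
PUBLIC_KEY = 17        # given to everyone
PRIVATE_KEY = 2753     # kept secret; 17 * 2753 leaves remainder 1 mod 3120


def apply_key(number, key):
    """Encrypting and decrypting are the same operation with different keys."""
    return pow(number, key, MODULUS)


message = 65
encrypted = apply_key(message, PUBLIC_KEY)

print(apply_key(encrypted, PRIVATE_KEY) == message)  # private key decrypts
print(apply_key(encrypted, PUBLIC_KEY) == message)   # public key does not
```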
We have a way to encrypt messages to you, that only you can
decrypt!
The spy-novel meeting where you need to determine a secure
password in privacy is no longer necessary. Instead you can just
exchange public keys in a public setting; it doesn’t matter if someone
else sees them. Actually it is even better that way: the more people
who know your public key the better.
Remember that each key in the key-pair can decrypt a message
encrypted by the other key. What happens if you encrypt a message
with your private key?
At first glance that may seem useless. The message can be
decrypted by your public key. What use is that? Perhaps you could
use it to send a message to all your friends, but wouldn’t that
mean that they would have to be very careful about handling your
public key? Are we back at the spy-novel meeting again?
Actually it is not a useful way to encrypt a message that can
only be decrypted by your friends. As the name suggests a public key
should be, well, public. We should assume that anyone can decrypt
the message.
However if they succeed in decrypting a message with your
public key they learn one important thing: that the message has
been encrypted by your private key. Since you are the only one
who has access to your private key, the message must have been
encrypted by you.
This property is used for digital signatures. Instead of a
handwritten signature you use your private key to sign a document.
It can also be used as a login mechanism instead of a password.
As we will see in the next chapter it can also be used to verify
that we are connecting to the right web server.
The most used algorithm for public-key ciphers is RSA. Since the
mathematics behind RSA is different than for a symmetric key the
keys must be much longer to be secure. Unlike symmetrical ciphers,
where we can use any binary number as a key, only a small portion
of all binary numbers are valid RSA keys, and the mathematical
structure gives an attacker shortcuts that are much faster than
trying every key. Therefore an RSA key needs
to be at least 2048 bits to be considered secure. It is important to note
that this is only necessary for RSA. A 2048-bit symmetrical cipher
key is just wasteful. Such a cipher would be very slow compared to a
128-bit or 256-bit cipher, without adding any practical security.
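Encrypting with the private key so that anyone can check the result with the public key can be sketched with toy numbers. The key pair below is built from the primes 61 and 53 and is an assumption for illustration only. Real signatures use much larger keys and sign a hash of the document, not a small number.

```python
# Toy key pair (much too small to be secure).
MODULUS = 61 * 53
PUBLIC_KEY = 17
PRIVATE_KEY = 2753


def sign(document_number):
    """Only the holder of the private key can compute this value."""
    return pow(document_number, PRIVATE_KEY, MODULUS)


def verify(document_number, signature):
    """Anyone with the public key can check the signature."""
    return pow(signature, PUBLIC_KEY, MODULUS) == document_number


signature = sign(65)
print(verify(65, signature))   # the signed document
print(verify(66, signature))   # a document that was changed afterwards
```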
Forward security with Diffie-Hellman
Now we have public-key ciphers, does that mean that our communication is secure? Well if we use public-key ciphers for communication we need the private key to decrypt a message. So as long as we
keep the private key safe we are ok.
By the way, if someone gets hold of the private key at some
point in the future they will be able to decrypt the message. So we
need to be very careful with that private key...
We now live in the petabyte age, so it is certainly feasible for someone
to store all your communication. Perhaps you just have to be
grateful for the short time you could hide behind your private key.
Or perhaps not. There is actually an algorithm that lets two people
create a secret number, in public. The algorithm is called
Diffie-Hellman.
Let’s say that Alice and Bob want to create a secret number. Alice
starts by creating a random number a, while Bob creates another
random number b. They also agree on two public numbers, g and p.
Alice then does some computation with her random number and
tells everyone the result A:
A = g^a mod p
Bob does some computation with his random number and tells
everyone the result B:
B = g^b mod p
Alice does a computation using her secret random number a as well
as the number B she got from Bob. The result is the number K:
K = B^a mod p
Bob does some computations using his secret random number b as well
as the number A he got from Alice. The result is the number K:
K = A^b mod p
That is right, both Alice and Bob got the same number K. However to
compute it it was necessary to have either the secret random number
a or b. Alice and Bob each know only one of these two numbers. Even
if someone did overhear the public numbers A and B, she cannot
compute K without either a or b.
Usually we do not use Diffie-Hellman in a public setting. But if
there are eavesdroppers we can be safe in the knowledge that they
cannot get the secret number K, even if they have our private keys and
can listen in to the communication. It also makes it possible to really
forget. If we use K as our encryption key, it will not be possible for
anyone to decrypt the communication if we forget about a, b and K.
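The exchange can be followed step by step in a short sketch. The public numbers used here, the prime p = 2^127 - 1 and g = 3, are assumptions picked to keep the example small. Real systems use larger, standardized parameters.

```python
import secrets

# Public numbers that everyone, including eavesdroppers, may know.
P = 2 ** 127 - 1   # a prime
G = 3


def make_secret():
    """A private random number in the range 2 .. P - 2."""
    return secrets.randbelow(P - 3) + 2


def public_value(secret):
    return pow(G, secret, P)             # A = g^a mod p, B = g^b mod p


def shared_key(their_public, my_secret):
    return pow(their_public, my_secret, P)   # K = B^a mod p = A^b mod p


a, b = make_secret(), make_secret()      # never sent anywhere
A, B = public_value(a), public_value(b)  # announced in public

print(shared_key(B, a) == shared_key(A, b))
```

Both sides arrive at the same K because B^a and A^b are both equal to g^(ab) mod p. Once a, b and K are deleted there is nothing left to recover the key from.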
Appendix B ­— FAQ
Do we have to rely on DNS, given that it seems to come under
increased political control?
DNS is needed for us to keep using nice email-addresses and readable
URLs. I find it unlikely that it is possible to create a competing catalog service, that can scale to as many users as DNS, without coming
under any political control.
To replace DNS we will simply have to do without the nice email
addresses. Instead we would have to use some sort of web-of-trust,
where users share their public key with their friends. Most of the
services described in this paper would work in such a system, and
the JavaScript application will work without modification.
Does not OpenID already solve the problem with passwords?
OpenID does indeed let a user identify herself to a web site. However
there are many advantages with a public key based system. It makes
it much easier to build access control systems for example. And we
can use it in applications that are not web based.
Aren’t there already public key solutions in use?
Many countries have national systems that use public key
cryptography to identify citizens. However the system described in this
paper is not used to identify a person, but rather to identify an email
address.
This seems like a good idea. When can I use it?
That depends on you. To actually create the services described in
this paper will take some effort. If you want to help please contact us
at [email protected]
At startTLS, you can easily test if you can
email securely, or if your friends can send
and receive email securely.
https://starttls.se/
This book is free — attribution cc by — which means it has some
rights reserved according to the Creative Commons License, see
URL below. Simply, it’s free to share and distribute.
http://creativecommons.org/licenses/by/3.0/legalcode