How to secure Internet
— an end-to-end approach —
Mattias Wingstedt
Sponsored by .SE

Copyright © 2010 by Mattias Wingstedt
Illustrations Copyright © 2010 by Karin Holstensson
Some rights reserved. This book is licensed under a Creative Commons Attribution license: http://creativecommons.org/licenses/by/3.0/legalcode
Version 1.0
Design by Peter Lindgren
This first edition was finished and published in December 2010, and is available online at https://starttls.se/documents/ Please point to this URL every time you quote or mention this book.
This white paper has been financed by .SE (Swedish foundation for Internet Infrastructure). .SE's mission is to act for a positive development of the Internet in Sweden. This project has been financed by .SE's Internet fund. The Internet fund supports the development of the Internet by financing independent projects.

Contents
Introduction
TLS@DNS-SEC
DNS-SEC
Warning messages
Outsourcing secure services
STARTTLS for HTTP
Forward security
Three party handshake
SRV-records
Finding a user's public key
Store both email address and public key
User key management
Key management application
Design of the system
Implementation
Initial upload of public master key
The settings file
International characters
Open access control
Group service
Group address
Groups that are member of other groups
Service broker
Using the service broker
URL to named services
Implementation of the service broker
The service records
Connecting to the service broker
Distributed database
NoSQL
Distributed database service
Distributed application
Sandbox
API
Getting there
Appendix A — Cryptography primer
Symmetric-key ciphers
Public-key cipher
Forward security with Diffie-Hellman
Appendix B — FAQ

Introduction
The Internet is wonderful: the first truly global communication system with no long-distance charges. It is especially good for open information. Hyperlinks mean that it doesn't matter where, or how, the information resides. Search engines make it easy to find information. It is easy to combine services provided by different vendors. Standards make it easy to develop web applications, as well as mobile apps that get their intelligence from the cloud. It is easy to develop a first version, and then let the service scale as it attracts more users. A service that proves to be valuable can be run in the cloud and scale to hundreds of millions, or even billions, of users. It has never been as easy to create and deliver software.
However, the Internet is not as good at handling private information, or information that should only be shared with a limited number of people. To share such information we must often use a service that is its own self-contained world. We have to stick with the search engine provided — it is not possible to choose the best one. Each user must have an account on the same service — we acquire more logins and passwords all the time. It is hard to hyperlink between such services, since we must first ensure that everyone who sees the link also has permission to access the linked page.
We are also often forced to share the information with more people than we want. The information may be intercepted by people who use the same WiFi network, as well as by intelligence agencies that monitor traffic that passes borders. We must also trust the vendor that provides the service, and the employees who work there.
This white paper looks at ways we can improve this situation.
First we look at how to make it easier and cheaper to deploy TLS-encryption. If all cloud-based services used TLS-encryption we 1 would not have to worry about people on the same WiFi network, nor whether we are in a country with friendly intelligence agencies. We would only have to trust the organization running the service, no matter from which country in the world we use it. Then we will look at how public key cryptography can be used for user authentication, providing an alternative to passwords. This is extended with an open access control system that lets us authenticate groups as well as individual users. Using this access control system it becomes much easier to share information with a limited set of users. Our friends do not have to create user accounts on each cloud service – it is enough to belong to the same access control group. We can also use this authentication system to grant access to programs as well as users. That way we can let a search engine index private information. The search engine itself can then use the access control system to ensure that the search results are only shown to people that belong to the correct access control group. After looking at how to improve cloud-based services we look at how to build applications where users connect directly to each other. Putting the smarts in the end points, rather than in the cloud. This is called end-to-end security and is the gold standard for any solution to secure military or commercial communication. We look at how to make it easy to build end-to-end secure applications, and for users to host such services on their own computers. To run a service for our friends we must ensure that our computer stays on all the time, or the service will be unavailable. This is in stark contrast with a cloud-based service where professionals ensure that the service is available 24/7. To solve this problem we look at distributed NoSQL database. Using such a database for storage the service will be available even when some computers are turned off. It is enough that just one computer stays online. With a distributed database as a base we can start building peer-to-peer applications that are end-to-end secure. Instead of running the application in the cloud we run it on the users’ computers and smartphones. 2 Finally we look at how to make it easy to create such peer-to-peer applications, using a JavaScript API. Such an API makes it possible to run an application in different environments, the same way it is possible to use any web browser to view a web page. We aim at running such an application in two ways: 1. As a normal web application, running on a web browser and a web server 2.As an end-to-end secure peer-to-peer application, running in a special browser made for such applications Thus it is up to the user to decide how she wants to run the application. Some users want applications that are easy to install and run on their web site or intranet. Other users want something that is end-to-end secure. Using the JavaScript API both categories of users can use the same application. I hope to show that it is possible to build new Internet standards based on end-to-end secure communications, without having to sacrifice ease of use or rapid development of applications. In so doing we will also improve cloud-based services, making it easier to share information no matter where and how we decide to store it. Much of this paper concerns crypto graphy, since it is the basis of creating secure communication. 
Appendix A explains the cryptography terms used in the rest of the paper. If you have further questions please don't hesitate to contact us at [email protected]

TLS@DNS-SEC
The first problem to solve in order to communicate securely is to ensure that you really are connecting to the right person, or in the case of Internet services, the right server. Let's say we want to make a secure connection to a website to fetch the URL https://starttls.se. To connect to a web server we do two things:
1. Use DNS to find the IP address of starttls.se
2. Make a TCP connection to port 443 of the computer with that IP address
The problem facing the designers of the TLS protocol, the secure protocol used by HTTPS, is that neither of these two operations is secure. An attacker can change the IP number returned by DNS. Or she can change the routing so that our TCP connection actually reaches another computer. Therefore they created a method so that the web server can identify itself, in our case proving that it is indeed starttls.se.
The method uses public-key cryptography. The web server gives us a certificate. The certificate contains the server's public key, a valid-to date and the domain name. It is digitally signed with the private key of a certificate authority. To validate the certificate the web browser must:
1. Check that the certificate contains the correct domain name, starttls.se in our case
2. Check the valid-to date of the certificate
3. Check the signature against a list of public keys from certificate authorities it trusts
If the certificate is valid the web browser knows the server's public key. Now the server has to prove that it has the corresponding private key. If it passes this last test the web browser knows it has connected to the right server.
So we have a system to create secure connections; what is the problem? Well, there are actually quite a few problems. I will describe what I think is the worst one. Using this system it costs extra to communicate securely, since you have to pay good money to a certificate authority to buy the certificate. The cost in itself is not the worst problem. The worst problem is that quite a few people wanted to communicate somewhat securely without paying extra for the certificate. They deemed DNS and routing to be secure enough, but wanted some protection against eavesdropping. To cater to these people, developers of web browsers and other software turned a certificate error into a warning message that can be dismissed by a user.
Users are very good at dismissing warning messages. Basically the system relies on users being educated enough to know when it is safe to dismiss a security warning, and when the warning is important. In practice this makes the system much less secure, since few users are interested enough in the intricacies of certificate warnings. However, there is a silver lining to this story. Obviously there is a great demand for a lower-cost, and simpler, system that can enable secure connections.

DNS-SEC
Fortunately things have progressed since the TLS protocol was designed. One of the key problems was that we could not trust the information provided by DNS. That is no longer the case. The new DNS-SEC standard makes the DNS system secure. DNS-SEC itself uses public-key cryptography to ensure that the information in the DNS system is correct. Using DNS-SEC we can trust the IP address we get from DNS. However we still need to ensure that we actually connected to the right server. An attacker can still change the routing, so we end up connecting to another computer. The good news is that we can use DNS-SEC for that part as well. We just need to put information into DNS that identifies the server's public key. Connecting to a server would then look like this:
1. Get the IP address and information about the public key for starttls.se from DNS using DNS-SEC
2. Connect to port 443 on the IP address using TCP
3. Use the information from DNS to verify the public key of the server
A sketch of what that last verification step might look like in code follows.
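To make that verification concrete, here is a minimal Node.js sketch of the client-side check. It assumes that the SHA-256 fingerprint of the server's certificate has already been obtained from DNS by a DNSSEC-validating resolver; Node's built-in dns module cannot fetch or validate such records, so a hard-coded placeholder stands in for that value. This is only an illustration of the idea, not a proposed standard.

```javascript
// Sketch: verify that the server we reached presents the key published in DNS.
// EXPECTED_FINGERPRINT stands in for a value fetched via a DNS-SEC validating
// resolver; Node's built-in dns module cannot do that lookup for us.
const tls = require('tls');

const EXPECTED_FINGERPRINT = 'AA:BB:...'; // placeholder, not a real fingerprint

const socket = tls.connect(
  {
    host: 'starttls.se',
    port: 443,
    servername: 'starttls.se',
    rejectUnauthorized: false, // skip the CA check: the DNS-published key replaces it here
  },
  () => {
    const cert = socket.getPeerCertificate();
    if (cert.fingerprint256 === EXPECTED_FINGERPRINT) {
      console.log('Server key matches the key published in DNS, proceed.');
    } else {
      console.error('Key mismatch: treat this as a hard connection error.');
    }
    socket.end();
  }
);

socket.on('error', (err) => console.error('Connection failed:', err.message));
```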
Warning messages
It is imperative that we do not repeat the mistakes of the past. Just using DNS-SEC to replace certificates with information in DNS is not enough. We must make it a connection error if we cannot verify the server. The connection must fail with an error message. When you configure a server to use TLS@DNS-SEC, that should mean that it is not allowed to connect to it using any non-secure method.

Outsourcing secure services
At the moment it is easy for one web server to handle any number of web sites that use unsecure HTTP. It only needs one IP address to do so. However, to use HTTPS we need one IP address per certificate. This makes it harder, and more expensive, to run secure web sites for lots of customers. Once we put the information used to verify a server in DNS we will get rid of these limits. It suddenly becomes as easy to run an HTTPS site as an HTTP site. It should become standard for web hosting providers to offer HTTPS as well as HTTP.

STARTTLS for HTTP
One problem with the move to TLS@DNS-SEC is that old web browsers will still need a valid certificate, or they will show a warning message. That might mean that we must wait to take full advantage of TLS@DNS-SEC until all web browsers support the new standard. But we can kick-start the process. The first secure implementation of most Internet protocols used a separate protocol, just like HTTP and HTTPS. We got SMTPS for secure SMTP and IMAPS for secure IMAP. Later, secure connections were incorporated as an extension to the old protocol. Now it is possible to run secure SMTP and IMAP by using the command STARTTLS. We no longer require two different protocols, nor do we need to run the secure service on a separate port.
Unfortunately HTTP is not as well suited for a STARTTLS command. Therefore we are stuck with HTTP and HTTPS. However, implementing STARTTLS for HTTP would be one way to kick-start TLS@DNS-SEC. A web browser that supports TLS@DNS-SEC could check in DNS whether the web server supports STARTTLS. If the web server supports STARTTLS, the web browser creates a secure connection to the web server, thus creating a secure connection using HTTP and STARTTLS, not HTTPS. Users who use an old web browser would still get an unsecure HTTP connection, but they would not get any warning messages. Thus using STARTTLS with HTTP would be the preferred way for everyone who wants to provide better security but does not want to buy a certificate, or get the extra IP address that is needed. Users who want to take advantage of the improved security of such a web site would just have to change to a web browser that supports TLS@DNS-SEC and STARTTLS for HTTP.

Forward security
As I describe in the Cryptography primer in Appendix A, the Diffie-Hellman algorithm can be used to create a session key that is safe even from someone who has access to our private key. Unfortunately, using Diffie-Hellman key exchange is not mandatory in TLS, though it is available as an extension. For new services we must ensure that this extension is mandatory, and that it is a connection failure to connect without using a Diffie-Hellman key exchange. A sketch of what such a requirement can look like in a server configuration is shown below.
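As an illustration, the Node.js sketch below configures a TLS server so that it only accepts cipher suites with an ephemeral Diffie-Hellman (ECDHE) key exchange; a client that cannot do ECDHE simply fails to connect. The certificate and key file names are placeholders, and the particular cipher list is an assumption made for this sketch, not a recommendation from the paper.

```javascript
// Sketch: a TLS server that refuses any handshake without an ephemeral
// (Diffie-Hellman style) key exchange, so that a stolen server key cannot be
// used to decrypt recorded traffic afterwards.
const fs = require('fs');
const tls = require('tls');

const server = tls.createServer(
  {
    key: fs.readFileSync('server-key.pem'),   // placeholder file names
    cert: fs.readFileSync('server-cert.pem'),
    // Only ECDHE suites: every session gets its own throw-away key pair.
    ciphers: 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384',
    honorCipherOrder: true,
  },
  (socket) => {
    socket.write('hello over a forward-secure connection\n');
    socket.end();
  }
);

server.listen(8443, () => console.log('listening on port 8443'));
```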
Three party handshake
Using TLS we will always expose the public key of the server we connect to. This is usually not a big deal, since we are already disclosing as much information simply by making the connection. If you connect to the web server that handles https://starttls.se it doesn't really matter that you also get the public key. Later in this paper we are going to use TLS to connect between actual users. If we use a normal TLS handshake we will expose the identity of the user we connect to. To avoid this we will need to support a three-party handshake, with these three parties:
1. The service
2. The user we are connecting to
3. The user that is connecting
At first we do a normal TLS handshake to the service, using its public key. This public key will be exposed to anyone eavesdropping on the conversation. After we have done this TLS handshake we have an encrypted connection. Within this connection we will do a second handshake, this time using the public key of the user we are connecting to. In this second handshake the user that is connecting will also provide a client certificate, to identify herself. Note that this handshake is not enough to make the connection anonymous to anyone doing large-scale surveillance of the network. For true protection against someone finding out who we are communicating with it is necessary to use an anonymous network like TOR.

SRV-records
I will describe a few new services in this document. However I do not want to create new domain names, like userlookup.starttls.se or servicebroker.starttls.se, to run them. Instead I think just using the short starttls.se would do just fine. Fortunately there is already a solution to this, the SRV DNS record. The SRV record lets us address a service, rather than a computer. Using it we don't need to have domain names like userlookup.starttls.se or servicebroker.starttls.se. Instead we can use the userlookup or servicebroker service for starttls.se, and run them on separate computers if we want to. Unfortunately SRV records have not seen widespread adoption. One of the advantages of creating a new standard is that we can make the use of SRV records mandatory. Since we have SRV records we do not need to assign any new port numbers for the services. Instead we should always find each service using an SRV record.
I would even go further and put some configuration information about the services in DNS. Most of the services I describe use HTTP, and it would be really nice to put the configuration of the path of the service in DNS. That would make it possible to run a new service on an existing web server. For example, on one web server the userlookup service might run at:
https://starttls.se/find-user.php?email=[user email]
While another stand-alone user metadata server could use:
https://userdb.starttls.se/[user email]

Finding a user's public key
In the last chapter I described how TLS creates secure connections, using public-key cryptography. We would like to use the same cryptography to authenticate users. That way we can replace all our passwords with just one public-private key pair. The problem is finding a user's public key. We need an electronic phone book with public keys rather than phone numbers. I suggest that we associate a public key with an email address. A sketch of what such a lookup might look like follows.
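The Node.js sketch below shows what such a lookup might feel like: ask DNS (ideally through DNS-SEC) for the SRV record of a userlookup service at the user's mail domain, then fetch the user's public master key over HTTPS. The service label _userlookup._tcp, the path layout and the response format are all assumptions made for illustration; the paper leaves the exact details to the following chapters.

```javascript
// Sketch: find the public key for [email protected].
// 1) Resolve the SRV record of a hypothetical "userlookup" service at the
//    user's mail domain (in the real system this lookup would be DNS-SEC signed).
// 2) Fetch the user's public master key file from that host over HTTPS.
const dns = require('dns').promises;
const https = require('https');

async function lookupPublicKey(email) {
  const [user, domain] = email.split('@');
  // Hypothetical service label; a real standard would have to fix this name.
  const records = await dns.resolveSrv(`_userlookup._tcp.${domain}`);
  const { name: host, port } = records[0];

  // Hypothetical path layout: one directory per user, fixed file names.
  const path = `/${encodeURIComponent(user)}/master-key.pem`;

  return new Promise((resolve, reject) => {
    https.get({ host, port, path }, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

lookupPublicKey('[email protected]')
  .then((key) => console.log('public master key:\n', key))
  .catch((err) => console.error('lookup failed:', err.message));
```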
The advantage is that we can keep using email addresses in the user interface, while the public keys and associated information is hidden from view. Just as we only need to remember the domain name, not a public key, to reach a secure web site. However if we turn the email address into a universal password we get another problem — it is the owner of the domain that has control over email addresses. She can at any time reassign an email address. If we are not careful the owner of the domain may be able to impersonate anyone who uses the domain. This is the right behavior for a company. If an employee leaves a company the company should still be able to use accounts that has been created with her corporate email address. But this is not right for private email. If I change network provider I might lose my email address. However that does not mean that a new customer, that gets my old email address, should be able to impersonate me or access accounts that I created. The system administrator at my email provider should definitely not be able to impersonate me. 11 Store both email address and public key To solve the problem that a user can lose her email address we must ensure that the user is always in control of her own private key. Then it becomes impossible for the email provider to impersonate a user. But it will still be possible for the email provider to change the public key associated with an email address. Therefore we must always store both the public key and the email address the first time we encounter a user. If the public key changes we should treat the email address as belonging to a new user, and not give access to old accounts. As users we do not want to worry about the public key. For us it is hard enough to remember email addresses. So in the user interface of an application we only use the email address. But the application itself must store both the email address and the public key. Each computer and device the user uses has its own publicprivate key pair, and a certificate to prove that it is valid. If a device is lost, or a private key is stolen, the master private key can be used to create a revocation message. This message simply states that the lost or stolen key is no longer valid. Key management application A system like this needs a a user friendly key management application. The key management application creates the master publicprivate keypair, as well as the certificates for other devices. Each device creates its own public-private keypair. The public key is then retreived from the device and opened in the key management application. The key management application creates a certificate, signing it with the private master key. Finally the certificate is stored on the device, thereby giving it authorization to login User key management Having just one private-public key pair is not enough. As with physical keys private keys can be lost, stolen or copied. Besides many users use several computing devices — perhaps a stationary computer, a laptop and a smartphone. You might need to login from every device, but having many copies of your private key makes it less secure. A private key can be protected with a password. But can we expect users to appreciate typing in a secure password every time they want to visit a website from their smartphone? Therefore it would be nice if you could have several keys, and disable the key on the smartphone if it is stolen. This can be achieved with certificates. 
Unlike the certificates used in TLS these certificates are issued by the user herself. The user has a master public-private key pair. This master key pair is only used to create certificates that verify other key pairs. The master private key is rarely used and can therefore be stored in a safe way. It need not even be stored on a computer, but could be stored on a USB-memory or even on paper. 12 as the user. For safety a copy of the certificate in kept by the key management application. If the user loses a device, or suspect that someone has been able to get the private key from a device, the key management application can revoke the certificate for that device. This can be uploaded automatically to the public key catalog service 13 Design of the system Initial upload of public master key The system should provide the following information, as well as an API for updating it: To change any of the files a user needs to authenticate herself using her private master key. But to do that the service must know the public master key. How do we upload the public master key the first time? The answer is that we must use another system for the inital upload of the public master key. The exact details will differ between implementations. 1. The user’s current public master key 2.The user’s current settings file 3.A revocation list of certificates that are no longer valid 4.A signed chain of old public master keys Each file, except the public master key, needs to be signed to prove that the information comes from the user. The settings file has many uses. In this paper we will use it to provide the address to the user’s service broker, described in a later chapter. Note that neither the settings file nor any other file contains personal information. The purpose is to provide more functionallity to an email address, not to reveal information about the person the email address belongs to. The revocation list is used to handle situations where you lose a private key, perhaps because a mobile phone is stolen. The revocation list contains certificates that are no longer valid. The fourth file is necessary to change the master key. Remember, a change of master key might look as if someone has highjacked the email address and is trying to impersonate the user. To show that a change is legitimate we need to sign the new public master key with the old private master key. If a user changes master key several times we get a chain of such signatures, proving that each change was legitimate. Implementation The service is rather simple, all that is necessary is to provide a directory with some files. I suggest that we use the HTTP protocol, and the HTTP GET method to fetch the directory, and the files. The directory is returned as a XML document with the same schema as used by the PROPFIND method of WEB-DAV. Each file described above has its own fixed file name. We do of course use TLS@DNS-SEC to provide a secure connection to the service. The user should be able to update the information. In order to do so she must authenticate herself with a client certificate in the TLS connection, using her private master key. Then she can change any of the files using HTTP PUT, and delete files with HTTP DELETE. The service must also ensure that the files the user uploads has been correctly signed. We find the service using SRV-records, as described in the previous chapter. 14 The settings file The settings file is an XML document that can be used to personalize settings for this user. 
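As a rough illustration of what such a settings file might contain, here is a small example embedded in a Node.js snippet. The element names, the broker address and the idea of listing the two entries discussed in this chapter (the service broker address and a PGP key) are assumptions made for this sketch; the paper only says that the file is an XML document signed by the user's master key.

```javascript
// Sketch: what a user's settings file might contain. The element names are
// invented for this illustration; the paper does not define a schema.
const settingsXml = `<?xml version="1.0" encoding="UTF-8"?>
<settings>
  <!-- Where to find this user's service broker (see the Service broker chapter) -->
  <service-broker>https://broker.example.se:8443/</service-broker>
  <!-- Optional PGP key, so mail clients can encrypt email to this user -->
  <pgp-key>-----BEGIN PGP PUBLIC KEY BLOCK----- ... -----END PGP PUBLIC KEY BLOCK-----</pgp-key>
</settings>`;

console.log(settingsXml);
```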
We will use it to store the address of the user’s service broker. However the settings file is intended as a general way to implement per user settings. It could for example be used to provide a PGP key for sending encrypted emails. Email clients that support reading the settings file could then automatically encrypted messages using PGP to recipients who use PGP. International characters All files and protocols described here should use UTF-8 to encode international characters. Furthermore all domain name and email addresses should be encoded in readable form. The domain name linköping.se should be encoded as linköping.se, not xn--linkping-q4a.se. At the moment international characters in domain names are causing problems. It is unclear in which context you can use a readable domain name, and in which context you have to use the encoded form. As an example, sometimes the URL http://linköping.se works. Other applications do not support it and you have to use http://xn--linkping-q4a.se. Therefore it is very important to specify that new services must support the readable form of domain names. The domain name should always be stored in this form. An application should encode the domain name just before asking a DNS query. It should also decode the domain name to human readable form just after receiving an answer from DNS. That way there will not be any leakage of encoded domain names for users to worry about. 15 Open access control My dear friends Alice [email protected] Bob [email protected] Clyde [email protected] Member of Book Club Alice [email protected] 16 Most cloud-based services have their own access control system. That works very well within a service. But if makes it much harder to get services that cooperate — it is not always easy to integrate access control systems. With an open access control system it would be very easy to perform such integrations. The basis for this access control system is that the user identifies herself using her private key. The service then checks whether she has permissions to connect. We do not rely on passwords, we only need to know someones email address to grant access to her. The simplest version of the open access control system is that each user maintains a list of friends, with their email address and public key. It is important that we do not forget to store the public key, to be safe in case the email address is highjacked. If the public key has changed we should not grant access until we verify that it has changed for a legitimate reason. Perhaps your friend’s harddisk crashed. However for larger organizations this is not sufficient. For example, it is easier to publish pictures to every member of the book club if someone keeps an up-to-date membership list. We want to handle groups of users, and make it easy to grant permission to every member of a group. One simple way to do this is to define a file-format for such a group, and then make it easy to share and perhaps automatically update such group membership files. 17 Group service A more scalable solution is to create a service that handles groups. Such a group needs its own public-private key pair. Therefore the group service needs the same functionality as the user service, described in the previous chapter. The groups service also needs the following new operations: 1. List the group membership 2.Create a membership certificate 3. 
List URLs that the group has access to 4. List groups this group is a member of 5. Receive updates of URLs that the group has access to 6. Receive updates of group memberships
Groups can have either an open or a closed membership list. A group with an open membership list lets members and services see who is a member. A group with a closed membership list does not disclose who is a member. How you login with a group membership differs depending on whether the group has an open or a closed membership list. For a group with an open membership list you just login as yourself. The service can check your membership against the open list. For groups with a closed membership list the service cannot find out that you are a member. Instead you have to prove that you are a member, with a signed membership certificate. One of the group service's functions is to provide such membership certificates. In some cases you may want to login to a service anonymously, without disclosing your email address. To handle those cases the group service can issue an anonymous membership certificate.
As you can see, the user is responsible for choosing how to login. If she does not login using the correct group membership certificate she may be denied access. How will the user know how to login? Each group must maintain a list of URLs that the group has access to. This list is always available to members. Using this list the user's applications can determine how to login. To maintain this list the group service can be informed when it has been granted access to a URL, or when the access has been revoked.

Group address
To make it simple to use groups, each group must have a human readable address. I suggest that we use email addresses for group addresses. Then a group will look like a mailing list, though we do not have to support sending email to group members. Using an email address as the group address works well for groups maintained by an organization. However it would be nice for a user to be able to define groups herself. Then we need an address that contains both an email address as well as a group name. One way to encode such a group address would be group@user@domain. Such user-owned groups must be implemented using the service broker that is described in the next chapter.

Groups that are member of other groups
Just having simple groups is not powerful enough to model access control for larger organizations. We need to have groups that belong to other groups. If these groups have closed membership lists it will be a bit complicated to login. Let's say the user is a member of the group "Book club". The group "Book club" is in turn a member of the group "National Club". The group "National Club" has permission to use a service. To gain access to the service the user must prove that she is a member of the group "National Club". However she cannot get a membership certificate directly; instead she must first get a membership certificate for the group "Book club". Using this certificate she can connect to the group service for the group "National Club", and get a certificate that proves she is an indirect member of the group "National Club". Using this certificate she can connect to the service. A sketch of how a service might check such a chain of certificates follows.
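To make the chain of certificates concrete, here is a small Node.js sketch of how a service might check an indirect-membership proof. The certificate format (a signed JSON statement "this key is a member of this group") and the use of Ed25519 keys are assumptions made for this sketch; the paper does not fix either.

```javascript
// Sketch: verify a chain of membership certificates.
// Each certificate is a statement "memberKey is a member of <group>", signed by
// that group's private key. A chain user -> "Book club" -> "National Club"
// proves indirect membership of "National Club".
const crypto = require('crypto');

function makeGroup(name) {
  const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
  return { name, publicKey, privateKey };
}

function issueCertificate(group, memberPublicKey) {
  const statement = Buffer.from(JSON.stringify({
    member: memberPublicKey.export({ type: 'spki', format: 'pem' }),
    group: group.name,
  }));
  return { statement, signature: crypto.sign(null, statement, group.privateKey) };
}

function verifyChain(userPublicKey, chain, groups) {
  // The chain must start with the user, and each link must be signed by the
  // group that the previous link claims membership of.
  let expectedMember = userPublicKey.export({ type: 'spki', format: 'pem' });
  for (const cert of chain) {
    const claim = JSON.parse(cert.statement.toString());
    const group = groups[claim.group];
    if (!group) return false;
    if (claim.member !== expectedMember) return false;
    if (!crypto.verify(null, cert.statement, group.publicKey, cert.signature)) return false;
    expectedMember = group.publicKey.export({ type: 'spki', format: 'pem' });
  }
  return true;
}

// Example: the user is a member of "Book club", which is a member of "National Club".
const user = crypto.generateKeyPairSync('ed25519');
const bookClub = makeGroup('Book club');
const nationalClub = makeGroup('National Club');

const chain = [
  issueCertificate(bookClub, user.publicKey),          // user belongs to Book club
  issueCertificate(nationalClub, bookClub.publicKey),  // Book club belongs to National Club
];

console.log('indirect member of National Club?',
  verifyChain(user.publicKey, chain,
    { 'Book club': bookClub, 'National Club': nationalClub }));
```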
For this to work it is of course necessary for the user to know that the group "Book club" is a member of the group "National Club". Therefore each group must maintain a list of which groups it is a member of. To maintain the list the group service needs to be informed when the group has been added as a member of another group, or when the membership has been revoked. It is just as important to inform users about group memberships. Therefore we also need a service that can be used to inform a user about new or revoked group memberships. Such a service for users should be implemented using the service broker described in the next chapter.

Service broker
We have a way to find a user's public key, as well as a signed settings file. Does that mean that we can create secure end-to-end services between users? Yes we can. Just put the address to a service in the settings file. Then use public-key cryptography to ensure that we connect to the right user. One catch though — the information in the settings file is public. Everyone can see which services you provide. I would like to be able to provide a service to a few friends, without having to tell anyone else that the service even exists. My solution to this problem is to create a service broker service. You connect and identify yourself to my service broker, and it tells you about the services I provide, as well as how to connect to each service. But if you are not on my friends list the service broker will not give you any information at all. Best of all, if we design the service broker well, it will be very easy to create new services, thus providing an infrastructure for anyone who has a good idea and wants to create a new application that lets users communicate and share data.

Using the service broker
The service broker should perform four functions:
1. Provide information about unnamed services
2. Provide information about named services
3. Act as a mailbox for services that are not online at the moment
4. Act as a gateway to connect to services
Unnamed services are services that a user can only have one copy of, or that have a natural default. Examples would be email or instant messaging. To find such a service you only search for the service id. A service id should be a domain name or a URL. That way it is easy to invent new services without needing a central registry of service ids. The only address that is needed to connect to such a service is the user's email address. Named services are services that a user may run many copies of. Examples are personal web sites, group chats, version control repositories or remote file systems. To connect to a named service we need a URL that contains both the user's email address and a path to the service.
Some, perhaps most, services will be running on computers or devices that are not turned on all the time. When the service is offline we can let the service broker act as an answering machine. The service broker will forward any messages sent when the service comes online.
The fourth function is to provide a gateway to connect to an actual service. Some services will be independent of the service broker — you connect to them by creating a new encrypted TLS connection to an address provided by the service broker. Other services may be running on laptops, mobile devices and home computers that are behind firewalls and therefore impossible to connect to directly. In such cases the service broker can tunnel a connection to the service. Revealing the IP address of a laptop or mobile device will also disclose information about your current location, something you may not always want to do. Using the service broker as a gateway is a way to hide the IP address from the person that is connecting to the service. Note that this does not protect against someone doing surveillance of the network itself. To protect against that you need to use an anonymous network like TOR. A sketch of what a client's use of the broker might look like is shown below.
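As a rough sketch of the first of the four functions, here is what a client's "list my friend's services" call might look like in Node.js. The broker host, port, the /services path and the use of an ordinary TLS client certificate are placeholders; the actual protocol, including the three-party handshake discussed earlier, is described in the following sections.

```javascript
// Sketch: ask a friend's service broker which services we are allowed to see.
// The broker only answers if the client certificate belongs to someone on the
// owner's friends list; everyone else gets nothing.
const fs = require('fs');
const https = require('https');

const options = {
  host: 'broker.example.se',          // placeholder: taken from the friend's settings file
  port: 8443,
  path: '/services',                  // hypothetical "list services" endpoint
  method: 'GET',
  key: fs.readFileSync('my-device-key.pem'),    // our device key pair
  cert: fs.readFileSync('my-device-cert.pem'),  // certificate signed by our master key
  rejectUnauthorized: false,          // in the real system: verify the broker via DNS-SEC
};

const req = https.request(options, (res) => {
  let body = '';
  res.on('data', (c) => (body += c));
  res.on('end', () => {
    if (res.statusCode === 200) console.log('services offered to us:', body);
    else console.log('broker refused to tell us anything:', res.statusCode);
  });
});
req.on('error', (err) => console.error('could not reach broker:', err.message));
req.end();
```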
URL to named services
The URL needs to contain the following information:
1. The email address of the user
2. The path to the service record
3. Extra information to send on to the service itself
As an example, let's examine this URL:
sb://[email protected]/computers/loke/web:/starttls/version/2.0/index.jsp?d=starttls.se
[email protected] is my email address, which leads you to my service broker. /computers/loke/web is the path to the service record. Before getting the service record we do not know anything about the service, or how to interpret the rest of the URL. This particular service record is for a web server, which we will connect to using HTTPS. The rest of the URL is interpreted as a path to fetch the object /starttls/version/2.0/index.jsp?d=starttls.se from the web server. For every service that can be used in a service record we need to define exactly how the last part of the URL is to be interpreted.
A drawback of this system is that there is no way to know which application you should use for a certain URL. The actual application depends on the service record — it is not enough to just look at the URL. Therefore an application that implements service record URLs needs to provide good error messages to guide the user in case the URL points to a service record the application cannot handle.

Implementation of the service broker
The service broker uses the HTTP protocol. It has five functions:
1. List services
2. Fetch a service record
3. Connect to a service
4. Get an initial session key for connecting to a service
5. Send a message to a service
The service broker is not intended to run as a web application on an existing web server. Rather, it is intended to be implemented as a stand-alone application that uses an embedded HTTP server. It is important that the service broker is easy to use, since it is intended to be run by normal users, not system administrators. The reason to use the HTTP protocol is to build on a well known protocol rather than invent a new one.

The service records
A service record is a dynamically generated XML document containing the following information:
1. The domain name or URL that defines the service
2. The current status of the service
3. Information about how to connect to the service. The following are possible:
1. A domain name
2. An IP number, port and key fingerprint
3. A TOR hidden service
4. Connection through the service broker
The service record does not have to be signed. Rather, we make sure that it is valid by authenticating the connection to the service broker.

Connecting to the service broker
We get the address of the service broker from the user's settings file. The address is either a URL, in which case we use TLS@DNS-SEC to connect, or a TOR hidden service, for users who want to be anonymous. Usually both the service broker and the user that connects identify themselves using their public keys.
However we do not want to reveal the identity of either user to an eavesdropper. Therefor we need to use the three party TLS-handshake, as described in the TLS@DNS-SEC chapter. The assumption is that the service broker is used for services which are only available to a limited audience. Therefore the user that connects needs to identify herself. However it is possible to use it for anonymous services as well. In that case the use that connects does not have to identify herself. Though existing Internet protocols, such as HTTP, is probably sufficient to build anonymous services. 24 25 Distributed database With the service broker we can build applications with secure endto-end communication between two users. But how do we build larger systems? Cloud-based services run on a large number of computers, hosted at various data centers. However even more computers are used to run the web browsers that are needed to access a cloud-based service. We have to figure out how we can run a service on these computers, without needing the servers. The web has been hugely successful for developing applications. Our goal is to make it as simple to develop services running on the new infrastructure, as it is to develop a web application today. So lets look at the design of a web service. It is built using three layers: 1. Web browser layer, which uses HTML and JavaScript to run the user interface 2.Web application layer, where is programmed in a language such as PHP, Python, Java or .NET, and run on a web server 3.SQL database that stores the application data The first two layers are simple to use with another architecture. Instead of connecting to a remote web server we could just run a local web server on our computer. However the third layer is more complicated. A SQL database needs to be centralized. That means that only one computer could be used to run the SQL database. If the user who runs 27 the SQL database loses her network connection the service will be down. That may be acceptable for a small service that is used by a few friends. But the solution does not scale. Once the group that uses the service grows to hundreds or thousands of users one computer will not be enough to run the database. One reason why we cannot run the SQL database on several computers is that SQL databases are based on transactional integrity. That basically means that every change to the SQL database is done in a certain order. Alice and Bob may attempt to change information in the database at exactly the same time. However the database will always decide that either Alice and Bob was first, and perform the changes in order. Lets say we have a project management application and Alice and Bob both decide to change the name of the project “Internet stuff” at the same time. The database decides to perform Alice change first. She changes the name to “Important Internet stuff”. Then the database tries to perform Bobs change. Bob tried to change the name from “Internet stuff” to “Internet things”. However the database will detect that the name is being changed by Alice. Therefore Bobs transaction will fail — he must review the new name before he can attempt to change it. NoSQL Fortunately SQL databases are not the only way to store data. Cloud-based services that scale to hundreds of millions of users have found SQL databases to be too limited. It is simply too hard to build a SQL database that is large enough to host all data for such a service. 
Therefore a number of NoSQL databases have been created that solve the problem of running databases on many computers. NoSQL databases are scalable, meaning that it is always possible to store more data, or handle more users, by running the database on more computers. One of the NoSQL databases, CouchDB, is very interesting for our purpose. CouchDB does not have centralized transactions, like a SQL database. That makes it much easier to run the database on several computers. However it also means that the data may not always be in sync.
Let's say Alice and Bob are using a project management application built using CouchDB. Alice is in California while Bob is in Sweden. Therefore Alice is connected to the US database server while Bob is connected to the European database server. Now when they try to change the name of the project, both Alice and Bob succeed. For a little while the project will be known as "Important Internet stuff" in the US and "Internet things" in Europe. However, since the database servers are in contact with each other they will soon discover that they both contain changes to the same name. Then the databases will choose to keep only one of the changes. Let's say Alice wins again, and the project will be named "Important Internet stuff" in Europe as well as in the US. The database still remembers the conflict, and how it was resolved. It also remembers that the project might have been named "Internet things", if it had decided to keep Bob's change. Bob is notified that his change was retroactively overruled. If he still insists his name was the best one he can change it again.
The difference between transactions in SQL and CouchDB is that in SQL all conflicts are resolved before the change is performed. In CouchDB the conflict is resolved after the change is performed. The SQL approach is simpler to use, since the database never needs to be changed retroactively. However this simplicity comes at the cost of not being able to scale. Since we need a distributed database we have to accept the more complex retroactive conflict resolution. We must be aware that data can change retroactively, and build applications that notify users when this happens. The sketch below shows what this looks like against CouchDB's HTTP interface.
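The Node.js sketch below shows Alice's and Bob's competing edits through CouchDB's plain HTTP API, using the fetch function built into Node 18 and later. The database URL and document name are placeholders; the point is that every update must quote the revision it was based on, and that after replication one revision wins while the loser remains visible as a conflict.

```javascript
// Sketch: how CouchDB surfaces Alice's and Bob's competing edits.
// Assumes a local CouchDB at http://localhost:5984 with a database "projects"
// containing a document "internet-stuff" (all placeholders).
const BASE = 'http://localhost:5984/projects/internet-stuff';

async function main() {
  // Alice and Bob both start from the same revision of the document.
  const original = await (await fetch(BASE)).json();

  const put = (name) => fetch(BASE, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...original, name }),   // carries the old _rev
  });

  // On a single server the second write is simply refused (409 Conflict),
  // because it is based on a revision that is no longer current.
  console.log('Alice:', (await put('Important Internet stuff')).status); // 201
  console.log('Bob:  ', (await put('Internet things')).status);          // 409

  // Across two replicated servers both writes would succeed locally; after
  // replication CouchDB picks one winner and keeps the loser under _conflicts.
  const doc = await (await fetch(`${BASE}?conflicts=true`)).json();
  console.log('current name:', doc.name, 'conflicts:', doc._conflicts || 'none');
}

main().catch((err) => console.error(err.message));
```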
Distributed database service
Using a NoSQL database such as CouchDB we can create the foundation for running large applications on top of users' computers. Just as the BitTorrent protocol can be used to share large files between users, a NoSQL database can be used to share structured data suitable for more complex applications than file sharing.

Distributed application
Now we have a database that can be used to build a distributed application. Is that enough? It could be enough. We can certainly build an app that uses the service broker protocol to find a NoSQL database. However we have not yet reached our goal to make it as simple to build an application as it is to build a web application today. One of the great aspects of a web application is that users do not have to install it. Since the application is run in one central location, changes to the application will be visible to users immediately. However we want to get rid of the web server. Therefore we must distribute the application. By definition this means that users need to install it on their computers. We want to make this process as easy as possible. Perhaps we should even let the application update itself automatically. Then it would act almost as a web application. As soon as the developer updates the application the users get the new version. That would be great. However there is one problem. It is not always safe to install and run applications. When the developer updates a web application the update only affects her web server. If she makes mistakes it is only the web server that is vulnerable. But if the application is downloaded to other users, their computers become vulnerable. Therefore we need to verify that an application does not contain malicious code, or dangerous bugs, before we distribute it. This means that normal application development can never be as fast and convenient as web development. But wait a minute. A large part of a web application consists of HTML and JavaScript that run on the user's computer. How come it is safe to download JavaScript to a web browser, but not applications?

Sandbox
The reason is that the JavaScript is executed in a sandbox in the web browser. The sandbox provides a very limited environment with strict rules. As an example, a JavaScript program running in the web browser cannot read files from the computer's hard drive. Thus JavaScript cannot be used to snoop around on your computer. So it is possible to run part of a web application in a sandbox. Does this mean that we could run the whole application in the sandbox? Actually it does. Instead of just writing part of the application in JavaScript we could write the whole application in JavaScript, and run it using a suitable sandbox. That way it would be safe for a user to download the application, which would mean that we can treat it as a web application, and download new versions automatically.

API
To build such an application we need a suitable JavaScript API. We should start with the normal APIs available in HTML5, and extend them with the following functions:
1. Access data stored in the NoSQL database
2. Get events when data is changed in the NoSQL database
We need to use access control to limit who gets access to the data stored in the NoSQL database. However, that access control functionality can be kept outside of this API. The application itself need not be aware of the access control; it is happy just reading and writing information to the NoSQL database. Some other part of the system handles sharing information with other users. The application runs in a special web browser that implements the new API. The special web browser handles connections to other users' service brokers, as well as access control.

Getting there
At first there will be a catch-22 situation. In order to develop an application using this new API there need to be users that can use it. But it is quite a hassle to create public-private key pairs, learn about key management and set up a personal service broker. There need to be good applications before someone goes through all that. Is there any way to jump-start development of such applications? Well, the JavaScript API described above does not involve any public keys or service brokers. The special web browser handles all that. The application can actually run in a different environment, as long as it has access to the new JavaScript API. One way to run such an application is to run it on a web server, as a normal web application. In such a deployment the JavaScript API is implemented using AJAX. The NoSQL database runs on the web server. The new API makes it possible to build applications that can be used both on today's Internet, as well as with the new infrastructure described in this white paper. A sketch of what the application-facing API might look like is shown below.
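The sketch below illustrates the two extensions listed in the API section from the application programmer's point of view. The object and method names (localData, put, get, onChange) are invented for this illustration, and a tiny in-memory stand-in is included so the example runs as-is; in the special browser the same calls would be backed by the distributed NoSQL database and the user's service broker, and in a web deployment by AJAX calls to the server.

```javascript
// Sketch: the application-facing API, with an in-memory stand-in implementation
// so the example is runnable. The application never sees keys, access control
// or service brokers; it only reads, writes and subscribes to changes.
const listeners = [];
const store = new Map();

const localData = {
  async put(id, doc) {
    store.set(id, doc);
    listeners.forEach((fn) => fn({ id, doc }));   // notify subscribers of the change
  },
  async get(id) {
    return store.get(id);
  },
  onChange(fn) {
    listeners.push(fn);                           // change events, e.g. after a sync
  },
};

// The application itself: a one-document "project management" app.
localData.onChange(({ id, doc }) =>
  console.log(`document ${id} changed, project is now called "${doc.name}"`));

(async () => {
  await localData.put('internet-stuff', { name: 'Internet stuff' });
  await localData.put('internet-stuff', { name: 'Important Internet stuff' });
  console.log('read back:', await localData.get('internet-stuff'));
})();
```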
It is even be possible to run the same application both as a web application, and using the new infrastructure, at the same time. In such a hybrid deployment some users use a normal web browser and access the application via the web. Other users use the new special web browser, and connect through a service broker. 33 Appendix A — Cryptography primer 0101 1110 1010 1001 0101 1110 1010 1001 Symmetric-key ciphers Symmetric-key ciphers are the work horse of any system using cryptography. A good way to understand a symmetric-key cipher is to imagine a deck or cards. By shuffling the deck the cards will be in random order, even if the deck was ordered when you started. Thus every time you plat a card game it will be different, you cannot rely on the ace of spades always being the fifth card drawn. Now imagine you could remember exactly how you shuffled the card. Then you could redo the shuffle, but backwards, and get the deck back to its original order. This is basically how a symmetric-key cipher works, even though it rearranges bits in a binary message rather than cards in a deck. It uses mathematical operations to shuffle the message into what looks like a random mess. Now in order to remember how to shuffle, or encrypt, a message a computer cipher uses a binary key. The key controls which mathematical operations to apply to the message. Using a different key will create a totally different random mess. Only by using the same key is it possible to get back the original message. The key itself is just a binary number, all numbers are valid. The size of the key determines how many different combinations of encrypted messages we can create. With an 8 bit key we can only create 256 different encrypted messages. Even if the encrypted message looked like a random mess, it would be easy to try every combination. Therefore someone could find the original message without knowing the key. To be safe we use a key that is at least 128 bits. That means that there are 2^128, or 340,282,366,920,938,463,463,374,607,43 1,768,211,456, possible combinations. A 40, 56 or 64 bit key is not 35 c onsidered safe. Such a message could be decrypted today by trying all combinations. Symmetric-key ciphers are very good for protecting data you want to store. You provide a password that is turned into a binary number. That number is uses as a key to encrypt a file, or perhaps your entire hard disk. As long as you choose a password that is not easy to guess your data should be safe. Perhaps even from you — always make sure you remember your encryption password. If you use a good encryption program there is simply no way to get the data back if you forget the password. 1100 0011 0101 1010 0101 1110 1010 1001 Public-key cipher A symmetric-key cipher would also work for communication, providing you share a good password with everyone you want to communicate with. That is right, for everyone you want to communicate with you would need a separate password. And you would also have to come up with that password in a safe way, so that no one can intercept the password, since everyone that knows the password can eavesdrop on the communication. Basically you would have to meet and come up with the password in a safe place where you cannot be overheard. Thus only using symmetric-key encryption is very good for plots for spy novels, but not really useful for securing Internet communication. 
I am not sure if e-commerce would have taken off, if you first had to go to the local Amazon office to come up with a safe password to be able to communicate securely Amazon’s web site... 36 One solution to the communication problem is to use a public-key cipher. A public-key cipher does not use a single key, but a key pair. Unlike a symmetric-key it is not possible to use any binary number as a key. Instead the key pair has to be generated, using random data as input. When the key pair has been generated you get two keys, lets call them A and B. Now the remarkable thing is that a message encrypted with the key A can only be decrypted with key B. A message encrypted with the key B can only be decrypted with the key A. Since we use random input every time we generate a key-pair we get a different one. When you create your own key pair you can be certain that it is unique. Due to this process it is not possible to turn a password into a public-key. Instead of remembering a password you need to store the key-pair as a computer file. Now using a key-pair, rather than just a straightforward key seems a bit complicated. Why go through all the hassle with computer files instead of passwords? As the name may imply the power of a public-key cipher comes to light when you use one of the keys as a public key and the other as a private key. You give the public key to anyone that might want to contact you. You make sure only you have access to the private key. Now what happens if someone encrypts a message with your public key? First of all, since the message cannot be decrypted using the same key you do not have to worry about the other people that also has your public key. They cannot decrypt the message. Actually the message can only be decrypted with the other key in the keypair — your private key. 37 This property is used for digital signatures. Instead of a handwritten signature you use your private key to sign a document. It can also be used as a login mechanism instead of a password. As we will see in the next chapter it can also be used to verify that we are connecting to the web server. The most used algorithm for public-key ciphers is RSA. Since the mathematics behind RSA is different than for a symmetric key the keys must be much longer to be secure. Unlike symmetrical ciphers where we can use any binary number as a key only a small portion of all binary numbers are valid RSA keys. Therefore a RSA-key neesd to be 2048 bits to be considered to be secure. It is important to note that this is only necessary for RSA — a 2048-bit symmetrical cipher key is just wasteful. Such a cipher would be very slow compared to a 128 cipher or 256-bit cipher, without adding any security. We have a way to encrypt messages to you, that only you can decrypt! The spy-novel meeting where you need to determine a secure password in privacy is no longer necessary. Instead you can just exchange public keys in a public setting, it doesn’t matter if someone else sees them. Actually it is even better that way, the more people who know your public key the better. Remember that each key in the key-pair can decrypt a message encrypted by the other key. What happens if you encrypt a message with your private key? At first glance that may seem useless. The message can be decrypted by your public key. What use is that? Perhaps you could use it to send to a message to all your friends, but wouldn’t that mean that they would have to be very careful about handling your public key? Are we back at the spy-novel meeting again? 
Actually it is not a useful way to encrypt a message that can only be decrypted by your friends. As the name suggests, a public key should be, well, public. We should assume that anyone can decrypt the message. However, if they succeed in decrypting a message with your public key they learn one important thing — that the message has been encrypted by your private key. Since you are the only one who has access to your private key, the message must have been encrypted by you.

Forward security with Diffie-Hellman
Now we have public-key ciphers, does that mean that our communication is secure? Well, if we use public-key ciphers for communication we need the private key to decrypt a message. So as long as we keep the private key safe we are fine. By the way, if someone gets hold of the private key at some point in the future they will be able to decrypt the message. So we need to be very careful with that private key... We now live in the petabyte age — it is certainly feasible for someone to store all your communication. Perhaps you just have to be grateful for the short time you could hide behind your private key. Or perhaps not. There is actually an algorithm that lets two people create a secret number, in public. The algorithm is called Diffie-Hellman.
Let's say that Alice and Bob want to create a secret number. They first agree, in public, on two numbers g and p that do not need to be secret. Alice starts by creating a random number a, while Bob creates another random number b. Alice then does some computation with her random number and tells everyone the result A = g^a mod p. Bob does some computation with his random number and tells everyone the result B = g^b mod p. Alice now does a computation using her secret random number a as well as the number B she got from Bob; the result is the number K = B^a mod p. Bob does the corresponding computation using his secret random number b as well as the number A he got from Alice; the result is the number K = A^b mod p.
That is right: both Alice and Bob got the same number K, since B^a = (g^b)^a = (g^a)^b = A^b (mod p). However, to compute it, it was necessary to have either the secret random number a or b. Only Alice or Bob knows one of these two numbers. Even if someone did overhear the public numbers A and B, she cannot compute K without either a or b. Usually we do not use Diffie-Hellman in a public setting. But if there are eavesdroppers we can be safe in the knowledge that they cannot get the secret number K, even if they have our private keys and can listen in to the communication. It also makes it possible to really forget. If we use K as our encryption key, it will not be possible for anyone to decrypt the communication if we forget about a, b and K.

Appendix B — FAQ
Do we have to rely on DNS, given that it seems to come under increased political control?
DNS is needed for us to keep using nice email addresses and readable URLs. I find it unlikely that it is possible to create a competing catalog service, that can scale to as many users as DNS, without it coming under any political control. To replace DNS we would simply have to do without the nice email addresses. Instead we would have to use some sort of web of trust, where users share their public key with their friends. Most of the services described in this paper would work in such a system, and the JavaScript applications would work without modification.
Doesn't OpenID already solve the problem with passwords?
OpenID does indeed let a user identify herself to a web site. However there are many advantages with a public key based system. It makes it much easier to build access control systems, for example.
And we can use it in applications that are not web based.
Aren't there already public-key solutions in use?
Many countries have national systems that use public-key cryptography to identify citizens. However, the system described in this paper is not used to identify a person, but rather to identify an email address.
This seems like a good idea. When can I use it?
That depends on you. Actually creating the services described in this paper will take some effort. If you want to help please contact us at [email protected]

At startTLS you can easily test whether you can email securely, or whether your friends can send and receive email securely. https://starttls.se/
This book is free — attribution cc by — which means it has some rights reserved according to the Creative Commons License, see URL below. Simply, it's free to share and distribute.
http://creativecommons.org/licenses/by/3.0/legalcode