General Information & Communication Technology
350101 GenICT I & II 2015
Partial Lecture Notes
Michael Kohlhase
School of Engineering & Science
Jacobs University, Bremen Germany
[email protected]
March 24, 2015
Preface
This Document
This document contains the course notes for those parts of the course General Information &
Communication Technology I & II held at Jacobs University Bremen in the academic year 2014.
Contents: The document mixes the slides presented in class with comments of the instructor to
give students a more complete background reference.
Caveat: This document is made available for the students of this course only. It is still a draft
and will develop over the course of the current course and in coming academic years.
Licensing: This document is licensed under a Creative Commons license that requires attribution,
allows commercial use, and allows derivative works as long as these are licensed under the same
license.
Course Concept
Aims: The course 350101 “General Information & Communication Technology I/II” (GenICT) is
a two-semester course that introduces Computer Science concepts to non-CS students.
The course is co-taught by four Jacobs Computer Science faculty members, each covering a quarter of the
material.
Course Contents
Goal: We want to demonstrate the theoretical foundations of Computer Science (CS), and we
want to provide practical knowledge that helps students understand and handle
computers, electronic documents and data, and the Web. Roughly the first half of the first
semester is devoted to theoretical foundations and core concepts (Kohlhase and Jaeger), and the
second half of the semester to practical real-world topics (Schönwälder and Baumann). Throughout
the semester, students will be introduced stepwise to one of the main programming languages of
today, Python.
Acknowledgments
Materials: The presentation of the programming language Python uses materials prepared by Dr.
Heinrich Stamerjohanns and Dr. Florian Rabe for the ESM Python modules.
GenICT Students: The following students have submitted corrections and suggestions to this and
earlier versions of the notes: Kim Philipp Jablonski, Tom Wiesing.
Contents
Preface
    This Document
    Course Concept
    Course Contents
    Acknowledgments
I GenICT 2: Dependable and Secure Software
1 Introduction to Dependability and Security
2 Software Errors
3 Software Testing
    3.1 Software Testing Introduction
    3.2 Functional (Black-Box) Testing
        3.2.1 Unit Testing
        3.2.2 Integration Testing
    3.3 Structural (White-Box) Testing
4 Software Maintenance
    4.1 Motivation
    4.2 Revision Control Systems
        4.2.1 Introduction/Motivation
        4.2.2 Centralized Version Control
        4.2.3 Distributed Revision Control
    4.3 Bug/Issue Tracking Systems
5 Security by Encryption
    5.1 Introduction to Crypto-Systems
    5.2 Public Key Encryption
    5.3 Internet Security by Encryption
Part I
GenICT 2: Dependable and Secure Software
Chapter 1
Introduction to Dependability and Security
Dependable & Secure Software
Definition 1.0.1 (Dependability) A system is called dependable if it can
maintain its working capacity (work as specified) for a certain time or until
the completion of a specified amount of work under given service conditions
without forced interruptions.
Definition 1.0.2 (Security) Information security, sometimes shortened to
InfoSec, is the practice of defending information from unauthorized access, use,
disclosure, disruption, modification, perusal, inspection, recording or destruction.
Observation: We want our software systems to be dependable and secure (D&S).
This module: But how can we ensure this? (and what are the consequences if we cannot?)
©: Michael Kohlhase
D&S Terminology/Taxonomy
Dependability & Security of systems is analyzed in terms of
attributes: what do we want to achieve
threats: what can go wrong
means: what can we do about that
These are for general systems (adapt to software systems)
[Figure: D&S taxonomy tree with three branches. attributes: availability, reliability, safety, confidentiality, integrity, maintainability. threats: fault/bug, error, failure. means: prevention, removal, tolerance, forecasting.]
Dependability & Security Attributes
Definition 1.0.3 D&S attributes are qualities of a system that affect overall
D&S: we have
availability: readiness for correct service
safety: absence of catastrophic consequences on the user(s) and the environment
integrity: absence of improper system alteration
confidentiality: the absence of unauthorized disclosure of information
reliability: continuity of correct service
maintainability: ability for a process to undergo modifications and repairs
availability, reliability, safety, integrity, and maintainability contribute to dependability; confidentiality, integrity, and availability contribute to security
We will concentrate on “correct service” in this module
Dependability & Security Threats
Definition 1.0.4 D&S threats are things that can affect a system and cause
a drop in the D&S attributes.
A fault (also called a bug) is a defect in a system.
An error is a discrepancy between the intended behavior of a system and
its actual behavior inside the system boundary.
A failure is an instance in time when a system displays behavior that is
contrary to its specification.
Observation: The presence of a fault in a system may lead to a failure (or not).
Example 1.0.5 Input and state conditions may never cause this fault to be
executed ⇝ no error ⇝ no failure.
Observation: Errors occur at runtime when the system enters an unexpected
state due to the activation of a fault. (need debuggers or logs to find)
Observation: An error may not necessarily cause a failure.
Example 1.0.6 An exception may be thrown by a system but this may be
caught and handled using fault tolerance techniques. (system works correctly)
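The fault-error-failure chain can be made concrete with a small sketch. The functions average and report are hypothetical and exist only for illustration; they are not part of the course materials.

```python
def average(values):
    """Return the mean of values.

    Fault: the empty list is not handled (division by zero).
    """
    return sum(values) / len(values)


def report(values):
    """Tolerate the error: catch the exception and fall back to a default."""
    try:
        return average(values)
    except ZeroDivisionError:
        # The fault was activated (error), but it is handled here,
        # so the caller observes no failure.
        return 0.0


# Fault present but never activated: no error, no failure.
print(average([1, 2, 3]))

# Fault activated, error caught by fault tolerance: still no failure.
print(report([]))

# Fault activated and not handled: the error becomes a failure.
try:
    average([])
except ZeroDivisionError as exc:
    print("failure:", exc)
```

The same defect thus shows all three behaviors, depending only on the input and on whether a handler sits between the fault and the system boundary.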
The first “Bug”: 1947
Moth found trapped between points at Relay #70, Panel F, of the Mark II
Aiken Relay Calculator, while it was being tested at Harvard University, September 9, 1947.
Dependability & Security Means
Definition 1.0.7 D&S means are measures to break the fault-error-failure
chain to increase D&S of a system, they include
prevention: stop faults from being incorporated into a system
removal: eliminate faults from a system (during development & use)
forecasting: predicts likely faults so that they can be removed or their effects
can be circumvented.
tolerance: mechanisms that allow a system to still deliver the required
service in the presence of faults
For this module: we will concentrate on
fault removal!
testing as a means for finding bugs during development
verification as a means for showing the absence of (certain) faults
bug/issue-tracking as a means for reporting faults during use.
the D&S attributes confidentiality and integrity
encryption for ensuring confidentiality and authentication.
Chapter 2
Software Errors
The first “Bug”: 1947
Moth found trapped between points at Relay #70, Panel F, of the Mark II
Aiken Relay Calculator, while it was being tested at Harvard University, September 9, 1947.
Myths about Software bugs
Benign Bug Hypothesis: Bugs are nice, tame, and logical.
Bug Locality Hypothesis: A bug discovered within a component affects only
that component's behavior.
Control Bug Dominance: Most bugs are in the control structure of programs.
Corrections Abide: A corrected bug remains correct.
Silver Bullets: A language, design method, environment grants immunity from
bugs.
Sadism Suffices: All bugs can be caught using low cunning and intuition.
Sources of Software Errors
Requirements Definition: Erroneous, incomplete, inconsistent requirements.
Design: Fundamental design flaws in the software.
Implementation: Mistakes in chip fabrication, wiring, programming faults, malicious code.
Support Systems: Poor programming languages, faulty compilers and debuggers, misleading development tools.
Inadequate Testing of Software: Incomplete testing, poor verification, mistakes
in debugging.
Evolution: Sloppy redevelopment or maintenance, introduction of new flaws
in attempts to fix old flaws, incremental escalation to inordinate complexity.
Effects of Software bugs (Examples)
Military Aviation Problems
An F-18 crashed because of a missing exception condition: if ... then ...
without the else clause that was thought could not possibly arise.
In simulation, an F-16 program bug caused the virtual plane to flip over
whenever it crossed the equator, as a result of a missing minus sign to
indicate south latitude.
Year Ambiguities
In 1992, Mary Bandar received an invitation to attend a kindergarten in
Winona, Minnesota, along with others born in ’88. (Mary was 104 years old)
Mr. Blodgett's auto insurance rate tripled when he turned 101. (first driver over 100)
His age was interpreted as 1. (program: a teenager is someone under 20!)
Dates, Times, and Integers (32,768 = 2^15 overflows 16-bit words)
A Washington D.C. hospital computer system collapsed on September 19,
1989, 2^15 days after January 1, 1900, forcing a lengthy period of manual
operation.
COBOL uses a two-character date field . . .
The Linux term program died worldwide on October 26, 1993.
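The arithmetic behind these date problems can be checked directly. The following is a sketch under assumptions: the helper names are made up, and the wrap-around models a signed 16-bit day counter, which the notes do not specify for the hospital system.

```python
from datetime import date, timedelta

INT16_MAX = 2**15 - 1  # largest value of a signed 16-bit integer


def to_int16(n):
    """Wrap an integer into the signed 16-bit range, as fixed-width storage would."""
    return (n + 2**15) % 2**16 - 2**15


def age_from_two_digit_years(birth_yy, current_yy):
    """Compute an age from two-digit years only (the faulty approach)."""
    return (current_yy - birth_yy) % 100


# Day number 2**15, counted from January 1, 1900, no longer fits the counter.
overflow_day = date(1900, 1, 1) + timedelta(days=2**15)
print(overflow_day)
print(to_int16(INT16_MAX + 1))  # the counter wraps around to a negative value

# Born in (18)88, evaluated in (19)92: 104 years shrink to kindergarten age.
print(age_from_two_digit_years(88, 92))
```

Neither computation raises an exception; the programs simply continue with a wrong value, which is why such faults stay hidden until the boundary date arrives.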
Shaky Math
A program fault in an earthquake simulation program ⇝ 5 US nuclear power
plants shut down in 1979 (fault discovered after the power plants were built)
Problem: sum instead of sum of absolute values (plants too weak for large quake)
Therac-25 Radiation “Therapy”
In Texas, 1986, a man received between 16,500 and 25,000 rads in less than 1
sec, over an area of about 1 cm. (lost arm, later died)
In Texas, 1986, a man received at least 4,000 rads to the brain. (died)
In Washington, 1987, a patient received 8,000-10,000 rads instead of the prescribed 86 rads. (died)
Bank Generosity
A Norwegian bank ATM dispensed 10× the amount. (long lines, great joy)
A software flaw caused a UK bank to duplicate every transfer payment
request for half an hour. (initial loss: 2 × 10^9 £, after recovery 5 × 10^6 £)
Making Rupee!
An Australian man purchased 104,500 $ worth of Sri Lankan Rupees.
The first bank's software had displayed a bogus exchange rate in the Rupee
position!
The next day he sold the Rupees to another bank for 440,258 $.
A judge ruled that the man had acted without intended fraud and could
keep the extra 335,758 $!
Bug in BoNY Software
The Bank of New York (BoNY) had a 3.2 × 10^10 $ overdraft as the result
of a 16-bit integer counter that went unchecked.
BoNY was stuck while the NY Federal Reserve debited BoNY's cash account.
The bug cost BoNY 5 × 10^6 $ in interest payments.
BoNY had to borrow 2.4 × 10^10 $ to cover itself for 1 day until a fix.
Chapter 3
Software Testing
3.1 Software Testing Introduction
Software Testing (Intro)
Definition 3.1.1 Software testing is the process of reviewing or exercising a
program with the specific intent of finding errors prior to delivery to the end
user.
Test Feature Space
Testing is a complex and multi-faceted area (will cover some)
[Figure: test feature space with four dimensions.
Level: unit, integration, system, acceptance, regression.
Accessibility: white box, grey box, black box.
Automation: manual, semi-automatic, automatic.
Quality: correctness, reliability, safety, security, robustness, usability, performance, maintainability, portability, interoperability, …
Source: 320312 Software Engineering (P. Baumann)]
The Significance of Testing
Most widely-used activity for ensuring that software systems satisfy the specified requirements.
Consumes substantial project resources. Some estimates: ∼ 50% of development costs
NIST Study 2002: The annual cost of inadequate testing in the US can be as
much as 59 billion US dollars.
Limitations of Testing
Testing cannot occur until after the code is written.
The problem is big!
Perhaps the least understood major SE activity.
Exhaustive testing is not practical even for the simplest programs. WHY?
Even if we “exhaustively” test all execution paths of a program, we cannot
guarantee its correctness. – The best we can do is increase our confidence!
“Testing can show the presence of bugs, not their absence.” (Edsger W. Dijkstra)
Testers do not have immunity to bugs.
Even the slightest modifications after a program has been tested invalidate
(some or even all of) our previous testing efforts.
Automation is critically important.
Unfortunately, there are only a few good tools, and in general, effective use of
these good tools is very limited.
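Why exhaustive testing is impractical even for the simplest programs can be estimated with a back-of-the-envelope calculation. The sketch below assumes a hypothetical test rate of one billion test cases per second; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_years(input_bits, tests_per_second):
    """Years needed to try every input of the given total bit width."""
    return 2**input_bits / tests_per_second / SECONDS_PER_YEAR


# A function that adds two 32-bit integers has 64 input bits.
years = exhaustive_test_years(input_bits=64, tests_per_second=10**9)
print(f"{years:.0f} years")
```

Even at this optimistic rate, a two-argument integer function keeps the test machine busy for centuries, and every additional input bit doubles the time.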
Testing Methods (General)
Definition 3.1.2 Software testing by inspecting the code (automatically or
manually) is called static testing.
Definition 3.1.3 Software testing by executing code on a given set of test
cases is referred to as dynamic testing.
Definition 3.1.4 Black-box testing (or functional testing) treats the software
as a “black box”, examining functionality without any knowledge of internal
implementation.
Definition 3.1.5 White-box testing (or structural testing) is a software testing method that
takes internal structures or workings of an application into account.
Specification-based Testing
Definition 3.1.6 Specification-based testing aims to test the functionality of
software according to the applicable requirements, usually given as a test suite.
Definition 3.1.7 A test suite is a set of test cases: sets of inputs, execution
conditions and expected results for a particular test objective. A test case is
the smallest entity that is always executed as a unit, from beginning to end.
A test suite can be seen as a form of specification.
Definition 3.1.8 A software requirements specification (SRS, or just specification) is a description of the structure and behavior of a software system,
laying out functional and non-functional requirements.
An SRS may include a set of use/test cases that describe interactions the users
will have with the software.
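A test suite in the sense of Definition 3.1.7 can be written down as plain data. The sketch below is an illustration only: leap_year is a hypothetical unit under test and run_suite is a made-up minimal runner, not a real framework.

```python
def leap_year(year):
    """Hypothetical unit under test: the Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A test suite: each test case pairs an input with the expected result.
TEST_SUITE = [
    (2000, True),   # divisible by 400
    (1900, False),  # divisible by 100 but not by 400
    (2012, True),   # divisible by 4
    (2015, False),  # not divisible by 4
]


def run_suite(function, suite):
    """Run every test case; return the failing (input, expected, actual) triples."""
    failures = []
    for argument, expected in suite:
        actual = function(argument)
        if actual != expected:
            failures.append((argument, expected, actual))
    return failures


print(run_suite(leap_year, TEST_SUITE))
```

Read as a specification, the suite says what any acceptable implementation must do on these inputs, without saying how.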
Testing Levels (in the Project Workflow)
Testing & The Design Cycle
Testing occurs at different levels (should be integrated into design process)
[Figure: V-model. Project work flow: what users really need, requirements, design, code. Dynamic testing: unit testing (against code), integration testing (against design), system testing (against requirements), acceptance testing (against what users really need). Source: 320312 Software Engineering (P. Baumann)]
3.2 Functional (Black-Box) Testing
3.2.1 Unit Testing
Unit Testing
Definition 3.2.1 Unit testing is a specification-based testing method that
specifically tests a single “unit” of code in isolation.
A unit can be an entire module, a single class or function, or almost anything
in between as long as the code is isolated from other code not under testing
(which itself could have errors and would thus confuse test results).
Unit testing is usually supported via a test harness that automates running test
cases (e.g. upon save)
most programming languages have frameworks for unit-testing nowadays
Benefits of unit testing:
tests as specification: write tests before coding; tests pass ⇝ code complete
regression testing: run unit tests after every change
tests as documentation: test cases document what is critical about the unit
simplify integration testing: rely on thoroughly tested units.
Unit Testing in python (after [Knu])
python has a unit testing framework: unittest (standard library)
Running Example: file prime1.py
    def is_prime(number):
        """Return True if *number* is prime."""
        for element in range(number):
            if number % element == 0:
                return False
        return True

    def print_next_prime(number):
        """Print the closest prime number larger than *number*."""
        index = number
        while True:
            index += 1
            if is_prime(index):
                print(index)
                break  # stop after the first prime; otherwise the loop never ends
A first Unit Test
A first unit test for prime1.py in file test_prime1.py
    import unittest
    from prime1 import is_prime

    class PrimesTestCase(unittest.TestCase):
        """Tests for `primes.py`."""

        def test_is_five_prime(self):
            """Is five successfully determined to be prime?"""
            self.assertTrue(is_prime(5))

    if __name__ == '__main__':
        unittest.main()
Unit test with a single test case: test_is_five_prime
In unittest, any method whose name starts with test in a class derived from
unittest.TestCase is a unit test case.
Test cases are run and their assertions checked by unittest.main().
Run this by python test_primes.py and obtain
    $ python test_primes.py
    E
    ======================================================================
    ERROR: test_is_five_prime (__main__.PrimesTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 8, in test_is_five_prime
        self.assertTrue(is_prime(5))
      File "/home/jknupp/code/github_code/blug_private/primes.py", line 4, in is_prime
        if number % element == 0:
    ZeroDivisionError: integer division or modulo by zero

    ----------------------------------------------------------------------
    Ran 1 test in 0.000s
The E is the result of the single test case being run (E for an error, F for a failed assertion, . for a pass).
The error message points to line and problem (range(number) starts at zero ⇝ division by zero).
Here the error is encountered before the test even terminates!
Fix line 3 to for element in range(2, number): and test again
    $ python test_primes.py
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s
all is OK.
Assertions in python
A unit test consists of one or more assertions (statements that assert that
some property of the code being tested is true).
Example 3.2.2 asserting that 5 is prime: self.assertTrue(is_prime(5))
Other Assertions: Details at https://docs.python.org/3/library/unittest.html
Method                     checks that           new in
assertEqual(a, b)          a == b
assertNotEqual(a, b)       a != b
assertTrue(x)              bool(x) is True
assertFalse(x)             bool(x) is False
assertIs(a, b)             a is b                3.1
assertIsNot(a, b)          a is not b            3.1
assertIsNone(x)            x is None             3.1
assertIsNotNone(x)         x is not None         3.1
assertIn(a, b)             a in b                3.1
assertNotIn(a, b)          a not in b            3.1
assertIsInstance(a, b)     isinstance(a, b)      3.2
assertNotIsInstance(a, b)  not isinstance(a, b)  3.2
All accept an optional message argument for error messages on failure via the keyword msg.
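A few of the assertions listed above in action. The test class and its contents are made up for illustration; the suite is run through an explicit loader and runner instead of unittest.main() so that the snippet can also be pasted into an interactive session.

```python
import unittest


class AssertionExamples(unittest.TestCase):
    """Each test case uses one of the assertion methods."""

    def test_equal(self):
        self.assertEqual(2 + 3, 5, msg="addition is broken")

    def test_in(self):
        self.assertIn(3, [1, 2, 3])

    def test_is_none(self):
        # dict.get returns None for a missing key
        self.assertIsNone({}.get("missing"))

    def test_is_instance(self):
        self.assertIsInstance(1.0, float)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, "tests run")
```

Choosing the most specific assertion (assertIn rather than assertTrue(3 in ...)) pays off when a test fails, because the generated message then shows the offending values.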
More Unit Tests for is_prime
test_is_five_prime worked for a generic prime number.
Test negative cases by adding a method to the PrimesTestCase class:
    def test_is_four_non_prime(self):
        """Is four correctly determined not to be prime?"""
        self.assertFalse(is_prime(4), msg='Four is not prime!')
assertFalse specifies that we expect 4 to be composite. The msg message
outputs additional information if the unit test fails.
Testing Edge Cases
Errors usually occur in edge cases: here 0, 1, negative integers.
Testing the zero case:
    def test_is_zero_not_prime(self):
        """Is zero correctly determined not to be prime?"""
        self.assertFalse(is_prime(0))
gives the result
    python test_primes.py
    ..F
    ======================================================================
    FAIL: test_is_zero_not_prime (__main__.PrimesTestCase)
    Is zero correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 17, in test_is_zero_not_prime
        self.assertFalse(is_prime(0))
    AssertionError: True is not false

    ----------------------------------------------------------------------
    Ran 3 tests in 0.000s

    FAILED (failures=1)
Right, we changed the range statement to exclude zero and one, so for these inputs the loop body never runs and is_prime returns True.
Let's fix that (then the tests pass).
    def is_prime(number):
        """Return True if *number* is prime."""
        if number in (0, 1):
            return False
        for element in range(2, number):
            if number % element == 0:
                return False
        return True
Testing for negative numbers (a whole range)
Let's test a whole range (program in python)
    def test_negative_number(self):
        """Is a negative number correctly determined not to be prime?"""
        for index in range(-1, -10, -1):
            self.assertFalse(is_prime(index))
The test fails, but we do not get enough information
    python test_primes.py
    ...F
    ======================================================================
    FAIL: test_negative_number (__main__.PrimesTestCase)
    Is a negative number correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 22, in test_negative_number
        self.assertFalse(is_prime(index))
    AssertionError: True is not false

    ----------------------------------------------------------------------
    Ran 4 tests in 0.000s

    FAILED (failures=1)
Which negative number did it fail on? (unittest unhelpful)
We can fix this with a better message
    def test_negative_number(self):
        """Is a negative number correctly determined not to be prime?"""
        for index in range(-1, -10, -1):
            self.assertFalse(is_prime(index), msg='{} should not be determined to be prime'.format(index))
This gives
    python test_primes
    ...F
    ======================================================================
    FAIL: test_negative_number (test_primes.PrimesTestCase)
    Is a negative number correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "./test_primes.py", line 22, in test_negative_number
        self.assertFalse(is_prime(index), msg='{} should not be determined to be prime'.format(index))
    AssertionError: True is not false : -1 should not be determined to be prime

    ----------------------------------------------------------------------
    Ran 4 tests in 0.000s

    FAILED (failures=1)
The End Result
A better implementation: prime3.py
    def is_prime(number):
        """Return True if *number* is prime."""
        if number <= 1:
            return False
        for element in range(2, number):
            if number % element == 0:
                return False
        return True
A (somewhat) comprehensive test case test_prime3.py
    import unittest
    from prime3 import is_prime

    class PrimesTestCase(unittest.TestCase):
        """Tests for `primes.py`."""

        def test_is_five_prime(self):
            """Is five successfully determined to be prime?"""
            self.assertTrue(is_prime(5))

        def test_is_four_non_prime(self):
            """Is four correctly determined not to be prime?"""
            self.assertFalse(is_prime(4), msg='Four is not prime!')

        def test_is_zero_not_prime(self):
            """Is zero correctly determined not to be prime?"""
            self.assertFalse(is_prime(0))

        def test_negative_number(self):
            """Is a negative number correctly determined not to be prime?"""
            for index in range(-1, -10, -1):
                self.assertFalse(is_prime(index), msg='{} is not prime'.format(index))

    if __name__ == '__main__':
        unittest.main()
What have we learnt?
While developing tests for is_prime we learnt that unit-testing
can be used for black-box testing of functions (the tests never looked at the implementation)
finds errors early (division by zero)
finds regressions (when we changed to range(2,number))
documents edge cases (0/1 prime? a priori unclear)
is quite a lot of work (test suite longer than the function)
Still to do
automation of test application (scripts, IDE, continuous integration framework)
did we test enough? (maybe 1234567891 is not found prime?)
testing print_next_prime (difficult, it uses is_prime)
Unit Testing with Mocks/Stubs
Recall: Unit testing is about testing a module in isolation
Problem: What if the module (e.g. a function) calls others? Any fault in those
could cause test failure.
Example 3.2.3 Testing networking code
(network may be down)
Example 3.2.4 Testing modules with database access(really use production DB?)
Replace the called lower-level modules with specially constructed stubs or mocks (mockups)
that always succeed and give controlled results.
Various ways to do this
(vary in complexity and effectiveness)
Definition 3.2.5 A monkey patch is a way for a program to extend or modify
supporting system software locally (affecting only the running instance of the
program).
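In its simplest form a monkey patch is just an assignment. The sketch below patches the system clock by hand; timestamped is a hypothetical unit under test, made up for illustration.

```python
import time


def timestamped(message):
    """Hypothetical unit under test: its result depends on the system clock."""
    return f"{time.time():.0f}: {message}"


def test_timestamped_with_patched_clock():
    original = time.time       # remember the real function
    time.time = lambda: 42.0   # monkey patch: affects this running process only
    try:
        assert timestamped("hello") == "42: hello"
    finally:
        time.time = original   # always restore the original


test_timestamped_with_patched_clock()
print("clock restored:", time.time() > 42.0)
```

The try/finally bookkeeping is easy to forget, which is one reason to prefer a mocking framework that restores the patched object automatically.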
Simple Mocking by Monkey Patching in python
Remember our running example? (the second function)
    def print_next_prime(number):
        """Print the closest prime number larger than *number*."""
        index = number
        while True:
            index += 1
            if is_prime(index):
                print(index)
                break  # stop after the first prime; otherwise the loop never ends
It calls two external functions: is_prime and print (not isolated)
Idea: Monkey patch them. python has a framework for this: unittest.mock gives us the
@patch('foo') decorator that marks foo for patching.
Monkey Patching in python: Test Case
Towards a test case for print_next_prime function
First import the unittest and unittest.mock frameworks,
then import the is_prime and print_next_prime functions (so that we can test/patch them).
    import unittest
    from unittest.mock import patch
    from primes import is_prime
    from primes import print_next_prime
The test case proper: e.g. the next prime after four (is five!)
    class NextPrimeTestCase(unittest.TestCase):
        """Testing print_next_prime method"""

        @patch('builtins.print')
        @patch('primes.is_prime')
        def test_is_five_after_4(self, is_prime, print):
            """is 5 next prime after four?"""
            is_prime.return_value = True
            print_next_prime(4)
            print.assert_called_with(5)

    if __name__ == "__main__":
        unittest.main()
@patch('foo') replaces foo by a mock for the duration of the test and passes that mock to
the test as an extra argument, so the test must declare one parameter per patch (the
decorator closest to the function supplies the first one). Note that this test only
terminates if print_next_prime stops after printing the first prime it finds.
3.2.2 Integration Testing
Integration Testing
Definition 3.2.6 Integration testing is the phase in software testing in which
individual software units are combined and tested as a group.
things tested in integration testing include
import/export type compatibility
representation compatibility
out-of-range errors
Example 3.2.7 The NASA Mars Climate Orbiter unexpectedly turned into a lander because NASA's navigation software expected thruster data in SI units (newton seconds) and the contractor's software supplied them in pound-force seconds.
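The unit mix-up of Example 3.2.7 is a typical representation incompatibility: each unit can pass its own tests, and the fault only shows when the units are combined. The following is a sketch with made-up function names and made-up numbers; only the conversion factor between pound-force seconds and newton seconds is a physical fact.

```python
LBF_S_TO_N_S = 4.448222  # one pound-force second expressed in newton seconds


def contractor_impulse():
    """Hypothetical ground-software unit: reports an impulse in pound-force seconds."""
    return 10.0


def navigation_update(impulse_newton_seconds):
    """Hypothetical navigation unit: expects newton seconds, returns delta-v in m/s."""
    spacecraft_mass_kg = 500.0  # made-up value
    return impulse_newton_seconds / spacecraft_mass_kg


def integrated_wrong():
    # Both units are fine in isolation, but the interface units do not match.
    return navigation_update(contractor_impulse())


def integrated_right():
    return navigation_update(contractor_impulse() * LBF_S_TO_N_S)


print(integrated_right() / integrated_wrong())
```

An integration test that compares the combined result against an independently computed value exposes the factor immediately, whereas no unit test of either function alone can.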
Integration testing strategies: big-bang, top-down, bottom-up, sandwich (see these next)
Definition 3.2.8 In big bang integration testing, all or most of the units
are coupled together to form a complete software system and then used for
integration testing.
great if it works (time saver), not much information if it does not
Top-Down Integration Testing
Definition 3.2.9 In top-down integration testing,
the top module is tested with stubs,
depth-first replacement of stubs
re-run tests for each replacement
Advantage: tracing faults to user-visible behaviors (failures) (linking faults)
[Figure: module hierarchy with top module A and lower-level modules B, C, D, E, F, G]
Bottom-Up Integration Testing
Definition 3.2.10 In bottom-up integration testing,
tested units are grouped into modules
depth-first replacement of drivers
re-run unit tests with each replacement
Advantage: finding errors early. (modules are always tested)
[Figure: module hierarchy with top module A and lower-level modules B, C, D, E, F, G]
Definition 3.2.11 In sandwich integration testing top-down and bottom-up
integration testing are combined.
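The difference between stubs (top-down) and drivers (bottom-up) can be shown on a two-level example. All names below are hypothetical and chosen for illustration only.

```python
def parse(text):
    """Lower-level unit: turn 'a,b,c' into a list of integers."""
    return [int(field) for field in text.split(",")]


def total(text, parser=parse):
    """Top-level unit: sum the numbers in text, delegating the parsing to parser."""
    return sum(parser(text))


def parse_stub(text):
    """Stub standing in for parse during top-down testing: canned result."""
    return [1, 2, 3]


def parse_driver():
    """Driver exercising parse during bottom-up testing, before total exists."""
    assert parse("4,5,6") == [4, 5, 6]


# Top-down: test the top module against the stub first ...
assert total("ignored", parser=parse_stub) == 6
# ... then replace the stub by the real unit and re-run the test.
assert total("1,2,3") == 6
# Bottom-up: the driver calls the lower-level unit directly.
parse_driver()
print("all integration steps passed")
```

A stub replaces something that is called, a driver replaces something that calls; sandwich testing uses both at once and meets in the middle.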
System Testing
Definition 3.2.12 System testing is a functional testing method which is
conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements.
“integrated system” ≙ integrated hardware and software (after integration testing)
Focus on use & interaction of system functionalities rather than details of
implementations
Should be carried out by a group independent of the code developers
alpha testing: end users at developer's site
beta testing: at end user site, without the developer involved!
Concerns in system testing: (including, but not limited to)
GUI and usability (can users interact with the system?)
compatibility & security (does it play nice with the client's other systems?)
load, volume, scalability (can the system handle the required load and data volumes?)
Acceptance Testing
Definition 3.2.13 In user acceptance testing (UAT) the client tests whether
a system meets the contractual requirements so that transfer of ownership
can take place.
UAT is sometimes mixed/confused with beta testing.
Problems: UAT is a crucial step in the software life-cycle and should be planned
well ahead
- customer may demand new functionality when exposed to the system
+ agree on UAT test suite in contract
Testing Levels (in the Project Workflow)
Testing & The Design Cycle
Testing occurs at different levels (should be integrated into design process)
[Figure: V-model. Project work flow: what users really need, requirements, design, code. Dynamic testing: unit testing (against code), integration testing (against design), system testing (against requirements), acceptance testing (against what users really need). Source: 320312 Software Engineering (P. Baumann)]
3.3 Structural (White-Box) Testing
Coverage Analysis vs. Path-Based Testing
How good are our tests?: Do they test all of the possible program behaviors?
We have to look at the program to find out
(white-box testing)
Path-based testing: Analyze program behaviors (paths through the program)
and generate conditions for test cases that cover these paths.
Definition 3.3.1 Control-flow testing is a structural testing method that
takes the control flow of the program as a model to determine suitable test
cases.
Control Flow Graphs
Definition 3.3.2 The control flow graph of a program is a graph that models
its control structure. Its nodes are labeled with process blocks (sequences
of statements without control operators) and decisions (Boolean expressions
in control structures). Two nodes are connected with an edge, iff they can be
executed consecutively. We call a node with in-degree greater than 1 a junction.
Example 3.3.3 Consider the function that raises x to the power of y.
    def power(x, y):
        if y < 0:
            p = -y
        else:
            p = y
        z = 1.0
        while p != 0:
            z = z * x
            p -= 1
        if y < 0:
            return 1.0 / z
        else:
            return z
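[Figure: control flow graph of power. The decision "if y<0" leads via edge a (T) to "p = -y" and via edge b (F) to "p = y"; edges c and d join at "z=1.0"; edge e leads to the decision "while p!=0", whose T-edge f enters the loop body "z=z*x; p-=1" with back edge g; the F-edge h leads to the second decision "if y<0", with edge i (T) to "return 1/z" and edge k (F) to "return z".]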
Paths in Programs
Definition 3.3.4 Let G be a control flow graph, then we call a path in G
complete if it starts at the unit’s entry and ends at the return.
Complete paths are useful for testing because:
It is difficult to set up and execute paths that start at an arbitrary statement.
It is difficult to stop at an arbitrary statement without changing the code
being tested.
We think of routines as input/output paths.
There are many paths between the entry and exit points of a typical routine.
Even a small routine can have a large number of paths. (infinite)
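How quickly the number of complete paths grows can be seen by enumerating them for the control flow graph of the power function. The following is a sketch under assumptions: the node names are made up, and the edge labels a to k follow the labeling of the figure for Example 3.3.3 as far as it can be read off.

```python
# Edge label -> (source node, target node); node names are illustrative.
EDGES = {
    "a": ("if1", "neg"), "b": ("if1", "pos"),
    "c": ("neg", "init"), "d": ("pos", "init"),
    "e": ("init", "while"),
    "f": ("while", "body"), "g": ("body", "while"),
    "h": ("while", "if2"),
    "i": ("if2", "ret_inv"), "k": ("if2", "ret"),
}
EXITS = {"ret_inv", "ret"}


def complete_paths(node="if1", max_loops=1, path=""):
    """Yield the edge strings of all complete paths with at most max_loops iterations."""
    if node in EXITS:
        yield path
        return
    for label, (source, target) in EDGES.items():
        if source != node:
            continue
        if label == "f" and path.count("f") >= max_loops:
            continue  # bound the number of loop iterations
        yield from complete_paths(target, max_loops, path + label)


for bound in range(4):
    print(bound, len(list(complete_paths(max_loops=bound))))
```

Every additional permitted loop iteration adds four paths here, so without a bound the set of complete paths is infinite. The enumeration is purely syntactic: it does not check whether a path can actually be executed.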
How many Paths are enough for Testing?
In Principle: we would have to test all paths (infeasible)
Definition 3.3.5 In testing we speak of
path coverage, iff all paths are tested
statement coverage, iff all statements are covered.
decision coverage, iff all decisions are tested in both directions
path coverage ⇒ decision coverage ⇒ statement coverage (at least without goto)
Definition 3.3.6 (Testing Strategies) If we strive for
path coverage, we speak of path testing
statement coverage, we speak of statement testing
decision coverage, we speak of branch testing
Path Predicates
Intuition: Every path corresponds to a succession of true or false values for
the predicates traversed on that path.
Definition 3.3.7 A path predicate is a Boolean expression that characterizes
the set of input values that will cause a path to be traversed.
Definition 3.3.8 Any set of input values that satisfies all of the conditions
of the path predicate will force the routine through that path; we say that they
achieve the path. If there is no such set of inputs, the path is unachievable.
Definition 3.3.9 The act of finding a set of solutions to the path predicate
expression is called path sensitization.
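For small input domains, path sensitization can be sketched as a brute-force search. The code below is an illustration under assumptions, not a general method: power_traced is a hand-instrumented copy of the power function from Example 3.3.3 that records the edge labels a to k it traverses, and sensitize (a hypothetical helper) tries a small range of integer inputs. A result of None only means that no input in the searched range achieves the path.

```python
def power_traced(x, y):
    """power from Example 3.3.3, instrumented to record the edges it takes."""
    path = ""
    if y < 0:
        path += "a"
        p = -y
        path += "c"
    else:
        path += "b"
        p = y
        path += "d"
    z = 1.0
    path += "e"
    while p != 0:
        path += "f"
        z = z * x
        p -= 1
        path += "g"
    path += "h"
    if y < 0:
        path += "i"
        return 1.0 / z, path
    else:
        path += "k"
        return z, path


def sensitize(target, candidates=range(-3, 4)):
    """Return some input (x, y) that achieves the path `target`, or None."""
    for x in candidates:
        for y in candidates:
            if x == 0 and y < 0:
                continue  # 0 to a negative power would divide by zero
            if power_traced(x, y)[1] == target:
                return (x, y)
    return None


print(sensitize("acefghi"))  # some input with y == -1
print(sensitize("acehk"))    # None: no input in the range achieves this path
```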
Path Testing Absolute Value
Example 3.3.10 Consider the following function (very very simple)
def abs(x):
    if x < 0:
        x = -x
    return x
[Figure: control flow graph of abs. The decision "if x<0" has the T branch a to x=-x, from which edge c leads to "return x", and the F branch b leading directly to "return x".]
We have two paths: ac and b. (we can test exhaustively here)
path   path pred.   input   output   comment
ac     x<0          −3      3
b      x≥0          0       0        edge case
b      x≥0          3       3        regular
Path Testing Power
Example 3.3.11 Recall the power function (simple)
def power(x, y):
    if y < 0:
        p = -y
    else:
        p = y
    z = 1.0
    while p != 0:
        z = z * x
        p -= 1
    if y < 0:
        return 1.0 / z
    else:
        return z
[Figure: the control flow graph of power from Example 3.3.3, with edges labeled a to k.]
We have infinitely many paths: (ac|bd)e(fg)*h(i|k) (path testing impossible)
Remark: Paths acehk and bdehi are unachievable. (both decisions test the same predicate y < 0)
Example 3.3.12 (Statement/Branch Testing)
Choose complete paths that exercise all statements/branches (here the same)
path        path predicate        input ⟨x, y⟩   output
acefghi     y < 0, p = −y = 1     ⟨1, −1⟩        1
bdehk       y ≥ 0, p = y = 0      ⟨0, 0⟩         1
bdefgfghk   y ≥ 0, p = y = 2      ⟨2, 2⟩         4
Coverage Testing in python
python has a coverage testing tool: coverage.
install it with pip install coverage
run it from the command line: coverage run test_prime3.py
or even coverage run --branch test_prime3.py (for branch coverage)
generate a nice html page: coverage html (see it in a browser)
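As an illustration, the three test cases for power could be written with the standard unittest module, so that the coverage tool has something to measure. This is a sketch: the class name and the file name test_power.py are assumptions, and power is repeated here only to keep the example self-contained.

```python
import unittest


def power(x, y):
    """Raise x to the power of y (the function under test)."""
    if y < 0:
        p = -y
    else:
        p = y
    z = 1.0
    while p != 0:
        z = z * x
        p -= 1
    if y < 0:
        return 1.0 / z
    else:
        return z


class PowerBranchTest(unittest.TestCase):
    """One test per complete path; together they take every branch once."""

    def test_negative_exponent(self):  # path acefghi
        self.assertEqual(power(1, -1), 1.0)

    def test_zero_exponent(self):      # path bdehk
        self.assertEqual(power(0, 0), 1.0)

    def test_positive_exponent(self):  # path bdefgfghk
        self.assertEqual(power(2, 2), 4.0)
```

Saved as test_power.py, this could be run with coverage run --branch -m unittest test_power, followed by coverage html to see which branches the tests reached.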
Chapter 4
Software Maintenance
4.1 Motivation
Software Maintenance
Programs and software systems are long-lived objects (often decades)
they are continually improved (that introduces new bugs)
and adapted to changing requirements (that as well)
[?] claims that 90% of costs are in maintenance
[Figure: bar chart of costs per development phase (requirements, design, implementation, testing, maintenance); vertical axis from 0 to 15.]
Lehman’s Laws of Software Evolution
Context: A program that is written to perform some real-world activity; how
it should behave is strongly linked to the environment in which it runs.
Lehman et al. [?] identify a set of eight laws of software evolution, including
Continuing Change – a program must be continually adapted or it becomes
progressively less satisfactory
Invariant Work Rate – the average effective global activity rate in an evolving program is invariant over the product’s lifetime.
Continuing Growth – the functional content of an E-type system must be
continually increased to maintain user satisfaction over its lifetime
Declining Quality – the quality of a program will appear to be declining
unless it is rigorously maintained and adapted to operational environment
changes
Feedback System – program evolution processes constitute multi-level, multi-loop, multi-agent feedback systems and must be treated as such to achieve
significant improvement over any reasonable base.
Lessons from Lehman’s Laws
Software maintenance is the elephant in the room
we need to get the maintenance phase right
Continuing Change ⇒ need to manage software over time
release management (distribute regular updates to the program)
revision management (keep access to all versions, track changes)
regression testing (compare old behavior to new one)
Feedback Cycle ⇒ manage user feedback
find out what users really need/want
allow users to report failures.
Solution: Software Lifecycle Management Systems.
Example 4.1.1 GitHub or GitLab offer revision control, issue tracking and
project planning features.
4.2 Revision Control Systems
We address a very important topic for document management: supporting the document life-cycle
as a collaborative process. In this section we discuss how a set of tools that was
developed for supporting collaborative development of large program collections can be used for
document management.
We will first introduce the problems and current attempts at solutions and then introduce two
classes of revision control systems and discuss their paradigmatic systems.
4.2.1 Introduction/Motivation
Lifecycle Management for Digital Documents
Documents may have a non-trivial life-cycle involving multiple actors.
Example 4.2.1 For a novel we have the following stages:
1. skeleton/layout (chapters, characters, interactions)
2. first complete draft (given out to test readers)
3. private editing cycle ⇒ accepted draft (testing with more readers, refining/condensing the story)
4. publisher’s editing cycle ⇒ final draft (professional editor proposes refinements to the draft)
5. copyediting for spelling, adherence to publisher’s house style
6. adding artwork/cover ⇒ first published edition
7. e-dition (eBook) etc. (different artwork, links, interactivity)
Example 4.2.2 For technical books, multiple editions follow to adapt them
to a changing domain or to correct errors.
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: Send around MS Word documents by e-mail
(dates in file name)
Characteristics/Problems:
++ well-understood technology
(no training need)
– version tracking as a social process
(error prone)
– merging diverging versions is annoying
(manual process)
– archiving past versions optional/manual
(storage problems)
– no multifile support, no snapshots
Summary: only supports serial collaboration, no multifile support
[Figure: timeline from start to finish with a single chain of document versions D1, D2, ..., Dn connected by changes δ1, δ2, δ3, ..., δn.]
larger teams ⇒ more time wasted
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: Put your documents on Dropbox or MS Sharepoint
Characteristics/Problems:
– local install of (proprietary) software
+ auto-synchronization between cloud and user copies upon save
+ auto-archiving past versions in cloud
– merging diverging versions unsupported
(manual process)
– no multifile support, no snapshots
Summary: only supports serial collaboration
[Figure: timeline from start to finish with a single chain of document versions D1, D2, ..., Dn connected by changes δ1, δ2, δ3, ..., δn.]
larger teams ⇒ more time wasted
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: Use etherpad, google docs or Office 365 for collaborative editing.
Characteristics/Problems:
+ browser-based, no installation necessary
+ real-time auto-synchronization between cloud and user copies
+ auto-archiving past versions in cloud
+ no diverging versions
– no multifile support, no snapshots
Summary: only supports serial collaboration
[Figure: timeline from start to finish with a single chain of document versions D1, D2, ..., Dn connected by changes δ1, δ2, δ3, ..., δn.]
larger teams ⇒ more time wasted
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: Use a version control system (for ASCII-based file formats)
Characteristics/Problems:
– special install, training necessary
– restricted to character/line-based formats
+ user-initiated synchronization between cloud and user copies
+ auto-archiving past versions on server
++ multifile support, snapshots, merging support, tagging
Summary: supports parallel, branching collaboration
[Figure: timeline from start to finish in which the document versions D1, ..., Dn are connected by changes δ1, ..., δn along several parallel branches that split and merge again.]
larger teams ⇒ large-scale parallelization/experimentation
4.2.2 Centralized Version Control
Centralized version control systems ti
Computing and Managing Differences with diff & patch
Definition 4.2.3 diff is a file comparison utility that computes differences
between two files f1 and f2 . Differences are output linewise in a diff file (also
called a patch), which can be applied to f1 to obtain f2 via the patch utility.
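Example 4.2.4 Two files f1 and f2 (note the empty second line of f2) and the output of diff f1 f2:
f1:
The quick brown
fox jumps over
the lazy dog
f2:
The quack brown

fox jumps over
the loozy dog
diff f1 f2:
1c1,2
< The quick brown
---
> The quack brown
>
3c4
< the lazy dog
---
> the loozy dog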
Definition 4.2.5 A diff file consists of a sequence of hunks that in turn
consist of a locator which contrasts the source and target locations (in terms
of line numbers) followed by the added/deleted lines.
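Python's standard library module difflib can compute such differences. The sketch below compares two three-line files modeled on Example 4.2.4 (without the empty line). Note that difflib.unified_diff prints the "unified" diff format, whose hunks look different from the classic format shown in the example, and that difflib.restore only plays the role of patch here: it recovers either file from an ndiff delta.

```python
import difflib

f1 = ["The quick brown\n", "fox jumps over\n", "the lazy dog\n"]
f2 = ["The quack brown\n", "fox jumps over\n", "the loozy dog\n"]

# A patch in unified format: removed lines start with "-", added lines with "+".
patch = list(difflib.unified_diff(f1, f2, fromfile="f1", tofile="f2"))
print("".join(patch))

# An ndiff delta contains both versions; restore() extracts one of them.
delta = list(difflib.ndiff(f1, f2))
restored_f1 = list(difflib.restore(delta, 1))
restored_f2 = list(difflib.restore(delta, 2))
```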
Merging Differences with merge3
There are basically two ways of merging the differences of files into one.
Definition 4.2.6 In two-way merge, an automated procedure tries to combine two different files by copying over differences, either by guessing or by asking the
user.
Definition 4.2.7 In three-way merge the files are assumed to be created by
changing a joint original (the parent) by editing. The merge3 tool examines
the differences and patterns appearing in the changes between both files as
well as the parent, building a relationship model to generate a new revision.
Usually, non-conflicting differences (affecting only one of the files) can directly
be copied over.
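The core decision of a three-way merge can be sketched line by line. This is a deliberately naive illustration and not the algorithm of the merge3 tool: the function name merge3_lines is made up, and the sketch assumes that all three versions have the same number of lines (lines were edited but never inserted or deleted), whereas real tools first align the lines with diff.

```python
def merge3_lines(parent, mine, theirs):
    """Naive line-wise three-way merge of equally long lists of lines.

    Returns the merged lines and the (1-based) numbers of conflicting lines.
    """
    if not (len(parent) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles equal-length files")
    merged, conflicts = [], []
    for number, (p, m, t) in enumerate(zip(parent, mine, theirs), start=1):
        if m == t:        # unchanged, or both sides made the same change
            merged.append(m)
        elif m == p:      # only theirs changed this line
            merged.append(t)
        elif t == p:      # only mine changed this line
            merged.append(m)
        else:             # both changed it differently: conflict
            conflicts.append(number)
            merged.append(m)  # arbitrary choice: keep my version for now
    return merged, conflicts


parent = ["The quick brown", "fox jumps over", "the lazy dog"]
mine = ["The quack brown", "fox jumps over", "the lazy dog"]
theirs = ["The quick brown", "fox jumps over", "the loozy dog"]
print(merge3_lines(parent, mine, theirs))
```

Here the two changes affect different lines, so both are copied over and no conflict is reported; a conflict arises only when both versions change the same line in different ways.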
Definition 4.2.8 A revision control system is a software system that tracks the change process
of sets of files via a repository that stores the files’ revisions – the content of the files at the time
of a commit.
Users do not directly work on the repository, but on a working copy that is synchronized with
the repository by revision control actions
• checkout: creates a new working copy from the repository
• update: merges the differences between the base revision of the working copy and the revision
of the repository into the working copy.
• commit: transmits the differences between the repository revision and the working copy to
the repository, which registers them, patches the repository revision, and makes this the new
head revision
Version Control with Subversion
Definition 4.2.9 Subversion is a centralized revision control system that features
Central repository
(for current revision and reverse diffs)
Local working copies
(asynchronous checkouts, updates, commits)
They are kept synchronized by passing around diff differences and patching the
repository and working copies. Conflicts are resolved by (three-way) merge.
[Figure: interaction between the repository and the local working copies LC1 (∅), LC2 (O) and LC3 (O + δ2) via the actions checkout O, commit δ1, update δ1, merge δ1 and commit cr(δ1, δ2).]
Collaboration with Subversion
Idea: We can use the same technique for collaboration between multiple working copies.
Diff-Based Collaboration:
[Figure: working copies WC1 (O17), ..., WCn (O19) exchange changes with the repository revision R19 via update (up) and commit (ci).]
The Subversion system takes care of the synchronization:
you can only commit if your revision is HEAD (otherwise update)
If there are changes on the same line, you have a conflict.
update merges the changes into your working copy
4.2.3 Distributed Revision Control
Centralized vs. Distributed Version Control
Problem with Subversion:
we can only commit when online!
all collaboration goes via the repository
Idea: Distribute the Repositories and move differences between them.
[Figure: working copies WC1, ..., WCn, each with its own local repository (checkout, commit); the repositories exchange differences via push and pull, and the repository R19 is headless.]
Distributed Version Control with git
Definition 4.2.10 git is a distributed version control system that features
local repositories (contains head and reverse diffs)
local working copies (local commits)
multiple remote repositories (branches/forks)
changes from a remote repository can be pulled into the local one.
local changes can be pushed to a remote repository
Definition 4.2.11 There are various repository management systems that
facilitate providing repositories, e.g.
GitHub, a repository hosting service at http://GitHub.com (free public repositories)
GitLab, an open source repository management system (http://gitlab.org)
GitFlow: An Elaborate Development Model based on GIT
[?] suggests a development model with feature branches, . . .
4.3 Bug/Issue Tracking Systems
Bug/Issue Tracking Systems
Definition 4.3.1 A bug tracking system (also called bugtracker or issue tracking system) is a software application that keeps track of reported issues – i.e.
software bugs and feature requests – in software development projects.
Example 4.3.2 There are many open-source and commercial bugtrackers
bugzilla: http://bugzilla.org (Mozilla’s bugtracker)
GitHub: http://github.com (simple Markdown syntax)
GitLab: http://gitlab.com (open source version of GitHub)
TRAC: http://trac.edgewall.org (+Wiki +Mgt. features, mostly for SVN)
JIRA: https://www.atlassian.com/software/jira (proprietary)
The Anatomy of an Issue (How to Write a Good One)
Components of a bug report
title: a short and descriptive overview
(one line)
description: a precise description of the expected and actual behavior, giving exact reference to the component, version, and environment in which
the bug occurs.
(bugs must be reproducible and localizable)
attachment: e.g. a screen shot, set of inputs, etc.
Example 4.3.3 (A bad bug report description)
My browser crashed. I think I was on foo.com. I think that this is a really bad
problem and you should fix it or else nobody will use your browser.
Example 4.3.4 (A good one)
I crash each time I go to foo.com (Mozilla build 20000609, Win NT 4.0SP5).
This link will crash Mozilla reproducibly unless you remove the border=0 attribute:
<IMG SRC="http://foo.com/topicfoos.gif" width=34 border=0 alt="News">
Remember: developers are also human
(try to minimize their work)
Components of a feature request: like a bug, but only expected behavior
Bugtracker Workflow
Typical Workflow: supported by bugtrackers
user reports issue (files report in the system)
other users extend/discuss/up/downvote issue
QA engineer triages issues – classification, remove duplicates, identify dependencies, tie to component, . . .
developer accepts or re-assigns issue (fixes who is responsible primarily)
project planning by identification of sub-issues, dependencies (new issues)
bug fixing (design, implementation, testing)
issue landing (sign-off, integration into code base)
bug closure
release of the fix (in the next revision)
Administrative Metadata: to make these workflows work
issue number: for referencing with e.g. #15
comments: a discussion thread focused on this issue.
labels: for specializing bug search
assignee: a developer currently responsible
participants: people who get notified of changes/comments
status: e.g. one of new, assigned, fixed/closed, reopened.
resolution for fixed bugs
FIXED: source updated and tested
INVALID: not a bug in the code
WONTFIX: “feature”, not a bug
DUPLICATE: already reported elsewhere; include reference
WORKSFORME: couldn’t reproduce issue
dependencies: which bugs does this one depend on/block?
Dependency Graph of a Firefox Issue in Bugzilla
Chapter 5
Security by Encryption
5.1 Introduction to Crypto-Systems
There are various ways to ensure security on networks: one is just to cease all traffic (not a very
attractive one), another is to make the information on the network inaccessible by physical means
(e.g. shielding the wires electrically and guarding them by a large police force). Here we want
to look into “security by encryption”, which makes the content of Internet packets unreadable by
unauthorized parties. We will start by reviewing the basics, and work our way towards a secure
network infrastructure via the mathematical foundations and special protocols.
Security by Encryption
Problem: In open packet-switched networks like the Internet, anyone
can inspect the packets
(and see their contents via packet sniffers)
create arbitrary packets
(and forge their metadata)
can combine both to falsify communication
(man-in-the-middle attack)
In “dedicated line networks” (e.g. old telephone) you needed switch room
access.
But there are situations where we want our communication to be confidential,
Internet Banking (obviously, other criminals would like access to your account)
Login to Campus.net (wouldn’t you like to know my password to “correct” grades?)
Whistle-blowing (your employer should not know what you sent to WikiLeaks)
The Situation: Alice wants to communicate with Bob privately, but Eve(sdropper)
can listen in
[Figure: Alice and Bob communicate while Eve listens in.]
Idea: Encrypt packet content (so that only the recipients can decrypt)
and build this into the fabric of the Internet (so that users don’t have to know)
Encryption: Terminology & Examples
Definition 5.1.1 Encryption is the process of transforming information (referred to as plaintext) using an algorithm to make it unreadable to anyone
except those possessing special knowledge, usually referred to as a key. The
result of encryption is called ciphertext, and the reverse process that transforms
ciphertext to plaintext: decryption. We call a method for encryption/decryption a cipher.
Definition 5.1.2 The corresponding science is called cryptology. It has two
areas of study: cryptography (encryption/decryption via ciphers) and code
breaking/cryptanalysis: decrypting ciphertexts without a key or recovering
keys from ciphertexts.
Example 5.1.3 (Spartan encryption (since ca. 700 BC))
The oldest (military) encryption method is a scytale
– a wooden stick of defined diameter, onto which a
strip of parchment with letters can be wrapped to
reveal the plaintext. Here the stick is the key and
the parchment strip the ciphertext.
Example 5.1.4 (The Caesar Cipher)
Shift the letters of the alphabet by n letters to the
right. Julius Caesar (first
mention) used 3, Augustus
1. Support by hardware.
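A Caesar cipher takes only a few lines of Python. The sketch below assumes the 26 upper-case Latin letters as the alphabet and leaves all other characters (such as spaces) unchanged; the function name caesar is chosen for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, n):
    """Shift every letter of text by n positions to the right in the alphabet."""
    shifted = ALPHABET[n % 26:] + ALPHABET[:n % 26]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))


ciphertext = caesar("VENI VIDI VICI", 3)
print(ciphertext)              # YHQL YLGL YLFL
print(caesar(ciphertext, -3))  # VENI VIDI VICI
```

Decryption is the same operation with the negated shift, so the number n is the key shared by sender and recipient.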
Example 5.1.5 (Don’t forget your Bank Card PIN)
Write the encoded PIN number to the card, here
complete each digit to 9. PIN = 5315
Code-Breaking, e.g. by Frequency-Analysis
Letters (bigrams, trigrams, . . . ) in English come in characteristic
frequencies. (ETAOINSHRDLU)
Use those to decode a cipher text:
most frequent character represents an “E”, the second most frequent a “T”, ...
this works well for simple substitution ciphers
Data Paradox: Deciphering longer texts is often easier than short ones.
Lesson for Encryption: Change your cipher often (minimize data)
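For the Caesar cipher, a single frequency count already suffices to guess the key. The following sketch assumes that the most frequent ciphertext letter stands for “E”, which works for the (deliberately E-heavy) sample sentence but can easily fail on short or unusual texts; the helper names shift and guess_caesar_shift are illustrative.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def shift(text, n):
    """Caesar cipher: shift every letter by n positions to the right."""
    shifted = ALPHABET[n % 26:] + ALPHABET[:n % 26]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))


def guess_caesar_shift(ciphertext):
    """Guess the key, assuming the most frequent ciphertext letter stands for E."""
    letters = [c for c in ciphertext.upper() if c in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26


plaintext = "MEET ME NEAR THE GREEN TREE BEFORE SEVEN"
ciphertext = shift(plaintext, 3)
n = guess_caesar_shift(ciphertext)
print(n, shift(ciphertext, -n))
```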
The simplest form of encryption (and the way we know from spy stories) uses the same key
for encryption and decryption.
Symmetric Key Encryption
Definition 5.1.6 Symmetric-key cryptosystems are a class of cryptographic
algorithms that use essentially identical keys for both decryption and encryption.
Example 5.1.7 Permute the ASCII table by a bijective function ϕ : {0, . . . , 127} →
{0, . . . , 127} (ϕ is the shared key)
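Such a permutation cipher can be sketched as follows. The names make_key, encrypt and decrypt are illustrative, and the permutation ϕ is drawn with Python's random module, which is fine for a demonstration but is not a cryptographically secure source of randomness.

```python
import random


def make_key(seed=None):
    """Return a random bijection phi on {0, ..., 127} as a list: phi[i] is the image of i."""
    phi = list(range(128))
    random.Random(seed).shuffle(phi)
    return phi


def encrypt(plaintext, phi):
    """Replace every ASCII character by its image under phi."""
    return "".join(chr(phi[ord(c)]) for c in plaintext)


def decrypt(ciphertext, phi):
    """Apply the inverse permutation of phi."""
    inverse = [0] * 128
    for i, j in enumerate(phi):
        inverse[j] = i
    return "".join(chr(inverse[ord(c)]) for c in ciphertext)


phi = make_key(seed=42)
secret = encrypt("Meet me at noon", phi)
print(decrypt(secret, phi))
```

Both parties need the same list phi, and as a simple substitution cipher this scheme is open to frequency analysis.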
Example 5.1.8 The AES algorithm (Advanced Encryption Standard) [AES01]
is a widely used symmetric-key algorithm that is approved by US government
organs for transmitting top-secret information. (efficient but safe)
AES is safe: For AES-128/192/256, recovering the key takes 2^126.1 / 2^189.7 / 2^254.4
steps respectively. (38/57/78 digit numbers)
Note: For trusted communication sender and recipient need access to shared
key.
Problem: How to initiate safe communication over the internet? (far, far apart)
Need to exchange shared key (chicken and egg problem)
Pipe dream: Wouldn’t it be nice if I could just publish a key publicly and use
that?
Actually: this works, just (obviously) not with symmetric-key encryption.
5.2 Public Key Encryption
To get around the chicken-and-egg problem of secure communication we identified above, we
will introduce a more general way of encryption: one where we allow the keys for encryption
and decryption to be different. This liberalization allows us to enter into a whole new realm of
applications.
The following presentation is based on the one in [Con12]
Diffie/Hellman Key Exchange 1c
Agree on a joint base color.
[Figure: Alice, Bob and Eve all know the joint base color.]
Diffie/Hellman Key Exchange 2c
Randomly pick a private color.
[Figure: Alice and Bob each hold a private color in addition to the joint color; Eve only knows the joint color.]
Diffie/Hellman Key Exchange 3c
Mix the private color into the joint base color to disguise it and send the mixtures to the partner.
[Figure: Alice and Bob exchange their mixtures; Eve sees the joint color and both mixtures, but not the key.]
Diffie/Hellman Key Exchange 4c
Mix your own private color into the partner’s mixture to get the key.
[Figure: Alice and Bob obtain the same key color; Eve still cannot determine it.]
Note: Eve cannot determine the shared key, since she would need one of
the private colors.
Two success factors to this trick!
mixing colors is associative and commutative (order/grouping irrelevant)
mixing colors is much simpler than getting the original colors back from
the mixture
A numeric one-way function
We need a one-way function for numbers to compute numeric keys.
Idea: Take the discrete logarithm.
Definition 5.2.1 (Recap) We say that a is congruent to b modulo m (written a ≡ b mod m), iff
a = n · m + b for some integer n and 0 ≤ b < m.
Idea: We can do arithmetic modulo m: 5 + 4 ≡ 2 mod 7 or 3^4 ≡ 1 mod 8.
Theorem 5.2.2 (A useful Fact) If p is prime and b is a primitive root of
p, then the b^x mod p distribute evenly over 0 ≤ x < p.
Definition 5.2.3 Let p be a prime number, b a primitive root of p, and
b^x ≡ y mod p, then we call x the discrete logarithm of y modulo p for the base
b.
Observation 5.2.4 The discrete logarithm is very hard to compute: essentially p times the steps as for the discrete power. (generate and test)
Corollary 5.2.5 The discrete logarithm is a one-way function. (for large p)
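The asymmetry between the two directions shows up directly in code. In the sketch below, the function names are illustrative: the discrete power uses Python's built-in three-argument pow, while the discrete logarithm is found by the generate-and-test method mentioned in the observation, trying one exponent after the other.

```python
def discrete_power(b, x, p):
    """b**x mod p; the built-in three-argument pow computes this efficiently."""
    return pow(b, x, p)


def discrete_log(b, y, p):
    """Generate and test: try x = 0, 1, 2, ... until b**x mod p equals y."""
    value = 1
    for x in range(p):
        if value == y % p:
            return x
        value = (value * b) % p
    return None  # y is not a power of b modulo p


p, b = 17, 3
# 3 is a primitive root of 17: its powers hit every number from 1 to 16.
print(sorted(discrete_power(b, x, p) for x in range(p - 1)))
print(discrete_log(b, 15, p))  # 6, since 3**6 = 729 = 42 * 17 + 15
```

For p = 17 the loop is harmless, but its running time grows with p itself, whereas the work for pow grows only with the number of digits of p.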
Diffie/Hellman Key Exchange 1
Agree on a modulus p and a base b, e.g. p = 17, b = 3
[Figure: Alice, Bob and Eve all know the joint values p = 17, b = 3.]
Diffie/Hellman Key Exchange 2
Randomly pick a private exponent e (Alice: e = 54, Bob: e = 24), the private key.
[Figure: Alice holds priv: 54 and Bob holds priv: 24 in addition to the joint values; Eve only knows the joint values.]
Diffie/Hellman Key Exchange 3
Send b^e mod p (the public key) to the partner (3^54 ≡ 15 mod 17 and 3^24 ≡ 16 mod 17)
[Figure: Alice receives bob: 16 and Bob receives alice: 15; Eve sees joint: p = 17, b = 3, alice: 15, bob: 16, but key: ??.]
Diffie/Hellman Key Exchange 4
Raise the partner’s public key to your private key
Alice: 16^54 ≡ (3^24)^54 ≡ 3^(24·54) ≡ 1 mod 17
Bob: 15^24 ≡ (3^54)^24 ≡ 3^(54·24) ≡ 1 mod 17
[Figure: Alice and Bob both obtain key: 1; Eve still has key: ??.]
Note: Eve cannot determine the shared key 1, since she would need one of the
private keys.
Two success factors to this trick!
discrete exponentiation is associative and commutative (order/grouping irrelevant)
discrete logarithm is a one-way function.
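The four steps of the numeric key exchange can be replayed with the numbers from the slides. The variable names are chosen for illustration; real systems use primes with hundreds of digits rather than p = 17.

```python
# Step 1: joint, public values
p, b = 17, 3

# Step 2: private exponents, never sent over the network
alice_private, bob_private = 54, 24

# Step 3: public keys b^e mod p, sent to the partner (Eve can read them)
alice_public = pow(b, alice_private, p)
bob_public = pow(b, bob_private, p)

# Step 4: raise the partner's public key to the own private key
alice_key = pow(bob_public, alice_private, p)
bob_key = pow(alice_public, bob_private, p)

print(alice_public, bob_public)  # 15 16
print(alice_key, bob_key)        # 1 1
```

Both sides arrive at the same key because (b^24)^54 and (b^54)^24 are the same power of b modulo p.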
Public Key Encryption
Definition 5.2.6 In an asymmetric-key cryptosystem, the key needed to encrypt a message is different from the key for decryption. Such a method is
called a public-key cryptosystem if the decryption key (the private key) is
very difficult to reconstruct from the encryption key (called the public key). We
speak of a (cryptographic) key pair.
Asymmetric cryptosystems are based on trap door functions: one-way functions
that can (only) be inverted with a suitable key.
trap door functions are usually based on prime factorization.
Applications of Public-Key Cryptosystems
Preparation: Create a cryptographic key pair and publish the public key.
(always keep the private key confidential!)
Application: Confidential Messaging:
To send a confidential message the sender encrypts it
using the intended recipient’s public key; to decrypt the
message, the recipient uses the private key.
Application: Digital Signatures:
A digital signature consists of a plaintext together with
its ciphertext – encoded with the sender’s private key.
A message signed with a sender’s private key can be
verified by anyone who has access to the sender’s public
key, thereby proving that the sender had access to the
private key (and therefore is likely to be the person
associated with the public key used), and that
the message has not been tampered with.
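Both applications can be tried out with a toy key pair. The sketch below uses textbook RSA, a public-key cryptosystem based on prime factorization that is swapped in here purely for illustration; the primes 61 and 53, the exponent 17 and the message 65 are assumed example values, far too small for real use, and real systems additionally pad and hash messages.

```python
# Toy key pair from two small primes (assumed values, not secure)
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: the public key is (n, e)
d = pow(e, -1, phi)        # private exponent with e * d ≡ 1 mod phi (Python 3.8+)


def apply_key(number, exponent):
    """Encrypt or decrypt a number below n with the given exponent."""
    return pow(number, exponent, n)


message = 65

# Confidential messaging: encrypt with the public key, decrypt with the private key
ciphertext = apply_key(message, e)
decrypted = apply_key(ciphertext, d)

# Digital signature: encode with the private key, verify with the public key
signature = apply_key(message, d)
verified = apply_key(signature, e)

print(decrypted == message, verified == message)
```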
The confidential messaging is analogous to a locked mailbox with a mail slot. The mail slot is
exposed and accessible to the public; its location (the street address) is in essence the public key.
Anyone knowing the street address can go to the door and drop a written message through the
slot; however, only the person who possesses the key can open the mailbox and read the message.
An analogy for digital signatures is the sealing of an envelope with a personal wax seal. The
message can be opened by anyone, but the presence of the seal authenticates the sender.
Note: For both applications (confidential messaging and digitally signed documents) we have only
stated the basic idea. Technical realizations are more elaborate to be more efficient. One measure
for instance is not to encrypt the whole message and compare the result of decrypting it, but only
a well-chosen excerpt.
Let us now look at the mathematical foundations of encryption. It is all about the existence of
natural-number functions with specific properties. Indeed cryptography has been a big and somewhat unexpected application of mathematical methods from number theory (which was previously
thought to be the ultimate pinnacle of “pure math”.)
Encryption by Trapdoor Functions
Idea: Mathematically, encryption can be seen as an injective function. Use
functions for which the inverse (decryption) is difficult to compute.
Definition 5.2.7 A one-way function is a function that is “easy” to compute
on every input, but “hard” to invert given the image of a random input.
In theory: “easy” and “hard” are understood wrt. computational complexity
theory, specifically the theory of polynomial time problems. E.g. “easy” ≙
O(n) and “hard” ≙ Ω(2^n)
Remark: It is open whether one-way functions exist (≙ to P = NP conjecture)
In practice: “easy” is typically interpreted as “cheap enough for the legitimate
users” and “hard” as “prohibitively expensive for any malicious agents”.
Definition 5.2.8 A trapdoor function is a one-way function that is easy to
invert given a piece of information called the trapdoor.
Example 5.2.9 Consider a padlock: it is easy to change from “open” to
“closed”, but very difficult to change from “closed” to “open” unless you have a
key (trapdoor).
Of course, we need to have one-way or trapdoor functions to get public key encryption to work.
Fortunately, there are multiple candidates we can choose from. Which one eventually makes it
into the algorithms depends on various details; any of them would work in principle.
Candidates for one-way/trapdoor functions
Multiplication and Factoring: The function f takes as inputs two prime numbers p and q in binary notation and returns their product. This function can
be computed in O(n^2) time where n is the total length (number of digits)
of the inputs. Inverting this function requires finding the factors of a given
integer N. The best factoring algorithms known for this problem run in time
2^O(log(N)^(1/3) · log(log(N))^(2/3)).
Modular squaring and square roots: The function f takes two positive integers
x and N, where N is the product of two primes p and q, and outputs x^2 mod N.
Inverting this function requires computing square roots modulo N; that is,
given y and N, find some x such that x^2 mod N = y. It can be shown that
the latter problem is computationally equivalent to factoring N (in the sense
of polynomial-time reduction)
(used in RSA encryption)
Discrete exponential and logarithm: The function f takes a prime number p
and an integer x between 0 and p − 1, and returns 2^x mod p. This discrete
exponential function can be easily computed in time O(n^3) where n is the
number of bits in p. Inverting this function requires computing the discrete
logarithm modulo p; namely, given a prime p and an integer y between 0 and
p − 1, find x such that 2^x mod p = y.
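The first candidate can be illustrated with a few lines of Python. This is only a sketch: factor_semiprime uses plain trial division, which is much slower than the best known factoring algorithms mentioned above, and the primes 10007 and 10009 are arbitrary example values.

```python
def multiply(p, q):
    """The easy direction: a single multiplication."""
    return p * q


def factor_semiprime(N):
    """The hard direction: find the prime factors of a semi-prime N by trial division."""
    if N % 2 == 0:
        return 2, N // 2
    candidate = 3
    while candidate * candidate <= N:
        if N % candidate == 0:
            return candidate, N // candidate
        candidate += 2
    return None  # no divisor found: N is prime (or 1)


N = multiply(10007, 10009)
print(N, factor_semiprime(N))
```

The multiplication is one step, while the search needs on the order of the square root of N trial divisions; every two additional digits in N make it roughly ten times slower.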
To see whether these trapdoor function candidates really behave as expected, RSA Laboratories,
one of the first security companies specializing in public key encryption, has established a series of
prime factorization challenges to test the assumptions underlying public key cryptography.
Example: RSA-129 problem
Definition 5.2.10 Call a number semi-prime, iff it has exactly two prime
factors.
These are exactly the numbers involved in RSA encryption.
RSA laboratories initiated the RSA challenge, to see whether multiplication is
indeed a “practical” trapdoor function
Example 5.2.11 (The RSA129 Challenge) is to factor the semi-prime
number on the right
So far, the challenges up to ca 200 decimal digits have been factored, but all
within the expected complexity bounds.
but: would you report an algorithm that factors numbers in low complexity?
Note that all of these tests are run on conventional hardware (von Neumann architectures); there
have been claims that other computing hardware, most notably quantum computing or DNA computing, might have completely different complexity theories, which might render these factorization
problems tractable. Up to now, nobody has been able to actually build alternative computation
hardware that can actually even attempt to solve such factorization problems (or they are not
telling).
Classical- and Quantum Computers for RSA-129
This concludes our excursion into theoretical aspects of encryption; we will now turn to the task
of building these ideas into the existing infrastructure of the Internet and the WWWeb. The most
obvious thing we need to do is to publish public keys in a way that it can be verified to whom
they belong.
5.3 Internet Security by Encryption
Public Key Certificates
Definition 5.3.1 A public key certificate is an electronic document which
uses a digital signature to bind a public key with an identity, e.g. the name of
a person or an organization.
Idea: If we trust the signatory’s signature, then we can use the certificate
to verify that a public key belongs to an individual. Otherwise we verify the
signature using the signatory’s public key certificate.
Problem: We can ascend the ladder of trust, but in the end we have to trust
someone!
In a typical public key infrastructure scheme, the signature will be of a certificate authority, an organization chartered to verify identity and issue public key
certificates.
In a “web of trust” scheme, the signature is of either the user (a self-signed certificate) or other users (“endorsements”). (e.g. PGP =
ˆ Pretty Good Privacy)
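On a UNIX system, you can create a certificate (and associated private key)
e.g. with the following command (Windows is similar ↝ Google):
openssl ca -in req.pem -out newcert.pem
Here req.pem is a certificate request that has to be generated beforehand, together with the
private key; the command signs it and writes the certificate to newcert.pem.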
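To make Definition 5.3.1 a bit more concrete, here is a minimal Python sketch of how a certificate authority could bind a name to a public key, and how anybody who knows the authority's public key could check that binding. Everything in it is an assumption made for illustration: the toy RSA key built from two Mersenne primes, the function names issue_certificate and verify_certificate, and the example identities. Real certificates use the X.509 format, much larger random keys, and padded signatures. The sketch needs Python 3.8 or later.

```python
import hashlib

# Toy RSA key pair of a hypothetical certificate authority (CA).
# The two Mersenne primes are chosen only because they are easy to write down.
P, Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = P * Q, 65537                      # public key of the CA
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))        # private key of the CA

def digest(text):
    """Hash the text and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % CA_N

def issue_certificate(identity, public_key):
    """CA side: sign the pair (identity, public key) with the CA's private key."""
    body = f"{identity}|{public_key}"
    return {"identity": identity,
            "public_key": public_key,
            "signature": pow(digest(body), CA_D, CA_N)}

def verify_certificate(cert):
    """Anybody: check the signature using only the CA's public key."""
    body = f"{cert['identity']}|{cert['public_key']}"
    return pow(cert["signature"], CA_E, CA_N) == digest(body)

cert = issue_certificate("www.example.org", (2773, 17))   # made-up server key
forged = dict(cert, identity="www.evil.example")          # same signature, other name

print(verify_certificate(cert))     # True
print(verify_certificate(forged))   # False
```

The forged certificate fails because the signature was computed over the original name; changing the identity changes the digest, and only the holder of the CA's private key could produce a matching signature.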
Building on the notion of a public key certificate, we can build secure variants of the application-level protocols. Of course, we could do this individually for every protocol, but this would duplicate
efforts. A better way is to leverage the layered infrastructure of the Internet and build a generic
secure transport-layer protocol that can be utilized by all protocols that normally build on TCP
or UDP.
Building Security into the WWWeb Infrastructure
Idea: Build Encryption into the WWWeb infrastructure (make it easy to use)
↝ Secure variants of the application-level protocols that encrypt contents
Definition 5.3.2 Transport layer security (TLS) is a cryptographic protocol
that encrypts the segments of network connections at the transport layer, using
asymmetric cryptography for key exchange, symmetric encryption for privacy,
and message authentication codes for message integrity.
TLS can be used to make application-level protocols secure.
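The message authentication codes mentioned in Definition 5.3.2 can be tried out with Python's standard library. The following is only a sketch of the idea, not of what TLS does internally: the key is a placeholder for key material that a handshake would provide, and the names tag and is_authentic are made up for this example.

```python
import hashlib
import hmac

# Placeholder: in TLS this key would come out of the handshake.
shared_key = b"key material from the handshake"

def tag(message):
    """Compute a message authentication code (HMAC-SHA256) for the message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def is_authentic(message, received_tag):
    """Recompute the code and compare it with the received one."""
    return hmac.compare_digest(tag(message), received_tag)

message = b"GET /index.html"
mac = tag(message)

print(is_authentic(message, mac))               # True
print(is_authentic(b"GET /secret.html", mac))   # False
```

Only somebody who knows the shared key can compute a valid code, so a message that was changed in transit is detected by the receiver.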
Let us now take a closer look at the structure of the TLS handshake, the part of the TLS protocol
that initiates encrypted communication.
A TLS Handshake between Client and Server
Definition 5.3.3 A TLS handshake authenticates a server and provides a
shared key for symmetric-key encryption. It has the following steps:
1. Client presents a list of supported encryption methods.
2. Server picks the strongest and tells client. (C/S agree on method)
3. Server sends back its public key certificate. (name and public key)
4. Client confirms certificate with CA. (authenticates Server if successful)
5. Client picks a random number, encrypts that (with server’s public key) and
sends it to server.
6. Only server can decrypt it (using its private key).
7. Now they both have a shared secret. (the random number)
8. From the random number, both parties generate key material.
Definition 5.3.4 A TLS connection is a transport-layer connection secured
by symmetric-key encryption. Authentication and keys are established by a TLS
handshake and the connection is encrypted until it closes.
The reason we switch from public key to symmetric encryption after communication has been
initiated and keys have been exchanged is that symmetric encryption is computationally more
efficient without being intrinsically less secure.
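The steps of the TLS handshake from Definition 5.3.3 can be played through in a few lines of Python. This is a toy simulation under stated assumptions, not the real protocol: client and server live in the same program, the server's RSA key is built from two Mersenne primes, the method names and their strength ordering are invented, the check with the CA (step 4) is left out, and the key derivation via SHA-256 merely stands in for what TLS really does. It needs Python 3.8 or later.

```python
import hashlib
import secrets

# Toy RSA key pair of the server (illustration only, far from real-world practice).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                       # server's public key
D = pow(E, -1, (P - 1) * (Q - 1))         # server's private key

client_methods = ["3DES", "AES-128", "AES-256"]   # hypothetical method names
server_preference = ["AES-256", "AES-128"]        # strongest first (assumed order)

def negotiate(offered, preference):
    """Steps 1-2: the server picks its strongest method that the client offers."""
    for method in preference:
        if method in offered:
            return method
    raise ValueError("no common encryption method")

def derive_key(secret, method):
    """Step 8: both sides turn the shared secret into key material."""
    return hashlib.sha256(f"{method}:{secret}".encode()).digest()

method = negotiate(client_methods, server_preference)
certificate = {"name": "www.example.org", "public_key": (N, E)}   # step 3
# Step 4 (confirming the certificate with the CA) is omitted in this sketch.

n, e = certificate["public_key"]
client_secret = secrets.randbelow(n - 2) + 2    # step 5: client picks a random number
encrypted = pow(client_secret, e, n)            # ... and encrypts it for the server
server_secret = pow(encrypted, D, N)            # step 6: only the server can decrypt

client_key = derive_key(client_secret, method)  # steps 7-8 on the client
server_key = derive_key(server_secret, method)  # steps 7-8 on the server

print(method, client_key == server_key)         # AES-256 True
```

From here on, both sides would use the derived key for symmetric encryption; the comparatively slow public-key operations are needed only once per connection.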
But there is more to the integration of encryption into the WWWeb than just enabling secure
transport protocols. We need to extend the web servers and web browsers to implement the
secure protocols (of course), and we need to set up a system of certification agencies, whose public
keys are baked into web browsers (so that they can check the signatures on public keys in server
certificates). Moreover, we need user interfaces that allow users to inspect certificates, and grant
exceptions, if needed.
Building Security into the WWWeb Infrastructure
Definition 5.3.5 HTTP Secure (HTTPS) is a variant of HTTP that uses
TLS for transport. HTTPS URIs start with https://
Server Integration: All common web servers support HTTPS on port 443 (default), but
need a public key certificate. (self-sign one or buy one from a CA)
Browser Integration: All common web browsers support HTTPS and give access to certificates
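Browsers are not the only programs that can inspect server certificates. As a small sketch, the following Python code uses the standard ssl module to open a TLS connection on port 443 and return the negotiated protocol version together with the server's certificate. The function name fetch_certificate and the host name in the comment are assumptions for this example; calling the function requires network access.

```python
import socket
import ssl

# The default context verifies the server certificate against the
# CA certificates installed on this machine and checks the host name.
context = ssl.create_default_context()

def fetch_certificate(host, port=443):
    """Connect via TLS and return (protocol version, certificate as a dict)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()

# Example call (needs network access, host name chosen for illustration):
#   version, cert = fetch_certificate("www.example.org")
#   print(version, cert["subject"], cert["notAfter"])
```

If the certificate cannot be verified, wrap_socket raises an ssl.SSLCertVerificationError instead of returning a connection; this corresponds to the warning page a browser shows before the user can grant an exception.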
Confidential E-Mail with Digital Signatures
Hey: That was a nice theoretical exercise, how can I use that in practice?
Example 5.3.6 (Secure E-Mail)
Adding PGP (Pretty Good Privacy; an open source cryptosystem) to Thunderbird (Tools > Add-ons > Get Add-ons)
add the Enigmail add-on to Thunderbird
let that generate a public/private key pair for your e-mail account (give password)
let it install GnuPG (the actual cryptosystem)
Done! Little sign/encrypt buttons appear on the lower right of your composition window.
Your mail visible to strangers (here test mail to myself)
Note that the transport metadata on top are not encrypted.
Your mail after authentication (here test mail to myself)
Note the verified signature shown on top.
The Web of Trust
Recap: We can only verify a signature if we have a PKI certificate.
Cost Problem: PKI certificates are expensive! (and authority needs to know you)
Centrality Problem: What happens if a PKI authority has been compromised?
Idea: instead of self-signed PKI certificates, mutually sign certificates in a
“Web of Trust”
costs are minimal (we already know each other)
no central point of failure (more resilient)
Acknowledgement: The following presentation is adapted from [Rya]
The Web of Trust for the Three Musketeers
D’Artagnan arrives in Paris and has a duel with Athos, Porthos, and Aramis,
but they learn to trust each other against the guards of Cardinal Richelieu.
To seal their friendship, they decide to exchange their public keys.
Definition 5.3.7 A key is called valid, iff it belongs to the individual it claims
to belong to.
To certify validity, the four friends also sign each other’s public keys.
Example 5.3.8 d’Artagnan signs Porthos’ key to say that “I, d’Artagnan,
vouch that this key belongs to Porthos by adding my signature to it.”
The musketeers also trust each other to make introductions.
The situation from d’Artagnan’s perspective (t = trust, v = validity):
d’Artagnan: t:ultimate v:ultimate
Athos: t:full v:full
Porthos: t:full v:full
Aramis: t:full v:full
Porthos sends over Planchet as his new valet
d’Artagnan can verify that Planchet is who he says he is, because his key bears
Porthos’ signature. (d’Artagnan has full trust in Porthos)
d’Artagnan signs Planchet’s key
d’Artagnan: t:ultimate v:ultimate
Athos: t:full v:full
Porthos: t:full v:full
Aramis: t:full v:full
Planchet: t:unknown v:full
Setting (personal) Trust for other Keys
Validity can be computed, but trust (in validity of introduced keys) must be
set personally
d’Artagnan sets the trust in all the valets to “marginal”
d’Artagnan: t:ultimate v:ultimate
Athos: t:full v:full
Grimaud: t:marginal v:full
Porthos: t:full v:full
Mousqueton: t:marginal v:full
Aramis: t:full v:full
Planchet: t:marginal v:full
Bazin: t:marginal v:full
Understanding Marginal Trust
Rule 5.3.9 No escalation of trust levels
Planchet introduces d’Artagnan to their landlord M. Bonacieux
d’Artagnan: t:ultimate v:ultimate
Athos: t:full v:full
Grimaud: t:marginal v:full
Porthos: t:full v:full
Mousqueton: t:marginal v:full
Aramis: t:full v:full
Planchet: t:marginal v:full
Bazin: t:marginal v:full
Bonacieux: t:unknown v:marginal
Understanding Marginal Trust
All other valets also vouch for M. Bonacieux by signing his key
Rule 5.3.10 3 marginal ≙ 1 full
d’Artagnan: t:ultimate v:ultimate
Athos: t:full v:full
Grimaud: t:marginal v:full
Porthos: t:full v:full
Mousqueton: t:marginal v:full
Aramis: t:full v:full
Planchet: t:marginal v:full
Bazin: t:marginal v:full
Bonacieux: t:unknown v:full
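Since validity can be computed from trust settings and signatures, we can sketch that computation in Python from d’Artagnan’s perspective. The function compute_validity and the data layout are made up for this illustration; it implements only the rules seen so far (one signature from a fully trusted, valid key gives full validity; so do three signatures from marginally trusted ones; fewer give marginal validity). Which musketeer signed which valet's key is an assumption as well, since we only know for sure that Planchet's key bears Porthos' signature.

```python
FULLY_VALID = {"full", "ultimate"}      # validity levels whose signatures count
FULLY_TRUSTED = {"full", "ultimate"}    # trust levels that count as "1 full"

def compute_validity(owner, trust, signatures, marginals_needed=3):
    """Return the validity of every key as seen by `owner`."""
    validity = {owner: "ultimate"}      # we always believe our own key
    changed = True
    while changed:                      # repeat until nothing changes any more
        changed = False
        for key, signers in signatures.items():
            if key == owner:
                continue
            # Only signatures made with keys we consider valid are counted.
            usable = [s for s in signers if validity.get(s) in FULLY_VALID]
            full = sum(trust.get(s) in FULLY_TRUSTED for s in usable)
            marginal = sum(trust.get(s) == "marginal" for s in usable)
            if full >= 1 or marginal >= marginals_needed:
                new = "full"
            elif marginal >= 1:
                new = "marginal"
            else:
                new = "unknown"
            if validity.get(key, "unknown") != new:
                validity[key] = new
                changed = True
    return validity

# Trust levels set personally by d'Artagnan.
trust = {"d'Artagnan": "ultimate",
         "Athos": "full", "Porthos": "full", "Aramis": "full",
         "Grimaud": "marginal", "Mousqueton": "marginal",
         "Planchet": "marginal", "Bazin": "marginal"}

# For every key: the set of people who signed it.
signatures = {"Athos": {"d'Artagnan"}, "Porthos": {"d'Artagnan"},
              "Aramis": {"d'Artagnan"},
              "Planchet": {"Porthos", "d'Artagnan"},
              # Assumption: the other valets are vouched for by a musketeer each.
              "Grimaud": {"Athos"}, "Mousqueton": {"Porthos"}, "Bazin": {"Aramis"},
              "Bonacieux": {"Planchet"}}

one_valet = compute_validity("d'Artagnan", trust, signatures)
print(one_valet["Bonacieux"])           # marginal

signatures["Bonacieux"] = {"Planchet", "Grimaud", "Mousqueton", "Bazin"}
all_valets = compute_validity("d'Artagnan", trust, signatures)
print(all_valets["Bonacieux"])          # full
```

Note that M. Bonacieux's trust level stays unknown in both cases: a valid key only tells d’Artagnan whom it belongs to, not whether its owner makes reliable introductions.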
But Technical Aspects of Security are not the only ones...
Bibliography
[AES01] Announcing the ADVANCED ENCRYPTION STANDARD (AES), 2001.
[Bro75] Fred Brooks. The Mythical Man-Month. Addison-Wesley, 1975.
[Con12] Jamie Condliffe. Easily understand encryption using... paint and clocks? Gizmodo, 2012.
[Dri10] Vincent Driessen. A successful git branching model. Online at http://nvie.com/posts/a-successful-git-branching-model/, 2010.
[Knu] Jeff Knupp. Improve your python: Understanding unit testing. Web tutorial at http://www.jeffknupp.com/blog/2013/12/09/improve-your-python-understanding-unit-testing/.
[LRW+97] Meir M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. Metrics and laws of software evolution – the nineties view. In Proc. 4th International Software Metrics Symposium (METRICS ’97), pages 20–32, 1997.
[Rya] Konstantin Ryabitsev. PGP Web of Trust: Core concepts behind trusted communication. http://www.linux.com/learn/tutorials/760909-pgp-web-of-trust-core-concepts. Seen 2014-09-20.