68. Workshop über Algorithmen und Komplexität (Theorietag)
68th Workshop on Algorithms and Complexity

Friedrich-Schiller-Universität Jena
13 November 2014
Abstracts of the talks
(compiled by Martin Mundhenk)
Preliminary programme
9:40–10:05 Markus L. Schmid (Trier):
Pattern matching with variables
10:05–10:30 Florin Manea (Kiel):
Pattern matching with variables: fast algorithms and new hardness results
10:45–11:10 Katrin Casel (Trier):
Mathematical models for the anonymization of microdata
11:10–11:35 André Nichterlein (TU Berlin):
On combinatorial anonymization
11:35–12:00 Manuel Malatyali (Paderborn):
Online top-k-position monitoring of distributed data streams
13:00–14:00 Martin Dietzfelbinger (Ilmenau):
Randomness in hash functions: constructions and applications
14:15–14:40 Christian Komusiewicz (TU Berlin):
Polynomial-time data reduction for the subset interconnection design problem
14:40–15:05 Manuel Sorge (TU Berlin):
The minimum feasible tileset problem
15:05–15:30 Arne Meier (Hannover):
Parameterized complexity of CTL: a generalization of Courcelle’s theorem
15:50–16:15 Moritz Gobbert (Trier):
The complexity of Latrunculi
16:15–16:40 Pascal Lenzner (Jena):
Selfish network creation – dynamics and structure
16:40–17:05 Stefan Kratsch (TU Berlin):
Losing weight is easy
Mathematical Models for the Anonymization of Microdata
Katrin Casel
Universität Trier
Electronic records of confidential personal data exist in many areas, e.g. in the form of digitized medical files. On the one hand, this information is a valuable resource for research; on the other hand, in the wrong hands it constitutes a serious violation of privacy. In Germany, such data may therefore only be published if individual records "can be attributed only with a disproportionately large expenditure of time, cost and labour" (§16 Abs. 6 BStatG). Such a degree of anonymization requires processing the individual original data (microdata) in a way that goes beyond merely deleting unique attributes such as name or tax identification number.
Over the past decades, many different anonymization methods have emerged, mostly with heuristic solution procedures. Abstract models make it possible to compare different approaches and open up new ways of solving the problem. Concretely, many generalization methods (coarsening of the microdata) can be modelled as special clustering problems on graphs, and certain access restrictions as combinatorial problems on matrices. Investigating the parallels and differences to known problems such as k-center, k-colorability, set-cover, etc. yields new solution approaches and allows a classification of the anonymization methods with respect to approximability and (parameterized) complexity.
Randomness in Hash Functions: Constructions and Applications
Martin Dietzfelbinger
TU Ilmenau
The talk gives an overview of recent constructions and analyses of hash functions (for data structures, not cryptographic hash functions!) with good and very good randomness properties, and presents some applications that benefit from such strong hash functions. These are dictionaries, the representation of functions with constant evaluation time, extremely space-efficient perfect hash functions, and the simulation of fully random, i.e. ideal, hash functions. The mix of methods is interesting: probability theory, linear algebra, and the theory of random (hyper)graphs are combined.
The Complexity of Latrunculi
Moritz Gobbert
Universität Trier
[email protected]
Ludus Latrunculorum, also called Latrunculi, is an ancient Roman game whose exact rules have not been handed down completely. Because of these discrepancies, there are different reconstructions of the rule set. The talk addresses the question of the complexity of the game. First, the historical background of the game is briefly covered. Then a particular rule set is presented that corresponds to many modern descriptions of the game. It is then shown that even the question of whether player A can move a particular piece so that it captures a particular piece of player B is NP-complete. This contrasts with many other games such as chess or Go, for which analogous questions are decidable in polynomial time. The talk closes with an outlook on some open questions concerning Latrunculi.
Keywords: complexity, Latrunculi, Ludus Latrunculorum, Roman game.
Polynomial-time Data Reduction for
the Subset Interconnection Design Problem
Jiehua Chen
Christian Komusiewicz
Rolf Niedermeier
Manuel Sorge
Ondřej Suchý
Mathias Weller
Fakultät für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany
Email: [email protected]
The NP-hard Subset Interconnection Design problem, also known as Minimum Topic-Connected Overlay, is motivated by numerous applications including the design of scalable overlay networks and vacuum systems. It has as input a finite set V and a collection of subsets V_1, V_2, …, V_m ⊆ V, and asks for a minimum-cardinality edge set E such that for the graph G = (V, E) all induced subgraphs G[V_1], G[V_2], …, G[V_m] are connected. We study Subset Interconnection Design in the context of polynomial-time data reduction rules that preserve the possibility to construct optimal solutions. Our contribution is threefold: First, we show the incorrectness of earlier polynomial-time data reduction rules. Second, we show linear-time solvability in case of a constant number m of subsets, implying fixed-parameter tractability for the parameter m. Third, we provide a fixed-parameter tractability result for small subset sizes and tree-like output graphs. To achieve our results, we elaborate on polynomial-time data reduction rules which also may be of practical use in solving Subset Interconnection Design.
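To make the problem statement concrete, the following Python sketch checks whether a candidate edge set keeps every induced subgraph G[V_i] connected, and finds a minimum edge set by exhaustive search. It is only an illustration of the definition, not one of the data reduction rules from the talk; the function names are hypothetical and the search is exponential, so it is meant for tiny instances only.

```python
from itertools import combinations


def induced_connected(vertices, edges):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    # Keep only edges with both endpoints inside the subset.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    # Depth-first search from an arbitrary vertex of the subset.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def is_feasible(subsets, edges):
    """True if every G[V_i] is connected under the edge set `edges`."""
    return all(induced_connected(s, edges) for s in subsets)


def min_edge_set(universe, subsets):
    """Exhaustive search by increasing size (illustration only)."""
    candidates = list(combinations(sorted(universe), 2))
    for size in range(len(candidates) + 1):
        for edges in combinations(candidates, size):
            if is_feasible(subsets, edges):
                return list(edges)
    return None


solution = min_edge_set({1, 2, 3, 4}, [{1, 2, 3}, {2, 3, 4}, {1, 4}])
print(len(solution), solution)
```

In this small instance the subsets {1, 2, 3} and {2, 3, 4} can share only the edge {2, 3}, and {1, 4} forces one further edge, so four edges are needed.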
Losing weight is easy
Michael Etscheid & Matthias Mnich & Heiko Röglin
Universität Bonn
Stefan Kratsch
TU Berlin
The talk discusses some aspects of having large numbers and weights in the inputs of combinatorial problems. Pivotal problems in this regard are Subset Sum and Knapsack
but also weighted versions of classical NP-hard problems like Vertex Cover as well as
problems related to integer linear programs. Shrinking numbers to small size is an important task in kernelization, where one studies provable bounds on efficient simplification of
NP-hard problems.
We recall and briefly discuss some fairly recent attempts at coping with large numbers
in Subset Sum and Knapsack. Further progress beyond these results was the topic of
several open problems in kernelization. The last part of the talk shows how an almost
30-year-old theorem single-handedly resolves these open problems.
Selfish Network Creation – Dynamics and Structure
Pascal Lenzner
Department of Computer Science
Friedrich-Schiller-University Jena
[email protected]
Many important networks, most prominently the Internet, are not designed and administrated by a central authority. Instead, such networks have evolved over time by (repeated)
uncoordinated interaction of selfish agents which control and modify parts of the network.
The Network Creation Game [Fabrikant et al. PODC’03] and its variants attempt to model this scenario. In these games, agents correspond to nodes in a network and each agent
may create costly links to other nodes. The goal of each agent is to obtain a connected
network having maximum service quality, i.e. small distances to all other agents, at low
cost.
The key questions are: What do the equilibrium networks of these games look like, and
how can selfish agents actually find them? For the latter, recent results on the dynamic
properties of the sequential version of these games will be surveyed. For the former, ongoing
work focussing on structural properties is presented.
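The trade-off between link cost and service quality can be sketched in Python as follows. The sketch assumes the sum-distance cost commonly attributed to the original Network Creation Game: an agent pays a price alpha per link it buys plus the sum of its hop distances to all other agents. The abstract does not fix a cost function, and the variants it mentions differ in exactly this choice, so `agent_cost`, `alpha` and the strategy encoding are illustrative assumptions.

```python
from collections import deque


def agent_cost(agent, strategies, alpha):
    """Cost of `agent`: alpha per bought link plus the sum of distances.

    `strategies` maps every agent to the set of agents it buys a link to;
    every agent must appear as a key. Links are usable in both directions.
    """
    adj = {v: set() for v in strategies}
    for u, targets in strategies.items():
        for v in targets:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search yields hop distances from `agent`.
    dist = {agent: 0}
    queue = deque([agent])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(adj):
        return float("inf")  # a disconnected network has infinite cost
    return alpha * len(strategies[agent]) + sum(dist.values())


# A star in which the centre 0 buys all three links.
star = {0: {1, 2, 3}, 1: set(), 2: set(), 3: set()}
print(agent_cost(0, star, alpha=2), agent_cost(1, star, alpha=2))
```

Here the centre pays for three links and is at distance 1 from everyone, while a leaf pays nothing but is at distance 2 from the other leaves.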
Online Top-k-Position Monitoring of Distributed Data Streams
Manuel Malatyali
Heinz Nixdorf Institute, University of Paderborn
In this talk we consider a model in which there is one coordinator and a set of
n distributed nodes directly connected to the coordinator. Each node continuously
receives data from an input stream only known to the respective node or, in other
words, observes a private function whose value changes over time. At any time, the
coordinator has to know the k nodes currently observing the k largest values. In order
to inform the coordinator about its current value, a node can exchange messages
with the coordinator. Additionally, the coordinator can send broadcast messages
received by all nodes. The goal in designing an algorithm for this setting, which we
call Top-k-Position Monitoring, is to find a solution that, on the one hand, keeps the
coordinator informed as much as necessary for solving the problem and, at the same
time, aims at minimizing the communication, i.e., the number of messages, between
the coordinator and the distributed nodes.
For the considered problem, we present an algorithm that combines the notion of filters with a kind of random sampling of nodes. The basic idea of assigning filters to the distributed nodes is to reduce the number of exchanged messages by giving nodes constraints that define when they can safely refrain from sending observed changes in their input streams to the coordinator. Whenever it does become necessary to communicate observed changes and update filters, we make extensive use of a new randomized protocol for determining the maximum (or minimum) value currently observed by (a certain subset of) the nodes.
Our problem is an online problem, since the values observed by the nodes change over time and are not known in advance. In our analysis we therefore compare the number of messages exchanged by our online algorithm to that of an offline algorithm that sets filters in an optimal way. We show that they differ by a factor of at most O((log Δ + k) · log n) in expectation, where Δ is the largest difference between the values observed at the nodes holding the k-th and (k+1)-st largest value at any time.
Pattern Matching with Variables: Fast Algorithms and New Hardness Results
Henning Fernau¹, Florin Manea², Robert Mercaş², and Markus L. Schmid¹
¹ Fachbereich IV – Abteilung Informatikwissenschaften, Universität Trier, D-54286 Trier, Germany, {Fernau, MSchmid}@uni-trier.de
² Department of Computer Science, Kiel University, D-24098 Kiel, Germany, {flm, rgm}@informatik.uni-kiel.de
A pattern is a string that consists of terminal symbols (e. g., a, b, c) and variables (e. g., x1, x2, x3). The terminal symbols are constants, while the variables are uniformly replaced by strings over the set of terminals; thus, a pattern is mapped to a terminal word. For example, x1 a b x1 x2 c x2 x1 can be mapped to acabaccaaccaaac by the replacement (x1 → ac, x2 → caa).
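The uniform replacement of variables can be spelled out in a few lines of Python. The list encoding of the pattern and the name `apply_substitution` are illustrative choices, not notation from the abstract.

```python
def apply_substitution(pattern, substitution):
    """Replace every variable occurrence uniformly; terminals stay fixed."""
    return "".join(substitution.get(symbol, symbol) for symbol in pattern)


# The pattern x1 a b x1 x2 c x2 x1 from the example above.
pattern = ["x1", "a", "b", "x1", "x2", "c", "x2", "x1"]
word = apply_substitution(pattern, {"x1": "ac", "x2": "caa"})
print(word)  # acabaccaaccaaac
```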
Due to their simple definition, the concept of patterns emerges in various areas of theoretical computer
science, such as language theory (pattern languages), learning theory (inductive inference, PAC-learning),
combinatorics on words (word equations, unavoidable patterns, etc.), pattern matching (generalised function matching), database theory (extended conjunctive regular path queries), and we can also find them in
practice as extended regular expressions with backreferences, used in programming languages, e.g., Perl,
Java, Python.
In all these different applications, the main purpose of patterns is to express combinatorial pattern matching questions. Unfortunately, deciding whether a given general pattern can be mapped to a given word is NP-complete. On the other hand, some subclasses of patterns are known for which the matching problem is in P; however, the existing polynomial-time algorithms for these classes are fairly basic and cannot be considered efficient in a practical sense. Therefore, we present several efficient algorithms for the known polynomial variants of the matching problem. While we consider our algorithms to be non-trivial, their running times still have an exponential dependency on certain parameters of the patterns (necessary under common complexity-theoretical assumptions) and, thus, are acceptable only for strongly restricted classes of patterns.
In some applications of patterns it is necessary to require the mapping of variables to be injective. To this end, we show the NP-completeness of the following natural combinatorial factorisation problem: given a number k and a word w, can w be factorised into k distinct factors? It follows that even for the trivial patterns x1 · · · xk the matching problem is NP-complete if we require injectivity. In terms of complexity, a clear borderline between the injective and the non-injective versions of the matching problem is thus established.
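The factorisation problem can be illustrated by a simple backtracking search in Python. This is a brute-force sketch with exponential worst-case running time, in line with the NP-completeness stated above; it assumes that factors must be non-empty, and the function name is hypothetical.

```python
def distinct_factorisation(word, k):
    """Return k pairwise distinct non-empty factors whose concatenation
    is `word`, or None if no such factorisation exists."""

    def search(start, parts_left, used):
        if parts_left == 0:
            return [] if start == len(word) else None
        # Leave at least one letter for each of the remaining factors.
        for end in range(start + 1, len(word) - parts_left + 2):
            factor = word[start:end]
            if factor in used:
                continue
            used.add(factor)
            rest = search(end, parts_left - 1, used)
            used.discard(factor)
            if rest is not None:
                return [factor] + rest
        return None

    return search(0, k, set())


print(distinct_factorisation("abab", 3))
print(distinct_factorisation("aaaa", 3))
```

The word aaaa has no factorisation into three distinct factors, because any split into three non-empty parts contains the factor a twice.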
Parameterized Complexity of CTL:
A Generalization of Courcelle’s Theorem
Martin Lück
Arne Meier
Irina Schindler∗
Institut für Theoretische Informatik
Leibniz Universität Hannover
{lueck, meier, schindler}@thi.uni-hannover.de
We present an almost complete classification of the parameterized complexity of all operator fragments of the satisfiability problem in computation tree logic CTL. The investigated parameterization is temporal depth and pathwidth. The classification shows a dichotomy between W[1]-hard and fixed-parameter tractable fragments. The only real operator fragment that is in FPT is the fragment containing solely AX. We also prove a generalization of Courcelle’s theorem to infinite signatures, which is used to prove the FPT-membership cases.
∗ Supported in part by DFG ME 4279/1-1.
On Combinatorial Anonymization
André Nichterlein
Fakultät für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany
Email: [email protected]
We review our recent and ongoing work on analyzing the computational complexity of combinatorial data anonymization, mainly discussing degree-based network anonymization. Roughly speaking, an object is called k-anonymous if there are at least k − 1 other objects in the data that "look the same". In the case of graphs, a vertex is called k-anonymous if there are at least k − 1 other vertices having the same degree. The goal of making a graph k-anonymous (that is, all its vertices shall be k-anonymous) leads to a number of algorithmic graph modification problems. These problems are mostly intractable; in particular, we exclude o(√n)-approximation algorithms with running time f(s) · n^O(1), where s denotes the number of allowed modifications. On the positive side, we show efficiently solvable cases when restricting to edge insertion as the allowed modification.
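The degree-based definition is easy to check directly. The following Python sketch computes the degree sequence of a graph and tests k-anonymity; it only illustrates the definition and does not solve the modification problems. The helper names and the edge-list encoding are illustrative assumptions.

```python
from collections import Counter


def degrees(n, edges):
    """Degree of each vertex 0..n-1 in a simple undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_k_anonymous(n, edges, k):
    """True if every occurring degree is shared by at least k vertices."""
    counts = Counter(degrees(n, edges))
    return all(c >= k for c in counts.values())


path = [(0, 1), (1, 2)]            # degrees 1, 2, 1
triangle = path + [(0, 2)]         # one edge insertion, degrees 2, 2, 2
print(is_k_anonymous(3, path, 2), is_k_anonymous(3, triangle, 3))
```

The path on three vertices is not 2-anonymous because its middle vertex has a unique degree; inserting a single edge turns it into a triangle, which is 3-anonymous.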
This talk is based on joint work with Cristina Bazgan, Robert Bredereck, Vincent Froese, Sepp Hartung, Clemens Hoffmann, Rolf Niedermeier, Ondřej Suchý, Nimrod Talmon, and Gerhard Woeginger.
Pattern Matching with Variables
Markus L. Schmid
Universität Trier, FB IV – Abteilung Informatikwissenschaften
Let Σ be an arbitrary alphabet of terminals and let X = {x1, x2, x3, …} be an enumerable set of variables. Any string α ∈ (Σ ∪ X)^+ is a pattern and every string w ∈ Σ^* is a word. A substitution is a mapping h : (X ∪ Σ) → Σ^* with h(a) = a for every a ∈ Σ. For a pattern α = z1 z2 … zn, zi ∈ Σ ∪ X, 1 ≤ i ≤ n, by h(α) we denote the word h(z1)h(z2)…h(zn). The problem of pattern matching with variables is defined as follows:
Pattern Matching with Variables (VPatMatch)
Instance: A pattern α and a word w.
Question: Does there exist a substitution h with h(α) = w?
As an example, we consider the pattern α = x1 a x1 b x2 x2 , where a, b, c ∈ Σ, and the word w =
bacaabacabbaba. We note that (α, w) is a positive instance of VPatMatch since for h(x1 ) = baca
and h(x2 ) = ba we have h(α) = w. On the other hand, there exists no substitution h with h(α) =
cbcabbcbbccbc.
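A brute-force decision procedure for VPatMatch can be sketched in Python as a backtracking search over the possible images of each variable. This is a naive illustration with exponential worst-case running time, not one of the algorithms from the cited papers; it uses the erasing version defined above (variables may map to the empty word), and the function name and list encoding are assumptions.

```python
def vpatmatch(pattern, word, variables):
    """Return a substitution h (as a dict) with h(pattern) == word,
    or None if none exists. Symbols not in `variables` are terminals."""

    def search(i, pos, h):
        if i == len(pattern):
            return dict(h) if pos == len(word) else None
        symbol = pattern[i]
        if symbol not in variables or symbol in h:
            # Terminal, or variable whose image is already fixed.
            image = h.get(symbol, symbol)
            if word.startswith(image, pos):
                return search(i + 1, pos + len(image), h)
            return None
        # First occurrence of a variable: try every possible image.
        for end in range(pos, len(word) + 1):
            h[symbol] = word[pos:end]
            result = search(i + 1, end, h)
            if result is not None:
                return result
            del h[symbol]
        return None

    return search(0, 0, {})


alpha = ["x1", "a", "x1", "b", "x2", "x2"]
print(vpatmatch(alpha, "bacaabacabbaba", {"x1", "x2"}))
print(vpatmatch(alpha, "cbcabbcbbccbc", {"x1", "x2"}))
```

For the second word the search fails quickly: the only a in the word forces h(x1) = cbc, but the letters following that a are bbc.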
Due to their natural and simple definition, the concept of patterns (and how they map to words) emerges
in various areas of theoretical computer science, such as language theory, learning theory, combinatorics on
words, pattern matching, database theory, and we can also find them in practice in the form of extended
regular expressions with backreferences, used in programming languages like Perl, Java, Python, etc.
The problem VPatMatch, as defined above, is NP-complete (which is easy to show), but in the literature different variants of VPatMatch are investigated: the nonerasing version (i. e., variables must be
substituted by non-empty words), the terminal-free version (i. e., the patterns contain only variables), the
injective version (i. e., different variables must be substituted by different words) and any combination of
these. In addition to that there are many numerical parameters: number of variables, number of terminals,
length of w, number of occurrences per variable, length of the images h(x). By combining the different
VPatMatch-variants with all possibilities of bounding some of the numerical parameters by constants,
we obtain a fairly large class of different pattern matching problems with variables.
In this talk, we present some of the main results of a systematic multivariate complexity analysis (see [1, 2, 3]) of this rich class of pattern matching problems with variables. It turns out that surprisingly strongly restricted versions of VPatMatch are still NP-complete, while all polynomial-time solvable cases are such that the brute-force algorithm already has polynomial running time.
References
[1] H. Fernau and M. L. Schmid. Pattern matching with variables: A multivariate complexity analysis.
In Proceedings of the 24th CPM, volume 7922 of LNCS, pages 83–94, 2013.
[2] H. Fernau, M. L. Schmid, and Y. Villanger. On the parameterised complexity of string morphism
problems. In Proceedings of the 33rd FSTTCS, volume 24 of Leibniz International Proceedings in
Informatics (LIPIcs), pages 55–66, 2013.
[3] D. Reidenbach and M. L. Schmid. Patterns with bounded treewidth. Information and Computation,
2014. http://dx.doi.org/10.1016/j.ic.2014.08.010.
The Minimum Feasible Tileset problem
Yann Disser
Institut für Mathematik, TU Berlin, Germany
Stefan Kratsch
Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany
Manuel Sorge
Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany
Email: [email protected]
We consider the Minimum Feasible Tileset problem: Given a set of symbols and subsets of these symbols (scenarios), find a smallest possible number of pairs of symbols (tiles)
such that each scenario can be formed by selecting at most one symbol from each tile. We
show that this problem is NP-complete even if each scenario contains at most three symbols. Our main result is a 4/3-approximation algorithm for the general case. In addition,
we show that the Minimum Feasible Tileset problem is fixed-parameter tractable both
when parameterized by the number of scenarios and when parameterized by the number of symbols.
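Checking whether a given tileset is feasible is itself a matching question: each symbol of a scenario must be taken from its own tile. The Python sketch below makes this reading explicit, under the assumption that "selecting at most one symbol from each tile" means assigning the symbols of a scenario to pairwise different tiles that contain them. It uses a standard augmenting-path bipartite matching and is not the approximation algorithm from the talk; all names are hypothetical.

```python
def scenario_formable(scenario, tiles):
    """True if every symbol of `scenario` can be taken from its own tile."""
    owner = {}  # tile index -> symbol currently taken from that tile

    def try_assign(symbol, visited):
        # Augmenting-path step: find a free tile for `symbol`, or
        # re-route the symbol that currently occupies a suitable tile.
        for t, tile in enumerate(tiles):
            if symbol in tile and t not in visited:
                visited.add(t)
                if t not in owner or try_assign(owner[t], visited):
                    owner[t] = symbol
                    return True
        return False

    return all(try_assign(s, set()) for s in scenario)


def is_feasible_tileset(scenarios, tiles):
    """True if every scenario can be formed from the given tiles."""
    return all(scenario_formable(s, tiles) for s in scenarios)


scenarios = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(is_feasible_tileset(scenarios, [("a", "b"), ("b", "c")]))
print(is_feasible_tileset(scenarios, [("a", "b")]))
```

Two tiles suffice for these three scenarios, whereas a single tile can never serve a scenario with two symbols.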