My research statement

Shikha Singh
Research Statement
I am interested in designing algorithms to solve practical problems posed by strategic behavior and the emergence of big data.
Strategic behavior is ubiquitous and comes into play whenever the input of an algorithm
is collected from self-interested agents. For example, the voters in an election, or the bidders
in an auction can lie to manipulate the outcome for their own benefit. Mechanisms are algorithms tailored to incentivize truthful behavior from participating agents, and are studied under
the discipline of algorithmic game theory.
When dealing with massive datasets, the performance of an algorithm is governed by factors
outside traditional algorithmic paradigms.
Conventionally, the efficiency of an algorithm is determined by the number of computations
it has to perform. However, when the size of the input is too large to fit in the memory of a
device, computing on it requires transferring smaller chunks of data between an external disk
and memory. These input/output operations then become a bottleneck, and external memory
algorithms are developed to minimize the transfer cost [4].
Furthermore, standard algorithm design assumes that the entire input is known from the
start. However, in many situations the data might not be available all at once but arrives over
time. Online algorithms [14] specialize in performing well on such uncertain data.
Online and External Memory Algorithms
Sorting big data is one of the most fundamental computational tasks, used heavily by websites
such as Google and Amazon for content organisation. Sorting is also the core operation in all
database management systems [22, 25, 27].
External merge sort is a well-known and widely used external memory sorting algorithm
[4, 22, 26, 27]. In joint work with Michael Bender, Samuel McCauley, Andrew McGregor,
and Hoa Vu [9], we improve the first phase of external merge sort. We revisit the classic
problem of run generation, which has been studied for over 50 years [17, 24, 30]. We provide
the first theoretical analysis of the oldest and most common technique for run generation called
replacement selection [18, 25, 26, 30]. While it was known to perform well on random data
[17], we show that a simple modification of replacement selection, called up-down replacement
selection, performs asymptotically better. This result extends the analysis of the same technique
by Knuth in 1963 [24]. Our optimal online algorithm for run generation had previously been proposed as a
practically implementable heuristic and shown to work well in practice [30]. Thus our work
establishes the theoretical foundations for analysing heuristics proposed in previous literature
[10, 18, 24, 26, 30], which will guide future research on the subject.
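To make the run generation setting concrete, the following is a minimal sketch of classic replacement selection, assuming the memory holds `memory_size` elements and the input arrives as an iterable. It is not the up-down variant analyzed in [9]; the function name and parameters are hypothetical and chosen only for illustration.

```python
import heapq

_SENTINEL = object()  # marks the end of the input stream


def replacement_selection(stream, memory_size):
    """Split `stream` into sorted runs using a min-heap of at most `memory_size` items."""
    it = iter(stream)

    # Fill memory with the first `memory_size` elements.
    heap = []
    for x in it:
        heap.append(x)
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)

    runs, current, frozen = [], [], []
    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)  # written out to the current run

        nxt = next(it, _SENTINEL)
        if nxt is not _SENTINEL:
            if nxt >= smallest:
                # Still fits in the current run, so it joins the heap.
                heapq.heappush(heap, nxt)
            else:
                # Too small for the current run; hold it for the next one.
                frozen.append(nxt)

        if not heap:
            # The current run is finished; start the next run from the held items.
            runs.append(current)
            current = []
            heap, frozen = frozen, []
            heapq.heapify(heap)
    return runs
```

For instance, `replacement_selection([5, 1, 9, 2, 8, 3, 7], 3)` produces the two runs `[1, 2, 5, 8, 9]` and `[3, 7]`: an already sorted input yields a single run, while a reverse-sorted input yields runs no longer than the memory size.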
Algorithmic Game Theory
With the advent of cloud computing [1, 3] and crowd-sourced internet marketplaces [2], it is
essential to have verification schemes through which the client can ensure correctness of the
computation performed by third parties. Such delegation of computation becomes especially
important for computationally weak devices such as cellphones and tablets. For businesses to
conduct such exchange of money for resources, we need verifiable protocols for delegation of
computation [19, 23, 32].
Interactive proofs [7, 20] are a classical way to perform such verification. A weak client
or verifier can successfully verify the claims made by a powerful server or prover through an
interactive proof. However, interactive proofs are impractical due to the high computation and
communication cost incurred by the verifier. Recently, rational proofs [6] were introduced as a
simple and efficient alternative to interactive proofs. They incorporate a reward for the prover,
which is computed by the verifier based on estimation tools (such as scoring rules [35]). The
prover is rational in the game theoretic sense, that is, he only wants to maximize his reward.
Rational proofs ensure that the prover’s reward is maximized only if he carries out the desired
computation correctly.
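As a small illustration of how a scoring rule can align the prover's reward with honest reporting, the sketch below uses the quadratic (Brier) scoring rule for a single binary event. The Brier rule is assumed here as a standard example of a proper scoring rule; this is not the full protocol of [6] or [11], and all function names are hypothetical.

```python
def brier_reward(report, outcome):
    """Quadratic (Brier) reward for reporting probability `report`; `outcome` is 0 or 1."""
    return 1.0 - (outcome - report) ** 2


def expected_reward(report, true_prob):
    """Expected reward when the event truly occurs with probability `true_prob`."""
    return (true_prob * brier_reward(report, 1)
            + (1.0 - true_prob) * brier_reward(report, 0))


def best_report(true_prob, grid_size=101):
    """Search a uniform grid on [0, 1] for the report maximizing expected reward."""
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return max(candidates, key=lambda r: expected_reward(r, true_prob))
```

Under these assumptions, `best_report(0.3)` returns the truthful report 0.3: a reward-maximizing prover has no incentive to misstate the probability.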
However, the model of rational proofs only allows for a single prover, while in practice
multiple provers might be involved as third-party servers. In a collaborative work with Jing
Chen and Samuel McCauley [11], we extend rational proofs to allow any number of provers.
With multiple untrusted rational provers, there is an added risk of collusion: the
provers could cooperate to obtain a better reward. Our proof system is robust against
such collusion and is more powerful than all existing interactive proof systems. Thus, this
paper resolves an open problem posed in [6]. We characterize the proof system for two classes of
provers: those sensitive to very small losses in reward and those who are not. In the future, we
hope to construct super-efficient multi-prover rational proofs for useful complexity classes.
Future Directions
Online Facility Location. In the literature of algorithmic game theory, the data available
to an algorithm is collected through strategic agents who might lie to shift the outcome in their
favor. For example, if the government wants to establish a public facility such as a park and asks
the citizens to report their preferred location, a citizen could provide incorrect information
to move the facility closer to themselves. Similarly, in public elections, voters might act
independently or collude as a group to make their favorite candidate win. Mechanisms involving
monetary payments are often unsuitable in such situations for ethical or legal reasons. Thus,
moneyless mechanisms that are strategyproof are required, that is, mechanisms in which no strategic agent can
obtain an outcome it prefers by providing dishonest information [31].
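A classical example of a strategyproof mechanism without money, for the offline problem of placing one facility on a line, is to build at the median of the reported locations. The sketch below assumes each agent's cost is its distance to the facility; the function names are hypothetical, and the online setting discussed in this statement is not covered.

```python
def median_mechanism(reported_locations):
    """Place a single facility at the (left) median of the reported points on a line."""
    ordered = sorted(reported_locations)
    return ordered[(len(ordered) - 1) // 2]


def cost(agent_location, facility):
    """An agent's cost is assumed to be its distance to the facility."""
    return abs(agent_location - facility)


def gains_from_lying(true_locations, agent, lie):
    """Return True if `agent` strictly lowers its cost by reporting `lie`."""
    honest = median_mechanism(true_locations)
    reports = list(true_locations)
    reports[agent] = lie
    manipulated = median_mechanism(reports)
    return cost(true_locations[agent], manipulated) < cost(true_locations[agent], honest)
```

With true locations `[1, 4, 10]` the facility is placed at 4. The agent at 10 cannot pull it closer by exaggerating to 100 (the median stays at 4), and the agent at 1 only hurts itself by reporting 5 (the median moves to 5).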
I am interested in the problem of facility location and its variants [15, 29, 31], which is a
model problem for such mechanisms without money. In particular, I want to construct strategyproof mechanisms without money for online facility location. In this problem, the agents
arrive over time and decisions regarding the number of facilities and their location need to
be made online, so as to optimize a social objective. I wish to apply new techniques from
mathematical programming, such as Lagrangian duality [34], to solve this problem.
Online Algorithms for Estimating Sortedness of Big Data. The research on the run
generation problem [9] has fueled my interest in developing algorithms which combine the
essential ingredients of external memory [4] and streaming [8]. Streaming algorithms process
the input one item at a time and take few (ideally one) passes over the data to estimate useful
information [16, 28, 33]. I want to design algorithms which have some foreknowledge of the
incoming stream (equal to the size of the memory), and can store essential information on an
external disk (at the cost of expensive transfers). Thus they are sensitive to both the constraints
usually associated with big data (availability over time and disk access cost). I want to focus
on constructing such an algorithm to estimate the sortedness of a data stream [13, 21], which
is a well-known problem with important applications in bio-informatics [5, 12].
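One common way to quantify sortedness is the distance to monotonicity: the number of elements that must be deleted to leave a non-decreasing sequence, which equals the input length minus the length of a longest non-decreasing subsequence. The sketch below computes this quantity exactly and offline, and may use memory linear in the input; it only illustrates what the streaming algorithms of [13, 21] estimate and is not one of them. The function names are hypothetical.

```python
from bisect import bisect_right


def longest_nondecreasing_subsequence_length(items):
    """Length of a longest non-decreasing subsequence, via patience sorting."""
    # tails[k] is the smallest possible last element of a
    # non-decreasing subsequence of length k + 1 seen so far.
    tails = []
    for x in items:
        pos = bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def distance_to_monotonicity(items):
    """Number of deletions needed to make `items` non-decreasing."""
    items = list(items)
    return len(items) - longest_nondecreasing_subsequence_length(items)
```

For example, `distance_to_monotonicity([1, 3, 2, 4, 5])` is 1, while a fully reversed input such as `[5, 4, 3, 2, 1]` has distance 4.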
References
[1] Amazon elastic compute cloud. Online at http://aws.amazon.com/ec2/.
[2] Amazon mechanical turk. Online at https://www.mturk.com/mturk.
[3] Sun utility computing. Online at http://www.oracle.com/us/sun/index.htm.
[4] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, Sept. 1988.
[5] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool.
Journal of molecular biology, 215(3):403–410, 1990.
[6] P. D. Azar and S. Micali. Rational proofs. In Proc. 44th Annual Symposium on Theory of Computing
(STOC), pages 1017–1028, 2012.
[7] L. Babai. Trading group theory for randomness. In Proc. 17th Annual ACM Symposium on Theory of
Computing (STOC), pages 421–429, 1985.
[8] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In
Proc. 21st ACM Symposium on Principles of Database Systems (PODS), pages 1–16, 2002.
[9] M. Bender, S. McCauley, A. McGregor, S. Singh, and H. Vu. Run Generation Revisited: What Goes Up May
or May Not Come Down. Submitted. Available online at http://www.cs.stonybrook.edu/~shiksingh/
BenderMcMc15.pdf.
[10] B. Chandramouli and J. Goldstein. Patience is a virtue: Revisiting merge and sort on modern processors.
In Proc. 2014 ACM SIGMOD Int’l Conference on Management of Data, pages 731–742, 2014.
[11] J. Chen, S. McCauley, and S. Singh. Rational interactive proofs with multiple provers. Submitted. Available
online at http://www.cs.stonybrook.edu/~shiksingh/ChenMcCauleySingh.pdf.
[12] A. L. Delcher, S. Kasif, R. D. Fleischmann, J. Peterson, O. White, and S. L. Salzberg. Alignment of whole
genomes. Nucleic Acids Research, 27(11):2369–2376, 1999.
[13] F. Ergun and H. Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream.
In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 730–736, 2008.
[14] A. Fiat. Online algorithms: The state of the art (lecture notes in computer science). 1998.
[15] D. Fotakis and C. Tzamos. Winner-imposing strategyproof mechanisms for multiple facility location games.
In Internet and Network Economics, pages 234–245. 2010.
[16] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest
increasing subsequence. SIAM Journal on Computing, 39(8):3463–3479, 2010.
[17] B. J. Gassner. Sorting by replacement selecting. Communications of the ACM, 10(2):89–93, 1967.
[18] M. A. Goetz. Internal and tape sorting using the replacement-selection technique. Communications of the
ACM, 6(5):201–206, 1963.
[19] S. Goldwasser, Y. T. Kalai, and G. N. Rothblum. Delegating computation: interactive proofs for muggles.
In Proc. 40th Annual ACM Symposium on Theory of Computing (STOC), pages 113–122, 2008.
[20] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM J.
Comput., 18(1):186–208, 1989.
[21] P. Gopalan, T. Jayram, R. Krauthgamer, and R. Kumar. Estimating the sortedness of a data stream. In
Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 318–327, 2007.
[22] G. Graefe. Implementing sorting in database systems. ACM Computing Surveys (CSUR), 38(3):10, 2006.
[23] J. Kilian. A note on efficient zero-knowledge proofs and arguments. In Proceedings of the twenty-fourth
annual ACM symposium on Theory of computing, pages 723–732, 1992.
[24] D. E. Knuth. Length of strings for a merge sort. Communications of the ACM, 6(11):685–688, 1963.
[25] D. E. Knuth. The Art of Computer Programming: Sorting and Searching, volume 3. 1998.
[26] P.-Å. Larson. External sorting: Run formation revisited. IEEE Transactions on Knowledge and Data
Engineering, 15(4):961–972, 2003.
[27] P.-Å. Larson and G. Graefe. Memory management during run generation in external sorting. In Proc. 1998
ACM SIGMOD Int’l Conference on Management of Data, volume 27, pages 472–483, 1998.
[28] D. Liben-Nowell, E. Vee, and A. Zhu. Finding longest increasing and common subsequences in streaming
data. Journal of Combinatorial Optimization, 11(2):155–175, 2006.
[29] P. Lu, X. Sun, Y. Wang, and Z. A. Zhu. Asymptotically optimal strategy-proof mechanisms for two-facility
games. In Proc. 11th Annual ACM conference on Electronic Commerce, pages 315–324, 2010.
[30] X. Martinez-Palau, D. Dominguez-Sal, and J. L. Larriba-Pey. Two-way replacement selection. In Proc. of
the VLDB Endowment, volume 3, pages 871–881, 2010.
[31] A. D. Procaccia and M. Tennenholtz. Approximate mechanism design without money. In Proc. 10th Annual
ACM conference on Electronic commerce, pages 177–186, 2009.
[32] G. N. Rothblum. Delegating computation reliably: paradigms and constructions. PhD thesis, Massachusetts
Institute of Technology, 2009.
[33] X. Sun and D. P. Woodruff. The communication and streaming complexity of computing the longest common
and increasing subsequences. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete
algorithms, pages 336–345, 2007.
[34] N. K. Thang. Lagrangian duality based algorithms in online scheduling. arXiv:1408.0965, 2014.
[35] R. L. Winkler. Scoring rules and the evaluation of probability assessors. Journal of the American Statistical
Association, 64(327):1073–1078, 1969.