Big Data and Intelligent Transportation Systems

John W. Bagby *
Abstract:
Transportation is a quintessential application domain for big data
and the associated predictions from big data analytics. Almost every
domain where big data is deployed shares most policy concerns
with Intelligent Transportation Systems (ITS) including privacy,
security, intellectual property, and the attraction of capital to fund
innovation and deployment. This article explores the unique ITS
domain-specific prediction that informs both decision-making and
control derived from big data analytics and how these matters will
be constrained by public policy. Such policy intervention is most
likely through litigation, legislation, regulation, and standards
development. Policy intervention should be anticipated (1) where
big data predictions and the resulting control systems prove to be
unreliable or (2) when injustice is perceived to profoundly impact
life, liberty or property interests or have disparate impact on
demographic groups. A risk-benefit approach is deployed here to
propose an inter-disciplinary techno-policy research agenda.
I. INTRODUCTION
Big data is all the rage; hopefulness inspires widespread belief that it is
essential to solving future complex and societal problems. However, there is
also widespread dread it will actually cause more societal problems than it
solves. Proponents primarily argue the former and seldom admit or address
much of the latter. Proponents claim that: (1) big data is essential to resolve
intractable engineering design and healthcare problems, 1 (2) big data might
inform predictive analytics essential to modern counter-terrorism 2 and
improve the effectiveness & efficiency of law enforcement, 3 (3) big data
*
Professor of Information Sciences and Technology, the Pennsylvania State
University.
1
Tien James M., Overview of Big Data: A US Perspective, 44 THE BRIDGE 12-19
(Nat’l Acad. Press) accessible at: http://www.nae.edu/File.aspx?id=128774
2
Bulk Collection of Signals Intelligence: Technical Options, Comm. on Responding to
Sect. 5(d) of PDD-28: The Feasibility of Software to Provide Alternatives to Bulk Signals
Intelligence Collection; Comp.Sci.& Telecomm.Bd.; Div. on Eng.&Phys.Sci.
Nat'l.Res.Counc (2015) accessible at: http://cryptome.org/2015/01/nap-bulk-sigint.pdf
3
Hoofnagle, Chris Jay., Big Brother's Little Helpers: How ChoicePoint and Other
Commercial Data Brokers Collect and Package Your Data for Law Enforcement, 29 NCJ
INT'L L. & COM. REG. 595 (2003) (predicting separate business model for private sector
data brokers providing big data assistance to law enforcement); Cavoukian, Ann, & Jeff
Jonas. Privacy by design in the age of big data. Information and Privacy Commissioner of
2
BIG DATA & ITS
April 17, 2015
Ind.Univ./Va.Tech.
Big Data Colloquium
promises the long-sought attainment of the elusive, frictionless,
transaction-cost-free market, 4 and (4) big data might enable prediction,
using various tools of analytics, of relationships and outcomes across most
fields of endeavor and academic domains. 5 This article addresses how big
data is essential to enable intelligent transportation systems (ITS), see
Figure I.
Figure I:
Big Data Enablement of ITS: a Generalized Architecture
Ontario, Canada, 2012. (arguing false positives resulting from law enforcement use of big
data falsely accuse the innocent); Tene, Omer & Jules Polonetsky, Privacy in the age of big
data: a time for big decisions, 64 STAN. L. REV. ONLINE 63 (2012) (arguing big data
practicality must balance against individual rights) and Skolnick, Jerome H., JUSTICE
WITHOUT TRIAL: LAW ENFORCEMENT IN DEMOCRATIC SOCIETY, (Quid pro books, 2011)
(arguing big data use to predict criminality risks damage and punishment without due
process).
4
See generally, Whinston, Andrew B., Soon-Yong Choi & Dale O. Stahl,
THE ECONOMICS OF ELECTRONIC COMMERCE, (1997 MacMillan London).
5
See generally, Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution
that will transform how we live, work, and think. Houghton Mifflin Harcourt.
A. Provisional Definition: Big Data
What is big data? Provisional definitions abound but many devotees use
some form of the following – a narrower scope for the formulation of big
data than is emerging in the data analytics field in business:
big data is “an accumulation of data that is too large and complex
for processing by traditional database management tools.” 6
This restrictive conceptualization is probably too limiting
because advances in data processing are foreseeable and should be
anticipated. Feasibility of using big data, once considered a technical
frontier beyond reach, 7 is steadily being resolved. Many big data definitions
are problematic in limiting their scope particularly as software tools, storage
and processing capacity, and bandwidth advance (roughly) according to
Moore’s law. 8
Moore’s law and evolving business practices suggest that four
phenomena propel big data analytics: 1st seemingly ever decreasing
data storage costs, 2nd seemingly ever increasing processing
capabilities, 3rd demand for analysis of big data, and 4th apparent
usefulness of predictions made thereon. 9
For the purposes of applying big data in the fields of intelligent
transportation systems (ITS), this article is more permissive providing a far
less doctrinaire conceptualization. This is accomplished by not attempting
to limit big data to the frontiers of data science. This is a more hopeful and
practical definition acknowledging that actual practice is far more forgiving
and already calls itself big data analytics.
6
Merriam-Webster.com accessible at: http://www.merriamwebster.com/dictionary/big%20data
7
Tien James M., Overview of Big Data: A US Perspective, 44 THE BRIDGE 12, 14
(Nat’l Acad. Press) accessible at: http://www.nae.edu/File.aspx?id=128774
8
Moore, Gordon E., Cramming more components onto integrated circuits,
ELECTRONICS 4. (April 19, 1965) reprinted in 86 PROCEED. IEEE 82-5 (Jan.1998)
accessible at: http://www.cs.utexas.edu/~fussell/courses/cs352h/papers/moore.pdf
9
John W. Bagby, Using an Industrial Organization (I/O) Lens to Enhance Predictive
Analytics: Disentangling Emerging Relationships in the Electronic Surveillance Supply
Chain, LEGAL AND ETHICAL ISSUES IN PREDICTIVE DATA ANALYTICS, Virginia Tech
University, June 20, 2014.
B. Provisional Definition: Intelligent Transportation Systems (ITS)
It is the same story for ITS, a broad patchwork of “cool” technologies
that are rather poorly integrated without very heavy reliance on software 10
and big data. ITS holds great future promise to solve engineering and
societal problems: public safety, environmentalism, efficiency,
infrastructure costs, mobility, convenience, national security and law
enforcement. Most readers can quickly identify some complex
embodiments of ITS technologies, such as Google’s autonomous/smart car,
as well as some ITS component technologies, such as the following well-known
constituents, many of which rely essentially on various forms of telematics: 11
• lane departure warning systems,
• collision warning & avoidance systems,
• adaptive cruise controls,
• automated toll collection (EZPass),
• congestion pricing of road use,
• variable (warning sign) messaging,
• dynamic traffic control,
• driver assist to driverless highways,
• mobile 911 location referencing,
• freeway traffic management systems,
• automated traffic enforcement (red light cameras),
• commercial vehicle monitoring & control,
• public transit monitoring & control,
• navigation, GPS, “infotainment”,
• mass surveillance,
10
Ramsey, Mike, Fears Push Car Makers Deep into Silicon Valley, WALL ST.J.
(3.26.15) accessible at: http://www.wsj.com/articles/ford-mercedes-set-up-shop-in-siliconvalley-1427475558 (arguing automaker fears of missing smart car revolution invade
silicon valley now that software accounts for 10 - 25% of new passenger vehicle value).
11
Duri, Sastry, Gruteser, M., Liu, X., Moskowitz, P., Perez, R., Singh, M., & Tang, J.
M., Framework For Security And Privacy in Automotive Telematics, PROCEED. 2ND INT’L
WORKSHOP ON MOBILE COMMERCE (ACM, 2002).
“Telematics” is an interdisciplinary field focused on long-distance transmission of
computer information. Telematics systems are composed of sensor arrays, communications
and control technologies that permit remote monitoring and management of vehicles. These
are applicable to spacecraft, aircraft (including drones), highway surface transportation
(passenger vehicles, fleets), rail traffic, and waterborne vessels, among others. See
generally Buxton, W., Integrating the periphery and context: A new taxonomy of
telematics, 95 PROCEED. GRAPHICS INTERFACE-- (1995).
• automated risk management (real-time insurance underwriting),
• weigh in motion for commercial fleet operations,
• weather condition notification & routing remediation,
• parking management,
• hazardous cargoes & environmental control,
• driver monitoring:
  o commercial operator regulatory compliance,
  o operator real-time risk-based underwriting (Progressive “Snap-Shot”),
• forensic enablement - data capture:
  o ex post cockpit recording (on-board black boxes),
  o real time remote transmission (telematics),
• emergency condition resolution (OnStar).
Autonomous, full-automation, and artificially intelligent instantiations of ITS
vehicles are rife in popular culture. Examples include the artificial
intelligence (AI) controlled Johnny Cab in “Total Recall,” 12 the voice-controlled
autonomous vehicles in “Demolition Man,” 13 or the faster and
safer autonomous highway vehicles hypothesized in “I, Robot.” 14
C. Provisional Definition: Big Data & ITS
ITS might be THE paradigm application of big data analytics
methodologies and components. As the short list above indicates, ITS is a
huge umbrella of component technologies and applications that promise
improvements in public safety, environmentalism, efficiency, infrastructure
costs, mobility, convenience, national security and law enforcement.
Three challenging design problems are presented. First, there is the
design of ITS technologies that collect, archive and make available the big
data for use in ITS applications. Second, the design of the ITS applications
both enables and depends directly on analytical and control system designs that
deliver on the promises of efficiency, safety and information services.
Third, the integrated design of these two systems, ITS technologies and ITS
12
Total Recall, Prod. Buzz Feitshans, Dir. Paul Verhoeven, Perf. Arnold
Schwarzenegger, Sharon Stone, TriStar Pictures, 1990.
13
Demolition Man, Prod. Joel Silver, Dir. Marco Brambilla, Perf. Sylvester Stallone,
Wesley Snipes & Sandra Bullock, Warner Bros., 1993.
14
I, Robot, Prod. Laurence Mark, Dir. Alex Proyas, Perf. Will Smith & Bridget
Moynahan, 20th Century Fox, 2004.
applications, must be reliably integrated. Designs for ITS system
integrations have only recently been successful in demonstrations, so few
are widely deployed. ITS system integrations must be continually updated
and tested given the risks of failure. The integration of ITS technology
designs with the ITS applications designs is the design area posing the most
critical policy problems.
Most of the initial ITS technology designs were conceptually described
in the 1990s 15 and many were slowly rolled out in limited demonstration
forms by the year 2000. This was mandated and facilitated by various
federal laws largely through component or affiliated agencies to the U.S.
Department of Transportation (DoT). 16 Many proponents in the ITS
community predicted very near term deployment of turn-key systems
early in the 21st century. 17 However, this nearly irrepressible
optimism 18 continues today despite progress beyond the piecemeal
component deployments urged under the Intelligent Vehicle Initiative (IVI)
of the late 1990s. 19
However, experience clearly demonstrates these predictions, many
dating back to the mid-1990s, were over-optimistic. Left unanswered
are practical questions of attracting capital and robust, well-tested
technologies into manufacturing, assembly of critical mass, and
development of rigorous standards-compliant interconnectivity in critical
safety hardware and supporting software and communications systems.
15
THE NATIONAL ITS ARCHITECTURE: A FRAMEWORK FOR INTEGRATED
TRANSPORTATION INTO THE 21ST CENTURY, Office of the Assistant Secretary for Research
and Technology, Intelligent Transportation Systems Joint Program Office, Department of
Transportation accessible at:
http://itsarch.iteris.com/itsarch/documents/physical/physical.pdf
16
E.g., Norman Y. Mineta Research and Special Programs Improvement Act, PUB. L.
108-426 (108th Cong.) accessible at:
http://www.rita.dot.gov/laws_and_regulations/public_law_108_426.html
17
See e.g., Qu, Zhihua, Cooperative control of dynamical systems: applications to
autonomous vehicles, (SPRINGER SCI. & BUS. MEDIA, 2009) (arguing current feasibility and
huge promise for autonomous vehicles in most fields of endeavor).
18
Rogers, Christina, Google Sees Self-Driving Car on Road Within Five Years, WALL
ST.J. Jan.13, 2015 (arguing that Google executive sees no regulatory hurdles to Google’s
fully automated autonomous vehicle) accessible at: http://www.wsj.com/articles/googlesees-self-drive-car-on-road-within-five-years-1421267677
19
Advanced Public Transportation Systems: The State of the Art Update 2000, U.S.
Department of Transportation, Federal Transit Administration, (2000). 1998 Transportation
Equity Act for the 21st Century (TEA-21), Pub. L. 105-178 (June 9, 1998) accessible at:
http://www.fhwa.dot.gov/tea21/tea21.pdf
Furthermore, actual user acceptance, rather than initial consumer interest,
must also be proven.
ITS uses component technologies well-known to the ITS communities.
These include (1) sensor networks, (2) location and proximity referencing,
(3) data repositories (e.g., cloud, on-board vehicle memory), (4) telecommunications connections (wireless and landline, short range and longer
range), (5) analytics capacity for assessment using statistical, actuarial and
network science methodologies, (6) accretion of huge new databases
supplementing existing data, and (7) control systems for in-vehicle and
roadbed actuation. This simple list conceals the vast complexity of ITS
systems, somewhat more diagrammatically revealed in the U.S. DoT’s
depiction of the ITS general architecture in Figure II. 20
Figure II:
National ITS Architecture - High Level Architecture Diagram
Source: Office of the Assistant Secretary for Research and Technology, Intelligent
Transportation Systems Joint Program Office, Department of Transportation
20
Office of the Assistant Secretary for Research and Technology, Intelligent
Transportation Systems Joint Program Office, Department of Transportation, accessible
at: http://www.standards.its.dot.gov/LearnAboutStandards/NationalITSArchitecture
Unlike many aspects of big data, prediction is only part of the ITS big
data implementation. Direct control and regulatory enforcement are also
major objectives. HOV lane accessibility or closures, barriers
blocking on/off-ramps or special lanes (HOV), ramp metering, variable
messaging signage, and automated traffic ticketing enforcement are but a few
existing examples. Autonomous vehicles may become “cars that drive
themselves,” raising significant risk management questions when hardware
and/or software failures are provable with forensic-quality evidence. Indeed,
big data in the ITS application domain may pose some unique risks,
long hypothesized, 21 but now emerging in “enhanced” form. 22
The following sections draw on technical, policy and assessment
literatures that cross these domains to enable an analysis of how policy
enablement, reaction and management might emerge as ITS big data
technologies are deployed and assessed. Section II explores the
generalized types of promised or speculative benefits expected for particular
ITS technologies and the generalizations about classes of benefits and
applications. Section III discusses ITS litigation risks, which serves as a
proxy for risk generally. Section IV hypothesizes how ITS deployment will
shift, attenuate or magnify such risks. This article concludes with tentative
findings summarizing how the risk-benefit approach deployed here can
inform an inter-disciplinary techno-policy research agenda consistent with
ITS goals as rationalized by public policy realities.
II. BIG DATA & ITS’S LITIGATION & POLICY RISKS
Big data challenges the status quo – a condition that both attenuates
some legal risks and accentuates others. However, the curmudgeon’s
views are predictable here: big data generally is merely a redux of the
so-called “cyberlaw revolution” of the late 1990s; furthermore, ITS is nothing
revolutionary because the history of automotive development is a constant
21
See generally, Bagby, John W. & Gary L. Gittings, Litigation Risk Management for
Intelligent Transportation Systems (Part One), ITS-Quarterly, Vol.VII, No. 2 (Spring-Summer 1999), Bagby, John W. & Gary L. Gittings, Litigation Risk Management for
Intelligent Transportation Systems (Part Two), ITS-Quarterly, Vol.VII, No. 3 (Fall 1999)
and Bagby, John W. & Gary L. Gittings, Litigation Risk Management for Intelligent
Transportation Systems (Part Three), ITS-Quarterly, Vol.VIII, No. 1 (Winter 2000).
22
Staff Report, Tracking & Hacking: Security & Privacy Gaps Put American Drivers
at Risk, Senate Office of Edward J. Markey (D-Mass.), Feb.2015, accessible at:
http://www.markey.senate.gov/imo/media/doc/2015-02-06_MarkeyReportTracking_Hacking_CarSecurity%202.pdf
adjustment to serial deployments of evolutionary technologies. In both
cases there is nothing revolutionary or so completely new that existing
principles cannot be adapted in a straightforward manner.
The development of Cyberlaw over the past two decades has caused
some useful awakening by practicing lawyers, policy makers and the laity.
There is some general consensus that Cyberlaw is composed of: (1) changes
to legal, regulatory and governing procedures, (2) transactional practice, (3)
intellectual property (IP), (4) privacy, (5) security and (cyber) crimes, (6)
various regulatory matters (e.g., antitrust, employment, financial regulation,
telecommunications), (7) technology transfer, and (8) myriad consumer
protections. As scholars predict and analyze emerging policy on big data
generally and on ITS big data in particular it seems likely that these policies
may closely track the development of cyberlaw. But is cyberlaw a distinct,
new field or should it be confined to an evolution from existing law? 23 The
next two subsections address this question through the “law of the horse” debate,
which suggests a general method for applying policy in new fields of endeavor
(like big data and ITS).
A. Big Data & ITS: Just Another Law of the Horse?
The evolutionary approach posits that ITS Big Data will never become
an independent field. 24 This is reminiscent of Karl Llewellyn’s denigration
of developing any new specialized field, likening it to the “law of the
horse.” 25 Under this theory it is intellectually irresponsible to build new
“fields of study” in narrow areas like the law of the horse (e.g., horse theft,
horse financing and sale, horse injury torts, regulation of jockeys).
Llewellyn argued that the resulting laws are amateurish, inadequate and
ineffective. Llewellyn’s reluctance to develop commercial law as a new
23
Easterbrook, Frank H., Cyberspace and the Law of the Horse, 1996 U. CHI. LEGAL
F. 207 (arguing against any rush to treat cyberlaw differently than “traditional” space, and
reacting to cyber-libertarians’ demand for cyberspace exceptionalism by urging resistance to
creating special exceptions for cyberspace).
24
See generally, Bagby, John W. (special issue ed.), Forward to the Special Issue on
Cyberlaw, 39 AM.BUS.L.J. 521 (Summer 2002) (arguing sufficient significance of
cyberlaw to require special treatment to attract scholarly investment). Portions of this
section are adapted from this special issue forward.
25
Karl N. Llewellyn, Across Sales on Horseback, 52 Harv. L. Rev. 725 (1939); Karl
N. Llewellyn, The First Struggle to Unhorse Sales, 52 Harv. L. Rev. 873 (1939) (defending
the newly created UCC as no more than a standardization of terms based on existing
commercial practice).
field was an important pre-condition to the UCC’s inherent flexibility. The
UCC’s apparent success is largely reliant on the experience of professionals
who already reduce transactions costs by developing customized
commercial relations. The UCC’s standardized gap filler provisions are
mere supplementation reliant primarily on customization by innovative
commercial transaction designers.
Llewellyn’s UCC approach has clearly succeeded in enabling transaction
innovation with evolutionary approaches using general laws based on
durable fundamental principles. This approach has lasting value
when compared to continental civil law directives that are excessively
detailed and thereby inflexible. So for the emerging field of ITS big data,
should new policy promote revolutionary cyberlaw because of Internet
exceptionalism, or is this just another “law of the horse”? Are ITS and big data
revolutionary new fields that necessitate merely “idiosyncratic
transactions?” Are big data and ITS best promoted by policy amateurs who
design narrow rules for a new realm before that field can effectively
develop its own useful experience?” 26 Of course, dilettantes descend into
most new fields and policy should always embrace generality and
flexibility. 27 How can these inevitable undesirable consequences
accompanying a rush to legislate cyberlaw, big data law or ITS special
exceptions be avoided?
B. Applying Samuelson’s Evolutionary Model to Big Data & ITS Policy
The evolutionary approach is more measured, pragmatic and careful.
Legal traditionalists likely prefer evolution to revolution because it is
consistent with the efficiency of the common law, 28 and this supports stare
decisis. Thus, coherence is preserved and reinforced when applying
26
See Bagby, John W. (special issue ed.), Forward to the Special Issue on Cyberlaw,
39 AM.BUS.L.J. 521, 525 (Summer 2002) (proposing cyberlaw exceptionalism to the extent
it attracts academic research investment, fresh perspective and fresh talent unencumbered
by insider self-interest).
27
See Tonry, Michael & Norval Morris, Retirement of Sheldon Messinger, 80 CAL. L.
REV. 310 (1992).
28
See, Richard A. Posner, ECONOMIC ANALYSIS OF LAW (5th ed. Aspen Law &
Business, 1998), but see Todd J. Zywicki, The Rise and Fall of Efficiency in the Common
Law: A Supply-Side Analysis, 97 NW. U. L. REV. 1551 (2003). See also Bagby, John W,
Common Law Development of the Duty of Information Security in Financial Privacy
Rights, FOURTH ANNUAL FORUM ON FINANCIAL INFORMATION SYSTEMS AND
CYBERSECURITY: A PUBLIC POLICY PERSPECTIVE, Smith School of Business, Univ.
Maryland, May 23, 2007.
existing public policy through traditional law to any new set of
technologies. Berkeley Professor Pamela Samuelson’s approach predicted
that cyberlaw would be of sufficient magnitude to require special
treatment. First, in her approach to cyberlaw exceptionalism, the policy
analyst should reassess first principles. Second, a minimalist approach
should be taken in all new policymaking, adhering to the principle of
simplicity. Finally, new policy should remain
technology neutral as much as is possible. 29 Applying these principles
from Llewellyn, Easterbrook and Samuelson to big data and ITS,
policymakers would repeatedly wrestle with the choice between
evolutionary and revolutionary approaches.
It is foreseeable that many big data and ITS constituents would put
political pressure on policymakers to adopt at least some aspects of a
revolutionary approach. For example, those with vested interests in
massive, quick deployment of their own products, but who seek to elude
any risk of product or service liability, would likely urge revolutionary tort
reform exceptions that would “pre-empt” state tort law. Federal statutory
solutions (DoT regulations eliminating liability seem unlikely) might be
viewed as the silver bullet against liability risks posed by plaintiffs’
lawyers. By contrast, on the user side, one might expect civil libertarians to
argue that insecure big data contained in insecure
cloud repositories poses excessive privacy and personal security risks, e.g.,
stalking, impersonation. It might be expected such policy pressures could
result in costly cybersecurity regulations, specific rights of action for
liability, innovation-chilling design standards and extensive audit-enabling
recordkeeping. In both cases the twin promises of big data and ITS to
enhance societal goals would be thwarted by most revolutionary
approaches.
C. Policy Risks for Big Data & ITS are Policy Venue Dependent 30
Policy risks for Big Data and ITS span the cyberlaw fields of IP,
privacy, security and various consumer protective regulatory programs. In
29
Samuelson, Pamela, Five Challenges for Regulating the Global Information
Society, REGULATING THE GLOBAL INFORMATION SOCIETY (Chris Marsden ed., 2000).
30
Portions of this subsection III.C. are adapted in part from Bagby, John W.,
Illuminating the Elusive Cyber-Infrastructure Policy Resolution: the Industrial
Organization Lens, No. ALSB2013_0102 presentation-Academy of Legal Studies in
Business, Aug. 8, 2013, Boston MA.
this section the cybersecurity aspects are highlighted. Identifying and
assessing litigation and policy risks for other fields of big data and ITS
should follow this model.
Traditional security law, security law in the general cyber-infrastructure
context and security law applicable to big data enablement of ITS are
decidedly sectoral and not omnibus. The sectoral approach means there are
provisions of law, regulation and the common law that impinge on security
concerns, but these are neither applicable broadly across fields of law nor
broadly across industries or economic sectors. 31 That is, security law in the
U.S. closely resembles the sectoral nature of U.S. privacy law:
The U.S. has no comprehensive privacy (/security) protection
policy. Privacy (/security) laws are narrowly drawn to particular
industry sectors, which can be called a sectoral approach to privacy
(/security) regulation. Regulation of privacy (/security) generally
arises in the U.S. after there is considerable experience with privacy
(/security) abuses, an approach consistent with liberty, laissez-faire
economics and common law precedents as the major approach to
law making. As a result, U.S. privacy (/security) law is a
hodgepodge, patchwork of sectoral protections, narrowly construed
and derived from constitutional, statutory and regulatory provisions
of international, federal and state law. 32 (compare/contrast
emphasis added)
Omnibus approaches are much more comprehensive; they mandate
strong rights, thereby imposing strong duties on most industries and on
many government activities. Strong omnibus regulation is often politically
infeasible. Cyber-infrastructure security suffers because legal requirements
are not pervasive across industry and government sectors.
The traditional law of security is also a hodgepodge, patchwork derived
from various fields of law, and security law is also based on constitutional,
statutory and regulatory provisions of international, federal and state law.
Security laws generally arise ex post, following crisis or galvanized political
will derived from mounting evidence of abuses. Traditional sources include
criminal law, tort law, contract, and malpractice. Privacy laws and security
31
See also Strauss, J., & Rogerson, K., Policies for online privacy in the United States
and the European Union, 19 TELEMATICS & INFORMATICS 173 (2002).
32
Bagby, John W., The Public Policy Environment of the Privacy-Security
Conundrum/Complement, pp.195-213 Ch. XII in Sangin Park (ed.), STRATEGIES AND
POLICIES IN DIGITAL CONVERGENCE (2007 Idea Group Ref., Hershey PA).
laws are linked in two fundamental ways: 1st as a trade-off 33 and 2nd as a
complement. 34 Big data is intimately involved with both privacy and
security law while ITS will be connected in a somewhat more tenuous way.
Sectoral laws impact security generally, cyber-infrastructure in
particular and big data in ITS. 35 These constrain activities in particular
industries, including several bellwether sectors such as the federal regulation
of healthcare, finance, intellectual property, federal administrative law,
education, veterans affairs, deceptive trade practices, 36 and children’s
protection. The states are also active, primarily in cyber-infrastructure
protection against identity theft with security breach notification (disclosure)
requirements, spyware and data disposal provisions.
1. Layered Policy Mechanisms for Cyber-Infrastructure Security
Cyber-infrastructure security policy emanates from one or more of
several layers; the optimal source depends on constraints imposed by
political considerations as well as the predicted effectiveness of each in
isolation and the system effectiveness of the combined set of controls. First,
despite the prospect for market failure, market discipline most certainly
provides at least some useful pressure to invest in security. 37 One subset of
market discipline is industry best practices. These evince weak-form, de
facto standardization (e.g., mimicking behavior) that functions best as a form
of information sharing. Another component of market discipline is derived
from the employment market for cyber-infrastructure security professionals
who would service the big data and ITS industries. Security professionals
33
National security and criminal law are two closely connected examples of the
tension created by strong privacy law, because it arguably leads to weak collective security.
34
Strong personal security relies on strong privacy practices.
35
Shaw, Thomas J. (ed.), INFORMATION SECURITY AND PRIVACY, (Am.Bar Assn.
2010).
36
See generally Bagby, John W, Common Law Development of the Duty of
Information Security in Financial Privacy Rights, FOURTH ANNUAL FORUM ON FINANCIAL
INFORMATION SYSTEMS AND CYBERSECURITY: A PUBLIC POLICY PERSPECTIVE, Smith
School of Business, Univ. Maryland, May 23, 2007 accessible at:
http://faculty.ist.psu.edu/bagby/Pubs/CommonLawEfficiencyCustodyDutyInfoSecurity1.pdf
37
Hahn, Robert W. & Anne Layne‐Farrar, The Law and Economics of Software
Security 30 HARV. J. L. & PUB. POLICY 284 (2007) (arguing market forces can work to
incentivize security investment, diverse software security problems suggest varying
remediation approaches and that traditional criminal law is rather ineffective to deter cybercrime).
share skill sets, some preparation (e.g., education, degrees from accredited
institutions), and credentialing. 38 These factors arguably contribute to some
uniformity among industry best practices. Professionalism in other
professions has emanated from licensing statutes, malpractice litigation, and
best practices.
Second, de jure standards drive very significant security undertakings.
For U.S. federal agencies, the Federal Information Security Management
Act 39 (FISMA) is influential in creating an IT security compliance framework
for both civilian and Department of Defense (DoD) agencies. In the private
sector, there is a widening choice of de jure IT security standards from the
National Institute of Standards and Technology (NIST), 40 the Control
Objectives for Information and Related Technology (COBIT) developed for
investment securities disclosure and the financial services industry by the
Information Systems Audit and Control Association (ISACA), 41 and the
control standards of the International Organization for Standardization
(ISO). 42 Although the U.S. DoT is the coordinative body for ITS standardization, no
such single point of regulatory contact likely exists currently, nor is one likely to
emerge, for big data security. The effective penetration of alternative
security standards varies considerably. 43
38
A common security credential is the Certified Information Systems Security
Professional (CISSP) issued by the International Information Systems Security
Certification Consortium, Inc., (ISC)², a global not-for-profit that provides education and
certification in IT security. Dozens of competing certification authorities exist throughout
the world.
39
FISMA is the Title III component of the E-Government Act of 2002, H.R. 2458,
Pub.L. 107-347, 116 Stat. 2899; codified at 44 U.S.C. §3541, et seq. (establishes federal
Chief Information Officer in the Office of Management and Budget (OMB); delegates
authority to the National Institute of Standards and Technology (NIST) and the National
Security Agency (NSA) to issue Federal Information Processing Standards (FIPS)
applicable to federal agencies and some federal contractors).
40
The NIST 800-series adapts the FIPS to private-sector government contractors,
accessible at: http://csrc.nist.gov/publications/PubsFIPS.html
41
COBIT standards are accessible by subscription at:
http://www.isaca.org/Knowledge-Center/cobit/Pages/COBIT-Online.aspx
42
IT security standards in the 27000 series, Information Security Management
Systems, are issued jointly by the ISO and the International Electrotechnical Commission
(IEC). For example, ISO/IEC 27001:2005 Information technology – Security techniques –
Information security management systems – Requirements (2005) holds promise as an
important model. Standards from the International Organization for Standardization (ISO)
are generally accessible only on a fee-basis at http://www.iso.org
43
For example, FIPS penetration in U.S. federal agencies is strong while penetration
into state governments is less so. COBIT is widely used by private-sector firms that are
Third, constitutional provisions, statutes and administrative regulations
are likely to provide increasingly specific security incentives for both big
data and ITS. The U.S. Constitution provides several structural security
provisions. U.S. and several state statutes mandate specific and largely
sectoral security requirements. Administrative regulations, most at the
federal level, also mandate security. Fourth, importantly for the
development of more comprehensive security regimes, case law interpreting
privacy and security law presumably applicable to big data and ITS, 44 at
both state and federal levels, is refining security duties in various sectors. 45
2. Regulatory Tools
Various mechanisms can incentivize security investment. Some are
appropriate from almost any source. For example, professionalism of
cyber-infrastructure personnel working in big data or ITS could be incentivized, as
it is for other professions, by licensing standards, malpractice litigation
duties and regulatory requirements, any of which could be mandated by
either state or federal law as supplemented by professional NGO
associations. Appropriate flexibility in cyber-infrastructure security
practices associated with big data and ITS is accommodated when standards
are produced by expert sources, as are the audit standards for
professional self-regulation.
Disclosure is becoming an effective incentive to cyber-infrastructure
publicly-traded in the U.S. Penetration of ISO standards in the EU is substantial, but lags
considerably in the U.S. One explanation may be that mandatory compliance with ISO
standards is much stronger in the EU than in the U.S. One major exception, where world-wide
conformity to an ISO standard prevails, is the isotainer developed by Malcolm P. McLean of U.S.-based Sea-Land Corp. Containerized freight must be compliant with ISO 668 - Series 1
freight containers -- Classification, dimensions and ratings (1995). See also Container
Handbook (Gesamtverband der Deutschen Versicherungswirtschaft e.V. - GDV) 2009,
accessible at: www.containerhandbuch.de/
44
See e.g., White, Anthony E., The Recognition of a Negligence Cause of Action for
Victims of Identity Theft: Someone Stole My Identity, Now Who Is Going to Pay for It, 88
MARQ. L. REV. 847 (2004-2005).
45
See e.g., Bagby, John W., Common Law Development of the Duty of Information
Security in Financial Privacy Rights, FOURTH ANNUAL FORUM ON FINANCIAL
INFORMATION SYSTEMS AND CYBERSECURITY: A PUBLIC POLICY PERSPECTIVE, Smith
School of Business, Univ. Maryland, May 23, 2007 accessible at:
http://faculty.ist.psu.edu/bagby/Pubs/CommonLawEfficiencyCustodyDutyInfoSecurity1.pdf
security investment for big data and ITS. Consider that forty-six states 46 and
at least one federal statute 47 require disclosure of security intrusions from
breaches of PII databases that comprise part of big data. Under security
breach notification legislation, disclosure notice generally must be delivered
directly to potentially impacted parties.
Breach notices are hard to keep secret; most are publicized broadly,
supplementing the victims’ pressure with other market disciplines.
Eventually these have resulted in pressures for more legislation. 48 Some
such legislation requires further implementation by regulatory agencies. 49
Several less onerous security incentives might be supplied by direct
regulations. Some such incentives are included as part of consent decree
settlements with regulators, and some are beginning to appear in many de
jure standards. For example, mandatory security management regimes
require contingency planning addressing a wide range of activities that are
components of security risk management. The range of activities is
considerable and is best customized to the size of the entity, the line of the
entity’s business activities, and the particular risks that are most likely. Risk
assessment regimes are often implemented as a master risk-benefit analysis.
At a minimum, such regimes require (1) initial then ongoing threat
assessment, (2) iterative response planning, and (3) risk control, retention,
sharing, and transfer. Security audits by internal auditors as well as periodic
certified audits by independent experts or other audit authorities are an
increasingly frequent security mechanism. These engagements rely on
identification of technical and administrative controls and then require
their testing. Cyber-infrastructure security regimes are well-represented in
these emerging audit practices.
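As an illustration only, the three minimum elements of a risk assessment regime described above (threat assessment, response planning, and a choice among risk control, retention, sharing, and transfer) can be sketched as a small risk register. Everything in this sketch is an assumption made for illustration: the threat names, probabilities, dollar figures, retention limit and decision rule are hypothetical and are not drawn from any statute, regulation or standard discussed here.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float      # hypothetical likelihood per year, 0.0 to 1.0
    loss_if_realized: float        # hypothetical loss in dollars
    control_cost: float            # hypothetical annual cost of a security control
    control_effectiveness: float   # share of expected loss the control would avert

    def expected_loss(self) -> float:
        return self.annual_probability * self.loss_if_realized


def choose_treatment(threat: Threat, retention_limit: float) -> str:
    """Illustrative rule: control if cost-justified, else retain small risks,
    else transfer or share the remainder (e.g., by insurance or contract)."""
    exposure = threat.expected_loss()
    if exposure * threat.control_effectiveness > threat.control_cost:
        return "control"
    if exposure <= retention_limit:
        return "retain"
    return "transfer or share"


def assess(threats, retention_limit):
    # Rank threats by expected loss, largest first, and attach a treatment.
    ranked = sorted(threats, key=lambda t: t.expected_loss(), reverse=True)
    return [(t.name, choose_treatment(t, retention_limit)) for t in ranked]


# Hypothetical register for an ITS operator; rerun whenever estimates change.
threats = [
    Threat("PII database breach", 0.10, 2_000_000, 50_000, 0.6),
    Threat("Roadside sensor spoofing", 0.02, 500_000, 40_000, 0.5),
    Threat("Data center outage", 0.05, 1_000_000, 60_000, 0.5),
]
report = assess(threats, retention_limit=25_000)
for name, treatment in report:
    print(f"{name}: {treatment}")
```

Because the regimes described above call for ongoing rather than one-time assessment, the sketch is written so that the register can simply be re-evaluated as threat estimates are revised; a real regime would be customized to the entity's size, line of business and particular risks.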
46
For a fairly current, comprehensive listing see Security Breach Legislation 2011,
The National Conference of State Legislatures, accessible at:
http://www.ncsl.org/default.aspx?tabid=22295 (state-by-state description)
http://www.ncsl.org/default.aspx?tabid=13489 (table)
47
Health Information Technology for Economic and Clinical Health (HITECH) Act
applies a federal security breach notification requirement for protected healthcare
information (PHI) governed under the Health Insurance Portability and Accountability Act
(HIPAA). HITECH is a component of the American Recovery and Reinvestment Act of
2009 (ARRA), Pub.L.111-5.
48
PrivacyRights.org is one of several compilers of privacy breach notices, see:
http://www.privacyrights.org
49
See e.g., Breach Notification for Unsecured Protected Health Information, 74
Fed.Reg.42740 (August 24, 2009) codified as 45 C.F.R. 160 et seq. (2010).
3. Central Role of Standardization in Managing Big Data Risks for ITS
Consistent with the philosophy of migrating standards from a particular
inflexible design configuration to more flexible performance standards,
cyber-infrastructure security regulations on big data and ITS are unlikely to
specify particular technical protections expected to enhance security. There
has been very considerable push-back against this rules-based
standardization or mandatory-design standards approach to establishing
security methods. 50 The success of “K Street” lobbyists in opposing a strict rules-based approach requiring particular design or performance standards may
be the most cogent explanation for the current fragmented state of security
incentives generally or as applicable to big data and ITS. For example,
while encryption is an obvious remedy for insecure IT systems, few statutes
or regulations directly require encryption. 51 Figure III depicts how
standardization could improve ITS quality, avert regulatory pressures and
lower litigation risks.
Figure III:
Source: Office of the Assistant Secretary for Research and Technology, Intelligent
Transportation Systems Joint Program Office, Department of Transportation
50
See, Letter from R. Bruce Josten, U.S. Chamber of Commerce, to Members of U.S.
Senate (July 31, 2012) (voicing strong opposition to S.3414 Cybersecurity Act of 2012).
But see, Letter from Sen. John D. (Jay) Rockefeller IV, Chairman, Senate Committee On
Commerce, Science, & Transportation, to Fortune 500 CEOs (Sept. 19, 2012) (requesting
views on cybersecurity practices and failed regulation).
An interesting conflict is now noted in the divergence of attitude between the industry-wide lobbying powerhouse, the U.S. Chamber of Commerce, that generally took a hard line
against any mandatory cyber-infrastructure regulations, and supportive responses of
Fortune 500 CEOs to Senator Jay Rockefeller’s direct solicitation of support for cyber-infrastructure security regulations.
51
See e.g., Cal. Civ. Code §§ 56.06, 1785.11.2, 1798.29, 1798.82. California’s
S.B.1386 does not require encryption but exempts incidents from disclosure if the data lost
in a breach is encrypted. Id. at §2.
4. Intellectual Property: Database Acquisition, Transfer & Use 52
Big data, business analytics and data science generally run on data,
facts, collections of artistic expression and the like. Business models for
traditional data vendors like Lexis-Nexis, Westlaw, Moody’s, and eBay,
share a need for a reasonable cash-flow to fund investment in acquiring
data, and for converting analog data to digital formats. Legal protection for
such databases incentivizes their creation and maintenance, without which,
only publicly-financed libraries and research universities might
systematically accumulate such collections.
Databases assume many forms, such as collections of copyright
protected works, data in numeric and natural language form and many other
forms of materials. Databases are traditionally arranged in systematic and
methodical ways to enable retrieval and access. Traditional databases are
composed of raw facts that are meaningful after arrangement in logical
ways to enable meaningful analysis. Users enabled to aggregate particular
aspects of such data may observe meaningful relationships. Most useful
databases must be managed to maintain usefulness with constant updates
and periodic purges of stale entries.
Databases include literary works, artistic works, texts, sounds, images,
numbers, facts, production and shipping information, transactions, financial
data, geographic information and quickly increasing volumes of personally
identifiable information (PII). Databases, stored on networked servers and
computers, are traditionally accessed according to defined criteria. By
contrast, modern big data expands this access to explore relationships and
hypothesize associations. Both traditional databases and big data are
manipulated to create reports which are transmitted using
telecommunications increasingly over the public Internet, although secure
data mining occurs in closed, proprietary Intranets.
Database technology continues to advance into distributed processing
systems that may be composed of informally linked separate repositories.
Some data collections are so huge they are called data warehouses.
Relational databases may reveal associations, permitting talented domain
experts to identify new relationships using data mining that encourages
52
Portions of this subsection II.C.4. are adapted in part from Bagby, John W., Trade
Secrets and Database Protection, Ch. 5 in: CYBERLAW HANDBOOK FOR ECOMMERCE,
(©2003, West Pub. Co. (Cengage) Mason OH; ISBN: 0-324-26028-8).
decision-making on new and potentially valuable factors: new associations
are discovered, new sequences are uncovered and new data classifications
or clusters are constructed. Big data analysis enables better forecasts. 53
a. Database Protections Under U.S. IP Law: Copyright
In the U.S., databases receive weak protection as a form of intellectual
property (IP) under one or more of several legal rights regimes: tort, trade
secret, copyright and contract. These theories are longstanding and well-established. For example, the tort theories have been used to protect
databases from unauthorized access, use or resale by an infringing outsider:
misappropriation, trespass, conversion and trade secrets. Other legal
theories are supportive: breach of contract, database license violations, and
breach of employment or confidentiality agreement.
Weak database protection is often illustrated under copyright law.
Copyright protects only the form of expression and not the underlying
ideas: hence the idea-expression dichotomy. However, copyright may
protect a database as a compilation when it includes a unique selection and
arrangement of facts. The compilation results from selecting, coordinating,
arranging or organizing existing materials. Infringement occurs when the
accused work copies the selection and arrangement of a protected
compilation. Independently created but similar selections and arrangements
are unlikely to infringe. Copyright acknowledges sufficient originality and
creativity in making non-obvious choices from among numerous choices. 54
Most of the weakness in copyright protection of databases was settled in
Feist. 55 Rural Telephone’s white page telephone directory was a regulatory
requirement. Name and address information, provided by telephone
customers, was listed alphabetically by surname for the directory. Feist was
not liable for copyright infringement in the copying of Rural’s directory to
publish a competing directory. The Supreme Court found insufficient
“authorial” originality in a telephone company white pages listing of
customers to deserve protection as a compilation.
53
Winn, The Emerging Law of Commercial Transactions in Electronic Consumer
Data, 56 Bus.Law. 22?, 235 (Nov.2000).
54
Matthew Bender & Co. v. Hyperlaw, Inc., 168 F.3d 674 (2d Cir.1990).
55
Feist Publications, Inc. v. Rural Telephone Service Co., Inc., 499 U.S. 340 (1991).
b. Database Protections Under U.S. IP Law: Trade Secrecy
Database protection under U.S. law is much more common using
technical security measures and contract restrictions. Technical methods
include a variety of electronic, security, programming and physical
safeguards that lock up the data. For example, encryption of the data on a
CD-ROM would impede infringement. Similarly, it would be very time
consuming to acquire all the data served from a website if the site’s search
engine responds to queries with only piecemeal answers. The NADA and
Kelley Blue Book vehicle pricing guides deploy this scheme to prevent
wholesale harvesting of “free data” by competitors. To the extent that
technical security measures are effective to restrict access, meter use or
generally impede comprehensive extraction of large portions of an owner’s
data, such methods may be sufficient. While technical progress can
overcome such security controls, they may become more effective when
reinforced with contractual restrictions, such as end-user license agreements
(EULA) configured as click-wrap access barriers.
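The piecemeal-answer and metering measures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any vendor named above actually operates: the class name MeteredDatabase, the page size, the per-client quota and the sample vehicle records are all hypothetical.

```python
from collections import defaultdict


class MeteredDatabase:
    """Answer queries piecemeal and cap the records any one client can extract."""

    def __init__(self, records, page_size=2, client_quota=4):
        self._records = list(records)
        self._page_size = page_size        # hypothetical cap on records per response
        self._client_quota = client_quota  # hypothetical cap on records per client
        self._served = defaultdict(int)    # running count of records served per client

    def query(self, client_id, predicate):
        remaining = self._client_quota - self._served[client_id]
        if remaining <= 0:
            raise PermissionError(f"extraction quota exhausted for {client_id}")
        limit = min(self._page_size, remaining)
        # Return only the first few matches, never the full result set.
        matches = [r for r in self._records if predicate(r)][:limit]
        self._served[client_id] += len(matches)
        return matches


def under_12k(record):
    return record["price"] < 12_000


# Hypothetical pricing data: six records, four of which match the query.
records = [{"model": f"Model {i}", "price": 10_000 + 500 * i} for i in range(6)]
db = MeteredDatabase(records, page_size=2, client_quota=4)

first = db.query("client-a", under_12k)    # two records returned
second = db.query("client-a", under_12k)   # two more counted against the quota
try:
    db.query("client-a", under_12k)        # quota of four is now used up
    blocked = False
except PermissionError:
    blocked = True
print(len(first), len(second), blocked)
```

The sketch deliberately omits pagination, so a repeated query returns the same leading records while still consuming quota. As the text notes, such technical limits can be circumvented (here, by registering many client identifiers), which is why they are typically reinforced with contractual restrictions.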
EULA terms of use and restrictions on decompiling, reverse engineering
and reselling data are all important contractual restrictions impeding
database misappropriation. Database owners consider their data to be a
form of IP, following a licensing regime for their content rather than
outright sales or assignments. Licensing is most appropriate for database
collections of copyrighted works, video, music, images, etc. By contrast,
licensing factual databases requires physical and contractual controls
elevating the database into trade secrecy. EULA terms generally limit the
licensee’s use, resale, and manipulation of the database and restrict online
access except upon assent to a EULA. ProCD, Inc. v. Zeidenberg 56 supports
this “click-wrap” regime. The Uniform Computer Information Transactions
Act (UCITA) was a general failure in establishing a uniform statutory
licensing regime for information in databases.
c. Database Protections Under U.S. IP Law: Sui Generis Schemes
Political pressures following Feist from the database management
industry, from media giants and from foreign nations signaled possible sui
generis database protections. Indeed, the EU urged the WIPO to add sui
generis database protections to multi-lateral trade negotiations following
the EU’s March 1996 passage of the EU Directive on the Legal Protection
56
ProCD, Inc. v. Zeidenberg, 86 F.3d 1447 (7th Cir. 1996).
of Databases. 57 Congress has repeatedly failed to comply; controversial
database protection bills have died without floor votes.
Why should a sui generis form of IP be created to cover databases? How
would this encourage more deliberate big data practices, such as negotiated
access and cooperative database maintenance? Concerned scientists have
argued that strong database IP rights could impede scientific research without
sufficient offsetting public benefit. New or extended IP rights would surely
remove at least some valuable data from the public domain. How can the
benefits of new or expanded IP rights offset their costs, particularly given
that those benefits are highly speculative? Furthermore, many costs are understated or
overlooked outright by IP rights advocates, such as the systematic
underestimation of infringement enforcement and compliance costs. 58
d. Database Protections Under U.S. IP Law: Constructing a Balanced
Approach
Perhaps a more thorough understanding of the rich diversity in types of
data and databases would inform the deliberate creation of sui generis
database rights. More precise targeting only to industry and government
sectors deserving of property rights incentives seems prudent. This targeting
could confine the inevitable harmful externalities of an abrupt introduction
of a new property rights regime on scientific inquiry, small business and
innovation incentives. For example, the nature of the data, the source of
incentive or funding for its collection, innovation in database architecture
and functionality, the most likely uses of the data, compatibility of technical
security with the known, useful business models, bearing of enforcement
costs and the public interest are all potentially important factors. Without
this basic understanding, it is highly likely that new sui generis database
rights will not help society as much as their proponents argue.
57
OJ 1996 L77/20.
58
Congressman Kastenmeier argues that the burden of proof that new IP rights are needed
should be high. Any new, sui generis IP rights should satisfy these criteria: (1) the new IP
right will fit harmoniously with other IP regimes; (2) the new IP rights can be defined in a
reasonably clear and satisfactory manner; (3) the proposed new IP rights must be based on
an honest and rigorous cost-benefit analysis; and (4) the new rights must clearly enrich and
enhance the public domain.
e. Database Protections Under U.S. IP Law: Role of IP Rights 59
Data, in various forms, drive both basic and applied research in most
fields. Data collection and use require many costly steps, from
conceptualization to maintenance. Academia is a useful exemplar to
emerging big data advocates because there exists an ideal of maintaining
public-domain access and a strong belief that society benefits from findings
based on such data. Proposals to tighten the IP controls over databases
appear antithetical to longstanding academic values.
Academic researchers must often pay to access large databases.
Biotechnology firms license out the annotations that enrich the genome
projects, financial economists need comprehensive securities trading
statistics, and effective scholarship in public policy, law, and taxation
requires access to online subscription services. Ever-increasing privacy
rights, confidentiality agreements, and national security restrictions are
curtailing open access to many other important databases. Furthermore,
public research universities squeezed for cash flow from state appropriation
seek to enhance royalty income by following suit: licensing their databases
for use by outside parties.
The issues raised are neither simple nor straightforward. Under U.S.
law, ideas are as “free as the air” unless they are embodied in a patented
invention or kept confidential as trade secrets. However, it is a fundamental
tenet of academic scholarship that information is a public good. U.S.
copyright law protects only the selection and arrangement of the data in a
database, not the data itself. For broader protection, the courts have required
recourse to trade-secret laws employing physical and contractual controls.
Sui generis database protection raises public-policy issues that reveal
the depth of the conflict between open access and proprietary values. This is
not really new territory. Since the 19th century, Congress has created
several forms of sui generis intellectual property rights. Many of these
forms are familiar to the university community: design patents (ornamental
designs, 1842); plant patents (asexual reproduction, 1930); plant varieties
(sexual reproduction, 1970); and semiconductor chips (maskworks, 1984).
Occasionally, the courts and state legislatures have also developed new
subject matter or otherwise expanded intellectual property rights, as has
59
This sub-section is adapted from Bagby, John W. Outlook: Who Owns the Data?
RESEARCH PENN STATE, Vol.26, No.1 (Jan. 2003) accessible at:
http://faculty.ist.psu.edu/bagby/Pubs/WhoOwnsDataRPSJan03.htm
happened with software, business methods, plant species, and even life
forms. In each case, proponents have argued that stronger property rights
encourage investment that would not otherwise occur, and that such
investment benefits society more than it costs. Today various sectors of the
Internet “content” industry are lobbying heavily for expanded IP
protections. They argue that pirates can too easily “ride free” on the “sweat
of the brow” of database creators. Content providers believe that weak U.S.
protections encourage wholesale infringement, and narrow the possibilities
for profit using various business models reliant primarily on the sale of data.
Critics argue that sui generis database rights would impose substantial
new costs on research, and that they are particularly inappropriate where the
database results from publicly funded work. Indeed, the National Research
Council (NRC) strongly cautions that sui generis database protections could
retard scientific research. Furthermore, the NRC argues that current trade-secret controls and licensing restrictions afford sufficient protection. 60
The significance of this debate is magnified by controversy over
expanding the definition of a database. Traditional definitions are narrow;
they limit databases to organized collections of numerical observations
taken from tightly controlled experiments. By contrast, big data purists
contemplate broader definitions, including structured and unstructured
content of nearly any type of information or work. Sui generis proposals
would broadly define data to include “any physical or digital collection of
information or works arranged in a systematic or methodical way for
retrieval or access by manual or electronic means.” Under this view,
databases would include literary and artistic works, texts, sounds, images,
numbers, facts, statistics, production or shipping information, transactions,
financial data, health information, geographic information, and private
personal data derived from almost any source.
Recent technological developments enhance the value of databases
considerably, raising the stakes. Consider the impact of peer-to-peer file-sharing, automated data harvesting by Internet “bots,” electronic agents
posing as human users, and the capability for near-instantaneous
aggregation and association of data from physically separate or independent
databases, resulting in data “mining” and data “warehousing.”
60
A QUESTION OF BALANCE: PRIVATE RIGHTS AND THE PUBLIC INTEREST IN
SCIENTIFIC AND TECHNICAL DATABASES, Committee for a Study Promoting Access to
Scientific and Technical Data for the Public Interest, National Research Council (1999).
Thomas Jefferson, an accomplished inventor and the first federal
administrator of patent rights as Secretary of State, remains an
important influence on intellectual property rights in the U.S. Jefferson
insisted that society should not suffer the “embarrassment” of granting a
monopoly in intellectual property rights unless society's benefits are clear-cut and substantial. Modern Jeffersonians expand this reasoning to argue
that the burden of proof must be kept very high on proposals to expand IP
rights. They argue that new rights must: (1) fit harmoniously with existing
intellectual property protections; (2) be defined in a reasonably clear and
satisfactory manner; (3) be based on an honest cost-benefit analysis; and (4)
clearly enrich and enhance the public domain. The ultimate question is
whether the benefits of new or expanded IP regimes will offset their costs.
Many projections of such benefits are highly speculative and too often the
costs are overlooked. For example, the costs of privacy intrusion,
infringement enforcement and compliance are systematically
underestimated in public policy debates, largely to the benefit of IP
professionals.
A more thorough understanding of the diverse types of databases arising
under evolving and advancing big data techniques seems essential before
any new form of IP is created. Database rights may subvert scholarship if
they are not more precisely targeted. Abrupt introduction of new IP rights
could negatively impact not only scientific inquiry but also small business creation
and incentives for other types of innovation.
Academics must participate in the big data IP rights debate by
developing some consensus on the information policy issues raised. It
seems fundamental, first, to distinguish clearly between various types of
databases according to such factors as the nature of the data, the source of
funding for data collection and maintenance, the degree of innovation in
database architecture and functionality, the likely uses for the data, and
other considerations. Without a common basic understanding and some
sense of others’ experience with database protections, big data IP rights will
miss the mark.
Table I illustrates the available IP regimes historically and currently
useful in protection of databases in the U.S.
Table I: Database Protection Laws

Type of Database Law: Tort: Misappropriation
Strengths: Protects data even if technical security measures or contractual restrictions are not used
Weaknesses: Protects only “hot news” or other elements not part of copyright law protections

Type of Database Law: Tort: Trade Secrets
Strengths: Most common in U.S. Requires technical security measures or contractual restrictions
Weaknesses: Rights are lost when information is disclosed into the public domain

Type of Database Law: Tort: Trespass to Chattels
Strengths: Protects information stored in owner’s personal property, such as computer or network storage
Weaknesses: Trespass theory could be expanded so far that interactivity would be severely impeded in network settings

Type of Database Law: Copyright
Strengths: Strong prohibition against copying, rewards creativity in selection and arrangement
Weaknesses: Feist limits protection to compilations made with creativity in selection and arrangement; underlying ideas and facts are not copyrightable

Type of Database Law: Contract
Strengths: Common protection in the U.S. Enforceable to prevent misuse, disclosure, copying or reverse engineering
Weaknesses: Requires privity, not applicable unless user assents to restrictions; form and conscious understanding of assent required remains unclear

Type of Database Law: Sui Generis
Strengths: Stronger protection than under nearly any other theory; validates the “sweat of the brow” theory
Weaknesses: Constitutionality suspect; could impede research and narrow the public domain; offers excessive rewards for some uncreative work
III. RISK SHIFTING UNDER ITS BIG DATA
ITS allegedly enables societal resolution of a wish list of horribles:
environmental catastrophe, recurring and extraordinary infrastructure costs,
public safety costs, unattainable law enforcement improvements and
national security surveillance accuracy. Few ITS systems are possible
without big data. Therefore, the promised benefits of ITS are attainable
mostly with big data applications that do not impose negative externalities 61
beyond their promise of benefits. This section acknowledges that (1) ITS-Big Data will have unintended consequences, (2) ITS-Big Data
proponents may promise illusive benefits and (3) implementations by the
states and local governments, federal government, NGOs, and third party
private-sector service providers will sometimes fall short of the perfection
promised. Indeed, side-effects are likely to be manifest as both positive
externalities (socially beneficial side effects) as well as negative
externalities (side effects imposing social costs).
Big data’s foreseeable and unforeseen impacts will likely be addressed by
various well-known risk shifting techniques, although it can be expected
that novel risk management (identification, assessment, control, shifting)
techniques will be developed and deployed. 62 Longstanding legal and
regulatory concepts will most certainly apply to ITS-Big Data including
those bedrock risk shifting problems such as insurability, insurable interest,
insurance market development, statutes of repose, statutes of limitation,
causation, foreseeability, joinder, class action, mass tort, hold harmless,
acknowledgements, release, cognovit, software/systems/database licensing,
and public acceptance. Figure IV illustrates risk transfer points in the
generalized ITS architecture.
Figure IV: Risk Shifting under Generalized ITS Architecture
Source: Office of the Assistant Secretary for Research and Technology, Intelligent
Transportation Systems Joint Program Office, Department of Transportation
61
See generally Coase, Ronald H., The Problem of Social Cost, 3 J. L. & ECON. 1-44
(Oct. 1960) accessible at Stable URL: http://www.jstor.org/stable/724810 (arguing
negative externalities should be viewed only as a property rights problem that is resolved
only with private contracting by parties that “will negotiate voluntary agreements that lead
to the socially optimal resource allocation and output mix regardless of how the property
rights are assigned.”)
62
See generally, ISO/DIS 31000 (2009). Risk management — Principles and
guidelines on implementation. International Organization for Standardization.
IV. IMPLICATIONS OF PUBLIC POLICY IMPACTS ON ITS BIG DATA
In the regulatory and private sector developmental history of ITS, public
safety has assumed a super-ordinate position from among all the ITS goals
of public safety, environmentalism, efficiency, infrastructure costs,
mobility, convenience, national security and law enforcement. It seems
clear that big data feeds all these ITS goals, but in somewhat different ways
and to different extents. Therefore, this section will address how public
policy application to ITS systems can choose either (1) weak, omnibus or
general application to all ITS systems or (2) the more likely result of
predictable U.S.-style policy formulation resulting in a more sectoral
approach to regulation, with particular sectoral results for the major big data
regulatory themes of privacy and security. Thus, this section acknowledges
that it may be more useful to decompose ITS into particular ITS
applications susceptible to this sectoral approach, reserving omnibus
approaches to the common elements of ITS technologies (e.g., wireless
telecommunications, sensor arrays, data centers located in the cloud).
A. Omnibus vs. Sectoral Approaches: Regulation of ITS & Big Data 63
The accumulation and use of big data for [ITS] analytics and ultimately
prediction suggests strong merit in contrasting the different approaches to
privacy rights [and cyber security duties] under the American experience [as
compared to] that regime largely dominant in European nations. U.S.
privacy rights are generally described as sectoral; typically regulation is
imposed ex post, only after disputes take shape and public policy intervenes
in response to the need to settle disputes among competing forces. By
contrast, the post-WWII European experience is omnibus; regulation is
pervasive over economic sectors ex ante. This difference illustrates a
cultural clash over liberty [America] and paternalism [Europe].
Two important values are at play in the American sectoral approach: 64
63
Subsection A is adapted from text accompanying note 21 in John W. Bagby, Using
an Industrial Organization (I/O) Lens to Enhance Predictive Analytics: Disentangling
Emerging Relationships in the Electronic Surveillance Supply Chain, LEGAL AND ETHICAL
ISSUES IN PREDICTIVE DATA ANALYTICS, Virginia Tech University, June 20, 2014.
64
Some components of this section are adapted from a white paper by Bagby, John
W., Regulation of Private Data Management: a Supply Chain Analysis, (Oct. 2013) which
is further adapted from excerpts found in Bagby, John W., Privacy, Chapter 13 in:
ECOMMERCE LAW, (©2003; West Publishing Co. Mason OH,) and Bagby, John W.,
Managing Information Rights to Develop the Energy Informatics Field (August 4, 2012)
(1) individual autonomy and (2) personal experience. This means that
privacy is compared with ownership over data observed, the meaning given
that data and the intangible intellectual property rights to use this
knowledge or sell it as a product or service. The controversy over privacy
[or security] regulation is essentially a struggle to resolve these often
conflicting claims. It occurs when society finds it appropriate to intervene
by setting policies [balancing the] benefit[s for] individuals and society. For
example, this conflict is settled in favor of individual privacy interests when
restrictions are placed on data collection or use, such as the well-known
financial and health care information privacy restrictions. By contrast, the
balance can also favor the rights to use and own observations and
perceptions - examples include broadening law enforcement investigatory
powers and the protectability of corporate information and data as trade
secrets.
B. Discussion: Big Data Risk Profiles for Particular ITS Applications
Table II reveals the results of analysis of risk shifting likely for particular
ITS Applications reliant on big data.
Table II: Analysis of Results –
Regulation of Risk Addressed by ITS Big Data
Accessible at:
http://faculty.ist.psu.edu/bagby/Pubs/ITS-BigDataTableII.pdf
V. DISCUSSION AND NEXT STEPS
A. Specific Policy Risks for ITS-Big Data
Many fields of law are implicated by ITS and big data separately.
However, the fusion of ITS and big data produces even more concentrated
regulatory and liability interests of affected parties. Many policy issues are
simple extensions or analogical extensions from adjacent fields while others
arise de novo given the intensity and divergence generated by the fusion of
ITS and big data.
64 (continued)
available at SSRN: http://ssrn.com/abstract=2564257 or
http://dx.doi.org/10.2139/ssrn.2564257 originally presented as developmental (working)
paper No. ALSB2012_0182, Academy of Legal Studies in Business, Kansas City MO Aug.
2012 (developing generalized data architecture susceptible to public policy analysis,
regulation and incentives).
First, decision-making based on analysis of big data is generally
associative rather than causative. 65 According to one author:
Big data analytics probably work best when they provide interesting
or promising “leads” for further analysis, measurement and
evidence gathering. Thus, it is easy to predict big growth for
predictions based on big data analytics for decision-making
unconstrained by: (1) the traditional professional and scientific
inquiry standards for creation of new “knowledge,” or (2) any
principles of regulatory fairness, or (3) any competent and careful
satisfaction of due process in litigation. Furthermore, as prediction
accuracy of big data analytics is believed to improve, it is
predictable that pressures will mount to continually improve these
analytic processes and apply them in more realms. 66
Science fiction writers, privacy zealots and big data theorists appear to
uniformly recognize that big data holds strong promise for unjust outcomes.
Indeed, due process in criminal prosecutions requires proof beyond a
reasonable doubt, a standard closer to causation. For this reason, it is
repeatedly argued that, for any decision impacting life, liberty or property,
big data associations are useful only at the investigation stage, not in
findings of guilt or liability.
When traditional due process interests in life, liberty, or property are
implicated as direct stakes, current big data techniques run huge injustice
risks, largely from a constitutional perspective.
Civil law’s preponderance of the evidence standard occupies a middle ground
between direct causation and circumstantial sufficiency. Some
associative reasoning based on circumstantial evidence is permitted.
Furthermore, regulatory rulemaking and standardization permit much more
associative reasoning to underpin causation findings. Law enforcement
leads are generally predicated on associative causation, thus using big data
implications solely as “leads.” While “leads” have some impact on life,
liberty or property, these impacts are usually weak, and a robust legal
process satisfying due process must follow that uncovers more causative
relationships between the evidence adduced and the conclusions that prosecutors
attempt to prove with direct and some circumstantial evidence. Key to the
direct action taken on big data in the law enforcement scenario is that
limited law enforcement resources are allocated based on the strength of big
data evident “leads.” This means that other somewhat “less promising”
leads are allocated fewer enforcement resources. At the margin, the role of big
data is to optimize limited law enforcement resources. Political
reconciliation based on conviction records, overturned convictions and law
enforcement embarrassment are the major disciplines on use of big data
“leads,” “hunches,” and suspicions.
65
See generally, Bagby, John W., Book Review, 14 J. L. ECON. & POLICY – (2008) of
Harcourt, Bernard E., AGAINST PREDICTION: PROFILING, POLICING, AND PUNISHING IN AN
ACTUARIAL AGE, (2006).
66
See generally, Bagby, John W., Using an Industrial Organization (I/O) Lens to
Enhance Predictive Analytics: Disentangling Emerging Relationships in the Electronic
Surveillance Supply Chain, LEGAL AND ETHICAL ISSUES IN PREDICTIVE DATA ANALYTICS,
Virginia Tech University, June 20, 2014, accessible at:
http://faculty.ist.psu.edu/bagby/Pubs/BagbyWorkingPaperPredictiveAnalytics.pdf
Similarly, national security investigations and actions seldom rely on
direct causation evidence sufficient to prove guilt beyond a reasonable
doubt. Counter-terrorism, national security and other
intelligence community (IC) decisions are based almost solely on
associative findings. These are weak from a due process standpoint, but
appear to satisfy the prevailing IC and black operations decision-making
standards when sufficiently “probable” using “confidence” assessments of
experienced IC analysts and well-trained field agents. Again, as in law
enforcement, projections are made to investigate further. However, spy
movies abound in which agents with a “license to kill” may take direct action
impacting due process rights based on big data analysis. Big data plays
a bigger role in such decision-making.