Lecture 17 16/3/15

Lecture 17
16/3/15
Distributed Databases
Site Autonomy
• Each server participating in a distributed database is
administered independently (for security and backup
operations) from the other databases, as though each
database was a non-distributed database.
• Although all the databases can work together, they are
distinct, separate repositories of data and are administered
individually.
2
Data Dictionary
• A read-only set of tables that provides information about the
database. A data dictionary contains:
• The definitions of all schema objects in the database (tables,
views, indexes, clusters, synonyms, sequences, procedures,
functions, packages, triggers, and so on)
• How much space has been allocated for, and is currently used by,
the schema objects
• Default values for columns
• Integrity constraint information
• The names of Oracle users
• Privileges and roles each user has been granted
• Auditing information, such as who has accessed or updated
various schema objects
• Other general database information
3
Use of a Data Dictionary
• The data dictionary has three primary uses:
• Oracle accesses the data dictionary to find information about
users, schema objects, and storage structures.
• Oracle modifies the data dictionary every time that a data
definition language (DDL) statement is issued.
• Any Oracle user can use the data dictionary as a read-only
reference for information about the database
4
Continued..
• The Oracle Database user SYS owns all base tables and useraccessible views of the data dictionary.
• Data in the base tables of the data dictionary is necessary for
Oracle Database to function.
• Therefore, only Oracle Database should write or change data
dictionary information.
• No Oracle Database user should ever alter rows or schema
objects contained in the SYS schema because such activity can
compromise data integrity. The security administrator must
keep strict control of this central account.
5
Database Schema
• A schema is the set of metadata (data dictionary) used by the
database, typically generated using DDL. A schema defines
attributes of the database, such as tables, columns, and
properties. A database schema is a description of the data in a
database.
• SCOTT is a schema that includes the EMP, DEPT and BONUS tables
with various grants, and other stuff.
• SYS is a schema that includes tons of tables, views, grants, etc etc
etc.
• SYSTEM is a schema.....
6
Continued
7
Schema Objects and Naming in a
Distributed Database
• A schema object (for example, a table) is accessible from
all nodes that form a distributed database.
• Therefore, just as a non-distributed local DBMS
architecture must provide an unambiguous naming
scheme to distinctly reference objects within the local
database, a distributed DBMS must use a naming scheme
that ensures that objects throughout the distributed
database can be uniquely identified and referenced.
• To resolve references to objects (a process called name
resolution) within a single database, the DBMS usually
forms object names using a hierarchical approach.
8
Continued..
• For example, within a single database, a DBMS guarantees
that each schema has a unique name, and that within a
schema, each object has a unique name.
• Because uniqueness is enforced at each level of the
hierarchical structure, an object's local name is guaranteed to
be unique within the database and references to the object's
local name can be easily resolved.
• DDBMs simply extend the hierarchical naming model by
enforcing unique database names within a network.
9
10
Two-Phase Commit Mechanism
• A DBMS must guarantee that all statements in a transaction,
distributed or non-distributed, are either committed or rolled
back as a unit, so that if the transaction is designed properly,
the data in the logical database can be kept consistent.
• The effects of a transaction should be either visible or invisible
to all other transactions at all nodes; this should be true for
transactions that include any type of operation, including
queries, updates, or remote procedure calls (RPCs).
11
Transparency Continued..
• A distributed DBMS architecture should also provide facilities to
transparently replicate data among the nodes of the system.
• Maintaining copies of a table across the databases in a distributed
database is often desired so that tables that have high query and
low update activity can be accessed faster by local user sessions
because no network communication is necessary.
• If a database that contains a critical table experiences a prolonged
failure, replicates of the table in other databases can still be
accessed.
• A DBMS that manages a distributed database should make table
replication transparent to users working with the replicated tables.
• Finally, the functional transparencies explained above are not
sufficient alone. The distributed database must also perform with
acceptable speed.
12
Transparency in a Distributed
Database System
• The functionality of a distributed database system must be
provided in such a manner that the complexities of the
distributed database are transparent to both the database
users and the database administrators.
• Location transparency exists if a user can refer to the same
table the same way, regardless of the node to which the user
connects. Location transparency is beneficial for the following
reasons:
• Access to remote data is simplified, because the database users
do not need to know the location of objects.
• Objects can be moved with no impact on end-users or database
applications.
13
Continued..
• A distributed database system should also provide query,
update, and transaction transparency. For example, standard
SQL commands, such as SELECT, INSERT, UPDATE, and DELETE,
should allow users to access remote data without the
requirement for any programming.
• Transaction transparency occurs when the DBMS provides the
functionality described below using standard SQL COMMIT,
SAVEPOINT, and ROLLBACK commands, without requiring
complex programming or other special operations to provide
distributed transaction control.
• The statements in a single transaction can reference any
number of local or remote tables.
14
Distributed Databases
• In a non-heterogeneous environment, the
client and server character sets should be
either the same as or subsets of the main
server character set
In a heterogeneous environment, the NLS settings of
the client, the transparent gateway and the nonOracle data source, should be either the same or a
subset of the Oracle server NLS character set
15
Homogenous Environment
Heterogonous Environment
Data Replication
• store a separate copy of the full tables in each location
• if a copy is stored at every site: Full Replication
• Advantages:
• reliability
• fast response
• Disadvantages
• storage requirements
• complexity and cost of updating
16
Database Replication
• Replication is the process of copying and maintaining database
objects in multiple databases that make up a distributed
database system.
• Changes applied at one site are captured and stored locally
before being forwarded and applied at each of the remote
locations.
• Replication provides user with fast, local access to shared
data, and protects availability of applications because
alternate data access options exist.
• Even if one site becomes unavailable, users can continue to
query or even update the remaining locations.
17
Continued
• In a multimaster replication environment, the replication groups are
called master groups. Corresponding master groups at different sites
must contain the same set of replication objects
• Multi-master replication allows data to be stored by a group of
computers, and updated by any member of the group. All members
are responsive to client data queries.
• The multi-master replication system is responsible for propagating
the data modifications made by each member to the rest of the
group, and resolving any conflicts that might arise between
concurrent changes made by different members.
• Master group "SCOTT_MG" contains an exact replica of the
replicated objects at each master site.
18
Master Group
19
Snapshot site
"Group A" at the snapshot site maintains only
a partial replica of master group "Group A" at
the master site, while the "Group B" snapshot
and master groups maintain a complete replica
20
Replication Sites
• A master site maintains a complete
copy of all objects in a replication
group.
• A snapshot site supports read-only and
updateable snapshots of the table data
at an associated master site.
• All master sites in a multimaster
replication environment
communicate directly with one
another to propagate data and
schema changes in the replication
group.
• A replication group at a master site is
more specifically referred to as a
master group.
• A replication group's master
definition site is a master site serving
as the control point for managing the
replication group and objects in the
group.
• A snapshot site's table snapshots can
contain all or a subset of the table data
within a replication group.
• However, these must be simple
snapshots with a one-to-one
correspondence to tables at the
master site.
• For example, a snapshot site may
contain snapshots for only selected
tables in a replication group.
• And a particular snapshot might be
just a selected portion of a certain
replicated table.
• A replication group at a snapshot site
is more specifically referred to as a
snapshot group.
• A snapshot group can also contain
other replication objects.
21
Replication Conflicts
• Asynchronous multimaster and updateable snapshot
replication environments must address the possibility of
replication conflicts that may occur when, for example, two
transactions originating from different sites update the same
row at nearly the same time.
• When data conflicts do occur, you need a mechanism to
ensure that the conflict will be resolved in accordance with
your business rules and that the data converges correctly at all
sites.
22
Replication Conflicts Continued..
• In addition to logging any conflicts that may occur in your
replicated environment, Oracle replication offers a variety of
conflict resolution methods that will allow you to define a
conflict resolution system for your database that will resolve
conflicts in accordance with your business rules.
• If you have a unique situation that Oracle's pre-built conflict
resolution methods cannot resolve, you have the option of
building and using your own conflict routines.
23