Experiment 4.3 How to defragment persistent storage at an extent level ? .3

.3
CSCI315 Database Design and Implementation
Experiment 4.3
How to defragment persistent storage at
an extent level ?
Experimented and described by Dr. Janusz R. Getta
School of Computer Science and Software Engineering, University of Wollongong, Australia,
Bldg. 3, room 210, phone +61 02 42214339, fax +61 02 42214170,
e-mail: [email protected], Web: http://www.uow.edu.au/∼jrg, Msn: jgetta, Skype: jgetta007
Table of contents
Step 0 How to begin and what you need to know before you start ?
Step 1 How to find the values of PCTRFREE and PCTUSED parameters ?
Step 2 How to analyze the storage parameters of a relational table ?
Step 3 How to find the values of row identifiers ?
Step 4 How to analyze the distribution of rows within the data blocks ?
Step 5 How to eliminate the empty blocks from the extents ?
Step 6 How to release the empty extents ?
Step 7 How to clean up after the experiment ?
References
Actions
Step 0 How to begin and what you need to know before you start ?
A printable copy of this experiment in pdf format is available here .We speak about fragmentation of persistent storage at an extent level in a situation when an extent consists of
many data blocks and some of these blocks are empty. Then, it is possible to move the rows
0
These are the specifications of the homeworks in a subject Database Design and Implementation
(CSCI315) delivered in January 2009 at Singapore Institute of Management by Dr. Janusz R. Getta
4-1
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-2
from the other not fully occupied extents into a lower number of extents. It reduces the
total number of read block operations performed by the system during a full scan of relational table. Defragmentation at an extent level means that empty blocks within the extents
are removed and the empty extents allocated at the end of segment are released. Defragmentation at an extent level is performed when a segment goes through intensive deletion
operations or when a parameter PCTFREE has been set to a wrong value. You can learn more
about PCTFREE and PCTUSED parameters at the end of this action. Turn the system on (if
it is off) and logon as PC user. Download and uncompress SQL scripts used in Homework
4. Use cd command to navigate to a folder where the downloaded and uncompressed SQL
scripts are located. Start SQL*Plus client in a way described in either Experiment 1.1 for
XP operating system or in Experiment 1.2 for Linux operating system. Connect as a user
STUDENT . A password is: student .While connected as a user STUDENT , execute a script
file creatbs40k.sql to create a locally managed tablespace TBS40K with a uniform size of
each extent equal to 40 Kbytes.Remain connected as a user STUDENT . Next, execute a script
grantquota.sql to grant 5 Mbytes quota on a tablespace TBS40K to a user STUDENT .Execute a
script makedef.sql to make a tablespace TBS40K a default tablespace of a user STUDENT .Execute a script dbcreate4.sql to create a sample database. The space management parameters
PCTFREE and PCTUSED control the use of free space within a data block by insert, update,
and delete statements. Both parameters should be determined when creating (or altering)
the database objects. PCTFREE parameter sets the minimum percentage of a data block to
be reserved as free space for possible updates to rows that already exist in that block. For
example, assume that CREATE TABLE statement sets a value of PCTFREE to 20. It means
that 20will be kept free and it will be available for the future updates on the rows within
each block. New rows can be inserted into a block before it is 80PCTUSED parameter sets
the minimum percentage of a block that can be used for row data plus overhead before new
rows can be added to the block. After a data block is filled to the limit determined by
PCTFREE , the system considers the block as unavailable for the insertions of new rows until
the percentage of occupancy in that block falls below a value of parameter PCTUSED . Until
this value is achieved, the system uses free space in the data block only for the updates to
the rows already contained in the block. For example, assume that CREATE TABLE statement
sets PCTUSED to 40. Then, each data block used for this table’s data segment is considered
unavailable for the insertion of any new rows until the amount of used space in the block
falls to 39(assuming that the block’s used space has previously reached a value of PCTFREE ).
Freelist is a list of data blocks allocated for a segment’s extents such that each block has an
amount of free space greater than a value of parameter PCTFREE for this segment. When the
system processes INSERT statement then a free list is scanned to find the first block available
for the insertion. Processing of DELETE and UPDATE statements that decrease amount of
storage used in a block below a value of PCTUSED parameter cause the block to be appended
to a free list. Then, subsequent insertions into the block reduce amount of free space below PCTFREE the block is excluded from a freelist. A freelist forms a chain of free blocks
in each segment. There are different types of freelists: master freelist, process freelist and
transaction freelist.
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-3
Step 1 How to find the values of PCTFREE and PCTUSED parameters ?
While connected as a user STUDENT , execute a script file listpcts.sql to list the values of
PCTFREE and PCTUSED parameters for the relational tables POSITION, SNEEDED, APPLICANT owned
by a user CSCI315 .For example, to get the values of PCTFREE and PCTUSED parameters for
a relational table APPLICANT owned by a user CSCI315 the script accesses a data dictionary
view DBA_TABLES in the following way:
SELECT TABLE_NAME,
OWNER,
TABLESPACE_NAME,
PCT_FREE,
PCT_USED
FROM SYS.DBA_TABLES
WHERE TABLE_NAME = ’APPLICANT’ AND
OWNER = ’CSCI315’;
All relational tables owned by a user CSCI315 and included in a sample database of TPC-W
benchmark have the default values of PCTFREE and PCTUSED parameters equal to PCTFREE=10 and
a value of PCTUSED is NULL because a tablespace CSCI315 has been created with a default
value of parameter SEGMENT SPACE MANAGEMENT i.e. a value used wasAUTO .
Step 2 How to analyze the storage parameters of a relational table ?
A data dictionary view DBA_TABLES contains a lot of statistical data related to the storage
parameters of the relational tables. For example, it contains information about a number
of empty (never used) blocks, the total number of rows stored in a relational table, average
amount of free space per data block, the total number of chained rows, i.e. the rows that
span over more than one block, average row length, average amount of free space of all blocks
listed on the freelists, the total number of blocks on freelist, etc. While connected as a user
STUDENT , execute a script file listanalysis.sql to analyze the storage structures of a relational
table APPLICANT owned by a user STUDENT .The script executes ANALYZE TABLE statement
with COMPUTE STATISTICS clause in order to perform the full analysis of a relational table.
The results of analysis are stored in a data dictionary view DBA_TABLES and retrieved by
the script from there. Because of the performance reasons the system does not update
all statistics after each modification of a table. For example, to analyse a relational table
ITEM owned by a user CSCI315 the script executes the following statements:
ANALYZE TABLE CSCI315.ITEM COMPUTE STATISTICS;
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-4
SELECT TABLE_NAME,
OWNER,
TABLESPACE_NAME,
NUM_ROWS,
AVG_SPACE,
CHAIN_CNT
FROM SYS.DBA_TABLES
WHERE TABLE_NAME = ’ITEM’ AND
OWNER = ’CSCI315’;
SELECT TABLE_NAME,
AVG_ROW_LEN,
NUM_FREELIST_BLOCKS,
AVG_SPACE_FREELIST_BLOCKS
FROM SYS.DBA_TABLES
WHERE TABLE_NAME = ’ITEM’ AND
OWNER = ’CSCI315’;
Remain connected as a user STUDENT and execute a script listanalysis.sql to perform the
analysis of the relational tables ITEM, CUSTOMER, COUNTRY owned by a user CSCI315 .Analysis of a relational table ITEM provides the following data. The table consists of 10,000 rows.
Average amount of free space per block is equal to 1,112 bytes. It means that an average
block is around 86There is no rows that span over more than one data block, i.e. so called
”chained rows”. Average row length is 502 bytes, which means that each block contains on
average 14 rows. The total number of allocated data blocks and still available for insertions,
i.e. the blocks where occupancy level after an insertion does not exceed 100-PCTFREE =
90equal to 5. An average free space in each block available for insertion is equal to 7,346
bytes.
Step 3 How to find the values of row identifiers ?
A row identifier uniquely determines a physical location of a row in a persistent storage.
A row identifier consists of the following values: (i) file number, (ii) block number within
a file, and (iii) row number within a block. Access to a relational table through a row
identifier is the fastest possible way how a row can be read from a relational table. In
implementation of data application this kind of access is not recommended as the system
does not guarantee persistence of the row identifiers, i.e. reorganisation of a relational table
changes the values of row identifiers. While connected as a user STUDENT , execute a script file
listrowids.sql to lists the identifiers of rows in a relational table APPLICANT owned by a user
STUDENT .The script uses the functions ROWID_RELATIVE_FNO, ROWID_BLOCK_NUMBER, and
ROWID_ROW_NUMBER from a PL/SQL package DBMS_ROWID to extract the values of file number,
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-5
relative block number in a file, and relative row number in a block. For example, the
identifiers of rows located in a relational table COUNTRY can be listed in the following way:
SELECT
DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) FNO,
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BNO,
DBMS_ROWID.ROWID_ROW_NUMBER(ROWID) RNO
FROM CSCI315.COUNTRY
ORDER BY BNO;
Execute a script listrowids.sql to list the identifiers of rows in a relational table COUNTRY owned
by a user CSCI315 .The script reports the locations of 92 rows included in a relational table
COUNTRY All rows are located in one data block. File number is equivalent to file identifier ( File id. ) reported by a script listfiles.sql . When listing the names of files enter
CSCI315 as name of tablespace. Execution of the script provides a full name of a file that
contains the rows included in a relational table COUNTRY .
Step 4 How to analyze the distribution of rows within the data blocks ?
While connected as a user STUDENT , execute a script creatab2k.sql that contains the following statement.
CREATE TABLE TAB2K(
ATTRIBUTE1 CHAR(1024),
ATTRIBUTE2 CHAR(1024))
PCTFREE 30
PCTUSED 70;
The script creates a relational table TAB2K with the row length equal to 2 Kbytes and located
in a locally managed tablespace TBS40K with the uniform extent size equal to 16 Kbytes. At
the moment the tablespace is a default tablespace of a userSTUDENT .Each row in a relational
table TAB2K consist of two fixed size strings each 1 Kbytes long. Size of each extent is equal
to 40 Kbytes. A value of PCTFREE parameter is equal to 30, which means that 30must be
left free for updates. A value of PCTUSED parameter is equal to 70, which means a block
will be available for insertions when its occupancy level will go below 70Execute a script
listexts.sql . The script should find the allocation of extents in a relational table TAB2K .The
results reported by the script confirms our expectations that a relational table TAB2K consist
of one extent 40 Kbytes long (5 data blocks, each 8 Kbytes). Note, that the first three block
in the first extent are always used by the system for the organizational purposes. Next,
execute a script listanalysis.sql to analyze and to list the storage properties of a relational
table TAB2K . The results of analysis are trivial. At the moment a relational table TAB2K is
empty. Execute a script ins7rows.sql that inserts 7 rows like:
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-6
INSERT INTO TAB2K VALUES(’01’,’01’);
Then, execute a statement:
SELECT COUNT(*)
FROM TAB2K;
to verify the total number of rows in a relational table TAB2K .Insertions of the rows should
trigger allocation of the new extents. Note that, a value of an attribute ATTRIBUTE1 in a
table TAB2K uniquely identify each row. It allows for the selective deletions of the rows later
on. Execute a script listexts.sql to find the total number of extents allocated for a relational
table TAB2K after the insertions. To accommodate 7 rows the system allocated 2 extents: 40
Kbytes and 40 Kbytes. A data block, which is 8 Kbytes long may theoretically accomodate 3
rows. It is impossible to store 4 rows because to store a row in in a block the system needs 2
Kbytes for data included in a row and 2 bytes for an entry in a row directory within a block.
Note, that a relational table TAB2K has been created with a value of parameter PCTFREE equal
to 30. It means that 30Therefore, only 8 Kbytes - (8 Kbytes*0.3) = 5.6 Kbytes is available
for the insertions. A value of PCTFREE parameter equal to 30 limits the total number of
rows per block to 2. Remain connected as a user STUDENT . To verify the row distribution
analysis described above, execute a script listrowids.sql for a relational table TAB2K . The
results listed by the script confirm our expectations - each data block (except the first one)
contains 2 rows. Execute a script listmap.sql to list a map of free an allocated extents in
a tablespace TBS40K .The tablespace consists of a number of extents. Some of the extents
are allocated by the relational tables of a sample database created at the very beginning
of this experiment. Two extents, each one 40 Kbytes long, are allocated by a relational
table TAB2K The last extent is free and it contains all free space in the tablespace. Next,
Execute a script listanalysis.sql to analyze and list the storage properties of a relational table
TAB2K after the insertion of 7 rows. The results of analysis are as follows. Average free space
per block is equal to 6,013 bytes. This is because the total amount of disk storage allocated
in 2 extents is is 7 blocks * 8 Kbytes = 56 Kbytes. If we forget about the space consumed
by a row dictionary and variable headers in the blocks then implementation of a row takes
2 Kbytes. It means that data occupy 7 rows * 2 Kbytes = 14 Kbytes. Hence, 56 Kbytes
- 14 Kbytes = 42 Kbytes are free. A free area is distributed over 7 blocks (as we already
mentioned the first extent contributes only with two blocks), hence an average free space per
block is equal to 42 Kbytes / 7 blocks = 6 Kbytes, which is almost exactly the same result as
obtained from ANALYZE TABLE statement. Execution of a script listanalysis.sql reports that
there are no data blocks on free list. Remain connected as a user STUDENT . In the next step,
execute a script del4rows.sql to delete the first 4 rows from a relational table TAB2K and to
commit the deletions. The script executes the statements:
DELETE
FROM TAB2K
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-7
WHERE ATTRIBUTE1 <= ’04’;
COMMIT;
Again, execute the scripts: listexts.sql , listanalysis.sql , listrowids.sql .The results of the
analysis performed above show that even after the deletions a relational table TAB2K still
consists of 3 extents. None of the extents has been deallocated because the table has not
been dropped or truncated. Comparison if the results returned listrowids.sql with the results
produced earlier shows that the second block in the first extent and the first block in the
second extent are empty. These blocks has been attached to a freelist of blocks for this
segment. This is why the total number of blocks on a freelist is now equal to 4. The blocks
attached to a freelist are available for the insertions. It is also an evidence that deletion
of the rows from a relational table does not return the earlier allocated extents to a pool
free space available in a tablespace. Deletion of the rows from a relational table creates the
”holes” inside the extents. This effect is called as fragmentation of disk space at an extent
level. Unfortunately, the system does not remove these holes automatically. It may happen
that even more than 50is not used and still a full scan of the table access all data blocks.
Therefore, extent-level fragmentation has a very negative impact on performance of full table
scans.
Step 5 How to eliminate the empty blocks from the extents ?
To eliminate fragmentation of persistent storage at an extent level we can use the same
techniques as for defragmentation at a segment level, i.e. either create a new table and copy
the contents of old table to a new table, drop the table, and rename new into old, or export
and import a table. Elimination of fragmentation at an extent level allows more rows to be
packed within each block and as the result it reduces a number of read block operations.
While connected as a user STUDENT , execute a script creanew2k.sql that creates a relational
table NEW2K in the following way.
CREATE TABLE NEW2K(
ATTRIBUTE1 CHAR(1024),
ATTRIBUTE2 CHAR(1024))
PCTFREE 30
PCTUSED 70;
A relational table NEW2K has the same schema as the original table TAB2K . Execute a script
file copytab2k.sql that copies the contents of a relational table TAB2K into a relational table
NEW2K . The script file contains the following statement.
INSERT INTO NEW2K (
SELECT *
FROM TAB2K );
Experiment 4.3: How to defragment persistent storage at an extent level ?
4-8
Execute the scripts: listexts.sql , listanalysis.sql , listrowids.sql . to list information about
the internal structures of a relational table NEW2K .A copy of the contents of a relational table
TAB2K into a relational table NEW2K allocated two extents instead of three extents allocated
by the original table. Three extents are enough to accommodate three rows in a relational
table NEW2K . The results of analysis also show that average amount of free space per block
has decreased, which means more economic usage of disk storage.
Step 6 How to release the empty extents ?
While connected as a user STUDENT , execute a script deltab2k.sql to delete all rows from
a relational table TAB2K and to commit the deletions. The script contains the following
statements.
DELETE FROM TAB2K;
COMMIT;
Next, execute the scripts listexts.sql listrowids.sql to find the total number of extents
allocated and to find the identifiers of rows in a relational table TAB2K Despite, that the
table is empty it still consists of three extents. To release the unused extents we have to use
TRUNCATE TABLE statement. Remain connected as a user STUDENT and truncate a relational
table TAB2K with TRUNCATE statement with DROP STORAGE option, see below.
TRUNCATE TABLE TAB2K DROP STORAGE;
Next, execute a script listexts.sql to check the total number of allocated extents. At this
point an empty relational table TAB2K allocates only one extent, which consists of two data
blocks. It is the smallest possible allocation for an empty relational table.
Step 7 How to clean up after the experiment ?
While connected as a user STUDENT , execute a script makedef.sql to make a tablespace
USERS a default tablespace of a user STUDENT .At the end of the experiment we remove the
database objects created so far. To drop the tablespaces created in this experiment connect
as a user STUDENT and execute a script file clean4-3.sql that contains the following statement.
DROP TABLESPACE TBS40K INCLUDING CONTENTS AND DATAFILES;
References
Experiment 4.3: How to defragment persistent storage at an extent level ?
SQL Reference, CREATE TABLESPACE statement
SQL Reference, DROP TABLESPACE statement
SQL Reference, ALTER USER statement
SQL Reference, CREATE TABLE statement
SQL Reference, DROP TABLE statement
SQL Reference, ANALYZE statement
SQL Reference, INSERT statement
SQL Reference, DELETE statement
SQL Reference, TRUNCATE statement
Reference, DBA TABLES view
Reference, DBA DATA FILES view
Reference, DBA EXTENTS view
Reference, DBA FREE SPACE view
PL/SQL Packages and Types Reference, DBMS ROWID package
4-9