Files

Files
University of Dublin
Trinity College
Data placed on secondary storage is collected into files
The layout of the data in the file will have an enormous
impact on how long it takes to find a particular piece of
data
File Basics
We don’t want to have to look through the whole file if
we are on looking for one small piece of data
[email protected]
We want to find the right piece of data with as few
accesses to the file as possible
File Basics
Logical Files
2
Opening and Creating Files in Java
How do we OPEN and CREATE files in Java?
output = new DataOutputStream(new FileOutputStream("client.dat"));
Program
Operating
System
In this code:
• Files are opened by creating objects of stream classes
FileInputStream and FileOutputStream
• If the file does not exist a file is created with that name
• A DataOutputStream object named output is created for the
client.dat file. The argument “client.dat” is passed to the
FileOutputStream constructor that opens the file. This establishes a
“line of communication”
Logical
Files
Physical
Files
Other
Hardware
Devices
File Basics
3
File Basics
Closing Files in Java
When you close a file, you can use the logical filename
for another file
Closing a file that has been used for output also
ensures that everything has been written to disk:
Reading Files in Java
Before reading a file it must be opened
input = new DataInputStream(new
FileInputStream("client.dat"));
To read the file you must specify what type of data is to
be read. The piece of data is usually assigned to a
previously declared variable, e.g.
• More efficient to write data in blocks than in bytes
• Therefore operating system store data for input/output in a buffer
• Closing a file ensures the buffer is flushed to disk
In Java, a FileOutputStream object can be closed by
close(). For example output.close()
File Basics
4
account = input.readInt( );
name = input.readUTF( );
balance = input.readDouble();
5
File Basics
6
1
Writing Files in Java
Detecting End of File (EOF)
As we read from a file the operating system keeps track of our
location using a read/write pointer
Before writing, a file must be opened:
output = new DataOutputStream( new
FileOutputStream("client.dat"));
In Java if an EOF marker is reached while reading the
EndofFileException is thrown and file closed
This is coded as follows –
The file is written by copying values to it:
try {
account = input.readInt( );
...
}
catch ( EOFException eof ) {
input.close( );
}
output.writeInt( accountNumber );
output.writeUTF(firstNameField.getText());
output.writeDouble(d.doubleValue());
File Basics
7
File Basics
8
Stream File
Consider the following information –
Stream File
wilde> od -A x -xc lotr.txt
000000 4c6f 7264 206f 6620 7468 6520 5269 6e67
L
o
r
d
o
f
t
h
000010 732c 2031 3839 2c20 4661 6e74 6173 790a
s
,
1
8
9
,
F
a
000020
• Lord of the Rings, 189, Fantasy
• Star Wars, 210, Sci-Fi
e
n
t
R
i
n
g
a
s
y
\n
When this information is written to a (stream) file, we
lose the organisation of the units of data
od is a program which enables the user to dump file
information in octal and other formats.
When the file is accessed by a program its information
is seen only as a string of characters
The output above is shown as hexidecimal shorts.
Solution : Use records with field structures
File Basics
9
File Basics
Field
What is a Field?
Movie Class as Fields
Writing out the Movie Class as fields to a stream file
using Binary, length based and fixed length encodings
of the fields, looks like
• A Fixed or variable number of bytes that form a data value
• E.g. Movie Name such as Star Wars
There are many ways of adding structure to files to
maintain the identity of fields, for example
• Choose a special character/delimiter that will not appear as a
legitimate character within a field and then insert that character into
the file after writing each field… called delimited-text field
• Use a fixed length for each field (the size depending on field in
question) and pad out when length of actual data value is less than
the fixed length… called fixed-length field
• Write the length of the value (in bytes) followed by the value in
exactly that number of bytes… called length-based field
• Write the name of the field and then value both represented as
delimited-text fields… called identified field
File Basics
10
11
public class Movie {
….
public void write(DataOutputStream out)
throws IO Exception {
//write object as length based and fixed length encodings
out.writeShort(name.length());
out.writeBytes(name);
out.writeInt(id);
out.writeShort(genre.length());
out.writeBytes(genre);
}
};
File Basics
12
2
Class to support length-based field
Class FieldOps {
// a class that supports static operations on length-based fields
public static String readLength(DataInput in)
throws IOException {
//read string from 2-byte length and bytes
int len = in.readShort();
byte [] bytes = new byte[len];
in.readFully(bytes);
return new String(bytes);
}
More complex than write, as first needs to read length
and then create array of that many bytes
Public class Movie { ….
public void read(DataInput in)
throws IO Exception {
//read object as length-based and fixed-length fields
name = FieldOps.readLength(in);
public static void writeLength (DataOutput out, String str)
throws IOException{
//wrtie string as 2-byte length plus bytes
out.writeShort(str.length());
out.writeBytes(str);
}
};
File Basics
Reading Movie Fields
id = in.readInt();
genre = FieldOps.readLength(in);
}
};
13
File Basics
Record
What is a Record?
• Fixed or variable length collections of fields
• E.g. Movie Record such as Star Wars, 210, SciFi
A record is a set of fields that belong together when the
file is viewed in terms of a higher level organisation.
A record is another level of organisation that we impose
on the data to preserve meaning
Records do not necessarily exist in the file in any
physical sense
But they are an important logical notion
File Basics
15
Movie Class as a Record
Writing out the Movie Class as a record to a stream file using Binary,
length based and fixed length encodings of the record, looks like
Byte Array Streams support use of stream operations on a memory buffer
Public class Movie { ….
public void writeRecord(DataOutput out)
throws IO Exception {
//write the record into a byte stream
ByteArrayOutputStream byteStream
= new ByteArrayOutputStream();
DataOutputStream recStream
= new DataOutputStream(byteStream);
write(recStream); // movie.write the movie object into ByteStream
//write the length and the record into the output stream
byte [] buffer = byteStream.toByteArray();
out.writeShort(buffer.length());
out.write(buffer);
}
File Basics };
16
Movie Class as a Record
Reading the movie record into a movie class
14
Records to Blocks mapping
Records of a file must be allocated to disk blocks
public class Movie { ….
public void readRecord(DataInput in)
throws IO Exception {
int len = in.readShort();
byte [ ] buffer = new byte[len];
in.readFully(buffer);
//turn the byte array buffer into an input stream
DataInputStream recStream
=new DataInputStream(new ByteArrayInputStream(buffer));
read(recStream); // read the Movie object using movie.read
}
};
• Recall, a block is the unit of data transfer between disk and
memory
Blocking Factor (bfr) for a file
• Number of records that can fit in a block when block size > record
size
• bfr = round_down(B/R)
– B is block size in bytes
– R is actual record size in bytes for fixed length records but is the average record
size in bytes for variable length records
• Unused portion = B – (bfr*R) bytes
File Basics
17
File Basics
18
3
Spanned/Unspanned Organisation
What happens in case R > B?
• Spanned organisation
– End of first block points to rest of record in an unused portion of another block
• Unspanned organisation
– Each record starts at a new block and uses next block, with unused portions not
used
Files of Records
• A file is a sequence of records, where each record is a collection of
data values (fields or data items)
• A file descriptor (or file header) includes information that describes
the file, such as the field names and their data types, and the
addresses of the file blocks on disk
• Records are stored on disk blocks.
• The blocking factor for a file is the (average) number of file records
stored in a disk block
• The physical disk blocks that are allocated to hold the records of a
file can be contiguous, linked, or indexed.
• In a file of fixed-length records, all records have the same format.
– Usually, unspanned blocking is used with such files
• Files of variable-length records require additional information to be
stored in each record, such as separator characters and field types.
– Usually spanned blocking is used with such files
File Basics
19
File Basics
File Operations
OPEN: Readies the file for access, and associates a
pointer that refers to a current file record at each point
in time
20
File Operations
READ: Reads the current file record into a program
variable
INSERT: Inserts a new record into the file, and makes it
the current file record
FIND: Searches for the first file record that satisfies a
certain condition, and makes it the current file record
FINDNEXT: Searches for the next file record (from the
current record) that satisfies a certain condition, and
makes it the current file record
DELETE: Removes the current file record from the file,
usually by marking the record to indicate that it is no
longer valid
MODIFY: Changes the values of some fields of the
current file record
File Basics
21
File Basics
File Operations
CLOSE: Terminates access to the file
22
Review
Operating Systems provide a logical view of physical
files to Programs
Files in Java
REORGANISE: Reorganises the file records. For
example, the records marked deleted are physically
removed from the file or a new organisation of the file
records is created.
• Opening, reading, writing, EOF, closing
Stream Files
Fixed and Variable Fields
• Delimited-text, fixed-length, length-based, identified
READ_ORDERED: Reads the file blocks in order of a
specific field of the file
Records (and their relationship to Blocks)
• bfr, spanned/unspanned
Files of Records
File Operations
File Basics
23
File Basics
24
4