Files University of Dublin Trinity College Data placed on secondary storage is collected into files The layout of the data in the file will have an enormous impact on how long it takes to find a particular piece of data File Basics We don’t want to have to look through the whole file if we are on looking for one small piece of data [email protected] We want to find the right piece of data with as few accesses to the file as possible File Basics Logical Files 2 Opening and Creating Files in Java How do we OPEN and CREATE files in Java? output = new DataOutputStream(new FileOutputStream("client.dat")); Program Operating System In this code: • Files are opened by creating objects of stream classes FileInputStream and FileOutputStream • If the file does not exist a file is created with that name • A DataOutputStream object named output is created for the client.dat file. The argument “client.dat” is passed to the FileOutputStream constructor that opens the file. This establishes a “line of communication” Logical Files Physical Files Other Hardware Devices File Basics 3 File Basics Closing Files in Java When you close a file, you can use the logical filename for another file Closing a file that has been used for output also ensures that everything has been written to disk: Reading Files in Java Before reading a file it must be opened input = new DataInputStream(new FileInputStream("client.dat")); To read the file you must specify what type of data is to be read. The piece of data is usually assigned to a previously declared variable, e.g. • More efficient to write data in blocks than in bytes • Therefore operating system store data for input/output in a buffer • Closing a file ensures the buffer is flushed to disk In Java, a FileOutputStream object can be closed by close(). For example output.close() File Basics 4 account = input.readInt( ); name = input.readUTF( ); balance = input.readDouble(); 5 File Basics 6 1 Writing Files in Java Detecting End of File (EOF) As we read from a file the operating system keeps track of our location using a read/write pointer Before writing, a file must be opened: output = new DataOutputStream( new FileOutputStream("client.dat")); In Java if an EOF marker is reached while reading the EndofFileException is thrown and file closed This is coded as follows – The file is written by copying values to it: try { account = input.readInt( ); ... } catch ( EOFException eof ) { input.close( ); } output.writeInt( accountNumber ); output.writeUTF(firstNameField.getText()); output.writeDouble(d.doubleValue()); File Basics 7 File Basics 8 Stream File Consider the following information – Stream File wilde> od -A x -xc lotr.txt 000000 4c6f 7264 206f 6620 7468 6520 5269 6e67 L o r d o f t h 000010 732c 2031 3839 2c20 4661 6e74 6173 790a s , 1 8 9 , F a 000020 • Lord of the Rings, 189, Fantasy • Star Wars, 210, Sci-Fi e n t R i n g a s y \n When this information is written to a (stream) file, we lose the organisation of the units of data od is a program which enables the user to dump file information in octal and other formats. When the file is accessed by a program its information is seen only as a string of characters The output above is shown as hexidecimal shorts. Solution : Use records with field structures File Basics 9 File Basics Field What is a Field? Movie Class as Fields Writing out the Movie Class as fields to a stream file using Binary, length based and fixed length encodings of the fields, looks like • A Fixed or variable number of bytes that form a data value • E.g. Movie Name such as Star Wars There are many ways of adding structure to files to maintain the identity of fields, for example • Choose a special character/delimiter that will not appear as a legitimate character within a field and then insert that character into the file after writing each field… called delimited-text field • Use a fixed length for each field (the size depending on field in question) and pad out when length of actual data value is less than the fixed length… called fixed-length field • Write the length of the value (in bytes) followed by the value in exactly that number of bytes… called length-based field • Write the name of the field and then value both represented as delimited-text fields… called identified field File Basics 10 11 public class Movie { …. public void write(DataOutputStream out) throws IO Exception { //write object as length based and fixed length encodings out.writeShort(name.length()); out.writeBytes(name); out.writeInt(id); out.writeShort(genre.length()); out.writeBytes(genre); } }; File Basics 12 2 Class to support length-based field Class FieldOps { // a class that supports static operations on length-based fields public static String readLength(DataInput in) throws IOException { //read string from 2-byte length and bytes int len = in.readShort(); byte [] bytes = new byte[len]; in.readFully(bytes); return new String(bytes); } More complex than write, as first needs to read length and then create array of that many bytes Public class Movie { …. public void read(DataInput in) throws IO Exception { //read object as length-based and fixed-length fields name = FieldOps.readLength(in); public static void writeLength (DataOutput out, String str) throws IOException{ //wrtie string as 2-byte length plus bytes out.writeShort(str.length()); out.writeBytes(str); } }; File Basics Reading Movie Fields id = in.readInt(); genre = FieldOps.readLength(in); } }; 13 File Basics Record What is a Record? • Fixed or variable length collections of fields • E.g. Movie Record such as Star Wars, 210, SciFi A record is a set of fields that belong together when the file is viewed in terms of a higher level organisation. A record is another level of organisation that we impose on the data to preserve meaning Records do not necessarily exist in the file in any physical sense But they are an important logical notion File Basics 15 Movie Class as a Record Writing out the Movie Class as a record to a stream file using Binary, length based and fixed length encodings of the record, looks like Byte Array Streams support use of stream operations on a memory buffer Public class Movie { …. public void writeRecord(DataOutput out) throws IO Exception { //write the record into a byte stream ByteArrayOutputStream byteStream = new ByteArrayOutputStream(); DataOutputStream recStream = new DataOutputStream(byteStream); write(recStream); // movie.write the movie object into ByteStream //write the length and the record into the output stream byte [] buffer = byteStream.toByteArray(); out.writeShort(buffer.length()); out.write(buffer); } File Basics }; 16 Movie Class as a Record Reading the movie record into a movie class 14 Records to Blocks mapping Records of a file must be allocated to disk blocks public class Movie { …. public void readRecord(DataInput in) throws IO Exception { int len = in.readShort(); byte [ ] buffer = new byte[len]; in.readFully(buffer); //turn the byte array buffer into an input stream DataInputStream recStream =new DataInputStream(new ByteArrayInputStream(buffer)); read(recStream); // read the Movie object using movie.read } }; • Recall, a block is the unit of data transfer between disk and memory Blocking Factor (bfr) for a file • Number of records that can fit in a block when block size > record size • bfr = round_down(B/R) – B is block size in bytes – R is actual record size in bytes for fixed length records but is the average record size in bytes for variable length records • Unused portion = B – (bfr*R) bytes File Basics 17 File Basics 18 3 Spanned/Unspanned Organisation What happens in case R > B? • Spanned organisation – End of first block points to rest of record in an unused portion of another block • Unspanned organisation – Each record starts at a new block and uses next block, with unused portions not used Files of Records • A file is a sequence of records, where each record is a collection of data values (fields or data items) • A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk • Records are stored on disk blocks. • The blocking factor for a file is the (average) number of file records stored in a disk block • The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed. • In a file of fixed-length records, all records have the same format. – Usually, unspanned blocking is used with such files • Files of variable-length records require additional information to be stored in each record, such as separator characters and field types. – Usually spanned blocking is used with such files File Basics 19 File Basics File Operations OPEN: Readies the file for access, and associates a pointer that refers to a current file record at each point in time 20 File Operations READ: Reads the current file record into a program variable INSERT: Inserts a new record into the file, and makes it the current file record FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file record FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid MODIFY: Changes the values of some fields of the current file record File Basics 21 File Basics File Operations CLOSE: Terminates access to the file 22 Review Operating Systems provide a logical view of physical files to Programs Files in Java REORGANISE: Reorganises the file records. For example, the records marked deleted are physically removed from the file or a new organisation of the file records is created. • Opening, reading, writing, EOF, closing Stream Files Fixed and Variable Fields • Delimited-text, fixed-length, length-based, identified READ_ORDERED: Reads the file blocks in order of a specific field of the file Records (and their relationship to Blocks) • bfr, spanned/unspanned Files of Records File Operations File Basics 23 File Basics 24 4
© Copyright 2025