COMP104 - 2015 - Second CA Assignment Compiler Structures

Symbol Table Management by Hashing
Assessment Information
Assignment Number: 2 (of 2)
Weighting: 10%
Assignment Circulated: Tuesday 24th March 2015
Deadline: Wednesday 13th May 2015; 15.00
Submission Mode: Electronic
Learning outcome assessed: 4, viz “Construct programs which demonstrate in a
simple form the operation of examples of systems programs,
including ... simple compilers.”
Purpose of assessment: Provide practical experience of issues
in compiler design within Java.
Marking criteria: Scheme provided at end of document.
Submission necessary in order to satisfy Module requirements?: No
Introductory Background
This assignment concerns the organisation and maintenance of Symbol Tables
using a technique called hashing.
Recall that symbol tables are used within (for example, Java) compilers as a
means of recording information about variable, method, and class names. Details
about so-called reserved or key words are also often stored using symbol tables,
eg in order to avoid problems that might arise were such names unintentionally
to be introduced as variable names.
A potential problem in organising symbol tables is their size and the fact
that these may have to be accessed very often: this structure being used at most
stages of a compiler’s operation. So it is important that the entries referring
to properties of frequently used identifiers can be found quickly. Although
techniques such as “binary search” provide one approach that is often used, this
has the drawback that the symbol table entries (or keys) must be sorted using
some ordering convention and so, if the data held is continually changing, there
is an overhead in maintaining such ordering as keys are added (and deleted).
Hashing (if implemented carefully) can avoid such problems, allowing fast
lookup of keys (often superior to binary search methods) without the overheads
involved in maintaining a specific order of keys.
In this approach we have the following structures:
1. The symbol table itself. For the purposes of this assignment this is just
an array of Strings.
2. A maximum number of distinct keys that the symbol table is allowed to
hold. In total the structure has the form
String[] SymTab = new String[max_number_keys];
3. A hash function, Hash, which maps Strings to integer values in the range
[0,max_number_keys-1]. There are many, many possible ways in which
such a function can be defined and no specific approach is prescribed for
the purposes of this assignment. One frequently used method for defining
Hash(< key value >) is that of summing the integer (unicode) values
of each individual character in the String key value and then (in order
to ensure the outcome falls within the range [0,max_number_keys-1])
computing the remainder after division by max_number_keys. For
example, using this hash function with the character values G=16, C=12
and D=13 (the values that Java's Character.getNumericValue returns,
rather than the raw unicode code points 71, 67 and 68), the String GCD
maps to 16 + 12 + 13 = 41, so that (if the symbol table allows more than
41 keys to be held) then SymTab[41]=GCD; if the maximum number of keys
is, say, 30 then SymTab[11]=GCD.
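The summing method described in item 3 can be sketched as follows. This is only an illustration under stated assumptions: the class name HashSketch and the method name hashKey are hypothetical, and Character.getNumericValue is assumed as the per-character value because it reproduces the figures 16 + 12 + 13 = 41 quoted for GCD. Any other per-character value would serve equally well.

```java
// Illustrative sketch only; HashSketch and hashKey are hypothetical names.
final class HashSketch {
    // Sums a per-character integer value and reduces the total modulo the table size.
    // Assumption: Character.getNumericValue supplies the per-character value
    // (A..Z map to 10..35), so "GCD" gives 16 + 12 + 13 = 41.
    static int hashKey(String key, int maxNumberKeys) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += Character.getNumericValue(key.charAt(i));
        }
        // getNumericValue returns -1 for characters such as '_', so the sum can be
        // negative; floorMod keeps the result inside [0, maxNumberKeys-1].
        return Math.floorMod(sum, maxNumberKeys);
    }

    public static void main(String[] args) {
        System.out.println(hashKey("GCD", 50)); // 41
        System.out.println(hashKey("GCD", 30)); // 11
    }
}
```

Because only the sum of the character values matters, any rearrangement of the same characters produces the same hash code under this scheme.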
Of course an obvious problem arises: whatever hash function is used, there is
the possibility of two (or more) distinct keys being hashed to the same value. For
example, in the case defined above, all of the keys {GCD,GDC,DCG,DGC,CDG,CGD}
will return 41 as their hash code. Thus, a mechanism is needed in order to store
keys and recover their indices within the symbol table should two key values
have the same hash code reported. In the approach known as closed hashing this
problem is dealt with by specifying an interval, s, (with 0 < s < N, where N is
the size of the table) giving the step between the successive locations of the
symbol table that are tried. For example, suppose we have
a table of size 30 and fix s = 7. Then for the sequence of keys
GCD,GDC,DCG,DGC,CDG,CGD
all of whose character values sum to 41 we get,
GCD is stored in location 11 ie 41%30
GDC is stored in location 18 ie 11 + 7
DCG is stored in location 25 ie 11 + 7 + 7
DGC is stored in location  2 ie 11 + 7 + 7 + 7 = 32%30 = 2
CDG is stored in location  9 ie 11 + 7 + 7 + 7 + 7 = 2 + 7 mod 30
CGD is stored in location 16 ie 11 + 7 + 7 + 7 + 7 + 7 = 2 + 7 + 7 mod 30
Thus every key is stored directly in the Symbol Table itself so obviating the
need to use linked lists of keys that hash to the same location (the method
called open hashing).
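The sequence of locations in the worked example can be reproduced with a short sketch. The names ProbeSketch and probe are hypothetical, and the sketch assumes an initially empty table, so that the key inserted on attempt i finds the previous i locations already taken.

```java
// Illustrative sketch only; ProbeSketch and probe are hypothetical names.
final class ProbeSketch {
    // Location examined on the given attempt (attempt 0 is the home location).
    static int probe(int home, int s, int n, int attempt) {
        return (home + attempt * s) % n;
    }

    public static void main(String[] args) {
        String[] keys = {"GCD", "GDC", "DCG", "DGC", "CDG", "CGD"};
        int n = 30;          // table size
        int s = 7;           // probe interval
        int home = 41 % n;   // all six keys share the character sum 41
        // In an empty table the i-th clashing key ends up at the i-th probe location.
        for (int attempt = 0; attempt < keys.length; attempt++) {
            System.out.println(keys[attempt] + " is stored in location "
                    + probe(home, s, n, attempt));
        }
    }
}
```

Run as written, this lists the locations 11, 18, 25, 2, 9 and 16 in that order.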
In summary, the storage of keys in a symbol table using closed hashing
involves implementing a class
class SymbolTable
Fields:
private int N;            // Number of keys this table can store.
private int s;            // Increment used to probe locations when clashes arise.
private int size_of=0;    // The number of keys held in the table so far
private String[] Keys;    // The Symbol Table contents.
Constructor:
public SymbolTable(int table_capacity, int skip)
//Defines an instance with N=table_capacity; s=skip; and
//Keys=new String[N]
// NB For each 0<=i<N Keys[i] should be initialised as new String("**")
// where "**" is used to encode the fact that this location has not been used yet.
Methods:
private int HashKey(String IDENT)
// Returns the hash code of IDENT.
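As a rough guide to how the fields and constructor fit together, a minimal skeleton might look like the one below. The field names N, s, size_of and Keys follow the description above; the class name SymbolTableSkeleton and the accessor methods are hypothetical additions for illustration, and HashKey is deliberately left out.

```java
import java.util.Arrays;

// Illustrative skeleton only; the class name and accessors are hypothetical.
final class SymbolTableSkeleton {
    private static final String UNUSED = "**"; // marks a location never used

    private final int N;          // Number of keys this table can store.
    private final int s;          // Increment used to probe locations when clashes arise.
    private int size_of = 0;      // The number of keys held in the table so far.
    private final String[] Keys;  // The Symbol Table contents.

    SymbolTableSkeleton(int table_capacity, int skip) {
        N = table_capacity;
        s = skip;
        Keys = new String[N];
        // Every location starts out as "**"; contents are compared with equals(),
        // so sharing one String instance across all locations is safe here.
        Arrays.fill(Keys, UNUSED);
    }

    boolean isUnused(int i) { return UNUSED.equals(Keys[i]); }
    int capacity() { return N; }
    int skip() { return s; }
    int keysHeld() { return size_of; }
}
```

Comparing against the marker with equals() rather than == avoids depending on whether two "**" Strings happen to be the same object.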
Detailed Requirements
For this final practical assignment you are asked to write a Java program to
implement the SymbolTable class described at the end of the preceding section.
Your program will have to make provision for the following:
P1. Realisation of the SymbolTable class and the methods defined. NB:
Some form of hashing function must be implemented (ie the method
HashKey(String IDENT)). You are free to explore different methods for
doing so or simply to implement the technique described above; however,
you SHOULD NOT use any of the predefined Java Library “hash
code” methods (if in doubt about whether a particular approach is
permissible please contact either the demonstrator or myself).
P2. In order to provide an environment within which your implementation of
the SymbolTable class can be tested you should implement the following
method in this class:
public void populate_from(Scanner input)
This will be called from the main() method, so that having created an
instance SymTab of SymbolTable with parameters (N, s), the call
SymTab.populate_from(input);
should result in a sequence of String Keys being read and inserted into
the table. You may use the String "##" to indicate there are no further
keys to be read. For example, (the start of) an input file might look like
50 3
ABCD
integer
constant
IDENT
KEY_VALUE
##
P3. When completed, you should then arrange for the contents of the Symbol
Table to be output in the form:
Maximum Size:
Number of locations actually used:
Table Contents
0 # IDENT # HashKey(IDENT)%Size
1 # IDENT # HashKey(IDENT)%Size
...
k # IDENT # HashKey(IDENT)%Size
...
N-1 # IDENT # HashKey(IDENT)%Size
If a location is not being used, ie Keys[k]="**" (location never used) then
this should be indicated in the output (in such cases there is no need to
give the hash code for the contents).
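The reading and reporting steps in P2 and P3 might be sketched as below. Several details are assumptions rather than requirements: the first two integers of the input are taken to be N and s, keys are taken to be separated by whitespace, and an unused location is printed as "** (unused)" since the exact wording is not prescribed. The names InputOutputSketch, readKeys and formatRow are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Illustrative sketch only; all names here are hypothetical.
final class InputOutputSketch {
    // Reads whitespace-separated keys until "##" or the end of the input.
    static List<String> readKeys(Scanner input) {
        List<String> keys = new ArrayList<>();
        while (input.hasNext()) {
            String key = input.next();
            if (key.equals("##")) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // Formats one table row; the hash code is omitted for unused locations.
    static String formatRow(int k, String key, int hashModSize) {
        if ("**".equals(key)) {
            return k + " # ** (unused)";
        }
        return k + " # " + key + " # " + hashModSize;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("50 3\nABCD\ninteger\nconstant\nIDENT\nKEY_VALUE\n##\n");
        int n = input.nextInt(); // assumed to be the table size N
        int s = input.nextInt(); // assumed to be the probe interval s
        List<String> keys = readKeys(input);
        System.out.println("Maximum Size: " + n);
        System.out.println("Probe interval: " + s);
        System.out.println("Keys read: " + keys);
        System.out.println(formatRow(0, "**", 0));
        System.out.println(formatRow(11, "GCD", 11));
    }
}
```

Reading from a Scanner built over a String, as in main, is a convenient way to test the input handling without preparing a file.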
Some additional points
A1. Your implementation should be able to handle the following errors robustly:
E1. Attempts to add an IDENT when the symbol table is full, ie every
Keys[] location is in use.
E2. Attempts to add an IDENT which is already stored in the table.
A2. It is extremely important that the maximum size (N ) of the table and
the value s used when resolving clashes are relatively prime, ie have
greatest common divisor equal to 1. This will always be the case when N
is a prime itself (and, of course, s < N ) or in the (rather insipid) choice
s = 1. Unless (N, s) satisfy GCD(N, s) = 1 it is possible for the table to
appear full (unable to insert a key into) when, in fact, it is not so. The
best way of testing if (N, s) are suitable is to check if the Greatest
Common Divisor (gcd) of N and s is equal to 1. Should gcd(N, s) ≠ 1
then the values are unsuitable. A method to compute gcd(x, y) is given
below:
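// Computes gcd(x, y) by repeated subtraction.
// Assumes x > 0 and y > 0; if exactly one argument is 0 the loop never terminates.
private int GCD(int x, int y)
{
  int temp;
  int tx=x;
  int ty=y;
  while (tx!=ty)
  {
    if (tx<ty)
    {
      // Swap so that tx holds the larger value.
      temp=tx; tx=ty; ty=temp;
    }
    tx = tx-ty;
  }
  return tx;
}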
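Points A1 and A2 can be seen together in a small experiment. The sketch below is not a model solution: the class name ClosedHashSketch, the Result enum and the insert method are hypothetical, and the hash is assumed to be the character-sum method with Character.getNumericValue. It inserts the six clashing keys into a table with N = 30, once with s = 7 (gcd 1) and once with s = 6 (gcd 6).

```java
import java.util.Arrays;

// Illustrative sketch only; the class, enum and method names are hypothetical.
final class ClosedHashSketch {
    enum Result { INSERTED, DUPLICATE, FULL }

    private static final String UNUSED = "**";
    private final int n;         // table size
    private final int s;         // probe interval
    private final String[] keys;

    ClosedHashSketch(int n, int s) {
        this.n = n;
        this.s = s;
        keys = new String[n];
        Arrays.fill(keys, UNUSED);
    }

    // Assumed hash: sum of Character.getNumericValue over the key, modulo n.
    private int hashKey(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += Character.getNumericValue(key.charAt(i));
        }
        return Math.floorMod(sum, n);
    }

    // Examines at most n locations, stepping by s from the home location.
    Result insert(String key) {
        int loc = hashKey(key);
        for (int attempt = 0; attempt < n; attempt++) {
            if (keys[loc].equals(key)) {
                return Result.DUPLICATE;      // E2: key already stored
            }
            if (keys[loc].equals(UNUSED)) {
                keys[loc] = key;
                return Result.INSERTED;
            }
            loc = (loc + s) % n;
        }
        return Result.FULL;                   // E1: no free location was reachable
    }

    public static void main(String[] args) {
        String[] clashing = {"GCD", "GDC", "DCG", "DGC", "CDG", "CGD"};
        ClosedHashSketch coprime = new ClosedHashSketch(30, 7);
        ClosedHashSketch notCoprime = new ClosedHashSketch(30, 6);
        for (String key : clashing) {
            System.out.println(key + ": s=7 " + coprime.insert(key)
                    + ", s=6 " + notCoprime.insert(key));
        }
        System.out.println("GCD again: " + coprime.insert("GCD"));
    }
}
```

With s = 6 the probe sequence from location 11 only ever visits 11, 17, 23, 29 and 5, so the sixth key is reported as FULL although 25 locations are still free. With s = 7 every location is reachable and all six keys are stored.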
Submission Instructions
Firstly, check that you have adhered to the following list:
1. All of your code is within a single file. Do NOT use more than one file.
2. Both your name AND User ID are clearly indicated at the start of your
code, eg by
// Name: My Name ; ID u?????
3. The file’s name MUST be
SymbolTableOperation.java
This means that the main class name must also be SymbolTableOperation.
Submit only the Java source: design documentation, compiled .class files,
sample outputs, extraneous commentary and similar ephemera are neither
required nor desired.
4. Your program is written in Java, not some other language.
5. The file is a text file: not compressed or encoded or otherwise mangled.
6. Your program compiles and runs on the Departmental Windows system.
If you have developed your code elsewhere (eg your home PC), port it to
our system and perform a compile/check test before submission. It is your
responsibility to check that you can log onto the departmental system well
in advance of the submission deadline.
7. Your program does not bear undue resemblance to anybody else’s.
Electronic checks for code similarity will be performed on all submissions and
instances of plagiarism will be dealt with in accordance with the
procedures and sanctions prescribed by the relevant University Code of Practice.
The rules on plagiarism and collusion are explicit: do not copy anything
from anyone else’s code, do not let anyone else copy from your code and
do not hand in “jointly developed” solutions.
Your solution must be
SUBMITTED ELECTRONICALLY
Electronic submission: Your code must be submitted to the departmental
electronic submission system at:
http://intranet.csc.liv.ac.uk/cgi-bin/submit.pl
You need to log in to the above system and select COMP104-2: Compiler
Table Managment from the drop-down menu. You then locate the file
containing your program that you wish to submit, check the box stating that you
have read and understood the University Code of Practice on Plagiarism and
Collusion, then click the Upload File button.
MARKING SCHEME
Below is the breakdown of the mark scheme for this assignment. Each category
will be judged on the correctness, efficiency and modularity of the code, as well
as whether or not it compiles and produces the desired output.
• Adherence to specification (ie information requested, correct naming etc.) = 15 marks
• Implementation of SymbolTable class and methods = 35 marks
• Handling of errors (Symbol Table full, multiple identifiers) = 25 marks
• Output form = 15 marks
• Comments and layout = 10 marks
This assignment contributes 10% to your overall mark for COMP104.
Finally, please remember that it is always better to hand in an incomplete
piece of work, which will result in some marks being awarded, as opposed to
handing in nothing, which will guarantee a mark of 0 being awarded.
Demonstrators will be on hand during the COMP104 practical sessions to provide
assistance, should you need it.