CS 550 – Artificial Intelligence, Professor Roch Assignment 5

Beware of what you eat! (80 points)
(Image caption: Yet another image of a frightened woman. Why can’t they write more roles like Ellen Ripley?)
The 1963 film Matango (Attack of the Mushroom People) featured a
form of mushroom that would turn its eater into a carnivorous
mushroom (due to radiation, naturally). Fortunately for us, that
seems to happen primarily in mid-20th-century Japanese horror films.
However, it certainly is not safe to eat all mushrooms, and your task
in this assignment is to build decision trees that determine whether
species of mushrooms from two genera of the family Agaricaceae
(Agaricus spp. and Lepiota spp.) are poisonous or edible.
A zip archive containing the following may be found on Blackboard:
- example_reader.py
- directory learners
- directory mushrooms
The mushrooms directory contains a data set from the UC Irvine Machine Learning
Repository. File agaricus-lepiota.names.txt contains a description of the dataset. File
agaricus-lepiota.csv contains comma-separated lists of mushroom attributes; the
last value of each row indicates whether the mushroom is edible (e) or poisonous (p).
Normally this label is what you would like to predict, but supervised learning (and
evaluation) requires the category information. File attributes.csv contains the full name
of each attribute, and attribute-value-abbr.csv contains one line per attribute with a
list of value abbreviations and their expansions.
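As an illustration of the row format described above, the sketch below splits each CSV row into its attribute values and its class label using Python's standard csv module. It is not example_reader.csv_data, and it assumes (the handout does not say) that the CSV has no header row; the toy rows are made up rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(lines):
    """Split CSV rows into (attribute_values, label) pairs.

    Assumption: no header row, and the class label ('e' edible or
    'p' poisonous) is the last field of every row, as the handout states.
    """
    rows = []
    for fields in csv.reader(lines):
        if not fields:  # ignore blank lines
            continue
        rows.append((fields[:-1], fields[-1]))
    return rows


# Hypothetical toy rows (three attributes plus a label), not real dataset rows.
toy = io.StringIO("x,s,n,e\nb,y,w,p\nx,y,w,e\n")
rows = read_labeled_rows(toy)
label_counts = Counter(label for _, label in rows)
print(len(rows), label_counts["e"], label_counts["p"])  # 3 2 1
```

On the real data, the same function would be given the file object for agaricus-lepiota.csv opened from the mushrooms directory with newline="".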
Function example_reader.csv_data reads these files and builds a data dictionary for
the machine learning library in learners.learning. See the code documentation for
details.
The provided machine learning library expects instances of learners.learning.DataSet.
The DataSet constructor expects keyword arguments examples and attrnames, both of
which can be obtained from the dictionary returned by the csv_data function. Learning a
tree is easy: call learners.learning.DecisionTreeLearner on the DataSet instance. It
returns an instance of class DecisionFork, the root node of the decision tree. Interior
nodes are also DecisionFork instances, and leaf nodes are DecisionLeaf instances.
Calling the tree on an example returns the predicted class. Example:
import learners.learning

# dataset is a learners.learning.DataSet built as described above
tree = learners.learning.DecisionTreeLearner(dataset)
# predict an example
prediction = tree(example)  # example is one attribute vector
if prediction == "e":
    print("Safe to eat")
else:
    print("A non-fungal diet is starting to look pretty good")
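To make the recursive tree structure described above concrete, here is a minimal sketch with two hypothetical classes, Fork and Leaf, loosely modeled on the description of DecisionFork and DecisionLeaf. They are not the library's classes, and the real ones may store their data differently. The sketch only shows how calling the root walks down one branch per tested attribute until a leaf returns a label.

```python
class Leaf:
    """Hypothetical stand-in for DecisionLeaf: always returns its label."""

    def __init__(self, result):
        self.result = result

    def __call__(self, example):
        return self.result


class Fork:
    """Hypothetical stand-in for DecisionFork: tests one attribute, then
    hands the example to the subtree for that attribute's value."""

    def __init__(self, attr, branches):
        self.attr = attr          # index of the attribute tested at this node
        self.branches = branches  # maps attribute value -> Fork or Leaf

    def __call__(self, example):
        # Follow the branch for this example's value; recursion ends at a Leaf.
        return self.branches[example[self.attr]](example)


# Toy tree with made-up attribute values: test attribute 0, then attribute 1.
tree = Fork(0, {
    "n": Fork(1, {"x": Leaf("e"), "b": Leaf("p")}),
    "f": Leaf("p"),
})
print(tree(["n", "x"]))  # e
```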
Write a program in file mushrooms.py that performs six-fold cross-validation. For each of
the six tests, print the number of training examples, the number of test examples, and the
misclassification rate (number misclassified / number of test examples). Also print the
misclassified examples.
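The six-fold test above can be sketched as generic k-fold cross-validation in Python 3. This is a minimal sketch, not a solution built on the provided library: each example is assumed to be a list with its label last, and majority_learner is a hypothetical placeholder that always predicts the most common training label. A learner here is any function that takes the training examples and returns a predictor.

```python
import random
from collections import Counter


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k folds of near-equal size.

    Assumes 1 <= k <= n so that every test fold is non-empty.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so folds ignore file order
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread any remainder over the first folds
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def majority_learner(train):
    """Hypothetical placeholder: always predicts the most common training label."""
    label = Counter(ex[-1] for ex in train).most_common(1)[0][0]
    return lambda example: label


def cross_validate(examples, learner, k=6):
    """Train on k-1 folds, test on the held-out fold, and report each run.

    Each example is a list whose last element is its class label.
    Returns a list of (n_train, n_test, n_misclassified) tuples.
    """
    results = []
    for fold, (train_idx, test_idx) in enumerate(k_fold_splits(len(examples), k), 1):
        train = [examples[i] for i in train_idx]
        test = [examples[i] for i in test_idx]
        predict = learner(train)
        wrong = [ex for ex in test if predict(ex) != ex[-1]]
        print("fold %d: %d training, %d test, misclassification rate %.3f"
              % (fold, len(train), len(test), len(wrong) / len(test)))
        for ex in wrong:
            print("  misclassified:", ex)
        results.append((len(train), len(test), len(wrong)))
    return results


# Hypothetical toy data: 8 edible and 4 poisonous one-attribute examples.
examples = [["a%d" % i, "e"] for i in range(8)] + [["b%d" % i, "p"] for i in range(4)]
results = cross_validate(examples, majority_learner, k=6)
```

To use the provided library instead, the learner function would build a learners.learning.DataSet from the training rows and return DecisionTreeLearner applied to it. How the rows map onto the examples and attrnames keywords depends on the dictionary csv_data returns, so that wiring is left out here. The fixed seed keeps the folds reproducible between runs.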
Questions for further understanding (20 points each):
1. Compute the information gain of each attribute over the entire mushroom
dataset. (You can reuse or cannibalize code from the assignment; just
make sure you understand what you are doing.)
2. The mushrooms directory contains an additional partition of the data: train-ohoh.csv and test-ohoh.csv. Try to use a decision tree built on this training data to
classify the test-ohoh data. It fails. Debug this to find out why. What might you
do to fix this problem?
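For question 1, the gain of an attribute A on example set S is the entropy of the labels in S minus the weighted entropy of the subsets S_v obtained by splitting on each value v of A, weighted by |S_v|/|S|. Below is a minimal Python 3 sketch of that formula. It assumes each example is a list with the label last and does not use the provided learners library, whose own helper functions may be named and structured differently. The toy examples are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(examples, attr):
    """Label entropy minus the weighted label entropy after splitting on attr.

    Each example is a list whose last element is the class label.
    """
    labels = [ex[-1] for ex in examples]
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [ex[-1] for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
examples = [["n", "x", "e"], ["n", "b", "e"], ["f", "x", "p"], ["f", "b", "p"]]
for a in range(len(examples[0]) - 1):  # every column except the label
    print(a, round(information_gain(examples, a), 3))
```

Attribute 0 splits the toy data into pure subsets, so its gain equals the full label entropy of 1 bit, while attribute 1 leaves each subset evenly mixed and gains nothing.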