CS 550 – Artificial Intelligence, Professor Roch

Assignment 5

Beware of what you eat! (80 points)

[Image caption: Yet another image of a frightened woman. Why can't they write more roles like Ellen Ripley?]

The 1963 film Matango (Attack of the Mushroom People) featured a mushroom that would turn its eater into a carnivorous mushroom (due to radiation, naturally). Fortunately for us, that seems to happen primarily in mid-20th-century Japanese horror films. However, it is certainly not safe to eat all mushrooms, and your task in this assignment is to build decision trees that determine whether species of mushrooms from two genera of the family Agaricaceae (Agaricus spp. and Lepiota spp.) are poisonous or edible.

A zip archive on Blackboard contains the following:

    example_reader.py
    directory learners
    directory mushrooms

The mushrooms directory contains a data set from the UC Irvine machine learning repository:

    agaricus-lepiota.names.txt   a description of the dataset
    agaricus-lepiota.csv         comma-separated lists of mushroom attributes, one mushroom per row
    attributes.csv               the full name of each attribute
    attribute-value-abbr.csv     one line per attribute, listing its value abbreviations and their expansions

The last value of each row in agaricus-lepiota.csv indicates whether the mushroom is edible (e) or poisonous (p). Normally, this is what you would like to predict, but for supervised learning (and evaluation) we need the category information.

Function example_reader.csv_data can be used to read these files and build a data dictionary that can be used by the machine learning libraries in learners.learning. See the code documentation for details.

The provided machine learning library expects instances of learners.learning.DataSet. The DataSet constructor expects the keyword arguments examples and attrnames, both of which can be obtained from the dictionary returned by the csv_data function.
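To make the row layout described above concrete, here is a minimal sketch that reads rows in that format using only the Python standard library. It deliberately does not use example_reader.csv_data, whose exact signature is documented in the provided code rather than here. The helper name read_examples and the two sample rows are made up for illustration and are not taken from the real data file.

```python
import csv
import io


def read_examples(csv_file):
    """Return a list of rows, each a list of attribute-value abbreviations.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    return [row for row in csv.reader(csv_file) if row]


# Two made-up rows standing in for agaricus-lepiota.csv.
# The last value of each row is the class: e (edible) or p (poisonous).
sample = io.StringIO("x,s,n,e\nb,y,w,p\n")
examples = read_examples(sample)
labels = [row[-1] for row in examples]
```

For the assignment itself, prefer csv_data, since it also supplies the attribute names that the DataSet constructor expects.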
Learning a tree is easy: simply call learners.learning.DecisionTreeLearner on the instance of DataSet. It returns an instance of class DecisionFork, which represents a node in a decision tree, with recursive instances of DecisionFork for interior nodes and DecisionLeaf instances at leaf nodes.

Example:

    tree = learners.learning.DecisionTreeLearner(dataset)
    # predict an example
    prediction = tree(example)  # example is one attribute vector
    if prediction == "e":
        print("Safe to eat")
    else:
        print("A non-fungal diet is starting to look pretty good")

Write a program in file mushrooms.py that performs a six-fold test. For each test, print the number of training examples, the number of test examples, and the misclassification rate (#misclassified / #test examples). Print out the misclassified examples.

Questions for further understanding (20 points each):

1. Compute the information gain of each attribute (variable) over the entire mushroom dataset. (You can reuse or cannibalize code from the assignment; just make sure you understand what you are doing.)

2. The mushrooms directory contains an additional partition of the data: train-ohoh.csv and test-ohoh.csv. Try to use a decision tree built on this training data to classify the test-ohoh data. It fails. Debug this to find out why. What might you do to fix this problem?
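The six-fold test and the information gain computation both reduce to a few lines of bookkeeping. The following is a sketch under stated assumptions, not the provided library's code: all four function names are hypothetical, examples are assumed to be plain lists whose last value is the class, and predict stands for any callable that maps an example to a label (the learned tree is callable in this way, as in the example above). Whether the tree expects the class value to be present in the example it is given is something to check in the library documentation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, attr_index, target_index=-1):
    """Entropy of the class minus the expected entropy after splitting on one attribute."""
    labels = [ex[target_index] for ex in examples]
    groups = {}
    for ex in examples:
        # Collect the class labels seen for each value of the attribute.
        groups.setdefault(ex[attr_index], []).append(ex[target_index])
    remainder = sum(len(g) / len(examples) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def k_fold_splits(examples, k=6):
    """Yield (train, test) pairs; each example is in the test set exactly once."""
    folds = [examples[i::k] for i in range(k)]  # interleaved, no shuffling
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


def misclassification_rate(predict, test, target_index=-1):
    """Return (#misclassified / #test examples, list of misclassified examples)."""
    wrong = [ex for ex in test if predict(ex) != ex[target_index]]
    return len(wrong) / len(test), wrong


# Made-up toy data: attribute 0 determines the class, attribute 1 is uninformative.
toy = [["a", "x", "e"], ["a", "y", "e"], ["b", "x", "p"], ["b", "y", "p"]]
gain_first = information_gain(toy, 0)
gain_second = information_gain(toy, 1)
```

The folds here are interleaved without shuffling. If the rows of a data file happen to be grouped by class or by species, shuffling the examples before splitting may give more representative folds.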
© Copyright 2024