CPS 481/581:  Advanced  Artificial Intelligence

Project :  Decision Tree Learner

 

Dr. Jennifer Seitzer

 

 

 

Description

A Decision tree takes as input an object or situation described by a set of attributes and returns a “decision” which is the the predicted output value for the input.  The input attributes can be discrete or continuous.  For this project, we assume discrete inputs.  The output form (the decision)  of a Decision Tree Learning algorithm is a set of classifications characterized by the set of input attributes in the form of IF-THEN rules.

A decision tree reaches its decision by performing a sequence of tests.  Each internal node in the tree corresponds to a test of the value of the attribute properties, and the branches from the node are labeled with the possible values of the test.  Each leaf node in the tree specifies the value to be returned if that leaf is reached and is comprised of a homogeneous set of examples (with respect to the target classification).

Using the attributes described on Page 654 in your text and the data on Page 656,  write an implementation of a decision tree learner.  A sketch of the decision tree learning algorithm is:

Decision Tree Learner(current node)

If currentnode has heterogeneous examples (w.r.t. target)
        Choose an attribute at current node
         Split examples at current node by that attribute’s possible values

         For each child node
                      Decision Tree Learner(child node)

 

Techniques on Choosing an Attribute on Which to Split

One compelling area of research (as well as challenge in programming this) is the choice of attribute on which to split.  You may simply use random splitting.  This simply randomly chooses one of the “not-yet-selected” attributes for the examples at any given node.  Random Splitting unfortunately may split the examples into many small classes or even singleton classes, thereby producing very long paths in the tree (and thus, very long IF-THEN rules).

Another strategy might be to employ information gain splitting where attributes are tested ahead to see which attribute will give you the closest to a 50% split between positive and negative examples.  This will keep your tree relatively shallow (and thus, your resultant rules, relatively short).  I recommend using this form of splitting for your implementation.

 

Output Requirements

Your output should be a backward traversal/interpretation of the decision tree that you internally build during the implementation of the Decision Tree Algorithm.   For each leaf of your tree, you should output the following:

LEAF #;   PARENT ATTRIBUTE ;  and the IF-THEN rule where the antecedent constitutes the backward path from the leaf up to parent to grandparent, etc., to the root.  Obviously, the consequent of the rule is formed by the target predicate and its parity. 

Note:  each node in the path is identified by an <attribute name == attribute value> tuple

 

Submission Requirements

  • For each submission, materials must be submitted in a folder with pockets. The folder should be properly identified on the outside areas with the following information:
    1. Last Name
    2. First Name
    3. Class Name and Number
  • A two page typed description explaining the goal of the submission, your implementation, the algorithm you created,  the splitting mechanism you used, the data structure you created, how you implemented the data structure, the algorithms you employed, and any difficulties you encountered in doing this part of the project.
  • A hard copy of all source code with the first line indicating, by a comment, the name of the physical file in which the source code is located.
  • A hard copy of output from the program   You may implement this program using C++ or Java.
  • An executable version on a form of removable media containing  your program

 

Deadline Schedule:

TASK

CONTAINED IN FILES

DUE DATE and CODE DEMONSTRATION DATE

1.  Create example_class, main,

example.h example.cpp

One Week

2.  Organize the data into a text file

Example_data.txt

One Week

3.  Read the data into an array of example_class;  Output the data in your example_array (i.e., write output as an example_class method)

main.cpp

One Week

4.  implement decision tree algorithm and form the tree

Decision_tree_class.cpp

10 days

5.  output IF-THEN rules from the generated decision_tree

Decision_tree_class method 

10 days

6.  Integrate above steps and complete submission requirements including external (the paper) and internal documentation

 

Two Weeks