CPS 481/581: Advanced Artificial Intelligence
Project : Decision Tree Learner
Dr. Jennifer Seitzer
Description
A Decision tree takes as input an object or
situation described by a set of attributes and returns a “decision” which is
the the predicted output value for the input. The input attributes can be discrete or continuous. For this project, we assume discrete
inputs. The output form (the decision) of a Decision Tree
Learning algorithm is a set of classifications
characterized by the set of input attributes in the form of IF-THEN rules.
A decision tree reaches its decision by
performing a sequence of tests. Each
internal node in the tree corresponds to a test of the value of the attribute
properties, and the branches from the node are labeled with the possible values
of the test. Each leaf node in the tree
specifies the value to be returned if that leaf is reached and is comprised of
a homogeneous set of examples (with respect to the target classification).
Using the attributes described on Page 654 in
your text and the data on Page 656, write an implementation of a decision
tree learner. A sketch of the decision
tree learning algorithm is:
Decision
Tree Learner(current node)
If currentnode has heterogeneous examples (w.r.t.
target)
Choose an attribute at current
node
For each child node
Decision Tree Learner(child node)
Techniques
on Choosing an Attribute on Which to Split
One compelling area of research (as well as
challenge in programming this) is the choice of attribute on which to
split. You may simply use random splitting. This simply randomly chooses one of the
“not-yet-selected” attributes for the examples at any given node. Random Splitting
unfortunately may split the examples into many small classes or even singleton
classes, thereby producing very long paths in the tree (and thus, very long
IF-THEN rules).
Another strategy might be to employ information gain splitting where
attributes are tested ahead to see which attribute will give you the closest to
a 50% split between positive and negative examples. This will keep your tree relatively shallow
(and thus, your resultant rules, relatively short). I recommend using this form of splitting for
your implementation.
Output
Requirements
Your output should be a backward
traversal/interpretation of the decision tree that you internally build during
the implementation of the Decision Tree Algorithm. For each leaf of your tree, you should
output the following:
LEAF #;
PARENT ATTRIBUTE ; and the
IF-THEN rule where the antecedent constitutes the backward path from the leaf
up to parent to grandparent, etc., to the root.
Obviously, the consequent of the rule is formed by the target predicate
and its parity.
Note:
each node in the path is identified by an <attribute name ==
attribute value> tuple
Submission
Requirements
Deadline Schedule:
|
TASK |
CONTAINED IN FILES |
DUE DATE and CODE DEMONSTRATION DATE |
|
1.
Create example_class, main, |
example.h example.cpp |
One Week |
|
2.
Organize the data into a text file |
Example_data.txt |
One Week |
|
3.
Read the data into an array of example_class; Output the data in your example_array
(i.e., write output as an example_class method) |
main.cpp |
One Week |
|
4.
implement decision tree algorithm and form the tree |
Decision_tree_class.cpp |
10 days |
|
5.
output IF-THEN rules from the generated decision_tree |
Decision_tree_class method |
10 days |
|
6.
Integrate above steps and complete submission requirements including
external (the paper) and internal documentation |
|
Two Weeks |