Description
1. In this problem, you are asked to compare the classification performance of perceptron
(sequential gradient descent or similar version) and MSE (pseudoinverse version) classifiers
on the wine dataset. This problem is designed to be used with the functions provided by the
PRTools5 toolbox (if using Matlab), or the named functions from scikit-learn (if using
Python).
For this problem, use the wine dataset files provided in the HW1 folder.
(a) State clearly whether you are using PRTools5 and Matlab, or scikit-learn and Python.
(b) Store a copy of the unnormalized data (as provided), and also a standardized version
of the data, for your use. Standardized means each feature is normalized to 0 mean
and unit variance. Note that the normalizing factors should be calculated from the
training data only (why?), and then applied to both the training data and test data.
For this part, report on the mean and standard deviation of each feature of the
unnormalized training data, and answer the “why?” question above.
Hint: For part (b), you may either code it yourself, or use available functions: scikit-learn's function sklearn.preprocessing.StandardScaler, or Matlab's function zscore().
If you use these functions, then be sure you know the formulas to apply
standardization so you understand the algebra as well.
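The standardization formula in the hint can be sketched as follows: each feature is transformed as x' = (x - mean) / std, where mean and std come from the training data only. This is a minimal sketch, not a required implementation. The tiny arrays are made-up stand-ins for the wine features, the helper names are hypothetical, and the population standard deviation (ddof=0) is chosen because that is what StandardScaler uses.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return the per-feature mean and standard deviation of the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as in StandardScaler
    return mean, std

def standardize(X, mean, std):
    """Apply x' = (x - mean) / std feature by feature."""
    return (X - mean) / std

# Made-up numbers standing in for the wine data (assumption, not the real files)
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_test = np.array([[2.0, 30.0]])

# Normalizing factors are computed from the training data only ...
mean, std = fit_standardizer(X_train)

# ... and then applied to both the training data and the test data
X_train_std = standardize(X_train, mean, std)
X_test_std = standardize(X_test, mean, std)
```

With this approach the standardized training features have zero mean and unit variance, while the standardized test features generally do not.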
Parts (c)-(f) below use perceptron. For these parts, use standardized data.
(c) For the perceptron classifier (perlc in PRTools5 or sklearn.linear_model.Perceptron in
Python), answer the following questions (by looking at the documentation,
comments, or the code, as needed):
(i) What is the default initial weight vector?
(ii) What is the halting condition? If the solution weight vector (which would
correctly classify all training data points) is not reached, what is the backup
halting condition?
Hints for (ii): (1) For both PRTools and scikit-learn, this may require some
digging into the code to answer. Note that for PRTools, the backup halting
condition is not just the runtime. (2) If you prefer, you can skip this for now
and answer most of the parts below without the answer to (c)(ii); then come
back to finish this.
(d) This part will be done twice: once using only the first 2 features, and then again
using all 13 features.
Apply the perceptron learning algorithm to the training data, using the one vs. rest
method. Report the resulting 3 weight vectors and the classification accuracy of your
classifier on both the training set and the test set.
Tips:
PRTools5 users: Note that perlc first augments and standardizes the data by default,
so you can input the unnormalized unaugmented data. Also, one vs. rest is the default
method. testc will give classification error rate. You can retrieve weight vectors
using getWeightsFromPrmapping.m (provided in the HW7 folder).
Scikit-learn users: You can extract the weight vectors using model.coef_ (for the
nonaugmented weight vector $\mathbf{w}$) and model.intercept_ (for $w_0$).
(e) This part should also be done twice (for the first 2 features and for all 13 features).
Run the perceptron of part (d) 100 times, with randomly chosen starting weight
vectors each time. Pick the run that has the best performance on the training set. (If
there is a tie for the best performing run, pick one of the runs at random.) Report the
final 3 weight vectors, and the classification accuracy on both the training set and the
test set.
(f) Compare and comment on your results from (d) and (e) on training data and on test
data. That is, compare 2 features to 13 features in (d) for each dataset; likewise
compare 2 features to 13 features in (e) for each dataset. Also, compare (d) to (e)
for 2 features for each dataset; and compare (d) to (e) for 13 features for each dataset.
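For scikit-learn users, one possible way to organize the random restarts of part (e) is sketched below. It relies on the coef_init and intercept_init arguments of Perceptron.fit. Everything else is an assumption made for illustration: the helper name, the standard-normal initial weights, the synthetic 3-class data, and the small number of runs in the demo. The init shapes shown apply to problems with three or more classes.

```python
import numpy as np
from sklearn.linear_model import Perceptron

def best_of_random_restarts(X_train, y_train, n_runs=100, seed=0):
    """Train n_runs perceptrons from random initial weights; keep the best on the training set."""
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y_train))  # assumes 3 or more classes
    n_features = X_train.shape[1]
    best_acc, best_models = -1.0, []
    for _ in range(n_runs):
        # One random starting weight vector and bias per one-vs-rest classifier
        coef_init = rng.standard_normal((n_classes, n_features))
        intercept_init = rng.standard_normal(n_classes)
        model = Perceptron()
        model.fit(X_train, y_train, coef_init=coef_init, intercept_init=intercept_init)
        acc = model.score(X_train, y_train)  # training accuracy
        if acc > best_acc:
            best_acc, best_models = acc, [model]
        elif acc == best_acc:
            best_models.append(model)
    # Break ties among the best runs at random
    return best_models[rng.integers(len(best_models))], best_acc

# Synthetic 3-class data standing in for the standardized wine features (assumption)
rng = np.random.default_rng(1)
centers = np.array([[0.0, 5.0], [-5.0, -5.0], [5.0, -5.0]])
X_demo = np.vstack([c + rng.standard_normal((20, 2)) for c in centers])
y_demo = np.repeat([1, 2, 3], 20)

best_model, best_acc = best_of_random_restarts(X_demo, y_demo, n_runs=5)
print(best_model.coef_)       # one nonaugmented weight vector per class
print(best_model.intercept_)  # one bias term per class
print(best_acc)
```

Test accuracy for the selected run would then come from calling best_model.score on the standardized test set.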
Parts (g)-(j) below use MSE (pseudoinverse version) classification. Please also refer to the
tips for PRTools5 implementation and for scikit-learn implementation given after part (j).
(g) For this part use unnormalized data. Run the pseudoinverse classifier, and report the
classification accuracy on the test data, for the first 2 features and for all 13 features.
(h) Repeat part (g) except using standardized data.
(i) Compare your test-accuracy results of (g) and (h). Are they identical, similar, or
quite different?
(j) Compare and comment on your test-accuracy results of (h) and (e). Are they
identical, similar, or quite different?
Tip for PRTools5 implementation of pseudoinverse classifier
Use fisherc.
Tips for scikit-learn implementation of pseudoinverse classifier
Use sklearn.linear_model.LinearRegression. This regression function can be used for 2-class
classification.
> Use non-reflected data points.
> Refer to Discussion 7 and related document posted on D2L for more tips.
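To make the algebra behind the pseudoinverse classifier concrete, here is a minimal NumPy sketch of one way to extend it to several classes. The one-vs-rest scheme with +1/-1 targets and the argmax decision rule are assumptions chosen for illustration, as are the helper names and the toy data; the course material may prescribe a different scheme. Fitting LinearRegression to the same +1/-1 targets should give the same discriminants when the augmented data matrix has full column rank.

```python
import numpy as np

def augment(X):
    """Prepend a column of ones (augmented, non-reflected data points)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_mse_ovr(X, y):
    """One-vs-rest MSE fit: solve min ||X_aug W - B||^2 with the pseudoinverse."""
    classes = np.unique(y)
    # Target matrix B: +1 in the column of the true class, -1 elsewhere
    B = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    W = np.linalg.pinv(augment(X)) @ B  # shape (D+1, number of classes)
    return classes, W

def predict_mse_ovr(X, classes, W):
    """Assign each point to the class whose discriminant is largest."""
    return classes[np.argmax(augment(X) @ W, axis=1)]

# Toy 3-class data (assumption, not the wine dataset)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0],
                  [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])
y_toy = np.array([1, 1, 2, 2, 3, 3])

classes, W = fit_mse_ovr(X_toy, y_toy)
y_pred = predict_mse_ovr(X_toy, classes, W)
accuracy = np.mean(y_pred == y_toy)
print(W)
print(accuracy)
```

Row 0 of W holds the bias terms and the remaining rows hold the nonaugmented weights, one column per class.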
2. Note: in this problem you may do the plots by hand or by computer (your choice); but
everything else (e.g., the quadratic mapping, finding decision regions and boundaries, etc.)
is to be done by hand.
In a 2-class problem with 2 features, you are given the following training data:
(a) Plot the points in 2D (non-augmented) feature space. Are they linearly separable?
The rest of this problem deals with using a phi-machine approach to get a nonlinear
classifier. Use a quadratic polynomial mapping, and order the components of your mapped
vectors the same as we did in lecture.
(b) List the points $\mathbf{u}$ [as $(D'+1)$-tuples] in expanded feature space.
(c) Find a decision boundary (by hand) in the expanded feature space. [Hint: try plotting
the points in $(x_1^2, x_2^2)$ space.] Plot the boundary and decision regions (in $(x_1^2, x_2^2)$
space), and give a complete weight vector $\mathbf{w}'$ that correctly separates the prototypes
in the expanded feature space. Also state the decision rules in the notation of the
expanded feature space (i.e., in terms of the $u_i$).
(d) Map the decision boundary and rules/regions that you found in (c) back into the
original feature space: that is, give the decision rules in this space (in terms of
$x_1$ and $x_2$), and give an equation for the decision boundary (in terms of $x_1$ and $x_2$).
Plot the data points, decision boundary, and show the decision regions, all in the
original $(x_1, x_2)$ feature space.
Training data for Problem 2:
$S_1$: $(0,0)^T$, $(0,1)^T$, $(0,-1)^T$
$S_2$: $(-2,0)^T$, $(-1,0)^T$, $(0,2)^T$, $(0,-2)^T$, $(1,0)^T$, $(2,0)^T$
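Although Problem 2 is to be worked by hand, a short script can be useful for checking the mapped points of part (b) afterwards. The sketch below applies an augmented quadratic mapping to the training data. The component order (1, x1, x2, x1*x2, x1^2, x2^2) is an assumption and may differ from the order used in lecture; the function name is hypothetical.

```python
def quadratic_map(x1, x2):
    """Augmented quadratic expansion of a 2-D point.

    The component order here is an assumption; reorder it to match lecture.
    """
    return (1, x1, x2, x1 * x2, x1 ** 2, x2 ** 2)

# Training data from the problem statement
S1 = [(0, 0), (0, 1), (0, -1)]
S2 = [(-2, 0), (-1, 0), (0, 2), (0, -2), (1, 0), (2, 0)]

# Print each original point next to its image in expanded feature space
for label, points in (("S1", S1), ("S2", S2)):
    for point in points:
        print(label, point, "->", quadratic_map(*point))
```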