Description
Part A: Classification Problem
This project builds neural networks to classify the Cardiotocography dataset, which contains measurements of fetal heart rate (FHR) and uterine contraction (UC) features from 2126 cardiotocograms classified by expert obstetricians [1]. The dataset can be obtained
from: https://archive.ics.uci.edu/ml/datasets/Cardiotocography.
The cardiotocograms were classified by three expert obstetricians, and each was assigned a consensus classification label with respect to both a morphologic pattern and a fetal state (N: Normal; S: Suspect; P: Pathologic). The aim is to predict the N, S and P class labels on the test dataset after training the neural network on the training dataset.
Read the data from the file ctg_data_cleaned.csv. Each data sample is a row of 23 values: 21 input attributes and 2 class labels (use the NSP label with values 1, 2 and 3 and ignore the other). First, divide the dataset in a 70:30 ratio for training and testing. Use 5-fold cross-validation on the training dataset to select the optimal model, and test it on the testing data.
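As a starting point, the sketch below shows one way to prepare the data with pandas and scikit-learn. The column layout (first 21 columns as inputs, an 'NSP' column for labels), the random seeds, and the use of a stratified split are assumptions; the starter file start_project_1a.py may organize this differently.

```python
# Minimal data-preparation sketch (column names and seeds are assumptions).
import pandas as pd
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('ctg_data_cleaned.csv')
X = df.iloc[:, :21].values              # assume the first 21 columns are the input attributes
y = df['NSP'].values - 1                # NSP labels 1/2/3 mapped to 0/1/2

# 70:30 train/test split, stratified on the class label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale inputs using statistics computed on the training set only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 5-fold cross-validation splits over the training set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
```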
1. Design a feedforward neural network which consists of an input layer, one hidden layer of 10 neurons with the ReLU activation function, and a softmax output layer. Assume a learning rate 𝛼 = 0.01, L2 regularization with weight decay parameter 𝛽 = 10⁻⁶, and a batch size of 32. Use appropriate scaling of the input features.
a) Use the training dataset to train the model and plot the training and test accuracies against the number of epochs.
b) State the approximate number of epochs where the test error converges.
(10 marks)
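One possible realization of this network, sketched with tf.keras and building on the data-preparation sketch above; the choice of Keras, the 100-epoch budget, and the helper name make_classifier are assumptions rather than part of the brief.

```python
# Sketch: input -> 10 ReLU neurons -> 3-way softmax, SGD with lr 0.01, L2 decay 1e-6, batch 32.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def make_classifier(n_features, n_hidden=10, beta=1e-6, lr=0.01):
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(n_hidden, activation='relu',
                     kernel_regularizer=regularizers.l2(beta)),
        layers.Dense(3, activation='softmax',
                     kernel_regularizer=regularizers.l2(beta)),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = make_classifier(X_train.shape[1])
history = model.fit(X_train, y_train, epochs=100, batch_size=32,   # epoch budget assumed
                    validation_data=(X_test, y_test), verbose=0)
# history.history['accuracy'] and ['val_accuracy'] give the curves to plot for (a).
```

The test set is passed as validation_data only so the per-epoch test accuracy asked for in (a) gets recorded; it is not used for model selection.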
2. Find the optimal batch size by training the neural network and evaluating the
performances for different batch sizes.
a) Plot cross-validation accuracies against the number of epochs for different batch
sizes. Limit search space to batch sizes {4, 8, 16, 32, 64}. Plot the time taken to
train the network for one epoch against different batch sizes.
b) Select the optimal batch size and state reasons for your selection.
c) Plot the train and test accuracies against epochs for the optimal batch size.
Note: use this optimal batch size for the rest of the experiments.
(9 marks)
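A sketch of the batch-size sweep, reusing kf and make_classifier from the sketches above. The same loop, with n_hidden or beta varied instead of the batch size, also covers parts 3 and 4; the 100-epoch budget is an assumption.

```python
# Sketch: 5-fold CV accuracy per epoch and mean training time per epoch for each batch size.
import time
import numpy as np

epochs = 100
cv_acc, epoch_time = {}, {}

for bs in [4, 8, 16, 32, 64]:
    fold_acc, fold_time = [], []
    for tr_idx, va_idx in kf.split(X_train):
        m = make_classifier(X_train.shape[1])
        t0 = time.time()
        h = m.fit(X_train[tr_idx], y_train[tr_idx],
                  validation_data=(X_train[va_idx], y_train[va_idx]),
                  epochs=epochs, batch_size=bs, verbose=0)
        fold_time.append((time.time() - t0) / epochs)
        fold_acc.append(h.history['val_accuracy'])
    cv_acc[bs] = np.mean(fold_acc, axis=0)      # mean CV accuracy at each epoch, for plot (a)
    epoch_time[bs] = np.mean(fold_time)         # mean seconds per epoch, for plot (a)
```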
3. Find the optimal number of hidden neurons for the 3-layer network designed in part (2).
a) Plot the cross-validation accuracies against the number of epochs for different
number of hidden-layer neurons. Limit the search space of number of neurons to
{5,10,15,20,25}.
b) Select the optimal number of neurons for the hidden layer. State the rationale for
your selection.
c) Plot the train and test accuracies against epochs with the optimal number of
neurons.
(9 marks)
4. Find the optimal decay parameter for the 3-layer network designed with optimal hidden
neurons in part (3).
a) Plot cross-validation accuracies against the number of epochs for the 3-layer network for different values of the decay parameter. Limit the search space of decay parameters to {0, 10⁻³, 10⁻⁶, 10⁻⁹, 10⁻¹²}.
b) Select the optimal decay parameter. State the rationale for your selection.
c) Plot the train and test accuracies against epochs for the optimal decay parameter.
(9 marks)
5. After you are done with the 3-layer network, design a 4-layer network with two hidden layers, each consisting of 10 neurons, and train it with a batch size of 32 and a decay parameter of 10⁻⁶.
a) Plot the train and test accuracies of the 4-layer network against epochs.
b) Compare and comment on the performances of the optimal 3-layer and 4-layer
networks.
(10 marks)
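A minimal sketch of the 4-layer variant, again assuming tf.keras and the helpers defined above; only the extra hidden layer differs from make_classifier.

```python
# Sketch: 4-layer classifier with two 10-neuron ReLU hidden layers, decay 1e-6, batch size 32.
def make_4layer_classifier(n_features, beta=1e-6, lr=0.01):
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(10, activation='relu', kernel_regularizer=regularizers.l2(beta)),
        layers.Dense(10, activation='relu', kernel_regularizer=regularizers.l2(beta)),
        layers.Dense(3, activation='softmax', kernel_regularizer=regularizers.l2(beta)),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

h4 = make_4layer_classifier(X_train.shape[1]).fit(
    X_train, y_train, epochs=100, batch_size=32,        # epoch budget assumed
    validation_data=(X_test, y_test), verbose=0)
```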
Hint: Sample code is given in file ‘start_project_1a.py’ to help you get started with this
problem.
Part B: Regression Problem
This assignment uses the data from the Graduate Admissions Prediction dataset [2]. The dataset contains several parameters, such as GRE score (out of 340), TOEFL score (out of 120), University Rating (out of 5), strength of the Statement of Purpose and of the Letter of Recommendation (each out of 5), undergraduate GPA (out of 10), and research experience (either 0 or 1), that are considered important when applying for Master's programs. The predicted parameter is the chance of getting an admit (ranging from 0 to 1). You can obtain the data from:
Graduate Admission 2
Each data sample is a row of 9 values: 1 serial number (ignore), 7 input attributes, and the probability of getting an admit as the target. Divide the dataset in a 70:30 ratio for training and testing.
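A data-preparation sketch for Part B, analogous to Part A; the file name admission_predict.csv and the column order (serial number first, chance of admit last) are assumptions based on the description above.

```python
# Sketch: load the admissions data, drop the serial number, 70:30 split, scale the inputs.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

adm = pd.read_csv('admission_predict.csv')     # file name assumed
Xr = adm.iloc[:, 1:8].values                   # 7 input attributes (column 0 is the serial number)
yr = adm.iloc[:, 8].values                     # chance of admit in [0, 1]

Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    Xr, yr, test_size=0.3, random_state=42)

r_scaler = StandardScaler().fit(Xr_train)
Xr_train, Xr_test = r_scaler.transform(Xr_train), r_scaler.transform(Xr_test)
```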
1. Design a 3-layer feedforward neural network consisting of an input layer, a hidden layer of 10 neurons with the ReLU activation function, and a linear output layer. Use mini-batch gradient descent with a batch size of 8, L2 regularization with weight decay parameter 𝛽 = 10⁻³, and a learning rate 𝛼 = 10⁻³ to train the network.
a) Use the train dataset to train the model and plot both the train and test errors against
epochs.
b) State the approximate number of epochs where the test error is minimum and use it
to stop training.
c) Plot the predicted values and target values for any 50 test samples.
(10 marks)
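One way to set up this regression network, again sketched with tf.keras on top of the Part B data sketch; the 200-epoch budget and the helper name make_regressor are assumptions.

```python
# Sketch: 7 inputs -> 10 ReLU neurons -> 1 linear output, SGD lr 1e-3, L2 decay 1e-3, batch 8.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def make_regressor(n_features, n_hidden=10, beta=1e-3, lr=1e-3):
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(n_hidden, activation='relu',
                     kernel_regularizer=regularizers.l2(beta)),
        layers.Dense(1),                                   # linear output
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss='mse')
    return model

reg = make_regressor(Xr_train.shape[1])
hist = reg.fit(Xr_train, yr_train, validation_data=(Xr_test, yr_test),
               epochs=200, batch_size=8, verbose=0)        # epoch budget assumed
# hist.history['loss'] / ['val_loss'] give the train/test error curves for (a);
# reg.predict(Xr_test[:50]) versus yr_test[:50] gives the plot for (c).
```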
2. Use the train data to compute (and plot) an 8×8 correlation matrix between the different feature scores and the corresponding chances of admit.
a) Which features are most correlated to each other? Is it justifiable?
b) What features have the highest correlations with the chances of admit?
(12 marks)
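A plotting sketch for the correlation matrix, reusing the Part B data sketch; note that Pearson correlation is unaffected by the standard scaling, and matplotlib's imshow is only one of several reasonable ways to display it.

```python
# Sketch: 8x8 correlation matrix of the 7 features plus the chance of admit (training data only).
import numpy as np
import matplotlib.pyplot as plt

train_df = pd.DataFrame(np.column_stack([Xr_train, yr_train]),
                        columns=list(adm.columns[1:9]))    # 7 feature names + target name
corr = train_df.corr()

plt.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar()
plt.xticks(range(8), corr.columns, rotation=90)
plt.yticks(range(8), corr.columns)
plt.tight_layout()
plt.show()
```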
3. Recursive feature elimination (RFE) is a feature selection method that removes
unnecessary features from the inputs. Start by removing one input feature that causes
the minimum drop (or maximum improvement) in performance. Repeat the procedure
recursively on the reduced input set until the optimal number of input features is
reached. Remove the features one at a time. Compare the accuracy of the model using all input features with that of models using 6 and 5 input features selected by RFE. Comment on the observations.
(12 marks)
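A backward-elimination sketch consistent with the RFE procedure described above, built on make_regressor; the 100-epoch retraining budget and the use of test MSE as the tracked performance are assumptions (cross-validation error on the training set would be an equally valid choice).

```python
# Sketch: recursively drop the feature whose removal hurts performance least, down to 5 features.
def subset_mse(cols):
    """Train a fresh 3-layer regressor on the given feature columns and return its test MSE."""
    m = make_regressor(len(cols))
    m.fit(Xr_train[:, cols], yr_train, epochs=100, batch_size=8, verbose=0)
    return m.evaluate(Xr_test[:, cols], yr_test, verbose=0)

remaining = list(range(Xr_train.shape[1]))        # start from all 7 features
while len(remaining) > 5:                         # keep the 6- and 5-feature models for comparison
    scores = {f: subset_mse([c for c in remaining if c != f]) for f in remaining}
    drop = min(scores, key=scores.get)            # feature whose removal gives the lowest error
    remaining.remove(drop)
    print(f'dropped feature {drop}, remaining {remaining}, mse {scores[drop]:.4f}')
```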
4. Design a four-layer neural network and a five-layer neural network, with the hidden layers having 50 neurons each. Use a learning rate of 10⁻³ for all layers and the optimal feature set selected in part (3).
Introduce dropout (with a keep probability of 0.8) to the layers and report the accuracies. Compare the performances of all the networks (with and without dropout) with each other and with the 3-layer network.
(14 marks)
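A sketch of the deeper regressors, parameterized over depth and dropout, built on the helpers above. Keras Dropout takes a drop rate, so a keep probability of 0.8 corresponds to Dropout(0.2); the helper name and the decision to keep the 𝛽 = 10⁻³ weight decay are assumptions.

```python
# Sketch: 4-layer (two hidden) and 5-layer (three hidden) regressors with 50-neuron hidden layers.
def make_deep_regressor(n_features, n_hidden_layers=2, dropout=False, beta=1e-3, lr=1e-3):
    stack = [layers.Input(shape=(n_features,))]
    for _ in range(n_hidden_layers):
        stack.append(layers.Dense(50, activation='relu',
                                  kernel_regularizer=regularizers.l2(beta)))
        if dropout:
            stack.append(layers.Dropout(0.2))     # drop rate 0.2 == keep probability 0.8
    stack.append(layers.Dense(1))                 # linear output
    model = tf.keras.Sequential(stack)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss='mse')
    return model

# 4-layer: n_hidden_layers=2; 5-layer: n_hidden_layers=3; toggle dropout=True/False,
# and train on the optimal feature subset from part (3), e.g. Xr_train[:, remaining].
```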
Hint: Sample code is given in file ‘start_project_1b.py’ to help you get started with this
problem.
References:
[1] Ayres de Campos et al. (2000). SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318.
[2] Mohan S Acharya, Asfia Armaan, Aneeta S Antony (2019). A Comparison of Regression Models for Prediction of Graduate Admissions. IEEE International Conference on Computational Intelligence in Data Science.