Main analysis

This notebook contains the primary analysis for the thesis, in which we train a classifier for the code vs. prose task.


Table of Contents

NOTE: This TOC is manually built and may not be up to date.


Setup

Configuration

This cell contains all the configuration options available for the analysis.

Misc setup

Loading

Loading EEG

First we need to load the EEG data used during the experiments.

Let's have a look at the loaded data:

We see that some channels are bad some of the time; we will deal with that later.

Loading markers

Now we need to load the markers produced during each trial of the experiment, so we can annotate the EEG data.

Let's take a look at some of the marker rows:

Inspect the markers

Preprocessing

Now we need to preprocess the data a bit, gathering the EEG data for each trial in the experiment.

Select subjects

Filter no answer trials

We filter out rows where the subject pressed space, which we take to mean that they did not answer (skipped or unsure).

Filter short trials

We filter out rows where the subject spent less than min_task_duration seconds on the task.
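
As a minimal sketch of this filter, the example below keeps only trials that last at least min_task_duration seconds. The Trial class and its fields are hypothetical stand-ins for the marker rows; the notebook's actual data structure is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical stand-in for one marker row."""
    subject: str
    label: str    # "code" or "prose"
    start: float  # seconds
    stop: float   # seconds

    @property
    def duration(self) -> float:
        return self.stop - self.start


def filter_short_trials(trials, min_task_duration):
    """Keep trials where the subject spent at least min_task_duration seconds."""
    return [t for t in trials if t.duration >= min_task_duration]


trials = [
    Trial("s1", "code", 0.0, 12.5),
    Trial("s1", "prose", 20.0, 22.0),   # too short for a 5 s threshold
    Trial("s2", "code", 5.0, 10.0),     # exactly at the threshold, kept
]
kept = filter_short_trials(trials, min_task_duration=5.0)
print(len(kept), "of", len(trials), "trials kept")
```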

Bandpass filtering

Exponential moving standardize
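
As a rough illustration of what this step does, the sketch below standardizes each channel using exponentially weighted running estimates of its mean and variance. The function name, the factor_new and eps parameters, and their default values are assumptions made for this illustration and are not taken from the notebook's configuration.

```python
import numpy as np


def exponential_moving_standardize(data, factor_new=0.001, eps=1e-4):
    """Standardize (n_channels, n_times) data with running mean and variance.

    factor_new is the weight given to the newest sample; eps guards
    against division by a near-zero standard deviation.
    """
    data = np.asarray(data, dtype=float)
    mean = data[:, 0].copy()          # start the running mean at the first sample
    var = np.zeros(data.shape[0])     # running variance starts at zero
    out = np.empty_like(data)
    for t in range(data.shape[1]):
        x = data[:, t]
        mean = factor_new * x + (1 - factor_new) * mean
        var = factor_new * (x - mean) ** 2 + (1 - factor_new) * var
        out[:, t] = (x - mean) / np.maximum(np.sqrt(var), eps)
    return out


rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(4, 2000))  # synthetic signal with an offset
standardized = exponential_moving_standardize(raw, factor_new=0.01)
print(standardized.shape)
```

Because the estimates are updated sample by sample, slow drifts in offset and amplitude are tracked over time instead of being removed with a single global mean and standard deviation.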

Constructing epochs

Now we match up the EEG data with the markers to create our epochs.
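
The matching can be sketched as slicing the continuous recording by each marker's start and stop time. This is only an illustration: the (start, stop, label) marker layout, the timestamps array, and the extract_epochs name are assumptions, not the notebook's actual structures.

```python
import numpy as np


def extract_epochs(timestamps, eeg, markers):
    """Cut one epoch per marker out of a continuous recording.

    timestamps: sorted 1-D array of sample times in seconds, shape (n_times,)
    eeg:        array of shape (n_channels, n_times)
    markers:    iterable of (start, stop, label) tuples (hypothetical layout)
    """
    epochs = []
    for start, stop, label in markers:
        # Index of the first sample at or after each marker time.
        i0, i1 = np.searchsorted(timestamps, [start, stop])
        if i1 > i0:  # skip markers that fall outside the recording
            epochs.append((eeg[:, i0:i1], label))
    return epochs


timestamps = np.arange(1000) / 100.0                 # 10 s sampled at 100 Hz
eeg = np.random.default_rng(0).normal(size=(2, 1000))
markers = [(1.0, 3.0, "code"), (4.0, 9.0, "prose"), (20.0, 25.0, "code")]
epochs = extract_epochs(timestamps, eeg, markers)
for data, label in epochs:
    print(label, data.shape)
```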

Inspecting epochs

Epochs to windows

Now we split up the epochs into windows of a fixed size.
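
A minimal sketch of the windowing, assuming an epoch is a NumPy array of shape (n_channels, n_times). The epoch_to_windows name and the optional step parameter (for overlapping windows) are illustrative additions.

```python
import numpy as np


def epoch_to_windows(epoch, window_size, step=None):
    """Split an (n_channels, n_times) epoch into fixed-size windows.

    Returns an array of shape (n_windows, n_channels, window_size).
    Samples that do not fill a complete window at the end are dropped.
    """
    step = window_size if step is None else step
    n_channels, n_times = epoch.shape
    starts = range(0, n_times - window_size + 1, step)
    if len(starts) == 0:
        return np.empty((0, n_channels, window_size))
    return np.stack([epoch[:, s:s + window_size] for s in starts])


epoch = np.random.default_rng(0).normal(size=(4, 1000))
windows = epoch_to_windows(epoch, window_size=256)
overlapping = epoch_to_windows(epoch, window_size=256, step=128)
print(windows.shape, overlapping.shape)
```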

Filter away windows with bad signal

Constructing our X and y

Now we construct the matrices that we can feed into the classifier.

Balance the dataset
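
One common way to balance a dataset is random undersampling of the majority class, sketched below. This is an assumption about the approach; the notebook may balance the classes differently, and the undersample name and seed parameter are illustrative.

```python
import numpy as np


def undersample(X, y, seed=0):
    """Randomly drop samples so that every class has as many as the rarest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original ordering of the kept samples
    return X[keep], y[keep]


y = np.array([0] * 30 + [1] * 10)
X = np.arange(y.size * 3).reshape(y.size, 3)
X_bal, y_bal = undersample(X, y)
print(np.unique(y_bal, return_counts=True))
```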

Inspect the dataset

Training

Here we train our model using pyRiemann.

First we set up the different classifiers we want to train:

And then we train each classifier and plot their respective confusion matrices:

Learning curves

Now we check the learning curves to see whether the train and validation scores converge.

Note: Performance is currently terrible as there isn't enough data for the model to learn to generalize across subjects (easily seen by changing to shuffled CV).

A great example of how to plot learning curves is available here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
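
A minimal sketch of computing learning curves with subject-wise cross-validation, using scikit-learn's learning_curve with GroupKFold. The synthetic features, the logistic regression estimator, and the number of subjects are placeholders for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, learning_curve

# Synthetic placeholder data: 4 subjects, 40 samples each, 8 features.
rng = np.random.default_rng(0)
n_subjects, n_per_subject, n_features = 4, 40, 8
y = np.tile([0, 1], n_subjects * n_per_subject // 2)
X = rng.normal(size=(y.size, n_features)) + 0.8 * y[:, None]
groups = np.repeat(np.arange(n_subjects), n_per_subject)

# GroupKFold keeps all samples from one subject in the same fold, so the
# validation score measures generalization to an unseen subject.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    groups=groups,
    cv=GroupKFold(n_splits=n_subjects),
    train_sizes=np.linspace(0.3, 1.0, 5),
    shuffle=True,
    random_state=0,
)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train={tr:.2f}  validation={va:.2f}")
```

Replacing the GroupKFold splitter with a shuffled splitter such as StratifiedKFold(shuffle=True) lets samples from the same subject appear in both train and validation folds, which is the comparison the note above refers to.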

LORO validation
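
The sketch below assumes that LORO denotes a leave-one-group-out scheme in which each fold holds out all samples from one group (for example one run or recording). The generator name and the groups array are illustrative; scikit-learn's LeaveOneGroupOut provides the same kind of split.

```python
import numpy as np


def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group per fold."""
    groups = np.asarray(groups)
    for held_out in np.unique(groups):
        test_mask = groups == held_out
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


groups = np.array([0, 0, 1, 1, 2])
folds = list(leave_one_group_out(groups))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```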

Compare to bandpower features

Now we want to know whether our classifiers outperform basic bandpower features combined with common classifiers such as SVM and random forest.

This code is largely based on the training for device activity in eegclassify.main._train_features.
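
To make the idea of bandpower features concrete, here is a small NumPy sketch that averages a periodogram within a few frequency bands per channel. It is not the code from eegclassify; the band edges, the BANDS dictionary, and the bandpower_features name are assumptions made for this illustration.

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed; exact edges vary between sources).
BANDS = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 45),
}


def bandpower_features(windows, sfreq):
    """Compute mean periodogram power per band and channel.

    windows: array of shape (n_windows, n_channels, n_times)
    Returns an array of shape (n_windows, n_bands * n_channels), band-major.
    """
    n_times = windows.shape[-1]
    freqs = np.fft.rfftfreq(n_times, d=1.0 / sfreq)
    psd = np.abs(np.fft.rfft(windows, axis=-1)) ** 2 / (sfreq * n_times)
    feats = []
    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs < high)
        feats.append(psd[..., mask].mean(axis=-1))  # (n_windows, n_channels)
    return np.concatenate(feats, axis=-1)


# A 10 Hz sine should put nearly all of its power in the alpha band.
sfreq = 256
t = np.arange(sfreq) / sfreq
window = np.sin(2 * np.pi * 10 * t)[None, None, :]
features = bandpower_features(window, sfreq)
print(dict(zip(BANDS, features[0])))
```

The resulting feature matrix is two-dimensional, so it can be passed directly to standard classifiers such as an SVM or a random forest.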