The Relative Contribution of High-Gamma Linguistic Processing Stages of Word Production, and Motor Imagery of Articulation in Class Separability of Covert Speech Tasks in EEG Data

Word production begins with high-Gamma automatic linguistic processing functions followed by speech motor planning and articulation. Phonetic properties are processed in both the linguistic and motor stages of word production. Four phonetically dissimilar phonemic structures, "BA", "FO", "LE", and "RY", were chosen as covert speech tasks. Ten neurologically healthy volunteers aged 21–33 participated in this experiment. Participants were asked to covertly speak a phonemic structure when they heard an auditory cue. EEG was recorded with 64 electrodes at 2048 samples/s. Initially, one-second trials were used, which contained both linguistic and motor imagery activities, and the four-class true positive rate was calculated. In the next stage, 312 ms trials were used to exclude covert articulation from the analysis. By eliminating the covert articulation stage, the four-class grand average classification accuracy dropped only from 96.4% to 94.5%. The most valuable features emerge after auditory cue recognition (~100 ms post-onset) and within the 70–128 Hz frequency range. The most significant identified brain regions were the Prefrontal Cortex (linked to stimulus-driven executive control), Wernicke's area (linked to phonological code retrieval), the right IFG, and Broca's area (linked to syllabification). Alpha and Beta band oscillations associated with motor imagery do not contain enough information to fully reflect the complexity of speech movements: over 90% of the most class-dependent features were in the 30–128 Hz range, even during the covert articulation stage. As a result, compared to linguistic functions, the contribution of motor imagery of articulation to class separability of covert speech tasks in EEG data is negligible.

Electronic supplementary material: The online version of this article (10.1007/s10916-018-1137-9) contains supplementary material, which is available to authorized users.


I. INTRODUCTION
Motor imagery (MI) is a well-established paradigm in brain-computer interfaces (BCIs). The low-frequency oscillations (< 35 Hz) elicited by MI activity have been detectable by EEG for many decades. MI does not occur independently; it is the end result of many cognitive functions. For example, anticipating an onset cue and initiating an "imagined" movement after cue recognition requires stimulus-driven executive control, with high-Gamma activity in regions such as the pre-frontal cortex [1,2]. To take advantage of such class-dependent cognitive activity [3,4], the entire data bandwidth of the EEG system must be utilized [5], not only the Alpha and Beta bands. Covert word production begins with high-Gamma (> 70 Hz) linguistic processing stages [6-8], followed by motor imagery of articulation [9,10]. Language is far more complex than movement [11] and requires analysis at much higher resolution than traditional MI band power [12]. However, covert speech is more intuitive and natural for BCI communication than MI. In this study, a single experimental protocol and analysis pipeline is used twice: once for MI tasks and once for covert speech tasks. The performance of the system for each paradigm is calculated and the results are discussed.

II. METHODS

A. Experiment Protocol
In this study, each recording run contains four classes, which are shown in the user interface by four arrows: up, down, left, and right. Within a recording run, 10 examples of each task are presented in random order (40 trials per run); runs are kept short to avoid user fatigue. During recording, a new task is indicated by an arrow appearing on the screen for 3 seconds. After the arrow disappears, there is a 3-second standby state. Task onset is signaled by a beep sound for all classes. A second beep indicates a rest period before the next trial. The experimental protocol is presented in Figure 1.
Each user completes two recording runs, which are identical except for the type of mental task (MI or covert speech). For MI tasks, the four arrows represent movements of the left hand (left arrow), right hand (right arrow), left foot (down arrow), and right foot (up arrow). For covert speech tasks, the user imagines speaking the phonemic structures BA (back/down arrow), FO (forward/up arrow), LE (left arrow), and RY (right arrow), which are phonetically very dissimilar [13].
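The randomized cue schedule described above can be sketched as follows (the function name and seed are illustrative, not from the original study):

```python
import random

# Hypothetical sketch of the randomized trial schedule: 10 examples of
# each of the four tasks per run, shuffled, giving 40 trials per run.
TASKS = ["up", "down", "left", "right"]  # arrow cues shown on screen

def build_run_schedule(trials_per_task=10, seed=None):
    """Return a shuffled list of cue labels for one recording run."""
    rng = random.Random(seed)
    schedule = [task for task in TASKS for _ in range(trials_per_task)]
    rng.shuffle(schedule)
    return schedule

schedule = build_run_schedule(seed=42)
```

Shuffling the full run (rather than randomizing trial by trial) guarantees an exactly balanced class distribution within each run.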

B. Data Acquisition
Four neurologically healthy volunteers participated in this experiment. The EEG signals were recorded using an Enobio dry-electrode system with 20 channels in the 10-10 configuration [14]. Data was recorded at a sampling rate of 500 Hz and saved in "gdf" format. Compared to wet-electrode systems, setting up the Enobio is extremely easy; however, the quality of the recorded signals may limit the number of classes that can be used simultaneously. This study provides an evaluation of the system's capability.

Figure 1. The experiment protocol for recording four randomly presented trials. Each class corresponds to a directional arrow. After task presentation, a beep sound is used for all classes as task onset. A second beep indicates a rest period before the next task.

C. Pre-Processing
Recorded data was pre-processed using EEGLAB [15]. Data was down-sampled to 256 Hz and re-referenced using a common average reference. Line noise was removed with an FIR notch filter (49.5–50.5 Hz). The AAR toolkit [16] was used for artifact rejection: EOG and EMG artifacts were reduced with the SOBI [17] and CCA [18] algorithms, respectively. These methods outperform ICA, which is ineffective above 70 Hz [19,20]. One-second epochs were extracted from the pre-processed data and saved as a numeric matrix for further analysis.
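A minimal Python sketch of the resampling, re-referencing, and notch-filtering steps with NumPy/SciPy (the study itself used EEGLAB and the AAR toolkit; the SOBI/CCA artifact-reduction steps are omitted here, and the filter length is an assumption):

```python
import numpy as np
from scipy.signal import resample, firwin, filtfilt

def preprocess(eeg, fs_in=500, fs_out=256):
    """Sketch of the pipeline for an (n_channels, n_samples) array:
    down-sample to 256 Hz, apply a common average reference, and
    remove 50 Hz line noise with a zero-phase FIR band-stop filter."""
    n_out = int(eeg.shape[1] * fs_out / fs_in)
    eeg = resample(eeg, n_out, axis=1)             # down-sample to 256 Hz
    eeg = eeg - eeg.mean(axis=0, keepdims=True)    # common average reference
    notch = firwin(513, [49.5, 50.5], fs=fs_out, pass_zero="bandstop")
    return filtfilt(notch, [1.0], eeg, axis=1)     # zero-phase filtering
```

Zero-phase filtering (`filtfilt`) avoids shifting epoch timing, which matters later when feature indices are mapped back to post-onset latencies.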

D. Feature Extraction
The discrete Gabor transform [21,22] was used to generate features. The original data can be reconstructed from the features with no information loss, and each Gabor feature carries both time and frequency information. In this study, a time step of 0.015625 s (4 time samples) and a frequency band of 2 Hz are used, so a one-second epoch from one channel (256 time samples) is converted into a 64×64 feature matrix. Figure 2 presents the definition of the discrete Gabor transform. This method of feature extraction makes it possible to identify the type of neural activity from the indices of the features used in classification.
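The stated dimensions (4-sample hop, 2 Hz bands, 64×64 output) can be reproduced with a Gaussian-windowed short-time Fourier sketch; this is an illustrative stand-in under assumed window parameters, not the paper's exact Gabor implementation:

```python
import numpy as np

def gabor_features(x, hop=4, win_len=128, n_bands=64):
    """Illustrative time-frequency features for a one-second epoch of
    256 samples at 256 Hz: a Gaussian window slides in 4-sample hops
    (15.625 ms), and the FFT of each 128-sample frame yields 2 Hz bins,
    of which the first 64 are kept -> a 64x64 magnitude matrix."""
    n = len(x)                                         # 256 samples
    t = np.arange(win_len) - win_len // 2
    window = np.exp(-0.5 * (t / (win_len / 6)) ** 2)   # Gaussian window
    frames = []
    for start in range(0, n, hop):                     # 64 hops
        idx = (start + np.arange(win_len)) % n         # circular indexing
        spectrum = np.fft.rfft(x[idx] * window)
        frames.append(np.abs(spectrum[:n_bands]))      # keep 64 x 2 Hz bands
    return np.array(frames)                            # (time, frequency)

feats = gabor_features(np.random.randn(256))
```

Because each feature index maps directly to a (time step, frequency band) pair, a selected feature can be interpreted as "activity at ~t ms in the f Hz band", which is what enables the latency/band analysis in the results.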

E. Feature Selection and Classification
The classification true positive rate is estimated by a 5-fold cross-validation process [23]. In each fold, 8 trials per class are used for feature selection and training the classifier, and 2 are set aside for testing. The most valuable features for distinguishing the four classes are identified with the Davies-Bouldin index (DBI) [24]. Initially, all pair-wise DBI matrices are calculated (6 binary combinations of 4 classes). The four-class DBI is a conservative approximation based on the two-class DBIs, as defined in Figure 3. In this experiment, 91% of the total computational cost is spent on generating the DBI matrix; however, the dimensionality of the feature space is significantly reduced. The 3,000 most valuable features (from a total of 81,920) are identified and used to train the LDA classifier. Features with the same indices in the test data are then used to evaluate the classifier's performance.
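A simplified sketch of per-feature DBI ranking. Aggregating the pairwise scores into a conservative four-class score by taking the worst (largest) pairwise value per feature is our assumption; the paper's exact formulation is given in its Figure 3:

```python
import numpy as np
from itertools import combinations

def pairwise_dbi(features, labels):
    """Per-feature two-class Davies-Bouldin scores, aggregated
    conservatively. For each feature and class pair (i, j):
    DBI = (s_i + s_j) / |m_i - m_j|, where m and s are the class mean
    and standard deviation; lower means better separability.
    features: (n_trials, n_features), labels: (n_trials,)."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    stds = {c: features[labels == c].std(axis=0) for c in classes}
    eps = 1e-12                                   # guard against zero gaps
    pair_scores = [
        (stds[a] + stds[b]) / (np.abs(means[a] - means[b]) + eps)
        for a, b in combinations(classes, 2)      # 6 pairs for 4 classes
    ]
    # Conservative four-class score: worst pairwise DBI per feature.
    return np.max(pair_scores, axis=0)

def select_features(features, labels, k=3000):
    """Indices of the k features with the lowest (best) DBI score."""
    return np.argsort(pairwise_dbi(features, labels))[:k]
```

The selected indices are then used both to train the classifier and to slice the same columns out of the held-out test trials.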

III. RESULTS

A. Classification Accuracy
The true positive rates for one class vs. all are estimated as the mean and standard deviation over the five-fold cross-validation process. Table 1 contains these results for the four participants and both types of cognitive task. Note that the objective of this study is not to maximize classification accuracy; rather, the experimental protocol and analysis pipeline provide identical environments for both paradigms while identifying the most important activities related to the selected features. With four classes, the classification accuracy is significantly higher than chance level for both paradigms, suggesting that the results are meaningful, despite being imperfect.

B. Time-frequency distribution of best features
The 60K most valuable features identified in the motor imagery experiments are shown in a cumulative joint time-frequency plot of the feature space in Figure 4. As expected, valuable class-dependent activity is not limited to the Alpha and Beta bands. In addition, the nominal bandwidth of 1–125 Hz given by Enobio is confirmed, as valuable features are identified across the entire frequency range. For motor imagery tasks in this experiment, 15.1% of the most valuable features are significantly concentrated within the 0.73–0.875 s range. This time period corresponds to performing imagined movements and the suppression of the Primary Motor Cortex (stopping actual movements) via goal-driven executive control. Such executive control involves high-frequency cognitive activity in brain regions such as the Superior Parietal Cortex and the Pre-Frontal Cortex [1,2]. Of the most valuable features, 23.2% are within the Alpha and Beta bands (MI); the remaining 76.8% are in the Gamma and high-Gamma bands (cognitive functions). This suggests that in motor imagery tasks, cognitive functions generate significantly more class-dependent activity than the imagined movement itself.

Figure 5 presents the cumulative joint time-frequency plot of the feature space containing the 60K most valuable features identified in the covert speech experiments. 48.8% of these features are above 70 Hz, which corresponds to the linguistic processing functions [8]. These linguistic functions, which are entirely class-dependent, do not exist in motor imagery. This provides a possible explanation for the higher classification accuracy of covert speech tasks (82.5%) compared to motor imagery tasks (77.2%) in an identical environment, considering there is a direct positive correlation (R = 0.8822, P ≈ 0) between their performances.
Considering that tasks are identified before trials begin, the cognitively demanding linguistic functions (conceptual preparation, Lemma selection) are completed before onset. The linguistic functions occurring within trials (phonological code retrieval, syllabification) are performed automatically by the brain [9] and require no user effort. All other cognitive functions within trials (executive control, imagined movement) are also present in MI tasks. As a result, the cognitive effort of using covert speech tasks and MI tasks is virtually identical in this study.

IV. DISCUSSION
The linguistic processing stages of word production prior to articulation, which are entirely class-dependent, consist of conceptual preparation, Lemma selection, phonological code retrieval, and syllabification [25]. By incorporating differences in both meaning and phonetic structure when selecting covert speech tasks, class separability can be significantly enhanced.
In this experiment, linguistic class separability is maximized by selecting phonetically dissimilar covert speech classes [13]. This explains the superior performance of covert speech tasks compared to MI tasks in the otherwise identical environment designed in this study.
Linguistic studies using intra-cranial implants have demonstrated that these linguistic processing stages have high-Gamma signatures in the 70–170 Hz range [10, 26–29]. As the bandwidth of EEG systems increases and EMG-removal algorithms become more reliable, covert speech BCIs will become much more capable. Although other BCI paradigms (such as MI) will also improve, language, as the most intuitive and natural form of human communication, would logically be the preferred paradigm for a BCI.