Comparative analysis of various machine learning techniques for classification of speech disfluencies
Article
Sharma, N., Kumar, V., Mahapatra, P. and Gandhi, V. 2023. Comparative analysis of various machine learning techniques for classification of speech disfluencies. Speech Communication. 150, pp. 23-31. https://doi.org/10.1016/j.specom.2023.04.003
Type | Article |
---|---|
Title | Comparative analysis of various machine learning techniques for classification of speech disfluencies |
Authors | Sharma, N., Kumar, V., Mahapatra, P. and Gandhi, V. |
Abstract | Speech plays a vital role in communication; from expressing oneself to using speech-based platforms, speech is a necessity. Any disruption in speech is referred to as a disfluency and can impact one’s quality of life. This paper presents an experimental study of various techniques for the detection and classification of speech disfluencies. Six types of disfluency are examined, namely Interjection, Sound Repetition, Word Repetition, Phrase Repetition, Revision and Prolongation (6 classes). The paper also goes a step further by including clean speech signals as an added class alongside the six disfluencies, thereby making this work more robust with 7 classes. Various machine learning approaches have been investigated on the University College London Archive of Stuttered Speech (UCLASS) dataset, a standard disfluency dataset generated by University College London (UCL). Five feature extraction techniques have been used: Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Mel-filterbank energy features, and Spectrograms. Comparative analysis of various classifiers shows that MFCC, GFCC, and Spectrograms achieved greater than 90% accuracy on both 6 and 7 classes with the kNN classifier. In future work, the authors aim to tackle the challenge of detecting multiple disfluencies present simultaneously in a speech sample. |
Keywords | Disfluency; Speech Recognition; Feature Extraction; Speech Signals |
Sustainable Development Goals | 9 Industry, innovation and infrastructure |
Middlesex University Theme | Health & Wellbeing |
Publisher | Elsevier |
Journal | Speech Communication |
ISSN | 0167-6393 |
Publication dates | 23 Apr 2023 (online); May 2023 |
Publication process dates | |
Submitted | 07 Nov 2022 |
Accepted | 22 Apr 2023 |
Deposited | 06 Nov 2023 |
Output status | Published |
Accepted author manuscript | 2023-Comparative Analysis of Various Feature Extraction Techniques_FinalSubmittedAccepted_Speech.pdf |
Copyright Statement | © 2023. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ |
Digital Object Identifier (DOI) | https://doi.org/10.1016/j.specom.2023.04.003 |
Language | English |
https://repository.mdx.ac.uk/item/vxy13