A Questionnaire-Based Ensemble Learning Model to Predict the Diagnosis of Vertigo: Model Development and Validation Study

doi:10.2196/34126

Original Paper

¹Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, Shanghai, China

²Nursing Department, Eye & ENT Hospital, Fudan University, Shanghai, China

³Department of Information Management and Information Systems, Fudan University, Shanghai, China

⁴State Key Laboratory of Medical Neurobiology and Ministry of Education Frontiers Center for Brain Science, Fudan University, Shanghai, China

⁵National Health Commission Key Laboratory of Hearing Medicine, Fudan University, Shanghai, China

⁶Institutes of Brain Science and the Collaborative Innovation Center for Brain Science, Fudan University, Shanghai, China

⁷Department of Otorhinolaryngology-Head and Neck Surgery, The Second Affiliated Hospital of Anhui Medical University, Hefei, China

⁸Department of Otolaryngology-Head and Neck Surgery, The First Affiliated Hospital, Medical College, Xiamen University, Xiamen, China

⁹Department of Otolaryngology-Head and Neck Surgery, Shengjing Hospital of China Medical University, Shenyang, China

¹⁰Department of Otolaryngology, Shanghai Pudong Hospital, Shanghai, China

¹¹Department of Otolaryngology, Shenzhen Second People’s Hospital, Shenzhen, China

¹²Department of Otolaryngology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China

¹³Institutes of Biomedical Sciences, Fudan University, Shanghai, China

*these authors contributed equally

Corresponding Author:

Huawei Li, MD, PhD

Department of Otorhinolaryngology

Eye & ENT Hospital

Fudan University

Room 611, Building 9

No. 83, Fenyang Road, Xuhui District

Shanghai, 200031

China

Phone: 86 021 64377134 ext 2669

Email: hwli@shmu.edu.cn

Background: Questionnaires have been used in the past 2 decades to predict the diagnosis of vertigo and assist clinical decision-making. A questionnaire-based machine learning model is expected to improve the efficiency of diagnosis of vestibular disorders.

Objective: This study aims to develop and validate a questionnaire-based machine learning model that predicts the diagnosis of vertigo.

Methods: In this multicenter prospective study, patients presenting with vertigo entered a consecutive cohort at their first visit to the ENT and vertigo clinics of 7 tertiary referral centers from August 2019 to March 2021, with a follow-up period of 2 months. All participants completed a diagnostic questionnaire after eligibility screening. Patients who received only 1 final diagnosis from their treating specialists for their primary complaint were included in model development and validation. The data of patients enrolled before February 1, 2021, were used for modeling and cross-validation, while patients enrolled afterward entered external validation.

Results: A total of 1693 patients were enrolled, with a response rate of 96.2% (1693/1760). The median age was 51 (IQR 38-61) years, with 991 (58.5%) females; 1041 (61.5%) patients received the final diagnosis during the study period. Among them, 928 (54.8%) patients were included in model development and validation, and 113 (6.7%) patients who enrolled later were used as a test set for external validation. They were classified into 5 diagnostic categories. We compared 9 candidate machine learning methods, and the recalibrated model of light gradient boosting machine achieved the best performance, with an area under the curve of 0.937 (95% CI 0.917-0.962) in cross-validation and 0.954 (95% CI 0.944-0.967) in external validation.

Conclusions: The questionnaire-based light gradient boosting machine was able to predict common vestibular disorders and assist decision-making in ENT and vertigo clinics. Further studies with a larger sample size and the participation of neurologists will help assess the generalization and robustness of this machine learning method.

J Med Internet Res 2022;24(8):e34126


Keywords

vestibular disorders; machine learning; diagnostic model; vertigo; ENT; questionnaire

Dizziness and vertigo are the major complaints of patients with vestibular disorders, with an estimated lifetime prevalence of dizziness (including vertigo) of 15%-35% [1]. Dizziness and vertigo are incapacitating and considerably impact patients’ quality of life. These conditions often lead to activity restriction and are closely associated with psychiatric disorders such as anxiety, phobic, and somatoform disorders [1-3]. Patients with dizziness and vertigo are also at a higher risk of falls and fall-related injuries, especially older people [4]. However, the diagnosis of vestibular disorders is challenging and time-consuming. It involves a variety of vestibular and neurological causes and complex pathological processes, leading to misdiagnosis and potentially widespread overuse of imaging among vertiginous patients [5-8]. Consequent delays in diagnosis can worsen the functional and psychological consequences of the disease.

The application of artificial intelligence in diagnosing dizziness and vertigo dates back more than 30 years. Expert systems such as Vertigo [9], Carrusel [10], and One [11] consist of knowledge bases with fixed diagnostic rules. They make inferences through nonadaptive algorithms that cannot learn from patient data. Different machine learning algorithms, including genetic algorithms, neural networks, Bayesian methods, k-nearest neighbors, and support vector machines, have also been employed to analyze patient data from One [12-16]. The predictive accuracy was 90%-97% for 6 common otoneurologic diagnoses and 76.8%-82.4% for 9 diagnostic categories. EMBalance, a comprehensive platform launched in 2015, uses ensemble learning methods based on decision trees (Adaptive Boosting) to assist in the diagnosis and treatment of balance disorders and in monitoring their evolution [17,18]. There has been a shift from purely knowledge-driven to data-driven methodology in computer-aided diagnosis of vestibular disorders.

Except for Vertigo, all of the models mentioned above are based on patients’ medical history and examinations combined with necessary tests, whereas in practice, patient history alone provides important clues to the possible diagnosis and further evaluation [19]. Numerous questionnaires for dizziness and vertigo have emerged during the past 2 decades to assist the clinical diagnosis of vestibular disorders [20-27]. Most of these studies used simple statistical models, typically logistic regression, validated on the same data used for modeling [26-28]. Few studies have applied machine learning algorithms, and the accuracy of those models was lower than that of simple statistical models owing to small data sets or an inappropriate choice of modeling data [29,30].

This study is part of the Otogenic Vertigo Artificial Intelligence Research (OVerAIR) study, in which the overarching purpose is to build a comprehensive platform that integrates diagnosis, treatment, rehabilitation, and follow-up in a cohort of patients with otogenic vertigo by using artificial intelligence. The specific aims of this study include developing and verifying a diagnostic platform for vertigo and assisting clinical decision-making by using machine learning techniques and further exploring the effectiveness and clinical utility of the proposed platform.

Study Design

Patients presenting with a new complaint of vertigo or dizziness according to the classification of vestibular symptoms by the Barany Society [31] were enrolled consecutively from the ENT and vertigo clinics of Eye & ENT Hospital of Fudan University, The Second Affiliated Hospital of Anhui Medical University, The First Affiliated Hospital of Xiamen University, Shengjing Hospital of China Medical University, Shanghai Pudong Hospital, Shenzhen Second People’s Hospital, and The First Affiliated Hospital of Chongqing Medical University from August 2019 through March 2021. At their first visit with an ENT specialist, patients completed the electronic version of the questionnaire on a tablet or smartphone after giving informed consent. Those who were unable to read and complete the questionnaire by themselves answered the questions as they were read aloud by the researchers. We did not interfere with the patients’ routine medical care. Follow-up visits were scheduled as the specialist considered necessary; therefore, there was no fixed follow-up schedule.

Ethics Approval

This study was approved by the Institutional Review Boards of all participating centers (approval 2019091). This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis reporting guidelines [32].

Outcomes

Each patient underwent routine history taking followed by complete otoneurological examinations, and further workup (ie, pure tone audiometry, vestibular testing, computed tomography, and magnetic resonance imaging) was prescribed when necessary. The clinical diagnosis given by ENT specialists with more than 5 years of clinical experience, who were blinded to questionnaire responses, was used as the reference diagnosis. The reference diagnostic standards included the practice guidelines for benign paroxysmal positional vertigo (BPPV) by the American Academy of Otolaryngology-Head and Neck Surgery [33] and the diagnostic criteria for vestibular disorders (including vestibular migraine [34], Meniere disease [35], persistent postural-perceptual dizziness [36], vestibular paroxysmia [37], and bilateral vestibulopathy [38]) by the Barany Society. Patients with typical clinical features who did not meet the criteria for a definite diagnosis were given a probable diagnosis. Patients without a specific diagnosis within 2 months, or who stopped attending visits before reaching a final diagnosis, were labeled undetermined.

Questionnaire Development

The diagnostic questionnaire was developed through an iterative process that mainly consisted of the following 3 stages.

Focus group and panel meeting: First, a focus group discussion and 3 follow-up panel meetings were convened to identify the peripheral vestibular disorders commonly seen in ENT clinics. In this process, 16 disorders were identified, and the characteristic manifestations of each disorder were listed. The literature was searched for diagnostic and practice guidelines for each disorder, and the pertinent ones were carefully reviewed. An initial questionnaire of 43 items was then drafted.
Patient interview: Fifteen patients who presented with vertigo at our ENT clinic were interviewed about how understandable the questionnaire was and how easy it was to complete. Two patients reported that it was too long and time-consuming. Another 3 complained of being asked too many questions, such as those on heart disease and medications taken, that seemed unrelated to their vertigo. At this stage, the wording of the questionnaire was thoroughly simplified and 6 questions were deleted.
Expert group meeting: At a national conference, 12 experts (from ENT, neurology, vestibular examination, and rehabilitation) were invited to evaluate the suitability and clarity of the questionnaire, and they put forward suggestions for further revision. During this process, the items were reordered and some were combined or omitted.

Statistical Analysis

We compared 9 candidate machine learning methods to identify the one with the best performance. Five non–ensemble learning algorithms were considered, namely, decision tree [39], ridge regression [40], logistic regression (with L2 regularization) [41], support vector classification [42], and support vector classification with stochastic gradient descent [43]. Ensemble learning is a general meta-approach that improves predictive performance by combining the predictions of multiple models. Four ensemble learning methods were implemented, namely, random forest [44], Adaptive Boosting [45], gradient boosting decision tree [46], and light gradient boosting machine (LGBM) [47]. We used bootstrapped cross-validation, in which the data were randomly split into training and validation sets at a ratio of 7:3; this was repeated 100 times with replacement [48]. Models were trained on the training set and evaluated on their predictions for the validation set. The best model was selected and tuned based on the average prediction performance over the 100 validation sets. The area under the curve (AUC) was used to evaluate model performance. In multiclass prediction, sensitivity, specificity, likelihood ratios, and AUC were calculated through a one-vs-rest scheme (microaverage). Recalibration was then performed using calibration curves [49] and Brier scores [50] to reduce the discrepancy between the predicted probability and the observed proportion of each diagnostic category. External validation was performed using the data of the newest patients in the cohort (enrolled during the last 2 months), which constituted the test set. The 95% CIs of all metrics were calculated through bootstrapping.
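The repeated 7:3 resampling described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: `fit_and_score` is a hypothetical callback that trains a model on the training split and returns a validation metric (eg, AUC), and we read "repeated 100 times with replacement" as 100 independent random splits. Reporting the 2.5th and 97.5th percentiles of the repeated scores is one common way to form a bootstrap-style 95% interval; the text does not state which interval method was used.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def repeated_split_validation(
    data: Sequence,
    fit_and_score: Callable[[List, List], float],
    n_repeats: int = 100,
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Repeat random train/validation splits and summarize one metric.

    Returns (mean, 2.5th percentile, 97.5th percentile) over n_repeats splits.
    fit_and_score is a hypothetical callback: (train, valid) -> metric.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(data))
    scores = []
    for _ in range(n_repeats):
        # A fresh permutation per repeat: splits are drawn independently, so a
        # patient may be in training for one repeat and validation for another.
        order = list(range(len(data)))
        rng.shuffle(order)
        train = [data[i] for i in order[:n_train]]
        valid = [data[i] for i in order[n_train:]]
        scores.append(fit_and_score(train, valid))
    # 39 cut points at 1/40, ..., 39/40: the first is 2.5%, the last 97.5%.
    cuts = statistics.quantiles(scores, n=40, method="inclusive")
    return statistics.fmean(scores), cuts[0], cuts[-1]
```

With the 928 development patients, each training split holds round(0.7 × 928) = 650 patients and each validation split the remaining 278.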

Missing values of Boolean variables were imputed with False in the main results, and a sensitivity analysis was conducted by comparing different imputation strategies (ie, no imputation or imputation with True). All machine learning algorithms were implemented in Python, and the code is available in online resources. Hyperparameters were set to their defaults in the state-of-the-art machine learning package scikit-learn (sklearn).
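The three imputation strategies compared in the sensitivity analysis can be illustrated with a small encoder. This is a hypothetical sketch: the feature names are made up, and questionnaire answers are assumed to arrive as True, False, or None (missing). Leaving missing values as NaN corresponds to the "without imputation" arm, which suits learners such as LightGBM that handle missing values natively.

```python
from typing import Dict, List, Optional

# Hypothetical record type: questionnaire item name -> True / False / None.
Record = Dict[str, Optional[bool]]


def impute_booleans(records: List[Record], fill: Optional[bool]) -> List[Dict[str, float]]:
    """Encode Boolean answers as floats under one imputation strategy.

    fill=False -> missing answers become 0.0 (strategy in the main results)
    fill=True  -> missing answers become 1.0 (sensitivity analysis)
    fill=None  -> missing answers stay NaN (no imputation)
    """
    nan = float("nan")
    encoded = []
    for record in records:
        row = {}
        for item, answer in record.items():
            if answer is None:
                answer = fill
            row[item] = nan if answer is None else float(answer)
        encoded.append(row)
    return encoded
```

For example, `impute_booleans([{"tinnitus": None}], fill=False)` yields `[{"tinnitus": 0.0}]`.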

Robustness and Sample Size Analysis

Because the model is a data-driven approach to support clinical diagnosis, it is necessary to verify that the sample size is sufficient for model development and validation. Following Riley [51] and Riley et al [52], we quantified the sufficiency of the sample size in terms of the global shrinkage factor and the minimum required number of samples. The sample size was considered sufficient if the shrinkage factor exceeded 0.9. Further, given an acceptable shrinkage factor (eg, 0.9), the sample size needed to develop a prediction model can be estimated from the Cox-Snell R² (the proportion of explained variance).
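The two quantities in this criterion can be computed as below. This is a sketch based on the formulas in Riley et al's sample size guidance for binary outcomes: the expected (van Houwelingen) shrinkage S = 1 + p / (n ln(1 − R²_CS,app)) and the minimum sample size n = p / ((S − 1) ln(1 − R²_CS,adj / S)), where p is the number of candidate predictor parameters and R²_CS is the Cox-Snell R². Applying them separately to each one-vs-rest diagnostic category, and the example values of p and R²_CS, are illustrative assumptions rather than the study's inputs.

```python
import math


def expected_shrinkage(n: int, p: int, r2_cs_app: float) -> float:
    """Expected global shrinkage factor for n samples and p parameters.

    r2_cs_app is the apparent Cox-Snell R-squared (0 < r2_cs_app < 1).
    """
    return 1 + p / (n * math.log(1 - r2_cs_app))


def min_sample_size(p: int, r2_cs_adj: float, shrinkage: float = 0.9) -> int:
    """Smallest n whose expected shrinkage reaches the target factor.

    r2_cs_adj is the anticipated (optimism-adjusted) Cox-Snell R-squared.
    """
    n = p / ((shrinkage - 1) * math.log(1 - r2_cs_adj / shrinkage))
    return math.ceil(n)
```

For example, with 20 predictor parameters and an anticipated Cox-Snell R² of 0.3, `min_sample_size(20, 0.3)` returns 494, and `expected_shrinkage(494, 20, 0.3 / 0.9)` is just above 0.9.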

Moreover, the greater flexibility of modern machine learning techniques implies that larger sample sizes may be required for reliable estimation than for classical methods such as logistic regression. Thus, we followed the approach of van der Ploeg et al [53] to evaluate the sensitivity of our best model, LGBM, to sample size. Training sets of different sizes were subsampled from the development set; subsampling at each size was repeated 30 times to reduce random variation, and performance was measured as the average AUC on the test set.
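The subsampling scheme can be sketched as a small loop. `fit_and_auc` is a hypothetical callback that trains a model (eg, LGBM) on the subsample and returns its AUC on the fixed test set; the list of training sizes is left to the user, as the text does not specify it.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence


def sample_size_curve(
    development: Sequence,
    test: Sequence,
    sizes: Sequence[int],
    fit_and_auc: Callable[[List, Sequence], float],
    repeats: int = 30,
    seed: int = 0,
) -> Dict[int, float]:
    """Mean test-set AUC for training sets subsampled at each size."""
    rng = random.Random(seed)
    pool = list(development)
    curve = {}
    for size in sizes:
        # Each repeat draws a new subsample (without replacement within it),
        # so the mean over repeats smooths out which patients were drawn.
        aucs = [fit_and_auc(rng.sample(pool, size), test) for _ in range(repeats)]
        curve[size] = statistics.fmean(aucs)
    return curve
```

If the mean AUC plateaus well before the full development set size, the sample is likely large enough for this model.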

Important Variables

To measure variable importance, we first evaluated multivariable feature importance based on information gain in cross-validation and selected the top 20 variables. Then, to determine feature importance within individual diagnostic categories, each selected variable was used alone to predict the 5 diagnostic categories, and univariate variable importance was measured in terms of AUC.
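The univariate step can be sketched with a rank-based (Mann-Whitney) AUC that measures how well a single variable separates each category from all others. The function names are hypothetical. A variable inversely associated with a category yields an AUC below 0.5; the text does not say whether such values were reflected (1 − AUC).

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of positive-negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a correct pair.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def univariate_importance(
    feature: Sequence[float], diagnoses: Sequence[str], categories: List[str]
) -> Dict[str, float]:
    """AUC of one variable used alone, one-vs-rest, for each category."""
    return {
        category: auc(feature, [1 if d == category else 0 for d in diagnoses])
        for category in categories
    }
```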

Overview of the Diagnostic Questionnaire

The final questionnaire consists of 23 items that incorporated branching logic. The full version of the questionnaire is available in Multimedia Appendix 1. The contents of the items are shown in Textbox 1.


One question on the characteristics of the symptom: was the head spinning or not? If not, the type of dizziness needs to be specified (heavy/muddled head, staggering, or other)
Three questions on the frequency and duration of attacks and the time since the first vertigo attack
One question on the condition of hearing loss, that is, which side and how it changes
Three questions on tinnitus, aural fullness, and earache, that is, which side and whether the symptom changes around the attack (aggravated before/during the attack, relieved after the attack)
One question on the presence of headache, specifically the time of headache attack and relevant family history
One question on accompanied photophobia or phonophobia
One question on unsteadiness during, after, or without vertigo attacks
One question on whether symptoms worsen when standing or walking
Two questions on falls, the state of consciousness, and whether there was incontinence during the attack
Five questions on the triggering factors of vertigo, that is, lying down, turning over, getting up quickly, holding breath, loud stimulation, in some special scenes, special foods or smells, fatigue, insomnia, and getting angry
One question on whether it is cervical vertigo, that is, upper limb numbness and pain or neck pain
One question on prodrome, that is, cold, fever, and diarrhea before onset
One question on the medical history of otological disorders, that is, otorrhea, otitis media, ear surgery
One question on head and neck trauma and surgery history

Textbox 1. Items in the diagnostic questionnaire.

Demographic Characteristics of the Participants

A prospective cohort of 1693 patients was enrolled from the ENT and vertigo clinics of 7 participating centers (Table 1). The response rate was 96.2% (1693/1760; 67 declined participation). Of the 1693 enrolled patients, 1041 (61.5%) received 1 final diagnosis from the treating specialists, 14 (0.8%) had more than one diagnosis, 145 (8.6%) had a probable diagnosis, and the remaining 493 (29.1%) did not receive a final diagnosis within 2 months. The final diagnoses were unevenly distributed. The most common diagnoses were BPPV, vestibular migraine, sudden sensorineural hearing loss with vestibular dysfunction (SSNHL-V), and Meniere disease. Because only patients with 1 final diagnosis were eligible, 1041 patients (median age 50 [IQR 38-61] years; 608 [58.4%] females) in 5 diagnostic categories were included in model development and validation. Less frequent diagnoses with no more than 20 cases were grouped as “others” for the time being because there were too few cases to form separate categories.

Of the 1041 patients, 928 were assigned to the training set (for modeling and cross-validation) and 113 to the test set; the characteristics of both sets are described in Table 2. Figure 1 shows the study flowchart.

Table 1. Demographic characteristics of the participants (N=1693).

Characteristic		Value
Age (years), median (IQR)		51 (38-61)
Sex, n (%)
	Female	991 (58.5)
	Male	702 (41.5)
Diagnoses, n (%)
	Benign paroxysmal positional vertigo	398 (23.5)
	Vestibular migraine	203 (12)
	Meniere disease	194 (11.5)
	Sudden sensorineural hearing loss with vestibular dysfunction	173 (10.2)
	Others^a	73 (4.3)
	Multiple diagnosis	14 (0.8)
	Probable diagnosis	145 (8.6)
	Undetermined	493 (29.1)

^aThis category included vestibular neuritis, persistent postural-perceptual dizziness, psychogenic dizziness, delayed endolymphatic hydrops, vestibular paroxysmia, cervicogenic vertigo, acoustic neuroma, presbyvestibulopathy, light cupula, Ramsay-Hunt syndrome, labyrinthine fistula, and superior semicircular canal dehiscence syndrome.

Table 2. Characteristics of the training data set and test set.

Characteristic		Training set (n=928)	Test set (n=113)
Age (years), median (IQR)		50 (37-60)	53 (41-63)
Sex, n (%)
	Female	536 (57.8)	72 (63.7)
	Male	392 (42.2)	41 (36.3)
Diagnoses, n (%)
	Benign paroxysmal positional vertigo	348 (37.5)	50 (44.2)
	Vestibular migraine	182 (19.6)	21 (18.6)
	Meniere disease	168 (18.1)	26 (23)
	Sudden sensorineural hearing loss with vestibular dysfunction	164 (17.7)	9 (8)
	Others^a	66 (7.1)	7 (6.2)

Figure 1. Patients with a new vertigo or dizziness complaint were screened between August 2019 and March 2021. Diagnoses were recorded within 2 months of follow-up.

Development and Validation of the Model

The LGBM model had the highest AUC (0.935, 95% CI 0.913-0.960) and the lowest Brier score (0.057, 95% CI 0.047-0.067) among the 9 models in cross-validation (Table 3). Therefore, it was recalibrated (AUC 0.937, 95% CI 0.917-0.962) and used as the final predictive model.
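The two selection metrics can be sketched as follows. Micro-averaging pools every (patient, category) pair into one binary problem before computing the AUC. The text does not state its multiclass Brier convention; we assume the squared error is averaged over all (patient, category) pairs, which keeps the score between 0 and 1. Both functions are illustrative, not the study's code.

```python
from typing import Sequence


def micro_ovr_auc(probs: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """One-vs-rest AUC, micro-averaged over all (patient, category) pairs.

    probs[i][k] is the predicted probability that patient i has category k;
    y[i] is the index of patient i's reference diagnosis.
    """
    pos, neg = [], []
    for row, true_class in zip(probs, y):
        for k, p in enumerate(row):
            (pos if k == true_class else neg).append(p)
    # Mann-Whitney AUC on the pooled pairs; ties count as half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def micro_brier(probs: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Squared error against one-hot outcomes, averaged over all pairs."""
    total, count = 0.0, 0
    for row, true_class in zip(probs, y):
        for k, p in enumerate(row):
            total += (p - (1.0 if k == true_class else 0.0)) ** 2
            count += 1
    return total / count
```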

For the sensitivity analysis, when missing values were imputed with the mode (the most frequent label), the performance of all 9 methods declined, with lower AUCs and equal or higher Brier scores (Table 4). Note that LGBM does not rely on imputation because it handles missing values natively; therefore, it can directly use the information carried by missingness to achieve better prediction performance. LGBM without imputation performed as well as the recalibrated LGBM (missing values imputed with 0), which supports the robustness of our method. Ensemble learning methods performed better than non–ensemble learning methods except logistic regression in cross-validation, suggesting that the benefit of ensemble learning for vertigo diagnosis is not limited to a specific ensemble approach. Further, LGBM performed better than the other methods in terms of both AUC and Brier score.

The receiver operating characteristic curves of the recalibrated LGBM model in cross-validation are shown in Figure 2. Table 5 presents the AUC, sensitivity, specificity, likelihood ratios, and accuracy for each diagnostic category in both cross-validation and external validation. The model made highly accurate predictions for SSNHL-V (AUC>0.98, positive likelihood ratio [+LR]>20, negative likelihood ratio [–LR]<0.05), accurate predictions for BPPV and Meniere disease (AUC>0.95, sensitivity>0.8, specificity>0.9, accuracy>0.9, +LR>10, –LR<0.2), and fair discriminative ability for vestibular migraine (AUC 0.9, 95% CI 0.87-0.92). The prediction of other diagnoses was unstable owing to the limited sample size and great heterogeneity of this category, with AUCs ranging from 0.771 to 0.929 in cross-validation and from 0.879 to 0.957 in external validation.

Calibration curves in cross-validation (Figure 3) showed that the model accurately estimated the probability of Meniere disease and vestibular migraine and slightly underestimated the probability of SSNHL-V and BPPV. Predictions for other diagnoses were relatively conservative, as the model rarely gave probabilities close to 0 or 1. The Brier score was 0.058 (95% CI 0.049-0.068) in cross-validation, which suggested that the predicted probabilities agreed well with the actual proportions of the diagnoses. We also applied our method to the external data set. The results indicated that the selected best model, LGBM, generalized well in predicting vertigo diagnoses, achieving an AUC of 0.958 (95% CI 0.951-0.969). LGBM also performed better than the second-best method, logistic regression, which achieved an AUC of 0.939 (95% CI 0.925-0.956) in external validation. The multivariable feature importance in terms of information gain is shown in Table 6.
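A calibration curve for one category compares the mean predicted probability with the observed proportion of that diagnosis within bins of predictions. The sketch below uses 10 equal-width bins, which is an assumption: Figure 3 may have been produced with a different binning or smoothing method, and the recalibration step itself is not shown.

```python
from typing import List, Sequence, Tuple


def calibration_curve(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Equal-width reliability bins for one diagnostic category.

    Returns (mean predicted probability, observed proportion, count)
    for each non-empty bin, in order of increasing probability.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # A probability of exactly 1.0 goes into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(o for _, o in members) / len(members)
            curve.append((mean_p, observed, len(members)))
    return curve
```

Points below the diagonal (observed < predicted) indicate overestimation; points above it indicate underestimation.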

The global shrinkage factor of each diagnostic category and the results of the sensitivity analysis indicated that the sample size of this study was sufficient for model development (see Multimedia Appendix 2 for details of the sample size analysis). To determine feature importance within individual diagnostic categories, each of the top 20 contributing variables in Table 6 was used alone to predict the 5 diagnostic categories, and univariate variable importance was measured in terms of AUC (Figure 4).

Table 3. The prediction performance of candidate algorithms.

Method		Area under the curve (95% CI)	Brier score (95% CI)
Non–ensemble learning
	Decision tree	0.765 (0.726-0.798)	0.125 (0.104-0.146)
	Ridge regression	0.803 (0.780-0.831)	0.087 (0.071-0.104)
	Logistic regression	0.928 (0.907-0.956)	0.060 (0.051-0.069)
	Support vector classification	0.501 (0.499-0.505)	0.239 (0.220-0.258)
	Stochastic gradient descent	0.733 (0.611-0.824)	0.141 (0.083-0.254)
Ensemble learning
	Random forest	0.924 (0.900-0.949)	0.063 (0.056-0.070)
	Adaptive Boosting	0.851 (0.793-0.901)	0.148 (0.144-0.151)
	Gradient boosting decision tree	0.925 (0.902-0.951)	0.064 (0.053-0.076)
	Light gradient boosting machine	0.935 (0.913-0.960)	0.057 (0.047-0.067)
	Recalibrated light gradient boosting machine	0.937 (0.917-0.962)	0.058 (0.049-0.068)

Table 4. Performance of different algorithms while imputing missing data with mode.

Method		Area under the curve (95% CI)	Brier score (95% CI)
Non–ensemble learning
	Decision tree	0.746 (0.690-0.791)	0.137 (0.114-0.169)
	Ridge regression	0.788 (0.733-0.817)	0.096 (0.076-0.121)
	Logistic regression	0.921 (0.900-0.943)	0.067 (0.057-0.082)
	Support vector classification	0.500 (0.500-0.500)	0.240 (0.222-0.258)
	Stochastic gradient descent	0.727 (0.578-0.819)	0.148 (0.090-0.251)
Ensemble learning
	Random forest	0.919 (0.896-0.939)	0.068 (0.061-0.078)
	Adaptive Boosting	0.833 (0.741-0.887)	0.148 (0.143-0.156)
	Gradient boosting decision tree	0.915 (0.888-0.935)	0.073 (0.059-0.093)
	Light gradient boosting machine	0.929 (0.906-0.950)	0.062 (0.055-0.072)
	Light gradient boosting machine (without imputation)	0.935 (0.916-0.956)	0.057 (0.049-0.065)

Figure 2. The receiver operating characteristic curves (solid lines) with 95% CIs (between the 2 dashed lines) for each diagnostic category. The performance of each diagnostic category was evaluated through a one-vs-rest scheme. BPPV: benign paroxysmal positional vertigo; SSNHL-V: sudden sensorineural hearing loss with vestibular dysfunction.

Table 5. Predictive ability in different diagnostic categories.

		AUC^a (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	+LR^b (95% CI)	–LR^c (95% CI)	Accuracy (95% CI)
Benign paroxysmal positional vertigo
	CV^d	0.97 (0.96-0.99)	0.94 (0.87-0.99)	0.92 (0.85-0.97)	13.23 (6.55-29.3)	0.07 (0.01-0.14)	0.92 (0.89-0.95)
	EV^e	0.98 (0.97-0.99)	0.97 (0.92-1)	0.90 (0.83-0.94)	10.23 (5.88-17.92)	0.04 (0-0.09)	0.93 (0.90-0.96)
Vestibular migraine
	CV	0.91 (0.87-0.94)	0.86 (0.76-0.95)	0.85 (0.74-0.95)	6.58 (3.56-13.93)	0.17 (0.07-0.27)	0.85 (0.78-0.92)
	EV	0.9 (0.87-0.92)	0.66 (0.52-0.76)	0.90 (0.85-0.96)	7.38 (4.71-12.05)	0.38 (0.26-0.51)	0.86 (0.82-0.88)
Sudden sensorineural hearing loss with vestibular dysfunction
	CV	0.99 (0.97-1)	0.95 (0.88-1)	0.95 (0.90-0.99)	25.07 (9.39-67.93)	0.05 (0-0.12)	0.95 (0.91-0.98)
	EV	1.00 (1.00-1.00)	1.00 (1.00-1.00)	0.98 (0.97-1.00)	Inf^f (34.67-Inf)	0.00 (0.00-0.00)	0.98 (0.97-1)
Meniere disease
	CV	0.96 (0.93-0.98)	0.92 (0.81-1)	0.90 (0.82-0.96)	10.79 (5.28-22)	0.09 (0-0.21)	0.90 (0.84-0.95)
	EV	0.97 (0.97-0.98)	0.82 (0.69-0.88)	0.98 (0.95-0.99)	Inf (18.4-Inf)	0.19 (0.12-0.31)	0.94 (0.91-0.96)
Others
	CV	0.86 (0.77-0.93)	0.83 (0.66-1)	0.78 (0.55-0.93)	4.44 (2.10-9.77)	0.21 (0-0.44)	0.78 (0.57-0.91)
	EV	0.92 (0.88-0.96)	0.74 (0.50-0.86)	0.90 (0.85-0.94)	7.59 (5.05-12.02)	0.38 (0.26-0.51)	0.89 (0.85-0.93)

^aAUC: area under the curve.

^b+LR: positive likelihood ratio.

^c–LR: negative likelihood ratio.

^dCV: cross-validation.

^eEV: external validation.

^fInf: Positive likelihood ratio was infinity because specificity was 1.
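The per-category metrics in Table 5 follow from a one-vs-rest confusion matrix: +LR = sensitivity / (1 − specificity) and –LR = (1 − sensitivity) / specificity, so +LR becomes infinite when specificity is 1, as footnote f notes. The sketch assumes each patient's predicted label is the category with the highest predicted probability, which the text does not state. Because the reported values are bootstrap estimates, plugging the rounded sensitivities and specificities into these formulas need not reproduce the table exactly.

```python
import math
from typing import Dict, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], category: str
) -> Dict[str, float]:
    """Sensitivity, specificity, likelihood ratios, and accuracy for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        # No false positives means specificity 1, so +LR is unbounded.
        "+LR": sens / (1 - spec) if spec < 1 else math.inf,
        "-LR": (1 - sens) / spec if spec > 0 else math.inf,
        "accuracy": (tp + tn) / len(pairs),
    }
```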

Figure 3. Calibration curves (blue solid lines) with pointwise 95% confidence limits (grey ribbon) on the validation data based on the recalibrated light gradient boosting machine model. BPPV: benign paroxysmal positional vertigo; SSNHL-V: sudden sensorineural hearing loss with vestibular dysfunction.

Table 6. Multivariable feature importance in light gradient boosting machine model.

Variable	Feature importance
Sudden hearing loss	1039.8
Duration of episodes	912.3
Hearing loss	694.8
Time since first onset	468.1
Trigger: getting up, lying down, or rolling over	358.0
Age	255.6
History of headache	250.6
Frequency of attacks	221.4
Fluctuating hearing loss	186.3
Photophobia or phonophobia	185.7
Time since first hearing loss	183.7
Recurring symptoms	155.9
Tinnitus	135.5
Ear fullness	135.4
Headache during attacks	117.7
Aggravated by standing or walking	80.4
Trigger: fatigue, lack of sleep	69.7
Vertigo	65.0
Pain or numbness in the upper limbs	62.4
Unsteadiness during attacks	59.5
Family history of headache	54.2
Male	54.1
Fall	47.3
Loss of consciousness, incontinence	44.6
Tinnitus: aggravated before an attack, alleviated after an attack	36.7
Trigger: visual stimuli	31.0
Trigger: sound and pressure	23.0
Unsteadiness: after first onset	22.4
Prodrome: cold, fever, vomiting, or diarrhea	22.0
Family history of dizziness	17.4
Trigger: certain foods	15.9
Otalgia	11.6
Conscious when falling	9.8
History of otitis media or ear surgery	7.2
Tinnitus: worsens during vertigo	4.5
Fluctuating: gradually worsens	0.0
Unsteadiness between attacks	0.0
Recent history of head and neck trauma or surgery	0.0

Figure 4. Area under the curve in univariate prediction, used as the estimate of variable importance. AUC: area under the curve; BPPV: benign paroxysmal positional vertigo; SSNHL-V: sudden sensorineural hearing loss with vestibular dysfunction.

Principal Findings

In this multicenter prospective cohort study, a questionnaire was developed for the diagnosis of vertigo, and an LGBM model was built using patients’ history data collected through the questionnaire. To our knowledge, this is the first questionnaire-based machine learning model to predict multiple diagnoses of vertigo. Because all the patients in this study were from ENT and vertigo clinics, the distribution of diagnoses differs from that in previous studies conducted in neurology and balance clinics [19-21,26]. Our cohort had a much higher prevalence of SSNHL-V (173/1693, 10.2%) and a lower prevalence of vestibular neuritis (22/1693, 1.3%).

Our model outperformed previously reported questionnaire-based statistical models in predicting common vestibular diagnoses [20,21,26]. A possible explanation is that machine learning methods are better at modeling nonlinear relationships and at mitigating overfitting. Additionally, given the subjectivity of patient-reported history, data-driven models are better suited to questionnaire-based prediction than knowledge-driven models [9,11,54,55]. Compared with previous machine learning diagnostic systems that used comprehensive patient history data, physical examination, and laboratory tests [13-17], our questionnaire-based diagnostic model has several merits. First, medical history provides important clues to the cause of vertigo, on the basis of which the doctor tries to confirm or exclude a presumptive diagnosis. Therefore, a questionnaire-based diagnostic tool can provide early decision support based on patient history and help reduce unnecessary workup. Second, because questionnaire data come directly from patients, the model’s performance does not rely on professionals’ accurate interpretation of patient history. Third, considering the limited accessibility of specific tests (eg, pure tone audiometry, caloric test, video head impulse test), a questionnaire requiring no special equipment is suitable across different clinical settings. However, a questionnaire-based diagnostic model also has intrinsic limitations. Patient-reported medical history can be imprecise because it is easily affected by recall bias, misinterpretation, the patient’s emotional state, and other subjective factors. Moreover, for patients with only nonspecific symptoms, physical examination and laboratory testing are more important diagnostic tools. Patient history should always be combined with objective evidence to make a reliable diagnosis. Therefore, physical examination and laboratory test results should be incorporated into the system in the future to enable a comprehensive, stepwise diagnostic prediction.

Limitations

This study had several limitations. First, the uneven distribution of diagnoses made it difficult for the model to predict rare diagnoses accurately. To reduce potential noise, we included only patients with 1 final diagnosis in modeling, and the exclusion of patients with undetermined diagnoses was a potential source of bias. These patients did not receive a specific diagnosis for several reasons. In some cases, patients with BPPV might have experienced spontaneous remission while waiting for the scheduled positional test and treatment (1-2 weeks later), which may also explain the lower prevalence of BPPV in our cohort than in other ENT clinics [56]. The exclusion of these patients could reduce noise and improve model performance. Besides, some patients experienced only transient symptoms without observable structural, functional, or psychological changes; therefore, no specific diagnosis was given. Moreover, although most patients completed all the necessary examinations during follow-up, some rare causes may not have been determined within 2 months, possibly adding to the imbalance of the data. Nevertheless, as the cohort expands, more patients with rare diagnoses will be included, which should enable the model to predict rare diagnoses more accurately. The influence of imbalanced data can also be managed during modeling. Second, the observed AUC in external validation was higher than that in cross-validation, which may be explained by the relatively small sample size of the test set; more participants with definite diagnoses are needed for further validation. Finally, since this study was conducted in the ENT and vertigo clinics of tertiary centers, the predictive power of the model has yet to be verified in other clinical settings.

Conclusion

This study presents the first questionnaire-based machine learning model for the prediction of common vestibular disorders. Using LGBM, an ensemble learning method, the model achieved strong predictive ability for BPPV, vestibular migraine, Meniere disease, and SSNHL-V. As part of the OVerAIR platform, it can be used to assist clinical decision-making in ENT clinics and to support the remote diagnosis of BPPV. We are also developing a smartphone app that integrates the questionnaire with referral, follow-up, treatment, and rehabilitation to improve health outcomes for patients with vertigo. The next phase of the OVerAIR study will involve neurologists, which is expected to improve the model’s predictive ability for central vertigo and help assess its generalizability and robustness.

Acknowledgments

This study was supported by the Capacity Building Project for interdisciplinary diagnosis and treatment of major diseases (otogenic vertigo), Shanghai Municipal Health Commission, Shanghai Municipal Key Clinical Specialty (shslczdzk00801), General Project of Scientific Research Fund of Shanghai Health Committee (grant 202040286), and the National Natural Science Foundation of China (91846302 and 72033003). The funding sources played no part in the design and conduct of this study; collection, management, analysis, and interpretation of data; writing of the report; or the decision to submit the manuscript for publication.

Authors' Contributions

FY and HD had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. FY, PW, HD, CZ, and HL were responsible for the concept and design of this study. All authors contributed to the acquisition, analysis, or interpretation of data and critical revision of the manuscript for important intellectual content. FY, PW, and HD drafted the manuscript. FY and HD performed the statistical analysis. HL and CZ obtained the funding for this study. FY, PW, HD, JW, HY, CZ, and HL provided administrative, technical, or material support. HL and CZ supervised the study.

Conflicts of Interest

None declared.


Multimedia Appendix 1

The final version of the diagnostic questionnaire for vertigo applied in this study.

DOCX File, 21 KB


Multimedia Appendix 2

Sample size analysis.

DOCX File, 119 KB

Eckhardt-Henn A, Best C, Bense S, Breuer P, Diener G, Tschan R, et al. Psychiatric comorbidity in different organic vertigo syndromes. J Neurol 2008 Mar;255(3):420-428. [CrossRef] [Medline]
Best C, Eckhardt-Henn A, Tschan R, Dieterich M. Psychiatric morbidity and comorbidity in different vestibular vertigo syndromes. Results of a prospective longitudinal study over one year. J Neurol 2009 Jan;256(1):58-65. [CrossRef] [Medline]
Lahmann C, Henningsen P, Brandt T, Strupp M, Jahn K, Dieterich M, et al. Psychiatric comorbidity and psychosocial impairment among patients with vertigo and dizziness. J Neurol Neurosurg Psychiatry 2015 Mar;86(3):302-308. [CrossRef] [Medline]
de Moraes SA, Soares WJDS, Ferriolli E, Perracini MR. Prevalence and correlates of dizziness in community-dwelling older people: a cross sectional population based study. BMC Geriatr 2013 Jan 04;13:4 [FREE Full text] [CrossRef] [Medline]
Grill E, Strupp M, Müller M, Jahn K. Health services utilization of patients with vertigo in primary care: a retrospective cohort study. J Neurol 2014 Aug;261(8):1492-1498. [CrossRef] [Medline]
Ahsan SF, Syamal MN, Yaremchuk K, Peterson E, Seidman M. The costs and utility of imaging in evaluating dizzy patients in the emergency room. Laryngoscope 2013 Sep;123(9):2250-2253. [CrossRef] [Medline]
Bakhit M, Heidarian A, Ehsani S, Delphi M, Latifi SM. Clinical assessment of dizzy patients: the necessity and role of diagnostic tests. Glob J Health Sci 2014 Mar 24;6(3):194-199 [FREE Full text] [CrossRef] [Medline]
Royl G, Ploner CJ, Leithner C. Dizziness in the emergency room: diagnoses and misdiagnoses. Eur Neurol 2011;66(5):256-263. [CrossRef] [Medline]
Mira E, Buizza A, Magenes G, Manfrin M, Schmid R. Expert systems as a diagnostic aid in otoneurology. ORL J Otorhinolaryngol Relat Spec 1990;52(2):96-103. [CrossRef] [Medline]
Gavilán C, Gallego J, Gavilán J. 'Carrusel': an expert system for vestibular diagnosis. Acta Otolaryngol 1990;110(3-4):161-167. [CrossRef] [Medline]
Auramo Y, Juhola M, Pyykkö I. An expert system for the computer-aided diagnosis of dizziness and vertigo. Med Inform (Lond) 1993;18(4):293-305. [CrossRef] [Medline]
Laurikkala JPS, Kentala EL, Juhola M, Pyykkö IV. A novel machine learning program applied to discover otological diagnoses. Scand Audiol Suppl 2001(52):100-102. [CrossRef] [Medline]
Siermala M, Juhola M, Kentala E. Neural network classification of otoneurological data and its visualization. Comput Biol Med 2008 Aug;38(8):858-866. [CrossRef] [Medline]
Miettinen K, Juhola M. Classification of otoneurological cases according to Bayesian probabilistic models. J Med Syst 2010 Apr;34(2):119-130. [CrossRef] [Medline]
Varpa K, Joutsijoki H, Iltanen K, Juhola M. Applying one-vs-one and one-vs-all classifiers in k-nearest neighbour method and support vector machines to an otoneurological multi-class problem. Stud Health Technol Inform 2011;169:579-583. [Medline]
Joutsijoki H, Varpa K, Iltanen K, Juhola M. Machine learning approach to an otoneurological classification problem. 2013 Presented at: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); July 3-7, 2013; Osaka, Japan p. 1294-1297.
Exarchos T, Rigas G, Bibas A, Kikidis D, Nikitas C, Wuyts F, et al. Mining balance disorders' data for the development of diagnostic decision support systems. Comput Biol Med 2016 Oct 01;77:240-248. [CrossRef] [Medline]
Exarchos T, Bellos C, Bakola I, Kikidis D, Bibas A, Koutsouris D, et al. Management and modeling of balance disorders using decision support systems: the EMBALANCE project. In: GeNeDis 2014: Adv Exp Med Biol. Cham: Springer; 2015:61-67.
Zhao JG, Piccirillo JF, Spitznagel EL, Kallogjeri D, Goebel JA. Predictive capability of historical data for diagnosis of dizziness. Otol Neurotol 2011 Feb;32(2):284-290 [FREE Full text] [CrossRef] [Medline]
Bayer O, Warninghoff J, Straube A. Diagnostic indices for vertiginous diseases. BMC Neurol 2010 Oct 25;10:98 [FREE Full text] [CrossRef] [Medline]
Friedland DR, Tarima S, Erbe C, Miles A. Development of a Statistical Model for the Prediction of Common Vestibular Diagnoses. JAMA Otolaryngol Head Neck Surg 2016 Apr;142(4):351-356 [FREE Full text] [CrossRef] [Medline]
Imai T, Higashi-Shingai K, Takimoto Y, Masumura C, Hattori K, Inohara H. New scoring system of an interview for the diagnosis of benign paroxysmal positional vertigo. Acta Otolaryngol 2016;136(3):283-288. [CrossRef] [Medline]
Lapenna R, Faralli M, Del Zompo MR, Cipriani L, Mobaraki PD, Ricci G. Reliability of an anamnestic questionnaire for the diagnosis of benign paroxysmal positional vertigo in the elderly. Aging Clin Exp Res 2016 Oct;28(5):881-888. [CrossRef] [Medline]
Chen W, Shu L, Wang Q, Pan H, Wu J, Fang J, et al. Validation of 5-item and 2-item questionnaires in Chinese version of Dizziness Handicap Inventory for screening objective benign paroxysmal positional vertigo. Neurol Sci 2016 Aug;37(8):1241-1246. [CrossRef] [Medline]
Li L, Qi X, Liu J, Wang Z. Formulation and evaluation of diagnostic questionnaire for benign paroxysmal positional vertigo. J Neurol Sci 2017 Oct;381:148-149. [CrossRef]
Roland L, Kallogjeri D, Sinks B, Rauch S, Shepard N, White J, et al. Utility of an Abbreviated Dizziness Questionnaire to Differentiate Between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study. Otol Neurotol 2015 Dec;36(10):1687-1694 [FREE Full text] [CrossRef] [Medline]
Kim H, Song J, Zhong L, Yang X, Kim J. Questionnaire-based diagnosis of benign paroxysmal positional vertigo. Neurology 2019 Dec 30;94(9):e942-e949. [CrossRef]
Britt CJ, Ward BK, Owusu Y, Friedland D, Russell JO, Weinreich HM. Assessment of a Statistical Algorithm for the Prediction of Benign Paroxysmal Positional Vertigo. JAMA Otolaryngol Head Neck Surg 2018 Oct 01;144(10):883-886 [FREE Full text] [CrossRef] [Medline]
Richburg H, Povinelli R, Friedland D. Direct-to-patient survey for diagnosis of benign paroxysmal positional vertigo. 2018 Dec Presented at: 17th IEEE International Conference on Machine Learning and Applications; December; Orlando, FL, USA p. 332-337.
Masankaran L, Viyanon W, Mahasittiwat V. Classification of benign paroxysmal positioning vertigo types from dizziness handicap inventory using machine learning techniques. 2018 Presented at: International Conference on Intelligent Informatics and Biomedical Sciences; 2018; Bangkok, Thailand p. 209-214.
Bisdorff A, Von Brevern M, Lempert T, Newman-Toker DE. Classification of vestibular symptoms: Towards an international classification of vestibular disorders. J Vestib Res 2009 Oct 01;19(1-2):1-13. [CrossRef]
Collins GS, Reitsma JB, Altman DG, Moons KGM, TRIPOD Group. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. The TRIPOD Group. Circulation 2015 Jan 13;131(2):211-219 [FREE Full text] [CrossRef] [Medline]
Bhattacharyya N, Gubbels SP, Schwartz SR, Edlow JA, El-Kashlan H, Fife T, et al. Clinical Practice Guideline: Benign Paroxysmal Positional Vertigo (Update). Otolaryngol Head Neck Surg 2017 Mar;156(3_suppl):S1-S47. [CrossRef] [Medline]
Lempert T, Olesen J, Furman J, Waterston J, Seemungal B, Carey J, et al. Vestibular migraine: Diagnostic criteria. J Vestib Res 2012 Nov 01;22(4):167-172. [CrossRef]
Lopez-Escamez JA, Carey J, Chung W, Goebel JA, Magnusson M, Mandalà M, et al. Diagnostic criteria for Menière's disease. J Vestib Res 2015 Mar 01;25(1):1-7. [CrossRef]
Staab JP, Eckhardt-Henn A, Horii A, Jacob R, Strupp M, Brandt T, et al. Diagnostic criteria for persistent postural-perceptual dizziness (PPPD): Consensus document of the committee for the Classification of Vestibular Disorders of the Bárány Society. J Vestib Res 2017 Oct 21;27(4):191-208. [CrossRef]
Strupp M, Lopez-Escamez JA, Kim J, Straumann D, Jen JC, Carey J, et al. Vestibular paroxysmia: Diagnostic criteria. J Vestib Res 2017 Jan 27;26(5-6):409-415. [CrossRef]
Strupp M, Kim J, Murofushi T, Straumann D, Jen JC, Rosengren SM, et al. Bilateral vestibulopathy: Diagnostic criteria. Consensus document of the Classification Committee of the Bárány Society. J Vestib Res 2017 Oct 21;27(4):177-189. [CrossRef]
Loh W. Classification and regression trees. WIREs Data Mining Knowl Discov 2011 Jan 06;1(1):14-23. [CrossRef]
Rifkin RM, Lippert RA. Notes on regularized least squares. CSAIL Technical Reports. 2007 May 01. URL: http://hdl.handle.net/1721.1/37318 [accessed 2022-06-25]
Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied Logistic Regression. Hoboken, New Jersey: John Wiley & Sons; 2013.
Hsu CW, Chang CC, Lin CJ. A practical guide to support vector classification. Data Science ASSN. 2003. URL: http://www.datascienceassn.org/sites/default/files/Practical%20Guide%20to%20Support%20Vector%20Classification.pdf [accessed 2021-02-19]
Bottou L. Stochastic gradient descent tricks. In: Montavon G, Orr G, Müller KR, editors. Neural Networks: Tricks of the Trade. Berlin, Heidelberg: Springer; 2012:421-436.
Breiman L. Random forests. Mach Learn 2001;45(1):5-32. [CrossRef]
Hastie T, Rosset S, Zhu J, Zou H. Multi-class AdaBoost. Stat Interface 2009;2(3):349-360. [CrossRef]
Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal 2002 Feb;38(4):367-378. [CrossRef]
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. 2017 Dec Presented at: Advances in Neural Information Processing Systems 30 (NIPS 2017); December 4-9, 2017; Long Beach, CA, USA.
Rao C, Wu Y. Linear model selection by cross-validation. J Stat Plan Inference 2005 Jan;128(1):231-240. [CrossRef]
Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol 2016 Jun;74:167-176. [CrossRef] [Medline]
Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York, USA: Springer; 2015.
Riley RD. Correction to: Minimum sample size for developing a multivariable prediction model: Part II-binary and time-to-event outcomes by Riley RD, Snell KI, Ensor J, et al. Stat Med 2019 Dec 30;38(30):5672 [FREE Full text] [CrossRef] [Medline]
Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE, Moons KG, et al. Minimum sample size for developing a multivariable prediction model: Part I - Continuous outcomes. Stat Med 2019 Mar 30;38(7):1262-1275. [CrossRef] [Medline]
van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol 2014 Dec 22;14:137 [FREE Full text] [CrossRef] [Medline]
Dong C, Wang Y, Zhang Q, Wang N. The methodology of Dynamic Uncertain Causality Graph for intelligent diagnosis of vertigo. Comput Methods Programs Biomed 2014;113(1):162-174. [CrossRef] [Medline]
Feil K, Feuerecker R, Goldschagg N, Strobl R, Brandt T, von Müller A, et al. Predictive Capability of an iPad-Based Medical Device (mex) for the Diagnosis of Vertigo and Dizziness. Front Neurol 2018;9:29 [FREE Full text] [CrossRef] [Medline]
Parker I, Hartel G, Paratz J, Choy N, Rahmann A. A Systematic Review of the Reported Proportions of Diagnoses for Dizziness and Vertigo. Otol Neurotol 2019 Jan;40(1):6-15. [CrossRef] [Medline]


AUC: area under the curve

BPPV: benign paroxysmal positional vertigo

LGBM: light gradient boosting machine

OVerAIR: Otogenic Vertigo Artificial Intelligence Research

SSNHL-V: sudden sensorineural hearing loss with vestibular dysfunction

Edited by R Kukafka; submitted 12.10.21; peer-reviewed by R Bajpai, S Kim; comments to author 30.01.22; revised version received 14.02.22; accepted 13.06.22; published 03.08.22

©Fangzhou Yu, Peixia Wu, Haowen Deng, Jingfang Wu, Shan Sun, Huiqian Yu, Jianming Yang, Xianyang Luo, Jing He, Xiulan Ma, Junxiong Wen, Danhong Qiu, Guohui Nie, Rizhao Liu, Guohua Hu, Tao Chen, Cheng Zhang, Huawei Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.08.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
