
Longitudinal validation of the Maudsley 3-item visual analogue scale (M3VAS): a new, brief, patient-reported outcome measure of depression

Published online by Cambridge University Press:  22 December 2025

Daniel Silman
Affiliation:
Centre for Affective Disorders, Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
Maria Elena Middag
Affiliation:
Centre for Affective Disorders, Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
Anthony J. Cleare
Affiliation:
Centre for Affective Disorders, Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK; South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Beckenham, UK
Allan H. Young
Affiliation:
Centre for Affective Disorders, Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK; Division of Psychiatry, Department of Brain Sciences, Imperial College, London, UK
Rebecca Strawbridge*
Affiliation:
Centre for Affective Disorders, Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
*Correspondence: Rebecca Strawbridge. Email: Becci.strawbridge@kcl.ac.uk

Abstract

Background

The Maudsley 3-item visual analogue scale (M3VAS) was developed as a novel and intuitive patient-reported measure for depression, focusing on core symptoms and suicidality.

Aims

To evaluate the longitudinal validity of M3VAS for capturing symptom change over time.

Method

Both M3VAS and the Patient Health Questionnaire (PHQ-9, as reference standard) were administered in an observational study (RHAPSODY, no. NCT04939818) at weeks 0, 2 and 4 to both depressed patients (n = 50) and matched controls (n = 24). We serially tested factor structure, internal consistency and convergence (correlation) over time, assessing responsiveness by both correlation of change in score and effect of time across scales (analysis of variance and effect size).

Results

M3VAS exhibited strong factor loadings and high item interrelatedness (Cronbach’s alpha 0.78–0.83) at all time points. Total scores correlated strongly with PHQ-9 at each time point (r > 0.8, P < 0.001). Correlation of score change over the study period (r = 0.65, P < 0.001) also confirmed responsiveness. In the depressed group, an effect of time on score was seen for both M3VAS (F = 4.942, P = 0.010) and PHQ-9 (F = 12.505, P < 0.001), with standardised response means (Cohen’s d) of 0.58 and 0.74, respectively. No effect of time was seen in the control group.

Conclusions

Following previous cross-sectional validation against the Quick Inventory of Depressive Symptomatology–Self-report, this present study demonstrated appropriate longitudinal measurement properties for M3VAS as a measure of depression, including responsiveness. Evaluating the ability of M3VAS to discern responses with a variety of treatments is a key future goal.

Information

Type
Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

The limitations of gold-standard approaches to clinical measurement of depression symptoms are increasingly being emphasised. Reference Bagby, Ryder, Schuller and Marshall1–Reference Fried, Flake and Robinaugh5 Major depressive disorder (MDD) can be highly heterogeneous beyond the requirement for significant and pervasive low mood and/or anhedonia as the two core symptoms. Reference Fried6 Standard, full-length outcome measures with conventional sum-scoring could place undue emphasis on symptoms that are less universally clinically relevant – the Hamilton Depression Rating Scale (HAM-D), Reference Hamilton7 for example, includes symptoms not featured in modern diagnostic criteria for MDD (e.g. gastrointestinal) and those potentially reflecting medication side-effects (e.g. somatic symptoms, agitation). These limitations, contributing to the lack of a coherent singular underlying measurement construct, Reference Bagby, Ryder, Schuller and Marshall1,Reference Fried8 may be hampering sensitivity to capture treatment effects in clinical trials. Reference Young and Moulton9 Recent secondary data analyses have shown a noteworthy trend of increased antidepressant separation from placebo utilising singular measurement of mood Reference Hieronymus, Emilsson, Nilsson and Eriksson10 or shorter subscales. Reference Østergaard, Bech and Miskowiak11,Reference Ostergaard, Bech, Trivedi, Wisniewski, Rush and Fava12 Creating shorter depression outcome measures by concentrating on symptoms that have higher clinical relevance and specificity to depression therefore presents an attractive option. Self-report measures that are rapidly and intuitively completed have the added potential as digital measures to contribute to enhanced patient self-monitoring and ecological momentary assessment – approaches which, per se, could enrich understanding of affective dynamics and predictors of treatment course in depression. Reference Targum, Sauder, Evans, Saber and Harvey13 While a range of such brief/digital mood measurements exist, few have been subject to substantive validation, Reference Malhi, Hamilton, Morris, Mannie, Das and Outhred14 especially in regard to being considered as a clinical trial end-point measure.

To this end, we developed the Maudsley 3-item visual analogue scale (M3VAS), a novel instrument that proposes the capture of brief and sensitive measurement of the two core depressive symptoms (low mood and anhedonia), alongside suicidality, due to its high clinical pertinence. A VAS response style was favoured as having demonstrated higher degrees of resolution, and consequently greater sensitivity to change over time, than standard Likert scales for physical disorders. Reference Pfennings, Cohen and van der Ploeg15,Reference Fähndrich and Linden16 Selection of appropriate anchor points on the scale may also allow for more intuitive capture of a global level of subjective difficulty (e.g. ‘not at all’ versus ‘extremely’ depressed) over items in gold-standard Likert scales that have been suggested to be limited by exclusive focus on symptom frequency, which may impede patients’ conveying meaningful change – as observers have noted in regard to Patient Health Questionnaire 9 (PHQ-9). Reference Malpass, Wiles, Dowrick, Robinson, Gilbody and Duffy17,Reference Cameron, Reid and Lawton18 Adoption of VAS for depression has undergone minimal iteration since first employed by Zealley and Aitken in 1969 Reference Zealley and Aitken19 – exclusively focusing on singular mood assessment, which is typically contemporaneous. Reference Huang, Kohler and Kämpfen20,Reference Ahearn21 Although reasonable validity has been shown, Reference Bauer, Crits Christoph, Ball, Dewees, Mcallister and Alahi22 we hypothesised that this could be optimised in M3VAS, which covers pervasiveness over the previous 2 weeks, with item combination enriching diagnostic and clinical utility while maintaining simplicity.

Within clinical trial populations we have previously demonstrated the cross-sectional validity of M3VAS showing a favourable single-factor structure, internal consistency and strong convergence with the Quick Inventory of Depressive Symptomatology–Self-report (QIDS-SR-16 Reference Moulton, Strawbridge, Tsapekos, Oprea, Carter and Hayes23 ). As a further step in validating M3VAS for ongoing monitoring of depression severity, we now examine its longitudinal measurement properties, this time against PHQ-9 during a recent non-interventional clinical study over 4 weeks (RHAPSODY Reference Hampsey, Meszaros, Skirrow, Strawbridge, Taylor and Chok24 ). Specifically, the objectives were to:

  (a) determine stability of the internal structure (factorial validity and internal consistency) of M3VAS on repeated measurement;

  (b) determine convergent validity between M3VAS and PHQ-9 scores;

  (c) determine responsiveness of M3VAS against PHQ-9 by both convergence of score changes over time and evaluating the effect of time on scores.

Method

A protocol for this secondary analysis was developed prior to data access (published on a preprint server Reference Middag, Silman, Cleare, Young, Carter and Strawbridge25 ). Longitudinal assessment of the scale’s internal structure was added post hoc at the discretion of the investigators (see Supplementary Table 1 for the full list of amendments since the original protocol).

Design

The RHAPSODY study Reference Hampsey, Meszaros, Skirrow, Strawbridge, Taylor and Chok24 was an observational, longitudinal study that examined speech phenotyping for remote evaluation of neurodegenerative and psychiatric conditions. Self-reported measures of depressive symptoms were collected at baseline (w0) and at 2 (w2) and 4 weeks (w4), in addition to data from verbal cognitive and speech tasks. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation, and with the Helsinki Declaration of 1975 as revised in 2013. All study procedures were approved by the Health Research Authority and Health and Care Research Wales (REC reference no. 21/PR/0070). Written informed consent was obtained from all volunteers involved in the study.

From the parent study (which included additional groups with other neurodegenerative conditions), this secondary analysis focuses on two participant groups: patients with a current depressive episode and healthy controls – in whom score change is likely to be smaller, allowing for group comparison including differences in effect of time by depression status, alongside the pooled analyses. Data from 74 participants were included in our analysis which, in reference to COSMIN consensus criteria, Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet26 was considered an adequate sample size with which to conduct the principal analyses to meet the study objectives (subgroup analyses that are at risk of bias due to doubtful sample size are highlighted in the Discussion, below).

Participants

Participants were recruited from July 2021 to June 2022 via local clinical services, research databases and community advertisements. Participants were required to be native English speakers aged 18–85 years in order to participate.

Eligible for the affective disorder group (n = 50) were those experiencing a current major depressive episode according to DSM-IV, as assessed by the Mini-International Neuropsychiatric Interview v.5.0; Reference Sheehan, Lecrubier, Sheehan, Amorim, Janavs and Weiller27 individuals could have a diagnosis of either bipolar disorder or MDD. Depressive symptoms were required to be of at least moderate severity, as assessed using the Clinical Global Impression scale. Reference Busner and Targum28 Any treatment status was accepted (and not overseen by the study team).

Unaffected healthy controls were matched to the affective disorder group for gender, age and education levels. Healthy controls were permitted to have mild physical comorbidities (including respiratory, immunological, metabolic or cardiologic conditions) that did not hinder daily functioning. Participants were excluded (all groups) if they had a current substance use disorder; had experienced a stroke within the past 2 years, a transient ischaemic attack or unexplained loss of consciousness within the past 12 months; presented with a current risk of suicide; or lacked the appropriate digital device requirements for participation.

Measures

All participants concurrently completed the two depression patient-reported outcome measures (PROMs) at w0, w2 and w4. M3VAS Reference Moulton, Strawbridge, Tsapekos, Oprea, Carter and Hayes23 asks participants to rate their experience of low mood, anhedonia and suicidality over the past 2 weeks by making a mark on a 100 mm unmarked line, resulting in a score of between 0 and 300. PHQ-9 Reference Kroenke, Spitzer and Williams29 is a widely used depression scale that scores each of the 9 DSM-IV diagnostic criteria for depression, from 0 (not at all) to 3 (nearly every day).

Statistical analysis

In addition to the full-scale score, a subscale of total PHQ-9 scores combining items 1 (anhedonia), 2 (mood) and 9 (suicidality) was calculated to mirror M3VAS (referred to here notionally as ‘PHQ-D’) for directly matched item comparison. The totals of the remaining six items were then combined to assess for any further divergence from the indirect domains of PHQ (referred to here as ‘PHQ-I’).
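The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's own scoring code: the function names and the dictionary layout are assumptions, and the PHQ-9 items are assumed to be supplied in questionnaire order (item 1 first).

```python
from typing import Dict, Sequence


def m3vas_total(low_mood_mm: float, anhedonia_mm: float, suicidality_mm: float) -> float:
    """Sum three 0-100 mm visual analogue marks into a 0-300 total."""
    items = (low_mood_mm, anhedonia_mm, suicidality_mm)
    for value in items:
        if not 0 <= value <= 100:
            raise ValueError("each VAS item must lie between 0 and 100 mm")
    return sum(items)


def phq_scores(items: Sequence[int]) -> Dict[str, int]:
    """Return PHQ-9 total plus the PHQ-D and PHQ-I subscales.

    items: nine responses in questionnaire order, each scored 0-3.
    """
    if len(items) != 9 or any(v not in (0, 1, 2, 3) for v in items):
        raise ValueError("expected nine item scores, each between 0 and 3")
    direct_idx = (0, 1, 8)  # items 1 (anhedonia), 2 (mood) and 9 (suicidality)
    phq_d = sum(items[i] for i in direct_idx)
    total = sum(items)
    # PHQ-I is the sum of the remaining six items.
    return {"PHQ-9": total, "PHQ-D": phq_d, "PHQ-I": total - phq_d}
```

By construction PHQ-D and PHQ-I always add up to the PHQ-9 total, so PHQ-D ranges from 0 to 9 and PHQ-I from 0 to 18.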

In addition to summary descriptive statistics (mean, standard deviation) for baseline characteristics and PROM scores at the various time points, the following analyses were conducted according to the study objective.

Objective 1: internal structure over repeated measurements

Factorial validity: factor analysis was conducted at all three time points for M3VAS, PHQ-9 and the PHQ-D subscale for comparison. To assess suitability, we required a Kaiser–Meyer–Olkin (KMO) Measure of Sampling Adequacy >0.6 and a significant Bartlett’s Test of Sphericity (P < 0.001). Reference Bartlett30 Principal axis factoring was used as the (exploratory) extraction method to evaluate the factor structure of each respective scale. An oblique rotation (Promax with Kaiser normalisation) was employed to account for potential correlation between the emerging factors. Eigenvalues greater than λ = 1 determined the appropriate number of factors to extract. Reference Guttman31 While only a single factor can be extracted from the three-item scales, variance explained and factor loadings were explored in considering evidence of unidimensionality (with loadings ≤0.5 considered problematic).
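As an illustration of the suitability check, the sketch below computes Bartlett's test statistic from raw item scores using the standard textbook formula, χ² = −[(n − 1) − (2p + 5)/6] · ln|R|, with p(p − 1)/2 degrees of freedom, where R is the item correlation matrix, n the number of participants and p the number of items. The analyses in this paper were run in SPSS; it is assumed here, not confirmed, that SPSS applies the same formula. The helper names are hypothetical, and the P-value (which needs a chi-square distribution function) is left out to keep the example dependency-free.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(columns):
    """columns: one list of scores per item. Returns (chi-square, df)."""
    p = len(columns)
    n = len(columns[0])
    corr = [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(p)] for i in range(p)]
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2
```

For a three-item scale the degrees of freedom are 3 × 2 / 2 = 3, which matches the χ²(3) values reported for M3VAS in the Results.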

Internal consistency: pairwise correlations between each item were evaluated using Cronbach’s alpha. The test for Cronbach’s alpha generates a value with 1.0 as the maximum outcome, and values closer to 1.0 indicating greater internal consistency, Reference Cronbach32,Reference Tavakol and Dennick33 interpreted according to the following ranges: α ≥ 0.90, excellent; 0.90 > α ≥ 0.80, good; 0.80 > α ≥ 0.70, acceptable; 0.70 > α ≥ 0.60, questionable. Reference Pedhazur and Schmelkin34 Alpha if item deleted (AID) was considered for each item, with items of good internal consistency producing values lower than Cronbach’s alpha value. Item total correlation (ITC) values were calculated, with those between 0.3 and 0.8 generally indicating good internal consistency. Reference Pedhazur and Schmelkin34
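The three internal consistency statistics can be sketched as follows. Cronbach's alpha uses the usual variance form, α = k/(k − 1) · (1 − Σ item variances / variance of total score). The ITC is computed here in its corrected form (each item against the sum of the other items), which is an assumption about how the reported values were derived; function names are illustrative only.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """items: list of k equal-length lists, one per scale item."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_diagnostics(items):
    """Corrected item-total correlation and alpha-if-item-deleted per item."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(values) for values in zip(*rest)]
        results.append({
            "itc": pearson(col, rest_total),  # item vs sum of the other items
            "aid": cronbach_alpha(rest),      # alpha recomputed without item i
        })
    return results
```

An item whose AID exceeds the full-scale alpha is one whose removal would raise internal consistency, which is the pattern to look for when reading the AID values.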

Objective 2: convergent validity between M3VAS and PHQ-9 scores

The Pearson correlation coefficient r (with corresponding two-tailed P-values) was calculated between the M3VAS and PHQ-9 totals at each time point. This was performed for the total sample population (to observe global convergence across the spectrum of symptoms) and between groups for comparison, with r > 0.70 considered good criterion validity. Reference Mokkink, Elsman, Terwee, Prinsen, Patrick and Alonso35 This was repeated between M3VAS and the PHQ-D and PHQ-I subscales, with the P significance threshold adjusted to 0.015 for multiple comparisons (Bonferroni method). Correlation against PHQ-9 is also presented graphically using scatter plots, reporting R² and 95% confidence intervals associated with the line of best fit. Pearson’s correlation was also measured for individual M3VAS items against the three corresponding PHQ-9 items.

As a secondary examination of the sensitivity of M3VAS in detecting mild depressive symptoms, we analysed participants who scored 0 or 1 out of 3 on each PHQ-D item and observed the proportion who scored over 50 on the corresponding M3VAS item.
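The secondary sensitivity check amounts to a conditional proportion, which might be computed along these lines. The function name and default arguments are illustrative assumptions; the cut-offs (PHQ item score of 0 or 1, M3VAS item score over 50) are those stated above.

```python
def discordant_proportion(phq_item, m3vas_item, phq_max=1, vas_cutoff=50):
    """Among observations scoring <= phq_max on a PHQ-D item, count those
    scoring above vas_cutoff on the matching M3VAS item.

    Returns (count above cut-off, number of mild/absent ratings, proportion).
    """
    mild = [(p, v) for p, v in zip(phq_item, m3vas_item) if p <= phq_max]
    if not mild:
        raise ValueError("no observations with a mild or absent PHQ rating")
    high = sum(1 for _, v in mild if v > vas_cutoff)
    return high, len(mild), high / len(mild)
```

Applied per item with observations pooled across time points, this yields numerator, denominator and proportion in the same form as the counts reported in the Results.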

Objective 3: responsiveness of M3VAS (against PHQ-9)

As per the criterion approach for assessment of responsiveness, Reference Mokkink, Terwee and de Vet36 changes in score on M3VAS over the various intervals (w0–w2, w2–w4, w0–w4) were assessed for correlation (Pearson’s r) with corresponding changes on PHQ-9 (as the reference standard, r > 0.50 is considered strong Reference Mokkink, Terwee and de Vet36 ) over the same interval. This was also calculated across the total sample and for each group separately, then repeated for the PHQ-D and PHQ-I subscales (P significance threshold similarly adjusted (0.015) for multiple comparisons).
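A minimal sketch of the criterion approach follows: change scores are computed per participant on each scale and then correlated. Two details are assumptions made for illustration: participants with any missing score in the interval are dropped (complete-case handling), and change is defined as the earlier score minus the later score, so that positive values denote symptom reduction.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def change_score_correlation(m3vas_start, m3vas_end, phq_start, phq_end):
    """Correlate M3VAS change with PHQ-9 change over one interval.

    Missing values are passed as None; only participants with all four
    scores contribute. Returns (r, number of participants used).
    """
    m3vas_change, phq_change = [], []
    for ms, me, ps, pe in zip(m3vas_start, m3vas_end, phq_start, phq_end):
        if None in (ms, me, ps, pe):
            continue
        m3vas_change.append(ms - me)  # positive = score reduction
        phq_change.append(ps - pe)
    return pearson(m3vas_change, phq_change), len(m3vas_change)
```

The same function can be called once per interval (w0–w2, w2–w4, w0–w4) and once per group, with the PHQ-D or PHQ-I subscale totals substituted for the PHQ-9 totals.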

To further assess a construct approach for capturing score changes, we anticipated that an effect of time would be observed on M3VAS scores. This was first assessed by repeated-measures analysis of variance (ANOVA), examining the main effect of time on scale scores for the combined sample and each group separately (i.e. depressed and healthy), as well as the interaction effect of time and group. Reference Lakens37 Time (w0, w2 and w4) represented the within-subjects factor, with group (affective disorder versus healthy controls) the between-subjects factor. Mauchly’s test of sphericity was used to investigate violation of the univariate assumption (requiring P > 0.05). Additionally, paired-samples t-tests Reference David and Gunnink38 were used to determine whether there were statistically significant changes in depressive symptoms over various time windows (P < 0.025 determined significant, Bonferroni method) according to either measure. Finally, the standardised response means (Cohen’s d Reference Cohen39 ) were estimated by dividing the mean change between time points by the standard deviation of the mean change (mean change/s.d. mean change), to provide a coefficient of change (effect size) for each measure, both across the groups and combined.
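The standardised response mean and the paired-samples t statistic are both functions of the same per-participant change scores, as the sketch below shows. Names are illustrative, and the sample standard deviation (n − 1 denominator) is assumed.

```python
import math
from statistics import mean, stdev


def change_scores(start, end):
    """Earlier score minus later score; positive values are reductions."""
    return [a - b for a, b in zip(start, end)]


def standardised_response_mean(start, end):
    """Mean change divided by the standard deviation of the change."""
    changes = change_scores(start, end)
    return mean(changes) / stdev(changes)


def paired_t(start, end):
    """Paired-samples t statistic and its degrees of freedom."""
    changes = change_scores(start, end)
    n = len(changes)
    t = mean(changes) / (stdev(changes) / math.sqrt(n))
    return t, n - 1
```

Because both statistics share the same numerator and standard deviation, t = SRM × √n. The pooled M3VAS result given in the Results (t(54) = 2.22, d = 0.30) is consistent with this identity, since 0.30 × √55 ≈ 2.22.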

All analysis was conducted using IBM SPSS Statistics version 27.0 for Windows (IBM Corp., Armonk, NY, USA; see https://www.ibm.com/products/spss-statistics).

Results

Population characteristics

From the RHAPSODY data-set (n = 173, all clinical groups), exclusions were made based on inappropriate group and incomplete/excess data entries at baseline (w0). A total of 74 participants were included in the data analysis cohort for the groups of interest: affective disorder (n = 50, comprising 24 with MDD and 26 with bipolar disorder) and their matched controls (healthy controls, n = 24). Of these 74 participants, 69 (93%) subsequently provided complete data on M3VAS and 66 (89%) on PHQ-9 at w2. At w4 this dropped further to 55 (74%) and 53 (72%), respectively. The baseline sample was predominantly female, with 13 male participants (17.5%), 60 female (81.1%) and 1 other (1.4%). The majority (n = 46, 62%) of participants were of White ethnicity. Groups were matched for gender, age and level of education as per protocol, although there was a significant mean difference in body mass index of 4.33 between the groups (t = −2.55, P = 0.006) at baseline, as shown in Table 1.

Table 1 Participant characteristics and symptom scores

BMI, body mass index; M3VAS, Maudsley three-item visual analogue scale for depression; PHQ, Patient Health Questionnaire. Groups were matched for gender, age and level of education as per protocol.

a. There was a significant mean difference in body mass index of 4.33 between groups (t = −2.55, P= 0.006).

Internal structure

M3VAS was suitable for factor analysis at all time points, with KMO >0.60 (0.60 at w0, 0.63 at w2 and 0.63 at w4) and a significant Bartlett’s test of sphericity P < 0.001 (χ²(3) = 145.06, 96.96 and 56.56, respectively). The extracted factor (λ1 = 2.263, 2.229 and 2.134, respectively) corresponded to a broadly consistent proportion of the total variance – 75.4, 74.3 and 71.1%, respectively. Factor loadings for the individual items were also consistent across the time points, with suicidality (0.704–0.738) lower than the two core symptoms (low mood 0.910–0.949, anhedonia 0.872–0.931; see Supplementary Table 2). The factor structure for PHQ-D (λ1 = 2.219 for w0, 2.295 for w2 and 2.328 for w4) appeared broadly similar in both variance explained (74.0, 76.5 and 77.6%, respectively) and item loading distribution, with suicidality loading slightly higher (0.746–0.823) than for M3VAS.
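The link between the eigenvalues and the percentages of variance quoted above can be checked with simple arithmetic. For a correlation matrix the eigenvalues sum to the number of items, so the share of variance attributed to a factor is its eigenvalue divided by the item count. The sketch assumes the reported λ values are initial eigenvalues of the correlation matrix; the function name is illustrative.

```python
def variance_explained(eigenvalue, n_items):
    """Percentage of total variance attributed to one factor."""
    return 100 * eigenvalue / n_items


# Eigenvalues as reported in the text for w0, w2 and w4.
m3vas = [round(variance_explained(lam, 3), 1) for lam in (2.263, 2.229, 2.134)]
phq_d = [round(variance_explained(lam, 3), 1) for lam in (2.219, 2.295, 2.328)]

print(m3vas)  # [75.4, 74.3, 71.1]
print(phq_d)  # [74.0, 76.5, 77.6]
```

The same division with nine items reproduces the PHQ-9 figures, for example 5.913 / 9 ≈ 65.7%.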

PHQ-9 demonstrated a single factor at w0 (λ1 = 5.913) and w4 (λ1 = 5.761), with a corresponding lower proportion of variance explained (65.7 and 64.0%, respectively) than the two short scales, and a 2-factor structure at w2 (λ1 = 5.298 [58.9%], λ2 = 1.041 [11.6%]), in which suicidality (1.035), psychomotor retardation (0.779) and low mood (0.726) loaded most strongly onto factor 1; and poor sleep (0.956), low energy (0.942) and low appetite (0.770) loaded most strongly onto factor 2 (full data, including factor loadings, are shown in Supplementary Table 2).

Cronbach’s alpha for M3VAS was 0.83 at w0 and w2, and 0.78 at w4. Alpha values for the comparable PHQ-D scale were similar (0.82, 0.85 and 0.84, respectively), and those for PHQ-9 were higher (0.93, 0.91 and 0.93, respectively). For both the shorter scales, ITCs were consistently higher (range 0.72–0.88) and AID lower (range 0.51–0.73) for the two core symptoms than for suicidality (ITC 0.48–0.64, AID 0.85–0.95; see Supplementary Table 3).

Convergent validity

The mean and standard deviation of the total scores for M3VAS and PHQ-9 across groups and time points are detailed in Table 1. The M3VAS total scores correlated strongly with those of PHQ-9 at each time point (r > 0.8), being highest at baseline (r = 0.91). Convergence did not appreciably increase against only the directly relevant items of PHQ-D (0.88 v. 0.87 overall; see Table 2). By group, there appeared to be greater separation between M3VAS convergence with PHQ-D (r = 0.71 in the affective disorder group and 0.61 in healthy controls overall) and PHQ-I (r = 0.59 in the affective disorder group and 0.54 in healthy controls), with correlation against the full PHQ-9 consistently stronger in the affective disorder group than in controls. M3VAS and PHQ-9 correlation at all time points combined is presented visually in Fig. 1.

Fig. 1 Scatter plot correlating total outcome measure scores for M3VAS and PHQ-9. All time point data were pooled (i.e. to include paired data). R², coefficient of determination for fit line (solid), with dashed lines indicating 95% confidence interval. M3VAS, Maudsley three-item visual analogue scale for depression; PHQ-9, Patient Health Questionnaire 9.

Table 2 Convergent validity and responsiveness of M3VAS against PHQ-9 and its derivatives

M3VAS, Maudsley three-item visual analogue scale for depression; PHQ-9, Patient Health Questionnaire 9; w0, baseline time point; w2, week 2 time point; w4, week 4 time point; PHQ-D contains PHQ-9 items 1, 2 and 9, which match M3VAS symptoms; PHQ-I includes the remaining six PHQ-9 items; r, Pearson’s correlation coefficient between measures at each time point, by group and overall.

*Statistically significant correlation using two-tailed P < 0.015 (i.e. alpha of 0.05 was adjusted for multiple comparisons (×3) of M3VAS at each time point); **P < 0.001; interpretation: r ≥ 0.7 for cross-sectional and ≥0.5 for score changes indicate strong correlation. Reference Guttman31

Exploring individual scale item correlation (see Supplementary Table 4), the strongest associations were consistently observed between corresponding items on M3VAS and PHQ-9 (r = 0.70–0.86). Weaker relationships were noted between the suicidality items and core-depressive symptoms on the alternate scale (r < 0.5). Of those who scored 0 or 1 out of 3 on each PHQ-D item, 30 out of 111 (27%) for low mood, 30/107 (28%) for anhedonia and 6/174 (3.4%) for suicidality rated their severity as >50 on the corresponding M3VAS item across all time points – as an index of potential differing sensitivities to detect mild symptoms.

Longitudinal correlation of score changes (responsiveness criterion approach)

Pearson’s correlation analyses were then conducted between score changes from baseline to the follow-up time points on M3VAS against PHQ-9 – including by group and for the PHQ subscales (Table 2). Agreement was generally strong and significant between M3VAS and PHQ-9 for the pooled study population, with r values ranging from 0.47 to 0.65 (all P < 0.001). In this full sample, score changes appeared more closely correlated between the direct items of PHQ-D (0.58 and 0.73) than the indirect PHQ-I (0.29 and 0.46) over the windows between individual study visits (w0–w2 and w2–w4), although this effect disappeared considering the full monitoring period (w0–w4 change). The weakest correlation of score change between M3VAS and PHQ-9 was observed for the healthy controls group over the w0–w2 interval (r = 0.36 v. 0.45 for the affective disorder group), although this was higher at the w2–w4 interval (0.64 for healthy controls v. 0.62 for the affective disorder group).

Effect of time on scores (responsiveness construct approach)

An effect of time on scores was shown for M3VAS (F = 4.94, P = 0.010), PHQ-9 (F = 12.51, P < 0.001) and PHQ-D (F = 8.53, P < 0.001) on repeated-measures ANOVA in the affective disorder group, but not for any scale in the healthy controls group (Table 3). When the groups were combined, this effect of time was preserved to statistical significance for both PHQ-9 and PHQ-D (although Mauchly’s test of sphericity was not upheld), and was close to the significance threshold for M3VAS (P = 0.052). Group status (as the between-subject factor) was confirmed to have significant interaction with the effect of time on score across measures (see Supplementary Table 5 for full details on ANOVA and Mauchly’s test).

Table 3 Score changes over time for M3VAS, PHQ-9 and PHQ-D and associated effect sizes and effect of time

M3VAS, Maudsley three-item visual analogue scale for depression; PHQ-9, Patient Health Questionnaire 9; PHQ-D contains PHQ-9 items 1, 2 and 9, which match M3VAS symptoms; F, repeated-measures analysis of variance examining effect of time with accompanying P-value; w0–w2, baseline to week 2; w2–w4, week 2 to week 4; w0–w4, baseline to week 4; MC, mean change in score over the stated interval; P, accompanying P-value for associated paired t-test; d, Cohen’s d (standardised response means). Negative values (e.g. in the healthy controls group) indicate score increase.

a. Greenhouse−Geisser correction where sphericity assumption has been violated (see Supplementary Table 5 for Mauchly’s test).

*Above MC denotes significance at P < 0.025 to account for multiple comparisons; interpretation: scores >0.5 indicate moderate effect. Reference Cronbach32

In the affective disorder group, paired-samples t-tests detected significant score reductions (P < 0.025 to account for multiple time point comparisons) across the full study window (w0–w4) on all three scales – M3VAS, PHQ-9 and PHQ-D (Table 3). PHQ-9 appeared to have greater effect size as measured by Cohen’s d (d = 0.74 v. 0.58 for M3VAS) in this group, as well as showing an additional statistically significant score reduction in the first study interval (w0–w2) – as did PHQ-D. No significant changes were detected in any of the measures for the same intervals for the healthy control group. When scores from both groups were pooled, score reductions on M3VAS stopped short of the significance threshold (w0–w4: t(54) = 2.22, P = 0.031, d = 0.30), while both PHQ-9 and PHQ-D continued to detect significant reductions from w0–w2 and w0–w4.

Discussion

In this study we evaluated the key longitudinal psychometric properties of M3VAS for patient-reported symptoms of depression in a non-interventional setting among a group of 74 participants comprising both symptomatic patients and healthy controls.

When measured serially over the 3 time points in the study (w0, w2 and w4), M3VAS maintained good structural validity as demonstrated by strong factor loadings and high item interrelatedness (Cronbach’s alpha) – replicating those shown previously in a cross-sectional validation. Reference Moulton, Strawbridge, Tsapekos, Oprea, Carter and Hayes23 These findings give further support to the suggestion that M3VAS reliably measures a coherent underlying mood construct – a favourable characteristic given the multifactorial structure observed in other depression scales. Reference Bech2,Reference Suzuki, Aoshima, Fukasawa, Yoshida, Higuchi and Shimizu40 However, a larger sample would be needed to confirm longitudinal measurement invariance by confirmatory factor analysis, affirming factorial stability over time. This is a limitation of the current sample and should be incorporated into more substantial evaluation.

There is also some apparent non-linearity in the M3VAS item relationship, given the higher factor loadings and ITCs (and lower AID) for the two core symptoms of depression (low mood and anhedonia) than for the suicidality item. Such findings fit the clinical observation of suicidality having complex underpinnings beyond current depression severity. Reference Sokero, Melartin, Rytsälä, Leskelä, Lestelä-Mielonen and Isometsä41 Responses on VAS could be inherently different for suicidality than for other symptoms, as suggested by the fact that much higher proportions of those with a mild/absent rating on PHQ-9 for low mood (item 2) and anhedonia (item 1) concurrently scored above 50 on the corresponding M3VAS item (27 and 28%, respectively) than for suicidality (3.4%). Such inferences are limited given the exclusion of higher-risk patients in this study, and it would be imperative to further explore how those with a clinically verified risk of suicide respond on M3VAS. Future utilisation of M3VAS in both research and clinical practice could potentially be developed to consider the two core symptoms as a distinct measure, themselves forming a more focused measure of depression severity, Reference Kennedy42 and incorporation of the suicidality item could then find clinical relevance to either augment the global view of depression severity or aid more specifically in risk assessment. Determining whether individual item scores change independently or at different speeds may also be of interest, considering the possibility of preferential symptom targeting by a particular drug class or therapy modality. Reference Nutt43

The structural validity of PHQ-9 has been extensively explored elsewhere Reference Chae, Lee and Lee44 and was included here for illustrative contrast against the ultra-short measures, with our results somewhat mirroring the inconsistency in dimensionality and item-to-factor loadings that has been seen across samples. Alpha remained in the excellent range of 0.91–0.93, indicating a high degree of interrelatedness of the items within PHQ-9, a feature that is likely to be raised in longer scales assessing the same construct. Reference Tavakol and Dennick33 PHQ-9 displayed a strong single factor at w0 and w4, and a two-factor structure at w2 – the latter in accord with several large-scale evaluations that distinguish apparent ‘affective’ and ‘somatic’ (sleep and appetite difficulties, fatigue) symptom groups. Reference Krause, Reed and McArdle45,Reference Wang, Liang, Sun, Liu, Wei and Qi46 The utility of parsing out such clusters/symptomatic phenotypes within depression has prompted significant research investment, with an emerging suggestion that such clusters may map to the Research Domain Criteria matrix of mental disorders, Reference Gunzler, Sehgal, Kauffman, Davey, Dolata and Figueroa47 and that their relative predominance in certain groups can impact on chronicity, Reference Bekhuis, Boschloo, Rosmalen, de Boer and Schoevers48 treatment response Reference Stegenga, Kamphuis, King, Nazareth and Geerlings49 and consequent physical health risks. Reference Shell, Williams, Patel, Vrany, Considine and Acton50

This study confirmed strong convergence of M3VAS with PHQ-9 at all time points, with correlation values appearing higher (r = 0.86–0.91 in this sample) than in a previous study comparing against QIDS-SR-16 Reference Moulton, Strawbridge, Tsapekos, Oprea, Carter and Hayes23 (r = 0.72). While it would be premature to draw firm conclusions about the relative convergence of M3VAS with these two gold-standard PROMs, this finding could reflect a weaker validity of QIDS-SR-16, as recently proposed. Reference Weiss, Erritzoe, Giribaldi, Nutt and Carhart-Harris4 Individual item correlations between M3VAS and corresponding symptoms on PHQ-9 were also generally high (Pearson’s r = 0.70–0.92).

Different strands within the psychometric literature endorse varied approaches to capturing the responsiveness of an outcome measure, Reference Mokkink, Terwee and de Vet36,Reference Angst51 defined as an ‘instrument’s ability to detect change over time in the construct to be measured’. We therefore performed a broad evaluation in this study. Despite the study’s short duration and lack of intervention, we hypothesised that a degree of symptom fluctuation, including naturalistic treatment responses, would occur (at least in people with affective disorder). Appropriate longitudinal validity for M3VAS was inferred from both the strong correlation of its score changes over the 4-week study interval with those on PHQ-9 (the gold-standard, criterion approach) and a significant effect of time on scores in the depressed group, as assessed by ANOVA, with a moderate effect size (Cohen’s d). Such effects were likewise demonstrated on PHQ-9 as the reference standard (construct approach), and with greater magnitude than for M3VAS in this sample. Given the limited published data on longitudinal comparison of VAS and Likert scales for depression, our findings currently caution against expecting M3VAS to offer greater sensitivity to change than established gold-standard Likert scales. However, longer studies with standardised interventions would clearly be optimal for examining PROMs’ relative sensitivity as end-points for capturing clinically relevant treatment responses (including remission), and this is the proposed next step for evaluation of M3VAS. Hypotheses that VAS may offer superiority in that regard have arisen from comparisons with scales assessing other affective constructs – such as pain Reference Parkes, Callaghan, O’Neill, Forsythe, Lunt and Felson52 and general well-being Reference Pfennings, Cohen and van der Ploeg15 – and continue to merit exploration, particularly given the relative ease of administering VAS.
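
As a rough illustration of the two approaches to responsiveness described above, the Python sketch below computes a change-score correlation (criterion approach) and a within-group standardised effect size for change over time. The scores are invented for six hypothetical participants, the helper names are hypothetical, and d_z (mean change divided by the SD of change) is only one of several paired-data variants of Cohen’s d; it is not necessarily the variant used in the analysis.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohens_d_paired(baseline, follow_up):
    """d_z: mean within-person change divided by the SD of that change."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)


# Invented scores for six hypothetical participants (NOT study data).
m3vas_w0 = [210, 180, 240, 150, 200, 170]
m3vas_w4 = [160, 170, 190, 140, 150, 165]
phq9_w0 = [18, 15, 21, 12, 17, 14]
phq9_w4 = [13, 14, 15, 11, 12, 14]

# Change from week 0 to week 4 on each scale (negative = improvement).
m3vas_change = [b - a for a, b in zip(m3vas_w0, m3vas_w4)]
phq9_change = [b - a for a, b in zip(phq9_w0, phq9_w4)]

# Criterion approach: do changes on the two scales track each other?
change_r = pearson_r(m3vas_change, phq9_change)

# Size of the change over time on each scale, in SD units of the change.
d_m3vas = cohens_d_paired(m3vas_w0, m3vas_w4)
d_phq9 = cohens_d_paired(phq9_w0, phq9_w4)

print(f"change-score r = {change_r:.2f}")
print(f"d_z M3VAS = {d_m3vas:.2f}, d_z PHQ-9 = {d_phq9:.2f}")
```

With these invented values the change scores on the two scales correlate strongly and both effect sizes are negative, because scores fall over time. Comparing the magnitudes of the two effect sizes is what allows a statement about the relative sensitivity of the scales.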

Score change correlation between M3VAS and PHQ-9 did appear slightly weaker in healthy controls, with the reduced score variability in this group being one possible contributor. However, the small numbers in this group challenge interpretation. For complete longitudinal validation of M3VAS, exclusion of measurement error on repeat testing in a larger healthy/euthymic population should be considered. Reference Mokkink, de Vet, Prinsen, Patrick, Alonso and Bouter53 The ability of M3VAS to detect new episodes of depression in these populations may also be a distinct longitudinal measurement property from capturing the degree of treatment response, and would merit further exploration.

Limitations

As mentioned above, expectations of symptom change in this secondary data analysis from the RHAPSODY study were limited by relatively short follow-up, and also by varied treatment status, introducing additional heterogeneity in expected effects. Attrition was also noteworthy, with 72 and 74% of participants completing w4 PHQ-9 and M3VAS, respectively. While reasons for study exit were not formally captured, one could hypothesise that patients who were more depressed were more likely to find the intensive battery of cognitive and speech tasks (as part of the primary study) intolerably burdensome. Although this could feasibly lead to overestimation of the magnitude of symptom change, it would not be deemed a marked source of bias for within-subject comparisons (i.e. correlation) of the different scales, including over time.

Our secondary analysis was not powered a priori to detect specified clinical outcomes or psychometric properties. In reference to COSMIN best practice guidelines, Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet26 both the pooled and affective disorder groups appear adequate for the criterion (correlation) approach which, coupled with the strength of correlation, would be considered appropriate preliminary assessment of longitudinal validity. As per COSMIN, our sample would also be considered appropriate to evaluate responsiveness through a construct approach in testing hypotheses of group differences in change over time (group × time interaction), and an effect of time in the pooled sample (the latter not reaching statistical significance in our study). There is, however, some doubt as to the external validity in extrapolating certain group-specific findings, namely the within-group effect of time (which was compounded by the attrition observed) and the correlations in the smaller control group. Thus, we emphasise the need for larger-scale testing, both for depressed cohorts under controlled treatment conditions and for further validation in euthymic/healthy samples.

Supplementary material

The supplementary material is available online at https://doi.org/10.1192/bjo.2025.10932

Data availability

The data that support the findings of this study are available on request from the corresponding author, R.S. The data are not publicly available due to ongoing analysis of the primary outcomes for the RHAPSODY study (including speech phenotyping) by the study sponsor. All analysis and research materials associated with this manuscript can be provided on reasonable request. M3VAS is freely available for the use of any non-profit organisation (including academics and clinicians) and can be accessed via https://www.kcl.ac.uk/research/m3vas; for-profit organisations may need to purchase a licence. Please contact the corresponding author for access.

Acknowledgements

The authors thank the patients who participated in the RHAPSODY study. We also thank Elliot Hampsey and Rosie Taylor for their contributions as researchers in the study, as well as all other individuals (researchers, service users, clinicians and students) involved in recruitment, data collection and management, design, oversight and analysis of that study. The authors thank the company Novoic, which sponsored the study, and the particular involvement of Caroline Skirrow, Marton Meszaros and Emil Fristed from Novoic.

Author contributions

D.S. and M.E.M.: conceptualisation, methodology, data curation, formal analysis, writing – original draft; A.J.C., A.H.Y. and R.S.: conceptualisation, funding acquisition, supervision, writing – review and editing.

Funding

This report is independent research funded by the National Institute for Health Research (NIHR; Artificial Intelligence, Project RHAPSODY: investigating the clinical feasibility of using AI-based deep audio and language processing techniques to diagnose neurological and psychiatric diseases, AI_AWARD01984) and NHSX. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR, NHSX or the Department of Health and Social Care.

Declaration of interest

A.H.Y. reports paid lectures and advisory board membership for the following companies, in regard to drugs used in affective and related disorders: AstraZeneca, Eli Lilly, Lundbeck, Sunovion, Servier, LivaNova, Allergan, Bionomics, Sumitomo Dainippon Pharma and Janssen; consultancy to Johnson & Johnson and LivaNova; principal investigator on the Restore-Life VNS registry study funded by LivaNova, and on ESKETINTRD3004 funded by Janssen. A.J.C. has received grant funding from MRC, ADM Protexin Ltd, NIHR, European Union Horizon Europe/Innovate UK, Beckley Psytech Ltd and Wellcome Trust; has received payment or honoraria for speaking and/or consulting from Janssen, Otsuka, COMPASS Pathways Plc, Viatris and Medscape; and is President of the International Society for Affective Disorders. R.S. has, in the past 3 years, received honoraria for educational activities from Janssen. R.S. and A.H.Y. are members of the British Journal of Psychiatry Open editorial board; neither author had involvement in the review or decision-making process of this article. D.S. and M.E.M. have no conflicts of interest to declare.

Transparency declaration

All authors affirm that the manuscript is an honest, accurate and transparent account of the study being reported, that no important aspects of the study have been omitted and that all discrepancies from the study as planned have been explained.

References

Bagby, RM, Ryder, AG, Schuller, DR, Marshall, MB. The Hamilton Depression Rating Scale: has the gold standard become a lead weight? Am J Psychiatry 2004; 161: 2163–77. doi:10.1176/appi.ajp.161.12.2163
Bech, P. Rating scales in depression: limitations and pitfalls. Dialogues Clin Neurosci 2006; 8: 207–15. doi:10.31887/DCNS.2006.8.2/pbech
Timmerby, N, Andersen, JH, Søndergaard, S, Østergaard, SD, Bech, P. A systematic review of the clinimetric properties of the 6-item version of the Hamilton depression rating scale (HAM-D6). Psychother Psychosom 2017; 86: 141–9. doi:10.1159/000457131
Weiss, B, Erritzoe, D, Giribaldi, B, Nutt, DJ, Carhart-Harris, RL. A critical evaluation of QIDS-SR-16 using data from a trial of psilocybin therapy versus escitalopram treatment for depression. J Psychopharmacol 2023; 37: 717–32. doi:10.1177/02698811231167848
Fried, EI, Flake, JK, Robinaugh, DJ. Revisiting the theoretical and methodological foundations of depression measurement. Nat Rev Psychol 2022; 1: 358–68. doi:10.1038/s44159-022-00050-2
Fried, EI. Moving forward: how depression heterogeneity hinders progress in treatment and research. Expert Rev Neurother 2017; 17: 423–5. doi:10.1080/14737175.2017.1307737
Hamilton, M. A rating scale for depression. J Neurol Neurosurg Psychiatry 1960; 23: 56–62. doi:10.1136/jnnp.23.1.56
Fried, EI. The 52 symptoms of major depression: lack of content overlap among seven common depression scales. J Affect Disord 2017; 208: 191–7. doi:10.1016/j.jad.2016.10.019
Young, AH, Moulton, CD. Antidepressants do work after all. J Psychopharmacol 2020; 34: 1071–3. doi:10.1177/0269881120933127
Hieronymus, F, Emilsson, JF, Nilsson, S, Eriksson, E. Consistent superiority of selective serotonin reuptake inhibitors over placebo in reducing depressed mood in patients with major depression. Mol Psychiatry 2016; 21: 523–30. doi:10.1038/mp.2015.53
Østergaard, SD, Bech, P, Miskowiak, KW. Fewer study participants needed to demonstrate superior antidepressant efficacy when using the Hamilton melancholia subscale (HAM-D6) as outcome measure. J Affect Disord 2016; 190: 842–5. doi:10.1016/j.jad.2014.10.047
Ostergaard, SD, Bech, P, Trivedi, MH, Wisniewski, SR, Rush, AJ, Fava, M. Brief, unidimensional melancholia rating scales are highly sensitive to the effect of citalopram and may have biological validity: implications for the research domain criteria (RDoC). J Affect Disord 2014; 163: 18–24. doi:10.1016/j.jad.2014.03.049
Targum, SD, Sauder, C, Evans, M, Saber, JN, Harvey, PD. Ecological momentary assessment as a measurement tool in depression trials. J Psychiatr Res 2021; 136: 256–64. doi:10.1016/j.jpsychires.2021.02.012
Malhi, GS, Hamilton, A, Morris, G, Mannie, Z, Das, P, Outhred, T. The promise of digital mood tracking technologies: are we heading on the right track? Evid Based Ment Health 2017; 20: 102–7. doi:10.1136/eb-2017-102757
Pfennings, L, Cohen, L, van der Ploeg, H. Preconditions for sensitivity in measuring change: visual analogue scales compared to rating scales in a Likert format. Psychol Rep 1995; 77: 475–80. doi:10.2466/pr0.1995.77.2.475
Fähndrich, E, Linden, M. Zur reliabilität und validität der stimmungsmessung mit der visuellen analog-skala (VAS) [Reliability and validity of the Visual Analogue Scale (VAS)]. Pharmacopsychiatria 1982; 15: 90–4. doi:10.1055/s-2007-1019515
Malpass, A, Wiles, N, Dowrick, C, Robinson, J, Gilbody, S, Duffy, L, et al. Usefulness of PHQ-9 in primary care to determine meaningful symptoms of low mood: a qualitative study. Br J Gen Pract 2016; 66: e78–84. doi:10.3399/bjgp16X683473
Cameron, IM, Reid, IC, Lawton, K. PHQ-9: sensitivity to change over time. Br J Gen Pract 2010; 60: 535–6. doi:10.3399/bjgp10X514909
Zealley, AK, Aitken, RCB. Measurement of mood. Proc R Soc Med 1969; 62: 993–6.
Huang, Z, Kohler, IV, Kämpfen, F. A single-item visual analogue scale (VAS) measure for assessing depression among college students. Community Ment Health J 2020; 56: 355–67. doi:10.1007/s10597-019-00469-7
Ahearn, EP. The use of visual analog scales in mood disorders: a critical review. J Psychiatr Res 1997; 31: 569–79. doi:10.1016/S0022-3956(97)00029-0
Bauer, MS, Crits Christoph, P, Ball, WA, Dewees, E, Mcallister, T, Alahi, P, et al. Independent assessment of manic and depressive symptoms by self-rating: scale characteristics and implications for the study of mania. Arch Gen Psychiatry 1991; 48: 807–12. doi:10.1001/archpsyc.1991.01810330031005
Moulton, CD, Strawbridge, R, Tsapekos, D, Oprea, E, Carter, B, Hayes, C, et al. The Maudsley 3-item Visual Analogue Scale (M3VAS): validation of a scale measuring core symptoms of depression. J Affect Disord 2021; 282: 280–3. doi:10.1016/j.jad.2020.12.185
Hampsey, E, Meszaros, M, Skirrow, C, Strawbridge, R, Taylor, RH, Chok, L, et al. Protocol for Rhapsody: a longitudinal observational study examining the feasibility of speech phenotyping for remote assessment of neurodegenerative and psychiatric disorders. BMJ Open 2022; 12: e061193. doi:10.1136/bmjopen-2022-061193
Middag, ME, Silman, D, Cleare, A, Young, A, Carter, B, Strawbridge, R. The Maudsley 3-item visual analogue scale (M3VAS): longitudinal validation of a new measure to capture depression. medRxiv 2023.
Mokkink, LB, Prinsen, CAC, Patrick, DL, Alonso, J, Bouter, LM, de Vet, HCW, et al. COSMIN Study Design checklist for Patient-Reported Outcome Measurement Instruments. COSMIN, 2019 (https://www.cosmin.nl/wp-content/uploads/COSMIN-study-designing-checklist_final.pdf [cited 23 Apr 2025]).
Sheehan, DV, Lecrubier, Y, Sheehan, KH, Amorim, P, Janavs, J, Weiller, E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry 1998; 59: 22–57.
Busner, J, Targum, SD. The clinical global impressions scale: applying a research tool in clinical practice. Psychiatry (Edgmont) 2007; 4: 28–37.
Kroenke, K, Spitzer, RL, Williams, JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 606–13. doi:10.1046/j.1525-1497.2001.016009606.x
Bartlett, MS. The effect of standardization on a χ² approximation in factor analysis. Biometrika 1951; 38: 337–44.
Guttman, L. A new approach to factor analysis: the Radex. In Mathematical Thinking in the Social Sciences (ed. PF Lazarsfeld): 258–348. Free Press, 1954.
Cronbach, LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297–334. doi:10.1007/BF02310555
Tavakol, M, Dennick, R. Making sense of Cronbach’s alpha. Int J Med Educ 2011; 2: 53–5. doi:10.5116/ijme.4dfb.8dfd
Pedhazur, EJ, Schmelkin, LP. Measurement, Design, and Analysis: an Integrated Approach Student ed. Lawrence Erlbaum Associates, 1991.
Mokkink, LB, Elsman, E, Terwee, CB, Prinsen, CAC, Patrick, DL, Alonso, J, et al. Conducting a Systematic Review of Patient-Reported Outcome Measures. COSMIN, 2025 (https://www.cosmin.nl/wp-content/uploads/COSMIN-manual-V2_final.pdf [cited 17 Apr 2025]).
Mokkink, L, Terwee, C, de Vet, H. Key concepts in clinical epidemiology: responsiveness, the longitudinal aspect of validity. J Clin Epidemiol 2021; 140: 159–62. doi:10.1016/j.jclinepi.2021.06.002
Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol 2013; 4: 62627. doi:10.3389/fpsyg.2013.00863
David, HA, Gunnink, JL. The paired t test under artificial pairing. Am Stat 1997; 51: 9–12.
Cohen, J. Statistical Power for the Behaviour Sciences. Laurence Erlbaum and Associates, 1977.
Suzuki, A, Aoshima, T, Fukasawa, T, Yoshida, K, Higuchi, H, Shimizu, T, et al. A three-factor model of the MADRS in major depressive disorder. Depress Anxiety 2005; 21: 95–7. doi:10.1002/da.20058
Sokero, TP, Melartin, TK, Rytsälä, HJ, Leskelä, US, Lestelä-Mielonen, PS, Isometsä, ET. Prospective study of risk factors for attempted suicide among patients with DSM–IV major depressive disorder. Br J Psychiatry 2005; 186: 314–18. doi:10.1192/bjp.186.4.314
Kennedy, SH. Core symptoms of major depressive disorder: relevance to diagnosis and treatment. Dialogues Clin Neurosci 2008; 10: 271–7. doi:10.31887/DCNS.2008.10.3/shkennedy
Nutt, DJ. Relationship of neurotransmitters to the symptoms of major depressive disorder. J Clin Psychiatry 2008; 69(suppl E1): 17337.
Chae, D, Lee, J, Lee, EH. Internal structure of the patient health questionnaire-9: a systematic review and meta-analysis. Asian Nurs Res (Korean Soc Nurs Sci) 2025; 19: 1–12.
Krause, JS, Reed, KS, McArdle, JJ. Factor structure and predictive validity of somatic and nonsomatic symptoms from the patient health questionnaire-9: a longitudinal study after spinal cord injury. Arch Phys Med Rehabil 2010; 91: 1218–24. doi:10.1016/j.apmr.2010.04.015
Wang, Y, Liang, L, Sun, Z, Liu, R, Wei, Y, Qi, S, et al. Factor structure of the patient health questionnaire-9 and measurement invariance across gender and age among Chinese university students. Medicine 2023; 102: E32590. doi:10.1097/MD.0000000000032590
Gunzler, D, Sehgal, AR, Kauffman, K, Davey, CH, Dolata, J, Figueroa, M, et al. Identify depressive phenotypes by applying RDOC domains to the PHQ-9. Psychiatry Res 2020; 286: 112872. doi:10.1016/j.psychres.2020.112872
Bekhuis, E, Boschloo, L, Rosmalen, JGM, de Boer, MK, Schoevers, RA. The impact of somatic symptoms on the course of major depressive disorder. J Affect Disord 2016; 205: 112–8. doi:10.1016/j.jad.2016.06.030
Stegenga, BT, Kamphuis, MH, King, M, Nazareth, I, Geerlings, MI. The natural course and outcome of major depressive disorder in primary care: the PREDICT-NL study. Soc Psychiatry Psychiatr Epidemiol 2012; 47: 87–95. doi:10.1007/s00127-010-0317-9
Shell, AL, Williams, MK, Patel, JS, Vrany, EA, Considine, RV, Acton, AJ, et al. Associations of somatic depressive symptoms with body mass index, systemic inflammation, and insulin resistance in primary care patients with depression. J Behav Med 2022; 45: 882–93. doi:10.1007/s10865-022-00356-9
Angst, F. The new COSMIN guidelines confront traditional concepts of responsiveness. BMC Med Res Methodol 2011; 11: 152. doi:10.1186/1471-2288-11-152
Parkes, MJ, Callaghan, MJ, O’Neill, TW, Forsythe, LM, Lunt, M, Felson, DT. Sensitivity to change of patient-preference measures for pain in patients with knee osteoarthritis: data from two trials. Arthritis Care Res (Hoboken) 2016; 68: 1224–31. doi:10.1002/acr.22823
Mokkink, LB, de Vet, HCW, Prinsen, CAC, Patrick, DL, Alonso, J, Bouter, LM, et al. COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures. Qual Life Res 2018; 27: 1171–9. doi:10.1007/s11136-017-1765-4

Table 1 Participant characteristics and symptom scores

Fig. 1 Scatter plot correlating total outcome measure scores for M3VAS and PHQ-9. All time point data were pooled (i.e. to include paired data). R², coefficient of determination for fit line (solid), with dashed lines indicating 95% confidence interval. M3VAS, Maudsley three-item visual analogue scale for depression; PHQ-9, Patient Health Questionnaire 9.

Table 2 Convergent validity and responsiveness of M3VAS against PHQ-9 and its derivatives

Table 3 Score changes over time for M3VAS, PHQ-9 and PHQ-D and associated effect sizes and effect of time
