PHQ-9 and GAD-7: why these two questionnaires became the clinical gold standard
Why this pair became the standard
In 2001, Kurt Kroenke, Robert Spitzer, and Janet Williams published the PHQ-9, a brief 9-item depression scale. Five years later, the same authors plus Bernd Löwe released the GAD-7, a 7-item anxiety companion. Over two decades, this pair has largely displaced longer scales in primary care. Bodies such as the USPSTF, NICE, and HEDIS list them among recommended or accepted first-line screening tools.
Three reasons converge. Brevity: 16 items total, 3–5 minutes to complete. Public domain: free to use, no licensing, translated into 80+ languages. Direct DSM alignment: the 9 PHQ-9 items map one-to-one onto the 9 DSM-IV and DSM-5 symptom criteria for major depressive disorder. GAD-7 was designed by the same team as a companion instrument, with the same response scale (0–3) and the same primary cutoff (≥10). What emerges is a single workflow, not two separate tools.
Public-domain status means no permission is required: reproduction, translation, and digital integration are free as long as item wording is preserved. This is a key difference from the BDI-II, the Beck Anxiety Inventory, and the Maslach Burnout Inventory, which remain commercial.
What PHQ-9 measures
PHQ-9 has 9 items, each corresponding to one DSM-IV symptom of major depressive disorder. The assessment period is the past 2 weeks. Each item is scored from 0 ("not at all") to 3 ("nearly every day"), giving a total score range of 0–27. The original validation by Kroenke et al. (2001) drew on 6,000 patients from primary care and obstetrics-gynecology clinics, with criterion validity assessed in a subsample of 580 patients who also had an independent, blinded interview by a mental health professional. At a cutoff of ≥10, the instrument showed sensitivity of 88% and specificity of 88% against that interview, a rare combination of two high values at once.
Eighteen years later, an individual participant data meta-analysis by Levis, Benedetti & Thombs (2019) in the BMJ replicated this result almost exactly: 58 studies, 17,357 participants, 2,312 cases of major depression. Against a semi-structured interview (the reference closest to clinical practice), sensitivity was 88% and specificity 85% at a cutoff of ≥10. The 2001 figures held up in one of the largest pooled evaluations of the instrument.
- Item 1: Little interest or pleasure in doing things — anhedonia
- Item 2: Feeling down, depressed, or hopeless — depressed mood
- Item 3: Trouble with sleep — sleep disturbance
- Item 4: Tiredness or little energy — fatigue
- Item 5: Poor appetite or overeating — appetite change
- Item 6: Feeling bad about oneself, guilt — worthlessness
- Item 7: Trouble concentrating — cognitive impairment
- Item 8: Moving slowly or being restless — psychomotor change
- Item 9: Thoughts of being better off dead — suicidal ideation, immediate clinical evaluation signal
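Scoring follows directly from the item list: the total is the sum of the nine responses. The sketch below is a minimal illustration, not an official scoring implementation. The function name `phq9_total` and the constants are hypothetical, and it assumes responses arrive as integers in item order.

```python
from typing import Sequence

PHQ9_ITEM_COUNT = 9
MIN_RESPONSE, MAX_RESPONSE = 0, 3  # 0 = "not at all", 3 = "nearly every day"


def phq9_total(responses: Sequence[int]) -> int:
    """Sum nine item responses (each 0-3) into a 0-27 total."""
    if len(responses) != PHQ9_ITEM_COUNT:
        raise ValueError(f"expected {PHQ9_ITEM_COUNT} responses, got {len(responses)}")
    for position, value in enumerate(responses, start=1):
        if not MIN_RESPONSE <= value <= MAX_RESPONSE:
            raise ValueError(f"item {position} is {value}, outside 0-3")
    return sum(responses)


example = [2, 2, 1, 2, 1, 1, 1, 0, 0]
print(phq9_total(example))  # 10
```

Rejecting incomplete or out-of-range input, rather than silently summing it, keeps a partially completed form from being read as a low score.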
"A PHQ-9 score of 10 or above had a sensitivity of 88% and a specificity of 88% for major depression. These characteristics plus its brevity make the PHQ-9 a useful clinical and research tool." (Kroenke, Spitzer & Williams, 2001, Journal of General Internal Medicine)
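Sensitivity and specificity describe the test, not an individual result. How often a score of ≥10 corresponds to actual major depression also depends on how common depression is in the population screened. The sketch below applies Bayes' rule to the semi-structured-interview figures from Levis et al. (88% / 85%). It uses the pooled case fraction of that dataset (2,312 of 17,357) purely as an illustrative prevalence, which is an assumption: it is not an estimate for any particular clinic. The function name `predictive_values` is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative prevalence only: pooled cases / pooled participants.
prevalence = 2312 / 17357
ppv, npv = predictive_values(0.88, 0.85, prevalence)
print(f"prevalence={prevalence:.3f} PPV={ppv:.2f} NPV={npv:.2f}")
# prevalence=0.133 PPV=0.47 NPV=0.98
```

Under these assumptions, roughly half of positive screens would be confirmed at interview, while a negative screen is rarely wrong. This is why a score at or above the cutoff is a prompt for clinical evaluation rather than a diagnosis.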
What GAD-7 measures
GAD-7 was created 5 years after PHQ-9 by the same research group. An initial pool of 13 candidate items was reduced to 7 through factor analysis, item-total correlations, and mapping to DSM-IV criteria for GAD. The response scale (0–3), the 2-week assessment period, and the primary cutoff (≥10) match PHQ-9; the total score ranges from 0 to 21. Validation on a sample of 2,740 primary care patients (Spitzer et al., 2006) yielded sensitivity of 89% and specificity of 82% at a cutoff of ≥10.
An often-missed property of GAD-7 is that its validation extends beyond generalized anxiety disorder. Kroenke et al. (2007) in the Annals of Internal Medicine showed that GAD-7 also works as a screener for panic disorder, social phobia, and PTSD, with AUC of 0.80–0.91. One brief questionnaire thus covers four anxiety syndromes, which makes GAD-7 a general anxiety screener rather than a narrow GAD instrument. The first two GAD-7 items form GAD-2, an ultra-brief two-item screen scored 0–6.
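Because GAD-2 is simply the first two GAD-7 items, both scores can be derived from one set of responses. The sketch below is a minimal illustration with a hypothetical function name, `gad_scores`. The GAD-2 threshold of 3 is a commonly cited value that is not stated in the text above, so it is labeled as an assumption.

```python
from typing import Dict, Sequence

GAD2_CUTOFF = 3  # assumption: commonly cited GAD-2 threshold, not from the text above


def gad_scores(responses: Sequence[int]) -> Dict[str, int]:
    """Return the GAD-7 total (0-21) and the GAD-2 subscore (0-6)."""
    if len(responses) != 7:
        raise ValueError(f"expected 7 responses, got {len(responses)}")
    if any(value not in (0, 1, 2, 3) for value in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    # GAD-2 is the sum of the first two GAD-7 items.
    return {"gad7": sum(responses), "gad2": sum(responses[:2])}


scores = gad_scores([2, 2, 1, 1, 0, 1, 1])
print(scores)  # {'gad7': 8, 'gad2': 4}
print(scores["gad2"] >= GAD2_CUTOFF)  # True
```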
Thresholds and interpretation: same logic, two scales
The cutoffs of both instruments follow one principle and are easy to remember: 0–4 is minimal or none, 5–9 mild, 10–14 moderate, and 15 or above severe. PHQ-9 splits the top band in two, labeling 15–19 "moderately severe" and 20–27 "severe". The clinical cutoff of ≥10 is the primary threshold for both instruments: below it, monitoring is usually sufficient; at or above it, further clinical evaluation and intervention are indicated.
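The shared band logic can be expressed as a lookup over lower bounds. The sketch below is illustrative only; `severity_band` and the table names are hypothetical. It uses the published band labels, in which PHQ-9 calls 15–19 "moderately severe" and 20–27 "severe", while GAD-7 calls 15–21 "severe".

```python
from typing import List, Tuple

Bands = List[Tuple[int, str]]

# (lower bound, label), highest band first.
PHQ9_BANDS: Bands = [(20, "severe"), (15, "moderately severe"),
                     (10, "moderate"), (5, "mild"), (0, "minimal")]
GAD7_BANDS: Bands = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]
CLINICAL_CUTOFF = 10  # primary threshold for both instruments


def severity_band(total: int, bands: Bands, max_total: int) -> str:
    """Map a total score to its severity label."""
    if not 0 <= total <= max_total:
        raise ValueError(f"total {total} outside 0-{max_total}")
    for lower_bound, label in bands:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: bands must include a 0 lower bound")


print(severity_band(12, PHQ9_BANDS, 27))  # moderate
print(severity_band(17, PHQ9_BANDS, 27))  # moderately severe
print(severity_band(17, GAD7_BANDS, 21))  # severe
print(12 >= CLINICAL_CUTOFF)              # True
```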
Item 9 of PHQ-9 deserves separate attention. It is a direct question about thoughts of death and self-harm. Any score above zero on this item requires immediate clinical risk assessment — regardless of the total score. This is a rule, not a recommendation: like item 10 in the EPDS, item 9 works as a standalone safety signal, independent of the total. GAD-7 has no equivalent item — if suicide risk is suspected in an anxious patient, add PHQ-9 or a specialized suicidality screening instrument.
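In software that collects PHQ-9 responses, this rule is naturally implemented as a check that runs separately from the total. The sketch below is a minimal illustration with a hypothetical function name. It only raises a flag; what happens next is a clinical decision outside the scope of code.

```python
from typing import Sequence

ITEM_9_INDEX = 8  # ninth item, zero-based


def needs_risk_assessment(phq9_responses: Sequence[int]) -> bool:
    """True if item 9 is above zero, regardless of the total score."""
    if len(phq9_responses) != 9:
        raise ValueError(f"expected 9 responses, got {len(phq9_responses)}")
    return phq9_responses[ITEM_9_INDEX] > 0


low_total_but_flagged = [0, 0, 0, 0, 0, 0, 0, 0, 1]
print(sum(low_total_but_flagged))                    # 1
print(needs_risk_assessment(low_total_but_flagged))  # True
```

The example shows why the check cannot be folded into the cutoff: a total of 1 is in the minimal band, yet the item 9 response still triggers the flag.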
PHQ-9 + GAD-7 over time: the measurement-based care (MBC) standard
The brevity of this pair is not just convenience — it is what makes weekly monitoring possible without overloading the client. Clients asked to complete the BDI-II or CORE-OM every week tend to resist. Three to five minutes of PHQ-9 + GAD-7 pass unnoticed. This is what turns the pair from a screening instrument into a measurement one: a series of readings shows the therapy trajectory in real time.
Löwe et al. (2004) established the minimal clinically important difference (MCID) for PHQ-9 at 5 points. For GAD-7 the commonly cited figure is approximately 4 points. This is a working benchmark for interpreting change: a decrease of 5 or more PHQ-9 points from baseline is a clinically noticeable improvement, not measurement noise. MBC methodology involves plotting scores from session to session and discussing the graph with the client, which may itself contribute to outcomes beyond the specific therapy content.
PHQ-9 and GAD-7 are not "another test" but the backbone of repeated measurement in clinical practice. The routine is simple: the client completes both before each session, the clinician discusses the graph with the client, and the plan is adjusted if scores have not dropped over 4–6 sessions. It is the series, not a single measurement, that turns a questionnaire into a working MBC tool.
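The MCID benchmark reduces to a comparison between baseline and current scores. The sketch below is a minimal illustration with hypothetical names. Treating an equally large increase as "worsened" is an assumption added here for symmetry; the text above only defines the improvement direction.

```python
PHQ9_MCID = 5  # Löwe et al. (2004)
GAD7_MCID = 4  # approximate, commonly cited figure


def classify_change(baseline: int, current: int, mcid: int) -> str:
    """Label change from baseline relative to the MCID."""
    drop = baseline - current  # positive means the score went down
    if drop >= mcid:
        return "improved"
    if drop <= -mcid:
        return "worsened"  # assumption: symmetric threshold
    return "no clear change"


print(classify_change(18, 12, PHQ9_MCID))  # improved
print(classify_change(18, 15, PHQ9_MCID))  # no clear change
print(classify_change(12, 8, GAD7_MCID))   # improved
```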
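A session-by-session series makes the "no drop over 4–6 sessions" check easy to automate as a prompt for the clinician. The sketch below is one possible operationalization, not an established rule. It assumes "dropped" means a decrease of at least the MCID across the window, and the function name and parameter defaults are hypothetical.

```python
from typing import Sequence


def plan_review_due(scores: Sequence[int], window: int = 4, min_drop: int = 5) -> bool:
    """True if the score fell by less than `min_drop` over the last `window` sessions.

    `scores` holds one total per session, oldest first.
    """
    if len(scores) < window + 1:
        return False  # not enough sessions to judge yet
    drop = scores[-(window + 1)] - scores[-1]
    return drop < min_drop


print(plan_review_due([16, 15, 16, 14, 15]))  # True: only 1 point lower after 4 sessions
print(plan_review_due([16, 13, 11, 10, 9]))   # False: 7 points lower
print(plan_review_due([16, 15]))              # False: too early to tell
```

The flag is meant to start a conversation about the treatment plan, in line with the graph review described above, not to replace clinical judgment.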
Limitations and when to add other instruments
The pair has boundaries. PHQ-9 is focused on the 9 DSM criteria for major depression and does not cover the wider spectrum of depressive phenomena that BDI-II captures (self-criticism, sense of punishment, pessimism). GAD-7 covers cognitive anxiety and worry but is less sensitive to somatic anxiety with pronounced physiological arousal, where BAI is more precise. In medical settings, where physical illness can inflate scores on somatic items, HADS is often preferable because it was specifically designed with somatic items excluded.
- BDI-II — for more detailed depression assessment or a wider severity range (0–63 vs PHQ-9's 0–27)
- BAI — when primary complaints are somatic: palpitations, dizziness, panic
- HADS — in medical settings where physical symptoms may confound PHQ-9 and GAD-7
- DASS-21 — when a third construct (stress) is needed alongside depression and anxiety
- EPDS — in the perinatal period, where PHQ-9 somatic items (sleep, fatigue, appetite) can produce false positives
A gold standard is not "the best instrument for every case" but "the instrument to use by default." As a rule of thumb, PHQ-9 + GAD-7 cover the large majority of screening tasks in primary care and psychotherapy practice, and specialized instruments exist for the rest. The real strength of the pair emerges not from a single reading but from the series, when the client sees their own graph and the clinician has objective data for clinical decisions. This is what turns 16 brief items into the foundation of measurement-based care.