Hamilton Depression Rating Scale (HAM-D): clinical interview instead of self-report
HAM-D is not a questionnaire: why this changes everything
The Hamilton Depression Rating Scale (HAM-D, or HDRS) is one of the few widely used depression instruments that is not a self-report. It is a clinician-administered interview: the clinician observes, asks questions, and rates severity based on what they see and hear. The client does not fill out a form; their condition is assessed by a trained observer.
Created by psychiatrist Max Hamilton in 1960, HAM-D became the gold standard of depression research. Over 2,000 randomized clinical trials of antidepressants have used it as the primary efficacy criterion. When pharmaceutical companies demonstrate to regulators that a drug works, HAM-D is frequently the measure they use. This is not coincidental: HAM-D captures what the client cannot always describe, or will not describe, themselves.
HAM-D measures what the clinician observes, not what the client reports. This fundamental distinction determines both what the scale can reveal and what it cannot.
HAM-D17 structure: three domains, 17 items
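The standard HAM-D17 includes 17 items. Its core can be grouped into three domains: affective symptoms (depressed mood, guilt, suicidal ideation), neurovegetative symptoms (early, middle, and late insomnia, appetite and weight changes, loss of libido), and psychomotor disturbances (retardation and agitation). The remaining items cover anxiety, general somatic symptoms, work and activities, hypochondriasis, and insight. Each item is rated 0–2 or 0–4. In the commonly used version, nine items are rated 0–4 and eight are rated 0–2, so the total score ranges from 0 to 52 (9 × 4 + 8 × 2).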
A score ≤7 in most protocols signifies remission — this is the target of antidepressant treatment. A 50% or greater reduction from baseline is defined as treatment response. These two criteria — "response" and "remission" — became the standard of clinical trials largely due to HAM-D.
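The two criteria above reduce to simple arithmetic on total scores. The following is a minimal sketch, assuming HAM-D17 totals in the 0–52 range; the function name `classify_outcome` and the returned keys are illustrative, not part of any standard tool.

```python
def classify_outcome(baseline: int, current: int) -> dict:
    """Apply the response and remission criteria to two HAM-D17 totals.

    Hypothetical helper for illustration only.
    """
    for name, score in (("baseline", baseline), ("current", current)):
        if not 0 <= score <= 52:
            raise ValueError(f"{name} must be within 0-52, got {score}")
    if baseline == 0:
        raise ValueError("a baseline of 0 leaves no reduction to measure")

    # Fraction by which the total has fallen since baseline.
    reduction = (baseline - current) / baseline
    return {
        "percent_reduction": round(reduction * 100, 1),
        "response": reduction >= 0.5,   # 50% or greater reduction
        "remission": current <= 7,      # total score of 7 or below
    }


if __name__ == "__main__":
    print(classify_outcome(24, 12))
    print(classify_outcome(24, 6))
```

Note that the two flags are independent: a client who starts at 10 and drops to 7 meets the remission cutoff without meeting the response criterion.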
What the clinician sees that a questionnaire cannot
The neurovegetative symptoms of depression — sleep disturbances, appetite and weight changes, loss of libido, psychomotor retardation — are poorly captured by self-reports. Clients may not associate them with depression ("I'm just not sleeping well"), may underestimate them, or may not be aware of them at all. A clinician conducting a HAM-D interview observes psychomotor retardation directly: they notice slowed speech and movement that the client does not describe and that will never appear in a PHQ-9 or BDI-II.
This is precisely why HAM-D often registers more severe depression than BDI-II in the same client — especially in melancholic presentations with prominent somatic symptoms. Conversely, in clients where cognitive symptoms dominate (hopelessness, self-blame, rumination), BDI-II may show greater suffering than HAM-D, since the latter devotes far fewer items to these components.
HAM-D + BDI-II: two views of the same depression
Combining HAM-D and BDI-II in measurement-based care yields something neither instrument provides alone: two independent views of depression. HAM-D is the clinician's view, BDI-II is the client's. When both indicate the same severity, the clinical picture is cross-confirmed. When they diverge, that divergence is itself meaningful information.
The divergence between HAM-D and BDI-II is not a measurement error. It is clinical information. It means that the client and the clinician perceive the depression differently, and that gap itself calls for therapeutic attention. (Adapted from Rush et al., 2006; Uher et al., 2012)
- HAM-D higher than BDI-II → neurovegetative/psychomotor symptoms dominate. Client underestimates severity or somatizes
- BDI-II higher than HAM-D → subjective suffering, self-blame, and cognitive symptoms dominate. Client experiences more than is "visible from the outside"
- Both decreasing in parallel → therapy working at both levels — optimal pattern
- HAM-D decreased, BDI-II unchanged → symptoms improved, but client still suffers — relapse risk if treatment ends prematurely
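The last two patterns in the list describe change over time, which can be sketched in code. Because the two scales have different ranges, this sketch compares each scale against its own baseline rather than comparing raw scores. The 20% cutoff for counting a score as "decreased" (`min_drop`) is an assumption made for illustration, as is the function name; the text does not define a threshold for this comparison.

```python
def change_pattern(hamd_baseline: int, hamd_now: int,
                   bdi_baseline: int, bdi_now: int,
                   min_drop: float = 0.20) -> str:
    """Label the joint HAM-D / BDI-II trajectory.

    min_drop is a hypothetical cutoff: the fractional reduction from
    baseline that counts as a real decrease on either scale.
    """
    def dropped(baseline: int, now: int) -> bool:
        if baseline <= 0:
            raise ValueError("baseline must be positive")
        return (baseline - now) / baseline >= min_drop

    hamd_down = dropped(hamd_baseline, hamd_now)
    bdi_down = dropped(bdi_baseline, bdi_now)

    if hamd_down and bdi_down:
        return "both decreasing: therapy working at both levels"
    if hamd_down:
        # Observed symptoms improved, self-reported suffering did not.
        return "HAM-D down, BDI-II not decreased: relapse risk if treatment ends early"
    return "pattern not covered by the two longitudinal cases"


if __name__ == "__main__":
    print(change_pattern(24, 12, 30, 15))
    print(change_pattern(24, 12, 30, 29))
```

The first two bullets compare severity levels across scales, which would require agreed severity bands for each instrument, so they are left out of this sketch.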
HAM-D over time: from response to remission
With repeated measurement during therapy, HAM-D provides clear clinical benchmarks. As early as 2–4 weeks after treatment onset, a 20–25% score reduction is an early predictor of final response. If reduction is less than 20% by week 4, the current medication or approach is likely not working for this client. This evidence allows clinicians to avoid waiting 12 weeks before adjusting strategy.
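The week-4 rule can be expressed as a small check. This is a sketch under the thresholds stated above (a 20% reduction, assessed from week 2 and decisive at week 4); the function name `early_signal` and the returned labels are hypothetical.

```python
def early_signal(baseline: int, score: int, week: int,
                 min_reduction: float = 0.20) -> str:
    """Interpret an early HAM-D follow-up score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a reduction")
    if week < 2:
        # The text places the early-prediction window at weeks 2 to 4.
        return "too early to judge"

    reduction = (baseline - score) / baseline
    if reduction >= min_reduction:
        return "early improvement"
    if week >= 4:
        # Below the threshold by week 4: current approach is likely not working.
        return "review treatment"
    return "keep monitoring"


if __name__ == "__main__":
    print(early_signal(24, 18, 3))
    print(early_signal(24, 22, 4))
```

A result of "review treatment" is a prompt for clinical judgment, not an automatic decision; the threshold is passed as a parameter so it can be adjusted to the protocol in use.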
HAM-D is highly sensitive to change (effect size d = 0.6–1.2 in antidepressant research). This is precisely why it is used to demonstrate treatment efficacy. For the practitioner, this means that a 4–6 point change over 4–6 weeks is a clinically meaningful signal in either direction. A score of ≤7 marks remission, the actual goal of treatment, not just "feeling somewhat better."
When HAM-D is needed and when PHQ-9 is enough
HAM-D requires time and training. Conducting a 15–20 minute structured clinical interview before every session in a busy practice is not practical; PHQ-9 takes about 3 minutes and yields a comparable screening result. But there are situations where HAM-D is irreplaceable: suspected melancholic depression with prominent neurovegetative symptoms; monitoring response to pharmacotherapy; and cases where external confirmation is needed because the client's self-report is questionable due to poor insight, severe agitation, or obvious symptom minimization.
HAM-D is not a replacement for self-reports, but a complement to them. Where PHQ-9 and BDI-II capture the client's voice, HAM-D adds the clinician's observation. The gap between these two perspectives is itself valuable clinical information about the nature of depression in a specific individual.