Do you use language differently when you're depressed?
In 2001 Stirman et al. compared the collected works of nine poets who eventually committed suicide with those of nine poets who didn't (as a control set). Their theory was that the depressed (and eventually suicidal) poets would use more first-person singular pronouns (I, me, my) and more words related to hopelessness and desperation (hate, worthless, death, grave), and that was supported by the data.
Rude et al. later found something similar when they compared essays (on a common topic - "coming to college") written by college students. Depressed students used "I" and negative words significantly more often than controls.
Interestingly, Oxman et al. have found that spoken language patterns can be a good discriminator for classifying patients as depressed or not, so it's not just written language use that may differ.
Anyway, at ICWSM Nairán Ramírez-Esparza from the University of Texas presented a language analysis of some depression discussion boards on About.com. She ran a two-part study: the first part to confirm Stirman and Rude's findings, and the second making use of the fact that the About.com boards are bilingual (there's a Spanish section too) to see how different cultures talk about depression.
Her approach was pretty simple: she collected about 400 posts from the depression forum and 400 posts from a breast cancer forum as a control, broke each post down into single words, and then used off-the-shelf software to classify them (as verb, adjective, pronoun, positive emotion, negative emotion, etc.). She did this for both the English and Spanish sections of the site.
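To make the word-counting idea concrete, here's a rough sketch in Python. It is not the software she used (the talk only said "off-the-shelf"); the tiny CATEGORIES lexicon, the function names and the sample posts are all made up for illustration, and a real study would rely on a far larger, validated dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, for illustration only.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "negative_emotion": {"hate", "worthless", "sad", "hopeless", "alone"},
}

def tokenize(post):
    """Lowercase a post and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", post.lower())

def category_rates(posts):
    """Return each category's share of all words across the given posts."""
    counts = Counter()
    total = 0
    for post in posts:
        for word in tokenize(post):
            total += 1
            for name, vocab in CATEGORIES.items():
                if word in vocab:
                    counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in CATEGORIES}

# Invented sample posts standing in for the two forums.
depression_posts = ["I hate how alone I feel", "My days feel worthless to me"]
control_posts = ["The doctor said my scan looked fine", "We start treatment next week"]

depression = category_rates(depression_posts)
control = category_rates(control_posts)
```

Dividing a category's rate in one forum by its rate in the other gives the kind of "N times more frequent" comparison reported in the talk, as long as the control rate isn't zero.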
Her results seemed to confirm the earlier studies: first person pronouns were found three times more frequently in the depression forum posts than in the controls and words relating to negative emotions occurred four times as frequently. This was true for both English and Spanish datasets.
The second part of her study was to see if English and Spanish speakers approach depression differently: what do they talk about? She studied this by computing normalized word frequency counts and then grouping words into themes.
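The talk didn't spell out how the words were grouped into themes, so the following is only a guess at the general shape of the analysis. It assumes a hand-built word list per theme (the THEMES dictionary is hypothetical, seeded with example words from this post) and normalizes counts to occurrences per 1,000 words so that datasets of different sizes can be compared.

```python
from collections import Counter

# Hypothetical theme lexicon; the actual grouping method may have differed.
THEMES = {
    "treatment": {"medicine", "doctor", "therapist"},
    "disclosure": {"tell", "discuss", "talk"},
    "family": {"mom", "dad", "brother", "sister"},
}

def normalized_frequencies(words):
    """Map each word to its count per 1,000 words in the dataset."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: 1000 * count / total for word, count in counts.items()}

def rank_themes(words):
    """Sum normalized frequencies per theme and sort, most frequent first."""
    freqs = normalized_frequencies(words)
    scores = {
        theme: sum(freqs.get(word, 0.0) for word in vocab)
        for theme, vocab in THEMES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented sample text standing in for a tokenized forum dataset.
words = "my mom and dad tell me to talk to my doctor or my mom".split()
ranking = rank_themes(words)
```

Running the same ranking separately on the English and Spanish word lists would produce two ordered theme lists that can be compared side by side.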
The top five themes discussed in the English dataset:
1. Treatment (medicine, doctor, therapist...)
2. Disclosure (tell, discuss, talk...)
3. Family (mom, dad, brother, sister...)
4. Symptoms ...
5. School
And the top five themes from the Spanish dataset:
1. Family
2. Relationship history
3. Hopelessness
4. School
5. Treatment
I'm a bit suspicious of results that are so intuitively appealing (family and romance are more important to Spanish speakers?). One thing I did wonder was how much the results are skewed by different community expectations: if you visit a discussion forum where people are sharing stories about their depression and everybody else mentions their family, maybe you feel compelled to mention your family too. Maybe the English-language forums are dominated by a younger age group and so older visitors shy away, or vice versa.
Anyway, it was interesting stuff. Somebody in the audience wondered aloud if this means that you could build a system to identify people at risk of depression (or, perhaps more to the point, suicide) by analyzing their language online. Maybe this could be built into the next version of the anti-plagiarism software used in high schools and colleges (I'm not advocating that, just saying)...
