
Characterising the Speech of People with Mental Illness

This year, my contribution to Psychonomics 2017 is in collaboration with Kristin Nicodemus of the University of Edinburgh and Alex Cohen of Louisiana State University.

My main research interest is to support people with chronic conditions. I also have a longstanding interest in the complex information that people convey in their speech and language – both intentionally, as signals to others, and unintentionally, as an expression of their socialisation, their anatomy, their physiology, and their current health.

This piece of work brings both together. Alex Cohen has a large collection of speech samples (17K+) from people with varying mental health conditions. These samples were analysed using the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), a standard set of 88 features that describe aspects of speech and voice relevant to expressing emotion. GeMAPS is attractive because it represents a consensus among leading researchers and practitioners in the field, and it comes with open source software for extracting those parameters from the speech signal.

GeMAPS contains 88 features and has been used mainly for classification of large data sets, but for smaller studies, it can be tricky to manage. Using principal component analysis in R, we reduced GeMAPS to a smaller set of features that are relatively easy to interpret from a phonetic point of view.
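For readers who have not met principal component analysis before, here is a minimal sketch of the idea. It is not our analysis: we worked in R on the full feature set, whereas this toy uses plain Python, three made-up features, and power iteration to find only the first component. All values and names (`samples`, `first_component`) are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

# Hypothetical data: five speech samples, three acoustic features each.
# Features 1 and 2 move together; feature 3 is only weakly related.
samples = [
    (1.0, 2.0, 5.0),
    (2.0, 4.0, 3.0),
    (3.0, 6.0, 6.0),
    (4.0, 8.0, 2.0),
    (5.0, 10.0, 4.0),
]

def standardise(rows):
    """Scale every feature to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumes no feature is constant
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def first_component(rows, iterations=200):
    """Return (loadings, share of variance) for the first principal component."""
    z = standardise(rows)
    n, k = len(z), len(z[0])
    # Correlation matrix of the standardised features.
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):  # power iteration towards the top eigenvector
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return v, eigenvalue / k

loadings, explained = first_component(samples)
print("loadings:", [round(x, 3) for x in loadings])
print("share of variance explained:", round(explained, 3))
```

The two correlated features receive the same loading, so a single component can stand in for both. That is the sense in which a large feature set can be reduced to a smaller one.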

Using this reduced feature set, we’ve been able to identify distinct acoustic traces in the speech of people who have a history of depression and the speech of people who have a history of psychosis. These traces on their own are not enough to spot or diagnose mental illness or a history thereof, because they can be caused by many different factors. Instead, they reflect small, subtle changes, one of many traces that a person’s mental health leaves in their behaviour.

PDF of the poster:

Psychonomics 2017 PDF


Cohen A, Elvevåg B. 2014. Automated computerized analysis of speech disturbances in psychiatric disorders. Curr Opin Psychiatry 27:203–209

Cummins N, Scherer S, Krajewski J, Schnieder S, Epps J, Quatieri TF. 2015. A review of depression and suicide risk assessment using speech analysis. Speech Commun 71:10–49

Elvevåg B, Cohen AS, Wolters MK, Whalley HC, Gountouna V-E, Kuznetsova KA, Watson AR, Nicodemus KK. 2016. An Examination of the Language Construct in NIMH’s Research Domain Criteria: Time for Reconceptualization! Am J Med Genet Part B 171B:904–919

Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Sanislow C, Wang P. 2010. Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. Am J Psychiatry 167:748–751.

RDoC web page: Cognitive Systems / Language

The data used in this study come from these studies:

Cohen, A. S., Dinzeo, T. J., Donovan, N. J., Brown, C. E., & Morrison, S. C. (2015). Vocal acoustic analysis as a biometric indicator of information processing: Implications for neurological and psychiatric disorders. Psychiatry Research, 226(1), 235–241.

Cohen, A. S., Mitchell, K. R., Docherty, N. M., & Horan, W. P. (2016). Vocal Expression in Schizophrenia: Less Than Meets the Ear. Journal of Abnormal Psychology, 125(2), 299–309.

Cohen, A. S., Renshaw, T. L., Mitchell, K. R., & Kim, Y. (2016). A psychometric investigation of “macroscopic” speech measures for clinical and psychological science. Behavior Research Methods, 48(2), 475–486.



Pint of Science: Oh Data, Where Art Thou?

In this post, I provide some background on the health data talk I gave on May 15, 2017, at Pint of Science, Edinburgh. (Slides)

The central argument of the talk is that any data we collect about health and wellbeing have no meaning in themselves – they need to be interpreted in context. Take step counts, for example. Measuring step counts is a somewhat inexact science, because the signals picked up by the accelerometers in a phone or a dedicated pedometer or actigraph need to be converted into the metric of steps (Berendsen et al., 2014; Fulk et al., 2014). Rating threads about pedometers like the Fitbit or Jawbone often contain disappointed comments about bad measurements (too many steps counted, too few steps counted, failure to detect stair climbing).

Step counts also need to be interpreted in the context of the person who is taking the steps. 6000 steps in a day is impressive for somebody who barely walks, but an indication of a lazy day for somebody who usually averages 10000 or more.
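To make this concrete, here is a minimal sketch that expresses a day's step count relative to the person's own recent average. The two step histories and the function name `relative_to_baseline` are made up for illustration; a real analysis would need a more careful notion of baseline.

```python
from statistics import mean
from typing import Sequence

def relative_to_baseline(today: int, history: Sequence[int]) -> float:
    """Today's steps as a multiple of the person's own average."""
    return today / mean(history)

# Hypothetical step histories for two people (steps per day).
rarely_walks = [1500, 2000, 1800, 2200, 2500]
very_active = [11000, 12500, 12000, 13000, 11500]

today = 6000
print(relative_to_baseline(today, rarely_walks))  # 3.0: three times the usual
print(relative_to_baseline(today, very_active))   # 0.5: half the usual
```

The same 6000 steps come out as three times the usual level for one person and half the usual level for the other.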

So, we need to bear two contexts in mind if we want to interpret objective data such as step counts: the context of measurement in which the data were acquired, and the context of the person who generated the data.

When estimating the probability p(cause | symptom) that somebody has a certain condition, such as depression, given the signs they exhibit, such as activity levels measured in step counts, it’s worth considering several related probabilities:

  • p(symptom). The probability that somebody exhibits the symptom. If the symptom is very common, it’s unlikely to be a strong indicator for the cause, especially if it can have multiple causes. A classic example is the humble cough, which can be a sign of the common cold or an indicator of lung cancer.
  • p(cause). The probability that the cause occurs. This is the old adage “When you hear hoofbeats, think horses, not zebras.” Unfortunately, rare diseases are more frequent than one might think.
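  • p(symptom | cause). The probability that somebody who has the condition actually exhibits the symptom. This is rarely a certainty: when you look at the diagnostic criteria for most illnesses, you will often find a list of several symptoms, together with the qualification “if two or more of these indicators are present, then …”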
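These three quantities are tied together by Bayes' rule: p(cause | symptom) = p(symptom | cause) × p(cause) / p(symptom). The sketch below works through the arithmetic with invented numbers. They are not estimates from any study, and are chosen only to show how a fairly common symptom and a fairly rare cause combine.

```python
def posterior(p_symptom_given_cause: float, p_cause: float, p_symptom: float) -> float:
    """Bayes' rule: probability of the cause, given that the symptom is observed."""
    if not 0 < p_symptom <= 1:
        raise ValueError("p_symptom must be in (0, 1]")
    return p_symptom_given_cause * p_cause / p_symptom

# Hypothetical numbers, for illustration only.
p_cause = 0.05                # 5% of people have the condition
p_symptom_given_cause = 0.60  # 60% of people with the condition show low activity
p_symptom = 0.30              # 30% of all people show low activity

p = posterior(p_symptom_given_cause, p_cause, p_symptom)
print(f"p(cause | symptom) = {p:.2f}")  # prints 0.10
```

Even though most people with the condition show the symptom in this toy example, only one in ten people with the symptom has the condition, because the symptom is so common in everyone else.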

Even worse, diseases commonly occur together (Mokraoui et al., 2016), and some of these may have overlapping symptoms.

So, what should we do when we read about yet another algorithm that can diagnose depression? First of all, every diagnosis, in particular when it comes from algorithms, should be treated as a working hypothesis. In fact, some diseases, such as dementia, can only be diagnosed with absolute certainty after a person has died and their brain has been autopsied (Toledo et al., 2013). Secondly, even if the measurements we take are objective and repeatable, we can only make sense of them in the context in which they were taken, which includes both the person and the (measurement) process.

What do you think – is objectivity possible? Am I too pessimistic?


Berendsen, B. A., Hendriks, M. R., Meijer, K., Plasqui, G., Schaper, N. C., & Savelberg, H. H. (2014). Which activity monitor to use? Validity, reproducibility and user friendliness of three activity monitors. BMC Public Health, 14(1), 749.

Fulk, G. D., Combs, S. A., Danks, K. A., Nirider, C. D., Raja, B., & Reisman, D. S. (2014). Accuracy of 2 activity monitors in detecting steps in people with stroke and traumatic brain injury. Physical Therapy, 94(2), 222–9.

Mokraoui, N.-M., Haggerty, J., Almirall, J., & Fortin, M. (2016). Prevalence of self-reported multimorbidity in the general population and in primary care practices: a cross-sectional study. BMC Research Notes, 9(1), 314.

Toledo, J. B., Van Deerlin, V. M., Lee, E. B., Suh, E., Baek, Y., Robinson, J. L., … Trojanowski, J. Q. (2013). A platform for discovery: The University of Pennsylvania Integrated Neurodegenerative Disease Biobank. Alzheimer’s & Dementia.

What Big Data Can Tell You About Useful mHealth

Maria Wolters, Alan Turing Institute / University of Edinburgh and Henry Potts, University College London
mHealth that Works
“If the user can’t use it, it doesn’t work at all.” This is how Susan Dray summarises her decades of user experience work with clients around the world. If we want to harness the promise of Big Data to draw conclusions about the usability and usefulness of an mHealth app, Dray’s Law is an ideal starting point, because it gives us the fundamental variable we need to measure – how often people use an app.

An mHealth app can only work as intended if people use it, and if they keep using it over the intended period of time. Take food diary apps, such as the ever popular MyFitnessPal. If people don’t open it and log their food, it is of no use. While regular use is necessary for an app to fulfill its purpose, it is not sufficient. For example, people may only record meals in MyFitnessPal that conform to guidelines and fail to log sweet or fatty foods, or they may use MyFitnessPal to support an eating disorder. Both of these patterns of using the app are contrary to the original goal, which is to help people reach and maintain a healthy bodyweight.

As app analytics 101 tells us, in order to get a good picture of app use, it is not enough to just aggregate the number of downloads, the number of reviews, and the app ratings themselves.

Metrics to Evaluate By

How can app developers get a better picture of app use? First of all, developers need to be clear about the time frame for using an app. Stop-smoking apps have a natural endpoint – when users feel that they have been successful in kicking the habit. Weight management apps such as MyFitnessPal also often have natural endpoints (when the goal weight has been reached and maintained), but can be used long-term by people who want to maintain their goal weight or gain and lose weight depending on their sport.

We also need to acknowledge that this time frame can vary from person to person. A person who wants to lose over 20% of their bodyweight is looking at months and years of regular use, while somebody who wants to lose a couple of pounds might be done in a month.

Finally, in order to use the app meaningfully, people will need to spend a certain minimal amount of time in it – be it to track their mood, check the remaining calories or steps for the day, or enter a meal.

With these considerations out of the way, let’s look at the key indicators that can help us leverage Big Data to assess the usefulness of mHealth apps.

Number of Unique Active Users

Do people use your mHealth app once they have downloaded it? Whether this is the number of daily, weekly, or monthly users (or a combination of the three) depends on the goal of your app, but at least one of these numbers should be tracked regularly.
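As a minimal sketch of how these counts could be derived from a session log: the log format, the user names, and the function `active_users` below are assumptions for illustration, not the output of any particular analytics tool.

```python
from datetime import date, timedelta

# Hypothetical session log: (user_id, day on which the app was opened).
sessions = [
    ("ann", date(2017, 5, 1)),
    ("ann", date(2017, 5, 1)),
    ("bob", date(2017, 5, 1)),
    ("ann", date(2017, 5, 3)),
    ("cat", date(2017, 5, 6)),
    ("bob", date(2017, 5, 20)),
]

def active_users(log, end, window_days):
    """Count distinct users with at least one session in the window ending on `end`."""
    start = end - timedelta(days=window_days - 1)
    return len({user for user, day in log if start <= day <= end})

daily = active_users(sessions, date(2017, 5, 1), 1)
weekly = active_users(sessions, date(2017, 5, 7), 7)
monthly = active_users(sessions, date(2017, 5, 30), 30)
print(daily, weekly, monthly)  # 2 3 3
```

Using a set means that somebody who opens the app twice on the same day is still counted as one user.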

Session Frequency

Do people use your app as often as they should in order to get a benefit? How many of your active users are regulars? Again, the target depends on the goal of your app.
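One way to put a number on this is to count sessions per user in a given week and see how many users reach a target. In the sketch below, both the session list and the threshold `TARGET_PER_WEEK` are invented; the right target depends entirely on what your app is for.

```python
from collections import Counter

# Hypothetical: one entry per session during a single week, labelled by user.
week_sessions = ["ann", "ann", "ann", "ann", "ann", "bob", "bob", "cat"]

TARGET_PER_WEEK = 3  # assumed minimum number of sessions needed to benefit

counts = Counter(week_sessions)
regulars = sorted(user for user, n in counts.items() if n >= TARGET_PER_WEEK)
share_regular = len(regulars) / len(counts)

print(regulars)                 # ['ann']
print(round(share_regular, 2))  # 0.33
```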

Time in App

How long do people actively spend in your app? Is this long enough to do something meaningful? In a second step, you can track what people actually do in the app, but time itself is a useful, if crude, approximation.
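A rough sketch of this measure, assuming that each session is logged with a start and an end time. The timestamps and the 60-second threshold `MIN_MEANINGFUL_SECONDS` are made up for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical sessions: (start, end) timestamps.
sessions = [
    (datetime(2017, 5, 1, 8, 0, 0), datetime(2017, 5, 1, 8, 0, 20)),     # 20 s
    (datetime(2017, 5, 1, 12, 30, 0), datetime(2017, 5, 1, 12, 32, 0)),  # 120 s
    (datetime(2017, 5, 1, 19, 0, 0), datetime(2017, 5, 1, 19, 1, 30)),   # 90 s
    (datetime(2017, 5, 1, 22, 0, 0), datetime(2017, 5, 1, 22, 0, 5)),    # 5 s
]

MIN_MEANINGFUL_SECONDS = 60  # assumed time needed to, say, log a meal

durations = [(end - start).total_seconds() for start, end in sessions]
median_duration = median(durations)
share_meaningful = sum(d >= MIN_MEANINGFUL_SECONDS for d in durations) / len(durations)

print(median_duration)   # 55.0
print(share_meaningful)  # 0.5
```

The median is used here rather than the mean because a few very long sessions (for example, an app left open in the background) would otherwise distort the picture.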

Retention Rate

Do people stick with your app for the amount of time they need to see a difference? If your app is about smoking cessation, you have a problem if people return to your app for years in yet another doomed attempt to kick the cigarettes, but if your app is about helping people maintain a healthy bodyweight, retention over months and years is good.
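Retention can be defined in several ways. The sketch below uses one simple definition: the share of users who are still seen at least a given number of days after their first session. The activity data and the function name `retention` are assumptions for illustration.

```python
from datetime import date

# Hypothetical: the days on which each user opened the app.
activity = {
    "ann": [date(2017, 5, 1), date(2017, 5, 8), date(2017, 6, 5)],
    "bob": [date(2017, 5, 1), date(2017, 5, 2), date(2017, 5, 9)],
    "cat": [date(2017, 5, 2), date(2017, 6, 10)],
    "dan": [date(2017, 5, 3)],
}

def retention(activity, after_days):
    """Share of users with a session at least `after_days` after their first one."""
    retained = sum(
        1 for days in activity.values()
        if (max(days) - min(days)).days >= after_days
    )
    return retained / len(activity)

print(retention(activity, 7))   # 0.75
print(retention(activity, 30))  # 0.5
```

Whether 0.5 after 30 days is good news depends on the app: for a weight maintenance app you would hope for more, while for a stop-smoking app a steady stream of returning users may signal relapse.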

From Small Data to Big Data

As you start out with a great idea and a small app, the data streams we have described above will be small and easy to manage. But if you believe in the promise of your app, and keep tracking, hopefully these data streams will grow and allow you to learn more about your customers, their habits, and the innovative ways in which they use your app.

What data streams do you use to measure whether people are actually using your app? What are the benefits and pitfalls you have discovered?