Author: Maria Wolters

Characterising the Speech of People with Mental Illness

This year, my contribution to Psychonomics 2017 is in collaboration with Kristin Nicodemus of the University of Edinburgh and Alex Cohen of Louisiana State University.

My main research interest is to support people with chronic conditions. I also have a longstanding interest in the complex information that people convey in their speech and language – both intentionally, as signals to others, and unintentionally, as an expression of their socialisation, their anatomy, their physiology, and their current health.

This piece of work brings both together. Alex Cohen has a large collection of speech samples (17K+) from people with varying mental health conditions, which were analysed using a standard set of 88 features that describe aspects of speech and voice relevant to expressing emotion, the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). GeMAPS is attractive because it represents a consensus among leading researchers and practitioners in the field, and it comes with open-source software for extracting those parameters from the speech signal.

With its 88 features, GeMAPS has been used mainly for classification of large data sets, but for smaller studies, the full feature set can be tricky to manage. Using principal component analysis in R, we reduced GeMAPS to a smaller set of features that are relatively easy to interpret from a phonetic point of view.
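As a sketch of this kind of dimensionality reduction: our analysis used R, but the scikit-learn version below (with purely synthetic feature values standing in for real GeMAPS measurements) shows the same idea.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 200 synthetic "speech samples" x 88 GeMAPS-sized feature vectors
X = rng.normal(size=(200, 88))

# Standardise first: GeMAPS features live on very different scales.
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps just enough principal components
# to explain that fraction of the variance.
pca = PCA(n_components=0.9)
X_reduced = pca.fit_transform(X_std)
print(f"{X_reduced.shape[1]} components explain 90% of the variance")
```

On real data, the retained components can then be inspected via their loadings to see which acoustic features they bundle together, which is what makes them phonetically interpretable.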

Using this reduced feature set, we’ve been able to identify distinct acoustic traces in the speech of people who have a history of depression and the speech of people who have a history of psychosis. These traces on their own are not enough to spot or diagnose mental illness or a history thereof, because they can be caused by many different factors. Instead, they reflect small, subtle changes, one of many traces that a person’s mental health leaves in their behaviour.

PDF of the poster:

Psychonomics 2017 PDF


Cohen A, Elvevåg B. 2014. Automated computerized analysis of speech disturbances in psychiatric disorders. Curr Opin Psychiatry 27:203–209

Cummins N, Scherer S, Krajewski J, Schnieder S, Epps J, Quatieri TF. 2015. A review of depression and suicide risk assessment using speech analysis. Speech Commun 71:10–49

Elvevåg B, Cohen AS, Wolters MK, Whalley HC, Gountouna V-E, Kuznetsova KA, Watson AR, Nicodemus KK. 2016. An Examination of the Language Construct in NIMH’s Research Domain Criteria: Time for Reconceptualization! Am J Med Genet Part B 171B:904–919

Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Sanislow C, Wang P. 2010. Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. Am J Psychiatry 167:748–751.

RDoC web page: Cognitive Systems / Language

The data used in this study come from these studies:

Cohen, A. S., Dinzeo, T. J., Donovan, N. J., Brown, C. E., & Morrison, S. C. (2015). Vocal acoustic analysis as a biometric indicator of information processing: Implications for neurological and psychiatric disorders. Psychiatry Research, 226(1), 235–241.

Cohen, A. S., Mitchell, K. R., Docherty, N. M., & Horan, W. P. (2016). Vocal Expression in Schizophrenia: Less Than Meets the Ear. Journal of Abnormal Psychology, 125(2), 299–309.

Cohen, A. S., Renshaw, T. L., Mitchell, K. R., & Kim, Y. (2016). A psychometric investigation of “macroscopic” speech measures for clinical and psychological science. Behavior Research Methods, 48(2), 475–486.



Writing the Introduction to Your Thesis

The introduction is what people will read first – and it’s always important to make a good first impression. In terms of telling the story of your research, the introduction sets up the main themes and lets the reader know what they should expect.

What is the Problem, and Why does it Matter?

This is what the first one or two paragraphs of the introduction should tell the reader.

Usually, the two questions are answered together. In many of the scientific papers you read, you will find that the general context is described in the first couple of sentences, followed by the problem that arises in the context. For both the context and the problem, authors will usually give references to the scientific literature which explain them in more detail.

Example: Context: Older people are more likely to have several chronic illnesses, and they need to take multiple medications. Remembering those medications is difficult. Problem: Older people often have impaired vision, and so need reminders that are easy to see.

You need to make sure that the problem is the one that your thesis actually addresses. Very often, people set out to solve a large problem (e.g., create medication reminders for everyone), only to find that they need to address a smaller problem first (e.g., create a reminder system that is easy to use for people with visual impairment).

As you explain your problem and its context, you will also mention the people for whom the problem matters in particular. In our example, this would be older people with visual impairments.

How did you Solve the Problem?

Usually, there are many ways to address a given problem. This is particularly true in Human Computer Interaction and Health Informatics, which use methods from disciplines that range from Computer Science to Anthropology. By outlining your approach, you tell the reader what kind of paper to expect, with what kind of research traditions you align yourself, and what methods you are likely to use.


  • I conducted field research with older people to determine what strategies they use to create easily visible reminders, and distilled the results of my research into design guidelines. (Ethnographic, Design angle)
  • I created an app that automatically recognises medication information from prescriptions and feeds the information into an existing reminder app that is for people with low vision. (Computer Science, Optical Character Recognition (OCR) / Computer Vision)

How are you Building on Previous Work?

Since you are usually not the first person to attempt to solve this problem, there should be plenty of other work you can draw on. While the introduction is not the place to conduct a detailed literature review, you can still let your reader know what main theoretical or methodological approach you used. Tools are not methods. Using NVivo is not an analysis methodology; thematic analysis or grounded theory is. Using Python with scikit-learn is not a machine learning methodology; support vector machines or recurrent neural networks are.

Does it Work?

Perhaps the biggest misconception among Masters and Honours students is that a solution has to work flawlessly, or an experiment has to confirm a hypothesis, for the thesis to be good. Some bugs are inevitable, sometimes users turn up their noses at the most lovingly created apps and visualisations, and some experiments just won’t turn out the way you want them to. That’s life. However, you do have to have solid methods in place to determine whether your solution or your app works, whether your experiment was sound, and whether the conclusions you draw from your field work and qualitative research are valid. In your introduction, you should state what these methods were, and refer to key papers on the methods.


  • I analysed field notes and pictures using grounded theory. Themes were validated through conversation and co-analysis with my supervisor and through discussion with my participants. New themes still emerged from the last piece of fieldwork, so I was unable to achieve saturation in the time period available. (Saturation in grounded theory means that no new themes come up in your data collection.)
  • I extended an existing OCR solution by using algorithms X, Y, and Z. I trained on data set D and tested my solution with actual prescriptions provided by five older people. The baseline OCR solution outperformed X, Y, and Z.

The Actual Structure of the Introduction Chapter

In the previous sections, we’ve looked at the content of your introduction. A good structure is:

  • Motivation, where you talk about the problem, why it matters, and refer to previous work, if necessary
  • Aims, where you talk about your approach and link it to other approaches, if necessary
  • Research Questions, stated concisely
  • Structure of the thesis, where you list each chapter and its rough contents

Be Short and Sharp!

The most important advice for the introduction is to keep it relatively short, and to point forward to other parts of the thesis where you will provide more detail about relevant aspects of your work. The introduction should be a couple of pages: an initial overview of what the reader is about to encounter that helps them situate your work in its context (theory, approaches, methods, applications) before you dive deep into the fine details.

Good luck!

Structuring Your Dissertation

The Single Key Tip for Structuring Your Dissertation

With your dissertation, you tell a story about the research journey you undertook during your work on your thesis. It’s a story about what you set out to find, how you investigated your topic, and what you found out in the end. This gives you three key components:

  1. Your starting point
  2. Your path
  3. Your end point, and where to go from here

It’s important that you describe all three components in as much detail as possible so that people can follow your journey. Take a journey from Edinburgh to London, for example. If you want to describe the journey in detail, you need to say where you set out, what means of transport you used, and where exactly you went. You should also add a reflection to evaluate what you did.

For example, I went from the Informatics Forum to the Alan Turing Institute in London by train. First, I walked from the Forum to Waverley Station for ten minutes. I walked via George IV Bridge, as that path has fewer slow traffic lights. I then took the fast Virgin Trains service directly to London King’s Cross, which takes 4 hours 20 minutes. We had a signal failure on the way and got stuck behind a slow train, so we ended up being 25 minutes late. When I arrived at King’s Cross, I turned right, left the station, crossed St Pancras Station behind the Eurostar Terminal, and walked down to the staff entrance of the British Library, where the Alan Turing Institute is located. Overall, it was an efficient way of travelling, but it’s worth adding a buffer of 30-60 minutes to your travel time because trains are often late.

(You see that I referenced all locations, except for well-known ones like Edinburgh and London. Detailed referencing is important – this is how you show your reader that you are aware of what is already known about your topic.)

What does this look like in practice?

Common Elements

Well, all of you should start with an Introduction that motivates your work (why are you doing this?) and gives a brief overview of what you set out to achieve (your research questions / the problem that your app or artefact is designed to address), what your main methods were, and what you have ended up creating – a kind of summary of the story in a couple of pages. Ideally, you end with a list of the following chapters and the parts of the story that are told in them in more depth.

Next, you should have a Background or Literature Review chapter. In the background chapter, you describe the starting point of your investigation. This is where you review the relevant theory and previous work. You should aim to show that you have a good grasp of the concepts that you use in your work, explain relevant theories as needed, and tell us about who else has done similar work, and what they found.

After that, thesis structures start to diverge, but the final chapters are once again similar. Discuss your overall journey, reflect on what you have learned and on the main decisions you made, and tell us what could be done next. This is typically done in two chapters: Discussion, where you summarise your findings and relate them to what is known in the literature, and Conclusion, where you reflect on your journey and describe possible next steps. The Discussion is a good place to bring in aspects of the literature or previous work that tie in to your findings.

Telling the Tale of Your Work

Design Informatics

In Design Informatics dissertations, you have more freedom to explore different designs, and to show how the theory you’ve read, the theoretical stance that you take, and the research you’ve done have affected your design.

You will typically do one or more iterations of your work. You will do the bulk of your user research for the first iteration, and that usually deserves its own chapter, which you can call User Research or User Requirements. Then, you create your first prototype, and evaluate it. That can be in one chapter, Prototype 1 or just Prototype. If you have time, you will do another prototype and evaluate it, and that’s again a chapter.


Experiments

Well, you’ve got it easy. If you are doing a traditional experiment, the American Psychological Association has you covered. For each of the experiments you do, including pilot studies, you follow the time-honoured structure of Method, Results, Discussion.

Informatics App Development

The strategy I recommend for you is similar to the one for Design Informatics students. In your case, you may have an initial section on User Requirements, followed by iterations of your app. I recommend at least one initial prototype. This can be wireframes only, but it’s a good way of getting feedback from your end users. Your final, working app should always be evaluated with end users. So, your middle chapters will typically be User Requirements, Prototype (Design, Implementation, Feedback), Final Prototype (Design, Implementation), and Evaluation (for the final prototype).

Make sure that your Prototype chapters contain a formal description of your app – a flowchart, a UML diagram, specifications for all the databases you use. Provide pseudocode for your algorithms (not actual implementations) and describe key implementation decisions. Which libraries did you use? Why?


This page is by no means exhaustive. It’s meant to complement the Informatics MSc Dissertation guidelines, and when in doubt, the official guidelines have precedence.

Pint of Science: Oh Data, Where Art Thou?

In this post, I provide some background on the health data talk I gave on May 15, 2017, at Pint of Science, Edinburgh. (Slides)

The central argument of the talk is that any data we collect about health and wellbeing have no meaning in themselves – they need to be interpreted in context. Take step counts, for example. Measuring step counts is a somewhat inexact science, because the signals picked up by the accelerometers in a phone or a dedicated pedometer or actigraph need to be converted into the metric of steps (Berendsen et al, 2014; Fulk et al, 2014). Rating threads about pedometers like the Fitbit or Jawbone often contain disappointed comments about bad measurements (too many steps counted, too few steps counted, failure to detect stair climbing).

Step counts also need to be interpreted in the context of the person who is taking the steps. 6000 steps in a day is impressive for somebody who barely walks, but an indication of a lazy day for somebody who usually averages 10000 or more.

So, we need to bear two contexts in mind if we want to interpret objective data such as step counts, the context of measurement in which the data were acquired, and the context of the person who generated the data.

When estimating the probability p(cause | symptom) that somebody has a certain condition, such as depression, given the signs they exhibit, such as activity levels measured in step counts, it’s worth considering several related probabilities:

  • p(symptom). The probability that somebody exhibits the symptom. If the symptom is very common, it’s unlikely to be a strong indicator for the cause, especially if it can have multiple causes. A classic example is the humble cough, which can be a sign of the common cold or an indicator of lung cancer.
  • p(cause). The probability that the cause occurs. This is the old adage “When you hear hoofbeats, think horses, not zebras.” Unfortunately, rare diseases are more frequent than one might think.
  • p(symptom | cause). The probability that somebody with the condition exhibits the symptom. When you look at the diagnostic criteria for most illnesses, you will often find a list of several symptoms, together with the qualification “if two or more of these indicators are present, then …” – in other words, any single symptom may well be absent, so this probability is often far below 1.

Even worse, diseases commonly occur together (Mokraoui et al., 2016), and some of these may have overlapping symptoms.
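Bayes’ theorem ties these three quantities together: p(cause | symptom) = p(symptom | cause) × p(cause) / p(symptom). A toy calculation (all numbers invented for illustration):

```python
def posterior(p_symptom_given_cause, p_cause, p_symptom):
    """Bayes' theorem: p(cause | symptom)."""
    return p_symptom_given_cause * p_cause / p_symptom

# Invented numbers: a symptom shown by 30% of people overall,
# by 60% of people with the condition, and a condition with
# a prevalence of 5%.
p = posterior(p_symptom_given_cause=0.6, p_cause=0.05, p_symptom=0.3)
print(round(p, 2))  # → 0.1
```

Even though people with the condition are twice as likely to show the symptom, the posterior probability stays low, because the condition itself is rare.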

So, what should we do when we read about yet another algorithm that can diagnose depression? First of all, every diagnosis, in particular when it comes from algorithms, should be treated as a working hypothesis. In fact, some diseases, such as dementia, can only be diagnosed with absolute certainty after a person has died and their brain has been autopsied (Toledo et al., 2013). Secondly, even if the measurements we take are objective and repeatable, we can only make sense of them in the context in which they were taken, which includes both the person and the (measurement) process.

What do you think – is objectivity possible? Am I too pessimistic?


Berendsen, B. A., Hendriks, M. R., Meijer, K., Plasqui, G., Schaper, N. C., & Savelberg, H. H. (2014). Which activity monitor to use? Validity, reproducibility and user friendliness of three activity monitors. BMC Public Health, 14(1), 749.

Fulk, G. D., Combs, S. A., Danks, K. A., Nirider, C. D., Raja, B., & Reisman, D. S. (2014). Accuracy of 2 activity monitors in detecting steps in people with stroke and traumatic brain injury. Physical Therapy, 94(2), 222–9.

Mokraoui, N.-M., Haggerty, J., Almirall, J., & Fortin, M. (2016). Prevalence of self-reported multimorbidity in the general population and in primary care practices: a cross-sectional study. BMC Research Notes, 9(1), 314.

Toledo, J. B., Van Deerlin, V. M., Lee, E. B., Suh, E., Baek, Y., Robinson, J. L., … Trojanowski, J. Q. (2013). A platform for discovery: The University of Pennsylvania Integrated Neurodegenerative Disease Biobank. Alzheimer’s & Dementia.

Basic Demographics For Your HCI User Study

Many HCI studies, especially student studies, involve a small section on participant demographics. In this post, I summarise the guidance I tend to give to my own students. There are many aspects of designing demographics questionnaires that this post won’t cover, but for a relatively simple basic user study, this should do the trick.

Principle 1: Don’t ask too many questions.

Make sure that what you are asking is relevant to your study, and focus on the most parsimonious way of getting that information. Fewer questions with fewer tick boxes are less daunting and generate more answers.

For example, when you ask about people’s age, don’t use age groups that cover ages 18-60 in increments of 5, unless you really need that level of granularity.

Principle 2: Respect people’s privacy.

Unless they are relevant, don’t ask detailed questions that may enable others to identify your participants, given what they know about when you conducted the study and where you recruited the participants.

For example, questions about people’s country of origin are most often used to distinguish between native and non-native speakers of English – which matters if you test a system that extensively uses language. Yet, knowing somebody’s country of origin can easily identify participants who might be the only student from their country on your particular Masters programme.

Principle 3: Make demographics optional

Demographic data can potentially be used to identify people. Also, some people may not feel comfortable sharing that information with you, especially if all they are asked to do is to evaluate or use an app or a product that you have designed.

Principle 4: Put demographic questions last

If you make somebody reflect on an aspect of themselves that is associated with social stereotypes, they are more likely to conform to or enact those stereotypes later. This is an instance of a phenomenon called stereotype threat, and Wikipedia has very useful resources about this topic. If you want to amplify effects of stereotype threat in your data, put demographics first; otherwise, put them last.

Principle 5: Cover the basics.

The basics differ by discipline and research lab. I always like to include the following:

  • age (18-24, 25-34, 35-44, 45-54, 55-64, 65+), with additional categories above 65 if I am working with older participants
  • gender (male, female, prefer not to say, other), which makes space for gender fluid people
  • occupation (student, employed full-time, employed part-time, self-employed, retired, home maker, unemployed, other), which acknowledges the important work that men and women who stay at home do (otherwise, they’d have to refer to themselves as unemployed). I also recommend making this a checkbox category, because people can be students while employed full time
  • highest educational qualification (high school/secondary school; vocational qualification; university graduate; postgraduate qualification). This is the most country-specific one, and there are no hard and fast rules. I mainly like to include it because level of education is a potential indicator of socioeconomic status, and may also affect people’s performance on task

Principle 6: Check whether your participants have exposure to the technology / products that you are testing

There are several formal and informal sets of questions floating around that assess digital literacy, exposure and attitudes to technology, and the like. I prefer to stick with the minimum, which looks at what technology people own and use.

Below is a very basic example of a table that requires people to tick what technologies they own, and how frequently they use them.

[Table omitted: for each technology, participants tick whether they own it (“Don’t own one”) and how frequently they use it.]
I hope that you find these hints useful. If you want to cite them, feel free to do so; if you have any comments, or would like me to expand on some of the points I made, please leave a comment. Comments are moderated, as I get a lot of spam, but I check regularly.

Psychonomics 2016: Consistency in Group Categorisations

How consistent are people in the way they categorise groups? A summary of, and references for, the poster presented at Psychonomics by Elaine Niven, Zeyu Wang, Robert Logie, and myself.


Permanent Post Unlocked – Reader in Design Informatics

Even though I took up the post on September 1, it’s been a whirlwind of a time, including a presentation at Interspeech 2016 and an invited talk at the Design Informatics seminar.

But here I am, finally making it Website official: I have joined the School of Informatics, University of Edinburgh, as a Reader in Design Informatics.

Design Informatics is a very interesting hybrid. It focuses on working with data and tries to bring together highly technical and mathematically challenging disciplines like machine learning with an approach to design that owes a lot to classic product design, art, and human geography.

Research: CATalytics

I’m currently building a small research group with the aim of designing, developing, and evaluating technology that helps people with long-term conditions live rich, fulfilled lives. The provisional name of the group is CATalytics (Context Appropriate Technology and Analytics). Expect a research group update soon!


Teaching

This semester, I am only guest lecturing on a few courses, but next semester, I will be teaching my Human Factors class. This class is open to everybody who is interested in Human Computer Interaction, and does not require any programming skills.

What Big Data Can Tell You About Useful mHealth

Maria Wolters, Alan Turing Institute / University of Edinburgh and Henry Potts, University College London
mHealth that Works

“If the user can’t use it, it doesn’t work at all.” This is how Susan Dray summarises her decades of user experience work with clients around the world. If we want to harness the promise of Big Data to draw conclusions about the usability and usefulness of an mHealth app, Dray’s Law is an ideal starting point, because it gives us the fundamental variable we need to measure – how often people use an app.

An mHealth app can only work as intended if people use it, and if they keep using it over the intended period of time. Take food diary apps, such as the ever-popular MyFitnessPal. If people don’t open it and log their food, it is of no use. While regular use is necessary for an app to fulfil its purpose, it is not sufficient. For example, people may only record meals in MyFitnessPal that conform to guidelines and fail to log sweet or fatty foods, or they may use MyFitnessPal to support an eating disorder. Both of these patterns of using the app are contrary to the original goal, which is to help people reach and maintain a healthy bodyweight.

As app analytics 101 tells us, in order to get a good picture of app use, it is not enough to just aggregate the number of downloads, the number of reviews, and the app ratings themselves.

Metrics to Evaluate By

How can app developers get a good picture of app use? First of all, developers need to be clear about the time frame for using an app. Stop smoking apps have a natural endpoint – when users feel that they have been successful in kicking the habit. Weight management apps such as MyFitnessPal also often have natural end points (when the goal weight has been reached and maintained), but can be used long-term by people who want to maintain their goal weight or gain and lose weight depending on their sport.

We also need to acknowledge that this time frame can vary from person to person. A person who wants to lose over 20% of their bodyweight is looking at months and years of regular use, while somebody who wants to lose a couple of pounds might be done in a month.

Finally, in order to use the app meaningfully, people will need to spend a certain minimal amount of time in it – be it to track their mood, check the remaining calories or steps for the day, or enter a meal.

With these considerations out of the way, let’s look at the key indicators that can help us leverage Big Data to assess the usefulness of mHealth apps.

Number of Unique Active Users

Do people use your mHealth app once they have downloaded it? Whether this is the number of daily, weekly, or monthly users (or a combination of the three) depends on the goal of your app, but at least one of these numbers should be tracked regularly.

Session Frequency

Do people use your app as often as they should in order to get a benefit? How many of your active users are regulars? Again, the target depends on the goal of your app.

Time in App

How long do people actively spend in your app? Is this long enough to do something meaningful? In a second step, you can track what people actually do in the app, but time itself is a useful, if crude, approximation.

Retention Rate

Do people stick with your app for the amount of time they need to see a difference? If your app is about smoking cessation, you have a problem if people return to your app for years in yet another doomed attempt to kick the cigarettes, but if your app is about helping people maintain a healthy bodyweight, retention over months and years is good.
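All four indicators can be computed from a simple per-session event log. A minimal sketch in plain Python (the record format and the user names are invented for illustration):

```python
from datetime import date

# (user_id, day, seconds_in_app) -- one record per session (invented data)
sessions = [
    ("ann", date(2017, 5, 1), 120),
    ("ann", date(2017, 5, 2), 90),
    ("bob", date(2017, 5, 1), 30),
    ("ann", date(2017, 5, 8), 60),
]

def daily_active_users(sessions, day):
    """Number of unique users with at least one session on `day`."""
    return len({u for u, d, _ in sessions if d == day})

def session_frequency(sessions, user):
    """Total number of sessions for one user."""
    return sum(1 for u, _, _ in sessions if u == user)

def mean_time_in_app(sessions, user):
    """Average session length in seconds for one user."""
    times = [s for u, _, s in sessions if u == user]
    return sum(times) / len(times)

def retained(sessions, user, since):
    """Did the user come back on or after `since`?"""
    return any(u == user and d >= since for u, d, _ in sessions)

print(daily_active_users(sessions, date(2017, 5, 1)))  # → 2
print(retained(sessions, "bob", date(2017, 5, 8)))     # → False
```

In practice you would compute these over rolling windows and segment by user cohort, but the definitions stay the same.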

From Small Data to Big Data

As you start out with a great idea and a small app, the data streams we have described above will be small and easy to manage. But if you believe in the promise of your app, and keep tracking, hopefully these data streams will grow and allow you to learn more about your customers, their habits, and the innovative ways in which they use your app.

What data streams do you use to measure whether people are actually using your app? What are the benefits and pitfalls you have discovered?