It has been a while: there was no second-year review post because of mixed feelings during COVID. But thanks to my supervisors, I managed to overcome my negative thinking and finish my work, and I am now proud to hold a PhD in computer science 🎉. In this post, I will share a summary of my work, some feelings, and some advice.

Thesis summary

People with head and neck cancers have speech difficulties after surgery or radiation therapy. Health practitioners need a measure that reflects the severity of these speech disorders (the severity index). To produce this measure, a perceptual study is commonly performed with a group of five to six clinical experts. Because it requires several experts, this process limits the use of this assessment in practice. An automatic measure, similar to the severity index, would therefore allow better follow-up of patients by making the measure easier to obtain.

To build such a measure, we relied on a reading task, as is classically done for this assessment. We used the recordings of the cancer corpus, which includes more than 100 speakers¹. This amounts to about one hour of recordings with which to model the severity index. In this PhD work, we first reviewed state-of-the-art methods for speech, emotion, and speaker recognition using little data. We then attempted to model severity using transfer learning and deep learning. Since the results were not usable, we turned to so-called "few-shot" techniques (learning from only a few examples). After encouraging first attempts at phoneme recognition², we obtained promising results for classifying patients by severity. Nevertheless, these results would need improvement before they could be used in a medical application.
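This post does not say which few-shot method was used. Purely as an illustration of the idea, here is a minimal sketch of one common few-shot technique, nearest-prototype classification in the style of prototypical networks. It assumes each recording has already been turned into a fixed-size embedding by some encoder. The function names, the 2-D toy embeddings, and the three severity classes are hypothetical and not taken from the thesis.

```python
import numpy as np


def class_prototypes(support_x: np.ndarray, support_y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Average the support embeddings of each class into one prototype per class."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos


def predict(query_x: np.ndarray, classes: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Assign each query embedding to the class of its nearest prototype."""
    # dists[i, k] = squared Euclidean distance between query i and prototype k
    dists = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


if __name__ == "__main__":
    # Hypothetical toy embeddings: three labelled examples ("shots") per severity class
    # (0 = mild, 1 = moderate, 2 = severe); real embeddings would come from a speech encoder.
    support_x = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, -0.1],
                          [1.0, 1.1], [1.1, 1.0], [0.9, 1.0],
                          [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]])
    support_y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
    classes, protos = class_prototypes(support_x, support_y)
    query_x = np.array([[0.05, 0.0], [1.05, 0.95], [2.2, 2.0]])
    print(predict(query_x, classes, protos))  # [0 1 2]
```

The appeal of this family of methods with about one hour of data is that only the encoder needs training. A new class is then described by a handful of examples averaged into a prototype, rather than by a classifier trained from scratch.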

We therefore examined projections of the data from our corpus. Since some ranges of severity scores were separable using acoustic parameters, we proposed a new divergence measurement method³, inspired by the Inception Score (generally used in image processing to evaluate the quality of images generated by models). It relies on self-supervised speech representations learned on the LibriSpeech corpus with the PASE+ model. Our method produces a score close to the severity index, with a Spearman correlation of 0.87 on the reading task of the cancer corpus. The advantage of our approach is that it does not require any data from the cancer corpus for training, so the whole corpus can be used to evaluate our system. These results are good enough for us to consider clinical use through a tablet application: tests are underway at Larrey Hospital in Toulouse.
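For readers unfamiliar with the Inception Score, here is a minimal sketch of the general idea. The score is IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's posterior for one input and p(y) is the average posterior over all inputs. The sketch is paired with a rank-based Spearman correlation of the kind used to compare a score with the severity index. This is not the thesis method: how posteriors are derived from PASE+ representations, and how the score is mapped to the severity index, are described in the JEP 2022 paper, not here. The function names and the "classes" (for example phonemes) are hypothetical, and the Spearman helper assumes no ties.

```python
import numpy as np


def inception_like_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Inception-Score-style value for one speaker (hypothetical adaptation).

    probs: (n_frames, n_classes) array; each row is a posterior p(y|x) over
    hypothetical classes (e.g. phonemes) for one frame or segment.
    Returns exp(mean over frames of KL(p(y|x) || p(y))), with p(y) the average posterior.
    """
    probs = np.clip(probs, eps, 1.0)
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalize rows after clipping
    marginal = probs.mean(axis=0)  # estimate of p(y) over all frames
    kl = (probs * (np.log(probs) - np.log(marginal))).sum(axis=1)  # KL divergence per frame
    return float(np.exp(kl.mean()))


def spearman(a, b) -> float:
    """Spearman rank correlation, assuming no ties: Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    # Confident posteriors spread evenly over 4 classes give a score close to 4;
    # uniform (uninformative) posteriors give a score of 1.
    print(inception_like_score(np.eye(4)))
    print(inception_like_score(np.full((4, 4), 0.25)))
    print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

The score is high when individual posteriors are confident (low entropy) while the classes as a whole are used evenly (high-entropy marginal). This is why it can act as an entropic proxy for how clearly a speaker's sounds are distinguished. For real evaluations with tied scores, `scipy.stats.spearmanr` handles ties properly.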

Keywords: Speech pathology, severity index, speech disorder, ENT cancer, deep learning, learning with a few examples, self-supervised, entropic measurement, few-shot, limited data, limited amount of data, automatic speech processing.

For my manuscript, I used LaTeX (as for all of my articles). To collaborate on it with others (especially my supervisors), I used Overleaf, which offers nice commenting (more information here), a history of modifications, and synchronization of my work in progress with GitHub (for safety, as I did not want to rewrite this manuscript from scratch). Once my manuscript is online, I will add a link to it here for those interested.

Thesis presentation

As with my manuscript, I used LaTeX (with Beamer) for my presentation! It is not a common choice (I have never seen a thesis presentation made with LaTeX), but if you are used to it, you save so much time and can focus on the content instead of the form. I created my own theme (not from scratch, I am not a madman 🤣), and it was nearly complete before I started working on my thesis presentation, since I already used Beamer for my weekly presentations with my supervisors.

Now let's talk a little about my feelings. I was stressed for sure; this moment intimidated me. It is so formal, especially the fixed time for the presentation (45 minutes in my case). Usually when I present things, taking 5 or 10 minutes more or less than expected is fine. But there, I did not have this margin of error. This stressed me, and even though I rehearsed a lot, the more I prepared, the more stressed I felt. The funny thing is that when problems occurred (people arriving after I had started, missing slides, slides not shared at the beginning, and so on), they made me forget about the stressful parts of the experience. For those who weren't there, you can see my presentation/"performance" here (sorry folks, this video is in French). I think I could have reduced my stress by rehearsing in the amphitheater and by using a digital timer with a large display, both during my rehearsals and on the big day.

I was more at ease with the questions, since I was not limited in the time I had to answer the jury. Also, the interactive format made it feel less formal. Then, after the jury deliberated, I was awarded my PhD! What an incredible feeling, a mix of joy, stress, and relief. It all made me cry at the end; what a great moment!

I hope this will help and/or inspire some of you.

Cheers, Vincent.

¹ I participated in the analysis of the dataset described in hal-02921918.
² I conducted the review of state-of-the-art techniques and the experiments that led to this paper: DOI:10.1186/s13636-022-00251-w
³ This work led to the following publication (yet to be published): Roger, V., Farinas, J., Woisard, V., and Pinquier, J. (2022b). Création d’une mesure entropique de la parole pour évaluer l’intelligibilité de patients atteints de cancers des voies aérodigestives supérieures [Creation of an entropic speech measure to assess the intelligibility of patients with upper aerodigestive tract cancers]. In 34e Journées d’Études sur la Parole (JEP2022).