Harvard Uncertainty Speech Corpus
The Harvard Uncertainty Speech Corpus is a collection of speech recordings, elicitation materials, level-of-certainty annotations, and acoustic-prosodic data. The utterances were recorded in a laboratory in a question-answering setting. In total, the corpus contains 1700 utterances from 42 speakers of American English, amounting to 148.79 minutes of speech.
Corpus properties
- Utterances vary in level of certainty (uncertain, neutral, certain), which was elicited by controlling the difficulty of the questions
- Repeated instances of specific words and phrases (e.g., sycophantic, Red Line, nine)
- Level of certainty labels from a panel of listeners as well as from the speaker
- Crowdsourced item difficulty scores (digit domain only)
How to get the corpus
While this page is in the process of being updated, materials below are available upon request.
- Elicitation materials
  - vocabulary
  - transportation
  - digits
- Digital recordings
  - available upon request for research purposes
- Level of certainty annotations
  - self-reports
  - listener annotations
  - difficulty scores
- Acoustic-prosodic data
  - can be accessed through the dataverse network
Related Publications
- Heather Pon-Barry, Stuart Shieber, and Nicholas Longenbaugh. Eliciting and Annotating Uncertainty in Spoken Language. In Proceedings of the Language Resources and Evaluation Conference, May 2014.
- Heather Roberta Pon-Barry. Inferring Speaker Affect in Spoken Natural Language Communication. Ph.D. Dissertation, Harvard University, November 2012.
- Heather Pon-Barry and Stuart M. Shieber. Recognizing Uncertainty in Speech. EURASIP Journal on Advances in Signal Processing, 2011(251753), 2011. Special Issue on Emotion and Mental State Recognition from Speech.
- Heather Pon-Barry. Prosodic Manifestations of Confidence and Uncertainty in Spoken Language. In Proceedings of Interspeech, pp. 74-77, September 2008.
For questions, please contact Heather Pon-Barry.