Easier speech corpus analysis: A practical introduction to Montreal Corpus Tools (including Speech Corpus Tools)

By Scottish Graduate School of Social Science

Location

Glasgow University Laboratory of Phonetics

13 University Gardens, University of Glasgow, Glasgow G12 8QQ, United Kingdom

Description

The Scottish Graduate School of Social Science welcomes postgraduate research students to the following advanced training event:

Easier speech corpus analysis:

A practical introduction to Montreal Corpus Tools (including Speech Corpus Tools)

28 April 2016

Linguists are increasingly interested in generalization and variation in speech and language. Studying these requires looking at large datasets across dialects and languages and generalizing from what has been annotated, and such work increasingly draws on large and varied sources of data. But addressing linguistic questions across a range of dialects, or even languages, is difficult. Corpora of spoken language typically have annotations in formats specific to each corpus. Methodology designed for one format does not always translate to new formats or to larger datasets.

This workshop will present a new set of speech corpus tools, the Montreal Corpus Tools (MCT), currently being developed by researchers at McGill University, Canada. MCT enables users to search and extract specific phonetic measures from one or more spoken language corpora simultaneously, without needing access to the raw data. Analyses can be performed in the time domain (e.g., durations of phones across speakers) or on the acoustics (e.g., speaker vowel spaces in various linguistic contexts).

The workshop will be highly hands-on. Participants may attend to find out more about the tools for future analyses they intend to carry out, and/or they may bring their own data for analysis during the workshop itself. Participants are encouraged to install Montreal Corpus Tools on their computers prior to the workshop and to bring a dataset that they would like to analyze, though sample data will be available for those wanting to work through a demonstration.

The workshop will be useful for those working on spoken language in digital corpora, for phonetic, sociolinguistic and/or other linguistic analyses. It could also be useful for those working on automated procedures on spoken language in digital humanities.

Workshop structure:

  • 10.30-11.30: welcome, coffee, setup (ensure software loaded correctly on laptops)
  • 11.30-12.30: presentation of the software with examples
  • 12.30-1.30: lunch
  • 1.30-3.00: opportunity to work through some examples/students’ own data

The presenters will be available in the hour before the workshop to assist in setting up the software and to troubleshoot any issues that arise. The workshop will begin with an overview of the interface and basic functionality, followed by a couple of example analyses that highlight more advanced functionality. Following the presentation, there will be time for participants to use Speech Corpus Tools with their own datasets. The presenters will again be on hand to help and answer any questions that come up. Slides and walkthroughs of the demos from the workshop will be provided to participants as reference materials.

Technical details

Montreal Corpus Tools requires the following basic format for corpora to be parsed and queried:

  • a collection of audio files of speech, with associated time-aligned orthographic transcriptions
  • the transcriptions must include words (i.e. orthographic units) and phones (the sounds which make up words), with beginning and end times annotated in some way, and aligned with the sound files.
  • the most commonly used format is the output of a forced aligner, e.g. FAVE, LaBB-CAT, MAUS. MCT can currently also handle the Buckeye Corpus and TIMIT. The ability to deal with other formats is planned.
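
As a rough illustration of the format described above, the following Python sketch represents words and phones as labelled intervals with begin and end times, then computes a simple time-domain measure (mean duration per phone label). It does not use the MCT API: the `Interval` class, both helper functions and the sample alignment are hypothetical and invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One time-aligned annotation (a word or a phone). Hypothetical type, not part of MCT."""
    label: str
    begin: float  # seconds from the start of the audio file
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.begin


def phones_in_word(word, phones):
    """Return the phones whose time span falls inside the word's span."""
    return [p for p in phones if p.begin >= word.begin and p.end <= word.end]


def mean_phone_durations(phones):
    """Average duration in seconds for each phone label."""
    by_label = defaultdict(list)
    for p in phones:
        by_label[p.label].append(p.duration)
    return {label: sum(d) / len(d) for label, d in by_label.items()}


# Invented sample alignment, standing in for forced-aligner output.
words = [Interval("cat", 0.0, 0.5), Interval("sat", 0.5, 1.0)]
phones = [
    Interval("k", 0.0, 0.125), Interval("ae", 0.125, 0.375), Interval("t", 0.375, 0.5),
    Interval("s", 0.5, 0.625), Interval("ae", 0.625, 0.875), Interval("t", 0.875, 1.0),
]

if __name__ == "__main__":
    for w in words:
        print(w.label, [p.label for p in phones_in_word(w, phones)])
    print(mean_phone_durations(phones))
```

Real aligner output would first need to be read from its own file format into a structure like this; that step is omitted here.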

Presenters

Michael McAuliffe (Postdoctoral Research Fellow, McGill University; PhD, University of British Columbia, "Attention and salience in lexically-guided perceptual learning"; http://mmcauliffe.github.io)

Morgan Sonderegger (McGill, Canada; Ph.D. U. Chicago Linguistics & Comp Sci, 2012; http://people.linguistics.mcgill.ca/~morgan/)

Local Host: Jane Stuart-Smith, English Language/Glasgow University Laboratory of Phonetics (GULP). Jane.Stuart-Smith@glasgow.ac.uk


Organised by

The Scottish Graduate School of Social Science is the UK's largest facilitator of funding, training and support for doctoral students in social science. By combining the expertise of sixteen universities across Scotland, the school facilitates world-class PhD research. The school is funded jointly by the Economic and Social Research Council and the Scottish Funding Council.

SGSSS is a highly attractive environment for doctoral research. Not only do our partner universities offer an excellent research environment, we also offer comprehensive and world-class research training in a number of discipline-specific and interdisciplinary pathways. In addition, the school manages a programme of advanced training courses and an annual summer school, which together offer our students further opportunities to develop their research, knowledge exchange and transferable professional skills.

At the heart of the SGSSS is the Doctoral Training Partnership (formerly the Doctoral Training Centre) in Scotland. The SGSSS was established in 2011 and is the biggest of 14 Economic & Social Research Council (ESRC) accredited DTPs in the United Kingdom. The bid for renewed funding has been successful, and from 1 October 2017 the SGSSS will be one of the ESRC's 14 Doctoral Training Partnerships (DTPs).
