Transcription
Presents an overview of the transcription process and the conventions used, along with a sample transcription and a link to the written guidelines.
An NXT-formatted file containing a word-level transcription is available for each meeting in the AMI Corpus. Transcribers worked from a set of written guidelines and were encouraged to resolve any uncertainties by consulting a discussion wiki. To ensure high-quality transcripts, data were handled in two, and sometimes three, passes.
First-pass transcribers used ChannelTrans, a multispeaker adaptation of the Transcriber tool developed at Berkeley. They were asked to strive for a balance between speed and accuracy. To facilitate the adjustment of time boundaries, transcribers were provided with presegmented ‘empty’ transcription files. These files were generated automatically by applying a simple energy-based technique to separate speech from silence for each meeting participant (see Audio and Video Signals). First-pass transcribers listened to and transcribed only those areas identified by the presegmenter as ‘speech’, adjusting segment boundaries where needed. Second-pass transcribers then reviewed all segments, ensuring that any missed speech was transcribed and resolving all uncertainties. As a final step, a validation script was run to flag any errors, such as unknown spellings or file format irregularities. Once a transcription had been successfully validated, it was up-translated to NXT format.
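The page does not describe the presegmenter beyond calling it a simple energy-based technique, so the following is only a minimal sketch of that general idea, not the tool actually used for the corpus. The function name `presegment`, the frame length, the energy threshold, and the gap-merging rule are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def presegment(
    samples: List[float],
    rate: int,
    frame_ms: int = 10,       # assumed frame length
    threshold: float = 0.01,  # assumed mean-square energy threshold
    min_gap_s: float = 0.2,   # assumed: pauses up to this length are bridged
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of regions treated as speech."""
    frame_len = max(1, rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        # Mean-square energy of the frame.
        energy = sum(x * x for x in frame) / len(frame)
        if energy < threshold:
            continue  # frame is treated as silence
        start, end = i / rate, (i + len(frame)) / rate
        if segments and start - segments[-1][1] <= min_gap_s:
            # Close enough to the previous speech region: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Running such a function once per participant channel would yield the kind of ‘empty’ segments that a transcriber then fills in and adjusts by hand.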
Speech has been transcribed verbatim using British spellings, and without correcting grammatical errors, e.g. ‘me and him have done this’. An exhaustive list of reduced lexical forms, such as ‘gonna’ and ‘kinda’, is featured. Normal capitalization is used on proper nouns and at the beginning of sentences, along with simplified standard English punctuation, i.e. commas, hyphens, full stops, and question marks.
An additional set of punctuation marks flags certain speech and non-speech events. While missed target pronunciations influenced by a speaker’s native language are unflagged, all other mispronunciations and neologisms are flagged with an asterisk, e.g. ‘velocity*’ (pronounced as ‘velocily’) and ‘bumblebeeish*’. Discontinuity and disfluency at the word or the utterance level are indicated with a hyphen, e.g. ‘I think basi-’ and ‘I just meant - I mean ...’. Particular care has also been taken to indicate whether a turn continues (no punctuation or comma) or ends (full stop, question mark, or hyphen). Simple symbols denote laughing ‘$’, throat noises ‘%’, and other nonverbal vocalizations ‘#’. Other qualitative features of the signal, such as emphasized speech or ‘while laughing’, were ignored. A special category of noises, including onomatopoetic and other highly meaningful sounds, has been indicated with a meta-noise tag enclosed in square brackets, e.g. ‘[sound imitating beep]’. Instances where a string was undecipherable to the second-pass transcriber are marked with ‘@’.
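The markers described above lend themselves to a simple token classifier. The sketch below is illustrative only: it works on the human-readable text rather than the NXT files, and the category names (`flagged_word`, `truncated_word`, and so on) and the function names are assumptions made for this example, not part of the AMI guidelines.

```python
import re
from typing import List, Tuple

# Stand-alone symbols described in the conventions above.
SYMBOLS = {
    "$": "laugh",
    "%": "throat_noise",
    "#": "other_vocalization",
    "@": "undecipherable",
    "-": "discontinuity",  # utterance-level break, e.g. 'I just meant - I mean'
}

# A bracketed meta-noise tag (which may contain spaces) or a run of non-spaces.
TOKEN_RE = re.compile(r"\[[^\]]*\]|\S+")


def classify(token: str) -> str:
    """Map one transcript token to an (assumed) category label."""
    if token.startswith("[") and token.endswith("]"):
        return "meta_noise"
    if token in SYMBOLS:
        return SYMBOLS[token]
    core = token.rstrip(",.?")  # ignore trailing sentence punctuation
    if not core:
        return "punctuation"
    if core.endswith("*"):
        return "flagged_word"    # mispronunciation or neologism
    if core.endswith("-"):
        return "truncated_word"  # word-level discontinuity, e.g. 'basi-'
    return "word"


def annotate(line: str) -> List[Tuple[str, str]]:
    """Split a line into tokens and pair each with its category."""
    return [(tok, classify(tok)) for tok in TOKEN_RE.findall(line)]
```

For instance, `annotate("I think basi-")` labels the last token as a truncated word, while a bracketed tag such as `[sound imitating beep]` is kept together as a single meta-noise token.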
Transcripts are time-synchronized with the digitized audio recordings and feature microphone channel IDs for distinguishing speakers. Word-level and phoneme-level timings for the transcripts were generated automatically through forced alignment.
A sample transcription presented in a human-readable format is shown below.
(ID) That’s our number one prototype.
(PM) /@ like a little lightning in it.
(ID) Um do you wanna present the potato,
(ID) or shall I present the Martian?
(UI) /Okay, um -
(PM) /The little lightning bolt in it, very cute.
(UI) /What -
(UI) We call that one the rhombus, uh the rhombus.
(ME) /I could -
(PM) /The v- the rhombus rhombus?
(ID) /That’s
(ID) the rhombus, yep.
(UI) Um this one is known as the potato, uh it’s
(UI) it’s a $ how can I present it? It’s an ergonomic shape,
(ID) /$
(ME) /$
(UI) so it it fits in your hand nicely. Um,
(UI) it’s designed to be used either in your left hand or or
(UI) in your right hand.
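Lines in the human-readable sample follow a regular shape: a speaker label in parentheses, an optional leading ‘/’, and the transcribed text. As a rough sketch, such lines could be split into fields as follows. The names `Line` and `parse_line` are hypothetical, and because this page does not define the leading ‘/’, the sketch only records whether it is present without interpreting it.

```python
import re
from typing import NamedTuple, Optional


class Line(NamedTuple):
    speaker: str  # label in parentheses, e.g. 'PM'
    slash: bool   # True if the text starts with '/' (meaning not defined here)
    text: str     # transcribed text without the leading '/'


LINE_RE = re.compile(r"^\((?P<speaker>[A-Z]+)\)\s*(?P<slash>/?)(?P<text>.*)$")


def parse_line(raw: str) -> Optional[Line]:
    """Parse one sample line; return None if it does not match the shape."""
    m = LINE_RE.match(raw.strip())
    if m is None:
        return None
    return Line(m.group("speaker"), bool(m.group("slash")), m.group("text").strip())
```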