Forum Post : Re: Demographic data about the speakers ...
Sorry we didn't reply earlier - something about the notification seems to have broken.
The demographic information that we collected is in the "participants" corpus resource. The encoding is a little bit odd, but you can get tab-delimited data out of it, for instance, using normal XML processing, or using NXT as follows:
java FunctionQuery -c AMI-metadata.xml -o IS1003a -q '($p participant)($e english_language):($p^$e)' -atts nite:id sex age_at_collection native_language '$e@region' '@extract(($i influence):$e^$i,name)'
The -o argument just names an arbitrary observation (meeting); any one will do, because corpus resources hold information about the entire set of meetings. If we run the query without naming an observation, we get the same information once per meeting, which is slow. This is just a quirk of NXT.
The output looks, for example, like this:
MEE006 M 25.0 English Singapore, Canada Mandarin
The fields are, in order: the id for the participant; the sex; the age (badly formatted as a decimal, even though the ages are integers); the native language; the region(s); and any influences on their English that the participant listed.
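If it helps, here is a rough Python sketch for reading that output once you have saved it. It assumes the fields really are separated by tabs and come in the order described above; the field names and the parse_participants function are just names I made up for illustration, not anything NXT defines. It also turns the "25.0" style ages back into integers.

```python
import csv
import io

# Hypothetical field names, chosen to follow the order of the -atts arguments.
FIELDS = ["id", "sex", "age", "native_language", "region", "influences"]


def parse_participants(text):
    """Parse tab-delimited query output into one dict per participant."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for raw in reader:
        if not raw:
            continue  # skip blank lines
        # Pad short rows so missing trailing fields become empty strings.
        raw = raw + [""] * (len(FIELDS) - len(raw))
        row = dict(zip(FIELDS, raw))
        # Ages are printed as e.g. "25.0"; convert to int when a value is present.
        age = row["age"].strip()
        row["age"] = int(float(age)) if age else None
        rows.append(row)
    return rows


# The sample line from the post, with the tabs written out explicitly.
sample = "MEE006\tM\t25.0\tEnglish\tSingapore, Canada\tMandarin\n"
participants = parse_participants(sample)
print(participants[0])
```

Splitting on tabs rather than on whitespace matters here, since the region field can itself contain spaces and commas.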