
The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings


Forum Post : Re: Speaker id labels

Posted by jcarletta at 2009-03-03 15:55

There are a few ways to extract information from an NXT-format corpus into some other format.  In increasing order of difficulty, and suited to increasingly difficult kinds of data extraction:

  1. Use FunctionQuery, a command line interface to NXT, to extract what you want into a tab-delimited format, and further process that.
  2. If the information you want is all in one file, use normal XML processing.
  3. Write a Java program that uses the NOM API to load and traverse the data, writing the output that you want.
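
As a rough illustration of option 2, here is a minimal Python sketch that reads one XML file and writes tab-delimited rows. The sample document, the `w` element name, the `starttime`/`endtime` attributes, and the namespace URI are assumptions made up for this example; check them against the actual files in your corpus before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for one corpus file.
# Element names, attributes and the namespace URI are assumptions.
SAMPLE = """<nite:root xmlns:nite="http://nite.sourceforge.net/" nite:id="example.A.words">
  <w nite:id="example.A.words0" starttime="0.50" endtime="0.80">hello</w>
  <w nite:id="example.A.words1" starttime="0.80" endtime="1.20">world</w>
</nite:root>
"""

NITE_NS = "http://nite.sourceforge.net/"  # assumed namespace URI
ID_ATTR = "{%s}id" % NITE_NS              # ElementTree's key for nite:id


def words_to_rows(xml_text):
    """Return (id, start, end, text) tuples for every <w> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("w"):
        rows.append((
            w.get(ID_ATTR),
            float(w.get("starttime")),
            float(w.get("endtime")),
            (w.text or "").strip(),
        ))
    return rows


def rows_to_tsv(rows):
    """Join the rows into tab-delimited lines, one row per line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


if __name__ == "__main__":
    print(rows_to_tsv(words_to_rows(SAMPLE)))
```

For a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`. This only works when everything you need is in the one file; anything that follows links between files is better handled by options 1 or 3.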

You can find more information about these techniques in the NXT documentation, particularly http://www.ltg.ed.ac.uk/NITE/search/search-methods.html and http://www.ltg.ed.ac.uk/NITE/search/data-processing.html.


