ASR Run Submission Format:
▪ Each participant has to submit at least one run for each task he or she registered for.
▪ Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run for each track. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track will be used as the PRIMARY run.
▪ Runs have to be submitted as a gzipped TAR archive (see the file structure below) and sent as an email attachment to iwslt2012.ted AT gmail DOT com.
▪ Submissions have to be made in CTM format. See the CTM format description in the NIST SCTK documentation for details. The confidence values are optional. The channel number has to be '1'. See this ASR output of dev2010 and tst2010 provided by KIT for a valid example. Scoring will be case-insensitive. Submissions have to be in UTF-8.
▪ When producing the output, the segmentation that is provided with the evaluation data needs to be used.
▪ In the CTM files, the beginning of each new segment as given by the pre-defined segmentation needs to be indicated by adding a line that is formatted like this: # uttid start-time
▪ The uttid is as follows: talk$number_$start-time
▪ For start-time in the uttid, the decimal point is substituted by '_'.
▪ Example: # talkid93_607_79 607.79
▪ The text will be scored case-insensitively, but can be submitted case-sensitive.
▪ Numbers, dates, etc. need to be transcribed in words as they are spoken, not in digits.
▪ Common acronyms such as NATO and EU are written as one word, without any special markers between the letters. This applies no matter whether they are spoken as one word or spelled out as a letter sequence.
▪ All other letter spelling sequences are written as individual letters with a space in between.
▪ Standard abbreviations, such as "etc." and "Mr.", are accepted as specified by the glm file in the scoring package.
▪ For words pronounced in their contracted form, the orthography for the contracted form may be used. These cases will be normalized by the glm file to their canonical form.
Please also refer to the stm files provided, the example ctm files provided, and the glm file for the required output.
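The segment marker and CTM word lines described above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function names are hypothetical, the "talkid" prefix is taken from the example line (# talkid93_607_79 607.79), the field order of the word line (file ID, channel, start, duration, word, optional confidence) follows the usual NIST CTM layout, and using the talk ID as the file ID in the first field is an assumption. The provided example ctm files remain the reference.

```python
def make_uttid(talk_number: int, start_time: str) -> str:
    """Build an uttid such as 'talkid93_607_79' for talk 93 starting at '607.79'.

    The 'talkid' prefix is an assumption based on the example in the guidelines.
    The start time is passed as a string so that its digits are kept exactly.
    """
    # The decimal point of the start time is replaced by an underscore.
    return f"talkid{talk_number}_{start_time.replace('.', '_')}"


def segment_marker(talk_number: int, start_time: str) -> str:
    """Return the comment line that opens a new segment: '# uttid start-time'."""
    return f"# {make_uttid(talk_number, start_time)} {start_time}"


def ctm_word_line(file_id, start, duration, word, confidence=None):
    """Return one CTM word line; the channel is always '1', confidence is optional."""
    fields = [file_id, "1", f"{start:.2f}", f"{duration:.2f}", word]
    if confidence is not None:
        fields.append(f"{confidence:.2f}")
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical segment with two recognized words.
    print(segment_marker(93, "607.79"))
    print(ctm_word_line("talkid93", 607.79, 0.31, "thank"))
    print(ctm_word_line("talkid93", 608.10, 0.22, "you", confidence=0.9))
```

The output file should be written with UTF-8 encoding, for example with open(path, "w", encoding="utf-8").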
TAR archive file structure:
< UserID >/< Set >.< Task >.< UserID >.primary.ctm
/< Set >.< Task >.< UserID >.contrastive1.ctm
/< Set >.< Task >.< UserID >.contrastive2.ctm
< UserID > = user ID of participant used to download data files
< Set > = dev2010 | tst2010 | tst2011 | tst2012
< Task > = ASR_E | ASR_SC_E
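One way to assemble an archive with this layout is sketched below, using Python's standard tarfile module. The helper names, the user ID "user01" and the CTM content are hypothetical. The check on run names assumes that contrastive runs are numbered contrastive1, contrastive2, and so on, as in the structure shown above.

```python
import io
import tarfile

SETS = {"dev2010", "tst2010", "tst2011", "tst2012"}
TASKS = {"ASR_E", "ASR_SC_E"}


def member_name(user_id: str, data_set: str, task: str, run: str) -> str:
    """Return '<UserID>/<Set>.<Task>.<UserID>.<run>.ctm' after basic validation."""
    if data_set not in SETS:
        raise ValueError(f"unknown set: {data_set}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    suffix = run[len("contrastive"):]
    is_contrastive = run.startswith("contrastive") and suffix.isdigit()
    if run != "primary" and not is_contrastive:
        raise ValueError(f"unknown run label: {run}")
    return f"{user_id}/{data_set}.{task}.{user_id}.{run}.ctm"


def build_archive_bytes(user_id: str, runs: dict) -> bytes:
    """Pack runs into a gzipped TAR archive held in memory.

    'runs' maps (set, task, run label) to the CTM text of that run.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for (data_set, task, run), text in sorted(runs.items()):
            data = text.encode("utf-8")  # submissions have to be in UTF-8
            info = tarfile.TarInfo(member_name(user_id, data_set, task, run))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    archive = build_archive_bytes(
        "user01",
        {
            ("tst2012", "ASR_E", "primary"): "# talkid93_607_79 607.79\n",
            ("tst2012", "ASR_E", "contrastive1"): "# talkid93_607_79 607.79\n",
        },
    )
    with open("user01.tar.gz", "wb") as handle:
        handle.write(archive)
```

Building the archive in memory first keeps the sketch easy to check; writing directly to a file with tarfile.open(path, "w:gz") would work equally well.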
Re-submitting your runs is allowed as long as the emails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission email will be used for the IWSLT 2012 evaluation, and previous emails will be ignored.