diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e645833
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+*~
+*.pyc
\ No newline at end of file
diff --git a/README.md b/README.md
index 0d410dd..9f8b16c 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,257 @@
-Please check back soon for the scoring tools for the [First DIHARD Challenge](https://coml.lscp.ens.fr/dihard/).
+I. Overview
+=========
+This suite supports evaluation of diarization system output relative
+to a reference diarization subject to the following conditions:
+
+- both the reference and system diarizations are saved within [Rich Transcription Time Marked (RTTM)](#rttm) files
+- for any pair of recordings, the sets of speakers are disjoint
+
+
+II. Dependencies
+==========
+The following are required to run this software:
+
+- Python >= 2.7.1 (https://www.python.org/)
+- NumPy >= 1.6.1 (https://github.com/numpy/numpy)
+- SciPy >= 0.10.0 (https://github.com/scipy/scipy)
+- intervaltree >= 2.1.0 (https://pypi.python.org/pypi/intervaltree)
+- tabulate >= 0.5.0 (https://pypi.python.org/pypi/tabulate)
+
+
+III. Metrics
+======
+Diarization error rate
+---------------------------
+Following tradition in this area, we report diarization error rate (DER), which
+is the sum of
+
+- speaker error  --  percentage of scored time for which the wrong speaker id
+  is assigned within a speech region
+- false alarm speech  --   percentage of scored time for which a nonspeech
+  region is incorrectly marked as containing speech
+- missed speech  --  percentage of scored time for which a speech region is
+  incorrectly marked as not containing speech
+
+As with word error rate, a score of zero indicates perfect performance and
+higher scores (which may exceed 100) indicate poorer performance. For more
+details, consult section 6.1 of the [NIST RT-09 evaluation plan](https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf).
+
+
+Clustering metrics
+---------------------------------
+An alternate approach to system evaluation is to convert both the reference and
+system outputs to frame-level labels, then evaluate using one of many
+well-known approaches for evaluating clustering performance. Each recording
+is converted to a sequence of 10 ms frames, each of which is assigned a single
+label corresponding to one of the following cases:
+
+- the frame contains no speech
+- the frame contains speech from a single speaker (one label per speaker
+  identified)
+- the frame contains overlapping speech (one label for each element in the
+  powerset of speakers)
+
+These frame-level labelings are then scored with the following metrics:
+
+### Goodman-Kruskal tau
+Goodman-Kruskal tau is an asymmetric association measure dating back to work
+by Leo Goodman and William Kruskal in the 1950s (Goodman and Kruskal, 1954).
+For a reference labeling ``ref`` and a system labeling ``sys``,
+``GKT(ref, sys)`` corresponds to the fraction of variability in ``sys`` that
+can be explained by ``ref``. Consequently, ``GKT(ref, sys)`` is 1 when ``ref``
+is perfectly predictive of ``sys`` and 0 when it is not predictive at all.
+Correspondingly, ``GKT(sys, ref)`` is 1 when ``sys`` is perfectly predictive
+of ``ref`` and 0 when lacking any predictive power.
+
+### B-cubed precision, recall, and F1
+The B-cubed precision for a single frame assigned speaker ``S`` in the
+reference diarization and ``C`` in the system diarization is the proportion of
+frames assigned ``C`` that are also assigned ``S``. Similarly, the B-cubed
+recall for a frame is the proportion of all frames assigned ``S`` that are
+also assigned ``C``. The overall precision and recall, then, are just the mean
+of the frame-level precision and recall measures, and the overall F1 is their
+harmonic mean. For additional details see Bagga and Baldwin (1998).
+
+### Information theoretic measures
+We report four information theoretic measures:
+
+- ``H(ref|sys)``  --  conditional entropy in bits of the reference
+  labeling given the system labeling
+- ``H(sys|ref)``  --  conditional entropy in bits of the system
+  labeling given the reference labeling
+- ``MI``  --  mutual information in bits between the reference and system
+  labelings
+- ``NMI``  --  normalized mutual information between the reference and system
+  labelings; that is, ``MI`` scaled to the interval [0, 1]. In this case, the
+  normalization term used is ``sqrt(H(ref)*H(sys))``.
+
+``H(ref|sys)`` is the number of bits needed to describe the reference
+labeling given that the system labeling is known and ranges from 0 in
+the case that the system labeling is perfectly predictive of the reference
+labeling to ``H(ref)`` in the case that the system labeling is not at
+all predictive of the reference labeling. Similarly, ``H(sys|ref)`` measures
+the number of bits required to describe the system labeling given the
+reference labeling and ranges from 0 to ``H(sys)``.
+
+``MI`` is the number of bits shared by the reference and system labeling and
+indicates the degree to which knowing either reduces uncertainty in the other.
+It is related to conditional entropy and entropy as follows:
+``MI(ref, sys) = H(ref) - H(ref|sys) = H(sys) - H(sys|ref)``. ``NMI`` is
+derived from ``MI`` by normalizing it to the interval [0, 1]. Multiple
+normalizations are possible depending on the upper-bound for ``MI`` that is
+used, but we report ``NMI`` normalized by ``sqrt(H(ref)*H(sys))``.
+
+
+IV. Scoring
+======
+To evaluate system output stored in [RTTM](#rttm) files ``sys1.rttm``,
+``sys2.rttm``, ... against a corresponding reference diarization stored in RTTM
+files ``ref1.rttm``, ``ref2.rttm``, ...:
+
+    python score.py -r ref1.rttm ref2.rttm ... -s sys1.rttm sys2.rttm ...
+
+This will calculate and report the following metrics both overall and on
+a per-file basis:
+
+- ``DER``  --  diarization error rate
+- ``B3-Precision``  --  B-cubed precision
+- ``B3-Recall``  --  B-cubed recall
+- ``B3-F1``  --  B-cubed F1
+- ``GKT(ref, sys)``  --  Goodman-Kruskal tau in the direction of the reference
+  diarization to the system diarization
+- ``GKT(sys, ref)``  --  Goodman-Kruskal tau in the direction of the system
+  diarization to the reference diarization
+- ``H(ref|sys)``  --  conditional entropy in bits of the reference diarization
+  given the system diarization
+- ``H(sys|ref)``  --  conditional entropy in bits of the system diarization
+  given the reference diarization
+- ``MI``  --  mutual information in bits
+- ``NMI``  --  normalized mutual information
+
+Alternately, we could have specified the reference and system RTTM files via
+script files of paths (one per line) using the ``-R`` and ``-S`` flags:
+
+    python score.py -R ref.scp -S sys.scp
+
+By default the scoring regions for each file will be determined automatically
+from the reference and system speaker turns. However, it is possible to specify
+explicit scoring regions using a NIST [un-partitioned evaluation map (UEM)](#uem) file and the ``-u`` flag. For instance, the following:
+
+    python score.py -u all.uem -R ref.scp -S sys.scp
+
+will load the files to be scored plus scoring regions from ``all.uem``, filter
+out and warn about any speaker turns not present in those files, and trim the
+remaining turns to the relevant scoring regions before computing the metrics
+as before.
+
+DER is scored using the NIST ``md-eval.pl`` tool with
+a default collar size of 0 ms and explicitly including regions that contain
+overlapping speech in the reference diarization. If desired, this behavior
+can be altered using the ``--collar`` and ``--ignore_overlaps`` flags. For
+instance
+
+    python score.py --collar 0.100 --ignore_overlaps -R ref.scp -S sys.scp
+
+would compute DER using a 100 ms collar and with overlapped speech ignored.
+All other metrics are computed from frame-level labelings generated from the
+reference and system speaker turns **WITHOUT** any use of collars. The default
+frame step is 10 ms, which may be altered via the ``--step`` flag. For more
+details, consult the docstrings within the ``scorelib.metrics`` module.
+
+The overall and per-file results will be printed to STDOUT as a table; for instance
+
+    File                           DER    B3-Precision    B3-Recall    B3-F1    GKT(ref, sys)    GKT(sys, ref)    H(ref|sys)    H(sys|ref)    MI    NMI
+    ---------------------------  -----  --------------  -----------  -------  ---------------  ---------------  ------------  ------------  ----  -----
+    CMU_20020319-1400_d01_NONE    6.10            0.91         1.00     0.95             1.00             0.88          0.22          0.00  2.66   0.96
+    ICSI_20000807-1000_d05_NONE  17.37            0.72         1.00     0.84             1.00             0.68          0.65          0.00  2.79   0.90
+    ICSI_20011030-1030_d02_NONE  13.06            0.80         0.95     0.87             0.95             0.80          0.54          0.11  5.10   0.94
+    LDC_20011116-1400_d06_NONE    5.64            0.95         0.89     0.92             0.85             0.93          0.10          0.27  1.87   0.91
+    LDC_20011116-1500_d07_NONE    1.69            0.96         0.96     0.96             0.95             0.95          0.14          0.12  2.39   0.95
+    NIST_20020305-1007_d01_NONE  42.05            0.51         0.95     0.66             0.93             0.44          1.58          0.11  2.13   0.74
+    *** OVERALL ***              14.31            0.81         0.96     0.88             0.96             0.80          0.55          0.10  5.45   0.94
+
+Some basic control of the formatting of this table is possible via the ``--n_digits`` and
+``--table_format`` flags. The former controls the number of decimal places printed for floating
+point numbers, while the latter controls the table format. For a list of valid table formats plus example
+outputs, consult the [documentation](https://pypi.python.org/pypi/tabulate) for the ``tabulate`` package.
+
+For additional details consult the docstring of ``score.py``.
+
+
+V. File formats
+========
+RTTM
+-------
+Rich Transcription Time Marked (RTTM) files are space-delimited text files containing one turn per line, each line containing ten fields:
+
+- ``Type``  --  segment type; should always be ``SPEAKER``
+- ``File ID``  --  file name; basename of the recording minus extension (e.g.,
+  ``rec1_a``)
+- ``Channel ID``  --  channel (1-indexed) that turn is on; should always be
+  ``1``
+- ``Turn Onset``  --  onset of turn in seconds from beginning of recording
+- ``Turn Duration``  --  duration of turn in seconds
+- ``Orthography Field``  --  should always be ``<NA>``
+- ``Speaker Type``  --  should always be ``<NA>``
+- ``Speaker Name``  --  name of speaker of turn; should be unique within scope
+  of each file
+- ``Confidence Score``  --  system confidence (probability) that information
+  is correct; should always be ``<NA>``
+- ``Signal Lookahead Time``  --  should always be ``<NA>``
+
+For instance:
+
+    SPEAKER CMU_20020319-1400_d01_NONE 1 130.430000 2.350 <NA> <NA> juliet <NA> <NA>
+    SPEAKER CMU_20020319-1400_d01_NONE 1 157.610000 3.060 <NA> <NA> tbc <NA> <NA>
+    SPEAKER CMU_20020319-1400_d01_NONE 1 130.490000 0.450 <NA> <NA> chek <NA> <NA>
+
+If you would like to confirm that a set of RTTM files are valid, use the
+included ``validate_rttm.py`` script. For instance, if you have RTTMs
+``fn1.rttm``, ``fn2.rttm``, ..., then
+
+    python validate_rttm.py fn1.rttm fn2.rttm ...
+
+will iterate over each line of each file and warn on any that do not match the
+spec.
+
+UEM
+------
+Un-partitioned evaluation map (UEM) files are used to specify the scoring
+regions within each recording. For each scoring region, the UEM file contains
+a line with the following four space-delimited fields:
+
+- ``File ID``  --  file name; basename of the recording minus extension (e.g.,
+  ``rec1_a``)
+- ``Channel ID``  --  channel (1-indexed) that scoring region is on; ignored by
+  ``score.py``
+- ``Onset``  --  onset of scoring region in seconds from beginning of recording
+- ``Offset``  --  offset of scoring region in seconds from beginning of
+  recording
+
+For instance:
+
+    CMU_20020319-1400_d01_NONE 1 125.000000 727.090000
+    CMU_20020320-1500_d01_NONE 1 111.700000 615.330000
+    ICSI_20010208-1430_d05_NONE 1 97.440000 697.290000
+
+
+VI. References
+=========
+- Bagga, A. and Baldwin, B. (1998). "Algorithms for scoring coreference
+  chains." Proceedings of LREC 1998.
+- Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory.
+- Goodman, L.A. and Kruskal, W.H. (1954). "Measures of association for
+  cross classifications." Journal of the American Statistical Association.
+- NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition
+  Evaluation Plan. https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
+- Nguyen, X.V., Epps, J., and Bailey, J. (2010). "Information theoretic
+  measures for clustering comparison: Variants, properties, normalization
+  and correction for chance." Journal of Machine Learning Research.
+- Pearson, R. (2016). GoodmanKruskal: Association Analysis for Categorical
+  Variables. https://CRAN.R-project.org/package=GoodmanKruskal.
+- Rosenberg, A. and Hirschberg, J. (2007). "V-Measure: A conditional
+  entropy-based external cluster evaluation measure." Proceedings of
+  EMNLP 2007.
+- Strehl, A. and Ghosh, J. (2002). "Cluster ensembles  --  A knowledge
+  reuse framework for combining multiple partitions." Journal of Machine
+  Learning Research.
\ No newline at end of file
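Note on the B-cubed definition in section III of the README above: the per-frame
definition can be sketched in a few lines of Python. This is a minimal
illustration only, not the code in ``scorelib.metrics``; the function name
``bcubed`` and its list-of-labels inputs are assumptions made for the example.

```python
from __future__ import division

from collections import Counter


def bcubed(ref_labels, sys_labels):
    """Return B-cubed (precision, recall, F1) for two frame-level labelings.

    Hypothetical helper for illustration; both arguments are equal-length
    sequences holding one hashable label per frame.
    """
    if len(ref_labels) != len(sys_labels) or not ref_labels:
        raise ValueError('labelings must be non-empty and of equal length')
    n_frames = len(ref_labels)
    ref_counts = Counter(ref_labels)
    sys_counts = Counter(sys_labels)
    pair_counts = Counter(zip(ref_labels, sys_labels))

    # A frame labeled (S, C) has precision n_SC / n_C and recall n_SC / n_S,
    # and there are n_SC such frames, hence the n_SC ** 2 terms.
    precision = sum(
        n_sc ** 2 / sys_counts[c] for (_, c), n_sc in pair_counts.items()
    ) / n_frames
    recall = sum(
        n_sc ** 2 / ref_counts[s] for (s, _), n_sc in pair_counts.items()
    ) / n_frames
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, ``bcubed(['A', 'A', 'A', 'B', 'B', 'B'], ['x', 'x', 'y', 'y', 'y', 'y'])``
gives a precision of 0.75 and a recall of 7/9.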
diff --git a/score.py b/score.py
new file mode 100755
index 0000000..f9a3d96
--- /dev/null
+++ b/score.py
@@ -0,0 +1,303 @@
+#!/usr/bin/env python
+"""Score diarization system output.
+
+To evaluate system output stored in RTTM files ``sys1.rttm``, ``sys2.rttm``,
+... against a corresponding reference diarization stored in RTTM files
+``ref1.rttm``, ``ref2.rttm``, ...:
+
+    python score.py -r ref1.rttm ref2.rttm ... -s sys1.rttm sys2.rttm ...
+
+which will calculate and report the following metrics both overall and on
+a per-file basis:
+
+- diarization error rate (DER)
+- B-cubed precision (B3-Precision)
+- B-cubed recall (B3-Recall)
+- B-cubed F1 (B3-F1)
+- Goodman-Kruskal tau in the direction of the reference diarization to the
+  system diarization (GKT(ref, sys))
+- Goodman-Kruskal tau in the direction of the system diarization to the
+  reference diarization (GKT(sys, ref))
+- conditional entropy of the reference diarization given the system
+  diarization in bits (H(ref|sys))
+- conditional entropy of the system diarization given the reference
+  diarization in bits (H(sys|ref))
+- mutual information in bits (MI)
+- normalized mutual information (NMI)
+
+Alternately, we could have specified the reference and system RTTM files via
+script files of paths (one per line) using the ``-R`` and ``-S`` flags:
+
+    python score.py -R ref.scp -S sys.scp
+
+By default the scoring regions for each file will be determined automatically
+from the reference and system speaker turns. However, it is possible to specify
+explicit scoring regions using a NIST un-partitioned evaluation map (UEM) file
+and the ``-u`` flag. For instance, the following:
+
+    python score.py -u all.uem -R ref.scp -S sys.scp
+
+will load the files to be scored plus scoring regions from ``all.uem``, filter
+out and warn about any speaker turns not present in those files, and trim the
+remaining turns to the relevant scoring regions before computing the metrics
+as before.
+
+Diarization error rate (DER) is scored using the NIST ``md-eval.pl`` tool with
+a default collar size of 0 ms and explicitly including regions that contain
+overlapping speech in the reference diarization. If desired, this behavior
+can be altered using the ``--collar`` and ``--ignore_overlaps`` flags. For
+instance
+
+    python score.py --collar 0.100 --ignore_overlaps -R ref.scp -S sys.scp
+
+would compute DER using a 100 ms collar and with overlapped speech ignored.
+All other metrics are computed from frame-level labelings generated from the
+reference and system speaker turns **WITHOUT** any use of collars. The default
+frame step is 10 ms, which may be altered via the ``--step`` flag. For more
+details, consult the docstrings within the ``scorelib.metrics`` module.
+
+The overall and per-file results will be printed to STDOUT as a table formatted
+using the ``tabulate`` package. Some basic control of the formatting of this
+table is possible via the ``--n_digits`` and ``--table_format`` flags. The
+former controls the number of decimal places printed for floating point
+numbers, while the latter controls the table format. For a list of valid
+table formats plus example outputs, consult the documentation for the
+``tabulate`` package:
+
+    https://pypi.python.org/pypi/tabulate
+"""
+from __future__ import print_function
+from __future__ import unicode_literals
+import argparse
+import os
+import sys
+
+from tabulate import tabulate
+
+from scorelib import __version__ as VERSION
+from scorelib.argparse import ArgumentParser
+from scorelib.rttm import load_rttm
+from scorelib.turn import merge_turns, trim_turns
+from scorelib.score import score
+from scorelib.six import iterkeys
+from scorelib.uem import gen_uem, load_uem
+from scorelib.utils import error, info, warn, xor
+
+
+class RefRTTMAction(argparse.Action):
+    def __call__(self, parser, namespace, values, option_string=None):
+        setattr(namespace, self.dest, values)
+        if not xor(namespace.ref_rttm_fns, namespace.ref_rttm_scpf):
+            parser.error('Exactly one of -r and -R must be set.')
+
+
+class SysRTTMAction(argparse.Action):
+    def __call__(self, parser, namespace, values, option_string=None):
+        setattr(namespace, self.dest, values)
+        if not xor(namespace.sys_rttm_fns, namespace.sys_rttm_scpf):
+            parser.error('Exactly one of -s and -S must be set.')
+
+
+def load_rttms(rttm_fns):
+    """Load speaker turns from RTTM files.
+
+    Parameters
+    ----------
+    rttm_fns : list of str
+        Paths to RTTM files.
+
+    Returns
+    -------
+    turns : list of Turn
+        Speaker turns.
+
+    file_ids : set
+        File ids found in ``rttm_fns``.
+    """
+    turns = []
+    file_ids = set()
+    for rttm_fn in rttm_fns:
+        if not os.path.exists(rttm_fn):
+            error('Unable to open RTTM file: %s' % rttm_fn)
+            sys.exit(1)
+        try:
+            turns_, _, file_ids_ = load_rttm(rttm_fn)
+            turns.extend(turns_)
+            file_ids.update(file_ids_)
+        except IOError as e:
+            error('Invalid RTTM file: %s. %s' % (rttm_fn, e))
+            sys.exit(1)
+    return turns, file_ids
+
+
+def check_for_empty_files(ref_turns, sys_turns, uem):
+    """Warn on files in UEM without reference or system speaker turns."""
+    ref_file_ids = set([turn.file_id for turn in ref_turns])
+    sys_file_ids = set([turn.file_id for turn in sys_turns])
+    for file_id in sorted(iterkeys(uem)):
+        if file_id not in ref_file_ids:
+            warn('File "%s" missing in reference RTTMs.' % file_id)
+        if file_id not in sys_file_ids:
+            warn('File "%s" missing in system RTTMs.' % file_id)
+    # TODO: Clarify below warnings; this indicates that there are no
+    #       ELIGIBLE reference/system turns.
+    if not ref_turns:
+        warn('No reference speaker turns found within UEM scoring regions.')
+    if not sys_turns:
+        warn('No system speaker turns found within UEM scoring regions.')
+
+
+def load_script_file(fn):
+    """Load file names from ``fn``."""
+    with open(fn, 'rb') as f:
+        return [line.decode('utf-8').strip() for line in f]
+
+
+def print_table(file_to_scores, global_scores, n_digits=2,
+                table_format='simple'):
+    """Pretty print scores as table.
+
+    Parameters
+    ----------
+    file_to_scores : dict
+        Mapping from file ids in ``uem`` to ``Scores`` instances.
+
+    global_scores : Scores
+        Global scores.
+
+    n_digits : int, optional
+        Number of decimal digits to display.
+        (Default: 2)
+
+    table_format : str, optional
+        Table format. Passed to ``tabulate.tabulate``.
+        (Default: 'simple')
+    """
+    col_names = ['File',
+                 'DER',  # Diarization error rate.
+                 'B3-Precision',  # B-cubed precision.
+                 'B3-Recall',  # B-cubed recall.
+                 'B3-F1',  # B-cubed F1.
+                 'GKT(ref, sys)',  # Goodman-Kruskal tau (ref, sys).
+                 'GKT(sys, ref)',  # Goodman-Kruskal tau (sys, ref).
+                 'H(ref|sys)',  # Conditional entropy of ref given sys.
+                 'H(sys|ref)',  # Conditional entropy of sys given ref.
+                 'MI',  # Mutual information.
+                 'NMI',  # Normalized mutual information.
+                ]
+    rows = []
+    for file_id in sorted(iterkeys(file_to_scores)):
+        scores = file_to_scores[file_id]
+        row = [file_id, scores.der, scores.bcubed_precision,
+               scores.bcubed_recall, scores.bcubed_f1, scores.tau_ref_sys,
+               scores.tau_sys_ref, scores.ce_ref_sys, scores.ce_sys_ref,
+               scores.mi, scores.nmi]
+        rows.append(row)
+    rows.append(['*** OVERALL ***', global_scores.der, global_scores.bcubed_precision,
+                 global_scores.bcubed_recall, global_scores.bcubed_f1,
+                 global_scores.tau_ref_sys, global_scores.tau_sys_ref,
+                 global_scores.ce_ref_sys, global_scores.ce_sys_ref,
+                 global_scores.mi, global_scores.nmi])
+    floatfmt = '.%df' % n_digits
+    tbl = tabulate(
+        rows, headers=col_names, floatfmt=floatfmt, tablefmt=table_format)
+    print(tbl)
+
+
+
+if __name__ == '__main__':
+    # Parse command line arguments.
+    parser = ArgumentParser(
+        description='Score diarization from RTTM files.', add_help=True,
+        usage='%(prog)s [options]')
+    parser.add_argument(
+        '-r', nargs='+', default=[], metavar='STR', dest='ref_rttm_fns',
+        action=RefRTTMAction,
+        help='reference RTTM files (default: %(default)s)')
+    parser.add_argument(
+        '-R', nargs=None, metavar='STR', dest='ref_rttm_scpf',
+        action=RefRTTMAction,
+        help='reference RTTM script file (default: %(default)s)')
+    parser.add_argument(
+        '-s', nargs='+', default=[], metavar='STR', dest='sys_rttm_fns',
+        action=SysRTTMAction,
+        help='system RTTM files (default: %(default)s)')
+    parser.add_argument(
+        '-S', nargs=None, metavar='STR', dest='sys_rttm_scpf',
+        action=SysRTTMAction,
+        help='system RTTM script file (default: %(default)s)')
+    parser.add_argument(
+        '-u', '--uem', nargs=None, metavar='STR', dest='uemf',
+        help='un-partitioned evaluation map file (default: %(default)s)')
+    parser.add_argument(
+        '--collar', nargs=None, default=0.0, type=float, metavar='FLOAT',
+        help='collar size in seconds for DER computation '
+             '(default: %(default)s)')
+    parser.add_argument(
+        '--ignore_overlaps', action='store_true', default=False,
+        help='ignore overlaps when computing DER')
+    parser.add_argument(
+        '--step', nargs=None, default=0.010, type=float, metavar='FLOAT',
+        help='step size in seconds (default: %(default)s)')
+    parser.add_argument(
+        '--n_digits', nargs=None, default=2, type=int, metavar='INT',
+        help='number of decimal places to print (default: %(default)s)')
+    parser.add_argument(
+        '--table_format', nargs=None, dest='table_format', default='simple',
+        metavar='STR',
+        help='tabulate table format (default: %(default)s)')
+    parser.add_argument(
+        '--version', action='version',
+        version='%(prog)s ' + VERSION)
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    args = parser.parse_args()
+
+    # Check that at least one reference RTTM and at least one system RTTM
+    # were specified.
+    if args.ref_rttm_scpf is not None:
+        args.ref_rttm_fns = load_script_file(args.ref_rttm_scpf)
+    if args.sys_rttm_scpf is not None:
+        args.sys_rttm_fns = load_script_file(args.sys_rttm_scpf)
+    if len(args.ref_rttm_fns) < 1:
+        error('No reference RTTMs specified.')
+        sys.exit(1)
+    if len(args.sys_rttm_fns) < 1:
+        error('No system RTTMs specified.')
+        sys.exit(1)
+
+    # Load reference/system speaker turns and UEM. If no UEM specified,
+    # determine it automatically.
+    info('Loading speaker turns from reference RTTMs...', file=sys.stderr)
+    ref_turns, ref_file_ids = load_rttms(args.ref_rttm_fns)
+    info('Loading speaker turns from system RTTMs...', file=sys.stderr)
+    sys_turns, sys_file_ids = load_rttms(args.sys_rttm_fns)
+    if args.uemf is not None:
+        info('Loading un-partitioned evaluation map...', file=sys.stderr)
+        uem = load_uem(args.uemf)
+    else:
+        warn('No un-partitioned evaluation map specified. Approximating from '
+             'reference and system turn extents...')
+        uem = gen_uem(ref_turns, sys_turns)
+
+    # Trim turns to UEM scoring regions and merge any that overlap.
+    info('Trimming reference speaker turns to UEM scoring regions...',
+         file=sys.stderr)
+    ref_turns = trim_turns(ref_turns, uem)
+    info('Trimming system speaker turns to UEM scoring regions...',
+         file=sys.stderr)
+    sys_turns = trim_turns(sys_turns, uem)
+    info('Checking for overlapping reference speaker turns...',
+         file=sys.stderr)
+    ref_turns = merge_turns(ref_turns)
+    info('Checking for overlapping system speaker turns...',
+         file=sys.stderr)
+    sys_turns = merge_turns(sys_turns)
+
+    # Score.
+    check_for_empty_files(ref_turns, sys_turns, uem)
+    file_to_scores, global_scores = score(
+        ref_turns, sys_turns, uem, args.collar, args.ignore_overlaps,
+        args.step)
+    print_table(file_to_scores, global_scores, args.n_digits, args.table_format)
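Note on the ``H(ref|sys)``, ``H(sys|ref)``, ``MI``, and ``NMI`` columns printed
by ``score.py`` above: the relationships stated in the README
(``MI = H(ref) - H(ref|sys) = H(sys) - H(sys|ref)`` and normalization by
``sqrt(H(ref)*H(sys))``) can be sketched as follows. This is an illustration
under assumptions, not the ``scorelib.metrics`` implementation; the names
``entropy`` and ``info_measures`` are hypothetical, and returning NaN for
``NMI`` when either entropy is zero is a choice made for this sketch only.

```python
from __future__ import division

from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy in bits of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n, 2) for c in Counter(labels).values())


def info_measures(ref_labels, sys_labels):
    """Return conditional entropies, MI, and NMI for two frame labelings.

    Hypothetical helper; both arguments are equal-length label sequences.
    """
    if len(ref_labels) != len(sys_labels) or not ref_labels:
        raise ValueError('labelings must be non-empty and of equal length')
    h_ref = entropy(ref_labels)
    h_sys = entropy(sys_labels)
    # Joint entropy H(ref, sys), computed over (ref, sys) label pairs.
    h_joint = entropy(list(zip(ref_labels, sys_labels)))
    mi = h_ref + h_sys - h_joint
    denom = sqrt(h_ref * h_sys)
    return {
        'H(ref|sys)': h_joint - h_sys,
        'H(sys|ref)': h_joint - h_ref,
        'MI': mi,
        # Assumption for this sketch: NMI is left undefined (NaN) when
        # either labeling has zero entropy.
        'NMI': mi / denom if denom > 0 else float('nan'),
    }
```

With identical two-speaker labelings the sketch yields ``MI`` and ``NMI`` of 1
and both conditional entropies of 0; with statistically independent labelings
``MI`` and ``NMI`` drop to 0.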
diff --git a/scorelib/__init__.py b/scorelib/__init__.py
new file mode 100644
index 0000000..4ecf640
--- /dev/null
+++ b/scorelib/__init__.py
@@ -0,0 +1,2 @@
+"""Diarization system scoring."""
+__version__ = '1.0.0'
diff --git a/scorelib/argparse.py b/scorelib/argparse.py
new file mode 100644
index 0000000..909c371
--- /dev/null
+++ b/scorelib/argparse.py
@@ -0,0 +1,17 @@
+"""Custom argument parser and action classes."""
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import unicode_literals
+import argparse
+import sys
+
+
+__all__ = ['ArgumentParser']
+
+
+class ArgumentParser(argparse.ArgumentParser):
+    """Sub-class of ``ArgumentParser`` that writes errors to STDERR."""
+    def error(self, message):
+        sys.stderr.write('error: %s\n' % message)
+        self.print_help()
+        sys.exit(2)
diff --git a/scorelib/md-eval-22.pl b/scorelib/md-eval-22.pl
new file mode 100755
index 0000000..27b7bc9
--- /dev/null
+++ b/scorelib/md-eval-22.pl
@@ -0,0 +1,2906 @@
+#!/usr/bin/perl -w
+use strict;
+
+my $version = "22";
+
+#################################
+# History:
+#
+# version 22:  * JGF: added an option '-m FILE' to hold a CSV speaker map file.
+#
+# version 21:  * JGF: added a flag '-n' to not remove the directory paths from the source
+#                files in the UEM file.
+#
+# version 20:  * change metadata discard rule:  rather than discard if the midpoint
+#                (or endpoint) of the metadata object lies in a no-eval zone, discard
+#                if there is ANY overlap whatsoever between the metadata object and
+#                a no-eval zone.  This holds for system output objects only if the
+#                system output metadata object is not mapped to a ref object.
+#              * optimize IP and SU mapping by giving a secondary bonus mapping score
+#                to candidate ref-sys MD map pairs if the end-words of both coincide.
+#
+# version 19:  * bug fix in subroutine speakers_match
+#              * bug fix in tag_ref_words_with_metadata_info
+#
+# version 18:  * cosmetic fix to error message in eval_condition
+#              * added conditional output options for word coverage performance
+#              * added secondary MD word coverage optimization to word alignment
+#              * further optimize word alignment by considering MD subtypes
+#              * further optimize MD alignment by considering MD subtypes
+#              * add a new SU discard rule:  discard if TEND in no-eval zone
+#              * enforce legal values for su_extent_limit
+#
+# version 17:  create_speaker_segs modified to accommodate the same speaker
+#              having multiple overlapping speaker segments.  (This is an
+#              error and pathological condition, but the system must either
+#              disallow (abort on) the condition, or perform properly under
+#              the pathological condition.  The second option is chosen.)
+#
+# version 16:  * If neither -w nor -W is specified, suppress warnings about
+#                ref SPEAKER records subsuming no lexemes.
+#              * Output the overall speaker diarization stats after the
+#                stats for the individual files
+#              * Do not alter the case of alphabetic characters in the filename
+#                field from the ref rttm file
+#              * Made the format of the overall speaker error line more similar to
+#                the corresponding line of output from SpkrSegEval, to facilitate
+#                use of existing "grep" commands in existing scripts.
+#
+# version 15:  * bug fix in create_speaker_segs to accommodate
+#                contiguous same-speaker segments
+#              * added conditional file/channel scoring to
+#                speaker diarization evaluation
+#
+# version 14:  bug fix in md_score
+#
+# version 13:  add DISCOURSE_RESPONSE as a FILLER subtype
+#
+# version 12:  make REF LEXEMES optional if they aren't required
+#
+# version 11:  change default for noscore MD regions
+#
+# version 10:  bug fix
+#
+# version 09:
+#    * avoid crash when metadata discard yields no metadata
+#    * make evaluated ref_wds sensitive to metadata type
+#    * defer discarding of system output metadata until after
+#      metadata mapping, then discard only unmapped events.
+#    * extend 1-speaker scoring inhibition to metadata
+#    * eliminate demand for SPKR-INFO subtype for speakers
+#    * correct ref count of IP and SU exact boundary words
+#    * add official RT-04F scores
+#    * add conditional analyses for file/chnl/spkr/gender
+#
+# version 08:
+#    * bug fixes speaker diarization scoring
+#      - count of EVAL_WORDS corrected
+#      - no-score extended to nearest SPEAKER boundary
+#
+# version 07:
+#    * warning issued when discarding metadata events
+#      that cover LEXEMEs in the evaluation region
+#
+# version 06:
+#    * eliminated unused speakers from speaker scoring
+#    * changed discard algorithm for unannotated SU's and
+#      complex EDIT's to discard sys SU's and EDIT's when
+#      their midpoints overlap (rather than ANY overlap).
+#    * fixed display_metadata_mapping
+#
+# version 05:
+#    * upgraded display_metadata_mapping
+#
+# version 04:
+#    * diagnostic metadata mapping output added
+#    * uem_from_rttm bug fix
+#
+# version 03:
+#    * adjusted times used for speaker diarization
+#    * changed usage of max_extend to agree with cookbook
+#
+# version 02: speaker diarization evaluation added
+#
+# version 01: a merged version of df-eval-v14 and su-eval-v16
+#
+#################################
+
+#global data
+my $epsilon = 1E-8;
+my $miss_name = "  MISS";
+my $fa_name = "  FALSE ALARM";
+my %rttm_datatypes = (SEGMENT        => {eval => 1, "<na>" => 1},
+		      NOSCORE        => {"<na>" => 1},
+		      NO_RT_METADATA => {"<na>" => 1},
+		      LEXEME         => {lex => 1, fp => 1, frag => 1, "un-lex" => 1,
+					 "for-lex" => 1, alpha => 1, acronym => 1,
+					 interjection => 1, propernoun => 1, other => 1},
+		      "NON-LEX"      => {laugh => 1, breath => 1, lipsmack => 1,
+					 cough => 1, sneeze => 1, other => 1},
+		      "NON-SPEECH"   => {noise => 1, music => 1, other => 1},
+		      FILLER         => {filled_pause => 1, discourse_marker => 1,
+					 discourse_response => 1, explicit_editing_term => 1,
+					 other => 1},
+		      EDIT           => {repetition => 1, restart => 1, revision => 1,
+					 simple => 1, complex => 1, other => 1},
+		      IP             => {edit => 1, filler => 1, "edit&filler" => 1,
+					 other => 1},
+		      SU             => {statement => 1, backchannel => 1, question => 1,
+					 incomplete => 1, unannotated => 1, other => 1},
+		      CB             => {coordinating => 1, clausal => 1, other => 1},
+		      "A/P"          => {"<na>" => 1},
+		      SPEAKER        => {"<na>" => 1},
+		      "SPKR-INFO"    => {adult_male => 1, adult_female => 1, child => 1, unknown => 1});
+my %md_subtypes = (FILLER => $rttm_datatypes{FILLER},
+		   EDIT   => $rttm_datatypes{EDIT},
+		   IP     => $rttm_datatypes{IP},
+		   SU     => $rttm_datatypes{SU});
+my %spkr_subtypes = (adult_male => 1, adult_female => 1, child => 1, unknown => 1);
+
+my $noeval_mds = {
+    DEFAULT => {
+	NOSCORE        => {"<na>" => 1},
+	NO_RT_METADATA => {"<na>" => 1},
+    },
+};
+my $noscore_mds = {
+    DEFAULT => {
+	NOSCORE        => {"<na>" => 1},
+	LEXEME         => {"un-lex" => 1},
+	SU             => {unannotated => 1},
+    },
+    MIN => {
+	NOSCORE        => {"<na>" => 1},
+	SU             => {unannotated => 1},
+    },
+    FRAG_UNLEX => {
+	NOSCORE        => {"<na>" => 1},
+	LEXEME         => {frag => 1, "un-lex" => 1},
+	SU             => {unannotated => 1},
+    },
+    FRAG => {
+	NOSCORE        => {"<na>" => 1},
+	LEXEME         => {frag => 1},
+	SU             => {unannotated => 1},
+    },
+    NONE => {
+    },
+};
+my $noeval_sds = {
+    DEFAULT => {
+	NOSCORE        => {"<na>" => 1},
+    },
+};
+my $noscore_sds = {
+    DEFAULT => {
+	NOSCORE        => {"<na>" => 1},
+	"NON-LEX"      => {laugh => 1, breath => 1, lipsmack => 1,
+			   cough => 1, sneeze => 1, other => 1},
+    },
+};
+
+my %speaker_map;
+
+my $default_extend = 0.50; #the maximum time (in seconds) to extend a no-score zone
+my $default_collar = 0.00; #the no-score collar (in +/- seconds) to attach to SPEAKER boundaries
+my $default_tgap = 1.00; #the max gap (in seconds) between matching ref/sys words
+my $default_Tgap = 1.00; #the max gap (in seconds) between matching ref/sys metadata events
+my $default_Wgap = 0.10; #the max gap (in words) between matching ref/sys metadata events
+my $default_su_time_limit = 0.50; #the max extent (in seconds) to match for SU's
+my $default_su_word_limit = 2.00; #the max extent (in words) to match for SU's
+my $default_word_delta_score = 10.0; #the max delta score for word-based DP alignment of ref/sys words
+my $default_time_delta_score = 1.00; #the max delta score for time-based DP alignment of ref/sys words
+
+my $usage = "\n\nUsage: $0 [-h] -r <ref-file> -s <sys-file>\n\n".
+    "Description:  md-eval evaluates EARS metadata detection performance\n".
+    "      by comparing system metadata output data with reference data\n".
+    "INPUT:\n".
+    "  -R <ref-list>  A file containing a list of the reference metadata files\n".
+    "       being evaluated, in RTTM format.  If the word-mediated alignment\n".
+    "       option is used then this data must include reference STT data\n".
+    "       in addition to the metadata being evaluated.\n".
+    "  OR\n".
+    "  -r <ref-file>  A file containing reference metadata, in RTTM format\n\n".
+    "  -S <sys-list>  A file containing a list of the system output metadata\n".
+    "       files to be evaluated, in RTTM format.  If the word-mediated\n".
+    "       alignment option is used then this data must include system STT\n".
+    "       output data in addition to the metadata to be evaluated.\n".
+    "  OR\n".
+    "  -s <sys-file>  A file containing system output metadata, in RTTM format\n\n".
+    "  input options:\n".
+    "    -x to include complex edits in the analysis and scoring.\n".
+    "    -w for word-mediated alignment.\n".
+    "       * The default (time-mediated) alignment aligns ref and sys metadata\n".
+    "         according to the time overlap of the original ref and sys metadata\n".
+    "         time intervals.\n".
+    "       * Word-mediated alignment aligns ref and sys metadata according to\n".
+    "         the alignment of the words that are subsumed within the metadata\n".
+    "         time intervals.\n".
+    "    -W for word-optimized mapping.\n".
+    "       * The default (time-optimized) mapping maps ref and sys metadata\n".
+    "         so as to maximize the time overlap of mapped metadata events.\n".
+    "       * Word-optimized mapping maps ref and sys metadata so as to\n".
+    "         maximize the overlap in terms of the number of reference words\n".
+    "         that are subsumed within the overlapping time interval.\n".
+    "    -a <cfgs> Conditional analysis options for metadata detection performance:\n".
+    "         c for performance versus channel,\n".
+    "         f for performance versus file,\n".
+    "         g for performance versus gender, and\n".
+    "         s for performance versus speaker.\n".
+    "    -A <cf> Conditional analysis options for word coverage performance:\n".
+    "         c for performance versus channel, and\n".
+    "         f for performance versus file.\n".
+    "    -t <time gap> The maximum time gap allowed between matching reference\n".
+    "       and system output words (in seconds).  Default value is $default_tgap.\n".
+    "    -T <time gap> The maximum time gap allowed between matching reference\n".
+    "       and system output metadata (in seconds).  Default value is $default_Tgap.\n".
+    "    -l <SU extent limit>  The maximum SU extent used to compute overlap\n".
+    "       between reference and system output SU's.  For time-optimized SU\n".
+    "       mapping this is the maximum time extent.  For word-optimized SU\n".
+    "       mapping (using the -W option) this is the maximum number of words.\n".
+    "       SU extent is limited to the last part of the SU. Default value is\n".
+    "       $default_su_time_limit for time-optimized mapping, $default_su_word_limit for word-optimized mapping.\n".
+    "    -u <uem-file> A file containing the evaluation partitions,\n".
+    "       in UEM format.\n".
+    "    -g <glm-file> A file containing word transformations used to\n".
+    "       standardize the representation of words.\n".
+    "    -o to include overlapping speech in MD evaluation.  With this option,\n".
+    "       separate recognition passes are made for each reference speaker.\n".
+    "    -c <collar> is the no-score zone around reference speaker segment\n".
+    "       boundaries.  (Speaker Diarization output is not evaluated within\n".
+    "       +/- collar seconds of a reference speaker segment boundary.)\n".
+    "       Default value is $default_collar seconds.\n".
+    "    -1 to limit scoring to those time regions in which only a single\n".
+    "       speaker is speaking\n".
+    "    -y <name> to select named no-eval conditions for metadata\n".
+    "    -Y <name> to select named no-score conditions for metadata\n".
+    "    -z <name> to select named no-eval conditions for speaker diarization\n".
+    "    -Z <name> to select named no-score conditions for speaker diarization\n".
+    "    -e to examine metadata mapping\n".
+    "    -d to print word alignment and error calculation details\n".
+    "    -D to print metadata event alignment and error calculation details\n".
+    "    -m to print speaker mapping details for speaker diarization\n".
+    "    -M FILE to print speaker mapping details for speaker diarization to a CSV file called 'FILE'\n".
+    "    -v to print the event sequence for each diarization source file\n".
+    "    -n to keep the directory names of the UEM source file entries\n".
+    "OUTPUT:\n".
+    "  Performance statistics are written to STDOUT.\n".
+    "\n";
+
+######
+# Intro
+my ($date, $time) = date_time_stamp();
+print "command line (run on $date at $time) Version: $version  ", $0, " ", join(" ", @ARGV), "\n";
+
+use vars qw ($opt_h $opt_w $opt_W $opt_d $opt_D $opt_R $opt_r $opt_S $opt_s $opt_l $opt_c $opt_x);
+use vars qw ($opt_t $opt_T $opt_g $opt_p $opt_P $opt_o $opt_a $opt_A $opt_u $opt_1 $opt_m $opt_v $opt_e);
+use vars qw ($opt_y $opt_Y $opt_z $opt_Z $opt_n $opt_M);
+$opt_y = $opt_Y = $opt_z = $opt_Z = "DEFAULT";
+use Getopt::Std;
+getopts ('nhdDwWox1mvec:R:r:S:s:t:T:g:p:P:a:A:u:l:y:Y:z:Z:M:');
+not defined $opt_h or die
+    "\n$usage";
+defined $opt_r or defined $opt_R or die
+    "\nCOMMAND LINE ERROR:  no reference data specified$usage";
+not defined $opt_r or not defined $opt_R or die
+    "\nCOMMAND LINE ERROR:  both reference file list and reference file specified$usage";
+defined $opt_s or defined $opt_S or die
+    "\nCOMMAND LINE ERROR:  no system output data specified$usage";
+not defined $opt_s or not defined $opt_S or die
+    "\nCOMMAND LINE ERROR:  both system output file list and system output file specified$usage";
+my $word_gap = defined $opt_t ? $opt_t : $default_tgap;
+my $md_gap = $opt_W ? $default_Wgap : (defined $opt_T ? $opt_T : $default_Tgap);
+my $su_extent_limit = defined $opt_l ? $opt_l :
+    ($opt_W ? $default_su_word_limit : $default_su_time_limit);
+$opt_W ? ($su_extent_limit >= 1 or die "\nCOMMAND LINE ERROR:  SU extent limit must be at least 1 for word-based MD alignment$usage") :
+         ($su_extent_limit > 0 or die "\nCOMMAND LINE ERROR:  SU extent limit must be positive for time-based MD alignment$usage");
+my $max_wd_delta_score = $opt_w ? $default_word_delta_score : $default_time_delta_score;
+$max_wd_delta_score = $opt_p if defined $opt_p;
+my $max_md_delta_score = $opt_W ? $default_word_delta_score : $default_time_delta_score;
+$max_md_delta_score = $opt_P if defined $opt_P;
+my $collar = defined($opt_c) ? $opt_c : $default_collar;
+$collar >= 0 or die
+    "\nCOMMAND LINE ERROR:  Speaker Diarization scoring collar ('$collar') must be non-negative$usage";
+my $max_extend = $default_extend;
+$opt_a = "" unless defined $opt_a;
+$opt_A = "" unless defined $opt_A;
+start_speaker_map_file($opt_M) if $opt_M;
+
+my $noeval_md = eval_condition ($opt_y, $noeval_mds, "no-eval", "metadata");
+my $noscore_md = eval_condition ($opt_Y, $noscore_mds, "no-score", "metadata");
+my $noeval_sd = eval_condition ($opt_z, $noeval_sds, "no-eval", "speaker diarization");
+my $noscore_sd = eval_condition ($opt_Z, $noscore_sds, "no-score", "speaker diarization");
+
+my %type_order = (NOSCORE        => 0,
+		  NO_RT_METADATA => 1,
+		  SEGMENT        => 2,
+		  SPEAKER        => 3,
+		  SU             => 4,
+		  "A/P"          => 5,
+		  "NON-SPEECH"   => 6,
+		  EDIT           => 7,
+		  FILLER         => 8,
+		  IP             => 9,
+		  CB             => 10,
+		  "NON-LEX"      => 11,
+		  LEXEME         => 12);
+my %event_order = (END => 0,
+		   MID => 1,
+		   BEG => 2);
+my %source_order = (REF => 0,
+		    SYS => 1);
+
+{
+    my (%ref, %sys, $glm, $uem);
+
+    print_parameters ();
+    ($glm) = get_glm_data ($opt_g);
+    get_rttm_file (\%ref, $opt_r, $glm);
+    get_rttm_data (\%ref, $opt_R, $glm);
+    get_rttm_file (\%sys, $opt_s, $glm);
+    get_rttm_data (\%sys, $opt_S, $glm);
+
+    ($uem) = get_uem_data ($opt_u, $opt_n);
+    evaluate (\%ref, \%sys, $uem);
+}
+
+exit 0;
+
+#################################
+
+sub eval_condition {
+
+    my ($name, $conditions, $exclusion, $evaluation) = @_;
+
+    $name = "DEFAULT" unless $name;
+    return $conditions->{$name} if defined $conditions->{$name};
+    print STDERR "\nCOMMAND LINE ERROR:  unknown name ($name) of $exclusion conditions for $evaluation\n".
+	"    available $exclusion conditions for $evaluation are:\n\n";
+    foreach $name (sort keys %$conditions) {
+	printf STDERR "%-24stype    subtype\n", "    for \"$name\":";
+	foreach my $type (sort keys %{$conditions->{$name}}) {
+	    foreach my $subt (sort keys %{$conditions->{$name}{$type}}) {
+		printf STDERR "%28s    %s\n", $type, $subt;
+	    }
+	}
+	print STDERR "\n";
+    }
+    die "$usage";
+}    
+
+#################################
+
+sub print_parameters {
+
+    print $opt_w ? "\nWord-based metadata alignment, max gap between matching words = $word_gap sec\n" :
+	"\nTime-based metadata alignment\n";
+    print "\nMetadata evaluation parameters:\n";
+    $opt_W ? (print "    word-optimized metadata mapping\n".
+	            "        max gap between matching metadata events = $md_gap words\n".
+	            "        max extent to match for SU's = $su_extent_limit words\n") :
+	     (print "    time-optimized metadata mapping\n".
+		    "        max gap between matching metadata events = $md_gap sec\n".
+		    "        max extent to match for SU's = $su_extent_limit sec\n");
+    print "\nSpeaker Diarization evaluation parameters:\n".
+	  "    The max time to extend no-score zones for NON-LEX exclusions is $max_extend sec\n".
+	  "    The no-score collar at SPEAKER boundaries is $collar sec\n";
+    printf "\nExclusion zones for evaluation and scoring are:\n".
+	   "                             -----MetaData-----        -----SpkrData-----\n".
+	   "     exclusion set name:%12s%11s%15s%11s\n".
+	   "     token type/subtype      no-eval   no-score        no-eval   no-score\n",
+	       $opt_y, $opt_Y, $opt_z, $opt_Z;
+    print  "             (UEM)              X                         X\n";
+    foreach my $type (sort keys %rttm_datatypes) {
+	foreach my $subt (sort keys %{$rttm_datatypes{$type}}) {
+	    next unless ($noeval_md->{$type}{$subt} or
+			 $noscore_md->{$type}{$subt} or
+			 $noeval_sd->{$type}{$subt} or
+			 $noscore_sd->{$type}{$subt});
+	    printf "%15s/%-14s", $type, $subt;
+	    printf "%3s", $noeval_md->{$type}{$subt} ? "X" : "";
+	    printf "%10s", $noscore_md->{$type}{$subt} ? "X" : "";
+	    printf "%16s", $noeval_sd->{$type}{$subt} ? "X" : "";
+	    printf "%10s\n", $noscore_sd->{$type}{$subt} ? "X" : "";
+	}
+    }
+}
+
+#################################
+
+sub get_glm_data {
+
+    my ($file) = @_;
+    my ($record, @fields, $word, %words, %data);
+
+    return unless defined $file;
+    open DATA, $file or die
+	"\nCOMMAND LINE ERROR:  unable to open glm file '$file'$usage";
+    while ($record = <DATA>) {
+	next if $record =~ /^\s*$/;
+	next if $record =~ /^\s*(\[|\*|\%|\;)/;
+        @fields = split /\s+=>\s+/, lc $record;
+	shift @fields if $fields[0] eq "";
+        next unless @fields > 1;
+	$fields[0] =~ s/^\s+//;
+        $fields[1] =~ s/[^a-z-'_ \.].*//;
+        next if $fields[0] =~ /^\s*$/ or $fields[1] =~ /^\s*$/;
+        $data{$fields[0]} = [split /\s+/, $fields[1]];
+    }
+    close DATA;
+    return {%data};
+}
+
+#################################
+
+sub get_uem_data {
+
+    my ($file, $keepDirectoryPath) = @_;
+    my ($record, @fields, $seg, $chnl, %data);
+
+    return unless defined $file;
+    open DATA, $file or die
+	"\nCOMMAND LINE ERROR:  unable to open uem file '$file'$usage";
+    while ($record = <DATA>) {
+	next if $record =~ /^\s*[\#;]|^\s*$/;
+	@fields = split /\s+/, $record;
+	shift @fields if $fields[0] eq "";
+	@fields >= 4 or die
+	    ("\n\nFATAL ERROR:  insufficient number of fields in UEM record\n".
+	     "    record is: '$record'\n\n");
+	undef $seg;
+	$seg->{FILE} = shift @fields;
+	$seg->{CHNL} = lc shift @fields;
+	$seg->{TBEG} = lc shift @fields;
+	$seg->{TEND} = lc shift @fields;
+        $seg->{FILE} =~ s/.*\/// if (! $keepDirectoryPath);      #strip directory 
+        $seg->{FILE} =~ s/\.[^.]*//;   #strip file type
+        $seg->{TBEG} =~ s/[^0-9\.]//g; #strip non-numeric (commas)
+        $seg->{TEND} =~ s/[^0-9\.]//g; #strip non-numeric (commas)
+	push @{$data{$seg->{FILE}}{$seg->{CHNL}}}, $seg;
+    }
+    close DATA;
+
+#sort and check data
+    foreach $file (keys %data) {
+	foreach $chnl (keys %{$data{$file}}) {
+	    @{$data{$file}{$chnl}} =
+		sort {$a->{TBEG} <=> $b->{TBEG}} @{$data{$file}{$chnl}};
+	    my $prev_seg;
+	    foreach $seg (@{$data{$file}{$chnl}}) {
+		$seg->{TEND} > $seg->{TBEG} or die
+		    "\n\nFATAL ERROR:  non-positive evaluation segment length in UEM data for file $file, channel $chnl\n\n";
+		not defined $prev_seg or $seg->{TBEG} >= $prev_seg->{TEND} or die
+		    ("\n\nFATAL ERROR:  UEM file has overlapping evaluation segments\n".
+		     "    file $file, channel $chnl:  ($prev_seg->{TBEG},$prev_seg->{TEND}),".
+		     " ($seg->{TBEG},$seg->{TEND})\n\n");
+		$prev_seg = $seg;
+	    }
+	}
+    }
+    return {%data};
+}
+
+#################################
+
+sub get_rttm_data {
+
+    my ($data, $list, $glm) = @_;
+
+    return unless defined $list;
+    open LIST, $list or die
+	"\nCOMMAND LINE ERROR:  unable to open file list '$list'$usage";
+    while (my $file = <LIST>) {
+	get_rttm_file ($data, $file, $glm);
+    }
+    close LIST;
+}    
+
+#################################
+
+sub get_rttm_file {
+
+    my ($data, $rttm_file, $glm) = @_;
+    my ($record, @fields, $data_type, $file, $chnl, $word, @words, $token);
+
+    return unless defined $rttm_file;
+    open DATA, $rttm_file or die
+	"\nCOMMAND LINE ERROR:  unable to open RTTM file '$rttm_file'$usage";
+    while ($record = <DATA>) {
+	next if $record =~ /^\s*[\#;]|^\s*$/;
+	@fields = split /\s+/, $record;
+	shift @fields if $fields[0] eq "";
+	@fields >= 9 or die
+	    ("\n\nFATAL ERROR:  insufficient number of fields in RTTM file '$rttm_file'\n".
+	     "    input RTTM record is: '$record'\n\n");
+	$data_type = uc shift @fields;
+	undef $token;
+	$token->{TYPE} = $data_type;
+	$token->{FILE} = $file = shift @fields;
+	$token->{CHNL} = $chnl = lc shift @fields;
+	$token->{TBEG} = lc shift @fields;
+	$token->{TBEG} =~ s/\*//;
+	$token->{TDUR} = lc shift @fields;
+	$token->{TDUR} =~ s/\*//;
+	$token->{TDUR} = 0 if $token->{TDUR} eq "<na>";
+	$token->{TDUR} >= 0 or die
+	    ("\n\nFATAL ERROR:  negative metadata duration in file '$file'\n".
+	     "    input RTTM record is: '$record'\n\n");
+	$token->{WORD} = lc shift @fields;
+	$token->{SUBT} = lc shift @fields;
+	$rttm_datatypes{$token->{TYPE}}{$token->{SUBT}} or die
+	    ("\n\nFATAL ERROR:  unknown RTTM data type/subtype ('$token->{TYPE}'/'$token->{SUBT}') in file $rttm_file\n".
+	     "    input RTTM record is: '$record'\n\n");
+	$token->{SPKR} = shift @fields;
+	$token->{CONF} = lc shift @fields;
+	$token->{CONF} = "-"    unless defined $token->{CONF};
+	$token->{SPKR} = "<na>" unless defined $token->{SPKR};
+	if ($data_type eq "SPKR-INFO") {
+	    not defined $data->{$file}{$chnl}{$data_type}{$token->{SPKR}} or die
+		("\n\nFATAL ERROR:  multiple $data_type records for speaker $token->{SPKR} in file $file\n".
+		 "    input RTTM record is: '$record'\n\n");
+	    defined $spkr_subtypes{$token->{SUBT}} or die
+		("\n\nFATAL ERROR:  unknown $data_type subtype ($token->{SUBT}) in file '$file'\n".
+		 "    input RTTM record is: '$record'\n\n");
+	    $data->{$file}{$chnl}{$data_type}{$token->{SPKR}}{GENDER} = $token->{SUBT};
+	}
+	else {
+	    $token->{TEND} = $token->{TBEG}+$token->{TDUR};
+	    $token->{TMID} = $token->{TBEG}+$token->{TDUR}/2;
+	}
+
+	if ($data_type eq "LEXEME") {
+	    $token->{WTYP} = ($token->{SUBT} =~ /^fp$/         ? "fp"      :
+			      ($token->{SUBT} =~ /^frag$/      ? "frag"    :
+			       ($token->{SUBT} =~ /^un-lex$/   ? "un-lex"  :
+				($token->{SUBT} =~ /^for-lex$/ ? "for-lex" : "lex"))));
+	    @words = standardize_word ($token, $glm);
+	    foreach $word (@words) {
+		push @{$data->{$file}{$chnl}{LEXEME}}, $word;
+		push @{$data->{$file}{$chnl}{RTTM}}, $word;
+	    }
+	}
+	elsif ($data_type eq "SPEAKER") {
+	    push @{$data->{$file}{$chnl}{SPEAKER}{$token->{SPKR}}}, $token;
+	    push @{$data->{$file}{$chnl}{RTTM}}, $token;
+	}
+	elsif ($md_subtypes{$token->{TYPE}}) {
+	    defined $md_subtypes{$token->{TYPE}}{$token->{SUBT}} or die
+		("\n\nFATAL ERROR:  unknown $data_type subtype ($token->{SUBT}) in file '$file'\n".
+		 "    input RTTM record is: '$record'\n\n");
+	    push @{$data->{$file}{$chnl}{$data_type}}, $token;
+	    push @{$data->{$file}{$chnl}{RTTM}}, $token;
+	}
+	elsif ($data_type ne "SPKR-INFO") {
+	    push @{$data->{$file}{$chnl}{RTTM}}, $token;
+	}
+    }
+    close DATA;
+
+#sort and check data
+    foreach $file (keys %$data) {
+	foreach $chnl (keys %{$data->{$file}}) {
+	    foreach $data_type (keys %{$data->{$file}{$chnl}}) {
+		next if $data_type eq "SPKR-INFO";
+		if ($data_type eq "SPEAKER") {
+		    foreach my $spkr (keys %{$data->{$file}{$chnl}{$data_type}}) {
+			my $gender = $data->{$file}{$chnl}{"SPKR-INFO"}{$spkr}{GENDER};
+			$gender = $data->{$file}{$chnl}{"SPKR-INFO"}{$spkr}{GENDER} = "unknown" if not $gender;
+			@{$data->{$file}{$chnl}{$data_type}{$spkr}} =
+			    sort {$a->{TMID}<=>$b->{TMID}} @{$data->{$file}{$chnl}{$data_type}{$spkr}};
+			my $prev_token;
+			foreach $token (@{$data->{$file}{$chnl}{$data_type}{$spkr}}) {
+			    $token->{SUBT} = $gender;
+			    next unless $prev_token;
+			    not $prev_token or $token->{TBEG} >= $prev_token->{TEND}-$epsilon or die
+				("\n\nFATAL ERROR:  RTTM file has overlapping $data_type tokens for speaker $spkr\n".
+				 "    in file $file, channel $chnl:  ($prev_token->{TBEG},$prev_token->{TEND}),".
+				 " ($token->{TBEG},$token->{TEND})\n\n");
+			    $prev_token = $token;
+			}
+		    }
+		}
+		else {
+		    @{$data->{$file}{$chnl}{$data_type}} =
+			sort {$a->{TMID} <=> $b->{TMID}} @{$data->{$file}{$chnl}{$data_type}};
+		}
+	    }
+	}
+    }
+}
+
+#################################
+
+sub evaluate {
+
+    my ($ref_data, $sys_data, $uem_data) = @_;
+    my ($uem, $uem_sd_eval, $uem_sd_score, $uem_md_eval, $uem_md_score);
+    my ($ref_wds, $sys_wds, $ref_mds, $sys_mds, $type, %scores, $ref_rttm, $sys_rttm);
+
+    foreach my $file (sort keys %$ref_data) {
+	foreach my $chnl (sort keys %{$ref_data->{$file}}) {
+	    $ref_rttm = $ref_data->{$file}{$chnl}{RTTM};
+	    $sys_rttm = $sys_data->{$file}{$chnl}{RTTM};
+	    $ref_wds = $ref_data->{$file}{$chnl}{LEXEME} ? $ref_data->{$file}{$chnl}{LEXEME} : [];
+	    $sys_wds = $sys_data->{$file}{$chnl}{LEXEME} ? $sys_data->{$file}{$chnl}{LEXEME} : [];
+	    $uem = $uem_data->{$file}{$chnl};
+	    $uem = uem_from_rttm ($ref_rttm) if not defined $uem;
+	    @$ref_wds > 0 or not $opt_w or die
+		"\n\nFATAL ERROR:  no reference words for file '$file' and channel '$chnl'\n\n";
+	    @$sys_wds > 0 or not $opt_w or die
+		"\n\nFATAL ERROR:  no system output words for file '$file' and channel '$chnl'\n".
+		    "              Words are required for word-mediated alignment\n\n";
+	    if ($ref_wds and ($opt_w or $opt_e)) {
+	        tag_words_with_metadata_attributes ($ref_rttm, $ref_wds);
+	        tag_words_with_metadata_attributes ($sys_rttm, $sys_wds);
+	        perform_word_alignment ($file, $chnl, $ref_wds, $sys_wds, $uem);
+	    }
+	    $uem_md_eval = add_exclusion_zones_to_uem ($noeval_md, $uem, $ref_rttm);
+	    $uem_md_score = add_exclusion_zones_to_uem ($noscore_md, $uem_md_eval, $ref_rttm);
+	    $uem_md_score = exclude_overlapping_speech_from_uem ($uem_md_score, $ref_rttm) if $opt_1;
+ 	    tag_scoreable_words ($ref_wds, $uem_md_score);
+	    foreach $type (sort keys %md_subtypes) {
+		$ref_mds = $ref_data->{$file}{$chnl}{$type};
+		next unless defined $ref_mds;
+		@$ref_wds > 0 or die
+		    "\n\nFATAL ERROR:  no reference words for file '$file' and channel '$chnl'\n\n";
+		$sys_mds = $sys_data->{$file}{$chnl}{$type};
+		$sys_mds = $sys_data->{$file}{$chnl}{$type} = [] unless defined $sys_mds;
+		map_metadata_to_words ($sys_mds, $sys_wds, $ref_mds, $ref_wds);
+		discard_unevaluated_metadata ($uem_md_eval, $type, $ref_mds, $ref_wds, "REF");
+		next if @$ref_mds == 0;
+		align_data ($ref_mds, $sys_mds, "", \&md_score, $max_md_delta_score);
+		trace_best_path ($ref_mds, $sys_mds);
+		discard_metadata_subtype ("EDIT", "complex", $ref_mds, $sys_mds) if $type eq "EDIT" and $opt_x;
+		discard_metadata_subtype ("SU", "unannotated", $ref_mds, $sys_mds) if $type eq "SU";
+		discard_unevaluated_metadata ($uem_md_eval, $type, $sys_mds, $ref_wds, "SYS");
+		($scores{$type}{$file}{$chnl}) = score_metadata_path
+		    ($type, $file, $chnl, $ref_mds, $sys_mds, $ref_wds);
+	    }
+
+	    $ref_mds = $ref_data->{$file}{$chnl}{SPEAKER};
+	    if (defined $ref_mds) {
+		@$ref_wds > 0 or not $opt_W or die
+		    "\n\nFATAL ERROR:  no reference words for file '$file' and channel '$chnl'\n\n";
+		$uem_sd_eval = add_exclusion_zones_to_uem ($noeval_sd, $uem, $ref_rttm);
+		$sys_mds = $sys_data->{$file}{$chnl}{SPEAKER};
+		$sys_mds = $sys_data->{$file}{$chnl}{SPEAKER} = {} unless defined $sys_mds;
+		map_spkrdata_to_words ($sys_mds, $sys_wds, $ref_mds, $ref_wds);
+		($scores{SPEAKER}{$file}{$chnl}) = score_speaker_diarization
+		    ($file, $chnl, $ref_mds, $sys_mds, $ref_wds, $uem_sd_eval, $ref_rttm);
+	    }
+
+	    if ($opt_e) {
+		discard_unevaluated_metadata ($uem, "LEXEME", $ref_rttm);
+		discard_unevaluated_metadata ($uem, "LEXEME", $sys_rttm);
+		discard_unevaluated_metadata ($uem_md_eval, "", $ref_rttm);
+		discard_metadata_subtype ("EDIT", "complex", $ref_rttm, $sys_rttm) if $opt_x;
+		discard_metadata_subtype ("SU", "unannotated", $ref_rttm, $sys_rttm);
+		discard_unevaluated_metadata ($uem_md_eval, "", $sys_rttm);
+		display_metadata_mapping ($file, $chnl, $ref_rttm, $sys_rttm, $ref_wds);
+	    }
+	}
+    }
+
+    foreach $type (sort keys %md_subtypes) {
+	md_performance_analysis ($type, $scores{$type}, $md_subtypes{$type}, $ref_data)
+	    if $scores{$type};
+    }
+    sd_performance_analysis ($scores{SPEAKER}, \%spkr_subtypes)
+	if $scores{SPEAKER};
+}
+
+#################################
+
+sub perform_word_alignment {
+
+    my ($file, $chnl, $ref_wds, $sys_wds, $uem) = @_;
+
+    my @ref_wds = @$ref_wds;
+    my @sys_wds = @$sys_wds;
+    discard_unevaluated_words ($uem, \@ref_wds);
+    discard_unevaluated_words ($uem, \@sys_wds);
+    @ref_wds > 0 or die
+	"\n\nFATAL ERROR:  no reference words in UEM portion of file '$file' and channel '$chnl'\n\n";
+    @sys_wds > 0 or not $opt_w or die
+	"\n\nFATAL ERROR:  no system output words in UEM portion of file '$file' and channel '$chnl'\n".
+	    "              Words are required for word-mediated alignment\n\n";
+    return unless @sys_wds > 0;
+
+    if ($opt_o) {
+	foreach my $spkr (word_kinds ($ref_wds, "SPKR")) {
+	    align_data ($ref_wds, $sys_wds, $spkr, \&word_score, $max_wd_delta_score);
+	    trace_best_path ($ref_wds, $sys_wds, $spkr);
+	}
+	decide_who_spoke_the_words ($ref_wds, $sys_wds);
+    }
+    else {
+	align_data ($ref_wds, $sys_wds, "", \&word_score, $max_wd_delta_score);
+	trace_best_path ($ref_wds, $sys_wds);
+    }
+
+#map system output word times to ref words
+    foreach my $wd (@$sys_wds) {
+	$wd->{RTBEG} = adjust_sys_time_to_ref ($wd->{TBEG}, $sys_wds);
+	$wd->{RTEND} = adjust_sys_time_to_ref ($wd->{TEND}, $sys_wds);
+	$wd->{RTDUR} = $wd->{RTEND} - $wd->{RTBEG};
+	$wd->{RTMID} = $wd->{RTBEG} + $wd->{RTDUR}/2;
+    }
+    score_word_path ($file, $chnl, $ref_wds, $sys_wds) if $opt_d;
+}
+
+################################
+
+sub time_in_eval_partition {
+
+    my ($time, $uem_eval) = @_;
+
+    return 1 unless defined $uem_eval; #not using UEM partition specification
+    foreach my $partition (@$uem_eval) {
+	return 1 if event_covers_time ($partition, $time);
+    }
+    return 0;
+}
+
+#################################
+
+sub discard_unevaluated_words {
+
+    my ($uem, $wds) = @_;
+
+    for (my $index=0; $index<@$wds; $index++) {
+	splice (@$wds, $index--, 1)
+	    if ($wds->[$index]{TYPE} eq "LEXEME" and
+		not time_in_eval_partition ($wds->[$index]{TMID}, $uem));
+    }
+}
+
+#################################
+
+sub discard_unevaluated_metadata {
+
+    my ($uem_eval, $type, $mds, $ref_wds, $src) = @_;
+
+    for (my $index=0; $index<@$mds; $index++) {
+	my $md = $mds->[$index];
+	next if (($type and $md->{TYPE} ne $type) or
+		 (not $type and not $md_subtypes{$md->{TYPE}}) or
+		 $md->{MAPPTR} or
+		 md_in_uem ($md, $uem_eval));
+	warn_if_discarded_md_covers_scored_lexemes ($md, $ref_wds, $uem_eval, $src) if $ref_wds;
+	splice (@$mds, $index--, 1);
+    }
+}
+
+#################################
+
+sub warn_if_discarded_md_covers_scored_lexemes {
+
+    my ($md, $ref_wds, $uem, $source) = @_;
+    my ($wbeg, $wend, $index);
+    
+    ($wbeg, $wend) = md_word_indices ($md, $ref_wds);
+
+    for ($index=$wbeg; $index<=$wend; $index++) {
+	next unless ($ref_wds->[$index]{SCOREABLE} and
+		     time_in_eval_partition ($ref_wds->[$index]{TMID}, $uem));
+	warn "\nWARNING:  A $source metadata event is being deleted that covers evaluated reference LEXEMEs\n".
+	    "    (type=$md->{TYPE}, subtype=$md->{SUBT}, spkr=$md->{SPKR}, TBEG=$md->{TBEG}, TEND=$md->{TEND})\n";
+	last;
+    }	
+}
+
+#################################
+
+sub discard_metadata_subtype {
+
+    my ($type, $subtype, $ref_mds, $sys_mds) = @_;
+    my ($iref, $isys, $ref_md, $sys_md);
+
+#discard all sys $type events that map to a ref event with subtype = $subtype
+#or that are unmapped and have midpoints that lie within a ref event with subtype = $subtype
+    for ($iref=0; $iref<@$ref_mds; $iref++) {
+	$ref_md = $ref_mds->[$iref];
+	next unless ($ref_md->{TYPE} eq $type and
+		     $ref_md->{SUBT} eq $subtype);
+	for ($isys=0; $isys<@$sys_mds; $isys++) {
+	    $sys_md = $sys_mds->[$isys];
+	    splice (@$sys_mds, $isys--, 1)
+		if ($sys_md->{TYPE} eq $type and
+		    (($sys_md->{MAPPTR} and $sys_md->{MAPPTR}{SUBT} eq $subtype) or
+		     (not $sys_md->{MAPPTR} and event_covers_time ($ref_md, $sys_md->{RTMID}))));
+	}
+
+#discard all ref $type/$subtype events
+	splice (@$ref_mds, $iref--, 1);
+    }
+}
+
+#################################
+
+sub tag_scoreable_words {
+
+    my ($wds, $uem_eval) = @_;
+
+    foreach my $wd (@$wds) {
+	$wd->{SCOREABLE} = time_in_eval_partition ($wd->{TMID}, $uem_eval);
+    }
+}
+
+#################################
+
+sub tag_words_with_metadata_attributes {
+
+    my ($mds, $wds) = @_;
+    my ($md, $iwbeg, $iwend, $iw, $wd, $type);
+
+    foreach $md (@$mds) {
+	$type = $md->{TYPE};
+	next unless $type =~ /^(FILLER|EDIT|SU|IP)$/;
+	($iwbeg, $iwend) = md_word_indices ($md, $wds);
+	if ($type =~ /^(FILLER|EDIT)$/) {
+	    for ($iw=$iwbeg; $iw<=$iwend; $iw++) {
+		$wds->[$iw]{ATTRIBUTES}{$md->{TYPE}} = $md->{SUBT};
+	    }
+	}
+	elsif ($type =~ /^(SU|IP)$/) {
+	    $wds->[$iwend]{ATTRIBUTES}{$md->{TYPE}} = $md->{SUBT};
+	}
+    }
+    return;
+}
+
+#################################
+
+sub tag_ref_words_with_metadata_info {
+
+    my ($mds, $wds, $src) = @_;
+    my ($md, $iwbeg, $iwend, $iw, $type);
+
+    foreach $md (@$mds) {
+	$type = $md->{TYPE};
+	($iwbeg, $iwend) = $src eq "REF" ?
+	    ($md->{WBEG}, $md->{WEND}) : ($md->{RWBEG}, $md->{RWEND}) ;
+	if ($type =~ /^(FILLER|EDIT)$/) {
+	    for ($iw=max($iwbeg,0); $iw<=min($iwend,@$wds-1); $iw++) {
+		$wds->[$iw]{"$src-$type"}{$md->{SUBT}}{MAP}++;
+	    }
+	}
+	elsif ($type =~ /^(SU|IP)$/) {
+	    $iwend = min(max($iwend,0),@$wds-1);
+	    $wds->[$iwend]{"$src-$type"}{$md->{SUBT}}{defined $md->{MAPPTR} ? "MAP" : "NOT"}++;
+	}
+    }
+    return;
+}
+
+#################################
+
+sub md_performance_analysis {
+
+    my ($metadata_type, $counts, $subtypes, $ref_data) = @_;
+    my ($file, $chnl, $spkr, $word, $type, $type_counts, $key);
+    my (@files, @chnls, @spkrs, @types, %nevent, %nwerr);
+    my ($subtype, $sys_subtype, %nconf, %offsets);
+
+#compute marginal counts
+    @files = keys %$counts;
+    foreach $file (@files) {
+	@chnls = keys %{$counts->{$file}};
+	foreach $chnl (@chnls) {
+	    $type_counts = $counts->{$file}{$chnl};
+	    foreach $type ("REF", "DEL", "INS", "SUB") {
+	        next unless defined $type_counts->{WORDS}{$type};
+		$nwerr{ALL}{$type} += $type_counts->{WORDS}{$type};
+		$nwerr{"c=$chnl f=$file"}{$type} += $type_counts->{WORDS}{$type} if $opt_A =~ /c/i and $opt_A =~ /f/i;
+		$nwerr{"c=$chnl"}{$type} += $type_counts->{WORDS}{$type} if $opt_A =~ /c/i and not $opt_A =~ /f/i;
+		$nwerr{"f=$file"}{$type} += $type_counts->{WORDS}{$type} if $opt_A =~ /f/i and not $opt_A =~ /c/i;
+	    }
+	    foreach $type ("WBEG", "WEND") {
+		foreach $key (keys %{$type_counts->{WORD_OFFSET}{$type}}) {
+		    $offsets{ALL}{$type}{$key} += $type_counts->{WORD_OFFSET}{$type}{$key};
+		}
+	    }
+	    my $spkr_info = $ref_data->{$file}{$chnl}{"SPKR-INFO"};
+	    $spkr_info->{unknown}{GENDER} = "unknown" unless defined $spkr_info->{unknown};
+	    foreach $type (keys %$type_counts) {
+		next unless $type =~ /^(REF|DEL|INS|SUB|CONFUSION)$/;
+		@spkrs = keys %{$type_counts->{$type}};
+		foreach $spkr (@spkrs) {
+		    my $gndr = $spkr_info->{$spkr}{GENDER};
+		    foreach $subtype (keys %$subtypes) {
+			my $count = $type_counts->{$type}{$spkr}{$subtype};
+			next unless $count;
+			if ($type eq "CONFUSION") {
+			    foreach $sys_subtype (keys %$count) {
+				$nconf{ALL}{$subtype}{$sys_subtype} += $count->{$sys_subtype};
+				$nconf{ALL}{$subtype}{$sys_subtype} = 0 if not $nconf{ALL}{$subtype}{$sys_subtype};
+				$nconf{ALL}{$sys_subtype}{$subtype} = 0 if not $nconf{ALL}{$sys_subtype}{$subtype};
+			    }
+			    next;
+			}
+			$nconf{ALL}{$subtype}{"{Miss}"} += $count if $type eq "DEL";
+			$nconf{ALL}{"{FA}"}{$subtype} += $count if $type eq "INS";
+			$nconf{ALL}{$subtype}{"{Miss}"} = 0 unless defined $nconf{ALL}{$subtype}{"{Miss}"};
+			$nconf{ALL}{"{FA}"}{$subtype} = 0 unless defined $nconf{ALL}{"{FA}"}{$subtype};
+			$nevent{ALL}{$type} += $count;
+			$nevent{"c=$chnl f=$file"}{$type} += $count if $opt_a =~ /c/i and $opt_a =~ /f/i;
+			$nevent{"c=$chnl"}{$type} += $count if $opt_a =~ /c/i and not $opt_a =~ /f/i;
+			$nevent{"f=$file"}{$type} += $count if $opt_a =~ /f/i and not $opt_a =~ /c/i;
+			$nevent{"s=$spkr"}{$type} += $count if $opt_a =~ /s/i;
+			$nevent{"g=$gndr"}{$type} += $count if $opt_a =~ /g/i;
+		    }
+		}
+	    }
+	}
+    }
+    print_md_scores ($metadata_type, \%nevent, \%nconf, \%offsets, \%nwerr);
+}
+
+#################################
+
+sub print_offset_stats {
+
+    my ($counts) = @_;
+    my (@offsets, $count, $min, $max, $i);
+
+    @offsets = (keys %{$counts->{WBEG}}, keys %{$counts->{WEND}});
+    $min = min (-3, @offsets);
+    $max = max (3, @offsets);
+    print "  word offsets:  <-3  ";
+    for ($i=-3; $i<=3; $i++) {
+	printf "%5d", $i;
+    }
+    print "     >3\n";
+    print "           BEG:";
+    for ($count=0,$i=$min; $i<-3; $i++) {
+	$count += $counts->{WBEG}{$i} if defined $counts->{WBEG}{$i};
+    }	    
+    printf "%5d  ", $count if defined $count;
+    print "    -  " unless defined $count;
+    for ($i=-3; $i<=3; $i++) {
+	$count = $counts->{WBEG}{$i};
+	printf "%5d", $count if defined $count;
+	print "    -" unless defined $count;
+    }
+    for ($count=0,$i=4; $i<=$max; $i++) {
+	$count += $counts->{WBEG}{$i} if defined $counts->{WBEG}{$i};
+    }	    
+    printf "%7d", $count if defined $count;
+    print "      -" unless defined $count;
+    
+    print "\n           END:";
+    for ($count=0,$i=$min; $i<-3; $i++) {
+	$count += $counts->{WEND}{$i} if defined $counts->{WEND}{$i};
+    }	    
+    printf "%5d  ", $count if defined $count;
+    print "    -  " unless defined $count;
+    for ($i=-3; $i<=3; $i++) {
+	$count = $counts->{WEND}{$i};
+	printf "%5d", $count if defined $count;
+	print "    -" unless defined $count;
+    }
+    for ($count=0,$i=4; $i<=$max; $i++) {
+	$count += $counts->{WEND}{$i} if defined $counts->{WEND}{$i};
+    }	    
+    printf "%7d", $count if defined $count;
+    print "      -" unless defined $count;
+    print "\n";
+}
+
+#################################
+
+sub print_md_scores {
+
+    my ($metadata_type, $event_counts, $conf_counts, $offset_counts, $word_counts) = @_;
+    my ($type, $nerr, $norm, $name, $ref, $sys, $category, $counts);
+    my ($count, $min, $max, $i, @offsets);
+    my $head_format = "%36s   %5s %5s %5s   %6s %6s %6s   %6s %6s\n";
+    my $data_format = "%-28.28s   %5d   %5d %5d %5s   %6.2f %6.2f %6.2f   %6.2f %6.2f\n";
+    my @header = ("Nref", "Ndel", "Nins", "Nsub", "%Del", "%Ins", "%Sub", "%D+I", "%Tot");
+
+    $counts = $word_counts->{ALL};
+    $nerr = $counts->{DEL} + $counts->{INS};
+    $nerr += $counts->{SUB} if $metadata_type =~ /^(SU|FILLER)$/;
+    printf "\n*** Performance analysis for %ss ***  overall error SCORE = %.2f%s\n",
+        $metadata_type,	100*$nerr/max($counts->{REF},$epsilon), "%";
+
+#metadata word detection
+    print "\nSU (exact) end detection statistics" if $metadata_type eq "SU";
+    print "\nIP (exact) detection statistics" if $metadata_type eq "IP";
+    print "\n$metadata_type word coverage statistics" unless $metadata_type =~ /^(SU|IP)$/;
+    print " -- in terms of reference words\n";
+    printf $head_format, @header;
+    foreach $category (sort keys %$word_counts) {
+	printf $data_format, ($category ne "ALL" ? $category : " "x17 ."ALL",
+			      error_output ($word_counts->{$category}));
+    }
+
+#metadata event detection
+    print "\n$metadata_type detection statistics -- in terms of \# of $metadata_type"."s\n";
+    printf $head_format, @header;
+    foreach $category (sort keys %$event_counts) {
+	printf $data_format, ($category ne "ALL" ? $category : " "x17 ."ALL",
+			      error_output ($event_counts->{$category}));
+    }
+
+#metadata event classification
+    print "\n$metadata_type detection confusion matrix -- in terms of \# of $metadata_type"."s\n";
+    foreach $category (sort keys %$conf_counts) {
+	$counts = $conf_counts->{$category};
+	printf "%24.24s", "$category - ref\\sys";
+	foreach $name (sort keys %$counts, "{Miss}") {
+	    next if $name eq "{FA}";
+	    print "    " if $name eq "{Miss}";
+	    printf "%10.8s", $name;
+	}
+	print "\n";
+	foreach $ref (sort keys %$counts) {
+	    print "\n" if $ref eq "{FA}";
+	    printf "%24.24s", $ref;
+	    foreach $sys (sort keys %$counts, "{Miss}") {
+		next if $sys eq "{FA}" or ($ref eq "{FA}" and $sys eq "{Miss}");
+		print "    " if $sys eq "{Miss}";
+		printf "%8d  ", $counts->{$ref}{$sys} ? $counts->{$ref}{$sys} : 0;
+	    }
+	    print "\n";
+	}
+    }
+
+#offsets
+    foreach $category (sort keys %$offset_counts) {
+	print "\n$metadata_type word offset statistics for $category data\n";
+	print_offset_stats ($offset_counts->{$category});
+    }
+}
+
+#################################
+
+sub error_output {
+
+    my ($counts) = @_;
+    my (@output, $item, $nerr);
+
+    foreach $item ("REF", "DEL", "INS", "SUB") {
+	$counts->{$item} = 0 unless defined $counts->{$item};
+	push @output, $counts->{$item};
+	$nerr += $counts->{$item} unless $item eq "REF";
+    }
+
+    my $norm = 100/max($counts->{REF},$epsilon);
+    foreach my $item ("DEL", "INS", "SUB") {
+	push @output, min(999.99,$norm*$counts->{$item});
+    }
+    my $dpi = $counts->{"DEL"}+$counts->{"INS"};
+    my $tot = $dpi+$counts->{"SUB"};
+    push @output, min(999.99,$norm*$dpi), min(999.99,$norm*$tot);
+    return @output;
+}
+
+#################################
+
+sub word_kinds {
+
+    my ($words, $kind) = @_;
+    my ($word, %count);
+
+    foreach $word (@$words) {
+	$count{$word->{$kind}}++;
+    }
+    return sort keys %count;
+}
+
+#################################
+
+sub standardize_word {
+
+    my ($word, $glm) = @_;
+    my (@split_word, @words, $tbeg, $tdur, $part, $new_word);
+
+    $word->{WORD} = lc $word->{WORD}; #lower case
+
+    if (defined $glm->{$word->{WORD}}) { #split glm words
+	@split_word = @{$glm->{$word->{WORD}}};
+    }
+    elsif ($word->{WORD} =~ /^([^-]+|mm-hmm|uh-huh|um-hmm)$/) {
+	return $word;
+    }
+    elsif ($word->{WORD} =~ /.+-.+/) { #split hyphenated words
+	$word->{WORD} =~ s/(.+)-(.+)/$1 $2/g;
+	@split_word = split /\s+/, $word->{WORD};
+    }
+    else { #don't split word
+	return $word;
+    }
+
+#split word and prorate time equally to each part
+    $tbeg = $word->{TBEG};
+    $tdur = $word->{TDUR}/@split_word;
+    foreach $part (@split_word) {
+	$new_word = {FILE => $word->{FILE}, CHNL => $word->{CHNL}, TBEG => $tbeg,
+		     TDUR => $tdur, TEND => $tbeg+$tdur, TMID => $tbeg+$tdur/2,
+		     WORD => $part, CONF => $word->{CONF}, SPKR => $word->{SPKR},
+		     TYPE => $word->{TYPE}, SUBT => $word->{SUBT}, WTYP => $word->{WTYP}};
+	push @words, $new_word;
+	$tbeg += $tdur;
+    }
+    return @words;
+}
+
+#################################
+
+sub decide_who_spoke_the_words {
+
+    my ($ref_wds, $sys_wds) = @_;
+    my ($ref_index, $ref_word, $sys_index, $index, $word, $md_index, $md);
+    my ($sys_word, $spkr, $score, $best_spkr, $best_score, @speakers);
+
+#select the best ref word for each STT output word that has multiple reference word matches
+    for ($sys_index=0; $sys_index<@$sys_wds; $sys_index++) {
+	$sys_word = $sys_wds->[$sys_index];
+	next unless defined $sys_word->{SPKRS};
+	undef $best_score;
+	@speakers = sort keys %{$sys_word->{SPKRS}};
+	next unless @speakers > 1;
+	foreach $spkr (@speakers) {
+	    next unless defined $sys_word->{SPKRS}{$spkr};
+	    $ref_word = $sys_word->{SPKRS}{$spkr}{REFPTR};
+	    $score = $ref_word->{PATHS}{$sys_index}{SCORE};
+	    next if defined $best_score and $best_score > $score;
+	    $best_score = $score;
+	    $best_spkr = $spkr;
+	}
+	next unless defined $best_score;
+	foreach $spkr (@speakers) {
+	    next if $spkr eq $best_spkr;
+	    $sys_word->{SPKRS}{$spkr} = undef;
+	    $ref_word = $sys_word->{SPKRS}{$best_spkr}{REFPTR};
+	}
+    }
+}
+
+#################################
+
+sub event_covers_time {
+
+    my ($event, $time) = @_;
+
+    return ($time < $event->{TBEG} or
+	    $time > $event->{TEND}) ? 0 : 1;
+}
+
+#################################
+
+sub word_score {
+
+    my ($ref_word, $sys_word) = @_;
+    my ($tbeg, $tend, $rw, $sw, $score, $word);
+    my ($attribute, $attributes, $ref_attributes, $sys_attributes);
+
+#compute joint word coverage
+    $score = 0;
+    if (defined $ref_word and defined $sys_word) {
+	return undef unless overlap ($ref_word, $sys_word, $word_gap);
+	if (($ref_attributes = $ref_word->{ATTRIBUTES}) and
+	    ($sys_attributes = $sys_word->{ATTRIBUTES})) {
+	    foreach $attribute ("EDIT", "FILLER", "IP", "SU") {
+	        next unless (defined $ref_attributes->{$attribute} and
+			     defined $sys_attributes->{$attribute});
+		$score += ($ref_attributes->{$attribute} eq
+			   $sys_attributes->{$attribute}) ? 0.02 : 0.01;
+	    }
+	}
+	return $score if #both word type and word spelling match
+	    ((   $ref_word->{WORD} eq $sys_word->{WORD} and
+		 $ref_word->{WTYP} eq $sys_word->{WTYP})
+	     
+	     or ($ref_word->{WTYP} eq "lex" and 
+		 $sys_word->{WTYP} eq "frag" and
+		 ($sw = $sys_word->{WORD}, $sw =~ s/^-*|-*$//g, $sw) #make sure that $sw is non-null
+		 and ($ref_word->{WORD} =~ /$sw/))
+	     
+	     or ($ref_word->{WTYP} eq "frag" and 
+		 $sys_word->{WTYP} eq "lex" and
+		 ($rw = $ref_word->{WORD}, $rw =~ s/^-*|-*$//g, $rw) #make sure that $rw is non-null
+		 and ($sys_word->{WORD} =~ /$rw/))
+	     
+	     or ($ref_word->{WTYP} eq "fp" and 
+		 $sys_word->{WTYP} eq "fp")
+	     
+	     or ($ref_word->{WTYP} eq "frag" and 
+		 $sys_word->{WTYP} eq "frag"));
+	
+	return $score - 0.1*max(1,ref_count($ref_word)) if #word type match, except for lex's
+	    ((   $ref_word->{WTYP} eq $sys_word->{WTYP} and
+		 $ref_word->{WTYP} ne "lex"));
+	
+	return $score - max(1,ref_count($ref_word),ref_count($sys_word));
+    }
+    $word = defined $ref_word ? $ref_word : $sys_word;
+    return 0 unless defined $word;
+    $score = $word->{WTYP} eq "lex" ? -ref_count($word) : -0.2*max(1,ref_count($word));
+    $attributes = $word->{ATTRIBUTES};
+    if (defined $attributes) {
+	foreach $attribute ("EDIT", "FILLER", "IP", "SU") {
+	    $score += 0.005 if defined $attributes->{$attribute};
+	}
+    }
+    return $score;
+}
+
+#################################
+
+sub wd_err_count {
+
+    my ($ref_word, $sys_word) = @_;
+    
+    my $word_score = word_score($ref_word,$sys_word);
+    return (defined $word_score and $word_score > -0.5) ? 0 : 1;
+}
+
+#################################
+
+sub ref_count {
+
+    my ($word) = @_;
+
+    return 0 unless defined $word;
+    return 0 if $word->{WTYP} =~ /^(non-lex|misc)$/;
+
+#hyphenated words get a count of 2 (except for mm-hmm, uh-huh and um-hmm)
+    my $WORD = $word->{WORD};
+    $WORD =~ s/^-*|-*$//g;
+    return $WORD =~ /^([^-]+|mm-hmm|uh-huh|um-hmm)$/ ? 1 : 2;
+}
+
+#################################
+
+sub overlap {
+
+    my ($ref, $sys, $tgap) = @_;
+    
+    return 0 unless $ref and $sys;
+    $tgap = 0 unless defined $tgap;
+    my $tovl = (min($ref->{TEND}, $sys->{TEND}) -
+		max($ref->{TBEG}, $sys->{TBEG})) + $tgap;
+    return $tovl > 0 ? $tovl/(1 + $tgap/max($ref->{TDUR},$epsilon)) : 0;
+}
+
+################################
+
+sub md_in_uem {
+
+    my ($md, $uem_eval) = @_;
+
+    return 1 unless defined $uem_eval; #not using UEM partition specification
+    foreach my $partition (@$uem_eval) {
+        return 1 if ($md->{TEND} <= $partition->{TEND}+$epsilon and
+		     $md->{TBEG} >= $partition->{TBEG}-$epsilon);
+    }
+    return 0;
+}
+
+#################################
+
+sub map_spkrdata_to_words {
+
+    my ($sys_mds, $sys_wds, $ref_mds, $ref_wds) = @_;
+    my ($spkr, $md, @ref_spkr_mds, @sys_spkr_mds);
+
+    foreach $spkr (keys %$ref_mds) {
+	foreach $md (@{$ref_mds->{$spkr}}) {
+	    push @ref_spkr_mds, $md;
+	}
+    }
+
+    foreach $spkr (keys %$sys_mds) {
+	foreach $md (@{$sys_mds->{$spkr}}) {
+	    push @sys_spkr_mds, $md;
+	}
+    }
+
+    map_metadata_to_words (\@sys_spkr_mds, $sys_wds, \@ref_spkr_mds, $ref_wds);
+}
+
+#################################
+
+sub map_metadata_to_words {
+
+    my ($sys_mds, $sys_wds, $ref_mds, $ref_wds) = @_;
+
+#map system output metadata times to ref words
+    foreach my $md (@$sys_mds) {
+	if ($opt_w) { #adjust times/words to agree with ref-sys word alignment
+	    $md->{RTBEG} = adjust_sys_time_to_ref ($md->{TBEG}, $sys_wds);
+	    $md->{RTEND} = adjust_sys_time_to_ref ($md->{TEND}, $sys_wds);
+	}
+	else { #map system output metadata event to reference data normally
+	    $md->{RTBEG} = $md->{TBEG};
+	    $md->{RTEND} = $md->{TEND};
+	}
+	$md->{RTDUR} = $md->{RTEND} - $md->{RTBEG};
+	$md->{RTMID} = $md->{RTBEG} + $md->{RTDUR}/2;
+	($md->{RWBEG}, $md->{RWEND}) = md_ref_word_indices ($md, $ref_wds);
+	$md->{RWDUR} = $md->{RWEND} - $md->{RWBEG} + 1;
+    }
+
+#map reference metadata times to ref words
+    foreach my $md (@$ref_mds) {
+	($md->{WBEG}, $md->{WEND}) = md_word_indices ($md, $ref_wds);
+	$md->{WDUR} = $md->{WEND} - $md->{WBEG} + 1;
+	next if ($md->{WDUR} > 0 or
+		 $md->{TYPE} =~ /^(IP|CB)$/);
+        next if (not $opt_W and not $opt_w and $md->{TYPE} eq "SPEAKER");
+	warn "\nWARNING:  reference metadata event subsumes no reference words\n"
+	    ."    file='$md->{FILE}', chnl='$md->{CHNL}', tbeg='$md->{TBEG}',"
+		." tend='$md->{TEND}', type='$md->{TYPE}', subtype='$md->{SUBT}'\n";
+    }
+
+#friendly (unused) check of system metadata times versus sys words
+    return unless $opt_w;
+    foreach my $md (@$sys_mds) {
+	my ($wbeg, $wend) = md_word_indices ($md, $sys_wds);
+	next if ($wend - $wbeg >= 0 or
+		 $md->{TYPE} =~ /^(IP|CB)$/);
+	warn "\nWARNING:  system output metadata event subsumes no system output words\n"
+	    ."    file='$md->{FILE}', chnl='$md->{CHNL}', tbeg='$md->{TBEG}',"
+		." tend='$md->{TEND}', type='$md->{TYPE}', subtype='$md->{SUBT}'\n";
+    }
+}
+
+#################################
+
+sub adjust_sys_time_to_ref {
+
+    my ($ts, $sys_wds) = @_;
+    my ($ts1, $ts2, $tr, $tr1, $tr2, $ws1, $ws2, $ref_wd);
+
+#given a time in the system output, find the time in the reference
+#that harmonizes with the alignment of system output words
+
+#find the nearest right reference anchor point
+    $ws2 = 0;
+    $ws2++ while ($ws2 < @$sys_wds and
+		  ($sys_wds->[$ws2]{TEND} < $ts or
+		   not defined $sys_wds->[$ws2]{MAPPTR}));
+    if ($ws2 < @$sys_wds) {
+	$ref_wd = $sys_wds->[$ws2]{MAPPTR};
+	($ts2, $tr2) = $sys_wds->[$ws2]{TBEG} < $ts ?
+	    ($sys_wds->[$ws2]{TEND}, $ref_wd->{TEND}) :
+	    ($sys_wds->[$ws2]{TBEG}, $ref_wd->{TBEG});
+    }
+
+#find the nearest left reference anchor point
+    $ws1 = min($ws2, @$sys_wds-1);
+    $ws1-- while ($ws1 >= 0 and 
+		  ($sys_wds->[$ws1]{TBEG} > $ts or
+		   not defined $sys_wds->[$ws1]{MAPPTR}));
+    if ($ws1 >= 0) {
+	$ref_wd = $sys_wds->[$ws1]{MAPPTR};
+	($ts1, $tr1) = $sys_wds->[$ws1]{TEND} > $ts ? 
+	    ($sys_wds->[$ws1]{TBEG}, $ref_wd->{TBEG}) :
+	    ($sys_wds->[$ws1]{TEND}, $ref_wd->{TEND});
+    }
+
+#make adjustment
+    $tr = (($ws1 < 0 and $ws2 >= @$sys_wds) ? $ts               : #no adjustment possible
+	   ($ws1 < 0)                       ? $tr2 + ($ts-$ts2) : #extrapolate left without scale change
+	   ($ws2 >= @$sys_wds)              ? $tr1 + ($ts-$ts1) : #extrapolate right without scale change
+	   ($ts == $ts1)                    ? $tr1              : #no interpolation necessary
+	   $tr1 + ($ts-$ts1)*($tr2-$tr1)/($ts2-$ts1));            #normal interpolation
+    return $tr;
+}
+
+#################################
+
+sub md_word_indices {
+
+    my ($md, $wds) = @_;
+
+#find the word indices of the first and last words with midpoints inside the metadata event
+    my $i = 0;
+    $i++ while ($i<@$wds and ($wds->[$i]{TMID}) < $md->{TBEG});
+    my $j = max($i-1,0);
+    $j++ while ($j<@$wds and ($wds->[$j]{TMID}) <= $md->{TEND});
+    return ($i, --$j);
+}
+
+#################################
+
+sub md_ref_word_indices {
+
+    my ($md, $wds) = @_;
+
+#find the indices of the first and last ref words with midpoints inside the event's ref-aligned span (RTBEG..RTEND)
+    my $i = 0;
+    $i++ while ($i<@$wds and ($wds->[$i]{TMID}) < $md->{RTBEG});
+    my $j = max($i-1,0);
+    $j++ while ($j<@$wds and ($wds->[$j]{TMID}) <= $md->{RTEND});
+    return ($i, --$j);
+}
+
+#################################
+
+sub align_data {
+
+    my ($refs, $syss, $spkr, $scorer, $max_delta_score) = @_;
+    my ($ref, $sys, $prev_ref, $path, $ref_path);
+    my ($ref_index, $sys_index, $index, $pruning_threshold);
+    my ($score, $path_score, $best_score, %cum_insertion_score);
+
+#compute cumulative insertion score for sys output
+    $cum_insertion_score{-1} = 0;
+    for ($sys_index=0; $sys_index<@$syss; $sys_index++) {
+	$sys = $syss->[$sys_index];
+	$cum_insertion_score{$sys_index} = $cum_insertion_score{$sys_index-1};
+	$cum_insertion_score{$sys_index} += &$scorer (undef, $sys);
+    }
+
+#find the best path by incremental optimization through the ref transcription
+    $prev_ref->{PATHS}{-1}{SCORE} = 0;
+    for ($ref_index=0; $ref_index<@$refs; $ref_index++) {
+	$ref = $refs->[$ref_index];
+	next if $spkr and $ref->{SPKR} ne $spkr;
+
+#find best score and compute pruning threshold
+	$best_score = undef;
+	foreach $index (keys %{$prev_ref->{PATHS}}) {
+	    $path_score = $prev_ref->{PATHS}{$index}{SCORE} +
+		$cum_insertion_score{@$syss-1}-$cum_insertion_score{$index};
+	    $best_score = $path_score if not defined $best_score or $best_score < $path_score;
+	}
+	$pruning_threshold = $best_score - $max_delta_score;
+	
+#extend paths with scores above pruning threshold
+	foreach $index (keys %{$prev_ref->{PATHS}}) {
+	    $path_score = $prev_ref->{PATHS}{$index}{SCORE} +
+		$cum_insertion_score{@$syss-1}-$cum_insertion_score{$index};
+	    next unless $path_score > $pruning_threshold;
+	    $ref->{PATHS}{$index}{PATHPTR} = $index;
+	    $ref->{PATHS}{$index}{PREVREF} = $prev_ref;
+	    $ref->{PATHS}{$index}{SCORE} = $prev_ref->{PATHS}{$index}{SCORE} +
+		&$scorer ($ref, undef);
+	}
+
+#compare the current ref event to all sys events
+	for ($sys_index=0; $sys_index<@$syss; $sys_index++) {
+	    $sys = $syss->[$sys_index];
+	    $score = &$scorer ($ref, $sys);
+	    next unless defined $score;
+
+#update each path for this {ref, sys} match
+	    foreach $index (sort {$a<=>$b} keys %{$prev_ref->{PATHS}}) {
+		next unless $index < $sys_index;
+		$path_score = $score + $prev_ref->{PATHS}{$index}{SCORE} +
+		    $cum_insertion_score{$sys_index-1}-$cum_insertion_score{$index};
+		if (not defined $ref->{PATHS}{$sys_index}
+		    or $path_score > $ref->{PATHS}{$sys_index}{SCORE}) {
+		    $ref->{PATHS}{$sys_index}{SCORE} = $path_score;
+		    $ref->{PATHS}{$sys_index}{PREVREF} = $prev_ref;
+		    $ref->{PATHS}{$sys_index}{PATHPTR} = $index;
+		    $ref->{PATHS}{$sys_index}{SYSPTR} = $sys;
+		}
+	    }
+	}
+	$prev_ref=$ref;
+    }
+
+#add insertion score for remaining unmapped sys events
+    foreach $index (sort {$a<=>$b} keys %{$prev_ref->{PATHS}}) {
+	$prev_ref->{PATHS}{$index}{SCORE} +=
+	    $cum_insertion_score{@$syss-1}-$cum_insertion_score{$index} if $index < @$syss-1;
+    }
+}
+
+#################################
+
+sub md_score {
+
+    my ($ref_md, $sys_md) = @_;
+    my ($beg, $end, $overlap, $ref_beg, $sys_beg, $md_dur);
+    my $subtype_bonus = 1.1; #multiplicative bonus for matching subtypes
+    my $endword_bonus = 1.001; #multiplicative bonus for matching boundaries
+
+    return 0 unless defined $ref_md and defined $sys_md;
+
+    if ($opt_W) { #compute md mapping score as ref-sys overlap in (ref) words
+	$ref_beg = $ref_md->{WBEG};
+	$sys_beg = $sys_md->{RWBEG};
+	if ($ref_md->{TYPE} eq "SU") {
+	    $ref_beg = max($ref_beg, $ref_md->{WEND}-($su_extent_limit-1));
+	    $sys_beg = max($sys_beg, $sys_md->{RWEND}-($su_extent_limit-1));
+	}
+	$beg = max($ref_beg, $sys_beg);
+	$end = min($ref_md->{WEND}, $sys_md->{RWEND});
+	$overlap = $end - $beg + 1;
+	$md_dur = $ref_md->{WEND} - $ref_beg + 1;
+    }
+    else { #compute md mapping score as ref-sys overlap in time
+	$ref_beg = $ref_md->{TBEG};
+	$sys_beg = $sys_md->{RTBEG};
+	if ($ref_md->{TYPE} eq "SU") {
+	    $ref_beg = max($ref_beg, $ref_md->{TEND}-$su_extent_limit);
+	    $sys_beg = max($sys_beg, $sys_md->{RTEND}-$su_extent_limit);
+	}
+	$beg = max($ref_beg, $sys_beg);
+	$end = min($ref_md->{TEND}, $sys_md->{RTEND});
+	$overlap = $end - $beg;
+	$md_dur = $ref_md->{TEND} - $ref_beg;
+    }
+    $overlap += $epsilon if $ref_md->{TYPE} =~ /^(IP|CB)$/;
+    $overlap += $md_gap;
+    return undef if $overlap < 0;
+    $overlap *= $subtype_bonus if $ref_md->{SUBT} eq $sys_md->{SUBT};
+    $overlap *= $endword_bonus if $ref_md->{WEND} eq $sys_md->{RWEND};
+    return $overlap if $md_dur+$md_gap < max($md_dur,$epsilon);
+    return $overlap * max($md_dur,$epsilon)/($md_dur+$md_gap);
+}
+
+#################################
+
+sub trace_best_path {
+
+    my ($refs, $syss, $spkr) = @_;
+    my ($ref, $path, $pathptr, $best_score, $prev_ref, $ref_index, $index, $sys);
+
+#find the last word for the selected channel and speaker
+    return unless @$refs and @$syss;
+    $ref_index = @$refs-1;
+    $ref_index-- while (defined $spkr and $refs->[$ref_index]{SPKR} ne $spkr);
+    $spkr = "ALL" unless defined $spkr;
+
+#identify the best path for the selected ending word
+    $ref = $refs->[$ref_index];
+    undef $best_score;
+    foreach $index (sort {$a<=>$b} keys %{$ref->{PATHS}}) {
+	$path = $ref->{PATHS}{$index};
+	if (not defined $best_score or $path->{SCORE} > $best_score) {
+	    $best_score = $path->{SCORE};
+	    $pathptr = $path->{PATHPTR};
+	    $prev_ref = $path->{PREVREF};
+	    $sys = $path->{SYSPTR};
+	}
+    }
+    if (defined $sys) {
+	$sys->{SPKRS}{$spkr}{REFPTR} = $ref;
+	$sys->{MAPPTR} = $ref;
+	$ref->{MAPPTR} = $sys;
+    }
+
+#trace the path back 
+    while ($pathptr != -1) {
+	$ref = $prev_ref;
+	$path = $ref->{PATHS}{$pathptr};
+	$pathptr = $path->{PATHPTR};
+	$prev_ref = $path->{PREVREF};
+	next unless defined $path->{SYSPTR};
+	$sys = $path->{SYSPTR};
+	$sys->{SPKRS}{$spkr}{REFPTR} = $ref;
+	$sys->{MAPPTR} = $ref;
+	$ref->{MAPPTR} = $sys;
+    }
+}
+
+#################################
+
+sub delta_metadata_error_words {
+
+#counts the scoreable reference words between the ref and sys beg/end points
+#of a metadata event (positive if the sys point is later, negative if earlier)
+
+    my ($location, $ref_index, $sys_index, $ref_wds) = @_;
+
+    my $dw = 0;
+    my $index = min($ref_index,$sys_index);
+    my $istop = max($ref_index,$sys_index);
+    while ($index != $istop) {
+	$index++ if $location eq "END";
+	$dw++ if $index >= 0 and $index < @$ref_wds and $ref_wds->[$index]{SCOREABLE};
+	$index++ if $location eq "BEG";
+    }
+    return $sys_index > $ref_index ? $dw : 0-$dw;
+}
+
+#################################
+
+sub print_path_score {
+
+    my ($ref, $sys, $ref_count, $err_count, $err_type) = @_;
+
+#print header
+    unless (defined $ref or defined $sys) {
+	printf " ref del ins sub %16.16s %-7s%8s%8s %-12.12s", "REF:  token", "type",
+	    "tbeg", "tend", "speaker";
+	printf " %16.16s %-7s  %7s%8s %8s%8s %-12.12s\n", "SYS:  token", "type",
+	    "Rtbeg", "Rtend", "tbeg", "tend", "sys-speaker" if $opt_w;
+	printf " %16.16s %-7s%8s%8s %-12.12s\n", "SYS:  token", "type",
+	    "tbeg", "tend", "speaker" unless $opt_w;
+	return;
+    }
+
+#print ref
+    my %errors = (REF=>"-", DEL=>"-", INS=>"-", SUB=>"-");
+    $errors{REF} = $ref_count if defined $ref_count;
+    $errors{$err_type} = $err_count if defined $err_type;
+    printf "%4s%4s%4s%4s", $errors{REF}, $errors{DEL}, $errors{INS}, $errors{SUB};
+
+    if (defined $ref) {
+	printf " %16.16s %-7s%8.2f%8.2f %-12.12s", $ref->{TYPE} =~ /^(LEXEME|NON-LEX|NON-SPEECH)$/ ?
+	    ($ref->{WORD}, $ref->{WTYP}) : ($ref->{SUBT}, $ref->{TYPE}), $ref->{TBEG}, $ref->{TEND}, $ref->{SPKR};
+    }
+    else {
+	printf " %16.16s %-7s%8s%8s %-12.12s", "---", "---", "--- ", "--- ", "---";
+    }
+
+#print sys
+    if ($opt_w) {
+	if (defined $sys) {
+	    printf " %16.16s %-7s (%7.2f%8.2f)%8.2f%8.2f %-12.12s\n", $sys->{TYPE} =~ /^(LEXEME|NON-LEX|NON-SPEECH)$/ ?
+		($sys->{WORD}, $sys->{WTYP}) : ($sys->{SUBT}, $sys->{TYPE}), $sys->{RTBEG}, $sys->{RTEND}, $sys->{TBEG}, $sys->{TEND}, $sys->{SPKR};
+	}
+	else {
+	    printf " %16.16s %-7s (%7s%8s)%8s%8s %-12.12s\n", "---", "---", "--- ", "--- ", "--- ", "--- ", "---";
+	}
+    }
+    else {
+	if (defined $sys) {
+	    printf " %16.16s %-7s%8.2f%8.2f %-12.12s\n", $sys->{TYPE} =~ /^(LEXEME|NON-LEX|NON-SPEECH)$/ ?
+		($sys->{WORD}, $sys->{WTYP}) : ($sys->{SUBT}, $sys->{TYPE}), $sys->{TBEG}, $sys->{TEND}, $sys->{SPKR};
+	}
+	else {
+	    printf " %16.16s %-7s%8s%8s %-12.12s\n", "---", "---", "--- ", "--- ", "---";
+	}
+    }	
+}
+
+#################################
+
+sub score_metadata_path {
+
+    my ($type, $file, $chnl, $ref_mds, $sys_mds, $ref_wds) = @_;
+    my ($ref_md, @sys_mds, $sys_index, $sys_md, $md, $spkr, $iw);
+    my (%count, $ref_count, $err_count, $ref_wd, $dw);
+
+    print "\n$type alignment and scoring details for channel $chnl of file $file\n" if $opt_D;
+    print_path_score () if $opt_D;
+
+#tabulate boundary/depod errors
+    tag_ref_words_with_metadata_info ($ref_mds, $ref_wds, "REF");
+    tag_ref_words_with_metadata_info ($sys_mds, $ref_wds, "SYS");
+    for ($iw=0; $iw<@$ref_wds; $iw++) {
+	$ref_wd = $ref_wds->[$iw];
+	next unless $ref_wd->{SCOREABLE} or $type =~ /^(IP|SU)$/;
+	my $nref = my $nsys = my $nins = my $ncor = 0;
+	foreach my $subtype (keys %{$md_subtypes{$type}}) {
+	    my $nr = $ref_wd->{"REF-$type"}{$subtype}{MAP};
+	    my $nm = $ref_wd->{"REF-$type"}{$subtype}{NOT};
+	    my $ns = $ref_wd->{"SYS-$type"}{$subtype}{MAP};
+	    my $ni = $ref_wd->{"SYS-$type"}{$subtype}{NOT};
+	    $nref += $nr if $nr;
+	    $nref += $nm if $nm;
+	    $nsys += $ns if $ns;
+	    $nins += $ni if $ni;
+	    $ncor += min($nr,$ns) if $nr and $ns;
+	}
+	$count{WORDS}{REF} += $nref;
+	$count{WORDS}{DEL} += max($nref-$nsys,0);
+	$count{WORDS}{INS} += max($nsys-$nref,0) + ($nins ? $nins : 0);
+	$count{WORDS}{SUB} += min($nref,$nsys) - $ncor;
+    }
+
+#tabulate beg/end word offset errors
+    foreach $ref_md (@$ref_mds) {
+	next unless ($sys_md = $ref_md->{MAPPTR});
+	$dw = delta_metadata_error_words ("BEG", $ref_md->{WBEG}, $sys_md->{RWBEG}, $ref_wds);
+	$count{WORD_OFFSET}{WBEG}{$dw}++;
+	$dw = delta_metadata_error_words ("END", $ref_md->{WEND}, $sys_md->{RWEND}, $ref_wds);
+	$count{WORD_OFFSET}{WEND}{$dw}++;
+    }
+
+#tabulate detection errors
+    @sys_mds = @$sys_mds;
+    $sys_md = shift @sys_mds;
+    foreach $ref_md (@$ref_mds) {
+	$spkr = $ref_md->{SPKR};
+	$ref_count = md_err_count ($ref_md, undef);
+	$count{REF}{$spkr}{$ref_md->{SUBT}} += $ref_count if defined $ref_count;
+	if ($ref_md->{MAPPTR}) {
+	    while ($sys_md and
+		   $sys_md ne $ref_md->{MAPPTR}) {
+		printf "%sUNEXPECTED MAPPED SYS MD:  %16s %-7s%8.2f%8.2f %-16s\n",
+		    " "x44, $sys_md->{SUBT}, $sys_md->{TYPE}, $sys_md->{TBEG},
+		    $sys_md->{TEND}, $sys_md->{SPKR} if $sys_md->{MAPPTR};
+		$err_count = md_err_count (undef, $sys_md);
+		$count{INS}{ref_spkr_of_md($sys_md,$ref_wds)}{$sys_md->{SUBT}} += $err_count;
+		print_path_score (undef, $sys_md, 0, $err_count, "INS") if $opt_D;
+		$sys_md = shift @sys_mds;
+	    }
+	    if ($sys_md) {
+		$err_count = md_err_count ($ref_md, $sys_md);
+		$count{SUB}{$spkr}{$ref_md->{SUBT}} += $err_count;
+		$count{CONFUSION}{$spkr}{$ref_md->{SUBT}}{$sys_md->{SUBT}} += $ref_count;
+		print_path_score ($ref_md, $sys_md, $ref_count, $err_count, "SUB") if $opt_D;
+		$sys_md = shift @sys_mds;
+	    }
+	    else {
+		printf "%sSYS MD NOT FOUND FOR REF MD:   %16s %-7s%8.2f%8.2f %-16s\n",
+		    " "x40, $ref_md->{SUBT}, $ref_md->{TYPE}, $ref_md->{TBEG},
+		    $ref_md->{TEND}, $ref_md->{SPKR} if $ref_md->{MAPPTR};
+	    }
+	}
+	else {
+	    $err_count = md_err_count ($ref_md, undef);
+	    $count{DEL}{$spkr}{$ref_md->{SUBT}} += $err_count;
+	    print_path_score ($ref_md, undef, $ref_count, $err_count, "DEL") if $opt_D;
+	}
+    }
+    while ($sys_md) {
+	printf "%sUNEXPECTED MAPPED SYS MD:  %16s %-7s%8.2f%8.2f %-16s\n",
+	    " "x44, $sys_md->{SUBT}, $sys_md->{TYPE}, $sys_md->{TBEG},
+	    $sys_md->{TEND}, $sys_md->{SPKR} if $sys_md->{MAPPTR};
+	$err_count = md_err_count (undef, $sys_md);
+	$count{INS}{ref_spkr_of_md($sys_md,$ref_wds)}{$sys_md->{SUBT}} += $err_count;
+	print_path_score (undef, $sys_md, 0, $err_count, "INS") if $opt_D;
+	$sys_md = shift @sys_mds;
+    }
+    return {%count};
+}
+
+#################################
+
+sub md_err_count {
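+#returns 1 (an error) if either token is missing or their TYPE or SUBT fields are undefined or differ, else 0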
+    my ($ref_md, $sys_md) = @_;
+
+    return 1 if (not defined $sys_md         or not defined $ref_md         or
+		 not defined $sys_md->{TYPE} or not defined $ref_md->{TYPE} or
+		 not defined $sys_md->{SUBT} or not defined $ref_md->{SUBT} or
+		 $sys_md->{TYPE} ne $ref_md->{TYPE} or
+		 $sys_md->{SUBT} ne $ref_md->{SUBT});
+    return 0;
+}
+
+#################################
+
+sub ref_spkr_of_md {
+#returns the speaker common to all ref words spanned by $md (RWBEG..RWEND), else "unknown"
+    my ($md, $ref_wds) = @_;
+    my $spkr;
+
+    for (my $index =min($md->{RWBEG},$md->{RWEND});
+	    $index<=max($md->{RWBEG},$md->{RWEND}); $index++) {
+	next unless $index >= 0 and $index < @$ref_wds;
+	$spkr = $ref_wds->[$index]{SPKR} unless $spkr;
+	return "unknown" unless $ref_wds->[$index]{SPKR} eq $spkr;
+    }
+    return defined $spkr ? $spkr : "unknown";
+}
+
+#################################
+
+sub score_word_path {
+
+    my ($file, $chnl, $ref_wds, $sys_wds) = @_;
+    my ($ref_wrd, @sys_wds, $sys_wrd);
+    my ($ref_count, $err_count);
+
+    print "\nWord alignment and scoring details for channel $chnl of file $file\n";
+    print_path_score ();
+
+#tabulate errors
+    @sys_wds = @$sys_wds;
+    $sys_wrd = shift @sys_wds;
+    foreach $ref_wrd (@$ref_wds) {
+	$ref_count = ref_count($ref_wrd);
+	if ($ref_wrd->{MAPPTR}) {
+	    while ($sys_wrd and
+		   $sys_wrd ne $ref_wrd->{MAPPTR}) {
+		printf "%71s%16s %-7s%s%8.2f%8.2f %-16s\n", "UNEXPECTED MAPPED SYS WORD:",
+		    $sys_wrd->{WORD}, $sys_wrd->{WTYP}, " "x18, $sys_wrd->{TBEG},
+		    $sys_wrd->{TDUR}, $sys_wrd->{SPKR} if $sys_wrd->{MAPPTR};
+		$err_count = wd_err_count(undef, $sys_wrd);
+		print_path_score (undef, $sys_wrd, 0, $err_count, "INS");
+		$sys_wrd = shift @sys_wds;
+	    }
+	    if ($sys_wrd) {
+		$err_count = wd_err_count($ref_wrd, $sys_wrd);
+		print_path_score ($ref_wrd, $sys_wrd, $ref_count, $err_count, "SUB");
+		$sys_wrd = shift @sys_wds;
+	    }
+	    else {
+		printf "%71s%16s %-7s%s%8.2f%8.2f %-16s\n", "SYS WRD NOT FOUND FOR REF WRD:",
+		    $ref_wrd->{WORD}, $ref_wrd->{WTYP}, " "x18, $ref_wrd->{TBEG},
+		    $ref_wrd->{TDUR}, $ref_wrd->{SPKR} if $ref_wrd->{MAPPTR};
+	    }
+	}
+	else {
+	    $err_count = wd_err_count($ref_wrd, undef);
+	    print_path_score ($ref_wrd, undef, $ref_count, $err_count, "DEL");
+	}
+    }
+    while ($sys_wrd) {
+	printf "%71s%16s %-7s%s%8.2f%8.2f %-16s\n", "UNEXPECTED MAPPED SYS WORD:",
+	    $sys_wrd->{WORD}, $sys_wrd->{WTYP}, " "x18, $sys_wrd->{TBEG},
+	    $sys_wrd->{TDUR}, $sys_wrd->{SPKR} if $sys_wrd->{MAPPTR};
+	$err_count = wd_err_count(undef, $sys_wrd);
+	print_path_score (undef, $sys_wrd, 0, $err_count, "INS");
+	$sys_wrd = shift @sys_wds;
+    }
+}
+
+#################################
+
+sub date_time_stamp {
+
+    my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime();
+    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
+    my ($date, $time);
+
+    $time = sprintf "%2.2d:%2.2d:%2.2d", $hour, $min, $sec;
+    $date = sprintf "%4.4s %3.3s %s", 1900+$year, $months[$mon], $mday;
+    return ($date, $time);
+}
+
+#################################
+
+sub max {
+
+    my ($max, $next);
+
+    return unless defined ($max=pop);
+    while (defined ($next=pop)) {
+	$max = $next if $next > $max;
+    }
+    return $max;
+}
+
+#################################
+
+sub min {
+
+    my ($min, $next);
+
+    return unless defined ($min=pop);
+    while (defined ($next=pop)) {
+	$min = $next if $next < $min;
+    }
+    return $min;
+}
+
+#################################
+
+sub score_speaker_diarization {
+
+    my ($file, $chnl, $ref_spkr_data, $sys_spkr_data, $ref_wds, $uem_eval, $rttm_data) = @_;
+    my ($uem_score, $ref_eval, $sys_eval, $spkr_overlap, $spkr_map);
+    my ($eval_segs, $score_segs, %stats, @ref_wds, $wrd, $ref_spkr, $sys_spkr);
+    my ($nref, $nsys, $nmap, $spkr, $seg, $type, $spkr_info, $noscore_nl);
+
+    $stats{EVAL_WORDS} = $stats{SCORED_WORDS} = $stats{MISSED_WORDS} = $stats{ERROR_WORDS} = $epsilon;
+    @ref_wds = @$ref_wds;
+    $wrd = shift @ref_wds;
+    foreach $seg (@$uem_eval) {
+	$stats{EVAL_TIME} += $seg->{TEND}-$seg->{TBEG};
+	$wrd = shift @ref_wds while ($wrd and $wrd->{TMID} < $seg->{TBEG});
+	while ($wrd and $wrd->{TMID} <= $seg->{TEND}) {
+	    $stats{EVAL_WORDS}++;
+	    $wrd = shift @ref_wds;
+	}
+    }
+
+    $eval_segs = create_speaker_segs ($uem_eval, $ref_spkr_data, $sys_spkr_data);
+    foreach $seg (@$eval_segs) {
+	foreach $ref_spkr (keys %{$seg->{REF}}) {
+	    $spkr_info->{REF}{$ref_spkr}{TIME} += $seg->{TDUR};
+	    $spkr_info->{REF}{$ref_spkr}{TYPE} = $ref_spkr_data->{$ref_spkr}[0]{SUBT};
+	}
+	foreach $sys_spkr (keys %{$seg->{SYS}}) {
+	    $spkr_info->{SYS}{$sys_spkr}{TIME} += $seg->{TDUR};
+	    $spkr_info->{SYS}{$sys_spkr}{TYPE} = $sys_spkr_data->{$sys_spkr}[0]{SUBT};
+	}
+	next unless keys %{$seg->{REF}} > 0;
+	$stats{EVAL_SPEECH} += $seg->{TDUR};
+	foreach $ref_spkr (keys %{$seg->{REF}}) {
+	    foreach $sys_spkr (keys %{$seg->{SYS}}) {
+		$spkr_overlap->{$ref_spkr}{$sys_spkr} += $seg->{TDUR};
+	    }
+	}
+    }
+    $speaker_map{$file}{$chnl} = $spkr_map = map_speakers ($spkr_overlap)
+	if defined $spkr_overlap;
+    print_speaker_map ($spkr_map, $spkr_overlap) if $opt_m;
+    update_speaker_map_file ($spkr_map, $spkr_overlap, $file, $chnl, $opt_M) if $opt_M;
+
+    $uem_score = $collar > 0 ? add_collars_to_uem ($uem_eval, $ref_spkr_data) : $uem_eval;
+    $uem_score = add_exclusion_zones_to_uem ($noscore_sd, $uem_score, $rttm_data);
+    $noscore_nl->{"NON-LEX"} = $noscore_sd->{"NON-LEX"};
+    $uem_score = add_exclusion_zones_to_uem ($noscore_nl, $uem_score, $rttm_data, $max_extend);
+    $uem_score = exclude_overlapping_speech_from_uem ($uem_score, $rttm_data) if $opt_1;
+    tag_scoreable_words ($ref_wds, $uem_score);
+    $score_segs = create_speaker_segs ($uem_score, $ref_spkr_data, $sys_spkr_data);
+    print_speaker_segs ($score_segs, $file, $chnl) if $opt_v;
+    ($stats{TYPE}{NSPK}) = speaker_mapping_scores ($spkr_map, $spkr_info);
+    score_speaker_segments (\%stats, $score_segs, $ref_wds, $spkr_map, $spkr_info);
+    return {%stats};
+}
+
+#################################
+
+sub speaker_mapping_scores {
+#counts ref and sys speakers by type and tabulates ref-type vs sys-type confusions under $spkr_map
+    my ($spkr_map, $spkr_info) = @_;
+    my ($ref_spkr, $ref_type, $sys_spkr, $sys_type, %imap, %stats);
+
+    foreach $ref_spkr (keys %{$spkr_info->{REF}}) {
+	next unless $spkr_info->{REF}{$ref_spkr}{TIME};
+	$ref_type = $spkr_info->{REF}{$ref_spkr}{TYPE};
+	$stats{REF}{$ref_type}++;
+	$sys_spkr = $spkr_map->{$ref_spkr};
+	$sys_type = defined $sys_spkr ? $spkr_info->{SYS}{$sys_spkr}{TYPE} : $miss_name;
+	$stats{JOINT}{$ref_type}{$sys_type}++;
+	$imap{$sys_spkr} = $ref_spkr if defined $sys_spkr;
+    }
+    foreach $sys_spkr (keys %{$spkr_info->{SYS}}) {
+	next unless $spkr_info->{SYS}{$sys_spkr}{TIME};
+	$sys_type = $spkr_info->{SYS}{$sys_spkr}{TYPE};
+	$stats{SYS}{$sys_type}++;
+	$stats{JOINT}{$fa_name}{$sys_type}++
+	    unless defined $imap{$sys_spkr};
+    }
+    return {%stats};
+}
+
+#################################
+
+sub score_speaker_segments {
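+#accumulates scored, missed, false alarm and speaker error time (and word counts) over the scoring segments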
+    my ($stats, $score_segs, $ref_wds, $spkr_map, $spkr_info) = @_;
+    my ($ref_spkr, $ref_type, $sys_spkr, $sys_type, %type_stats);
+    my (@ref_wds, $wrd, $seg, $seg_dur, $nref, $nsys);
+
+    @ref_wds = @$ref_wds;
+    $wrd = shift @ref_wds;
+    foreach $seg (@$score_segs) {
+	$seg_dur = $seg->{TDUR};
+	$stats->{SCORED_TIME} += $seg_dur;
+	$nref = keys %{$seg->{REF}};
+	$nsys = keys %{$seg->{SYS}};
+	$stats->{SCORED_SPEECH} += $nref ? $seg_dur : 0;
+	$stats->{MISSED_SPEECH} += ($nref and not $nsys) ? $seg_dur : 0;
+	$stats->{FALARM_SPEECH} += ($nsys and not $nref) ? $seg_dur : 0;
+	$stats->{SCORED_SPEAKER} += $seg_dur*$nref;
+	$stats->{MISSED_SPEAKER} += $seg_dur*max($nref-$nsys,0);
+	$stats->{FALARM_SPEAKER} += $seg_dur*max($nsys-$nref,0);
+
+	my $scored_wrds = my $missed_wrds = my $error_wrds = 0;
+	$wrd = shift @ref_wds while ($wrd and $wrd->{TMID} < $seg->{TBEG});
+	while ($wrd and $wrd->{TMID} <= $seg->{TEND}) {
+	    $scored_wrds++ if $wrd->{SCOREABLE};
+	    $missed_wrds++ if $wrd->{SCOREABLE} and not $nsys;
+	    $error_wrds++ if $wrd->{SCOREABLE} and
+		not speakers_match ($seg->{REF}, $seg->{SYS}, $spkr_map);
+	    $wrd = shift @ref_wds;
+	}
+	$stats->{SCORED_WORDS} += $scored_wrds;
+	$stats->{MISSED_WORDS} += $missed_wrds;
+	$stats->{ERROR_WORDS} += $error_wrds;
+
+	my $nmap = 0; my %num_types;
+	foreach $ref_spkr (keys %{$seg->{REF}}) {
+	    $ref_type = $spkr_info->{REF}{$ref_spkr}{TYPE};
+	    $num_types{REF}{$ref_type}++;
+	    $sys_spkr = $spkr_map->{$ref_spkr};
+	    $nmap++ if defined $sys_spkr and defined $seg->{SYS}{$sys_spkr};
+	}
+	$stats->{SPEAKER_ERROR} += $seg_dur*(min($nref,$nsys) - $nmap);
+
+	foreach $sys_spkr (keys %{$seg->{SYS}}) {
+	    $sys_type = $spkr_info->{SYS}{$sys_spkr}{TYPE};
+	    $num_types{SYS}{$sys_type}++;
+	}
+	foreach $ref_type (keys %{$num_types{REF}}) {
+	    $nref = $num_types{REF}{$ref_type};
+	    $type_stats{REF}{$ref_type} += $nref*$seg_dur;
+	    foreach $sys_type (keys %{$num_types{SYS}}) {
+		$nsys = $num_types{SYS}{$sys_type};
+		$type_stats{JOINT}{$ref_type}{$sys_type} += min($nref,$nsys)*$seg_dur;
+	    }
+	    $type_stats{JOINT}{$ref_type}{$miss_name} += max($nref-$nsys,0)*$seg_dur;
+	}
+	foreach $sys_type (keys %{$num_types{SYS}}) {
+	    $nsys = $num_types{SYS}{$sys_type};
+	    $type_stats{SYS}{$sys_type} += $nsys*$seg_dur;
+	    $type_stats{JOINT}{$fa_name}{$sys_type} += max($nsys-$nref,0)*$seg_dur;
+	}
+    }
+    $stats->{TYPE}{TIME} = {%type_stats};
+}
+
+#################################
+
+sub speakers_match {
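+#returns 1 only if ref and sys speaker counts agree and every ref speaker's mapped sys speaker is present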
+    my ($ref_spkrs, $sys_spkrs, $spkr_map) = @_;
+
+    return 0 unless keys %$ref_spkrs == keys %$sys_spkrs;
+    foreach my $ref_spkr (keys %$ref_spkrs) {
+	return 0 unless (defined $spkr_map->{$ref_spkr} and
+			 defined $sys_spkrs->{$spkr_map->{$ref_spkr}});
+    }
+    return 1;
+}
+
+#################################
+
+sub add_collars_to_uem {
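+#removes a no-score collar of +/- $collar around each reference speaker segment boundary from the UEM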
+    my ($uem_eval, $ref_data) = @_;
+    my (@events, $event, $uem, $uem_score, $spkr, $spkr_seg, $tbeg, $evaluate);
+
+    foreach $uem (@$uem_eval) {
+	push @events, {EVENT => "BEG", TIME => $uem->{TBEG}};
+	push @events, {EVENT => "END", TIME => $uem->{TEND}};
+    }
+#add no-score collars
+    foreach $spkr (keys %$ref_data) {
+	foreach $spkr_seg (@{$ref_data->{$spkr}}) {
+	    push @events, {EVENT => "END", TIME => $spkr_seg->{TBEG}-$collar};
+	    push @events, {EVENT => "BEG", TIME => $spkr_seg->{TBEG}+$collar};
+	    push @events, {EVENT => "END", TIME => $spkr_seg->{TEND}-$collar};
+	    push @events, {EVENT => "BEG", TIME => $spkr_seg->{TEND}+$collar};
+	}
+    }
+    @events = sort {($a->{TIME} < $b->{TIME} ? -1 :
+		     ($a->{TIME} > $b->{TIME} ? 1 :
+		      $a->{EVENT} eq "END"))} @events;
+    $evaluate = 0;
+    foreach $event (@events) {
+	if ($event->{EVENT} eq "BEG") {
+	    $evaluate++;
+	    $tbeg = $event->{TIME} if $evaluate == 1;
+	}
+	else {
+	    $evaluate--;
+	    push @$uem_score, {TBEG => $tbeg, TEND => $event->{TIME}}
+	        if $evaluate == 0 and $event->{TIME} > $tbeg;
+	}
+    }
+    return $uem_score;
+}
+
+#################################
+
+sub exclude_overlapping_speech_from_uem {
+#removes from the UEM all regions where two or more SPEAKER tokens overlap
+    my ($uem_data, $rttm_data) = @_;
+    my ($token, @spkr_events, $event, $spkr_cnt, $tbeg_overlap, $uem, @events, $uem_ex);
+
+#overlapping speech computed from SPEAKER data
+    foreach $token (@$rttm_data) {
+	next unless ($token->{TYPE} eq "SPEAKER" and
+		     $token->{TDUR} > 0);
+	push @spkr_events, {EVENT => "BEG", TIME => $token->{TBEG}, SPKR => $token->{SPKR}};
+	push @spkr_events, {EVENT => "END", TIME => $token->{TEND}, SPKR => $token->{SPKR}};
+    }
+    @spkr_events = sort {($a->{TIME} < $b->{TIME} ? -1 :
+			  ($a->{TIME} > $b->{TIME} ? 1 :
+			   $a->{EVENT} eq "BEG"))} @spkr_events;
+
+#create noscore zones
+    foreach $event (@spkr_events) {
+	if ($event->{EVENT} eq "BEG") {
+	    next unless ++$spkr_cnt == 2;
+	    $tbeg_overlap = $event->{TIME};
+	}
+	else {
+	    next unless --$spkr_cnt == 1;
+	    push @events, {TYPE => "NSZ", EVENT => "BEG", TIME => $tbeg_overlap};
+	    push @events, {TYPE => "NSZ", EVENT => "END", TIME => $event->{TIME}};
+	}
+    }
+	
+#merge noscore zones with UEM data
+    foreach $uem (@$uem_data) {
+	next unless $uem->{TEND}-$uem->{TBEG} > 0;
+	push @events, {TYPE => "UEM", EVENT => "BEG", TIME => $uem->{TBEG}};
+	push @events, {TYPE => "UEM", EVENT => "END", TIME => $uem->{TEND}};
+    }
+    @events = sort {($a->{TIME} < $b->{TIME} ? -1 :
+		     ($a->{TIME} > $b->{TIME} ? 1 :
+		      $a->{EVENT} eq "BEG"))} @events;
+
+    my $tbeg = my $evl_cnt = my $nsz_cnt = my $evaluating = 0;
+    foreach $event (@events) {
+	$evl_cnt += $event->{EVENT} eq "BEG" ? 1 : -1 if $event->{TYPE} eq "UEM";
+	$nsz_cnt += $event->{EVENT} eq "BEG" ? 1 : -1 if $event->{TYPE} eq "NSZ";
+	if ($evaluating and
+	    ($evl_cnt == 0 or $nsz_cnt > 0) and
+	    $event->{TIME} > $tbeg) {
+	    push @$uem_ex, {TBEG => $tbeg, TEND => $event->{TIME}};
+	    $evaluating = 0;
+	}
+	elsif ($evl_cnt > 0 and $nsz_cnt == 0) {
+	    $tbeg = $event->{TIME};
+	    $evaluating = 1;
+	}
+    }
+	    
+    return $uem_ex;
+}
+
+#################################
+
+sub add_exclusion_zones_to_uem {
+
+    my ($excluded_tokens, $uem_score, $rttm_data, $max_extend) = @_;
+    my (@events, $event, $uem, $uem_ex, $spkr, $spkr_seg, $tbeg, $evaluating, $token);
+    my (@ns_events, $evl_cnt, $lex_cnt, $nsz_cnt, $tstart, $tstop);
+    my ($tbeg_lex, $tbeg_nsz, $tend_lex, $tend_nsz, $tseg);
+
+    return $uem_score unless defined $excluded_tokens and (keys %$excluded_tokens) > 0;
+
+#gather data needed to create noscore zones
+    foreach $token (@$rttm_data) {
+	if ($token->{TYPE} eq "LEXEME" and
+	    not defined $excluded_tokens->{LEXEME}{$token->{SUBT}} and
+	    $token->{TDUR} > 0) {
+	    push @ns_events, {TYPE => "LEX", EVENT => "BEG", TIME => $token->{TBEG}};
+	    push @ns_events, {TYPE => "LEX", EVENT => "END", TIME => $token->{TEND}};
+	}
+	elsif ($token->{TYPE} eq "SPEAKER" and
+	       $token->{TDUR} > 0) {
+	    push @ns_events, {TYPE => "SEG", EVENT => "BEG", TIME => $token->{TBEG}};
+	    push @ns_events, {TYPE => "SEG", EVENT => "END", TIME => $token->{TEND}};
+	}
+	elsif (defined $excluded_tokens->{$token->{TYPE}}{$token->{SUBT}} and
+	       $token->{TDUR} > 0) {
+	    push @ns_events, {TYPE => "NSZ", EVENT => "BEG", TIME => $token->{TBEG}};
+	    push @ns_events, {TYPE => "NSZ", EVENT => "END", TIME => $token->{TEND}};
+	}
+    }
+    @ns_events = sort {($a->{TIME} < $b->{TIME} ? -1 :
+			($a->{TIME} > $b->{TIME} ? 1 :
+			 $a->{EVENT} eq "BEG"))} @ns_events;
+
+#create noscore zones
+    $evaluating = 1;
+    $max_extend = $epsilon if not $max_extend or $max_extend < $epsilon;
+    $tseg = $tbeg_nsz = $tbeg_lex = $tend_nsz = $tend_lex = 0;
+    $lex_cnt = $nsz_cnt = 0;
+    foreach $event (@ns_events) {
+	if ($event->{TYPE} eq "LEX") {
+	    if ($event->{EVENT} eq "BEG") {
+		$tbeg_lex = $event->{TIME} if $lex_cnt++ == 0;
+	    }
+	    else {
+		$tend_lex = $event->{TIME} if $lex_cnt-- == 1;
+	    }
+	}
+	elsif ($event->{TYPE} eq "NSZ") {
+	    if ($event->{EVENT} eq "BEG") {
+		$tbeg_nsz = $event->{TIME} if $nsz_cnt++ == 0;
+	    }
+	    else {
+		$tend_nsz = $event->{TIME} if $nsz_cnt-- == 1;
+	    }
+	}
+	elsif ($event->{TYPE} eq "SEG") {
+	    $tseg = $event->{TIME};
+	}
+
+	if ($evaluating) {
+	    next if ($nsz_cnt == 0 or
+		     $event->{TYPE} ne "NSZ");
+	    $tstop = ($lex_cnt > 0 ? $event->{TIME} :
+		      max($tend_lex, $tseg, $event->{TIME}-$max_extend));
+	    push @events, {TYPE => "NSZ", EVENT => "BEG", TIME => $tstop};
+	    $evaluating = 0;
+	}
+	elsif ($nsz_cnt == 0 and
+	       ($lex_cnt > 0 or
+		$event->{TYPE} eq "SEG")) {
+	    $tstart = min($tend_nsz+$max_extend, $event->{TIME});
+	    push @events, {TYPE => "NSZ", EVENT => "END", TIME => $tstart};
+	    $evaluating = 1;
+	}
+	elsif ($nsz_cnt == 1 and
+	       $event->{TYPE} eq "NSZ" and
+	       $event->{EVENT} eq "BEG" and
+	       $event->{TIME} > $tend_nsz+2*$max_extend) {
+	    push @events, {TYPE => "NSZ", EVENT => "END", TIME => $tend_nsz+$max_extend};
+	    push @events, {TYPE => "NSZ", EVENT => "BEG", TIME => $event->{TIME}-$max_extend};
+	    $evaluating = 0;
+	}
+    }
+
+#merge noscore zones with UEM data
+    foreach $uem (@$uem_score) {
+	next unless $uem->{TEND}-$uem->{TBEG} > 0;
+	push @events, {TYPE => "UEM", EVENT => "BEG", TIME => $uem->{TBEG}};
+	push @events, {TYPE => "UEM", EVENT => "END", TIME => $uem->{TEND}};
+    }
+    @events = sort {($a->{TIME} < $b->{TIME} ? -1 :
+		     ($a->{TIME} > $b->{TIME} ? 1 :
+		      $a->{EVENT} eq "BEG"))} @events;
+    $evl_cnt = $evaluating = 0;
+    foreach $event (@events) {
+	$evl_cnt += $event->{EVENT} eq "BEG" ? 1 : -1 if $event->{TYPE} eq "UEM";
+	$nsz_cnt += $event->{EVENT} eq "BEG" ? 1 : -1 if $event->{TYPE} eq "NSZ";
+	if ($evaluating and
+	    ($evl_cnt == 0 or $nsz_cnt > 0) and
+	    $event->{TIME} > $tbeg) {
+	    push @$uem_ex, {TBEG => $tbeg, TEND => $event->{TIME}};
+	    $evaluating = 0;
+	}
+	elsif ($evl_cnt > 0 and $nsz_cnt == 0) {
+	    $tbeg = $event->{TIME};
+	    $evaluating = 1;
+	}
+    }
+	    
+    return $uem_ex;
+}
+
+#################################
+
+sub uem_from_rttm {
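+#builds a single UEM region spanning the earliest TBEG to the latest TEND of the RTTM token types listed below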
+    my ($rttm_data) = @_;
+    my ($token, $tbeg, $tend);
+
+    ($tbeg, $tend) = (1E30, 0);
+    foreach $token (@$rttm_data) {
+	($tbeg, $tend) = (min($tbeg,$token->{TBEG}), max($tend,$token->{TEND})) if
+	    $token->{TYPE} =~ /^(SEGMENT|SPEAKER|SU|EDIT|FILLER|IP|CB|A\/P|LEXEME|NON-LEX)$/;
+    }
+
+    return [{TBEG => $tbeg, TEND => $tend}];
+}
+
+#################################
+
+sub create_speaker_segs {
+#cuts the UEM regions into segments over which the sets of active ref and sys speakers are constant
+    my ($uem_score, $ref_data, $sys_data) = @_;
+    my ($spkr, $seg, @events, $event, $uem, $segments, $tbeg, $tend);
+    my ($evaluate, %ref_spkrs, %sys_spkrs, $spkrs);
+
+    foreach $uem (@$uem_score) {
+	next unless $uem->{TEND} > $uem->{TBEG}+$epsilon;
+	push @events, {TYPE => "UEM", EVENT => "BEG", TIME => $uem->{TBEG}};
+	push @events, {TYPE => "UEM", EVENT => "END", TIME => $uem->{TEND}};
+    }
+    foreach $spkr (keys %$ref_data) {
+	foreach $seg (@{$ref_data->{$spkr}}) {
+	    next unless $seg->{TDUR} > 0;
+	    push @events, {TYPE => "REF", SPKR => $spkr, EVENT => "BEG", TIME => $seg->{TBEG}};
+	    push @events, {TYPE => "REF", SPKR => $spkr, EVENT => "END", TIME => $seg->{TEND}};
+	}
+    }
+    foreach $spkr (keys %$sys_data) {
+	foreach $seg (@{$sys_data->{$spkr}}) {
+	    next unless $seg->{TDUR} > 0;
+	    push @events, {TYPE => "SYS", SPKR => $spkr, EVENT => "BEG", TIME => $seg->{RTBEG}};
+	    push @events, {TYPE => "SYS", SPKR => $spkr, EVENT => "END", TIME => $seg->{RTEND}};
+	}
+    }
+    @events = sort {($a->{TIME} < $b->{TIME}-$epsilon  ? -1 :
+		     ($a->{TIME} > $b->{TIME}+$epsilon ?  1 :
+		      ($a->{EVENT} eq "END"        ? -1 : 1)))} @events;
+    $evaluate = 0;
+    foreach $event (@events) {
+	if ($evaluate and $tbeg<$event->{TIME}) {
+	    $tend = $event->{TIME};
+	    push @$segments, {REF => {%ref_spkrs},
+			      SYS => {%sys_spkrs},
+			      TBEG => $tbeg,
+			      TEND => $tend,
+			      TDUR => $tend-$tbeg};
+	    $tbeg = $tend;
+	}
+	if ($event->{TYPE} eq "UEM") {
+	    $evaluate = $event->{EVENT} eq "BEG";
+	    $tbeg = $event->{TIME} if $evaluate;
+	}
+	else {
+	    $spkrs = $event->{TYPE} eq "REF" ? \%ref_spkrs : \%sys_spkrs;
+	    ($event->{EVENT} eq "BEG") ? $spkrs->{$event->{SPKR}}++ : $spkrs->{$event->{SPKR}}--;
+	    $spkrs->{$event->{SPKR}} <= 1 or warn
+	        "WARNING:  speaker $event->{SPKR} speaking more than once at time $event->{TIME}\n";
+	    delete $spkrs->{$event->{SPKR}} unless $spkrs->{$event->{SPKR}};
+	}
+    }
+    return $segments;
+}
+
+#################################
+
+sub sd_performance_analysis {
+
+    my ($scores, $subtypes) = @_;
+    my ($file, $chnl, $class, $kind, $ref_type, $sys_type);
+    my ($xscores, %cum_scores, $count);
+
+#accumulate statistics
+    foreach $file (keys %$scores) {
+	foreach $chnl (keys %{$scores->{$file}}) {
+	    $xscores = $scores->{$file}{$chnl};
+	    foreach $ref_type (keys %$xscores) {
+		next if $ref_type eq "TYPE";
+		$count = $xscores->{$ref_type};
+		$cum_scores{ALL}{$ref_type} += $count;
+		$cum_scores{"c=$chnl f=$file"}{$ref_type} += $xscores->{$ref_type} if $opt_a =~ /c/i and $opt_a =~ /f/i;
+		$cum_scores{"c=$chnl"}{$ref_type} += $xscores->{$ref_type} if $opt_a =~ /c/i and not $opt_a =~ /f/i;
+		$cum_scores{"f=$file"}{$ref_type} += $xscores->{$ref_type} if $opt_a =~ /f/i and not $opt_a =~ /c/i;
+	    }
+	    $xscores = $xscores->{TYPE};
+	    foreach my $class ("TIME", "NSPK") {
+		foreach my $kind ("REF", "SYS") {
+		    foreach $ref_type (keys %{$xscores->{$class}{$kind}}) {
+			$count = $xscores->{$class}{$kind}{$ref_type};
+			$cum_scores{ALL}{TYPE}{$class}{$kind}{$ref_type} += $count;
+			$cum_scores{"c=$chnl f=$file"}{TYPE}{$class}{$kind}{$ref_type} += $count if $opt_a =~ /c/i and $opt_a =~ /f/i;
+			$cum_scores{"c=$chnl"}{TYPE}{$class}{$kind}{$ref_type} += $count if $opt_a =~ /c/i and not $opt_a =~ /f/i;
+			$cum_scores{"f=$file"}{TYPE}{$class}{$kind}{$ref_type} += $count if $opt_a =~ /f/i and not $opt_a =~ /c/i;
+		    }
+		}
+		foreach $ref_type (keys %{$xscores->{$class}{JOINT}}) {
+		    foreach $sys_type (keys %{$xscores->{$class}{JOINT}{$ref_type}}) {
+			$count = $xscores->{$class}{JOINT}{$ref_type}{$sys_type};
+			$cum_scores{ALL}{TYPE}{$class}{JOINT}{$ref_type}{$sys_type} += $count;
+			$cum_scores{"c=$chnl f=$file"}{TYPE}{$class}{JOINT}{$ref_type}{$sys_type} += $count if $opt_a =~ /c/i and $opt_a =~ /f/i;
+			$cum_scores{"c=$chnl"}{TYPE}{$class}{JOINT}{$ref_type}{$sys_type} += $count if $opt_a =~ /c/i and not $opt_a =~ /f/i;
+			$cum_scores{"f=$file"}{TYPE}{$class}{JOINT}{$ref_type}{$sys_type} += $count if $opt_a =~ /f/i and not $opt_a =~ /c/i;
+		    }
+		}
+	    }
+	}
+    }
+
+    foreach my $condition (sort keys %cum_scores) {
+	print_sd_scores ($condition, $cum_scores{$condition}) if $condition !~ /ALL/;
+    }
+    print_sd_scores ("ALL", $cum_scores{ALL});
+}
+
+#################################
+
+sub print_sd_scores {
+
+    my ($condition, $scores) = @_;
+
+    print "\n*** Performance analysis for Speaker Diarization for $condition ***\n\n";
+
+    #printf "    EVAL TIME =%10.2f secs\n", $scores->{EVAL_TIME};
+    #printf "  EVAL SPEECH =%10.2f secs (%5.1f percent of evaluated time)\n", $scores->{EVAL_SPEECH},
+    #    100*$scores->{EVAL_SPEECH}/$scores->{EVAL_TIME};
+    #printf "  SCORED TIME =%10.2f secs (%5.1f percent of evaluated time)\n",
+    #    $scores->{SCORED_TIME}, 100*$scores->{SCORED_TIME}/$scores->{EVAL_TIME};
+    #printf "SCORED SPEECH =%10.2f secs (%5.1f percent of scored time)\n",
+    #    $scores->{SCORED_SPEECH}, 100*$scores->{SCORED_SPEECH}/$scores->{SCORED_TIME};
+    #printf "   EVAL WORDS =%7d        \n", $scores->{EVAL_WORDS};
+    #printf " SCORED WORDS =%7d         (%5.1f percent of evaluated words)\n",
+    #    $scores->{SCORED_WORDS}, 100*$scores->{SCORED_WORDS}/$scores->{EVAL_WORDS};
+    #print "---------------------------------------------\n";
+    #printf "MISSED SPEECH =%10.2f secs (%5.1f percent of scored time)\n",
+    #    $scores->{MISSED_SPEECH}, 100*$scores->{MISSED_SPEECH}/$scores->{SCORED_TIME};
+    #printf "FALARM SPEECH =%10.2f secs (%5.1f percent of scored time)\n",
+    #    $scores->{FALARM_SPEECH}, 100*$scores->{FALARM_SPEECH}/$scores->{SCORED_TIME};
+    #printf " MISSED WORDS =%7d         (%5.1f percent of scored words)\n",
+    #    $scores->{MISSED_WORDS}, 100*$scores->{MISSED_WORDS}/$scores->{SCORED_WORDS};
+    #print "---------------------------------------------\n";
+    #printf "SCORED SPEAKER TIME =%10.2f secs (%5.1f percent of scored speech)\n",
+    #    $scores->{SCORED_SPEAKER}, 100*$scores->{SCORED_SPEAKER}/$scores->{SCORED_SPEECH};
+    #printf "MISSED SPEAKER TIME =%10.2f secs (%5.1f percent of scored speaker time)\n",
+    #    $scores->{MISSED_SPEAKER}, 100*$scores->{MISSED_SPEAKER}/$scores->{SCORED_SPEAKER};
+    #printf "FALARM SPEAKER TIME =%10.2f secs (%5.1f percent of scored speaker time)\n",
+    #    $scores->{FALARM_SPEAKER}, 100*$scores->{FALARM_SPEAKER}/$scores->{SCORED_SPEAKER};
+    #printf " SPEAKER ERROR TIME =%10.2f secs (%5.1f percent of scored speaker time)\n",
+    #    $scores->{SPEAKER_ERROR}, 100*$scores->{SPEAKER_ERROR}/$scores->{SCORED_SPEAKER};
+    #printf "SPEAKER ERROR WORDS =%7d         (%5.1f percent of scored speaker words)\n",
+    #    $scores->{ERROR_WORDS}, 100*$scores->{ERROR_WORDS}/$scores->{SCORED_WORDS};
+    #print "---------------------------------------------\n";
+    #
+    #
+    #
+    printf "SCORED SPEAKER TIME =%f secs\n", $scores->{SCORED_SPEAKER};
+    printf "MISSED SPEAKER TIME =%f secs\n", $scores->{MISSED_SPEAKER};
+    printf "FALARM SPEAKER TIME =%f secs\n", $scores->{FALARM_SPEAKER};
+    printf "SPEAKER ERROR TIME =%f secs\n", $scores->{SPEAKER_ERROR};    
+#    if ($condition eq "ALL") {
+#      printf " OVERALL SPEAKER DIARIZATION ERROR = %.2f percent of scored speaker time\n",
+#         100*($scores->{MISSED_SPEAKER} + $scores->{FALARM_SPEAKER} + $scores->{SPEAKER_ERROR})/
+#	    $scores->{SCORED_SPEAKER};
+#    } else {
+      printf " OVERALL SPEAKER DIARIZATION ERROR = %.2f percent of scored speaker time  %s\n",
+         100*($scores->{MISSED_SPEAKER} + $scores->{FALARM_SPEAKER} + $scores->{SPEAKER_ERROR})/
+    	    $scores->{SCORED_SPEAKER}, "`($condition)";
+#    }
+    print "---------------------------------------------\n";
+    printf " Speaker type confusion matrix -- speaker weighted\n";
+    summarize_speaker_type_performance ("NSPK", $scores->{TYPE}{NSPK});
+    print "---------------------------------------------\n";
+    printf " Speaker type confusion matrix -- time weighted\n";
+    summarize_speaker_type_performance ("TIME", $scores->{TYPE}{TIME});
+    print "---------------------------------------------\n";
+}
+
+#################################
+
+sub summarize_speaker_type_performance {
+
+    my ($class, $stats) = @_;
+    my ($ref_type, $sys_type, $sys_stat);
+
+    print "  REF\\SYS (count)      " if $class eq "NSPK";
+    print "  REF\\SYS (seconds)    " if $class eq "TIME";
+    foreach $sys_type ((sort keys %{$stats->{SYS}}), $miss_name) {
+	printf "%-20s", $sys_type;
+    }
+    print "\n";
+
+    my $ref_tot = 0;
+    foreach $ref_type (keys %{$stats->{REF}}) {
+	$ref_tot += $stats->{REF}{$ref_type};
+    }
+    
+    foreach $ref_type ((sort keys %{$stats->{REF}}), $fa_name) {
+	printf "%-16s", $ref_type;
+	foreach $sys_type ((sort keys %{$stats->{SYS}}), $miss_name) {
+	    next if $ref_type eq $fa_name and $sys_type eq $miss_name;
+	    $sys_stat = $stats->{JOINT}{$ref_type}{$sys_type};
+	    $sys_stat = 0 unless defined $sys_stat;
+	    printf "%11d /%6.1f",   $sys_stat, min(999.9,$ref_tot ? 100*$sys_stat/$ref_tot : 9E9) if $class eq "NSPK";
+	    printf "%11.2f /%6.1f", $sys_stat, min(999.9,$ref_tot ? 100*$sys_stat/$ref_tot : 9E9) if $class eq "TIME";
+	    print "%";
+	}
+	print "\n";
+    }
+}
+
+#################################
+
+sub map_speakers {
+
+    my ($spkr_overlap) = @_;
+
+#compute the costs
+    my $cost = {};
+    foreach my $ref_spkr (keys %$spkr_overlap) {
+	foreach my $sys_spkr (keys %{$spkr_overlap->{$ref_spkr}}) {
+	    $cost->{$ref_spkr}{$sys_spkr} = -$spkr_overlap->{$ref_spkr}{$sys_spkr};
+	}
+    }
+
+#find the mapping that maximizes the cumulative match time between ref and sys spkrs
+    my $map = weighted_bipartite_graph_match ($cost);
+    return $map;
+}
+
+#################################
+
+sub inverse_speaker_map {
+
+    my ($speaker_map) = @_;
+    my ($speaker, $inverse_speaker_map);
+
+    foreach $speaker (keys %$speaker_map) {
+	$inverse_speaker_map->{$speaker_map->{$speaker}} = $speaker;
+    }
+    return $inverse_speaker_map;
+}
+
+#################################
+
+sub print_speaker_map {
+
+    my ($spkr_map, $time_overlap) = @_;
+    my ($ref_spkr, $sys_spkr);
+
+    foreach $ref_spkr (sort keys %$time_overlap) {
+	$sys_spkr = $spkr_map->{$ref_spkr};
+	print "'$ref_spkr' => ", defined $sys_spkr ? "'$sys_spkr'\n" : "<nil>\n";
+	foreach $sys_spkr (sort keys %{$time_overlap->{$ref_spkr}}) {
+	    my $time = $time_overlap->{$ref_spkr}{$sys_spkr};
+	    printf "%9.2f secs matched to '$sys_spkr'\n", defined $time ? $time : 0;
+	}
+    }
+}
+
+#################################
+
+sub start_speaker_map_file {
+    my ($outFile) = @_;
+    open (FILE, ">$outFile") || die "Error: Unable to open speaker map CSV file '$outFile' for write";
+    print FILE "File,Channel,RefSpeaker,SysSpeaker,isMapped,timeOverlap\n";
+    close FILE;
+}
+
+#################################
+
+sub update_speaker_map_file {
+
+    my ($spkr_map, $time_overlap, $file, $chnl, $outFile) = @_;
+
+    open (FILE, ">>$outFile") || die "Error: Failed to open speaker map CSV file '$outFile' for append";
+    foreach my $ref_spkr (sort keys %$time_overlap) {
+	foreach my $sys_spkr (sort keys %{$time_overlap->{$ref_spkr}}) {
+	    my $time = sprintf("%.4f",$time_overlap->{$ref_spkr}{$sys_spkr});
+	    print FILE "$file,$chnl,$ref_spkr,$sys_spkr";
+	    print FILE ",".((defined($spkr_map->{$ref_spkr}) && $sys_spkr eq $spkr_map->{$ref_spkr}) ? "mapped" : "notmapped");
+	    print FILE ",$time\n";
+	}
+    }
+    close FILE;
+}
+
+#################################
+
+sub print_speaker_segs {
+
+    my ($segs, $file, $chnl) = @_;
+    my ($seg, @segs, $spkr, $sep);
+
+    @segs = @$segs;
+    while ($seg = shift @segs) {
+	printf "beg/dur/end = %7.3f/%7.3f/%7.3f; REF = (", $seg->{TBEG}, $seg->{TDUR}, $seg->{TEND};
+	print "<none>" unless keys %{$seg->{REF}};
+	$sep = "";
+	foreach $spkr (sort keys %{$seg->{REF}}) {
+	    print "$sep$spkr";
+	    $sep = ", ";
+	}
+	print "); SYS = (";
+	$sep = "";
+	print "<none>" unless keys %{$seg->{SYS}};
+	foreach $spkr (sort keys %{$seg->{SYS}}) {
+	    print "$sep$spkr";
+	    $sep = ", ";
+	}
+	print "); file = $file; chnl = $chnl\n";
+    }
+}
+
+#################################
+
+sub sort_time {
+#returns the token's "R$key" time if defined, else its $key time, rounded to the nearest 0.01 sec
+    my ($token, $key) = @_;
+
+    my $time = $token->{"R$key"};
+    $time = $token->{$key} if not defined $time;
+    return int(100*$time+0.5)/100;
+}
+
+#################################
+
+sub display_metadata_mapping {
+
+    my ($file, $chnl, $ref_rttm, $sys_rttm, $ref_wds) = @_;
+    my ($type, $sys_token, @events, $event, %type_cnt);
+    my ($mapped, $beg_mapped, $end_mapped, $whole, $spkr_map, $sys_speaker_field);
+    my %ref_tag = (NOSCORE        => "XS", NO_RT_METADATA => "NM", SEGMENT        => "SG", SPEAKER        => "SP",
+		   SU             => "SU", "A/P"          => "AP", "NON-SPEECH"   => "NS", EDIT           => "ED",
+		   FILLER         => "FL", IP             => "IP", CB             => "CB", "NON-LEX"      => "NL",
+		   LEXEME         => "LX");
+    my %sys_tag = (SPEAKER        => "SP", SU             => "SU", EDIT           => "ED", FILLER         => "FL",
+		   IP             => "IP", LEXEME         => "LX");
+
+#create a vector of rttm events
+    foreach my $token (@$ref_rttm) {
+	next unless defined $ref_tag{$token->{TYPE}};
+	push @events, {EVENT => "BEG", TIME => sort_time ($token, "TBEG"), TYPE => $token->{TYPE}, SRC => "REF", TOKEN => $token};
+	push @events, {EVENT => "END", TIME => sort_time ($token, "TEND"), TYPE => $token->{TYPE}, SRC => "REF", TOKEN => $token}
+	    unless $token->{TYPE} =~ /^(IP|CB)$/;
+	$token->{COUNT} = ++$type_cnt{$token->{TYPE}};
+    }
+    foreach my $token (@$sys_rttm) {
+	next unless defined $sys_tag{$token->{TYPE}};
+	push @events, {EVENT => "BEG", TIME => sort_time ($token, "TBEG"), TYPE => $token->{TYPE}, SRC => "SYS", TOKEN => $token};
+	push @events, {EVENT => "END", TIME => sort_time ($token, "TEND"), TYPE => $token->{TYPE}, SRC => "SYS", TOKEN => $token}
+	    unless $token->{TYPE} =~ /^(IP|CB)$/;
+    }
+
+    @events = sort sort_events @events;
+
+    $spkr_map = inverse_speaker_map ($speaker_map{$file}{$chnl});
+
+    print "\nChronological display of sys data aligned with ref data for file '$file', channel '$chnl'\n";
+    print "----------------------- reference ----------------------- | mapped | --------------------- system output ---------------------\n";
+    print "    --type-- -subtyp- -----word/spkr-----  -tbeg-  -tend- | ref_ID |     --type-- -subtyp- -----word/spkr-----  -tbeg-  -tend-\n";
+
+    while (@events) {
+        my ($token, $ref, $ref_beg, $ref_end, $sys, $sys_beg, $sys_end);
+	while (@events and
+	       (not $token or
+		$token eq $events[0]->{TOKEN} or 
+		($events[0]->{TOKEN}{MAPPTR} and
+		 $token eq $events[0]->{TOKEN}{MAPPTR}))) { # collect events to display on the same line
+	    $event = shift @events;
+	    $token = $event->{TOKEN};
+	    $event->{SRC} eq "REF" ? ($ref = $token, ($event->{EVENT} eq "BEG" ? $ref_beg : $ref_end) = 1) :
+	                             ($sys = $token, ($event->{EVENT} eq "BEG" ? $sys_beg : $sys_end) = 1);
+	}
+	if ($ref) {
+	    printf "%-3.3s %-8.8s %-8.8s %-19.19s%8s%8s | %-6.6s |",
+	    (($ref->{TYPE} =~ /^(IP|CB)$/ or ($ref_beg and $ref_end)) ? "" : ($ref_beg ? "beg" : "end")),
+	    $ref->{TYPE}, $ref->{SUBT},
+	    $ref->{WORD} ne "<na>" ? uc $ref->{WORD} : $ref->{SPKR},
+	    $ref_beg ? (sprintf "%8.2f", $ref->{TBEG}) : "",
+	    $ref_end ? (sprintf "%8.2f", $ref->{TEND}) : "",
+	    $ref->{MAPPTR} ? (sprintf "%s%d", $ref_tag{$ref->{TYPE}}, $ref->{COUNT}) :
+		($md_subtypes{$ref->{TYPE}} ? "*Miss*" : "");
+	} elsif ($sys) {
+	    $ref = $sys->{MAPPTR};
+	    printf "%s%8s%8s | %-6.6s |", " "x41,
+	    $sys_beg ? (sprintf "%8.2f", defined $sys->{RTBEG} ? $sys->{RTBEG} : $sys->{TBEG}) : "",
+	    $sys_end ? (sprintf "%8.2f", defined $sys->{RTEND} ? $sys->{RTEND} : $sys->{TEND}) : "",
+	    $ref ? (sprintf "%s%d", $sys_tag{$ref->{TYPE}}, $ref->{COUNT}) :
+		($md_subtypes{$sys->{TYPE}} ? "**FA**" : "");
+	}
+	if ($sys) {
+	    $sys_speaker_field = $sys ? $sys->{SPKR} : "";
+	    $sys_speaker_field .= "=>$spkr_map->{$sys->{SPKR}}" if $spkr_map->{$sys->{SPKR}};
+	    printf "%3.3s %-8.8s %-8.8s %-19.19s%8s%8s",
+	    (($sys->{TYPE} =~ /^(IP|CB)$/ or ($sys_beg and $sys_end)) ? "" : ($sys_beg ? "beg" : "end")),
+	    $sys->{TYPE}, $sys->{SUBT},
+	    $sys->{WORD} ne "<na>" ? uc $sys->{WORD} : $sys_speaker_field,
+	    $sys_beg ? (sprintf "%8.2f", $sys->{TBEG}) : "",
+	    $sys_end ? (sprintf "%8.2f", $sys->{TEND}) : "";
+	    if ($md_subtypes{$sys->{TYPE}} and $ref = $sys->{MAPPTR}) {
+		my $dw = $sys_end ?
+		    ($ref->{WEND} <= $sys->{RWEND} ? 
+		     delta_metadata_error_words ("END", max($ref->{WEND}, $sys->{RWBEG}-1), $sys->{RWEND}, $ref_wds) :
+		     delta_metadata_error_words ("END", $ref->{WEND}, max($ref->{WBEG}-1, $sys->{RWEND}), $ref_wds)) :
+		    ($ref->{WBEG} <= $sys->{RWBEG} ? 
+		     delta_metadata_error_words ("BEG", $ref->{WBEG}, min(1+$ref->{WEND}, $sys->{RWBEG}), $ref_wds) :
+		     delta_metadata_error_words ("BEG", min($ref->{WBEG}, 1+$sys->{RWEND}), $sys->{RWBEG}, $ref_wds));
+		print " dw=$dw" if abs ($dw) > 0;
+	    }
+	}
+	print "\n";
+    }
+}
+
+#################################
+
+sub sort_events {
+
+    return ($a->{TIME} <=> $b->{TIME} or
+	    $event_order{$a->{EVENT}} <=> $event_order{$b->{EVENT}} or
+	    (($type_order{$a->{TYPE}} <=> $type_order{$b->{TYPE}})*($a->{EVENT} eq "END" ? -1 : 1)) or
+	    $source_order{$a->{SRC}} <=> $source_order{$b->{SRC}});
+}
+
+#################################
+
+sub weighted_bipartite_graph_match {
+    my ($score) = @_;
+    
+    my $required_precision = 1E-12;
+    my $INF = 1E30;
+    my (@row_mate, @col_mate, @row_dec, @col_inc);
+    my (@parent_row, @unchosen_row, @slack_row, @slack);
+    my ($k, $l, $row, $col, @col_min, $cost, %cost);
+    my $t = 0;
+    
+    unless (defined $score) {
+	warn "input to BGM is undefined\n";
+	return undef;
+    }
+    return {} if (keys %$score) == 0;
+    
+    my @rows = sort keys %{$score};
+    my $miss = "miss";
+    $miss .= "0" while exists $score->{$miss};
+    my (@cols, %cols);
+    my $min_score = $INF;
+    foreach $row (@rows) {
+	foreach $col (keys %{$score->{$row}}) {
+	    $min_score = min($min_score,$score->{$row}{$col});
+	    $cols{$col} = $col;
+	}
+    }
+    @cols = sort keys %cols;
+    my $fa = "fa";
+    $fa .= "0" while exists $cols{$fa};
+    my $reverse_search = @rows < @cols; # search is faster when ncols <= nrows
+    foreach $row (@rows) {
+	foreach $col (keys %{$score->{$row}}) {
+	    ($reverse_search ? $cost{$col}{$row} : $cost{$row}{$col})
+		= $score->{$row}{$col} - $min_score;
+	}
+    }
+    push @rows, $miss;
+    push @cols, $fa;
+    if ($reverse_search) {
+	my @xr = @rows;
+	@rows = @cols;
+	@cols = @xr;
+    }
+
+    my $nrows = @rows;
+    my $ncols = @cols;
+    my $nmax = max($nrows,$ncols);
+    my $no_match_cost = -$min_score*(1+$required_precision);
+
+    # compute the column minima (they are subtracted from the costs below)
+    for ($l=0; $l<$nmax; $l++) {
+	$col_min[$l] = $no_match_cost;
+	next unless $l < $ncols;
+	$col = $cols[$l];
+	foreach $row (keys %cost) {
+	    next unless defined $cost{$row}{$col};
+	    my $val = $cost{$row}{$col};
+	    $col_min[$l] = $val if $val < $col_min[$l];
+	}
+    }
+    
+    # initial stage
+    for ($l=0; $l<$nmax; $l++) {
+	$col_inc[$l] = 0;
+	$slack[$l] = $INF;
+    }
+    
+  ROW:
+    for ($k=0; $k<$nmax; $k++) {
+	$row = $k < $nrows ? $rows[$k] : undef;
+	my $row_min = $no_match_cost;
+	for (my $l=0; $l<$ncols; $l++) {
+	    my $col = $cols[$l];
+	    my $val = ((defined $row and defined $cost{$row}{$col}) ? $cost{$row}{$col}: $no_match_cost) - $col_min[$l];
+	    $row_min = $val if $val < $row_min;
+	}
+	$row_dec[$k] = $row_min;
+	for ($l=0; $l<$nmax; $l++) {
+	    $col = $l < $ncols ? $cols[$l]: undef;
+	    $cost = ((defined $row and defined $col and defined $cost{$row}{$col}) ?
+		     $cost{$row}{$col} : $no_match_cost) - $col_min[$l];
+	    if ($cost==$row_min and not defined $row_mate[$l]) {
+		$col_mate[$k] = $l;
+		$row_mate[$l] = $k;
+                # matching row $k with column $l
+		next ROW;
+	    }
+	}
+	$col_mate[$k] = -1;
+	$unchosen_row[$t++] = $k;
+    }
+    
+    goto CHECK_RESULT if $t == 0;
+    
+    my $s;
+    my $unmatched = $t;
+    # start stages to get the rest of the matching
+    while (1) {
+	my $q = 0;
+	
+	while (1) {
+	    while ($q < $t) {
+		# explore node q of forest; if matching can be increased, update matching
+		$k = $unchosen_row[$q];
+		$row = $k < $nrows ? $rows[$k] : undef;
+		$s = $row_dec[$k];
+		for ($l=0; $l<$nmax; $l++) {
+		    if ($slack[$l]>0) {
+			$col = $l < $ncols ? $cols[$l]: undef;
+			$cost = ((defined $row and defined $col and defined $cost{$row}{$col}) ?
+				 $cost{$row}{$col} : $no_match_cost) - $col_min[$l];
+			my $del = $cost - $s + $col_inc[$l];
+			if ($del < $slack[$l]) {
+			    if ($del == 0) {
+				goto UPDATE_MATCHING unless defined $row_mate[$l];
+				$slack[$l] = 0;
+				$parent_row[$l] = $k;
+				$unchosen_row[$t++] = $row_mate[$l];
+			    }
+			    else {
+				$slack[$l] = $del;
+				$slack_row[$l] = $k;
+			    }
+			}
+		    }
+		}
+		
+		$q++;
+	    }
+	    
+	    # introduce a new zero into the matrix by modifying row_dec and col_inc
+	    # if the matching can be increased update matching
+	    $s = $INF;
+	    for ($l=0; $l<$nmax; $l++) {
+		if ($slack[$l] and ($slack[$l]<$s)) {
+		    $s = $slack[$l];
+		}
+	    }
+	    for ($q = 0; $q<$t; $q++) {
+		$row_dec[$unchosen_row[$q]] += $s;
+	    }
+	    
+	    for ($l=0; $l<$nmax; $l++) {
+		if ($slack[$l]) {
+		    $slack[$l] -= $s;
+		    if ($slack[$l]==0) {
+			# look at a new zero and update matching with col_inc up to date if there's a breakthrough
+			$k = $slack_row[$l];
+			unless (defined $row_mate[$l]) {
+			    for (my $j=$l+1; $j<$nmax; $j++) {
+				if ($slack[$j]==0) {
+				    $col_inc[$j] += $s;
+				}
+			    }
+			    goto UPDATE_MATCHING;
+			}
+			else {
+			    $parent_row[$l] = $k;
+			    $unchosen_row[$t++] = $row_mate[$l];
+			}
+		    }
+		}
+		else {
+		    $col_inc[$l] += $s;
+		}
+	    }
+	}
+	
+      UPDATE_MATCHING:  # update the matching by pairing row k with column l
+	while (1) {
+	    my $j = $col_mate[$k];
+	    $col_mate[$k] = $l;
+	    $row_mate[$l] = $k;
+            # matching row $k with column $l
+	    last UPDATE_MATCHING if $j < 0;
+	    $k = $parent_row[$j];
+	    $l = $j;
+	}
+	
+	$unmatched--;
+	goto CHECK_RESULT if $unmatched == 0;
+	
+	$t = 0;  # get ready for another stage
+	for ($l=0; $l<$nmax; $l++) {
+	    $parent_row[$l] = -1;
+	    $slack[$l] = $INF;
+	}
+	for ($k=0; $k<$nmax; $k++) {
+	    $unchosen_row[$t++] = $k if $col_mate[$k] < 0;
+	}
+    }  # next stage
+    
+  CHECK_RESULT:  # rigorously check results before handing them back
+    for ($k=0; $k<$nmax; $k++) {
+	$row = $k < $nrows ? $rows[$k] : undef;
+	for ($l=0; $l<$nmax; $l++) {
+	    $col = $l < $ncols ? $cols[$l]: undef;
+	    $cost = ((defined $row and defined $col and defined $cost{$row}{$col}) ?
+		     $cost{$row}{$col} : $no_match_cost) - $col_min[$l];
+	    if ($cost < ($row_dec[$k] - $col_inc[$l])) {
+		next unless $cost < ($row_dec[$k] - $col_inc[$l]) - $required_precision*max(abs($row_dec[$k]),abs($col_inc[$l]));
+		warn "BGM: this cannot happen: cost{$row}{$col} ($cost) cannot be less than row_dec{$row} ($row_dec[$k]) - col_inc{$col} ($col_inc[$l])\n";
+		return undef;
+	    }
+	}
+    }
+    
+    for ($k=0; $k<$nmax; $k++) {
+	$row = $k < $nrows ? $rows[$k] : undef;
+	$l = $col_mate[$k];
+	$col = $l < $ncols ? $cols[$l]: undef;
+	$cost = ((defined $row and defined $col and defined $cost{$row}{$col}) ?
+		 $cost{$row}{$col} : $no_match_cost) - $col_min[$l];
+	if (($l<0) or ($cost != ($row_dec[$k] - $col_inc[$l]))) {
+	    next unless $l<0 or abs($cost - ($row_dec[$k] - $col_inc[$l])) > $required_precision*max(abs($row_dec[$k]),abs($col_inc[$l]));
+	    warn "BGM: every row should have a column mate: row $row doesn't, col: $col\n";
+	    return undef;
+	}
+    }
+    
+    my %map;
+    for ($l=0; $l<@row_mate; $l++) {
+	$k = $row_mate[$l];
+	$row = $k < $nrows ? $rows[$k] : undef;
+	$col = $l < $ncols ? $cols[$l]: undef;
+	next unless defined $row and defined $col and defined $cost{$row}{$col};
+	$reverse_search ? ($map{$col} = $row) : ($map{$row} = $col);
+    }
+    return {%map};
+}
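Note on `weighted_bipartite_graph_match` above: as the code reads, it solves a minimum-cost one-to-one assignment over a sparse table of scores, and pairs absent from the table never appear in the returned map. The Python sketch below illustrates the same idea with `scipy.optimize.linear_sum_assignment`. It is an analogue under stated assumptions, not a port: `min_cost_match` is a hypothetical name, absent pairs are given a cost of 0 to stand in for the Perl routine's "no match" cost, and tie-breaking may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def min_cost_match(score):
    """Return a one-to-one row -> column mapping minimizing the summed score.

    ``score`` is a dict of dicts, ``score[row][col]``; pairs that are absent
    are never returned. (Hypothetical helper, for illustration only.)
    """
    rows = sorted(score)
    cols = sorted({col for row in rows for col in score[row]})
    if not rows or not cols:
        return {}
    # Assumption: absent pairs cost 0, which approximates leaving the row
    # unmatched. They are filtered out of the result below.
    cost = np.zeros((len(rows), len(cols)))
    for i, row in enumerate(rows):
        for j, col in enumerate(cols):
            if col in score[row]:
                cost[i, j] = score[row][col]
    row_inds, col_inds = linear_sum_assignment(cost)
    return {rows[i]: cols[j] for i, j in zip(row_inds, col_inds)
            if cols[j] in score[rows[i]]}


# Scores are negated overlaps, so minimizing cost maximizes total overlap.
overlap = {'A': {'x': -5.0, 'y': -3.0}, 'B': {'x': -4.0}}
mapping = min_cost_match(overlap)
```

In this example a greedy choice would pair `A` with `x` (the single best score), whereas the optimal assignment pairs `A` with `y` and `B` with `x`, for a total of -7 rather than -5.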
diff --git a/scorelib/metrics.py b/scorelib/metrics.py
new file mode 100644
index 0000000..6ec44ec
--- /dev/null
+++ b/scorelib/metrics.py
@@ -0,0 +1,468 @@
+"""Functions for scoring frame-level diarization output."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+import os
+import re
+import shutil
+import subprocess
+import tempfile
+
+import numpy as np
+from scipy.sparse import coo_matrix
+
+from .rttm import write_rttm
+from .uem import gen_uem, write_uem
+from .utils import clip, xor
+
+__all__ = ['bcubed', 'conditional_entropy', 'contingency_matrix', 'der',
+           'goodman_kruskal_tau', 'mutual_information']
+
+
+EPS = np.finfo(float).eps
+
+
+def contingency_matrix(ref_labels, sys_labels):
+    """Return contingency matrix between ``ref_labels`` and ``sys_labels``."""
+    ref_classes, ref_class_inds = np.unique(ref_labels, return_inverse=True)
+    sys_classes, sys_class_inds = np.unique(sys_labels, return_inverse=True)
+    n_frames = ref_labels.size
+    # The following works because coo_matrix sums duplicate entries when
+    # converted to a dense array. It is roughly twice as fast as np.histogram2d.
+    cmatrix = coo_matrix(
+        (np.ones(n_frames), (ref_class_inds, sys_class_inds)),
+        shape=(ref_classes.size, sys_classes.size),
+        dtype=int)
+    cmatrix = cmatrix.toarray()
+    return cmatrix, ref_classes, sys_classes
+
+
+def bcubed(ref_labels, sys_labels, cm=None):
+    """Return B-cubed precision, recall, and F1.
+
+    The B-cubed precision of an item is the proportion of items with its
+    system label that share its reference label (Bagga and Baldwin, 1998).
+    Similarly, the B-cubed recall of an item is the proportion of items
+    with its reference label that share its system label. The overall B-cubed
+    precision and recall, then, are the means of the precision and recall for
+    each item.
+
+    Parameters
+    ----------
+    ref_labels : ndarray, (n_frames,)
+        Reference labels.
+
+    sys_labels : ndarray, (n_frames,)
+        System labels.
+
+    cm : ndarray, (n_ref_classes, n_sys_classes)
+        Contingency matrix between reference and system labelings. If None,
+        will be computed automatically from ``ref_labels`` and ``sys_labels``.
+        Otherwise, the given value will be used and ``ref_labels`` and
+        ``sys_labels`` ignored.
+        (Default: None)
+
+    Returns
+    -------
+    precision : float
+        B-cubed precision.
+
+    recall : float
+        B-cubed recall.
+
+    f1 : float
+        B-cubed F1.
+
+    References
+    ----------
+    Bagga, A. and Baldwin, B. (1998). "Algorithms for scoring coreference
+    chains." Proceedings of LREC 1998.
+    """
+    if cm is None:
+        cm, _, _ = contingency_matrix(ref_labels, sys_labels)
+    cm = cm.astype('float64')
+    cm_norm = cm / cm.sum()
+    precision = np.sum(cm_norm * (cm / cm.sum(axis=0)))
+    recall = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
+    f1 = 2*(precision*recall)/(precision + recall)
+    return precision, recall, f1
+
+
+def goodman_kruskal_tau(ref_labels, sys_labels, cm=None):
+    """Return Goodman-Kruskal tau between ``ref_labels`` and ``sys_labels``.
+
+    Parameters
+    ----------
+    ref_labels : ndarray, (n_frames,)
+        Reference labels.
+
+    sys_labels : ndarray, (n_frames,)
+        System labels.
+
+    cm : ndarray, (n_ref_classes, n_sys_classes)
+        Contingency matrix between reference and system labelings. If None,
+        will be computed automatically from ``ref_labels`` and ``sys_labels``.
+        Otherwise, the given value will be used and ``ref_labels`` and
+        ``sys_labels`` ignored.
+        (Default: None)
+
+    Returns
+    -------
+    tau_ref_sys : float
+        Value between 0 and 1 that is high when ``ref_labels`` is predictive
+        of ``sys_labels`` and low when ``ref_labels`` provides essentially no
+        information about ``sys_labels``.
+
+    tau_sys_ref : float
+        Value between 0 and 1 that is high when ``sys_labels`` is predictive
+        of ``ref_labels`` and low when ``sys_labels`` provides essentially no
+        information about ``ref_labels``.
+
+    References
+    ----------
+    - Goodman, L.A. and Kruskal, W.H. (1954). "Measures of association for
+      cross classifications." Journal of the American Statistical Association.
+    - Pearson, R. (2016). GoodmanKruskal: Association Analysis for Categorical
+      Variables. https://CRAN.R-project.org/package=GoodmanKruskal.
+    """
+    if cm is None:
+        cm, _, _ = contingency_matrix(ref_labels, sys_labels)
+    cm = cm.astype('float64')
+    cm = cm / cm.sum()
+    ref_marginals = cm.sum(axis=1)
+    sys_marginals = cm.sum(axis=0)
+    n_ref_classes, n_sys_classes = cm.shape
+
+    # Tau(ref, sys).
+    if n_sys_classes == 1:
+        # Special case: only single class in system labeling, so any
+        #               reference labeling is perfectly predictive.
+        tau_ref_sys = 1.
+    else:
+        vy = 1 - np.sum(sys_marginals**2)
+        xy_term = np.sum(cm**2, axis=1)
+        vy_bar_x = 1 - np.sum(xy_term / ref_marginals)
+        tau_ref_sys = (vy - vy_bar_x) / vy
+
+    # Tau(sys, ref).
+    if n_ref_classes == 1:
+        # Special case: only single class in reference labeling, so any
+        #               system labeling is perfectly predictive.
+        tau_sys_ref = 1.
+    else:
+        vx = 1 - np.sum(ref_marginals**2)
+        yx_term = np.sum(cm**2, axis=0)
+        vx_bar_y = 1 - np.sum(yx_term / sys_marginals)
+        tau_sys_ref = (vx - vx_bar_y) / vx
+
+    return tau_ref_sys, tau_sys_ref
+
+
+def conditional_entropy(ref_labels, sys_labels, cm=None, nats=False):
+    """Return conditional entropy of ``ref_labels`` given ``sys_labels``.
+
+    The conditional entropy ``H(ref | sys)`` quantifies how much information
+    is needed to describe the reference labeling given that the system labeling
+    is known. It is 0 when the labelings are identical and increases as the
+    system labeling becomes less descriptive of the reference labeling.
+
+    Parameters
+    ----------
+    ref_labels : ndarray, (n_frames,)
+        Reference labels.
+
+    sys_labels : ndarray, (n_frames,)
+        System labels.
+
+    cm : ndarray, (n_ref_classes, n_sys_classes)
+        Contingency matrix between reference and system labelings. If None,
+        will be computed automatically from ``ref_labels`` and ``sys_labels``.
+        Otherwise, the given value will be used and ``ref_labels`` and
+        ``sys_labels`` ignored.
+        (Default: None)
+
+    nats : bool, optional
+        If True, return conditional entropy in nats. Otherwise, return in bits.
+        (Default: False)
+
+    References
+    ----------
+    - https://en.wikipedia.org/wiki/Conditional_entropy
+    - Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory.
+    - Rosenberg, A. and Hirschberg, J. (2007). "V-Measure: A conditional
+      entropy-based external cluster evaluation measure." Proceedings of EMNLP
+      2007.
+    """
+    log = np.log if nats else np.log2
+    if cm is None:
+        cm, _, _ = contingency_matrix(ref_labels, sys_labels)
+    sys_marginals = cm.sum(axis=0)
+    N = cm.sum()
+    ref_inds, sys_inds = np.nonzero(cm)
+    vals = cm[ref_inds, sys_inds] # Non-zero values of contingency matrix.
+    sys_marginals = sys_marginals[sys_inds] # Corresponding marginals.
+    sigma = vals/N * (log(sys_marginals) - log(vals))
+    return sigma.sum()
+
+
+VALID_NORM_METHODS = set(['min', 'sum', 'sqrt', 'max'])
+
+def mutual_information(ref_labels, sys_labels, cm=None, nats=False,
+                       norm_method='sqrt'):
+    """Return mutual information between ``ref_labels`` and ``sys_labels``.
+
+    The mutual information ``I(ref, sys)`` quantifies how much information is
+    shared by the reference and system labelings; that is, how much knowing
+    one labeling reduces uncertainty about the other. It is 0 in the case that
+    the labelings are independent and increases as they become more predictive
+    of each other with a least upper bound of ``min(H(ref), H(sys))``.
+
+    Normalized mutual information converts mutual information into a similarity
+    metric with range [0, 1]. Multiple normalization schemes are available,
+    set by the ``norm_method`` argument, which takes the following values:
+
+    - ``min``  --  normalize by ``min(H(ref), H(sys))``
+    - ``sum``  --  normalize by ``0.5*(H(ref) + H(sys))``
+    - ``sqrt``  --  normalize by ``sqrt(H(ref)*H(sys))``
+    - ``max``  --  normalize by ``max(H(ref), H(sys))``
+
+    Parameters
+    ----------
+    ref_labels : ndarray, (n_frames,)
+        Reference labels.
+
+    sys_labels : ndarray, (n_frames,)
+        System labels.
+
+    cm : ndarray, (n_ref_classes, n_sys_classes)
+        Contingency matrix between reference and system labelings. If None,
+        will be computed automatically from ``ref_labels`` and ``sys_labels``.
+        Otherwise, the given value will be used and ``ref_labels`` and
+        ``sys_labels`` ignored.
+        (Default: None)
+
+    nats : bool, optional
+        If True, return nats. Otherwise, return bits.
+        (Default: False)
+
+    norm_method : str, optional
+        Normalization method for NMI computation.
+        (Default: 'sqrt')
+
+    Returns
+    -------
+    mi : float
+        Mutual information.
+
+    nmi : float
+        Normalized mutual information.
+
+    References
+    ----------
+    - https://en.wikipedia.org/wiki/Mutual_information
+    - Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory.
+    - Strehl, A. and Ghosh, J. (2002). "Cluster ensembles  -- A knowledge
+      reuse framework for combining multiple partitions." Journal of Machine
+      Learning Research.
+    - Nguyen, X.V., Epps, J., and Bailey, J. (2010). "Information theoretic
+      measures for clustering comparison: Variants, properties, normalization
+      and correction for chance." Journal of Machine Learning Research.
+    """
+    if norm_method not in VALID_NORM_METHODS:
+        raise ValueError('"%s" is not a valid NMI normalization method.' % norm_method)
+    log = np.log if nats else np.log2
+    if cm is None:
+        cm, _, _ = contingency_matrix(ref_labels, sys_labels)
+
+    # Special cases in which one or more of H(ref) and H(sys) is
+    # 0.
+    n_ref_classes, n_sys_classes = cm.shape
+    if xor(n_ref_classes == 1, n_sys_classes == 1):
+        # Case 1: MI is by definition 0 as should be NMI, regardless of
+        #         normalization.
+        return 0.0, 0.0
+    elif n_ref_classes == n_sys_classes == 1:
+        # Case 2: MI is 0, but as the data is not split, each clustering
+        #         is perfectly predictive of the other, so set NMI to 1.
+        return 0.0, 1.0
+
+    # Mutual information.
+    N = cm.sum()
+    ref_marginals = cm.sum(axis=1)
+    sys_marginals = cm.sum(axis=0)
+    ref_inds, sys_inds = np.nonzero(cm)
+    vals = cm[ref_inds, sys_inds] # Non-zero values of contingency matrix.
+    outer = ref_marginals[ref_inds]*sys_marginals[sys_inds]
+    sigma = (vals/N) * (
+        log(vals) - log(outer) + log(N))
+    mi = sigma.sum()
+    mi = max(mi, 0.)
+
+    # Normalized mutual information.
+    def h(p):
+        p = p[p > 0]
+        return max(-np.sum(p*log(p)), 0)
+    h_ref = h(ref_marginals / N)
+    h_sys = h(sys_marginals / N)
+    if norm_method == 'max':
+        denom = max(h_ref, h_sys)
+    elif norm_method == 'sum':
+        denom = 0.5*(h_ref + h_sys)
+    elif norm_method == 'sqrt':
+        denom = np.sqrt(h_ref*h_sys)
+    elif norm_method == 'min':
+        denom = min(h_ref, h_sys)
+    nmi = mi / denom
+    nmi = clip(nmi, 0., 1.)
+
+    return mi, nmi
+
+
+SCRIPT_DIR = os.path.abspath(os.path.dirname(__file__))
+MDEVAL_BIN = os.path.join(SCRIPT_DIR, 'md-eval-22.pl')
+FILE_REO = re.compile(r'(?<=Speaker Diarization for).+(?=\*\*\*)')
+SCORED_SPEAKER_REO = re.compile(r'(?<=SCORED SPEAKER TIME =)[\d.]+')
+MISS_SPEAKER_REO = re.compile(r'(?<=MISSED SPEAKER TIME =)[\d.]+')
+FA_SPEAKER_REO = re.compile(r'(?<=FALARM SPEAKER TIME =)[\d.]+')
+ERROR_SPEAKER_REO = re.compile(r'(?<=SPEAKER ERROR TIME =)[\d.]+')
+
+# TODO: Working with md-eval is a PITA, even with modifications to the
+#       reporting. Suggest looking into moving over to pyannote's
+#       implementation.
+def der(ref_turns, sys_turns, collar=0.0, ignore_overlaps=False, uem=None):
+    """Return overall diarization error rate.
+
+    Diarization error rate (DER), introduced for the NIST Rich Transcription
+    evaluations, is computed as the sum of the following:
+
+    - speaker error  --  percentage of scored time for which the wrong speaker
+      id is assigned within a speech region
+    - false alarm speech  --   percentage of scored time for which a nonspeech
+      region is incorrectly marked as containing speech
+    - missed speech  --  percentage of scored time for which a speech region is
+      incorrectly marked as not containing speech
+
+    As with word error rate, a score of zero indicates perfect performance and
+    higher scores (which may exceed 100) indicate poorer performance.
+
+    DER is computed as defined in the NIST RT-09 evaluation plan using version
+    22 of the ``md-eval.pl`` scoring script. When ``ignore_overlaps=False``,
+    this is equivalent to running the following command:
+
+        md-eval.pl -r ref.rttm -s sys.rttm -c collar -u uemf
+
+    where ``ref.rttm`` and ``sys.rttm`` are RTTM files produced from
+    ``ref_turns`` and ``sys_turns`` respectively and ``uemf`` is an
+    Un-partitioned Evaluation Map (UEM) file delimiting the scoring regions.
+    If a ``UEM`` instance is supplied via the ``uem`` argument, this file will
+    be created from the supplied UEM. Otherwise, it will be generated
+    automatically from ``ref_turns`` and ``sys_turns`` using the
+    ``uem.gen_uem`` function. Similarly, when ``ignore_overlaps=True``:
+
+        md-eval.pl -r ref.rttm -s sys.rttm -c collar -u uemf -1
+
+    Parameters
+    ----------
+    ref_turns : list of Turn
+        Reference speaker turns.
+
+    sys_turns : list of Turn
+        System speaker turns.
+
+    collar : float, optional
+        Size of forgiveness collar in seconds. Diarization output will not be
+        evaluated within +/- ``collar`` seconds of reference speaker
+        boundaries.
+        (Default: 0.0)
+
+    ignore_overlaps : bool, optional
+        If True, ignore regions in the reference diarization in which more
+        than one speaker is speaking.
+        (Default: False)
+
+    uem : UEM, optional
+        Evaluation map. If not supplied, will be generated automatically from
+        ``ref_turns`` and ``sys_turns``.
+        (Default: None)
+
+    Returns
+    -------
+    file_to_der, global_der : dict, float
+        Percent DER for each file id in the UEM and overall percent DER.
+
+    References
+    ----------
+    NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition
+    Evaluation Plan. https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
+    """
+    tmp_dir = tempfile.mkdtemp()
+
+    # Write RTTMs.
+    ref_rttm_fn = os.path.join(tmp_dir, 'ref.rttm')
+    write_rttm(ref_rttm_fn, ref_turns)
+    sys_rttm_fn = os.path.join(tmp_dir, 'sys.rttm')
+    write_rttm(sys_rttm_fn, sys_turns)
+
+    # Write UEM.
+    if uem is None:
+        uem = gen_uem(ref_turns, sys_turns)
+    uemf = os.path.join(tmp_dir, 'all.uem')
+    write_uem(uemf, uem)
+
+    # Actually score.
+    try:
+        cmd = [MDEVAL_BIN,
+               '-af',
+               '-r', ref_rttm_fn,
+               '-s', sys_rttm_fn,
+               '-c', str(collar),
+               '-u', uemf,
+              ]
+        if ignore_overlaps:
+            cmd.append('-1')
+        stdout = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
+    except subprocess.CalledProcessError as e:
+        stdout = e.output
+    finally:
+        shutil.rmtree(tmp_dir)
+
+    # Parse md-eval output to extract by-file and total scores.
+    stdout = stdout.decode('utf-8')
+    file_ids = [m.strip() for m in FILE_REO.findall(stdout)]
+    file_ids = [file_id[2:] if file_id.startswith('f=') else file_id
+                for file_id in file_ids]
+    scored_speaker_times = np.array(
+        [float(m) for m in SCORED_SPEAKER_REO.findall(stdout)])
+    miss_speaker_times = np.array(
+        [float(m) for m in MISS_SPEAKER_REO.findall(stdout)])
+    fa_speaker_times = np.array(
+        [float(m) for m in FA_SPEAKER_REO.findall(stdout)])
+    error_speaker_times = np.array(
+        [float(m) for m in ERROR_SPEAKER_REO.findall(stdout)])
+    with np.errstate(invalid='ignore', divide='ignore'):
+        error_times = miss_speaker_times + fa_speaker_times + error_speaker_times
+        ders = error_times / scored_speaker_times
+    ders[np.isnan(ders)] = 0 # Numerator and denominator both 0.
+    ders[np.isinf(ders)] = 1 # Numerator > 0, but denominator = 0.
+    ders *= 100. # Convert to percent.
+
+    # Reconcile with UEM, keeping in mind that in the edge case where no
+    # reference turns are observed for a file, md-eval doesn't report results
+    # for said file.
+    file_to_der_base = dict(zip(file_ids, ders))
+    file_to_der = {}
+    for file_id in uem:
+        try:
+            der = file_to_der_base[file_id]
+        except KeyError:
+            # Check for any system turns for that file, which should be FAs,
+            # assuming that the turns have been cropped to the UEM scoring
+            # regions.
+            n_sys_turns = len(
+                [turn for turn in sys_turns if turn.file_id == file_id])
+            der = 100. if n_sys_turns else 0.0
+        file_to_der[file_id] = der
+    global_der = file_to_der_base['ALL']
+
+    return file_to_der, global_der
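Note on the frame-level metrics in `scorelib/metrics.py` above: a small worked example may help when checking `contingency_matrix` and `bcubed` by hand. The sketch below is standalone and does not import `scorelib`. The helper names `toy_contingency` and `toy_bcubed` are hypothetical, and the contingency matrix is built with `np.add.at` rather than `coo_matrix` purely for brevity. It assumes every reference and system class occurs at least once, so no row or column sum is zero.

```python
import numpy as np


def toy_contingency(ref_labels, sys_labels):
    """Count frames for each (reference class, system class) pair."""
    ref_classes, ref_inds = np.unique(ref_labels, return_inverse=True)
    sys_classes, sys_inds = np.unique(sys_labels, return_inverse=True)
    cm = np.zeros((ref_classes.size, sys_classes.size), dtype=int)
    # Unbuffered add, so repeated (ref, sys) index pairs accumulate.
    np.add.at(cm, (ref_inds, sys_inds), 1)
    return cm


def toy_bcubed(cm):
    """Return B-cubed precision, recall, and F1 from a contingency matrix."""
    cm = np.asarray(cm, dtype='float64')
    weights = cm / cm.sum()  # Fraction of frames in each cell.
    precision = np.sum(weights * (cm / cm.sum(axis=0)[np.newaxis, :]))
    recall = np.sum(weights * (cm / cm.sum(axis=1)[:, np.newaxis]))
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


ref = np.array(['A', 'A', 'B', 'B'])
sys_ = np.array(['x', 'x', 'x', 'y'])
cm = toy_contingency(ref, sys_)
precision, recall, f1 = toy_bcubed(cm)
```

Here the matrix is `[[2, 0], [1, 1]]`. Precision is (2/4)(2/3) + (1/4)(1/3) + (1/4)(1) = 2/3, recall is (2/4)(1) + (1/4)(1/2) + (1/4)(1/2) = 3/4, and F1 is 12/17.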
diff --git a/scorelib/rttm.py b/scorelib/rttm.py
new file mode 100644
index 0000000..a1fe32e
--- /dev/null
+++ b/scorelib/rttm.py
@@ -0,0 +1,152 @@
+"""Functions for reading/writing RTTM files."""
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import unicode_literals
+
+from .turn import Turn
+from .utils import format_float
+
+__all__ = ['load_rttm', 'write_rttm', 'validate_rttm']
+
+
+def _parse_rttm_line(line):
+    line = line.decode('utf-8').strip()
+    fields = line.split()
+    if len(fields) < 9:
+        raise IOError('Number of fields < 9. LINE: "%s"' % line)
+    file_id = fields[1]
+    speaker_id = fields[7]
+
+    # Check valid turn onset.
+    try:
+        onset = float(fields[3])
+    except ValueError:
+        raise IOError('Turn onset not FLOAT. LINE: "%s"' % line)
+    if onset < 0:
+        raise IOError('Turn onset < 0 seconds. LINE: "%s"' % line)
+
+    # Check valid turn duration.
+    try:
+        dur = float(fields[4])
+    except ValueError:
+        raise IOError('Turn duration not FLOAT. LINE: "%s"' % line)
+    if dur <= 0:
+        raise IOError('Turn duration <= 0 seconds. LINE: "%s"' % line)
+
+    return Turn(onset, dur=dur, speaker_id=speaker_id, file_id=file_id)
+
+
+def load_rttm(rttmf):
+    """Load speaker turns from RTTM file.
+
+    For a description of the RTTM format, consult Appendix A of the NIST RT-09
+    evaluation plan.
+
+    Parameters
+    ----------
+    rttmf : str
+        Path to RTTM file.
+
+    Returns
+    -------
+    turns : list of Turn
+        Speaker turns.
+
+    speaker_ids : set
+        Speaker ids present in ``rttmf``.
+
+    file_ids : set
+        File ids present in ``rttmf``.
+
+    References
+    ----------
+    NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition
+    Evaluation Plan. https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
+    """
+    with open(rttmf, 'rb') as f:
+        turns = []
+        speaker_ids = set()
+        file_ids = set()
+        for line in f:
+            if line.startswith(b'SPKR-INFO'):
+                continue
+            turn = _parse_rttm_line(line)
+            turns.append(turn)
+            speaker_ids.add(turn.speaker_id)
+            file_ids.add(turn.file_id)
+    return turns, speaker_ids, file_ids
+
+
+def write_rttm(rttmf, turns, n_digits=3):
+    """Write speaker turns to RTTM file.
+
+    For a description of the RTTM format, consult Appendix A of the NIST RT-09
+    evaluation plan.
+
+    Parameters
+    ----------
+    rttmf : str
+        Path to output RTTM file.
+
+    turns : list of Turn
+        Speaker turns.
+
+    n_digits : int, optional
+        Number of decimal digits to round to.
+        (Default: 3)
+
+    References
+    ----------
+    NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition
+    Evaluation Plan. https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
+    """
+    with open(rttmf, 'wb') as f:
+        for turn in turns:
+            fields = ['SPEAKER',
+                      turn.file_id,
+                      '1',
+                      format_float(turn.onset, n_digits),
+                      format_float(turn.dur, n_digits),
+                      '<NA>',
+                      '<NA>',
+                      turn.speaker_id,
+                      '<NA>',
+                      '<NA>']
+            line = ' '.join(fields)
+            f.write(line.encode('utf-8'))
+            f.write(b'\n')
+
+
+def validate_rttm(rttmf):
+    """Validate RTTM file.
+
+    Parameters
+    ----------
+    rttmf : str
+        Path to RTTM file.
+
+    Returns
+    -------
+    file_ids : set of str
+        File ids present in ``rttmf``.
+
+    speaker_ids : set of str
+        Speaker ids present in ``rttmf``.
+
+    error_messages : list of str
+        Errors encountered in file.
+    """
+    with open(rttmf, 'rb') as f:
+        file_ids = set()
+        speaker_ids = set()
+        error_messages = []
+        for line in f:
+            if line.startswith(b'SPKR-INFO'):
+                continue
+            try:
+                turn = _parse_rttm_line(line)
+                file_ids.add(turn.file_id)
+                speaker_ids.add(turn.speaker_id)
+            except IOError as e:
+                error_messages.append(e.args[0])
+    return file_ids, speaker_ids, error_messages
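Review note: the checks in `_parse_rttm_line` can be tried out without the rest of the package. The following is a minimal standalone sketch, not the scorelib implementation: `parse_line`, the returned tuple layout, and the sample line are illustrative assumptions. It works on an already-decoded string and returns a plain tuple instead of a `Turn`.

```python
# Standalone sketch of the field checks performed by _parse_rttm_line.
# `parse_line` and its tuple return value are hypothetical, for illustration.
def parse_line(line):
    fields = line.strip().split()
    if len(fields) < 9:
        raise IOError('Number of fields < 9. LINE: "%s"' % line)
    try:
        onset = float(fields[3])  # turn onset in seconds
        dur = float(fields[4])    # turn duration in seconds
    except ValueError:
        raise IOError('Turn onset or duration not FLOAT. LINE: "%s"' % line)
    if onset < 0:
        raise IOError('Turn onset < 0 seconds. LINE: "%s"' % line)
    if dur <= 0:
        raise IOError('Turn duration <= 0 seconds. LINE: "%s"' % line)
    # (file id, speaker id, onset, duration)
    return fields[1], fields[7], onset, dur


# Same ten-field layout that write_rttm emits.
example = 'SPEAKER rec1 1 0.500 1.250 <NA> <NA> spk1 <NA> <NA>'
print(parse_line(example))  # ('rec1', 'spk1', 0.5, 1.25)
```

As in the patch, every failure surfaces as an `IOError`, which is what lets `validate_rttm` collect messages line by line rather than stopping at the first bad line.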
diff --git a/scorelib/score.py b/scorelib/score.py
new file mode 100644
index 0000000..0412345
--- /dev/null
+++ b/scorelib/score.py
@@ -0,0 +1,240 @@
+"""Functions for scoring paired system/reference RTTM files."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+from collections import defaultdict
+
+import numpy as np
+from scipy.linalg import block_diag
+
+from . import metrics
+from .six import iteritems, itervalues, python_2_unicode_compatible
+
+__all__ = ['score', 'turns_to_frames']
+
+
+def turns_to_frames(turns, score_onset, score_offset, step=0.010,
+                    as_string=False):
+    """Return frame-level labels corresponding to diarization.
+
+    Parameters
+    ----------
+    turns : list of Turn
+        Speaker turns. Should all be from single file.
+
+    score_onset : float
+        Scoring region onset in seconds from beginning of file.
+
+    score_offset : float
+        Scoring region offset in seconds from beginning of file.
+
+    step : float, optional
+        Frame step size in seconds.
+        (Default: 0.01)
+
+    as_string : bool, optional
+        If True, returned frame labels will be strings that are the class
+        names. Else, they will be integers.
+
+    Returns
+    -------
+    labels : ndarray, (n_frames,)
+        Frame-level labels.
+    """
+    file_ids = set([turn.file_id for turn in turns])
+    if score_offset <= score_onset:
+        raise ValueError('score_onset must be less than score_offset: '
+                         '%.3f >= %.3f' % (score_onset, score_offset))
+    if len(file_ids) > 1:
+        raise ValueError('Turns should be from a single file.')
+
+    # Create matrix whose i,j-th entry is True IFF the j-th speaker was
+    # present at frame i.
+    onsets = [turn.onset for turn in turns]
+    offsets = [turn.offset for turn in turns]
+    speaker_ids = [turn.speaker_id for turn in turns]
+    speaker_classes, speaker_class_inds = np.unique(
+        speaker_ids, return_inverse=True)
+    speaker_classes = np.concatenate([speaker_classes, ['non-speech']])
+    dur = score_offset - score_onset
+    n_frames = int(dur/step)
+    X = np.zeros((n_frames, speaker_classes.size), dtype='bool')
+    times = score_onset + step*np.arange(n_frames)
+    bis = np.searchsorted(times, onsets)
+    eis = np.searchsorted(times, offsets)
+    for bi, ei, speaker_class_ind in zip(bis, eis, speaker_class_inds):
+        X[bi:ei, speaker_class_ind] = True
+    is_nil = ~(X.any(axis=1))
+    X[is_nil, -1] = True
+
+    # Now, convert to frame-level labelings.
+    pows = 2**np.arange(X.shape[1])
+    labels = np.sum(pows*X, axis=1)
+    if as_string:
+        def speaker_mask(n):
+            return [bool(int(x))
+                    for x in np.binary_repr(n, speaker_classes.size)][::-1]
+        label_classes = np.array(['_'.join(speaker_classes[speaker_mask(n)])
+                                  for n in range(2**speaker_classes.size)])
+        try:
+            # Save some memory in the (majority of) cases where speaker ids are
+            # ASCII.
+            label_classes = label_classes.astype('string')
+        except UnicodeEncodeError:
+            pass
+        labels = label_classes[labels]
+    return labels
+
+
+@python_2_unicode_compatible
+class Scores(object):
+    """Structure containing metrics.
+
+    Parameters
+    ----------
+    der : float
+        Diarization error rate in percent.
+
+    bcubed_precision : float
+        B-cubed precision.
+
+    bcubed_recall : float
+        B-cubed recall.
+
+    bcubed_f1 : float
+        B-cubed F1.
+
+    tau_ref_sys : float
+        Value between 0 and 1 that is high when the reference diarization is
+        predictive of the system diarization and low when the reference
+        diarization provides essentially no information about the system
+        diarization.
+
+    tau_sys_ref : float
+        Value between 0 and 1 that is high when the system diarization is
+        predictive of the reference diarization and low when the system
+        diarization provides essentially no information about the reference
+        diarization.
+
+    ce_ref_sys : float
+        Conditional entropy of the reference diarization given the system
+        diarization.
+
+    ce_sys_ref : float
+        Conditional entropy of the system diarization given the reference
+        diarization.
+
+    mi : float
+        Mutual information.
+
+    nmi : float
+        Normalized mutual information.
+    """
+    def __init__(self, der, bcubed_precision, bcubed_recall, bcubed_f1,
+                 tau_ref_sys, tau_sys_ref, ce_ref_sys, ce_sys_ref, mi, nmi):
+        self.der = der
+        self.bcubed_precision = bcubed_precision
+        self.bcubed_recall = bcubed_recall
+        self.bcubed_f1 = bcubed_f1
+        self.tau_ref_sys = tau_ref_sys
+        self.tau_sys_ref = tau_sys_ref
+        self.ce_ref_sys = ce_ref_sys
+        self.ce_sys_ref = ce_sys_ref
+        self.mi = mi
+        self.nmi = nmi
+
+    def __str__(self):
+        return ('DER: %.2f, B-cubed precision: %.2f, B-cubed recall: %.2f, '
+                'B-cubed F1: %.2f, GKT(ref, sys): %.2f, GKT(sys, ref): %.2f, '
+                'CE(ref|sys): %.2f, CE(sys|ref): %.2f, MI: %.2f, NMI: %.2f' %
+                (self.der, self.bcubed_precision, self.bcubed_recall,
+                 self.bcubed_f1, self.tau_ref_sys, self.tau_sys_ref,
+                 self.ce_ref_sys, self.ce_sys_ref, self.mi, self.nmi))
+
+
+def score(ref_turns, sys_turns, uem, der_collar=0.0,
+          der_ignore_overlaps=True, step=0.010, nats=False):
+    """Score diarization.
+
+    Parameters
+    ----------
+    ref_turns : list of Turn
+        Reference speaker turns.
+
+    sys_turns : list of Turn
+        System speaker turns.
+
+    uem : UEM
+        Un-partitioned evaluation map.
+
+    der_collar : float, optional
+        Size of forgiveness collar in seconds to use in computing Diarization
+        Error Rate (DER). Diarization output will not be evaluated within +/-
+        ``der_collar`` seconds of reference speaker boundaries.
+        (Default: 0.0)
+
+    der_ignore_overlaps : bool, optional
+        If True, ignore regions in the reference diarization in which more
+        than one speaker is speaking when computing DER.
+        (Default: True)
+
+    step : float, optional
+        Frame step size in seconds. Not relevant for computation of DER.
+        (Default: 0.01)
+
+    nats : bool, optional
+        If True, use nats as unit for information theoretic metrics.
+        Otherwise, use bits.
+        (Default: False)
+
+    Returns
+    -------
+    file_to_scores : dict
+        Mapping from file ids in ``uem`` to ``Scores`` instances.
+
+    global_scores : Scores
+        Global scores.
+    """
+    def groupby(turns):
+        file_to_turns = defaultdict(list)
+        for turn in turns:
+            file_to_turns[turn.file_id].append(turn)
+        return file_to_turns
+    file_to_ref_turns = groupby(ref_turns)
+    file_to_sys_turns = groupby(sys_turns)
+
+    # Build contingency matrices.
+    file_to_cm = {}
+    for file_id, (score_onset, score_offset) in iteritems(uem):
+        ref_labels = turns_to_frames(
+            file_to_ref_turns[file_id], score_onset, score_offset, step=step)
+        sys_labels = turns_to_frames(
+            file_to_sys_turns[file_id], score_onset, score_offset, step=step)
+        file_to_cm[file_id], _, _ = metrics.contingency_matrix(
+            ref_labels, sys_labels)
+    global_cm = block_diag(*list(itervalues(file_to_cm)))
+
+    # Score.
+    def compute_metrics(cm):
+        bcubed_precision, bcubed_recall, bcubed_f1 = metrics.bcubed(
+            None, None, cm)
+        tau_ref_sys, tau_sys_ref = metrics.goodman_kruskal_tau(
+            None, None, cm)
+        ce_ref_sys = metrics.conditional_entropy(None, None, cm, nats)
+        ce_sys_ref = metrics.conditional_entropy(None, None, cm.T, nats)
+        mi, nmi = metrics.mutual_information(None, None, cm, nats)
+        return Scores(None, bcubed_precision, bcubed_recall, bcubed_f1,
+                      tau_ref_sys, tau_sys_ref, ce_ref_sys, ce_sys_ref,
+                      mi, nmi)
+    file_to_der, global_der = metrics.der(
+        ref_turns, sys_turns, der_collar, der_ignore_overlaps, uem)
+    file_to_scores = {}
+    for file_id, cm in iteritems(file_to_cm):
+        scores = compute_metrics(cm)
+        scores.der = file_to_der[file_id]
+        file_to_scores[file_id] = scores
+    global_scores = compute_metrics(global_cm)
+    global_scores.der = global_der
+
+    return file_to_scores, global_scores
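Review note: the integer labels returned by `turns_to_frames` are bitmasks, with one bit per speaker (in sorted order) plus a final bit for non-speech, so overlapping speech gets its own label. The pure-Python sketch below illustrates that encoding under stated assumptions: `frame_labels` and the `(onset, offset, speaker_id)` tuples are hypothetical stand-ins for the NumPy code and `Turn` objects in the patch.

```python
# Pure-Python sketch of the bitmask labeling used by turns_to_frames.
# `frame_labels` and the (onset, offset, speaker_id) tuples are hypothetical.
def frame_labels(turns, score_onset, score_offset, step=0.010):
    speakers = sorted(set(spk for _, _, spk in turns))
    n_frames = int((score_offset - score_onset) / step)
    labels = [0] * n_frames
    for onset, offset, spk in turns:
        bit = 1 << speakers.index(spk)
        for i in range(n_frames):
            t = score_onset + step * i  # start time of frame i
            if onset <= t < offset:
                labels[i] |= bit
    # Frames with no active speaker get the extra "non-speech" bit.
    nonspeech_bit = 1 << len(speakers)
    return [lab if lab else nonspeech_bit for lab in labels]


turns = [(0.0, 1.0, 'A'), (0.5, 1.5, 'B')]
print(frame_labels(turns, 0.0, 2.0, step=0.5))  # [1, 3, 2, 4]
```

Here label 1 is speaker A alone, 2 is speaker B alone, 3 is the A+B overlap, and 4 is non-speech. Because the label space grows as 2 to the power of the number of speakers, the `as_string` branch in the patch, which enumerates every possible label, is likely to become expensive for recordings with many speakers.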
diff --git a/scorelib/six.py b/scorelib/six.py
new file mode 100644
index 0000000..190c023
--- /dev/null
+++ b/scorelib/six.py
@@ -0,0 +1,868 @@
+"""Utilities for writing code that runs on Python 2 and 3"""
+
+# Copyright (c) 2010-2015 Benjamin Peterson
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from __future__ import absolute_import
+
+import functools
+import itertools
+import operator
+import sys
+import types
+
+__author__ = "Benjamin Peterson <benjamin@python.org>"
+__version__ = "1.10.0"
+
+
+# Useful for very coarse version differentiation.
+PY2 = sys.version_info[0] == 2
+PY3 = sys.version_info[0] == 3
+PY34 = sys.version_info[0:2] >= (3, 4)
+
+if PY3:
+    string_types = str,
+    integer_types = int,
+    class_types = type,
+    text_type = str
+    binary_type = bytes
+
+    MAXSIZE = sys.maxsize
+else:
+    string_types = basestring,
+    integer_types = (int, long)
+    class_types = (type, types.ClassType)
+    text_type = unicode
+    binary_type = str
+
+    if sys.platform.startswith("java"):
+        # Jython always uses 32 bits.
+        MAXSIZE = int((1 << 31) - 1)
+    else:
+        # It's possible to have sizeof(long) != sizeof(Py_ssize_t).
+        class X(object):
+
+            def __len__(self):
+                return 1 << 31
+        try:
+            len(X())
+        except OverflowError:
+            # 32-bit
+            MAXSIZE = int((1 << 31) - 1)
+        else:
+            # 64-bit
+            MAXSIZE = int((1 << 63) - 1)
+        del X
+
+
+def _add_doc(func, doc):
+    """Add documentation to a function."""
+    func.__doc__ = doc
+
+
+def _import_module(name):
+    """Import module, returning the module after the last dot."""
+    __import__(name)
+    return sys.modules[name]
+
+
+class _LazyDescr(object):
+
+    def __init__(self, name):
+        self.name = name
+
+    def __get__(self, obj, tp):
+        result = self._resolve()
+        setattr(obj, self.name, result)  # Invokes __set__.
+        try:
+            # This is a bit ugly, but it avoids running this again by
+            # removing this descriptor.
+            delattr(obj.__class__, self.name)
+        except AttributeError:
+            pass
+        return result
+
+
+class MovedModule(_LazyDescr):
+
+    def __init__(self, name, old, new=None):
+        super(MovedModule, self).__init__(name)
+        if PY3:
+            if new is None:
+                new = name
+            self.mod = new
+        else:
+            self.mod = old
+
+    def _resolve(self):
+        return _import_module(self.mod)
+
+    def __getattr__(self, attr):
+        _module = self._resolve()
+        value = getattr(_module, attr)
+        setattr(self, attr, value)
+        return value
+
+
+class _LazyModule(types.ModuleType):
+
+    def __init__(self, name):
+        super(_LazyModule, self).__init__(name)
+        self.__doc__ = self.__class__.__doc__
+
+    def __dir__(self):
+        attrs = ["__doc__", "__name__"]
+        attrs += [attr.name for attr in self._moved_attributes]
+        return attrs
+
+    # Subclasses should override this
+    _moved_attributes = []
+
+
+class MovedAttribute(_LazyDescr):
+
+    def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None):
+        super(MovedAttribute, self).__init__(name)
+        if PY3:
+            if new_mod is None:
+                new_mod = name
+            self.mod = new_mod
+            if new_attr is None:
+                if old_attr is None:
+                    new_attr = name
+                else:
+                    new_attr = old_attr
+            self.attr = new_attr
+        else:
+            self.mod = old_mod
+            if old_attr is None:
+                old_attr = name
+            self.attr = old_attr
+
+    def _resolve(self):
+        module = _import_module(self.mod)
+        return getattr(module, self.attr)
+
+
+class _SixMetaPathImporter(object):
+
+    """
+    A meta path importer to import six.moves and its submodules.
+
+    This class implements a PEP302 finder and loader. It should be compatible
+    with Python 2.5 and all existing versions of Python3
+    """
+
+    def __init__(self, six_module_name):
+        self.name = six_module_name
+        self.known_modules = {}
+
+    def _add_module(self, mod, *fullnames):
+        for fullname in fullnames:
+            self.known_modules[self.name + "." + fullname] = mod
+
+    def _get_module(self, fullname):
+        return self.known_modules[self.name + "." + fullname]
+
+    def find_module(self, fullname, path=None):
+        if fullname in self.known_modules:
+            return self
+        return None
+
+    def __get_module(self, fullname):
+        try:
+            return self.known_modules[fullname]
+        except KeyError:
+            raise ImportError("This loader does not know module " + fullname)
+
+    def load_module(self, fullname):
+        try:
+            # in case of a reload
+            return sys.modules[fullname]
+        except KeyError:
+            pass
+        mod = self.__get_module(fullname)
+        if isinstance(mod, MovedModule):
+            mod = mod._resolve()
+        else:
+            mod.__loader__ = self
+        sys.modules[fullname] = mod
+        return mod
+
+    def is_package(self, fullname):
+        """
+        Return true, if the named module is a package.
+
+        We need this method to get correct spec objects with
+        Python 3.4 (see PEP451)
+        """
+        return hasattr(self.__get_module(fullname), "__path__")
+
+    def get_code(self, fullname):
+        """Return None
+
+        Required, if is_package is implemented"""
+        self.__get_module(fullname)  # eventually raises ImportError
+        return None
+    get_source = get_code  # same as get_code
+
+_importer = _SixMetaPathImporter(__name__)
+
+
+class _MovedItems(_LazyModule):
+
+    """Lazy loading of moved objects"""
+    __path__ = []  # mark as package
+
+
+_moved_attributes = [
+    MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"),
+    MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"),
+    MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"),
+    MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"),
+    MovedAttribute("intern", "__builtin__", "sys"),
+    MovedAttribute("map", "itertools", "builtins", "imap", "map"),
+    MovedAttribute("getcwd", "os", "os", "getcwdu", "getcwd"),
+    MovedAttribute("getcwdb", "os", "os", "getcwd", "getcwdb"),
+    MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"),
+    MovedAttribute("reload_module", "__builtin__", "importlib" if PY34 else "imp", "reload"),
+    MovedAttribute("reduce", "__builtin__", "functools"),
+    MovedAttribute("shlex_quote", "pipes", "shlex", "quote"),
+    MovedAttribute("StringIO", "StringIO", "io"),
+    MovedAttribute("UserDict", "UserDict", "collections"),
+    MovedAttribute("UserList", "UserList", "collections"),
+    MovedAttribute("UserString", "UserString", "collections"),
+    MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"),
+    MovedAttribute("zip", "itertools", "builtins", "izip", "zip"),
+    MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"),
+    MovedModule("builtins", "__builtin__"),
+    MovedModule("configparser", "ConfigParser"),
+    MovedModule("copyreg", "copy_reg"),
+    MovedModule("dbm_gnu", "gdbm", "dbm.gnu"),
+    MovedModule("_dummy_thread", "dummy_thread", "_dummy_thread"),
+    MovedModule("http_cookiejar", "cookielib", "http.cookiejar"),
+    MovedModule("http_cookies", "Cookie", "http.cookies"),
+    MovedModule("html_entities", "htmlentitydefs", "html.entities"),
+    MovedModule("html_parser", "HTMLParser", "html.parser"),
+    MovedModule("http_client", "httplib", "http.client"),
+    MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"),
+    MovedModule("email_mime_nonmultipart", "email.MIMENonMultipart", "email.mime.nonmultipart"),
+    MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"),
+    MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"),
+    MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"),
+    MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"),
+    MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"),
+    MovedModule("cPickle", "cPickle", "pickle"),
+    MovedModule("queue", "Queue"),
+    MovedModule("reprlib", "repr"),
+    MovedModule("socketserver", "SocketServer"),
+    MovedModule("_thread", "thread", "_thread"),
+    MovedModule("tkinter", "Tkinter"),
+    MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"),
+    MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"),
+    MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"),
+    MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"),
+    MovedModule("tkinter_tix", "Tix", "tkinter.tix"),
+    MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"),
+    MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"),
+    MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"),
+    MovedModule("tkinter_colorchooser", "tkColorChooser",
+                "tkinter.colorchooser"),
+    MovedModule("tkinter_commondialog", "tkCommonDialog",
+                "tkinter.commondialog"),
+    MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"),
+    MovedModule("tkinter_font", "tkFont", "tkinter.font"),
+    MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"),
+    MovedModule("tkinter_tksimpledialog", "tkSimpleDialog",
+                "tkinter.simpledialog"),
+    MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"),
+    MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"),
+    MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"),
+    MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"),
+    MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"),
+    MovedModule("xmlrpc_server", "SimpleXMLRPCServer", "xmlrpc.server"),
+]
+# Add windows specific modules.
+if sys.platform == "win32":
+    _moved_attributes += [
+        MovedModule("winreg", "_winreg"),
+    ]
+
+for attr in _moved_attributes:
+    setattr(_MovedItems, attr.name, attr)
+    if isinstance(attr, MovedModule):
+        _importer._add_module(attr, "moves." + attr.name)
+del attr
+
+_MovedItems._moved_attributes = _moved_attributes
+
+moves = _MovedItems(__name__ + ".moves")
+_importer._add_module(moves, "moves")
+
+
+class Module_six_moves_urllib_parse(_LazyModule):
+
+    """Lazy loading of moved objects in six.moves.urllib_parse"""
+
+
+_urllib_parse_moved_attributes = [
+    MovedAttribute("ParseResult", "urlparse", "urllib.parse"),
+    MovedAttribute("SplitResult", "urlparse", "urllib.parse"),
+    MovedAttribute("parse_qs", "urlparse", "urllib.parse"),
+    MovedAttribute("parse_qsl", "urlparse", "urllib.parse"),
+    MovedAttribute("urldefrag", "urlparse", "urllib.parse"),
+    MovedAttribute("urljoin", "urlparse", "urllib.parse"),
+    MovedAttribute("urlparse", "urlparse", "urllib.parse"),
+    MovedAttribute("urlsplit", "urlparse", "urllib.parse"),
+    MovedAttribute("urlunparse", "urlparse", "urllib.parse"),
+    MovedAttribute("urlunsplit", "urlparse", "urllib.parse"),
+    MovedAttribute("quote", "urllib", "urllib.parse"),
+    MovedAttribute("quote_plus", "urllib", "urllib.parse"),
+    MovedAttribute("unquote", "urllib", "urllib.parse"),
+    MovedAttribute("unquote_plus", "urllib", "urllib.parse"),
+    MovedAttribute("urlencode", "urllib", "urllib.parse"),
+    MovedAttribute("splitquery", "urllib", "urllib.parse"),
+    MovedAttribute("splittag", "urllib", "urllib.parse"),
+    MovedAttribute("splituser", "urllib", "urllib.parse"),
+    MovedAttribute("uses_fragment", "urlparse", "urllib.parse"),
+    MovedAttribute("uses_netloc", "urlparse", "urllib.parse"),
+    MovedAttribute("uses_params", "urlparse", "urllib.parse"),
+    MovedAttribute("uses_query", "urlparse", "urllib.parse"),
+    MovedAttribute("uses_relative", "urlparse", "urllib.parse"),
+]
+for attr in _urllib_parse_moved_attributes:
+    setattr(Module_six_moves_urllib_parse, attr.name, attr)
+del attr
+
+Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes
+
+_importer._add_module(Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse"),
+                      "moves.urllib_parse", "moves.urllib.parse")
+
+
+class Module_six_moves_urllib_error(_LazyModule):
+
+    """Lazy loading of moved objects in six.moves.urllib_error"""
+
+
+_urllib_error_moved_attributes = [
+    MovedAttribute("URLError", "urllib2", "urllib.error"),
+    MovedAttribute("HTTPError", "urllib2", "urllib.error"),
+    MovedAttribute("ContentTooShortError", "urllib", "urllib.error"),
+]
+for attr in _urllib_error_moved_attributes:
+    setattr(Module_six_moves_urllib_error, attr.name, attr)
+del attr
+
+Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes
+
+_importer._add_module(Module_six_moves_urllib_error(__name__ + ".moves.urllib.error"),
+                      "moves.urllib_error", "moves.urllib.error")
+
+
+class Module_six_moves_urllib_request(_LazyModule):
+
+    """Lazy loading of moved objects in six.moves.urllib_request"""
+
+
+_urllib_request_moved_attributes = [
+    MovedAttribute("urlopen", "urllib2", "urllib.request"),
+    MovedAttribute("install_opener", "urllib2", "urllib.request"),
+    MovedAttribute("build_opener", "urllib2", "urllib.request"),
+    MovedAttribute("pathname2url", "urllib", "urllib.request"),
+    MovedAttribute("url2pathname", "urllib", "urllib.request"),
+    MovedAttribute("getproxies", "urllib", "urllib.request"),
+    MovedAttribute("Request", "urllib2", "urllib.request"),
+    MovedAttribute("OpenerDirector", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPDefaultErrorHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"),
+    MovedAttribute("ProxyHandler", "urllib2", "urllib.request"),
+    MovedAttribute("BaseHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"),
+    MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"),
+    MovedAttribute("FileHandler", "urllib2", "urllib.request"),
+    MovedAttribute("FTPHandler", "urllib2", "urllib.request"),
+    MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"),
+    MovedAttribute("UnknownHandler", "urllib2", "urllib.request"),
+    MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"),
+    MovedAttribute("urlretrieve", "urllib", "urllib.request"),
+    MovedAttribute("urlcleanup", "urllib", "urllib.request"),
+    MovedAttribute("URLopener", "urllib", "urllib.request"),
+    MovedAttribute("FancyURLopener", "urllib", "urllib.request"),
+    MovedAttribute("proxy_bypass", "urllib", "urllib.request"),
+]
+for attr in _urllib_request_moved_attributes:
+    setattr(Module_six_moves_urllib_request, attr.name, attr)
+del attr
+
+Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes
+
+_importer._add_module(Module_six_moves_urllib_request(__name__ + ".moves.urllib.request"),
+                      "moves.urllib_request", "moves.urllib.request")
+
+
+class Module_six_moves_urllib_response(_LazyModule):
+
+    """Lazy loading of moved objects in six.moves.urllib_response"""
+
+
+_urllib_response_moved_attributes = [
+    MovedAttribute("addbase", "urllib", "urllib.response"),
+    MovedAttribute("addclosehook", "urllib", "urllib.response"),
+    MovedAttribute("addinfo", "urllib", "urllib.response"),
+    MovedAttribute("addinfourl", "urllib", "urllib.response"),
+]
+for attr in _urllib_response_moved_attributes:
+    setattr(Module_six_moves_urllib_response, attr.name, attr)
+del attr
+
+Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes
+
+_importer._add_module(Module_six_moves_urllib_response(__name__ + ".moves.urllib.response"),
+                      "moves.urllib_response", "moves.urllib.response")
+
+
+class Module_six_moves_urllib_robotparser(_LazyModule):
+
+    """Lazy loading of moved objects in six.moves.urllib_robotparser"""
+
+
+_urllib_robotparser_moved_attributes = [
+    MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"),
+]
+for attr in _urllib_robotparser_moved_attributes:
+    setattr(Module_six_moves_urllib_robotparser, attr.name, attr)
+del attr
+
+Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes
+
+_importer._add_module(Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser"),
+                      "moves.urllib_robotparser", "moves.urllib.robotparser")
+
+
+class Module_six_moves_urllib(types.ModuleType):
+
+    """Create a six.moves.urllib namespace that resembles the Python 3 namespace"""
+    __path__ = []  # mark as package
+    parse = _importer._get_module("moves.urllib_parse")
+    error = _importer._get_module("moves.urllib_error")
+    request = _importer._get_module("moves.urllib_request")
+    response = _importer._get_module("moves.urllib_response")
+    robotparser = _importer._get_module("moves.urllib_robotparser")
+
+    def __dir__(self):
+        return ['parse', 'error', 'request', 'response', 'robotparser']
+
+_importer._add_module(Module_six_moves_urllib(__name__ + ".moves.urllib"),
+                      "moves.urllib")
+
+
+def add_move(move):
+    """Add an item to six.moves."""
+    setattr(_MovedItems, move.name, move)
+
+
+def remove_move(name):
+    """Remove item from six.moves."""
+    try:
+        delattr(_MovedItems, name)
+    except AttributeError:
+        try:
+            del moves.__dict__[name]
+        except KeyError:
+            raise AttributeError("no such move, %r" % (name,))
+
+
+if PY3:
+    _meth_func = "__func__"
+    _meth_self = "__self__"
+
+    _func_closure = "__closure__"
+    _func_code = "__code__"
+    _func_defaults = "__defaults__"
+    _func_globals = "__globals__"
+else:
+    _meth_func = "im_func"
+    _meth_self = "im_self"
+
+    _func_closure = "func_closure"
+    _func_code = "func_code"
+    _func_defaults = "func_defaults"
+    _func_globals = "func_globals"
+
+
+try:
+    advance_iterator = next
+except NameError:
+    def advance_iterator(it):
+        return it.next()
+next = advance_iterator
+
+
+try:
+    callable = callable
+except NameError:
+    def callable(obj):
+        return any("__call__" in klass.__dict__ for klass in type(obj).__mro__)
+
+
+if PY3:
+    def get_unbound_function(unbound):
+        return unbound
+
+    create_bound_method = types.MethodType
+
+    def create_unbound_method(func, cls):
+        return func
+
+    Iterator = object
+else:
+    def get_unbound_function(unbound):
+        return unbound.im_func
+
+    def create_bound_method(func, obj):
+        return types.MethodType(func, obj, obj.__class__)
+
+    def create_unbound_method(func, cls):
+        return types.MethodType(func, None, cls)
+
+    class Iterator(object):
+
+        def next(self):
+            return type(self).__next__(self)
+
+    callable = callable
+_add_doc(get_unbound_function,
+         """Get the function out of a possibly unbound function""")
+
+
+get_method_function = operator.attrgetter(_meth_func)
+get_method_self = operator.attrgetter(_meth_self)
+get_function_closure = operator.attrgetter(_func_closure)
+get_function_code = operator.attrgetter(_func_code)
+get_function_defaults = operator.attrgetter(_func_defaults)
+get_function_globals = operator.attrgetter(_func_globals)
+
+
+if PY3:
+    def iterkeys(d, **kw):
+        return iter(d.keys(**kw))
+
+    def itervalues(d, **kw):
+        return iter(d.values(**kw))
+
+    def iteritems(d, **kw):
+        return iter(d.items(**kw))
+
+    def iterlists(d, **kw):
+        return iter(d.lists(**kw))
+
+    viewkeys = operator.methodcaller("keys")
+
+    viewvalues = operator.methodcaller("values")
+
+    viewitems = operator.methodcaller("items")
+else:
+    def iterkeys(d, **kw):
+        return d.iterkeys(**kw)
+
+    def itervalues(d, **kw):
+        return d.itervalues(**kw)
+
+    def iteritems(d, **kw):
+        return d.iteritems(**kw)
+
+    def iterlists(d, **kw):
+        return d.iterlists(**kw)
+
+    viewkeys = operator.methodcaller("viewkeys")
+
+    viewvalues = operator.methodcaller("viewvalues")
+
+    viewitems = operator.methodcaller("viewitems")
+
+_add_doc(iterkeys, "Return an iterator over the keys of a dictionary.")
+_add_doc(itervalues, "Return an iterator over the values of a dictionary.")
+_add_doc(iteritems,
+         "Return an iterator over the (key, value) pairs of a dictionary.")
+_add_doc(iterlists,
+         "Return an iterator over the (key, [values]) pairs of a dictionary.")
+
+
+if PY3:
+    def b(s):
+        return s.encode("latin-1")
+
+    def u(s):
+        return s
+    unichr = chr
+    import struct
+    int2byte = struct.Struct(">B").pack
+    del struct
+    byte2int = operator.itemgetter(0)
+    indexbytes = operator.getitem
+    iterbytes = iter
+    import io
+    StringIO = io.StringIO
+    BytesIO = io.BytesIO
+    _assertCountEqual = "assertCountEqual"
+    if sys.version_info[1] <= 1:
+        _assertRaisesRegex = "assertRaisesRegexp"
+        _assertRegex = "assertRegexpMatches"
+    else:
+        _assertRaisesRegex = "assertRaisesRegex"
+        _assertRegex = "assertRegex"
+else:
+    def b(s):
+        return s
+    # Workaround for standalone backslash
+
+    def u(s):
+        return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
+    unichr = unichr
+    int2byte = chr
+
+    def byte2int(bs):
+        return ord(bs[0])
+
+    def indexbytes(buf, i):
+        return ord(buf[i])
+    iterbytes = functools.partial(itertools.imap, ord)
+    import StringIO
+    StringIO = BytesIO = StringIO.StringIO
+    _assertCountEqual = "assertItemsEqual"
+    _assertRaisesRegex = "assertRaisesRegexp"
+    _assertRegex = "assertRegexpMatches"
+_add_doc(b, """Byte literal""")
+_add_doc(u, """Text literal""")
+
+
+def assertCountEqual(self, *args, **kwargs):
+    return getattr(self, _assertCountEqual)(*args, **kwargs)
+
+
+def assertRaisesRegex(self, *args, **kwargs):
+    return getattr(self, _assertRaisesRegex)(*args, **kwargs)
+
+
+def assertRegex(self, *args, **kwargs):
+    return getattr(self, _assertRegex)(*args, **kwargs)
+
+
+if PY3:
+    exec_ = getattr(moves.builtins, "exec")
+
+    def reraise(tp, value, tb=None):
+        if value is None:
+            value = tp()
+        if value.__traceback__ is not tb:
+            raise value.with_traceback(tb)
+        raise value
+
+else:
+    def exec_(_code_, _globs_=None, _locs_=None):
+        """Execute code in a namespace."""
+        if _globs_ is None:
+            frame = sys._getframe(1)
+            _globs_ = frame.f_globals
+            if _locs_ is None:
+                _locs_ = frame.f_locals
+            del frame
+        elif _locs_ is None:
+            _locs_ = _globs_
+        exec("""exec _code_ in _globs_, _locs_""")
+
+    exec_("""def reraise(tp, value, tb=None):
+    raise tp, value, tb
+""")
+
+
+if sys.version_info[:2] == (3, 2):
+    exec_("""def raise_from(value, from_value):
+    if from_value is None:
+        raise value
+    raise value from from_value
+""")
+elif sys.version_info[:2] > (3, 2):
+    exec_("""def raise_from(value, from_value):
+    raise value from from_value
+""")
+else:
+    def raise_from(value, from_value):
+        raise value
+
+
+print_ = getattr(moves.builtins, "print", None)
+if print_ is None:
+    def print_(*args, **kwargs):
+        """The new-style print function for Python 2.4 and 2.5."""
+        fp = kwargs.pop("file", sys.stdout)
+        if fp is None:
+            return
+
+        def write(data):
+            if not isinstance(data, basestring):
+                data = str(data)
+            # If the file has an encoding, encode unicode with it.
+            if (isinstance(fp, file) and
+                    isinstance(data, unicode) and
+                    fp.encoding is not None):
+                errors = getattr(fp, "errors", None)
+                if errors is None:
+                    errors = "strict"
+                data = data.encode(fp.encoding, errors)
+            fp.write(data)
+        want_unicode = False
+        sep = kwargs.pop("sep", None)
+        if sep is not None:
+            if isinstance(sep, unicode):
+                want_unicode = True
+            elif not isinstance(sep, str):
+                raise TypeError("sep must be None or a string")
+        end = kwargs.pop("end", None)
+        if end is not None:
+            if isinstance(end, unicode):
+                want_unicode = True
+            elif not isinstance(end, str):
+                raise TypeError("end must be None or a string")
+        if kwargs:
+            raise TypeError("invalid keyword arguments to print()")
+        if not want_unicode:
+            for arg in args:
+                if isinstance(arg, unicode):
+                    want_unicode = True
+                    break
+        if want_unicode:
+            newline = unicode("\n")
+            space = unicode(" ")
+        else:
+            newline = "\n"
+            space = " "
+        if sep is None:
+            sep = space
+        if end is None:
+            end = newline
+        for i, arg in enumerate(args):
+            if i:
+                write(sep)
+            write(arg)
+        write(end)
+if sys.version_info[:2] < (3, 3):
+    _print = print_
+
+    def print_(*args, **kwargs):
+        fp = kwargs.get("file", sys.stdout)
+        flush = kwargs.pop("flush", False)
+        _print(*args, **kwargs)
+        if flush and fp is not None:
+            fp.flush()
+
+_add_doc(reraise, """Reraise an exception.""")
+
+if sys.version_info[0:2] < (3, 4):
+    def wraps(wrapped, assigned=functools.WRAPPER_ASSIGNMENTS,
+              updated=functools.WRAPPER_UPDATES):
+        def wrapper(f):
+            f = functools.wraps(wrapped, assigned, updated)(f)
+            f.__wrapped__ = wrapped
+            return f
+        return wrapper
+else:
+    wraps = functools.wraps
+
+
+def with_metaclass(meta, *bases):
+    """Create a base class with a metaclass."""
+    # This requires a bit of explanation: the basic idea is to make a dummy
+    # metaclass for one level of class instantiation that replaces itself with
+    # the actual metaclass.
+    class metaclass(meta):
+
+        def __new__(cls, name, this_bases, d):
+            return meta(name, bases, d)
+    return type.__new__(metaclass, 'temporary_class', (), {})
+
+
+def add_metaclass(metaclass):
+    """Class decorator for creating a class with a metaclass."""
+    def wrapper(cls):
+        orig_vars = cls.__dict__.copy()
+        slots = orig_vars.get('__slots__')
+        if slots is not None:
+            if isinstance(slots, str):
+                slots = [slots]
+            for slots_var in slots:
+                orig_vars.pop(slots_var)
+        orig_vars.pop('__dict__', None)
+        orig_vars.pop('__weakref__', None)
+        return metaclass(cls.__name__, cls.__bases__, orig_vars)
+    return wrapper
+
+
+def python_2_unicode_compatible(klass):
+    """
+    A decorator that defines __unicode__ and __str__ methods under Python 2.
+    Under Python 3 it does nothing.
+
+    To support Python 2 and 3 with a single code base, define a __str__ method
+    returning text and apply this decorator to the class.
+    """
+    if PY2:
+        if '__str__' not in klass.__dict__:
+            raise ValueError("@python_2_unicode_compatible cannot be applied "
+                             "to %s because it doesn't define __str__()." %
+                             klass.__name__)
+        klass.__unicode__ = klass.__str__
+        klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
+    return klass
+
+
+# Complete the moves implementation.
+# This code is at the end of this module to speed up module loading.
+# Turn this module into a package.
+__path__ = []  # required for PEP 302 and PEP 451
+__package__ = __name__  # see PEP 366 @ReservedAssignment
+if globals().get("__spec__") is not None:
+    __spec__.submodule_search_locations = []  # PEP 451 @UndefinedVariable
+# Remove other six meta path importers, since they cause problems. This can
+# happen if six is removed from sys.modules and then reloaded. (Setuptools does
+# this for some reason.)
+if sys.meta_path:
+    for i, importer in enumerate(sys.meta_path):
+        # Here's some real nastiness: Another "instance" of the six module might
+        # be floating around. Therefore, we can't use isinstance() to check for
+        # the six meta path importer, since the other six instance will have
+        # inserted an importer with different class.
+        if (type(importer).__name__ == "_SixMetaPathImporter" and
+                importer.name == __name__):
+            del sys.meta_path[i]
+            break
+    del i, importer
+# Finally, add the importer to the meta path import hook.
+sys.meta_path.append(_importer)
diff --git a/scorelib/turn.py b/scorelib/turn.py
new file mode 100644
index 0000000..f0f9a74
--- /dev/null
+++ b/scorelib/turn.py
@@ -0,0 +1,169 @@
+"""Classes for representing speaker turns and interacting with RTTM files."""
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import unicode_literals
+from collections import defaultdict
+
+from intervaltree import IntervalTree
+
+from .six import iterkeys, python_2_unicode_compatible
+from .utils import clip, warn, xor
+
+__all__ = ['merge_turns', 'trim_turns', 'Turn']
+
+
+@python_2_unicode_compatible
+class Turn(object):
+    """Speaker turn class.
+
+    A turn represents a segment of audio attributed to a single speaker.
+
+    Parameters
+    ----------
+    onset : float
+        Onset of turn in seconds from beginning of recording.
+
+    offset : float, optional
+        Offset of turn in seconds from beginning of recording. If None, then
+        computed from ``onset`` and ``dur``.
+        (Default: None)
+
+    dur : float, optional
+        Duration of turn in seconds. If None, then computed from ``onset`` and
+        ``offset``.
+        (Default: None)
+
+    speaker_id : str, optional
+        Speaker id.
+        (Default: None)
+
+    file_id : str, optional
+        File id.
+        (Default: None)
+    """
+    def __init__(self, onset, offset=None, dur=None, speaker_id=None,
+                 file_id=None):
+        if not xor(offset is None, dur is None):
+            raise ValueError('Exactly one of offset or dur must be given')
+        if onset < 0:
+            raise ValueError('Turn onset must be >= 0 seconds')
+        if offset is not None:
+            dur = offset - onset
+        if dur <= 0:
+            raise ValueError('Turn duration must be > 0 seconds')
+        if offset is None:
+            offset = onset + dur
+        self.onset = onset
+        self.offset = offset
+        self.dur = dur
+        self.speaker_id = speaker_id
+        self.file_id = file_id
+
+    def __str__(self):
+        return ('FILE: %s, SPEAKER: %s, ONSET: %f, OFFSET: %f, DUR: %f' %
+                (self.file_id, self.speaker_id, self.onset, self.offset,
+                 self.dur))
+
+    def __repr__(self):
+        speaker_id = ("'%s'" % self.speaker_id if self.speaker_id is not None
+                      else None)
+        file_id = ("'%s'" % self.file_id if self.file_id is not None
+                   else None)
+        return ('Turn(%f, %f, None, %s, %s)' %
+                (self.onset, self.offset, speaker_id, file_id))
+
+
+def merge_turns(turns):
+    """Merge overlapping turns by same speaker within each file."""
+    # Split turns by file and speaker.
+    turn_map = defaultdict(list)
+    file_to_speakers = defaultdict(set)
+    for turn in turns:
+        turn_map[(turn.file_id, turn.speaker_id)].append(turn)
+        file_to_speakers[turn.file_id].add(turn.speaker_id)
+
+    # Merge separately within each file and for each speaker.
+    new_turns = []
+    file_ids = set([file_id for file_id, _ in iterkeys(turn_map)])
+    for file_id in sorted(file_ids):
+        for speaker_id in sorted(file_to_speakers[file_id]):
+            speaker_turns = turn_map[(file_id, speaker_id)]
+            speaker_it = IntervalTree.from_tuples(
+                [(turn.onset, turn.offset) for turn in speaker_turns])
+            n_turns_pre = len(speaker_it)
+            speaker_it.merge_overlaps()
+            n_turns_post = len(speaker_it)
+            if n_turns_post < n_turns_pre:
+                speaker_turns = []
+                for intrvl in speaker_it:
+                    speaker_turns.append(
+                        Turn(intrvl.begin, intrvl.end, speaker_id=speaker_id,
+                             file_id=file_id))
+                speaker_turns = sorted(
+                    speaker_turns, key=lambda x: (x.onset, x.offset))
+                warn('Merging overlapping speaker turns. '
+                     'FILE: %s, SPEAKER: %s' % (file_id, speaker_id))
+            new_turns.extend(speaker_turns)
+    turns = new_turns
+
+    return turns
+
+
+def trim_turns(turns, uem=None, score_onset=None, score_offset=None):
+    """Trim turns to scoring regions defined in UEM.
+
+    Parameters
+    ----------
+    turns : list of Turn
+        Speaker turns.
+
+    uem : UEM, optional
+        Un-partitioned evaluation map.
+        (Default: None)
+
+    score_onset : float, optional
+        Onset of scoring region in seconds from beginning of file. Only valid
+        if ``uem=None``.
+        (Default: None)
+
+    score_offset : float, optional
+        Offset of scoring region in seconds from beginning of file. Only
+        valid if ``uem=None``.
+        (Default: None)
+
+    Returns
+    -------
+    trimmed_turns : list of Turn
+        Trimmed turns.
+    """
+    if uem is not None:
+        if not (score_onset is None and score_offset is None):
+            raise ValueError('Either uem or score_onset and score_offset must '
+                             'be specified.')
+    else:
+        if score_onset is None or score_offset is None:
+            raise ValueError('Either uem or score_onset and score_offset must '
+                             'be specified.')
+        if score_onset < 0:
+            raise ValueError('Scoring region onset must be >= 0 seconds')
+        if score_offset <= score_onset:
+            raise ValueError('Scoring region duration must be > 0 seconds')
+
+    new_turns = []
+    for turn in turns:
+        if uem is not None:
+            if turn.file_id not in uem:
+                warn('Skipping turn from file not in UEM. TURN: %s' % turn)
+                continue
+            score_onset, score_offset = uem[turn.file_id]
+        turn_onset = clip(turn.onset, score_onset, score_offset)
+        turn_offset = clip(turn.offset, score_onset, score_offset)
+        if turn.onset != turn_onset or turn.offset != turn_offset:
+            warn('Truncating turn extending past UEM scoring region '
+                 '[%.3f, %.3f]. TURN: %s' % (score_onset, score_offset, turn))
+        if turn_offset <= turn_onset:
+            continue
+        new_turns.append(Turn(
+            turn_onset, turn_offset, speaker_id=turn.speaker_id,
+            file_id=turn.file_id))
+    return new_turns
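Note on `trim_turns`: the trimming step clips each turn's onset and offset to the scoring region and drops any turn left with no duration. The snippet below is a minimal standalone sketch of that logic on plain `(onset, offset)` tuples. `trim_interval` is a hypothetical helper introduced only for illustration and is not part of `scorelib`; `clip` mirrors the helper of the same name in `scorelib/utils.py`.

```python
def clip(x, lower, upper):
    """Clip x to [lower, upper] (same logic as scorelib.utils.clip)."""
    return min(max(x, lower), upper)


def trim_interval(onset, offset, score_onset, score_offset):
    """Hypothetical helper: clip one interval to the scoring region.

    Returns the clipped (onset, offset), or None when nothing of the
    interval remains inside the scoring region.
    """
    new_onset = clip(onset, score_onset, score_offset)
    new_offset = clip(offset, score_onset, score_offset)
    if new_offset <= new_onset:
        return None
    return (new_onset, new_offset)


# Scoring region [10, 20]: the first turn is truncated, the second is
# kept unchanged, and the third lies entirely outside and is dropped.
turns = [(5.0, 12.0), (12.5, 18.0), (25.0, 30.0)]
trimmed = [trim_interval(on, off, 10.0, 20.0) for on, off in turns]
```

A turn that starts and ends outside the region on the same side collapses to a zero-length interval after clipping, which is why the `new_offset <= new_onset` check is enough to discard it.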
diff --git a/scorelib/uem.py b/scorelib/uem.py
new file mode 100644
index 0000000..53c66d5
--- /dev/null
+++ b/scorelib/uem.py
@@ -0,0 +1,147 @@
+"""Functions for reading/writing and manipulating NIST un-partitioned
+evaluation maps.
+
+An un-partitioned evaluation map (UEM) specifies the time regions within each
+file that will be scored.
+"""
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import unicode_literals
+from collections import defaultdict, Iterable, Mapping
+import itertools
+import os
+
+from .six import iteritems, iterkeys
+from .utils import format_float
+
+__all__ = ['gen_uem', 'load_uem', 'write_uem', 'UEM']
+
+
+class UEM(dict):
+    """Un-partitioned evaluation map (UEM).
+
+    A UEM defines a mapping from file ids to scoring regions.
+    """
+    def __setitem__(self, k, v):
+        if not isinstance(v, Iterable):
+            raise ValueError('Not a valid interval.')
+        v = list(v)
+        if len(v) != 2:
+            raise ValueError('Not a valid interval.')
+        onset, offset = v
+        onset = float(onset)
+        offset = float(offset)
+        if onset >= offset:
+            raise ValueError('Not a valid interval.')
+        super(UEM, self).__setitem__(k, (onset, offset))
+
+    def update(self, iterable, **kwargs):
+        if iterable is not None:
+            if isinstance(iterable, Mapping):
+                for k, v in iteritems(iterable):
+                    self.__setitem__(k, v)
+            else:
+                for k, v in iterable:
+                    self.__setitem__(k, v)
+        if kwargs:
+            for k, v in iteritems(kwargs):
+                self.__setitem__(k, v)
+
+
+def load_uem(uemf):
+    """Load un-partitioned evaluation map from file in NIST format.
+
+    The un-partitioned evaluation map (UEM) file format contains
+    one record per line, each line consisting of four space-delimited
+    fields:
+
+    - file id  --  file id
+    - channel  --  channel (1-indexed)
+    - onset  --  onset of evaluation region in seconds from beginning of file
+    - offset  --  offset of evaluation region in seconds from beginning of
+      file
+
+    Lines beginning with semicolons are regarded as comments and ignored.
+
+    Parameters
+    ----------
+    uemf : str
+        Path to UEM file.
+
+    Returns
+    -------
+    uem : UEM
+        Evaluation map.
+    """
+    with open(uemf, 'rb') as f:
+        uem = UEM()
+        for line in f:
+            if line.startswith(b';'):
+                continue
+            fields = line.decode('utf-8').strip().split()
+            file_id = os.path.splitext(fields[0])[0]
+            onset = float(fields[2])
+            offset = float(fields[3])
+            uem[file_id] = (onset, offset)
+    return uem
+
+
+def write_uem(uemf, uem, n_digits=3):
+    """Write un-partitioned evaluation map to file in NIST format.
+
+    Parameters
+    ----------
+    uemf : str
+        Path to output UEM file.
+
+    uem : UEM
+        Evaluation map.
+
+    n_digits : int, optional
+        Number of decimal digits to round to.
+        (Default: 3)
+    """
+    with open(uemf, 'wb') as f:
+        for file_id in sorted(iterkeys(uem)):
+            onset, offset = uem[file_id]
+            line = ' '.join([file_id,
+                             '1',
+                             format_float(onset, n_digits),
+                             format_float(offset, n_digits)
+                            ])
+            f.write(line.encode('utf-8'))
+            f.write(b'\n')
+
+
+def gen_uem(ref_turns, sys_turns):
+    """Generate un-partitioned evaluation map.
+
+    For each file, the extent of the scoring region is set as follows:
+
+    - onset = min(minimum reference onset, minimum system onset)
+    - offset = max(maximum reference offset, maximum system offset)
+
+    Parameters
+    ----------
+    ref_turns : list of Turn
+        Reference speaker turns.
+
+    sys_turns : list of Turn
+        System speaker turns.
+
+    Returns
+    -------
+    uem : UEM
+        Un-partitioned evaluation map.
+    """
+    file_ids = set()
+    onsets = defaultdict(set)
+    offsets = defaultdict(set)
+    for turn in itertools.chain(ref_turns, sys_turns):
+        file_ids.add(turn.file_id)
+        onsets[turn.file_id].add(turn.onset)
+        offsets[turn.file_id].add(turn.offset)
+    uem = UEM()
+    for file_id in file_ids:
+        uem[file_id] = (min(onsets[file_id]), max(offsets[file_id]))
+    return uem
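Note on the UEM format: the `load_uem` docstring describes one record per line with file id, channel, onset, and offset, plus comment lines starting with a semicolon. The snippet below is a small standalone sketch of parsing that layout from a string. `parse_uem` and the sample records are hypothetical and exist only for illustration; unlike `load_uem`, this sketch also skips blank lines, which is an assumption on my part rather than behavior of the patch.

```python
import os

# Hypothetical sample content in the four-field layout described above.
UEM_TEXT = """\
;; file_id channel onset offset
rec1.flac 1 0.000 30.500
rec2 1 5.250 60.000
"""


def parse_uem(text):
    """Hypothetical helper: map file ids to (onset, offset) in seconds."""
    regions = {}
    for line in text.splitlines():
        # Skip comments; skipping blank lines is an added assumption.
        if not line.strip() or line.startswith(';'):
            continue
        file_id, _channel, onset, offset = line.split()[:4]
        # As in load_uem, any file extension is stripped from the id.
        regions[os.path.splitext(file_id)[0]] = (float(onset), float(offset))
    return regions


regions = parse_uem(UEM_TEXT)
```

Because the extension is stripped with `os.path.splitext`, a file id that itself contains a dot would lose its final dotted component, so ids are best kept dot-free.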
diff --git a/scorelib/utils.py b/scorelib/utils.py
new file mode 100644
index 0000000..e01527b
--- /dev/null
+++ b/scorelib/utils.py
@@ -0,0 +1,65 @@
+"""Utility functions."""
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import unicode_literals
+import sys
+
+from . import six
+
+__all__ = ['clip', 'error', 'format_float', 'info', 'warn', 'xor']
+
+
+def error(msg, file=sys.stderr):
+    """Log error message ``msg`` to stderr."""
+    msg = 'ERROR: %s' % msg
+    if six.PY2:
+        msg = msg.encode('utf-8')
+    print(msg, file=file)
+
+
+def info(msg, print_level=False, file=sys.stdout):
+    """Log info message ``msg`` to stdout."""
+    if print_level:
+        msg = 'INFO: %s' % msg
+    if six.PY2:
+        msg = msg.encode('utf-8')
+    print(msg, file=file)
+
+
+def warn(msg, file=sys.stderr):
+    """Log warning message ``msg`` to stderr."""
+    msg = 'WARNING: %s' % msg
+    if six.PY2:
+        msg = msg.encode('utf-8')
+    print(msg, file=file)
+
+
+def xor(x, y):
+    """Return truth value of ``x`` XOR ``y``."""
+    return bool(x) != bool(y)
+
+
+def format_float(x, n_digits=3):
+    """Format floating point number for output as string.
+
+    Parameters
+    ----------
+    x : float
+        Number.
+
+    n_digits : int, optional
+        Number of decimal digits to round to.
+        (Default: 3)
+
+    Returns
+    -------
+    s : str
+        Formatted string.
+    """
+    fmt_str = '%%.%df' % n_digits
+    return fmt_str % round(x, n_digits)
+
+
+def clip(x, lower, upper):
+    """Clip ``x`` to [``lower``, ``upper``]."""
+    return min(max(x, lower), upper)
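Note on `format_float`: the doubled `%%` in `'%%.%df'` is what lets the precision be filled in first, leaving a normal `%.Nf` format string for the second substitution. The snippet below restates the function standalone so the two steps can be seen separately; the sample values are arbitrary and chosen to avoid rounding ties.

```python
def format_float(x, n_digits=3):
    """Format x with a fixed number of decimal digits."""
    # Step 1: '%%' becomes a literal '%', and '%d' takes n_digits,
    # so n_digits=3 yields the format string '%.3f'.
    fmt_str = '%%.%df' % n_digits
    # Step 2: apply the generated format string to the rounded value.
    return fmt_str % round(x, n_digits)


fmt_two_digits = '%%.%df' % 2
onset_str = format_float(30.5)
offset_str = format_float(2.0, 1)
```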
diff --git a/validate_rttm.py b/validate_rttm.py
new file mode 100755
index 0000000..cd1af64
--- /dev/null
+++ b/validate_rttm.py
@@ -0,0 +1,49 @@
+#!/usr/bin/env python
+"""Validate RTTM files.
+
+To validate the RTTM files ``f1.rttm``, ``f2.rttm``, ..., run:
+
+    python validate_rttm.py f1.rttm f2.rttm ...
+
+For each file, this reports the following:
+
+- the number of unique file ids found
+- the number of speaker ids found
+- each line containing an error, along with an error message
+"""
+from __future__ import print_function
+from __future__ import unicode_literals
+import argparse
+import sys
+
+from scorelib import __version__ as VERSION
+from scorelib.rttm import validate_rttm
+from scorelib.utils import error, info
+
+
+if __name__ == '__main__':
+    # Parse command line arguments.
+    parser = argparse.ArgumentParser(
+        description='Validate RTTM files.', add_help=True,
+        usage='%(prog)s [options] rttm_fns')
+    parser.add_argument(
+        'rttm_fns', nargs='+', help='RTTM files')
+    parser.add_argument(
+        '--version', action='version',
+        version='%(prog)s ' + VERSION)
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    args = parser.parse_args()
+
+    for rttm_fn in args.rttm_fns:
+        info('Validating %s...' % rttm_fn)
+        file_ids, speaker_ids, error_messages = validate_rttm(rttm_fn)
+        file_ids = sorted(file_ids)
+        info('%d file ids found: %s' %
+             (len(file_ids), ', '.join(file_ids)))
+        speaker_ids = sorted(speaker_ids)
+        info('%d speaker ids found: %s' %
+             (len(speaker_ids), ', '.join(speaker_ids)))
+        for msg in error_messages:
+            error(msg, file=sys.stdout)