GitMA Metrics#

Note

This module provides methods to compute inter-annotator agreement (IAA) metrics. These are integrated in the CatmaProject class.

Examples of how to use the metrics functions can be found in the demo notebook.

filter_ac_by_tag(ac1, ac2, tag_filter=None, filter_both_ac=True)#

Returns lists of annotations filtered by tags. If filter_both_ac=False, only the first collection’s annotations are filtered.

Parameters
  • ac1 (AnnotationCollection) – First annotation collection.

  • ac2 (AnnotationCollection) – Second annotation collection.

  • tag_filter (list, optional) – The list of tags to be included. Defaults to None.

  • filter_both_ac (bool, optional) – If True, both collections are filtered. Defaults to True.

Returns

Two filtered lists of annotations.

Return type

Tuple[List[Annotation]]
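The effect of tag_filter and filter_both_ac can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the Ann class and the filter_by_tag function are hypothetical stand-ins, and it works on plain lists rather than on AnnotationCollection objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation (not the GitMA class)."""
    tag: str
    start_point: int
    end_point: int


def filter_by_tag(
    annotations1: List[Ann],
    annotations2: List[Ann],
    tag_filter: Optional[List[str]] = None,
    filter_both: bool = True,
) -> Tuple[List[Ann], List[Ann]]:
    # Without a filter, both lists are returned unchanged.
    if tag_filter is None:
        return list(annotations1), list(annotations2)
    allowed = set(tag_filter)
    filtered1 = [a for a in annotations1 if a.tag in allowed]
    # The second list is only filtered when filter_both is True.
    if filter_both:
        filtered2 = [a for a in annotations2 if a.tag in allowed]
    else:
        filtered2 = list(annotations2)
    return filtered1, filtered2


first = [Ann("event", 0, 5), Ann("non_event", 6, 10)]
second = [Ann("non_event", 0, 5), Ann("event", 6, 10)]
only_first = filter_by_tag(first, second, ["event"], filter_both=False)
both = filter_by_tag(first, second, ["event"], filter_both=True)
```

With filter_both=False the second list keeps both of its annotations, while with filter_both=True only the "event" annotation survives in each list.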

get_same_text(annotation_list1, annotation_list2)#

Excludes all text parts that were annotated by only one annotator.

Parameters
  • annotation_list1 (List[Annotation]) – Annotations from first collection.

  • annotation_list2 (List[Annotation]) – Annotations from second collection.

Returns

Annotations from both collections.

Return type

Tuple[List[Annotation]]
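One possible reading of this exclusion is sketched below. It assumes that "text annotated by both annotators" means the character range between the later of the two first start points and the earlier of the two last end points; the library may define it differently. Spans are plain (start_point, end_point) tuples and restrict_to_shared_range is a hypothetical name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_point, end_point)


def restrict_to_shared_range(
    spans1: List[Span], spans2: List[Span]
) -> Tuple[List[Span], List[Span]]:
    if not spans1 or not spans2:
        return [], []
    # Assumed shared range: later first start to earlier last end.
    lower = max(min(s for s, _ in spans1), min(s for s, _ in spans2))
    upper = min(max(e for _, e in spans1), max(e for _, e in spans2))

    def inside(span: Span) -> bool:
        return span[0] >= lower and span[1] <= upper

    return [s for s in spans1 if inside(s)], [s for s in spans2 if inside(s)]


annotator_a = [(0, 10), (20, 30), (40, 50)]
annotator_b = [(18, 28), (41, 49)]
kept_a, kept_b = restrict_to_shared_range(annotator_a, annotator_b)
```

Here the shared range is 18 to 49, so the first and last spans of annotator_a fall outside it and are dropped.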

test_max_overlap(silver_annotation, second_annotator_annotations)#

Looks for the best-matching annotation in the second annotator’s annotations.

Parameters
  • silver_annotation (Annotation) – Annotation that will be matched.

  • second_annotator_annotations (list) – List of annotations.

Returns

The best-matching annotation object.

Return type

Annotation
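A best-match search of this kind can be sketched as picking the candidate that shares the most characters with the target. The names Ann, shared_characters and best_match are hypothetical, and returning None when nothing overlaps is an assumption made for the sketch, not documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation (not the GitMA class)."""
    tag: str
    start_point: int
    end_point: int


def shared_characters(a: Ann, b: Ann) -> int:
    # Length of the intersection of both spans, 0 if they are disjoint.
    return max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))


def best_match(target: Ann, candidates: List[Ann]) -> Optional[Ann]:
    overlapping = [c for c in candidates if shared_characters(target, c) > 0]
    if not overlapping:
        return None  # assumption: no overlap means no match
    return max(overlapping, key=lambda c: shared_characters(target, c))


target = Ann("event", 10, 20)
candidates = [Ann("a", 0, 12), Ann("b", 11, 25), Ann("c", 30, 40)]
match = best_match(target, candidates)
```

Candidate "b" shares nine characters with the target, "a" shares two, and "c" none, so "b" is selected.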

test_overlap(an1, an2)#

Test if annotation an2 starts or ends within annotation an1’s span.

Parameters
  • an1 (Annotation) – First annotation.

  • an2 (Annotation) – Second annotation.

Returns

True if any overlap exists.

Return type

bool
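Read literally, the test above amounts to two interval checks. The sketch below uses plain (start_point, end_point) tuples and a hypothetical function name; how the boundaries are treated (here the end point is exclusive for starts and the start point is exclusive for ends) is an assumption.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start_point, end_point)


def starts_or_ends_within(an1: Span, an2: Span) -> bool:
    start1, end1 = an1
    start2, end2 = an2
    # an2 begins somewhere inside an1.
    starts_inside = start1 <= start2 < end1
    # an2 finishes somewhere inside an1.
    ends_inside = start1 < end2 <= end1
    return starts_inside or ends_inside


reference = (10, 20)
results = {
    "starts inside": starts_or_ends_within(reference, (15, 25)),
    "ends inside": starts_or_ends_within(reference, (5, 12)),
    "adjacent": starts_or_ends_within(reference, (20, 30)),
    "encloses": starts_or_ends_within(reference, (5, 25)),
}
```

Note that under this literal reading a span that strictly encloses the reference neither starts nor ends inside it, so the check is not symmetric; swapping the arguments would detect that case.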

get_overlap_percentage(an_pair)#

Computes the overlap percentage of two annotations by averaging the overlapping proportion of both annotation spans.

Parameters

an_pair (List[Annotation]) – Two overlapping annotations.

Returns

Overlap percentage between 0 and 1.0.

Return type

float
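As a worked illustration of the averaging described above: the shared length is divided by each span's own length and the two proportions are averaged. The function name and the tuple representation are hypothetical, and both spans are assumed to have a positive length.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start_point, end_point)


def overlap_percentage(span1: Span, span2: Span) -> float:
    start1, end1 = span1
    start2, end2 = span2
    # Number of characters covered by both spans.
    shared = max(0, min(end1, end2) - max(start1, start2))
    # Proportion of each span that is shared (assumes non-empty spans).
    proportion1 = shared / (end1 - start1)
    proportion2 = shared / (end2 - start2)
    return (proportion1 + proportion2) / 2


value = overlap_percentage((0, 10), (5, 25))
identical = overlap_percentage((3, 8), (3, 8))
```

For the spans (0, 10) and (5, 25), five characters are shared: half of the first span and a quarter of the second, giving 0.375. Identical spans give 1.0.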

get_confusion_matrix(pair_list, level='tag')#

Generates a confusion matrix from a list of paired annotations.

Parameters
  • pair_list (List[Tuple[Annotation]]) – List of overlapping annotations as tuples.

  • level (str, optional) – ‘tag’ or any property with prefix ‘prop:’ in the annotation collections. Defaults to ‘tag’.

Returns

Confusion matrix as pandas data frame.

Return type

pd.DataFrame
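The counting behind such a confusion matrix can be sketched with the standard library alone. This illustration takes pairs of label strings instead of Annotation tuples and returns nested dictionaries rather than a data frame; confusion_counts is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(label_pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    # Count how often each (first label, second label) combination occurs.
    counts = Counter(label_pairs)
    labels = sorted({label for pair in label_pairs for label in pair})
    # Rows: label of the first annotation; columns: label of the second.
    return {row: {col: counts[(row, col)] for col in labels} for row in labels}


pairs = [
    ("event", "event"),
    ("event", "non_event"),
    ("non_event", "non_event"),
    ("event", "event"),
]
matrix = confusion_counts(pairs)
```

If pandas is available, pd.DataFrame.from_dict(matrix, orient="index") turns the nested dictionary into a table with the same rows and columns.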

class EmptyTag#

Bases: object

Helper class for missing annotations.

class EmptyAnnotation(start_point, end_point, property_dict)#

Bases: object

Helper class for missing annotations.

Parameters
  • start_point (int) – Text pointer.

  • end_point (int) – Text pointer.

  • property_dict (dict) – Property dictionary.

get_annotation_pairs(ac1, ac2, tag_filter=None, filter_both_ac=False, property_filter=None, verbose=True)#

For each annotation in ac1, finds the best matching annotation (maximum overlap) in ac2. Where there is no matching annotation in ac2, an EmptyAnnotation is substituted. Returns a list of tuples of the matched pairs.

The filter parameters can be used so that only annotations using one of the specified tags or the specified property are included.

Parameters
  • ac1 (AnnotationCollection) – First annotation collection.

  • ac2 (AnnotationCollection) – Second annotation collection.

  • tag_filter (list, optional) – The list of tags to be included. Defaults to None (no filter / all tags included).

  • filter_both_ac (bool, optional) – If True the tag_filter is applied to both collections. Defaults to False.

  • property_filter (str, optional) – If not None, only annotations with this property are included. Defaults to None (no filter / all annotations included).

  • verbose (bool, optional) – Whether to print results to stdout. Defaults to True.

Returns

List of paired annotation tuples.

Return type

List[Union[Tuple[Annotation, EmptyAnnotation], Tuple[Annotation, Annotation]]]
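The pairing logic described above can be sketched as follows. Ann and Missing are hypothetical stand-ins for Annotation and EmptyAnnotation, the sketch takes plain lists instead of collections, and giving the placeholder the span of the unmatched annotation is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation (not the GitMA class)."""
    tag: str
    start_point: int
    end_point: int


@dataclass
class Missing:
    """Hypothetical placeholder used when no partner annotation exists."""
    start_point: int
    end_point: int


def shared(a: Ann, b: Ann) -> int:
    return max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))


def pair_annotations(
    first: List[Ann], second: List[Ann]
) -> List[Tuple[Ann, Union[Ann, Missing]]]:
    pairs: List[Tuple[Ann, Union[Ann, Missing]]] = []
    for a in first:
        scored = [(shared(a, b), b) for b in second]
        # Pick the candidate with the largest overlap, if there is any candidate.
        overlap, best = max(scored, key=lambda item: item[0], default=(0, None))
        if overlap > 0:
            pairs.append((a, best))
        else:
            # Assumption: the placeholder copies the unmatched annotation's span.
            pairs.append((a, Missing(a.start_point, a.end_point)))
    return pairs


first = [Ann("event", 0, 10), Ann("event", 20, 30)]
second = [Ann("non_event", 2, 9)]
pairs = pair_annotations(first, second)
```

The first annotation is paired with its overlapping partner, while the second has no partner and receives a placeholder.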

get_iaa_data(annotation_pairs, level='tag', include_empty_annotations=True)#

Yields 3-tuples (Coder, Item, Label) for nltk.AnnotationTask data input. If level is not “tag”, it must be the name of a property that exists in all annotations.

an_list = [
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation())
]

is converted to:

aTask_data = [
    (1, 1, 'non_event'),
    (2, 1, 'non_event'),
    (1, 2, 'non_event'),
    (2, 2, 'non_event'),
    (1, 3, 'non_event'),
    (2, 3, 'stative_event'),
]

Parameters
  • annotation_pairs (List[Tuple[Annotation]]) – List of annotation pairs.

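  • level (str, optional) – ‘tag’ or any property in the annotation collections with the prefix ‘prop:’. Defaults to ‘tag’.

  • include_empty_annotations (bool, optional) – Whether pairs that contain an EmptyAnnotation are included. Defaults to True.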
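The conversion into (Coder, Item, Label) triples can be sketched with two nested loops. This illustration works on pairs of label strings rather than Annotation objects, numbers coders and items from 1, and uses the hypothetical name iaa_triples.

```python
from typing import Iterator, List, Tuple


def iaa_triples(label_pairs: List[Tuple[str, str]]) -> Iterator[Tuple[int, int, str]]:
    # Each pair is one item; the position inside the pair identifies the coder.
    for item, pair in enumerate(label_pairs, start=1):
        for coder, label in enumerate(pair, start=1):
            yield (coder, item, label)


label_pairs = [
    ("non_event", "non_event"),
    ("non_event", "non_event"),
    ("non_event", "stative_event"),
]
task_data = list(iaa_triples(label_pairs))
```

Three pairs produce six triples, two per item. A list of this shape is what nltk.metrics.agreement.AnnotationTask accepts as its data argument.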

gamma_agreement(project, annotation_collections, alpha=3, beta=1, delta_empty=0.01, n_samples=30, precision_level=0.01)#

Computes Gamma IAA based on Mathet et al., “The Unified and Holistic Method Gamma”, using the pygamma-agreement library. For further installation steps of pygamma-agreement and the different disagreement options, see the project’s GitHub page.

Parameters
  • project (CatmaProject) – The CATMA project that holds the used annotation collections.

  • annotation_collections (List[AnnotationCollection]) – List of annotation collections to be included.

  • alpha (int, optional) – Coefficient weighting the positional dissimilarity value. Defaults to 3.

  • beta (int, optional) – Coefficient weighting the categorical dissimilarity value. Defaults to 1.

  • delta_empty (float, optional) – Dissimilarity value used when a unit is paired with the empty unit. Defaults to 0.01.

  • n_samples (int, optional) – Number of random continua sampled from this continuum. Defaults to 30.

  • precision_level (float, optional) – Error percentage of the gamma estimation, given as a float or as one of “high”, “medium”, or “low”. Defaults to 0.01.

Raises

ImportWarning – If pygamma-agreement is not installed.
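To make the roles of alpha, beta and delta_empty more concrete, the sketch below follows the combined dissimilarity as described by Mathet et al.: a positional term and a categorical term, each scaled by delta_empty and weighted by alpha and beta. It does not call pygamma-agreement, the function names are hypothetical, and the categorical distance (0 for equal tags, 1 otherwise) is the simplest possible assumption.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start_point, end_point)


def positional_dissimilarity(u: Span, v: Span, delta_empty: float) -> float:
    (start_u, end_u), (start_v, end_v) = u, v
    # Boundary distances relative to the summed span lengths, squared.
    ratio = (abs(start_u - start_v) + abs(end_u - end_v)) / (
        (end_u - start_u) + (end_v - start_v)
    )
    return ratio ** 2 * delta_empty


def categorical_dissimilarity(tag_u: str, tag_v: str, delta_empty: float) -> float:
    # Assumption: tags are either identical (0) or entirely different (1).
    return (0.0 if tag_u == tag_v else 1.0) * delta_empty


def combined_dissimilarity(
    u: Span, tag_u: str, v: Span, tag_v: str,
    alpha: float = 3, beta: float = 1, delta_empty: float = 0.01,
) -> float:
    return (
        alpha * positional_dissimilarity(u, v, delta_empty)
        + beta * categorical_dissimilarity(tag_u, tag_v, delta_empty)
    )


# delta_empty=1.0 is used here only to keep the numbers easy to read.
same_tag = combined_dissimilarity((0, 10), "event", (2, 12), "event", delta_empty=1.0)
other_tag = combined_dissimilarity((0, 10), "event", (2, 12), "non_event", delta_empty=1.0)
```

For two spans shifted by two characters, the positional term is (4 / 20)² = 0.04, so with alpha=3 the matching tags give 0.12 and the differing tags add beta times 1.0 on top. A larger alpha therefore penalises boundary disagreement more strongly relative to tag disagreement.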