GitMA Metrics

Note

This module provides methods to compute inter-annotator agreement metrics. These are integrated in the CatmaProject class.

Examples of how to use the metrics functions can be found in the demo notebook.

filter_ac_by_tag(ac1, ac2, tag_filter=None, filter_both_ac=True)

Returns lists of annotations filtered by tags. If filter_both_ac=False, only the first collection’s annotations get filtered.

Parameters
  • ac1 (AnnotationCollection) – First annotation collection.

  • ac2 (AnnotationCollection) – Second annotation collection.

  • tag_filter (list, optional) – The list of tags to be included. Defaults to None.

  • filter_both_ac (bool, optional) – If True, both collections get filtered. Defaults to True.

Returns

Two filtered lists of annotations.

Return type

Tuple[List[Annotation]]
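
The filtering behaviour described above can be sketched in plain Python. This is a minimal illustration, not GitMA's implementation: the Span dataclass and the filter_by_tag helper are hypothetical stand-ins, plain lists stand in for the annotation collections, and treating tag_filter=None as "no filtering" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for a GitMA Annotation."""
    tag: str
    start_point: int
    end_point: int


def filter_by_tag(
    annotations1: List[Span],
    annotations2: List[Span],
    tag_filter: Optional[List[str]] = None,
    filter_both: bool = True,
) -> Tuple[List[Span], List[Span]]:
    """Keep only annotations whose tag is listed in tag_filter."""
    if tag_filter is None:
        # Assumption: without a filter both lists are returned unchanged.
        return list(annotations1), list(annotations2)
    kept1 = [an for an in annotations1 if an.tag in tag_filter]
    # The second list is only filtered when filter_both is True.
    if filter_both:
        kept2 = [an for an in annotations2 if an.tag in tag_filter]
    else:
        kept2 = list(annotations2)
    return kept1, kept2


first = [Span("process", 0, 10), Span("stative_event", 12, 20)]
second = [Span("process", 1, 9), Span("non_event", 12, 20)]

both_filtered = filter_by_tag(first, second, tag_filter=["process"])
first_only = filter_by_tag(first, second, tag_filter=["process"], filter_both=False)
```

With filter_both=False the second list keeps its non_event annotation, which is useful when one collection serves as the reference and the other should be compared against it in full.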

get_same_text(annotation_list1, annotation_list2)

Excludes all text parts that were annotated by only one annotator.

Parameters
  • annotation_list1 (List[Annotation]) – Annotations from first collection.

  • annotation_list2 (List[Annotation]) – Annotations from second collection.

Returns

Annotations from both collections.

Return type

Tuple[List[Annotation]]

test_max_overlap(silver_annotation, second_annotator_annotations)

Looks for the best-matching annotation among the second annotator’s annotations.

Parameters
  • silver_annotation (Annotation) – Annotation that will be matched.

  • second_annotator_annotations (list) – List of annotations to search.

Returns

The best-matching Annotation object.

Return type

Annotation
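
One way to picture the best-match search is to rank the candidates by how many characters they share with the reference annotation. The sketch below assumes exactly that ranking criterion, which is not confirmed by this page; Span, shared_length and best_match are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a GitMA Annotation."""
    start_point: int
    end_point: int


def shared_length(an1: Span, an2: Span) -> int:
    """Number of characters covered by both spans (0 if they are disjoint)."""
    shared = min(an1.end_point, an2.end_point) - max(an1.start_point, an2.start_point)
    return max(0, shared)


def best_match(reference: Span, candidates: List[Span]) -> Span:
    """Return the candidate that shares the most characters with reference."""
    return max(candidates, key=lambda candidate: shared_length(reference, candidate))


reference = Span(10, 30)
candidates = [Span(0, 12), Span(15, 28), Span(29, 40)]
match = best_match(reference, candidates)
```

Here the second candidate wins because it shares 13 characters with the reference, against 2 and 1 for the others.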

test_overlap(an1, an2)

Tests whether annotation 2 starts or ends within the span of annotation 1.

Parameters
  • an1 – The first annotation.

  • an2 – The second annotation.

Returns

True if any overlap exists.

Return type

bool
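
The check described above can be sketched as follows. This is an illustration under assumptions, not the library's code: Span and spans_overlap are hypothetical names, and the handling of the span boundaries (start inclusive, end exclusive) is a guess.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a GitMA Annotation."""
    start_point: int
    end_point: int


def spans_overlap(an1: Span, an2: Span) -> bool:
    """True if an2 starts or ends inside the span of an1."""
    starts_inside = an1.start_point <= an2.start_point < an1.end_point
    ends_inside = an1.start_point < an2.end_point <= an1.end_point
    return starts_inside or ends_inside


partial = spans_overlap(Span(0, 10), Span(5, 15))     # an2 starts inside an1
disjoint = spans_overlap(Span(0, 10), Span(20, 30))   # no shared text
enclosing = spans_overlap(Span(5, 10), Span(0, 20))   # an2 encloses an1
```

A check of exactly this form is not symmetric: when the second span fully encloses the first, it neither starts nor ends inside it, so enclosing is False here. Whether GitMA handles that case separately is not stated on this page.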

get_overlap_percentage(an_pair)

Computes the overlap percentage of two annotations by averaging the overlapping proportion of both annotation spans.

Parameters

an_pair (List[Annotation]) – Two overlapping annotations.

Returns

Overlap percentage between 0 and 1.0.

Return type

float
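
As a worked example of the averaging described above: the shared length is divided by the length of each span, and the two proportions are averaged. The sketch assumes non-empty spans; Span and overlap_percentage are hypothetical names rather than GitMA's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Span:
    """Hypothetical stand-in for a GitMA Annotation."""
    start_point: int
    end_point: int


def overlap_percentage(pair: Tuple[Span, Span]) -> float:
    """Average of the overlapping proportion of both spans (0.0 to 1.0)."""
    an1, an2 = pair
    shared = min(an1.end_point, an2.end_point) - max(an1.start_point, an2.start_point)
    shared = max(0, shared)  # disjoint spans share nothing
    # Assumption: both spans have a length greater than zero.
    proportion1 = shared / (an1.end_point - an1.start_point)
    proportion2 = shared / (an2.end_point - an2.start_point)
    return (proportion1 + proportion2) / 2


shifted = overlap_percentage((Span(0, 10), Span(5, 15)))  # 5 of 10 and 5 of 10
nested = overlap_percentage((Span(0, 10), Span(0, 5)))    # 5 of 10 and 5 of 5
```

For the shifted pair both proportions are 0.5, giving 0.5. For the nested pair the proportions are 0.5 and 1.0, giving 0.75.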

get_confusion_matrix(pair_list, level='tag')

Generates a confusion matrix from a list of paired annotations.

Parameters
  • pair_list (List[Tuple[Annotation]]) – List of overlapping annotations as tuples.

  • level (str, optional) – ‘tag’ or any property with prefix ‘prop:’ in the annotation collections. Defaults to ‘tag’.

Returns

Confusion matrix as pandas data frame.

Return type

pd.DataFrame
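
To illustrate what such a confusion matrix contains, the following dependency-free sketch counts how often each label of the first annotator co-occurs with each label of the second. It uses nested dictionaries instead of the pandas data frame that the function returns, and it starts from plain label pairs rather than Annotation objects; confusion_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(label_pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count label co-occurrences: rows are annotator 1, columns annotator 2."""
    counts = Counter(label_pairs)
    # Use the same sorted label set for rows and columns.
    labels = sorted({label for pair in label_pairs for label in pair})
    return {row: {col: counts[(row, col)] for col in labels} for row in labels}


label_pairs = [
    ("process", "process"),
    ("process", "stative_event"),
    ("non_event", "non_event"),
    ("non_event", "non_event"),
]
matrix = confusion_counts(label_pairs)
```

Cells on the diagonal, such as matrix["non_event"]["non_event"], count agreements; every other cell counts one kind of disagreement.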

class EmptyTag

Bases: object

Helper class for missing annotations.

class EmptyAnnotation(start_point, end_point, property_dict)

Bases: object

Helper class for missing annotations.

Parameters
  • start_point (int) – Text pointer to the start of the span.

  • end_point (int) – Text pointer to the end of the span.

  • property_dict (dict) – Property dictionary.

get_annotation_pairs(ac1, ac2, tag_filter=None, filter_both_ac=False, property_filter=None)

Returns a list of all overlapping annotations in two annotation collections. To include only some of the annotations, pass tag_filter as a list of tag names.

Parameters
  • ac1 (AnnotationCollection) – First annotation collection.

  • ac2 (AnnotationCollection) – Second annotation collection.

  • tag_filter (list, optional) – List of included tag names. Defaults to None.

  • filter_both_ac (bool, optional) – If True, both annotation collections get filtered. Defaults to False.

  • property_filter (str, optional) – List of included properties. Defaults to None.

Returns

List of paired annotations.

Return type

List[Tuple[Annotation]]

get_iaa_data(annotation_pairs, level='tag', include_empty_annotations=True)

Yields 3-tuples (Coder, Item, Label) for nltk.AnnotationTask data input. If level is not “tag” it has to be a property name, which exists in all annotations.

an_list = [
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation())
]

— to —

aTask_data = [
    (1, 1, 'non_event'),
    (2, 1, 'non_event'),
    (1, 2, 'non_event'),
    (2, 2, 'non_event'),
    (3, 3, 'non_event'),
    (3, 3, 'stative_event'),
]

Parameters
  • annotation_pairs (List[Tuple[Annotation]]) – List of annotation pairs.

  • level (str, optional) – ‘tag’ or any property in the annotation collections with the prefix ‘prop:’. Defaults to ‘tag’.
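
The conversion into (Coder, Item, Label) triples can be sketched like this. It is a simplified illustration that starts from plain label pairs instead of Annotation objects and numbers the two coders 1 and 2; to_coder_item_label is a hypothetical name, not part of GitMA.

```python
from typing import Iterable, Iterator, Tuple


def to_coder_item_label(
    label_pairs: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[int, int, str]]:
    """Yield one (coder, item, label) triple per annotator and pair."""
    for item, (label1, label2) in enumerate(label_pairs, start=1):
        yield (1, item, label1)  # first annotator's label for this item
        yield (2, item, label2)  # second annotator's label for this item


label_pairs = [
    ("non_event", "non_event"),
    ("non_event", "stative_event"),
]
triples = list(to_coder_item_label(label_pairs))
```

A list of triples in this shape is what nltk.metrics.agreement.AnnotationTask accepts through its data argument, after which agreement coefficients such as kappa() or alpha() can be computed.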

gamma_agreement(project, annotation_collections, alpha=3, beta=1, delta_empty=0.01, n_samples=30, precision_level=0.01)

Computes Gamma IAA based on Mathet et al., “The Unified and Holistic Method Gamma”, using the pygamma-agreement library. For the installation steps of pygamma-agreement and the different disagreement options, see its GitHub site.

Parameters
  • project (CatmaProject) – The CATMA project that holds the annotation collections to be compared.

  • annotation_collections (List[AnnotationCollection]) – List of annotation collections to be included.

  • alpha (int, optional) – Coefficient weighting the positional dissimilarity value. Defaults to 3.

  • beta (int, optional) – Coefficient weighting the categorical dissimilarity value. Defaults to 1.

  • delta_empty (float, optional) – Empty dissimilarity value, as defined by pygamma-agreement. Defaults to 0.01.

  • n_samples (int, optional) – Number of random continua sampled from this continuum. Defaults to 30.

  • precision_level (float, optional) – Error percentage of the gamma estimation, given either as a float or as one of “high”, “medium” or “low”. Defaults to 0.01.

Raises

ImportWarning – If pygamma-agreement has not been installed.