GitMA Metrics
Note
This module provides methods to compute inter-annotator agreement (IAA) metrics.
These methods are also integrated into the CatmaProject class.
Examples of how to use the metric functions can be found in the demo notebook.
- filter_ac_by_tag(ac1, ac2, tag_filter=None, filter_both_ac=True)
Returns lists of annotations filtered by tags. If filter_both_ac=False, only the first collection’s annotations are filtered.
- Parameters
ac1 (AnnotationCollection) – First annotation collection.
ac2 (AnnotationCollection) – Second annotation collection.
tag_filter (list, optional) – The list of tags to be included. Defaults to None.
filter_both_ac (bool, optional) – If True, both collections are filtered. Defaults to True.
- Returns
Two filtered lists of annotations.
- Return type
Tuple[List[Annotation]]
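The filtering rule can be illustrated with a minimal sketch. It assumes a hypothetical `Ann` dataclass as a stand-in for GitMA's Annotation (a span plus a plain tag name) and works on plain lists rather than AnnotationCollection objects; `filter_by_tag` is a hypothetical name, not part of GitMA.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-in for a GitMA Annotation: only a span and a tag name.
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


def filter_by_tag(
    anns1: List[Ann],
    anns2: List[Ann],
    tag_filter: Optional[List[str]] = None,
    filter_both: bool = True,
) -> Tuple[List[Ann], List[Ann]]:
    """Keep annotations whose tag is in tag_filter; None means no filtering."""
    if tag_filter is None:
        return list(anns1), list(anns2)
    allowed = set(tag_filter)
    kept1 = [a for a in anns1 if a.tag in allowed]
    # The second list is only filtered when filter_both is True.
    kept2 = [a for a in anns2 if a.tag in allowed] if filter_both else list(anns2)
    return kept1, kept2


first = [Ann(0, 5, "event"), Ann(6, 9, "non_event")]
second = [Ann(0, 4, "event"), Ann(6, 9, "other")]
kept_first, kept_second = filter_by_tag(first, second, ["event"])
```

With filter_both=False, the sketch returns the second list unchanged, mirroring the filter_both_ac=False behavior described above.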
- get_same_text(annotation_list1, annotation_list2)
Excludes all text parts that were annotated by only one annotator.
- Parameters
annotation_list1 (List[Annotation]) – Annotations from first collection.
annotation_list2 (List[Annotation]) – Annotations from second collection.
- Returns
The remaining annotations from both collections.
- Return type
Tuple[List[Annotation]]
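One plausible reading of "annotated only by one annotator" is: keep an annotation only if it overlaps at least one annotation from the other list. The sketch below implements that reading under stated assumptions (hypothetical `Ann` stand-in, half-open spans `[start, end)`); GitMA's actual criterion may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for a GitMA Annotation (half-open span [start, end)).
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


def spans_overlap(a: Ann, b: Ann) -> bool:
    # Two half-open spans overlap if each starts before the other ends.
    return a.start_point < b.end_point and b.start_point < a.end_point


def keep_shared_text(list1: List[Ann], list2: List[Ann]) -> Tuple[List[Ann], List[Ann]]:
    """Drop annotations whose text is not touched by any annotation in the other list."""
    shared1 = [a for a in list1 if any(spans_overlap(a, b) for b in list2)]
    shared2 = [b for b in list2 if any(spans_overlap(b, a) for a in list1)]
    return shared1, shared2
```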
- test_max_overlap(silver_annotation, second_annotator_annotations)
Looks for the best-matching annotation in the second annotator’s annotations.
- Parameters
silver_annotation (Annotation) – Annotation that will be matched.
second_annotator_annotations (list) – List of annotations.
- Returns
The best-matching Annotation object.
- Return type
Annotation
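"Best matching" can be read as "sharing the most characters". A minimal sketch under that assumption, using a hypothetical `Ann` stand-in and half-open spans; returning `None` when nothing overlaps is also an assumption of this sketch, not documented GitMA behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for a GitMA Annotation.
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


def overlap_length(a: Ann, b: Ann) -> int:
    # Number of shared characters of two half-open spans (0 if disjoint).
    return max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))


def best_match(silver: Ann, candidates: List[Ann]) -> Optional[Ann]:
    """Return the candidate with the largest overlap, or None if none overlaps."""
    best, best_len = None, 0
    for cand in candidates:
        length = overlap_length(silver, cand)
        if length > best_len:  # strict '>' keeps the first candidate on ties
            best, best_len = cand, length
    return best
```

For a silver span `[10, 20)`, a candidate `[15, 30)` (5 shared characters) wins over `[0, 12)` (2 shared characters).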
- test_overlap(an1, an2)
Tests whether annotation an2 starts or ends within annotation an1’s span.
- Parameters
an1 (Annotation) – First annotation.
an2 (Annotation) – Second annotation.
- Returns
True if any overlap exists.
- Return type
bool
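A literal reading of the description above can be sketched as follows. It assumes half-open spans `[start, end)` and a hypothetical `Ann` stand-in; whether GitMA treats span boundaries the same way is not stated in this reference.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a GitMA Annotation (span only).
@dataclass
class Ann:
    start_point: int
    end_point: int


def starts_or_ends_within(an1: Ann, an2: Ann) -> bool:
    """True if an2 starts or ends inside an1's half-open span [start, end)."""
    starts_inside = an1.start_point <= an2.start_point < an1.end_point
    ends_inside = an1.start_point < an2.end_point <= an1.end_point
    return starts_inside or ends_inside
```

Taken literally, this check returns False when an2 fully encloses an1 (starts before it and ends after it). A symmetric test such as `a.start_point < b.end_point and b.start_point < a.end_point` would cover that case too.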
- get_overlap_percentage(an_pair)
Computes the overlap percentage of two annotations by averaging, over both annotations, the proportion of each span that overlaps with the other.
- Parameters
an_pair (List[Annotation]) – Two overlapping annotations.
- Returns
Overlap percentage between 0 and 1.0.
- Return type
float
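The averaging can be made concrete with a short sketch. It assumes non-empty half-open spans and a hypothetical `Ann` stand-in; it illustrates the described formula rather than reproducing GitMA's code.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-in for a GitMA Annotation (span only).
@dataclass
class Ann:
    start_point: int
    end_point: int


def overlap_percentage(an_pair: Tuple[Ann, Ann]) -> float:
    """Mean of the two spans' overlapping shares, between 0.0 and 1.0."""
    a, b = an_pair
    overlap = max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))
    share_a = overlap / (a.end_point - a.start_point)  # assumes non-empty spans
    share_b = overlap / (b.end_point - b.start_point)
    return (share_a + share_b) / 2
```

For spans `[0, 10)` and `[5, 25)`, the 5 shared characters make up 50 % of the first span and 25 % of the second, so the result is 0.375.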
- get_confusion_matrix(pair_list, level='tag')
Generates a confusion matrix for pairs of overlapping annotations from two annotation collections.
- Parameters
pair_list (List[Tuple[Annotation]]) – List of overlapping annotations as tuples.
level (str, optional) – ‘tag’, or any property in the annotation collections, given with the prefix ‘prop:’. Defaults to ‘tag’.
- Returns
Confusion matrix as a pandas DataFrame.
- Return type
pd.DataFrame
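Counting label pairs is the core of such a confusion matrix. The sketch below uses only the standard library and covers the `level='tag'` case only; the hypothetical `Ann` stand-in and the nested-dict output are assumptions, whereas the documented function returns a pandas DataFrame.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical stand-in for a GitMA Annotation; only the tag is used here.
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


def confusion_matrix(pairs: List[Tuple[Ann, Ann]]) -> Dict[str, Dict[str, int]]:
    """Rows: first annotator's tag; columns: second annotator's tag."""
    counts = Counter((a.tag, b.tag) for a, b in pairs)
    # Use the union of all observed tags for both axes so the matrix is square.
    labels = sorted({tag for pair in counts for tag in pair})
    return {row: {col: counts[(row, col)] for col in labels} for row in labels}


pairs = [
    (Ann(0, 5, "event"), Ann(0, 5, "event")),
    (Ann(6, 9, "event"), Ann(6, 9, "non_event")),
    (Ann(10, 12, "non_event"), Ann(10, 12, "non_event")),
]
matrix = confusion_matrix(pairs)
```

If pandas is available, `pd.DataFrame.from_dict(matrix, orient="index")` turns the nested dict into a DataFrame with the first annotator's tags as rows.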
- class EmptyTag
Bases: object
Helper class for missing annotations.
- class EmptyAnnotation(start_point, end_point, property_dict)
Bases: object
Helper class for missing annotations.
- Parameters
start_point (int) – Start text pointer.
end_point (int) – End text pointer.
property_dict (dict) – Property dictionary.
- get_annotation_pairs(ac1, ac2, tag_filter=None, filter_both_ac=False, property_filter=None, verbose=True)
For each annotation in ac1, finds the best-matching annotation (maximum overlap) in ac2. Where there is no matching annotation in ac2, an EmptyAnnotation is substituted. Returns a list of tuples of the matched pairs.
The filter parameters can be used so that only annotations using one of the specified tags or the specified property are included.
- Parameters
ac1 (AnnotationCollection) – First annotation collection.
ac2 (AnnotationCollection) – Second annotation collection.
tag_filter (list, optional) – The list of tags to be included. Defaults to None (no filter / all tags included).
filter_both_ac (bool, optional) – If True, the tag_filter is applied to both collections. Defaults to False.
property_filter (str, optional) – If not None, only annotations with this property are included. Defaults to None (no filter / all annotations included).
verbose (bool, optional) – Whether to print results to stdout. Defaults to True.
- Returns
List of paired annotation tuples.
- Return type
List[Union[Tuple[Annotation, EmptyAnnotation], Tuple[Annotation, Annotation]]]
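The pairing step can be sketched as "maximum overlap, else a placeholder". The sketch assumes a hypothetical `Ann` stand-in and a hypothetical `EmptyAnn` that only mirrors the EmptyAnnotation constructor arguments; giving the placeholder the unmatched annotation's span and an `"<empty>"` tag are assumptions, and tag/property filtering is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


# Hypothetical stand-in for a GitMA Annotation.
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


# Hypothetical placeholder mirroring EmptyAnnotation(start_point, end_point, property_dict).
@dataclass
class EmptyAnn:
    start_point: int
    end_point: int
    property_dict: dict = field(default_factory=dict)
    tag: str = "<empty>"  # hypothetical label for "no matching annotation"


def overlap_length(a, b) -> int:
    # Shared characters of two half-open spans (0 if disjoint).
    return max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))


def pair_annotations(
    anns1: List[Ann], anns2: List[Ann]
) -> List[Tuple[Ann, Union[Ann, EmptyAnn]]]:
    pairs = []
    for a in anns1:
        # Best match = the annotation in anns2 sharing the most characters with a.
        best, best_len = None, 0
        for b in anns2:
            length = overlap_length(a, b)
            if length > best_len:
                best, best_len = b, length
        # No overlapping annotation: substitute a placeholder over a's span.
        pairs.append((a, best if best is not None else EmptyAnn(a.start_point, a.end_point)))
    return pairs
```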
- get_iaa_data(annotation_pairs, level='tag', include_empty_annotations=True)
Yields 3-tuples (Coder, Item, Label) for use as nltk.AnnotationTask input data. If level is not “tag”, it must be a property name that exists in all annotations.
For example, the annotation pairs
an_list = [ (Annotation(), Annotation()), (Annotation(), Annotation()), (Annotation(), Annotation()), (Annotation(), Annotation()) ]
are converted to
aTask_data = [ (1, 1, 'non_event'), (2, 1, 'non_event'), (1, 2, 'non_event'), (2, 2, 'non_event'), (3, 3, 'non_event'), (3, 3, 'stative_event'), ]
- Parameters
annotation_pairs (List[Tuple[Annotation]]) – List of annotation pairs.
level (str, optional) – ‘tag’ or any property in the annotation collections with the prefix ‘prop:’. Defaults to ‘tag’.
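The conversion into (Coder, Item, Label) triples can be sketched as follows. The sketch assumes a hypothetical `Ann` stand-in, covers only `level='tag'`, numbers the two coders 1 and 2, and uses each pair's 1-based position as its item ID; it ignores include_empty_annotations.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


# Hypothetical stand-in for a GitMA Annotation.
@dataclass
class Ann:
    start_point: int
    end_point: int
    tag: str


def iaa_triples(pairs: List[Tuple[Ann, Ann]]) -> Iterator[Tuple[int, int, str]]:
    """Yield (coder, item, label) triples; each annotation pair becomes one item."""
    for item, (first, second) in enumerate(pairs, start=1):
        yield (1, item, first.tag)   # coder 1 = first collection
        yield (2, item, second.tag)  # coder 2 = second collection


pairs = [
    (Ann(0, 5, "non_event"), Ann(0, 5, "non_event")),
    (Ann(6, 9, "non_event"), Ann(6, 9, "stative_event")),
]
triples = list(iaa_triples(pairs))
```

The resulting list can be passed as `AnnotationTask(data=triples)` from `nltk.metrics.agreement`, whose methods such as `kappa()` or `alpha()` then compute the agreement scores.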
- gamma_agreement(project, annotation_collections, alpha=3, beta=1, delta_empty=0.01, n_samples=30, precision_level=0.01)
Computes the Gamma IAA measure based on Mathet et al., “The Unified and Holistic Method Gamma”, using the pygamma-agreement library. For further installation steps of pygamma-agreement and the different disagreement options, see its GitHub site.
- Parameters
project (CatmaProject) – The CATMA project that holds the annotation collections used.
annotation_collections (List[AnnotationCollection]) – List of annotation collections to be included.
alpha (int, optional) – Coefficient weighting the positional dissimilarity value. Defaults to 3.
beta (int, optional) – Coefficient weighting the categorical dissimilarity value. Defaults to 1.
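delta_empty (float, optional) – Dissimilarity value assigned when a unit is aligned with the empty unit, i.e. has no counterpart among another annotator’s units. Defaults to 0.01.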
n_samples (int, optional) – Number of random continua sampled from the continuum. Defaults to 30.
precision_level (float, optional) – Error percentage of the gamma estimation, given either as a float or as one of “high”, “medium” or “low”. Defaults to 0.01.
- Raises
ImportWarning – If pygamma-agreement has not been installed.