GitMA Metrics#

Note

This module provides functions for computing inter-annotator agreement (IAA) metrics. They are integrated into the CatmaProject class.

Examples of how to use the metrics functions can be found in the demo notebook.

filter_ac_by_tag(ac1, ac2, tag_filter=None, filter_both_ac=True)#

Returns lists of annotations filtered by tags. If filter_both_ac=False only the first collection’s annotations get filtered.

Parameters

ac1 (AnnotationCollection) – First annotation collection.
ac2 (AnnotationCollection) – Second annotation collection.
tag_filter (list, optional) – The list of tags to be included. Defaults to None.
filter_both_ac (bool, optional) – If True, both collections get filtered. Defaults to True.

Returns

Two filtered lists of annotations.

Return type

Tuple[List[Annotation]]
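The filtering behaviour described above can be sketched as follows. This is a minimal illustration and not gitma's implementation: FakeAnnotation and filter_by_tag_sketch are hypothetical stand-ins, and the plain string tag attribute is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for an annotation (not gitma's Annotation class)."""
    start_point: int
    end_point: int
    tag: str


def filter_by_tag_sketch(
    annotations1: List[FakeAnnotation],
    annotations2: List[FakeAnnotation],
    tag_filter: Optional[List[str]] = None,
    filter_both: bool = True,
) -> Tuple[List[FakeAnnotation], List[FakeAnnotation]]:
    # Without a filter, both lists are returned unchanged.
    if tag_filter is None:
        return list(annotations1), list(annotations2)
    first = [a for a in annotations1 if a.tag in tag_filter]
    # The second list is only filtered when filter_both is True.
    if filter_both:
        second = [a for a in annotations2 if a.tag in tag_filter]
    else:
        second = list(annotations2)
    return first, second


ac1 = [FakeAnnotation(0, 5, "event"), FakeAnnotation(6, 9, "non_event")]
ac2 = [FakeAnnotation(0, 4, "event"), FakeAnnotation(6, 9, "stative_event")]

both = filter_by_tag_sketch(ac1, ac2, tag_filter=["event"])
first_only = filter_by_tag_sketch(ac1, ac2, tag_filter=["event"], filter_both=False)
```

With filter_both=False the second list keeps annotations with any tag, which is useful when the goal is to see what the second annotator assigned to spans the first annotator tagged with a given tag.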

get_same_text(annotation_list1, annotation_list2)#

Excludes all text parts annotated by only one annotator.

Parameters

annotation_list1 (List[Annotation]) – Annotations from first collection.
annotation_list2 (List[Annotation]) – Annotations from second collection.

Returns

Annotations from both collections.

Return type

Tuple[List[Annotation]]

test_max_overlap(silver_annotation, second_annotator_annotations)#

Looks for the best-matching annotation among the second annotator’s annotations.

Parameters

silver_annotation (Annotation) – Annotation that will be matched.
second_annotator_annotations (list) – List of annotations.

Returns

The best-matching annotation.

Return type

Annotation

test_overlap(an1, an2)#

Tests whether annotation an2 starts or ends within annotation an1’s span.

Parameters

an1 (Annotation) – First annotation.
an2 (Annotation) – Second annotation.

Returns

True if any overlap exists.

Return type

bool
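Read literally, the check described above can be sketched as below. The function name, the tuple representation of spans, and the inclusive boundaries are assumptions for illustration; the real function works on Annotation objects and may treat boundaries differently.

```python
from typing import Tuple

Span = Tuple[int, int]  # hypothetical (start, end) representation of an annotation


def overlaps_sketch(span1: Span, span2: Span) -> bool:
    start1, end1 = span1
    start2, end2 = span2
    # True if span2 starts inside span1 ...
    starts_inside = start1 <= start2 <= end1
    # ... or if span2 ends inside span1.
    ends_inside = start1 <= end2 <= end1
    return starts_inside or ends_inside


partial = overlaps_sketch((0, 10), (5, 15))    # span2 starts inside span1
disjoint = overlaps_sketch((0, 10), (11, 20))  # span2 lies entirely after span1
enclosing = overlaps_sketch((5, 10), (0, 20))  # span2 encloses span1
```

Note that under this literal reading a second span that completely encloses the first neither starts nor ends within it, so the sketch reports no overlap for that case. Whether the real function behaves the same way is not stated here.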

get_overlap_percentage(an_pair)#

Computes the overlap percentage of two annotations by averaging the overlapping proportion of both annotation spans.

Parameters

an_pair (List[Annotation]) – Two overlapping annotations.

Returns

Overlap percentage between 0 and 1.0.

Return type

float
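As a worked illustration of averaging the overlapping proportion of both spans, the sketch below computes the shared length, divides it by each span's length, and averages the two ratios. The function name and the (start, end) tuples are hypothetical, and non-empty spans are assumed.

```python
from typing import Tuple

Span = Tuple[int, int]  # hypothetical (start, end) representation of an annotation


def overlap_percentage_sketch(span1: Span, span2: Span) -> float:
    start1, end1 = span1
    start2, end2 = span2
    # Length of the text range covered by both spans (0 if they are disjoint).
    shared = max(0, min(end1, end2) - max(start1, start2))
    # Proportion of each span that is covered by the shared range.
    proportion1 = shared / (end1 - start1)
    proportion2 = shared / (end2 - start2)
    return (proportion1 + proportion2) / 2


half = overlap_percentage_sketch((0, 10), (5, 15))    # 5/10 and 5/10
nested = overlap_percentage_sketch((0, 10), (5, 10))  # 5/10 and 5/5
same = overlap_percentage_sketch((0, 10), (0, 10))    # 10/10 and 10/10
```

Here half is 0.5, nested is 0.75, and same is 1.0, so a short annotation nested inside a longer one scores higher than two equally long annotations that only share half their text.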

get_confusion_matrix(pair_list, level='tag')#

Generates a confusion matrix from a list of paired annotations.

Parameters

pair_list (List[Tuple[Annotation]]) – List of overlapping annotations as tuples.
level (str, optional) – ‘tag’ or any property with prefix ‘prop:’ in the annotation collections. Defaults to ‘tag’.

Returns

Confusion matrix as pandas data frame.

Return type

pd.DataFrame
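The idea behind the confusion matrix can be sketched with the standard library alone. In this hypothetical example each pair is reduced to its two tag names, and the result is a nested dictionary rather than the pandas data frame that get_confusion_matrix returns.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_matrix_sketch(tag_pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    # Count how often each (first annotator tag, second annotator tag) pair occurs.
    counts = Counter(tag_pairs)
    # Use every tag seen on either side as both a row and a column label.
    labels = sorted({tag for pair in tag_pairs for tag in pair})
    return {row: {col: counts[(row, col)] for col in labels} for row in labels}


tag_pairs = [
    ("event", "event"),
    ("event", "non_event"),
    ("non_event", "non_event"),
    ("event", "event"),
]
matrix = confusion_matrix_sketch(tag_pairs)
```

Rows stand for the first annotator's tag and columns for the second annotator's tag, so the diagonal holds the agreements and every off-diagonal cell shows which tags were confused with each other.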

get_cooccurence_matrix(annotationdata)#

Generates a co-occurrence matrix for annotation data.

Parameters

annotationdata (Set[List[coderid,itemid,tag]]) – Set of overlapping annotations.

Returns

Co-occurrence matrix as pandas data frame.

Return type

pd.DataFrame

class EmptyTag#

Bases: object

Helper class for missing annotations.

class EmptyAnnotation(start_point, end_point, property_dict)#

Bases: object

Helper class for missing annotations.

Parameters

start_point (int) – Text pointer.
end_point (int) – Text pointer.
property_dict (dict) – Property dictionary

get_annotation_pairs(ac1, ac2, tag_filter=None, filter_both_ac=False, property_filter=None, verbose=True)#

For each annotation in ac1, finds the best matching annotation (maximum overlap) in ac2. Where there is no matching annotation in ac2, an EmptyAnnotation is substituted. Returns a list of tuples of the matched pairs.

The filter parameters can be used so that only annotations using one of the specified tags or the specified property are included.

Parameters

ac1 (AnnotationCollection) – First annotation collection.
ac2 (AnnotationCollection) – Second annotation collection.
tag_filter (list, optional) – The list of tags to be included. Defaults to None (no filter / all tags included).
filter_both_ac (bool, optional) – If True the tag_filter is applied to both collections. Defaults to False.
property_filter (str, optional) – If not None, only annotations with this property are included. Defaults to None (no filter / all annotations included).
verbose (bool, optional) – Whether to print results to stdout. Defaults to True.

Returns

List of paired annotation tuples.

Return type

List[Union[Tuple[Annotation, EmptyAnnotation], Tuple[Annotation, Annotation]]]
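The pairing logic described above can be sketched as follows. Span, overlap_length, pair_annotations_sketch, and the "NO_MATCH" label are hypothetical names used for illustration only; the real function operates on AnnotationCollection objects and substitutes an EmptyAnnotation for missing matches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an annotation (not gitma's Annotation class)."""
    start_point: int
    end_point: int
    tag: str


def overlap_length(a: Span, b: Span) -> int:
    # Number of characters covered by both spans (0 if they are disjoint).
    return max(0, min(a.end_point, b.end_point) - max(a.start_point, b.start_point))


def pair_annotations_sketch(first: List[Span], second: List[Span]) -> List[Tuple[Span, Span]]:
    pairs = []
    for an in first:
        # Candidate with the largest overlap, or None if `second` is empty.
        best = max(second, key=lambda other: overlap_length(an, other), default=None)
        if best is None or overlap_length(an, best) == 0:
            # No match: substitute a placeholder covering the same span.
            best = Span(an.start_point, an.end_point, "NO_MATCH")
        pairs.append((an, best))
    return pairs


first = [Span(0, 10, "event"), Span(20, 30, "non_event")]
second = [Span(2, 9, "event"), Span(8, 12, "stative_event")]
pairs = pair_annotations_sketch(first, second)
```

The first annotation is paired with Span(2, 9, "event") because that candidate shares more text with it than Span(8, 12, "stative_event") does, while the second annotation has no overlapping candidate and receives the placeholder.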

get_annotation_pairs_for_multiple_annotators(ac_dict, ac_names=[], tag_filter=[], filter_both_ac=True, include_empty_annotations=True, property_filter=None, verbose=True)#

Get annotation data in the NLTK format for IAA calculation for two or more annotators without duplicate pairs.

Parameters

ac_dict (Dict[str, AnnotationCollection]) – Dictionary of all annotation collections.
ac_names (list) – List of annotation collection names to include in the IAA calculation. If empty, all ACs in the project will be used.
tag_filter (list) – List of tags that should be included for IAA calculation. If empty, all tags will be used.
filter_both_ac (bool) – Whether to apply tag_filter on both ACs in the pair or just on the first AC. Default is True.
include_empty_annotations (bool) – Whether to include empty annotations in the IAA data. If False, only annotations with a matching annotation in the second collection are included. Default is True.
property_filter (str, optional) – Property to filter by as a string with the property name. If None, all properties will be used.
verbose (bool, optional) – Whether to print results to stdout. Defaults to True.

Returns

Set of annotation tuples in NLTK format (https://www.nltk.org/api/nltk.metrics.agreement.html#nltk.metrics.agreement.AnnotationTask.__init__) for IAA calculation.

get_iaa_data(annotation_pairs, level='tag', include_empty_annotations=True)#

Yields 3-tuples (Coder, Item, Label) for nltk.AnnotationTask data input. If level is not “tag”, it has to be the name of a property that exists in all annotations.

an_list = [
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation()),
    (Annotation(), Annotation())
]

— to —

aTask_data = [
    (1, 1, 'non_event'),
    (2, 1, 'non_event'),
    (1, 2, 'non_event'),
    (2, 2, 'non_event'),
    (3, 3, 'non_event'),
    (3, 3, 'stative_event'),
]

Parameters

annotation_pairs (List[Tuple[Annotation]]) – List of annotation pairs.
level (str, optional) – ‘tag’ or any property in the annotation collections with the prefix ‘prop:’.
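The conversion from annotation pairs to (Coder, Item, Label) triples can be sketched like this. The function name is hypothetical, each pair is reduced to its two labels, and the numbering of coders and items from 1 is an assumption modelled on the example above.

```python
from typing import Iterable, Iterator, Tuple


def iaa_triples_sketch(
    label_pairs: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[int, int, str]]:
    # Each pair becomes one item; coder 1 and coder 2 each contribute one label.
    for item_id, (label1, label2) in enumerate(label_pairs, start=1):
        yield (1, item_id, label1)
        yield (2, item_id, label2)


label_pairs = [
    ("non_event", "non_event"),
    ("non_event", "stative_event"),
]
triples = list(iaa_triples_sketch(label_pairs))
```

Every item appears once per coder, which is the layout that an agreement task built from (Coder, Item, Label) triples expects.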

gamma_agreement(project, annotation_collections, alpha=3, beta=1, delta_empty=0.01, n_samples=30, precision_level=0.01)#

Computes Gamma IAA based on Mathet et al., “The Unified and Holistic Method Gamma”, using the pygamma-agreement library. For installation instructions for pygamma-agreement and the available disagreement options, see its GitHub site.

Parameters

project (CatmaProject) – The CATMA project that holds the used annotation collections.
annotation_collections (List[AnnotationCollection]) – List of annotation collections to be included.
alpha (int, optional) – Coefficient weighting the positional dissimilarity value. Defaults to 3.
beta (int, optional) – Coefficient weighting the categorical dissimilarity value. Defaults to 1.
delta_empty (float, optional) – Dissimilarity between a unit and the empty (null) unit. Defaults to 0.01.
n_samples (int, optional) – Number of random continua sampled from this continuum. Defaults to 30.
precision_level (float, optional) – Error percentage of the gamma estimation, given as a float or as one of “high”, “medium”, or “low”. Defaults to 0.01.

Raises

ImportWarning – If pygamma-agreement has not been installed.
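As a rough illustration of how alpha and beta act as weights, the sketch below assumes that the positional and categorical dissimilarity values are combined linearly, in the spirit of the combined dissimilarity described by Mathet et al. The function name and the input values are hypothetical; the actual computation happens inside pygamma-agreement.

```python
def combined_dissimilarity_sketch(
    positional: float,
    categorical: float,
    alpha: float = 3,
    beta: float = 1,
) -> float:
    # Assumed linear combination: alpha weights the positional part,
    # beta weights the categorical part.
    return alpha * positional + beta * categorical


# Two units that differ moderately in position and completely in category.
default_weights = combined_dissimilarity_sketch(positional=0.5, categorical=1.0)
equal_weights = combined_dissimilarity_sketch(positional=0.5, categorical=1.0, alpha=1, beta=1)
```

With the default weights a positional mismatch counts three times as much as a categorical one, so raising beta relative to alpha shifts the emphasis towards agreement on tags.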
