CatmaProject#
- class CatmaProject(project_name, projects_directory='./', included_acs=None, excluded_acs=None, ac_filter_keyword=None, load_from_gitlab=False, gitlab_access_token=None, backup_directory='./')#
Bases:
object
Class that represents a CATMA project including all documents, tagsets and annotation collections.
You can either load the project from a local Git clone or you can load it directly from GitLab after generating an access token in the CATMA GUI. See the examples in the docs for details.
- Parameters
project_name (str) – The CATMA project name.
projects_directory (str, optional) – The directory where your CATMA projects are located. Defaults to ‘./’.
included_acs (list, optional) – All annotation collections that should get loaded. If annotation collections are neither included nor excluded all annotation collections get loaded. Defaults to None.
excluded_acs (list, optional) – Annotation collections that should not get loaded. Defaults to None.
ac_filter_keyword (str, optional) – Only annotation collections with the given keyword in their name get loaded. Defaults to None.
load_from_gitlab (bool, optional) – Whether the CATMA project should be loaded directly from CATMA’s GitLab backend. Defaults to False.
gitlab_access_token (str, optional) – The private CATMA GitLab access token. Defaults to None.
backup_directory (str, optional) – The directory where your project clone should be located. Defaults to ‘./’.
- Raises
FileNotFoundError – If the local or remote CATMA project was not found.
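Because the constructor raises FileNotFoundError when a project cannot be found, callers may want to handle that case explicitly. The following is a minimal sketch, not part of gitma: `load_or_none` is a hypothetical helper that takes the class as an argument, so it also runs without gitma installed.

```python
from typing import Any, Callable, Optional


def load_or_none(project_cls: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Instantiate project_cls(**kwargs) and return None if the project is missing.

    Hypothetical helper for illustration; it is not part of gitma.
    """
    try:
        return project_cls(**kwargs)
    except FileNotFoundError as error:
        # Documented above: raised if the local or remote project was not found.
        print(f"Could not load project {kwargs.get('project_name')!r}: {error}")
        return None
```

With gitma installed, this could be called as `load_or_none(gitma.CatmaProject, project_name='DemoProject', projects_directory='../user_projects/')`.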
- uuid: str#
The project’s UUID.
- projects_directory: str#
The directory where the project is located.
- name: str#
The project’s name.
- tagsets: List[gitma.tagset.Tagset]#
List of gitma.Tagset objects.
- tagset_dict: Dict[str, gitma.tagset.Tagset]#
Dictionary of the project’s tagsets with the UUIDs as keys and gitma.Tagset objects as values.
- texts: List[gitma.text.Text]#
List of the gitma.Text objects.
- text_dict: Dict[str, gitma.text.Text]#
Dictionary of the project’s texts with titles as keys and gitma.Text objects as values.
- annotation_collections: List[gitma.annotation_collection.AnnotationCollection]#
List of gitma.AnnotationCollection objects.
- ac_dict: Dict[str, gitma.annotation_collection.AnnotationCollection]#
Dictionary of the project’s annotation collections with names as keys and gitma.AnnotationCollection objects as values.
- to_json(annotation_collections='all', rename_dict=None, included_tags=None, directory='./')#
Saves all annotations as a single JSON file.
- Parameters
annotation_collections (Union[List[str], str], optional) – Parameter to define the exported annotation collections. Defaults to ‘all’.
rename_dict (Union[Dict[str, str], None], optional) – Dictionary to rename annotation collections. Defaults to None.
included_tags (Union[list, None]) – Tags included in the annotations list. If None, all tags are included. Defaults to None.
directory (str) – Backup directory. Defaults to ‘./’.
- update()#
Updates local git folder and reloads CatmaProject.
Warning: This method can only be used if you have Git installed.
- annotations()#
Generator that yields all annotations as gitma.annotation.Annotation objects.
- Yields
Annotation – gitma.annotation.Annotation
- all_tags()#
Generator that yields all tags as gitma.tag.Tag objects.
- Yields
Tag – gitma.tag.Tag
- stats()#
Shows some CATMA Project stats.
- Returns
DataFrame with project stats sorted by the Annotation Collection names.
- Return type
pd.DataFrame
- write_annotation_json(text_title, annotation_collection_name, tagset_name, tag_name, start_points, end_points, property_annotations, author)#
Function to write a new annotation into this project.
- Parameters
text_title (str) – The text title.
annotation_collection_name (str) – The name of the target annotation collection.
tagset_name (str) – The tagset’s name.
tag_name (str) – The tag’s name.
start_points (list) – The start points of the annotation spans.
end_points (list) – The end points of the annotation spans.
property_annotations (dict) – A dictionary with property names mapped to value lists.
author (str) – The annotation’s author.
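The parameters above pair each start point with an end point. A small sketch of how the arguments could be assembled and checked before calling the method follows. `build_annotation_kwargs` is a hypothetical helper, and the check that every span satisfies `0 <= start < end` is an assumption about what counts as a valid span, not a rule stated by gitma.

```python
from typing import Dict, List, Tuple


def build_annotation_kwargs(
    text_title: str,
    annotation_collection_name: str,
    tagset_name: str,
    tag_name: str,
    spans: List[Tuple[int, int]],
    property_annotations: Dict[str, List[str]],
    author: str,
) -> dict:
    """Turn (start, end) pairs into the documented keyword arguments.

    Hypothetical helper for illustration; it is not part of gitma.
    """
    for start, end in spans:
        # Assumption: spans are offsets with a positive length.
        if not 0 <= start < end:
            raise ValueError(f"Invalid span: ({start}, {end})")
    return {
        "text_title": text_title,
        "annotation_collection_name": annotation_collection_name,
        "tagset_name": tagset_name,
        "tag_name": tag_name,
        "start_points": [start for start, _ in spans],
        "end_points": [end for _, end in spans],
        "property_annotations": property_annotations,
        "author": author,
    }
```

The resulting dictionary could then be passed on as `project.write_annotation_json(**kwargs)`.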
- create_gold_annotations(ac_1_name, ac_2_name, gold_ac_name, excluded_tags=None, min_overlap=1.0, same_tag=True, property_values='none', push_to_gitlab=False)#
Searches for matching annotations in two AnnotationCollections and copies all matches into a third AnnotationCollection. Which Property Values get copied is controlled by the property_values parameter.
- Parameters
ac_1_name (str) – AnnotationCollection 1 Name.
ac_2_name (str) – AnnotationCollection 2 Name.
gold_ac_name (str) – AnnotationCollection Name for Gold Annotations.
excluded_tags (list, optional) – Annotations with these Tags will not be included in the Gold Annotations. Defaults to None.
min_overlap (float, optional) – The minimal overlap to generate a gold annotation. Defaults to 1.0.
same_tag (bool, optional) – Whether both annotations need to be the same tag. Defaults to True.
property_values (str, optional) – Which Property Values from AnnotationCollection 1 get copied. With ‘matching’ only matching Property Values are copied. Further option: ‘none’. Defaults to ‘none’.
push_to_gitlab (bool, optional) – Whether the gold annotations shall be uploaded to the CATMA GitLab. Defaults to False.
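To illustrate how min_overlap and same_tag could interact, here is a hedged sketch. It assumes that overlap is measured as the number of shared characters divided by the length of the longer span. gitma may compute overlap differently, and both function names are hypothetical.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def overlap_ratio(span_a: Span, span_b: Span) -> float:
    """Shared characters divided by the longer span's length (assumed definition)."""
    shared = min(span_a[1], span_b[1]) - max(span_a[0], span_b[0])
    if shared <= 0:
        return 0.0  # the spans do not overlap
    longest = max(span_a[1] - span_a[0], span_b[1] - span_b[0])
    return shared / longest


def is_gold_match(
    span_a: Span,
    tag_a: str,
    span_b: Span,
    tag_b: str,
    min_overlap: float = 1.0,
    same_tag: bool = True,
) -> bool:
    """Decide whether two annotations would count as a match in this sketch."""
    if same_tag and tag_a != tag_b:
        return False
    return overlap_ratio(span_a, span_b) >= min_overlap
```

Under this definition the default `min_overlap=1.0` only accepts annotations with identical spans, while lower values tolerate partial overlap.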
- merge_annotations()#
Concatenates all annotation collections to one pandas data frame and resets index.
- Returns
Data frame including all annotations in the CATMA project.
- Return type
pd.DataFrame
- merge_annotations_per_document()#
Merges all annotations per document to one annotation collection.
- Returns
Dictionary with document titles as keys and annotations per document as pandas data frame.
- Return type
Dict[str, pd.DataFrame]
- plot_annotation_progression()#
Plot the annotation progression for every annotator in a CATMA project.
- Returns
Plotly scatter plot.
- Return type
go.Figure
- plot_interactive(color_col='annotation collection')#
This function generates one Plotly scatter plot per annotated document in a CATMA project. By default the colors represent the annotation collections, so individual collections can be hidden via the interactive legend.
- Parameters
color_col (str, optional) – ‘annotation collection’, ‘annotator’, ‘tag’ or any property with the prefix ‘prop:’. Defaults to ‘annotation collection’.
- Returns
Plotly scatter plot.
- Return type
go.Figure
- plot_annotations(color_col='annotation collection')#
This function generates one Plotly scatter plot per annotated document in a CATMA project. By default the colors represent the annotation collections, so individual collections can be hidden via the interactive legend.
- Parameters
color_col (str, optional) – ‘annotation collection’, ‘annotator’, ‘tag’ or any property with the prefix ‘prop:’. Defaults to ‘annotation collection’.
- Returns
Plotly scatter plot.
- Return type
go.Figure
- cooccurrence_network(annotation_collections='all', character_distance=100, included_tags=None, excluded_tags=None, level='tag', plot_stats=False, save_as_gexf=False)#
Draws cooccurrence network graph for annotations.
Every tag is represented by a node and every edge represents two cooccurrent tags. With the character_distance parameter you can define when two annotations are considered cooccurrent. If you set character_distance=0, only the tags of overlapping annotations will be represented as connected nodes.
See the examples in the docs for details about the usage.
- Parameters
annotation_collections (Union[str, List[str]]) – List with the names of the included annotation collections. If set to ‘all’ all annotation collections are included. Defaults to ‘all’.
character_distance (int, optional) – The character distance within which annotations are considered cooccurrent. Defaults to 100.
included_tags (list, optional) – List of included tags. Defaults to None.
excluded_tags (list, optional) – List of excluded tags. Defaults to None.
level (str, optional) – ‘tag’ or any property name with ‘prop:’ as prefix. Defaults to ‘tag’.
plot_stats (bool, optional) – Whether to return network stats. Defaults to False.
save_as_gexf (Union[bool, str], optional) – If given any string the network gets saved as Gephi file with the string as filename. Defaults to False.
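The role of character_distance can be sketched in plain Python. This is an illustration under stated assumptions, not gitma's implementation: annotations are simplified to `(tag, start, end)` tuples, pairs with the same tag are skipped, and the comparison is strict so that a distance of 0 keeps only overlapping annotations. `cooccurrence_edges` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Annotation = Tuple[str, int, int]  # (tag, start, end), a simplified stand-in


def cooccurrence_edges(
    annotations: List[Annotation], character_distance: int = 100
) -> Counter:
    """Count tag pairs whose annotations lie within character_distance."""
    edges: Counter = Counter()
    for (tag_a, start_a, end_a), (tag_b, start_b, end_b) in combinations(annotations, 2):
        if tag_a == tag_b:
            continue  # assumption: no self loops in the network
        # Negative gap: the spans overlap. Positive gap: characters between them.
        gap = max(start_a, start_b) - min(end_a, end_b)
        if gap < character_distance:
            edges[tuple(sorted((tag_a, tag_b)))] += 1
    return edges
```

The counts could serve as edge weights, with each distinct tag becoming a node.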
- disagreement_network(annotation_collections='all', included_tags=None, excluded_tags=None, level='tag', plot_stats=False, save_as_gexf=False)#
Draws disagreement network.
Every edge in the network represents two overlapping annotations from different annotation collections and with different tags or property values.
- Parameters
annotation_collections (Union[str, List[str]], optional) – List with the names of the included annotation collections. If set to ‘all’ all annotation collections are included. Defaults to ‘all’.
included_tags (list, optional) – List of included tags. Defaults to None.
excluded_tags (list, optional) – List of excluded tags. Defaults to None.
level (str, optional) – ‘tag’ or any property name with ‘prop:’ as prefix. Defaults to ‘tag’.
plot_stats (bool, optional) – Whether to return network stats. Defaults to False.
save_as_gexf (Union[bool, str], optional) – If given any string the network gets saved as Gephi file with the string as filename. Defaults to False.
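The edge definition given above (overlapping annotations from different collections with different tags) can be sketched as follows. The tuple layout and the name `disagreement_edges` are assumptions made for illustration only and do not reflect gitma's internals.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# (annotation collection, tag, start, end), a simplified stand-in
Annotation = Tuple[str, str, int, int]


def disagreement_edges(annotations: List[Annotation]) -> Counter:
    """Count tag pairs that different collections assigned to overlapping spans."""
    edges: Counter = Counter()
    for (ac_a, tag_a, start_a, end_a), (ac_b, tag_b, start_b, end_b) in combinations(
        annotations, 2
    ):
        overlapping = max(start_a, start_b) < min(end_a, end_b)
        if ac_a != ac_b and tag_a != tag_b and overlapping:
            edges[tuple(sorted((tag_a, tag_b)))] += 1
    return edges
```

To compare property values instead of tags, as with `level='prop:<your_property>'`, the tag field would hold the property value.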
- compare_annotation_collections(annotation_collections, color_col='tag')#
Plots annotations of multiple annotation collections of the same texts as line plot.
- Parameters
annotation_collections (list) – A list of annotation collection names.
color_col (str, optional) – Either ‘tag’ or one property name with prefix ‘prop:’. Defaults to ‘tag’.
- Raises
ValueError – If one of the annotation collection’s names does not exist.
- Returns
Plotly Line Plot.
- Return type
go.Figure
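Since the method raises ValueError for unknown names, it can help to check the names against the keys of ac_dict beforehand. A minimal sketch with a hypothetical helper `check_collection_names`, which works on any dictionary keyed by collection name:

```python
from typing import Dict, List


def check_collection_names(requested: List[str], ac_dict: Dict[str, object]) -> None:
    """Raise ValueError listing every requested name that is not a key of ac_dict."""
    missing = [name for name in requested if name not in ac_dict]
    if missing:
        raise ValueError(f"Unknown annotation collections: {missing}")
```

For a loaded project this could be called as `check_collection_names(['ac1', 'ac2'], project.ac_dict)`.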
- get_iaa(ac1_name, ac2_name, tag_filter=None, filter_both_ac=False, level='tag', include_empty_annotations=True, distance='binary', return_as_dict=False)#
Computes Inter Annotator Agreement for 2 Annotation Collections. See the demo notebook for details.
- Parameters
ac1_name (str) – AnnotationCollection name to be compared.
ac2_name (str) – AnnotationCollection name to be compared with.
tag_filter (list, optional) – Which Tags should be included. If None all are included. Defaults to None.
filter_both_ac (bool, optional) – Whether the tag filter shall be applied to both annotation collections. Defaults to False.
level (str, optional) – Whether the Annotation Tag or a specified Property should be compared. Defaults to ‘tag’.
include_empty_annotations (bool, optional) – If False, only annotations with an overlapping annotation in the second collection get included. Defaults to True.
distance (str, optional) – The IAA distance function. Either ‘binary’ or ‘interval’. See the NLTK API for further information. Defaults to ‘binary’.
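To make the two distance options more concrete, the sketch below reimplements them in plain Python, following the definitions of NLTK's binary and interval distance. It then computes observed agreement and Cohen's kappa for two aligned label lists. This is an illustration only: it does not show which coefficients get_iaa reports, and the function names are chosen for this example.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def binary_distance(label_1: Hashable, label_2: Hashable) -> float:
    """0.0 for identical labels, 1.0 otherwise."""
    return 0.0 if label_1 == label_2 else 1.0


def interval_distance(label_1: float, label_2: float) -> float:
    """Squared difference; only meaningful for numeric labels (not normalized)."""
    return float((label_1 - label_2) ** 2)


def observed_agreement(
    labels_1: Sequence, labels_2: Sequence, distance: Callable = binary_distance
) -> float:
    """Mean of (1 - distance) over aligned items; expects distances in [0, 1]."""
    if not labels_1 or len(labels_1) != len(labels_2):
        raise ValueError("Both label lists must be non-empty and of equal length.")
    return sum(1.0 - distance(a, b) for a, b in zip(labels_1, labels_2)) / len(labels_1)


def cohen_kappa(labels_1: Sequence, labels_2: Sequence) -> float:
    """Cohen's kappa for two annotators using binary distance."""
    n = len(labels_1)
    observed = observed_agreement(labels_1, labels_2)
    counts_1, counts_2 = Counter(labels_1), Counter(labels_2)
    # Chance agreement: probability that both annotators pick the same label.
    expected = sum((counts_1[label] / n) * (counts_2[label] / n) for label in counts_1)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Each list holds one label per compared annotation, in the same order for both annotation collections.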
- gamma_agreement(annotation_collections, alpha=3, beta=1, delta_empty=0.01, n_samples=30, precision_level=0.01)#
- pygamma_table(annotation_collections='all')#
Concatenates annotation collections to pygamma table.
- Parameters
annotation_collections (Union[str, list], optional) – List of annotation collections. Defaults to ‘all’.
- Returns
Concatenated annotation collections as pd.DataFrame in pygamma format.
- Return type
pd.DataFrame
Examples#
Load a local CATMA project#
To load a CATMA project that you have already cloned, only the project’s name and its location are required:
import gitma

project = gitma.CatmaProject(
    project_name='DemoProject',
    projects_directory='../user_projects/'
)
Using the parameters included_acs, excluded_acs and ac_filter_keyword, you can select which annotation collections get loaded.
Assuming you have a project with three annotation collections named ‘ac1’, ‘ac2’ and ‘AC3’, you can select the first two annotation collections by any of these methods:
# option 1
project = gitma.CatmaProject(
    project_name='DemoProject',
    projects_directory='../user_projects/',
    included_acs=['ac1', 'ac2']
)
# option 2
project = gitma.CatmaProject(
    project_name='DemoProject',
    projects_directory='../user_projects/',
    excluded_acs=['AC3']
)
# option 3
project = gitma.CatmaProject(
    project_name='DemoProject',
    projects_directory='../user_projects/',
    ac_filter_keyword='ac'
)
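Why the three options select the same collections can be shown with a small stand-alone sketch. It assumes that the keyword is matched as a case-sensitive substring of the collection name (which is what the ‘ac’ versus ‘AC3’ example suggests) and that the filters combine with a logical AND. `select_collections` is a hypothetical function, not gitma's loader.

```python
from typing import List, Optional


def select_collections(
    names: List[str],
    included_acs: Optional[List[str]] = None,
    excluded_acs: Optional[List[str]] = None,
    ac_filter_keyword: Optional[str] = None,
) -> List[str]:
    """Return the collection names that pass all given filters."""
    selected = []
    for name in names:
        if included_acs is not None and name not in included_acs:
            continue
        if excluded_acs is not None and name in excluded_acs:
            continue
        # Assumption: case-sensitive substring match on the collection name.
        if ac_filter_keyword is not None and ac_filter_keyword not in name:
            continue
        selected.append(name)
    return selected
```

Without any filter all names are returned, matching the documented behaviour that all annotation collections get loaded if none are included or excluded.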
Load a remote CATMA project#
To load your project from the CATMA GitLab, three further parameters are required:
project = gitma.CatmaProject(
    project_name='DemoProject',
    load_from_gitlab=True,
    gitlab_access_token='<your_access_token>',
    backup_directory='../user_projects/'
)
When you load a remote CATMA project, it is cloned into the backup_directory.
Once a project has been cloned into this directory, you have to load it as a local project, as demonstrated above.
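The switch between the first remote load and later local loads could be automated along these lines. The sketch rests on two assumptions that are not confirmed by the text above: that the cloned folder's name contains the project name, and that the access token is stored in an environment variable. The variable name `CATMA_ACCESS_TOKEN` and the helper `build_load_kwargs` are hypothetical.

```python
import os


def build_load_kwargs(
    project_name: str, directory: str, token_variable: str = "CATMA_ACCESS_TOKEN"
) -> dict:
    """Return constructor arguments for a local load if a clone exists, else remote."""
    # Assumption: the clone is a subfolder whose name contains the project name.
    already_cloned = os.path.isdir(directory) and any(
        project_name in entry and os.path.isdir(os.path.join(directory, entry))
        for entry in os.listdir(directory)
    )
    if already_cloned:
        return {"project_name": project_name, "projects_directory": directory}
    return {
        "project_name": project_name,
        "load_from_gitlab": True,
        # Reading the token from the environment keeps it out of the source code.
        "gitlab_access_token": os.environ[token_variable],
        "backup_directory": directory,
    }
```

The result could then be passed on as `gitma.CatmaProject(**build_load_kwargs('DemoProject', '../user_projects/'))`.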
Plot a cooccurrence network for the annotations in your project#
You can plot cooccurrent annotations:
project.cooccurrence_network()
You can customize your network with the following parameters:
project.cooccurrence_network(
    annotation_collections=[            # define the included annotation collections
        '<your_first_annotation_collection>',
        '<your_second_annotation_collection>'
    ],
    level='prop:<your_property>',       # set a property as level
    character_distance=50,              # define which distance is considered cooccurrent
    included_tags=None,                 # define a list with tags included
    excluded_tags=None,                 # define a list with tags excluded
    save_as_gexf='my_gephi_file'        # save your network as Gephi file
)