AnnotationCollection#
- class AnnotationCollection(ac_uuid, catma_project, context=50)#
Bases: object
Class which represents a CATMA annotation collection.
- Parameters
ac_uuid (str) – The annotation collection’s UUID
catma_project (CatmaProject) – The parent CatmaProject
context (int, optional) – The text span to be considered for the annotation context. Defaults to 50.
- Raises
FileNotFoundError – If the path of the annotation collection’s header.json does not exist.
- uuid: str#
The annotation collection’s UUID.
- projects_directory: str#
The directory where the parent project is located.
- project_uuid: str#
The parent project’s UUID.
- directory: str#
The annotation collection’s directory.
- name: str#
The annotation collection’s name.
- plain_text_id: str#
The UUID of the annotation collection’s document.
- text: gitma.text.Text#
The document of the annotation collection as a gitma.Text object.
- text_version: str#
The document’s version.
- tags: List[gitma.tag.Tag]#
List of tags found in the annotation collection as a list of gitma.Tag objects.
- annotations: list#
List of annotations in annotation collection as gitma.Annotation objects.
- df: pandas.core.frame.DataFrame#
Annotations as a pandas.DataFrame.
- to_list(tags=None)#
Returns a list of annotations as dictionaries using the Annotation.to_dict() method.
- Parameters
tags (Union[list, None]) – Tags included in the annotations list. If None, all tags are included. Defaults to None.
- Returns
List of annotations as dictionaries.
- Return type
List[dict]
- annotation_dict()#
Creates a dictionary with UUIDs as keys and Annotation objects as values.
- Returns
Dictionary with UUIDs as keys and Annotation objects as values.
- Return type
Dict[str, Annotation]
- duplicate_by_prop(prop)#
Duplicates the rows in the annotation collection’s DataFrame if the given property has multiple property values for the annotation represented by a DataFrame row.
- Parameters
prop (str) – A property used in the annotation collection.
- Raises
ValueError – If the property has not been used in the annotation collection.
- Returns
A duplicate of the annotation collection’s DataFrame.
- Return type
pd.DataFrame
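The row duplication can be illustrated with plain dictionaries instead of a DataFrame. This is only a sketch of the idea, not gitma's implementation: the row layout, the column name `prop:characters`, the helper name and the choice to keep rows without values once are all assumptions made for illustration.

```python
from typing import Dict, List


def duplicate_rows_by_prop(rows: List[Dict], prop_col: str) -> List[Dict]:
    """Return one row per property value (hypothetical helper, not part of gitma)."""
    if not any(prop_col in row for row in rows):
        # Mirrors the documented ValueError for a property that is not used.
        raise ValueError(f"Property column {prop_col!r} is not used in these rows.")

    duplicated = []
    for row in rows:
        # Assumption: rows without any value are kept once with None as value.
        values = row.get(prop_col) or [None]
        for value in values:
            new_row = dict(row)
            new_row[prop_col] = value
            duplicated.append(new_row)
    return duplicated


rows = [
    {"id": "a1", "tag": "stative_event", "prop:characters": ["marquis", "marquise"]},
    {"id": "a2", "tag": "non_event", "prop:characters": []},
]
result = duplicate_rows_by_prop(rows, "prop:characters")
```

Here the first annotation yields two rows, one per character, while the second one is kept as a single row.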
- push_annotations(commit_message='new annotations')#
Runs git add ., git commit and git push for a single annotation collection.
Note: Works only if git is installed and the CATMA access token is stored in the git credential manager.
- Parameters
commit_message (str, optional) – Customize the commit message. Defaults to ‘new annotations’.
- plot_annotations(y_axis='tag', color_prop='tag')#
Creates an interactive Plotly scatter plot to explore an annotation collection.
- Parameters
y_axis (str, optional) – The column in the AnnotationCollection DataFrame used for the y axis. Defaults to ‘tag’.
color_prop (str, optional) – A property’s name used in the AnnotationCollection. Defaults to ‘tag’.
- Returns
Plotly scatter plot.
- Return type
go.Figure
- filter_by_tag_path(path_element)#
Filters the annotation collection’s data frame for annotations with the given path_element in the tag’s full path.
- Parameters
path_element (str) – Any tag name within the used tagsets.
- Returns
Data frame in the format of the annotation collection data frames.
- Return type
pd.DataFrame
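A minimal sketch of such a path filter, using a list of dictionaries in place of the data frame. The `tag_path` key and the slash-separated path format are assumptions for illustration; the actual column layout of the gitma data frame may differ.

```python
from typing import Dict, List


def filter_rows_by_tag_path(rows: List[Dict], path_element: str) -> List[Dict]:
    """Keep rows whose full tag path contains path_element as one of its parts."""
    return [
        row for row in rows
        # Assumption: the full path is a slash-separated string such as '/event/process_event'.
        if path_element in row["tag_path"].strip("/").split("/")
    ]


rows = [
    {"id": "a1", "tag": "process_event", "tag_path": "/event/process_event"},
    {"id": "a2", "tag": "stative_event", "tag_path": "/event/stative_event"},
    {"id": "a3", "tag": "character", "tag_path": "/character"},
]
events = filter_rows_by_tag_path(rows, "event")
```

Filtering by the parent tag `event` returns the annotations of both child tags, which is the main use of filtering by path rather than by tag name.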
- plot_scaled_annotations(tag_scale=None, bin_size=50, smoothing_window=100)#
Plots a graph with scaled annotations. This function is still under development.
- Parameters
tag_scale (dict, optional) – description. Defaults to None.
bin_size (int, optional) – description. Defaults to 50.
smoothing_window (int, optional) – description. Defaults to 100.
- Raises
Exception – description
- cooccurrence_network(character_distance=100, included_tags=None, excluded_tags=None, level='tag', plot_stats=False, save_as_gexf=False)#
Draws a cooccurrence network graph where every tag is a node and every edge represents two cooccurrent tags. You can define via the character_distance parameter when two annotations are considered cooccurrent. If you set character_distance=0, only the tags of overlapping annotations will be represented as connected nodes.
- Parameters
character_distance (int, optional) – The distance in characters within which annotations are considered cooccurrent. Defaults to 100.
included_tags (list, optional) – List of included tags. Defaults to None.
level (str, optional) – Select ‘tag’ or any property in your annotation collections with the prefix ‘prop’. Defaults to ‘tag’.
excluded_tags (list, optional) – List of excluded tags. Defaults to None.
plot_stats (bool, optional) – Whether to return network stats. Defaults to False.
save_as_gexf (bool, optional) – If given any string as filename the network gets saved as Gephi file.
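How character_distance decides which tags become connected can be sketched without any graph library. The code below is an illustration under stated assumptions, not gitma's algorithm: annotations are simplified to `(tag, start_point, end_point)` tuples, the gap is measured between the closest span boundaries, and pairs with the same tag are skipped.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Annotation = Tuple[str, int, int]  # (tag, start_point, end_point)


def cooccur(a: Annotation, b: Annotation, character_distance: int = 100) -> bool:
    """Decide whether two annotations count as cooccurrent (assumed definition)."""
    gap = max(a[1], b[1]) - min(a[2], b[2])
    if gap < 0:
        # A negative gap means the two spans overlap.
        return True
    # With character_distance=0 only overlapping annotations are connected.
    return character_distance > 0 and gap <= character_distance


def cooccurrence_edges(annotations: List[Annotation], character_distance: int = 100) -> Counter:
    """Count cooccurrences per tag pair; each key is one edge of the network."""
    edges = Counter()
    for a, b in combinations(annotations, 2):
        # Assumption: pairs of annotations with the same tag do not form an edge.
        if a[0] != b[0] and cooccur(a, b, character_distance):
            edges[tuple(sorted((a[0], b[0])))] += 1
    return edges


annotations = [
    ("stative_event", 0, 72),
    ("character", 60, 70),
    ("process_event", 150, 200),
]
near = cooccurrence_edges(annotations, character_distance=100)
overlapping = cooccurrence_edges(annotations, character_distance=0)
```

With a distance of 100 all three tags are connected, whereas a distance of 0 leaves only the edge between the two overlapping annotations.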
- to_pygamma_table()#
Returns the annotation collection’s DataFrame in the format pygamma takes as input.
- Returns
DataFrame with four columns: ‘annotator’, ‘tag’, ‘start_point’ and ‘end_point’.
- Return type
pd.DataFrame
- tag_stats(tag_col='tag', stopwords=None, ranking=10)#
Computes the following data for each tag in the annotation collection:
the count of annotations with a tag
the complete text span annotated with a tag
the average text span annotated with a tag
the n most frequent tokens in the text span annotated with a tag
- Parameters
tag_col (str, optional) – Whether the data is computed for tags, a property or annotators. Defaults to ‘tag’.
stopwords (list, optional) – A list with stopword tokens. Defaults to None.
ranking (int, optional) – The number of most frequent tokens to be included. Defaults to 10.
- Returns
The data as pandas DataFrame.
- Return type
pd.DataFrame
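The four values listed above can be computed with the standard library alone. The following is a sketch under assumptions, not the gitma implementation: annotations are plain dictionaries with the keys `tag`, `start_point`, `end_point` and `annotation` (the annotated text), tokens are found with a simple regular expression, and the result keys are chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def simple_tag_stats(annotations, stopwords=None, ranking=10):
    """Compute count, full span, average span and top tokens per tag."""
    stopwords = set(stopwords or [])
    spans = defaultdict(list)
    tokens = defaultdict(Counter)
    for ann in annotations:
        spans[ann["tag"]].append(ann["end_point"] - ann["start_point"])
        # Assumption: lowercase word tokens; gitma may tokenize differently.
        words = re.findall(r"\w+", ann["annotation"].lower())
        tokens[ann["tag"]].update(w for w in words if w not in stopwords)

    return {
        tag: {
            "annotations": len(lengths),
            "full_text_span": sum(lengths),
            "average_text_span": sum(lengths) / len(lengths),
            "most_frequent_tokens": [w for w, _ in tokens[tag].most_common(ranking)],
        }
        for tag, lengths in spans.items()
    }


annotations = [
    {"tag": "process_event", "start_point": 0, "end_point": 10, "annotation": "das Schloß"},
    {"tag": "process_event", "start_point": 20, "end_point": 40, "annotation": "das Schloß in Schutt"},
    {"tag": "non_event", "start_point": 50, "end_point": 58, "annotation": "wenn man"},
]
stats = simple_tag_stats(annotations, stopwords=["das", "in"], ranking=2)
```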
- property_stats()#
Counts the property values for each property.
- Returns
DataFrame with properties as index and property values as header.
- Return type
pd.DataFrame
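The counting can be sketched with nested counters, where the outer keys play the role of the index and the inner keys the role of the header. The dictionary layout of the annotations is an assumption for illustration; gitma returns a DataFrame instead.

```python
from collections import Counter, defaultdict


def simple_property_stats(annotations):
    """Count how often each property value occurs, grouped by property."""
    counts = defaultdict(Counter)
    for ann in annotations:
        # Assumption: each annotation maps property names to lists of values.
        for prop, values in ann["properties"].items():
            counts[prop].update(values)
    return {prop: dict(counter) for prop, counter in counts.items()}


annotations = [
    {"tag": "stative_event", "properties": {"characters": ["marquis", "marquise"]}},
    {"tag": "process_event", "properties": {"characters": ["marquis"], "intentional": ["yes"]}},
]
stats = simple_property_stats(annotations)
```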
- get_annotation_by_tag(tag_name)#
Creates a list of all annotations with a given tag name.
- Parameters
tag_name (str) – The searched tag’s name.
- Returns
List of annotations as gitma.Annotation objects.
- Return type
List[Annotation]
- annotate_properties(tag, prop, value)#
Sets the value for a given property. This function uses the gitma.Annotation.set_property_values() method.
- Parameters
tag (str) – The parent tag of the property.
prop (str) – The property to be annotated.
value (list) – The new property value.
- rename_property_value(tag, prop, old_value, new_value)#
Renames a property value of all annotations with the given tag name. Replaces only the property value defined by the parameter old_value.
- Parameters
tag (str) – The tag’s name.
prop (str) – The property’s name.
old_value (str) – The old property value that will be replaced.
new_value (str) – The new property value that will replace the old property value.
- delete_properties(tag, prop)#
Deletes a property from all annotations with a given tag name.
- Parameters
tag (str) – The annotations’ tag name.
prop (str) – The name of the property that will be removed.
- to_stanford_tsv(tags='all', file_name='tsv_annotation_export', spacy_model='de_core_news_sm')#
Takes a CATMA AnnotationCollection and writes a tsv file which can be used to train a Stanford NER model. Every token in the collection’s text gets a tag if it lies in an annotated text segment.
- Parameters
tags (Union[list, str], optional) – List of tags, that should be considered. If set to ‘all’ all annotations are included. Defaults to ‘all’.
file_name (str, optional) – name of the tsv-file. Defaults to ‘tsv_annotation_export’.
spacy_model (str, optional) – A spaCy model as listed in https://spacy.io/usage/models. Defaults to ‘de_core_news_sm’.
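The token-wise tagging can be sketched as follows. To stay self-contained, the sketch splits on whitespace instead of using a spaCy model, and it labels tokens outside any annotation with `O`, the usual background label in NER training data. Both choices, the tag name `location` and the helper name are assumptions and do not describe gitma's actual output.

```python
import re
from typing import List, Tuple


def to_token_tag_lines(text: str, annotations: List[Tuple[str, int, int]]) -> List[str]:
    """Return one 'token<TAB>label' line per whitespace-separated token."""
    lines = []
    for match in re.finditer(r"\S+", text):
        label = "O"  # assumed background label for tokens without annotation
        for tag, start, end in annotations:
            # A token gets the tag if it lies completely inside the annotated span.
            if start <= match.start() and match.end() <= end:
                label = tag
                break
        lines.append(f"{match.group()}\t{label}")
    return lines


text = "Am Fuße der Alpen befand sich ein Schloß"
annotations = [("location", 12, 17)]  # (tag, start_point, end_point), hypothetical tag
lines = to_token_tag_lines(text, annotations)
tsv = "\n".join(lines)
```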
- write_annotation_csv(tags='all', property='all', only_missing_prop_values=False, filename='PropertyAnnotationTable')#
Creates a csv file to add property values to existing annotations. The added property values can be imported with the read_annotation_csv() method.
- Parameters
tags (Union[str, list], optional) – List of tag names to be included. If set to ‘all’ all annotations will be written into the csv file. Defaults to ‘all’.
property (str, optional) – The property to be included. If set to ‘all’ all annotations will be written into the csv file. Defaults to ‘all’.
only_missing_prop_values (bool, optional) – Whether only empty properties should be included. Defaults to False.
filename (str, optional) – The csv file name. Defaults to ‘PropertyAnnotationTable’.
- read_annotation_csv(filename='PropertyAnnotationTable.csv', push_to_gitlab=False)#
Reads the csv file created by the write_annotation_csv() method and updates the annotation json files. Additionally, if push_to_gitlab=True, the annotations get imported into the CATMA Gitlab backend.
- Parameters
filename (str, optional) – The annotation csv file’s name/directory. Defaults to ‘PropertyAnnotationTable.csv’.
push_to_gitlab (bool, optional) – Whether to push the annotations to gitlab. Defaults to False.
Examples#
Add property values via csv table#
The AnnotationCollection class can be used to add property values to existing annotations using a csv table.
Step 1: Load your project and create a csv table to annotate the existing annotations/properties:
import gitma

# Load the project from the local projects directory
project = gitma.CatmaProject(
    project_name='<your_project_name>',
    projects_directory='<your_projects_directory>'
)

# Write the csv table for the selected annotation collection
project.ac_dict['<your_annotation_collection_name>'].write_annotation_csv(
    filename='PropertyAnnotationTable'  # default name
)
The method write_annotation_csv creates a csv table in this format:
id | annotation_collection | text | tag | property | values |
---|---|---|---|---|---|
CATMA_95259D62-E441-4009-AD8D-7F5124BF2323 | bettelweib-event_type | Am Fuße der Alpen, bei Locarno im oberen Italien, befand sich ein Schloß | stative_event | characters | marquis,marquise |
CATMA_AB9A223C-C6F7-495A-817F-ED57E1B69A70 | bettelweib-event_type | das man jetzt in Schutt und Trümmern liegen sieht | process_event | intentional | yes |
CATMA_AB9A223C-C6F7-495A-817F-ED57E1B69A70 | bettelweib-event_type | das man jetzt in Schutt und Trümmern liegen sieht | process_event | characters | |
CATMA_2A4C8A4E-2842-44D2-B2E2-F9A6AE2B8063 | bettelweib-event_type | wenn man vom St. Gotthard kommt | non_event | characters | |
Step 2: Add property values:
For every property per annotation a table row will be created.
Caution
In these tables only the values column is editable!
If you want to add multiple values for a property, separate the values with a comma. It is recommended to use a csv editor like https://edit-csv.net/. Finish your annotations by saving the csv file.
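How comma-separated cells in the values column turn into lists of property values can be sketched with the standard csv module. The in-memory table below uses only a subset of the columns and the shortened, hypothetical ids `CATMA_1` and `CATMA_2`; the helper `split_values` is likewise an assumption and not part of gitma.

```python
import csv
import io

# A small in-memory stand-in for the csv file written by write_annotation_csv().
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "tag", "property", "values"])
writer.writeheader()
writer.writerow({"id": "CATMA_1", "tag": "stative_event",
                 "property": "characters", "values": "marquis, marquise"})
writer.writerow({"id": "CATMA_2", "tag": "non_event",
                 "property": "characters", "values": ""})
buffer.seek(0)


def split_values(cell: str):
    """Split a values cell on commas and drop surrounding whitespace and empty entries."""
    return [value.strip() for value in cell.split(",") if value.strip()]


# One entry per table row, keyed by annotation id and property name.
parsed = {
    (row["id"], row["property"]): split_values(row["values"])
    for row in csv.DictReader(buffer)
}
```

A cell that contains commas has to be quoted in the csv file, which the csv module and most csv editors do automatically.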
Step 3: Load the added property values in your CATMA project
After you have finished the property annotations in the csv file, you can load the annotations into the CATMA gitlab.
project.ac_dict['<your_annotation_collection_name>'].read_annotation_csv(
    filename='PropertyAnnotationTable.csv',  # default name
    push_to_gitlab=True                      # default is False
)
Caution
The push to gitlab will only work if you have git installed and your CATMA access token is stored in the git credential manager.
Step 4 (optional): Import your annotation to CATMA
If push_to_gitlab=False you can push the changed annotations to gitlab on your own.
To do so, the read_annotation_csv method will print the annotation collection’s directory.
Use the git bash or another terminal, go to the annotation collection’s directory and run:
git add .
git commit -m 'new property annotations'
git push origin HEAD:master
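If you prefer to run these three commands from Python, a small wrapper around subprocess could look like the sketch below. The function names and the dry_run switch are hypothetical and not part of gitma; with dry_run=True the commands are only returned, not executed.

```python
import subprocess
from typing import List


def build_git_commands(commit_message: str = "new property annotations") -> List[List[str]]:
    """Return the three git commands from the manual push as argument lists."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", commit_message],
        ["git", "push", "origin", "HEAD:master"],
    ]


def push_collection(directory: str,
                    commit_message: str = "new property annotations",
                    dry_run: bool = True) -> List[List[str]]:
    """Run the commands inside the annotation collection's directory."""
    commands = build_git_commands(commit_message)
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing git command.
            subprocess.run(command, cwd=directory, check=True)
    return commands


# Placeholder path: use the directory printed by read_annotation_csv().
commands = push_collection("<annotation_collection_directory>", dry_run=True)
```

As with the manual push, this only works if git is installed and the CATMA access token is stored in the git credential manager.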