| Title: | Convert Acoustic Telemetry Data Between Institutional Formats |
|---|---|
| Description: | Surimi takes as input data files representing acoustic telemetry, which may have column names and structures specific to particular institutions. It can convert, with minimal code on the user's part, data from one format to another, allowing data from one institution to be easily used across software packages that may expect different formats. |
| Authors: | Bruce Delo [aut, ctb, cre] |
| Maintainer: | Bruce Delo |
| License: | GPL (>= 3) |
| Version: | 0.0.0.1 |
| Built: | 2026-06-04 19:23:15 UTC |
| Source: | https://github.com/ocean-tracking-network/surimi |

Helper function used in the sapply() call that creates the WORMS_species_aphia_id column. Given a scientific name, it returns the matching aphiaID from the lookup.

`get_aphiaid_from_lookup(sciname, lookup)`

| Argument | Description |
|---|---|
| sciname | A scientific name as a string. |
| lookup | The named list containing key-value pairs of scientific names and aphiaIDs. |

Returns the aphiaID corresponding to sciname.
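
As a rough illustration of the pattern described above, the sketch below looks names up in a named list and applies that lookup across a column with sapply(). Everything in it is an assumption made for illustration: `lookup_aphiaid` is a hypothetical stand-in rather than Surimi's own code, the species names and IDs are placeholders, and returning NA for an unknown name is a design choice of this sketch, not documented behaviour.

```r
# Hypothetical lookup: scientific names as keys, aphiaIDs as values.
# The names and IDs are placeholders, not real WoRMS records.
example_lookup <- list(
  "Species alpha" = 1001L,
  "Species beta"  = 1002L
)

# Hypothetical stand-in for get_aphiaid_from_lookup(). Returns NA when the
# name is absent so that sapply() still yields one value per row.
lookup_aphiaid <- function(sciname, lookup) {
  id <- lookup[[sciname]]
  if (is.null(id)) NA_integer_ else id
}

detections <- data.frame(
  scientificname = c("Species beta", "Species alpha",
                     "Species beta", "Species gamma"),
  stringsAsFactors = FALSE
)

# One list lookup per row; no network traffic is involved at this stage.
detections$WORMS_species_aphia_id <- sapply(
  detections$scientificname,
  lookup_aphiaid,
  lookup = example_lookup,
  USE.NAMES = FALSE
)
```

Because the lookup is an ordinary named list, each row costs only a list access, which is what makes the per-row sapply() affordable.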

Takes a column of scientific names and creates a lookup table (read: a named list) mapping each unique scientific name to its aphiaID. We can use worrms to query the WoRMS REST service for the aphiaIDs, but doing that for every row is far too slow. Instead, we build the lookup client-side so that each unique name is queried only once, and every row is then resolved against the lookup.

`get_unique_aphiaids(scinames)`

| Argument | Description |
|---|---|
| scinames | A vector (dataframe column) containing the scientific names from a detection extract dataframe in Surimi. |

Returns a named list with the scientific name as the key and the aphiaID as the value.
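
The saving from querying only unique names can be sketched as follows. This is a minimal sketch under stated assumptions, not Surimi's implementation: `build_aphiaid_lookup` and `resolve_aphiaid` are hypothetical names, and the resolver is an offline stub that returns a placeholder number so the example runs without network access. In real use a WoRMS client call such as `worrms::wm_name2id()` could fill that role.

```r
# Offline stub resolver that counts how often it is called.
# The returned value is a placeholder, not a real aphiaID.
call_count <- 0
resolve_aphiaid <- function(name) {
  call_count <<- call_count + 1
  nchar(name)
}

# Hypothetical equivalent of get_unique_aphiaids(): resolve each distinct
# name once and return a named list keyed by scientific name.
build_aphiaid_lookup <- function(scinames, resolver) {
  unique_names <- unique(scinames[!is.na(scinames)])
  ids <- lapply(unique_names, resolver)
  setNames(ids, unique_names)
}

scinames <- c("Aaa bbb", "Cc dd", "Aaa bbb", "Aaa bbb", "Cc dd")
lookup <- build_aphiaid_lookup(scinames, resolve_aphiaid)
```

With five rows but two distinct names, the resolver runs twice rather than five times. On a real detection extract, where a handful of species repeat over many rows, the difference is correspondingly larger.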

Takes a GLATOS detection sheet and optionally receiver metadata and returns an ATO object.

`glatos_to_ato(glatos_detections, glatos_receivers = "")`

| Argument | Description |
|---|---|
| glatos_detections | The dataframe containing detection information. |
| glatos_receivers | The dataframe containing receiver information. |

Returns an ATO object.

Takes a GLATOS workbook and returns an ATO object.

`glatos_workbook_to_ato(glatos_workbook)`

| Argument | Description |
|---|---|
| glatos_workbook | Path to the GLATOS workbook. |

Returns an ATO object.

In the same way that otn_imos_column_map takes OTN data and massages it into an IMOS-like format for Remora, this function takes IMOS data (detections, plus optional receiver and tag metadata) and massages it into an OTN-like format, for the purposes of reporting and more general applicability within the OTN suite of programs.

`imos_otn_column_map(det_dataframe, rcvr_dataframe = NULL, tag_dataframe = NULL, derive = TRUE)`

| Argument | Description |
|---|---|
| det_dataframe | ... |
| rcvr_dataframe | A dataframe containing IMOS receiver metadata. |
| tag_dataframe | ... |
| derive | ... |

A dataframe containing the above data in an OTN-like format.

In the same way that otn_imos_column_map takes OTN data and massages it into an IMOS-like format for Remora, this function and its ilk take IMOS data (in this case, animal detections) and massage it into an OTN-like format, for the purposes of reporting and more general applicability within the OTN suite of programs.

`imos_to_otn_detections(detection_dataframe, coll_code = NULL)`

| Argument | Description |
|---|---|
| detection_dataframe | A dataframe containing IMOS detection data. |

A dataframe containing the above data in an OTN-like format.

In the same way that otn_imos_column_map takes OTN data and massages it into an IMOS-like format for Remora, this function and its ilk take IMOS data (in this case, receiver metadata) and massage it into an OTN-like format, for the purposes of reporting and more general applicability within the OTN suite of programs.

`imos_to_otn_receivers(rcvr_dataframe)`

| Argument | Description |
|---|---|
| rcvr_dataframe | A dataframe containing IMOS receiver metadata. |

A dataframe containing the above data in an OTN-like format.
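
At its simplest, "massaging" one institution's format into another comes down to renaming columns through a map. The sketch below shows that idea in base R. It is only an illustration: `rename_columns` is a hypothetical helper, and the column names in `column_map` are invented for the example rather than taken from the actual IMOS or OTN specifications or from Surimi's code.

```r
# Hypothetical column map: names are source (IMOS-style) columns,
# values are target (OTN-style) columns. Both sides are illustrative.
column_map <- c(
  station_name  = "station",
  receiver_name = "receiver"
)

# Rename only the columns present in the map; leave all others untouched.
rename_columns <- function(df, map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

imos_like <- data.frame(
  station_name  = "STN01",
  receiver_name = "VR2W-100",
  depth         = 10,
  stringsAsFactors = FALSE
)

otn_like <- rename_columns(imos_like, column_map)
```

Keeping the map as data rather than hard-coding each rename makes it straightforward to maintain one map per direction and per format.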

In the same way that otn_imos_column_map takes OTN data and massages it into an IMOS-like format for Remora, this function and its ilk take IMOS data (in this case, tag metadata and animal measurements data) and massage it into an OTN-like format, for the purposes of reporting and more general applicability within the OTN suite of programs.

`imos_to_otn_tags(tag_dataframe, animal_measurements_dataframe)`

| Argument | Description |
|---|---|
| tag_dataframe | A dataframe containing IMOS-formatted tag metadata. |
| animal_measurements_dataframe | A dataframe containing IMOS-formatted animal measurements data. |

A single dataframe containing the tag and measurement data combined into an OTN-like format.

Determines whether an input file is CSV or Parquet and pipes it into the correct mapping function. Hopefully this is a stopgap until the standard route of OTN -> ATO and then ATO -> IMOS is finished, at which point those pipeline pieces will connect together. For now, this keeps dependent software like Remora running.

`map_otn_file(filename, derive = TRUE, coll_code = NULL)`

| Argument | Description |
|---|---|
| filename | The path to the file to be processed. |
| derive | Passed through to the mapping functions; determines whether receiver and tag metadata are derived from the detection extract. |
| coll_code | Passed through to the mapping functions; allows the user to supply the collectionCode if they are passing their own rcvr/tag metadata, which won't contain it. |

The output of the appropriate mapping function.
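
The dispatch described above might look something like the sketch below. It rests on an assumption: the format is inferred from the file extension, which may not be how map_otn_file actually decides. `dispatch_by_extension`, `map_from_csv`, and `map_from_parquet` are hypothetical stand-ins that only return a label, so the example runs without Surimi or any data files.

```r
# Hypothetical stand-ins for the real mapping functions.
map_from_csv <- function(path, ...) {
  paste("CSV mapping for", basename(path))
}
map_from_parquet <- function(path, ...) {
  paste("Parquet mapping for", basename(path))
}

# Assumption: the file type is inferred from the (case-insensitive)
# extension. Extra arguments such as derive or coll_code pass through.
dispatch_by_extension <- function(filename, ...) {
  ext <- tolower(tools::file_ext(filename))
  switch(ext,
    csv     = map_from_csv(filename, ...),
    parquet = map_from_parquet(filename, ...),
    stop("Unsupported file type: '", ext, "'")
  )
}

parquet_result <- dispatch_by_extension("data/extract_2021.parquet")
csv_result <- dispatch_by_extension("data/extract_2021.CSV", derive = TRUE)
```

Failing loudly on an unrecognised extension keeps an unsupported file from being silently routed through the wrong mapping.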

Takes three dataframes in the OTN format (one for a detection extract, one for receiver deployment metadata, and one for tag metadata) and rearranges, renames, and creates columns until they can pass for IMOS-format dataframes. This allows us to pass the data directly into Remora without making substantial changes to how that code runs or what it looks for.

`otn_imos_column_map(det_dataframe, rcvr_dataframe = NULL, tag_dataframe = NULL, derive = TRUE, coll_code = NULL, tagname_column = "tagname")`

| Argument | Description |
|---|---|
| det_dataframe | The dataframe containing detection information. Most likely a detection extract. |
| rcvr_dataframe | The dataframe containing receiver information. |
| tag_dataframe | The dataframe containing tag information. |
| derive | An optional flag that allows the user to pass in fewer than all three files. When TRUE (the default), the code uses the detection extract dataframe to generate whichever of the receiver and tag dataframes were not passed in. Although this will result in missing information, it does let the user supply only a detection extract file, which is a situation some may find themselves in. |
| coll_code | The user-supplied collectioncode, which we'll use to populate the receiver_project_name and tagging_project_name columns in the receiver and tag metadata files respectively. We don't have a good way to associate the relevant info from the detection extract with the appropriate columns in the rcvr/tag metadata, but those datasets are restricted to one collectioncode each, so we can just take it from the user at the time they run the code. |
| tagname_column | The name of the column that's equivalent to 'tagname', if the tagname column isn't present. Should only be necessary if deriving. |

Returns a list containing three approximately IMOS-formatted dataframes.
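
To make the derive behaviour more concrete, here is a minimal sketch of pulling receiver and tag tables out of a detection extract when no metadata files are available. It is not Surimi's code: `derive_receivers` and `derive_tags` are hypothetical helpers, and the column names in the toy extract are illustrative assumptions.

```r
# Toy detection extract; the column names are illustrative assumptions.
detections <- data.frame(
  station   = c("STN01", "STN01", "STN02", "STN01"),
  receiver  = c("VR2W-100", "VR2W-100", "VR2W-200", "VR2W-100"),
  latitude  = c(44.6, 44.6, 44.7, 44.6),
  longitude = c(-63.5, -63.5, -63.6, -63.5),
  tagname   = c("A69-1601-1", "A69-1601-2", "A69-1601-1", "A69-1601-1"),
  stringsAsFactors = FALSE
)

# One row per distinct station/receiver combination seen in the detections.
derive_receivers <- function(det) {
  cols <- c("station", "receiver", "latitude", "longitude")
  rcvr <- unique(det[, cols])
  rownames(rcvr) <- NULL
  rcvr
}

# One row per distinct tag. The tag column name is configurable, which
# mirrors the role the tagname_column argument plays when deriving.
derive_tags <- function(det, tagname_column = "tagname") {
  tags <- unique(det[, tagname_column, drop = FALSE])
  rownames(tags) <- NULL
  tags
}

derived_receivers <- derive_receivers(detections)
derived_tags <- derive_tags(detections)
```

The derived tables hold only what the detections happen to record. Fields that live solely in the metadata files, such as deployment dates, cannot be reconstructed this way, which is the missing information the derive description warns about.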

Takes three dataframes in the OTN format (one for a detection extract, one for receiver deployment metadata, and one for tag metadata) and rearranges, renames, and creates columns until they can pass for IMOS-format dataframes. This allows us to pass the data directly into Remora without making substantial changes to how that code runs or what it looks for. This is functionally identical to otn_imos_column_map() except that the column names on the OTN side reflect that the detection dataframe came from our new parquet format rather than our old CSV format.

`otn_imos_new_style_column_map(det_dataframe, rcvr_dataframe = NULL, tag_dataframe = NULL, derive = TRUE, coll_code = NULL, tagname_column = "tagName", format = "parquet")`

| Argument | Description |
|---|---|
| det_dataframe | The dataframe containing detection information. |
| rcvr_dataframe | The dataframe containing receiver information. |
| tag_dataframe | The dataframe containing tag information. |
| derive | An optional flag that allows the user to pass in fewer than all three files. When TRUE (the default), the code uses the detection extract dataframe to generate whichever of the receiver and tag dataframes were not passed in. Although this will result in missing information, it does let the user supply only a detection extract file, which is a situation some may find themselves in. |
| coll_code | The user-supplied collectioncode, which we'll use to populate the receiver_project_name and tagging_project_name columns in the receiver and tag metadata files respectively. We don't have a good way to associate the relevant info from the detection extract with the appropriate columns in the rcvr/tag metadata, but those datasets are restricted to one collectioncode each, so we can just take it from the user at the time they run the code. |
| tagname_column | The name of the column that's equivalent to 'tagname', if the tagname column isn't present. Should only be necessary if deriving. |
| format | Defaults to "parquet". Since the column names are the same across parquet files and new-style CSV files, this function can handle both as long as it knows what it's getting. Calling map_otn_file will handle all the checking for the user, though. |

Returns a list containing three approximately IMOS-formatted dataframes.

Takes an OTN detection extract and optionally receiver/tag metadata and returns an ATO object.

`otn_to_ato(otn_detections, otn_receivers = "", otn_tags = "")`

| Argument | Description |
|---|---|
| otn_detections | The dataframe containing detection information. |
| otn_receivers | The dataframe containing receiver information. |
| otn_tags | The dataframe containing tag information. |

Returns an ATO object.

Takes two parameters, an OTN detection extract and the output created by Remora on parsing that detection extract, and merges them back together such that the Remora QC columns are appended to the OTN extract, preserving appropriate ordering and getting back all the OTN data. This function exists because, to get OTN data into Remora, we have to cut it up until it looks like IMOS data (this problem was the genesis of Surimi, in fact). That means the output from Remora has all IMOS-formatted columns and is missing some information, either because we had to discard it to get into IMOS format or because we can't re-synthesize it from what's in the IMOS files. However, we do have enough information to join the two tables, which sidesteps the data loss problem by taking us all the way back to the original data, with a little something extra attached.

`rollup(detection_extract, remora_output)`

| Argument | Description |
|---|---|
| detection_extract | Path to an OTN detection extract corresponding to the Remora output in the second parameter. |
| remora_output | Path to Remora's QC output corresponding to the OTN detection extract in the first parameter. |

The OTN detection extract, but with the Remora QC attached as appropriate.
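
The join described above can be sketched in base R as a left join that restores the extract's original row order afterwards. This is an illustration under assumptions rather than the actual rollup logic: it supposes the two tables share a single key column, and `detection_key`, `qc_flag`, and the other column names are hypothetical.

```r
# Toy OTN extract; column names are illustrative assumptions.
otn_extract <- data.frame(
  detection_key = c("d3", "d1", "d2"),
  station       = c("STN02", "STN01", "STN01"),
  notes         = c("kept", "only in OTN", "kept"),
  stringsAsFactors = FALSE
)

# Toy QC output: different row order, and the OTN-only columns are absent.
qc_output <- data.frame(
  detection_key = c("d1", "d2", "d3"),
  qc_flag       = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Record the original order, left-join on the key, then restore the order.
otn_extract$.row <- seq_len(nrow(otn_extract))
rolled <- merge(otn_extract, qc_output, by = "detection_key", all.x = TRUE)
rolled <- rolled[order(rolled$.row), ]
rolled$.row <- NULL
rownames(rolled) <- NULL
```

The temporary `.row` column is there because merge() sorts its result by the key. Using `all.x = TRUE` keeps every row of the extract even if the QC output has no match for it, so nothing from the original data is dropped.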