---
title: "Data loading methods for the R package glatos"
date: "Updated: `r Sys.Date()`"
output:
  rmarkdown::html_document:
    toc: true
    toc_depth: 3
    number_sections: true
    toc_float: true
    toc_collapsed: false
  rmarkdown::pdf_document:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Data loading methods for the R package glatos}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

#set 'str' options to desired output format
str_opts <- getOption("str") #get list of options
str_opts$strict.width = "wrap"
str_opts$vec.len = 1
options(str = str_opts)

#set 'width'
options(width = 85)
```

\pagebreak

# Overview

This vignette describes methods for loading data into the R package *glatos*.
Sections are organized by data type (*detections*, *receiver locations*, etc.),
and each section contains examples for:

1. data in standardized formats from the Great Lakes Acoustic Telemetry
Observation System (GLATOS), the Ocean Tracking Network (OTN), and VEMCO,
using built-in data loading functions; and
2. data in non-standard formats that require loading using non-*glatos*
functions and modification to meet *glatos* requirements.

## Loading data from GLATOS, OTN, and VEMCO

The *glatos* package contains five functions (see
[Built-in functions](#built-in-functions)) designed to load data files in
standardized formats from GLATOS, OTN, and VEMCO. Each data loading function:

1. loads data into an R session consistently and efficiently using the best
available methods, and
2. returns an object that meets the requirements of *glatos* package
functions.

Thus, using *glatos* load functions ensures that the resulting data conform to
the requirements of other functions in the package (e.g.,
*summarize_detections*, *detection_bubble_plot*) and relieves users of the
work of reformatting their data to meet the requirements of each specific
function.
### Built-in functions

The *glatos* package includes five data loading functions:

- _**read_glatos_detections**_ : reads detection data from a
comma-separated-values text file obtained from the GLATOS Data Portal and
returns an object of class *glatos_detections* that is also a *data.frame*.
- _**read_otn_detections**_ : reads detection data from a
comma-separated-values text file obtained from the Ocean Tracking Network and
returns an object of class *glatos_detections* that is also a *data.frame*.
- _**read_glatos_receivers**_ : reads receiver data from a
comma-separated-values text file obtained from the GLATOS Data Portal and
returns an object of class *glatos_receivers* that is also a *data.frame*.
- _**read_glatos_workbook**_ : reads data from a GLATOS project-specific MS
Excel workbook (\*.xlsm file) and returns a list of class *glatos_workbook*
containing project metadata and two data elements: one of class
*glatos_receivers* and one of class *glatos_animals* (both are also
*data.frame*s).
- _**read_vemco_tag_specs**_ : reads tag specification data from an MS Excel
workbook (\*.xls file) provided by VEMCO and returns a list with two elements:
one containing tag specifications and one containing tag operating schedules
(both are also *data.frame*s).

### Data objects and classes

Most of the functions listed above return an object with a *glatos*-specific
S3 class name (e.g., *glatos_detections*) in addition to a more general class
(e.g., *data.frame*). Currently, no methods exist for *glatos* classes and
such classes are not explicitly required by any function, so *glatos* classes
can merely be thought of as labels showing that the objects were produced by a
*glatos* function and will therefore be compatible with other *glatos*
functions. Beware, as with any S3 class, that it is possible to modify a
*glatos* object to the point that it will no longer be compatible with
*glatos* functions.
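A minimal sketch of that risk, using the *walleye_detections.csv* example file
shipped with the package (this chunk is not evaluated here; it simply
illustrates that the S3 class label persists after modification):

```{r eval = FALSE}
library(glatos)

#read one of the built-in example detection files
dtc <- read_glatos_detections(system.file("extdata",
                                          "walleye_detections.csv",
                                          package = "glatos"))
class(dtc) #"glatos_detections" "data.frame"

#dropping a required column does NOT remove the class label...
dtc$animal_id <- NULL
class(dtc) #still "glatos_detections" "data.frame"
#...but the object will no longer work with functions that need animal_id
```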
The [Data Requirements vignette](data_requirements.html) provides an overview
of the data requirements of *glatos* functions.

## Loading data from other sources

To use *glatos* functions with data that are not in one of the standard
formats described above, those data will need to be:

1. loaded into R using some other function (e.g., *read_csv*) and
2. modified to ensure that all requirements of the desired function are met.

Strictly speaking, there are no requirements of the package as a whole, but
input data are checked within each individual function to determine if
requirements are met. Nonetheless, the
[Data Requirements vignette](data_requirements.html) provides a set of data
requirements, including column names, types, and formats, that will ensure
compatibility with all *glatos* functions. For each data type (e.g.,
*detection*, *receiver location*, etc.), this vignette shows how data from a
comma-separated-values text file can be loaded into R using non-*glatos*
functions and then modified to meet the *glatos* requirements.

## Tips to improve speed and efficiency

The main examples in this vignette use only base R functions. However, many
contributed packages provide functions that can improve workflow speed and
efficiency. After most examples using base R functions, boxed examples are
also given to expose users to alternative approaches using functions from
contributed packages. Most boxed tips show use of functions from the
*data.table* package. If you are not familiar with *data.table*, then read
through the introductory vignette (see
`vignette("datatable-intro", package = "data.table")`). For more on
*data.table*, see the vignettes (`browseVignettes("data.table")`). In addition
to *data.table*, one boxed tip draws from the *lubridate* package because it
provides a fast way to coerce timestamp strings to the date-time class
*POSIXct*. A future version of this vignette may include examples using other
packages such as the *tidyverse*.
A few notes about boxed-example code:

1. Make sure the relevant packages are installed and attached:

> ```{r results = "hide", warning = FALSE, message = FALSE}
> #install.packages("data.table")
> library(data.table)
>
> #install.packages("lubridate")
> library(lubridate)
> ```

2. The detection data object has been named differently in the boxed examples
than the data.frame version (`dtc2` for data.table vs. `dtc` for data.frame)
so that both sets of code can be run without the boxed-example code
interacting with the base R examples. However, the boxed examples do require
that all base R code that comes before them has also been run.

# Detection data

## Requirements

*glatos* functions that accept detection data as input will typically require
a *data.frame* with the following nine columns:

- detection\_timestamp\_utc
- receiver\_sn
- deploy\_lat
- deploy\_long
- transmitter\_codespace
- transmitter\_id
- sensor\_value
- sensor\_unit
- animal\_id

Some functions will also require at least one categorical column to identify
location (or group of locations). These can be specified by the user, but
examples of such columns in a GLATOS standard detection file are:

- glatos\_array
- station
- glatos\_project\_receiver

For definitions of any of the above fields, see the
[Data Requirements vignette](data_requirements.html) and function-specific
help files (e.g., `?summarize_detections`). Any *data.frame* that contains the
above columns (in the correct formats) should be compatible with all *glatos*
functions that accept detection data as input. Use of the data loading
functions *read_glatos_detections* and *read_otn_detections* will ensure that
these columns are present and formatted correctly, but these functions can
only be used on data in GLATOS and OTN formats. Data in other formats will
need to be loaded using other functions (e.g., *read.csv*, *fread*, etc.)
and carefully checked for compatibility with *glatos* functions (see
[Other formats - CSV file exported from a VUE database](#other-formats---csv-file-exported-from-a-vue-database)).

## Examples

### Loading GLATOS data

The *read_glatos_detections* function reads detection data from a standard
detection export file (\*.csv file) obtained from the GLATOS
[Data Portal](https://glatos.glos.us/portal) and checks that the data meet
*glatos* package requirements. Data are read using *fread* from the
*data.table* package; timestamps are formatted as class *POSIXct* and dates
are formatted as class *Date*.

First, we will use *system.file* to get the path to the
*walleye_detections.csv* file included in the *glatos* package.

```{r echo=TRUE}
# Set path to walleye_detections.csv example dataset
wal_det_file <- system.file("extdata", "walleye_detections.csv",
                            package = "glatos")
```

Next, we will load data from *walleye_detections.csv* using
*read_glatos_detections*.

```{r message = FALSE}
# Attach glatos package
library(glatos)

# Read in the walleye_detections.csv file using `read_glatos_detections`
walleye_detections <- read_glatos_detections(wal_det_file)
```

Let's view the structure of the resulting data frame (we've modified the
`str` default arguments to show only the first record in each column).

```{r echo=TRUE}
# View the structure and data from first row
str(walleye_detections)
```

The result is an object with 30 columns and two classes: *glatos_detections*
and *data.frame*. The *glatos_detections* class label indicates that the data
set was created using a *glatos* load function and therefore should meet the
requirements of any *glatos* function that accepts detection data as input.
See the [Data Requirements vignette](data_requirements.html) for field
definitions.

### Loading OTN data

The *read_otn_detections* function reads in detection data (\*.csv files)
obtained from the Ocean Tracking Network and reformats the data to meet the
requirements of *glatos* functions.
Data are read using *fread* from the *data.table* package; timestamps are
formatted as class *POSIXct* and dates are formatted as class *Date*.

```{r echo=TRUE}
# Set path to blue_shark_detections.csv example dataset
shrk_det_file <- system.file("extdata", "blue_shark_detections.csv",
                             package = "glatos")

# Read in the blue_shark_detections.csv file using `read_otn_detections`
blue_shark_detections <- read_otn_detections(shrk_det_file)

# View the structure of blue_shark_detections
str(blue_shark_detections)
```

The resulting object has 34 columns, many of which are not present in the
GLATOS standard format. However, some columns have been modified to meet
*glatos* requirements, and thus the *glatos_detections* class name has been
added.

### Other formats - CSV file exported from a VUE database

Detection data in any format other than GLATOS or OTN will need to be
modified to meet the requirements of *glatos* functions. Here, we show an
example using detection data that have been exported from a VEMCO VUE
database. There is currently no *glatos* function to load detection data
directly into an R session from VUE software, so data in that format will
need to be:

1. loaded into R using some other function (e.g., *read_csv*) and
2. modified to ensure that all requirements of the desired function are met.

In the example below, we will use the base R functions *read.csv* and
*as.POSIXct* to load detection data from a CSV file and reformat the data to
be consistent with the schema described above. Tip boxes will also show
simpler and/or faster alternatives to these methods using functions in the
*data.table* and *lubridate* packages.

First, get the path to a file (\*.csv) that contains detection data exported
from VEMCO VUE software. Such a file is included in the *glatos* package.
```{r, echo=TRUE}
#get path to example CSV file included with glatos package
csv_file <- system.file("extdata", "VR2W_109924_20110718_1.csv",
                        package = "glatos")
```

Now that we have the path to a VUE export file, we will read the data using
*read.csv*. In this case we are also setting some *read.csv* arguments to
non-default values. First, we set `as.is = TRUE` so that character values are
treated as characters and not converted to factors. Second, we set
`check.names = FALSE` to prevent conversion of syntactically invalid column
names to syntactically valid names. This simply keeps the names exactly as
they appear in the source text file rather than, for example, replacing spaces
with a dot (`.`). This does mean that we need to wrap those column names in
back-ticks when they are called (e.g., ``dtc$`Sensor Value` ``). Third, we set
`fileEncoding = "UTF-8-BOM"` to match the encoding of the text file. If this
argument is omitted, then you might see special characters added to the first
column name. Setting *fileEncoding* may also slow down the import.

```{r}
dtc <- read.csv(csv_file, as.is = TRUE, check.names = FALSE,
                fileEncoding = "UTF-8-BOM")
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _fread_ instead of _read.csv_.
> ``` {r eval = FALSE}
> library(data.table)
>
> #read data from csv file using data.table::fread
> #name dtc2 (data.table) to distinguish from dtc (data.frame) in this script
> dtc2 <- fread(csv_file, sep = ",", header = TRUE, fill = TRUE)
> ```
>
> Note also that we use `fill = TRUE` because by default VEMCO VUE exports do
> not include a comma for every column in the CSV file.
>
> _fread_ is fast. That's one reason it is used by _read_glatos_detections_
> and other _glatos_ functions.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we will reformat to be consistent with a *glatos_detections* object.
We will do this for each of the required columns described above.

#### _**detection_timestamp_utc**_

Change the column name from *Date and Time (UTC)* to
*detection_timestamp_utc*. There are many ways to do this (e.g., reference
columns by number: `names(dtc)[1] <- "detection_timestamp_utc"`), but in the
code below, use of `match()` to get the column number is robust to changes in
column order.

```{r}
#change column name
names(dtc)[match("Date and Time (UTC)", names(dtc))] <- "detection_timestamp_utc"
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _setnames_ to change column names.
> ``` {r eval = FALSE}
> #use data.table::setnames to change column names via old and new names
> setnames(dtc2, "Date and Time (UTC)", "detection_timestamp_utc")
> ```
> *setnames(x, old, new)*... could it be more intuitive?
>
> Notice that there are no assignment operators (_<-_ or _=_) in this code.
> This is because _setnames_, like other _data.table_ functions, updates the
> target object (in this case _dtc2_) directly (aka: _by reference_).
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, we format the timestamp column using the base R function
*as.POSIXct*. All *POSIXct* objects are stored internally as the number of
seconds elapsed since "1970-01-01 00:00:00" in UTC. When we convert a
character string to *POSIXct*, we need to tell R *how* to convert it--namely,
the time zone of the input data. By default, *as.POSIXct* will assume your
local system time zone (e.g., the one returned by `Sys.timezone()`). To
prevent time zone errors, *always* specify the time zone (using the *tz*
argument) whenever you coerce any timestamp to *POSIXct*.
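A quick way to see the risk is to parse the same string with and without an
explicit *tz* (a minimal sketch, not evaluated here because the result depends
on your system time zone):

```{r eval = FALSE}
#same clock string, potentially different instants in time
t_utc   <- as.POSIXct("2011-04-11 20:30:00", tz = "UTC")
t_local <- as.POSIXct("2011-04-11 20:30:00") #assumes Sys.timezone()

#difference equals your local UTC offset, in seconds (0 only if local tz
#is UTC)
as.numeric(t_utc) - as.numeric(t_local)
```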
In this case, the timestamps were exported from VUE in UTC, so we use the
following:

```{r}
dtc$detection_timestamp_utc <- as.POSIXct(dtc$detection_timestamp_utc,
                                          tz = "UTC")

#take a peek
str(dtc$detection_timestamp_utc)

#first few records
head(dtc$detection_timestamp_utc, 3)
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _:=_ to add or modify a column.
> _:=_ is an assignment operator for _data.table_ objects that assigns
> by reference. We use it because it is more compact than base R methods.
>
> ``` {r eval = FALSE}
> #use ':=' to format timestamps
> dtc2[ , detection_timestamp_utc := as.POSIXct(detection_timestamp_utc,
>                                               tz = "UTC")]
> ```
>
> Note that, as in the previous boxed tip, there are no assignment operators
> _<-_ or _=_ because _dtc2_ is updated by reference.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __lubridate tip:__ Use _fast\_strptime_ to parse timestamps.
> ``` {r eval = FALSE}
> #just for this example, we convert detection_timestamp_utc back to string
> dtc2$detection_timestamp_utc <- format(dtc2$detection_timestamp_utc)
> class(dtc2$detection_timestamp_utc) #now character string again
>
> #fast_strptime is the fastest way we know to coerce strings to POSIXct
> dtc2[ , detection_timestamp_utc :=
>         lubridate::fast_strptime(detection_timestamp_utc,
>                                  format = "%Y-%m-%d %H:%M:%OS",
>                                  tz = "UTC",
>                                  lt = FALSE)]
> ```
> _fast_strptime_ requires a bit more code because we have to specify
> `format` and set `lt = FALSE` so that _POSIXct_ is returned instead of the
> default (_POSIXlt_) for this function.
>
> Notice that we formatted the timestamps using _fast_strptime_ but also
> used _data.table_'s _set_ operator (_:=_) to assign the result to the
> target column.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### _**receiver_sn**_

There is no single column in the VUE export data with the receiver serial
number, so we need to extract it from the *Receiver* column.

```{r}
dtc$Receiver[1]
```

To do this, we will write a function (*get_rsn*) to extract the second element
of the hyphen-delimited string in the *Receiver* column using the base R
function *strsplit*. We then use *sapply* to "apply" our custom function to
each element (record) of the *Receiver* column.

```{r}
#make new function to extract second element from a hyphen-delimited string
get_rsn <- function(x) strsplit(x, "-")[[1]][2]

#apply get_rsn() to each record in Receiver column
dtc$receiver_sn <- sapply(dtc$Receiver, get_rsn)
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use the _by_ argument in _data.table_ to update a column
> by groups.
> ``` {r eval = FALSE}
> #make new column "receiver_sn"; parse from "Receiver"
> dtc2[ , receiver_sn := get_rsn(Receiver), by = "Receiver"]
> ```
>
> This is more efficient because it operates on groups (unique values of
> _Receiver_) instead of each individual record.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### _**deploy_lat**_ and _**deploy_long**_

The *Latitude* and *Longitude* values are all zero in this data set because
the VUE database from which these data were exported did not contain any
latitude or longitude data. To add those data to these detections, we will
make a new data frame containing *latitude* and *longitude* along with other
receiver data, and then merge the new receiver data with the detection data.
The code below shows a simple left join on *receiver_sn*, which assigns the
same receiver location data to all detection records on that receiver without
regard to time.
```{r}
#make an example receiver data frame
rcv <- data.frame(
  glatos_array = "DWM",
  station = "DWM-001",
  deploy_lat = 45.65738,
  deploy_long = -84.46418,
  deploy_date_time = as.POSIXct("2011-04-11 20:30:00", tz = "UTC"),
  recover_date_time = as.POSIXct("2011-07-08 17:11:00", tz = "UTC"),
  ins_serial_no = "109924",
  stringsAsFactors = FALSE)

#left join on receiver serial number to add receiver data to detections
dtc <- merge(dtc, rcv, by.x = "receiver_sn", by.y = "ins_serial_no",
             all.x = TRUE)

# take a look at first few rows
head(dtc, 3)
```

Note that new columns have been added to *dtc*, including *deploy_lat*,
*deploy_long*, and two columns (*glatos_array* and *station*) that could
serve as optional location grouping variables. Columns *deploy_date_time* and
*recover_date_time* (*POSIXct* objects) are not required columns, but are
useful for removing detections that occurred before receiver deployment or
after recovery.

Two limitations of the simple join shown above are that it:

* is inadequate if any receiver is deployed at more than one location.
* includes detections that occurred before receiver deployment and after
receiver recovery.

To account for those situations and ensure that detections are correctly
associated with a location, we will subset detections to omit any that
occurred before deployment or after recovery. For convenience, we use the
base R function *with* so that we do not have to repeatedly call *dtc$...*,
but note that this can be somewhat risky (see `?with`).

```{r}
#count rows before subset
nrow(dtc)

#subset detections between receiver deployment and recovery (omit others)
dtc <- with(dtc, dtc[detection_timestamp_utc >= deploy_date_time &
                     detection_timestamp_utc <= recover_date_time, ])

#count rows after subset
nrow(dtc)
```

We removed five rows. Those detections either occurred before receiver
deployment or after receiver recovery, so the locations of those detections
are unknown.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _between_ to subset records by intervals.
> ``` {r eval = FALSE}
> #merge detections and receivers
> dtc2 <- merge(dtc2, rcv, by.x = "receiver_sn", by.y = "ins_serial_no",
>               all.x = TRUE)
>
> #subset detections that occurred after deployment and before recovery
> dtc2 <- dtc2[between(detection_timestamp_utc, deploy_date_time,
>                      recover_date_time)]
>
> nrow(dtc2)
> ```
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### _**transmitter_codespace**_ and _**transmitter_id**_

There is no single column in the VUE export data with the transmitter code
space or transmitter ID code, so we need to extract them from the
*Transmitter* column. As we did with *receiver_sn*, we'll make new functions
to extract the ID and code space from each record, then use *sapply* to
"apply" each of those functions to each record in *Transmitter*. Note that
the code space requires an extra step: after we split the string on "-", we
paste the first and second elements back together to create the code space
string.
```{r}
#make a new function to extract id from Transmitter
#i.e., get third element of hyphen-delimited string
parse_tid <- function(x) strsplit(x, "-")[[1]][3]

#make a new function to extract codespace from Transmitter
#i.e., get first two elements of hyphen-delimited string
parse_tcs <- function(x) {
  #split on "-" and keep first two extracted elements
  tx <- strsplit(x, "-")[[1]][1:2]
  #re-combine and separate by "-"
  return(paste(tx[1:2], collapse = "-"))
}

#apply parse_tcs() to Transmitter and assign to transmitter_codespace
dtc$transmitter_codespace <- sapply(dtc$Transmitter, parse_tcs)

#apply parse_tid() to Transmitter and assign to transmitter_id
dtc$transmitter_id <- sapply(dtc$Transmitter, parse_tid)
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use the _functional form_ of _:=_ to add/modify more
> than one column.
> ``` {r eval = FALSE}
> dtc2[ , `:=`(transmitter_codespace = parse_tcs(Transmitter),
>              transmitter_id = parse_tid(Transmitter)),
>       by = "Transmitter"]
> ```
>
> See `?data.table::set` for a description and examples of the functional
> form of `:=`.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### _**sensor_value**_ and _**sensor_unit**_

Change the column names from *'Sensor Value'* and *'Sensor Unit'* to
*sensor_value* and *sensor_unit*.

```{r}
#change column names
names(dtc)[match(c("Sensor Value", "Sensor Unit"), names(dtc))] <-
  c("sensor_value", "sensor_unit")
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _setnames_ to change multiple column names.
> ``` {r eval = FALSE}
> setnames(dtc2, c("Sensor Value", "Sensor Unit"),
>          c("sensor_value", "sensor_unit"))
> ```
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

```{r}
str(dtc)
```

#### _**animal_id**_

The *animal_id* column was not included in the VUE database and will need to
come from another source.
If no tags were re-used, then a simple solution might be to create a new
column and assign it the values of the *Transmitter* column. In this example,
however, we will make a new data frame containing *animal* data and merge it
with the detection data. The code below shows a simple left join on
*transmitter_codespace* and *transmitter_id*, which assigns the same animal
data to all detection records of each transmitter without regard to time.

```{r}
#make an example animal (fish) data frame
fsh <- data.frame(
  animal_id = c("1", "4", "7", "128"),
  tag_code_space = "A69-1601",
  tag_id_code = c("439", "442", "445", "442"),
  common_name = "Sea Lamprey",
  release_date_time = as.POSIXct(c("2011-05-05 12:00",
                                   "2011-05-05 12:00",
                                   "2011-05-06 12:00",
                                   "2011-06-08 12:00"), tz = "UTC"),
  recapture_date_time = as.POSIXct(c(NA, "2011-05-26 15:00", NA, NA),
                                   tz = "UTC"),
  stringsAsFactors = FALSE)

#simple left join on codespace and id
dtc <- merge(dtc, fsh,
             by.x = c("transmitter_codespace", "transmitter_id"),
             by.y = c("tag_code_space", "tag_id_code"),
             all.x = TRUE)
```

Two limitations of this simple join are that it:

* is inadequate if any transmitter was deployed more than once.
* includes detections that occurred before transmitter deployment (animal
release) and after transmitter recovery (animal recapture).

To account for these circumstances, we will perform a conditional subset in
the next step. Specifically, one tag (`tag_id_code = 442`) was re-used, but
we did not account for this in the above merge (a simple left join). So we
now need to subset to omit detections that occurred before release or after
recapture.
```{r}
#count rows before subset
nrow(dtc)

#subset detections to include only those between release and recapture
# or after release if never recaptured
dtc <- with(dtc, dtc[detection_timestamp_utc >= release_date_time &
                     (detection_timestamp_utc <= recapture_date_time |
                      is.na(recapture_date_time)), ])
```

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> __data.table tip:__ Use _between_ to query records or evaluate statements
> by intervals.
> ``` {r eval = FALSE}
> #merge detection with fish data
> #note that allow.cartesian is needed to acknowledge one-to-many join
> dtc2 <- merge(dtc2, fsh,
>               by.x = c("transmitter_codespace", "transmitter_id"),
>               by.y = c("tag_code_space", "tag_id_code"),
>               all.x = TRUE, allow.cartesian = TRUE)
>
> #subset detections between release and recapture
> dtc2 <- dtc2[between(detection_timestamp_utc, release_date_time,
>                      recapture_date_time) | is.na(recapture_date_time), ]
> ```
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

```{r}
#count rows after subset
nrow(dtc)
```

We should now have a detection data set that meets the requirements of
*glatos* functions.

# Receiver location data

## Requirements

*glatos* functions that accept receiver location data as input will typically
require a *data.frame* with one or more of the following columns:

- deploy\_lat
- deploy\_long
- deploy\_date\_time
- recover\_date\_time

Some functions will also require at least one categorical column to identify
location (or group of locations). These can be specified by the user, but
examples of such columns in a GLATOS standard receiver locations file are:

- glatos\_array
- station
- glatos\_project\_receiver

For definitions of any of the above fields, see the
[Data Requirements vignette](data_requirements.html) and function-specific
help files (e.g., `?abacus_plot`).
Any *data.frame* that contains the above columns (in the correct formats)
should be compatible with all *glatos* functions that accept receiver data as
input. Use of the data loading function *read_glatos_receivers* will ensure
that these columns are present and formatted correctly, but can only be used
on data in GLATOS format. Data in other formats will need to be loaded using
other functions (e.g., *read.csv*, *fread*, etc.) and carefully checked for
compatibility with *glatos* functions (see
[Other formats - CSV file exported from a VUE database](#other-formats---csv-file-exported-from-a-vue-database)).

## Examples

### Loading GLATOS data (entire network)

The *read_glatos_receivers* function reads in receiver location data obtained
from the GLATOS [Data Portal](https://glatos.glos.us/portal) and checks that
the data meet the requirements of *glatos* functions. Data are read using
*fread* from the *data.table* package; timestamps are formatted as class
*POSIXct*.

We will get the path to *sample_receivers.csv* (an example included in the
*glatos* package) using *system.file*, then read the data using
*read_glatos_receivers*, and view the structure of the result.

```{r echo=TRUE}
#get path to example receiver_locations file
rec_file <- system.file("extdata", "sample_receivers.csv",
                        package = "glatos")

#read sample_receivers.csv using 'read_glatos_receivers'
rcv <- read_glatos_receivers(rec_file)

#view structure
str(rcv)
```

The result is an object with `r ncol(rcv)` columns (including the required
columns described above) and two classes: *glatos_receivers* and
*data.frame*. The *glatos_receivers* class label indicates that the data set
was created using a *glatos* load function and therefore should work with any
*glatos* function that accepts receiver data as input.
### Loading GLATOS data (single project workbook)

The *read_glatos_workbook* function reads in receiver location data from a
standard GLATOS project workbook (\*.xlsm file) and checks that the data meet
the requirements of *glatos* functions. Data are read using *read_excel* from
the *readxl* package; timestamps are formatted as class *POSIXct*.

We will get the path to *walleye_workbook.xlsm* (an example included in the
*glatos* package) using *system.file*, then read the data using
*read_glatos_workbook*, and view the structure of the result.

```{r echo=TRUE}
#get path to example walleye_workbook.xlsm file
wb_file <- system.file("extdata", "walleye_workbook.xlsm",
                       package = "glatos")

#read walleye_workbook.xlsm using 'read_glatos_workbook'
wb <- read_glatos_workbook(wb_file)

#view structure
class(wb)
names(wb)
```

The result is a *list* (also a *glatos_workbook*) object with three elements
containing data about the project and the data file (*metadata*), the fish
that were tagged and released (*animals*), and the receivers (*receivers*).
The *receivers* element is actually the result of merging two sheets in the
source file: *deployments* and *recoveries*.

Next, we will extract the *receivers* element from the workbook object and
view its structure.

```{r}
#extract receivers element from workbook list
rcv2 <- wb[["receivers"]]

#view structure
str(rcv2)
```

The result contains `r ncol(rcv2)` columns and two classes:
*glatos_receivers* and *data.frame*. Despite some differences between the
structure of this project-specific data object and the network-level data
object loaded in the previous example, both have been minimally modified to
meet the requirements of any *glatos* function that accepts receiver data as
input.

### Other formats

Receiver location data in any format other than one of the GLATOS standards
will need to be:

1. loaded into R using some other function (e.g., *read_csv*, *fread*, etc.)
and
2.
modified to ensure that all requirements of the desired function are met.

This vignette does not include an example of receiver location data loaded
from CSV because the methods would be very similar to those described above.
For example, you might step through each required column described in the
[Data Requirements vignette](data_requirements.html), check that each column
meets *glatos* requirements, and modify accordingly using the methods
described above for detection data from a CSV file exported from VUE (see
[Other formats - CSV file exported from a VUE database](#other-formats---csv-file-exported-from-a-vue-database)).

# Animal tagging and biological data

## Requirements

There are currently no *glatos* functions that require animal tagging and
biological data other than those columns present in the required *detection*
data. Therefore, there are no formal requirements of such data in the
package. Nonetheless, the *read_glatos_workbook* function can be used to
facilitate loading animal tagging and biological data from a standard GLATOS
project workbook (\*.xlsm file) into an R session. Use of the data loading
function *read_glatos_workbook* will ensure that animal data are loaded
efficiently and consistently among users, but can only be used on data in
GLATOS format. Data in other formats will need to be loaded using other
functions (e.g., *read.csv*, *fread*, etc.). Although there are currently no
*glatos* requirements of animal data, any future requirements might be
expected to be consistent with the *glatos_animals* class.

## Examples

### Loading GLATOS data (single project workbook)

The *read_glatos_workbook* function reads animal tagging and biological data
from a standard GLATOS project workbook (\*.xlsm file; *tagging* sheet). Data
are read using *read_excel* from the *readxl* package; timestamps are
formatted as class *POSIXct*.
We will again use data from the same *walleye_workbook.xlsm* example file used in the previous section (see the data loading steps above), but will extract the *animals* element and view its structure.

```{r}
#extract animals element from workbook list
fsh <- wb[["animals"]]

#view structure
str(fsh)
```

The result contains `r ncol(fsh)` columns and two classes: *glatos_animals* and *data.frame*.

### Other formats

Animal tagging and biological data in any format other than the GLATOS standard will need to be:

1. loaded into R using some other function (e.g., *read_csv*, *fread*, etc.) and
2. modified to ensure that all requirements of the desired function are met.

This vignette does not include an example of animal tagging and biological data loaded from CSV because the methods would be very similar to those described above. Moreover, there are currently no *glatos* functions that require animal tagging and biological data other than those present in *glatos_detections* data. Although the *glatos* package currently does not contain any specific requirements for animal tagging and biological data, future requirements might be expected to resemble key columns of *glatos_animals* objects.

# Transmitter specification data

## Requirements

There are currently no *glatos* functions that require transmitter specification data. Therefore, there are no formal requirements for such data in the package. Nonetheless, the *read_vemco_tag_specs* function can be used to load transmitter specification data from a standard VEMCO tag spec file (\*.xls) provided to tag purchasers by VEMCO. Using *read_vemco_tag_specs* ensures that transmitter specification data are loaded efficiently and consistently among users, but the function can only be used on data in the standard VEMCO format. Data in other formats will need to be loaded using other functions (e.g., *read.csv*, *fread*, etc.).
Although there are currently no *glatos* requirements for transmitter specification data, any future requirements might be expected to be consistent with the output of *read_vemco_tag_specs*.

## Examples

### Loading data from a VEMCO tag specs file

The *read_vemco_tag_specs* function reads transmitter specification data from a standard VEMCO tag specs file (\*.xls). Data are read using *read_excel* from the *readxl* package.

We will get the path to the *lamprey_tag_specs.xls* file (an example included in the *glatos* package) using *system.file*, then read the data using *read_vemco_tag_specs*, and view the structure of the result.

```{r echo=TRUE}
#get path to example lamprey_tag_specs.xls file
spec_file <- system.file("extdata", "lamprey_tag_specs.xls", package = "glatos")

#read lamprey_tag_specs.xls using 'read_vemco_tag_specs'
my_tags <- read_vemco_tag_specs(spec_file, file_format = "vemco_xls")

#view structure
class(my_tags)
names(my_tags)
```

The result is a *list* object with two elements containing data about the transmitter specifications (*specs*) and the operating schedule (*schedule*). Next, we will view the structure of each element, starting with *specs*.

```{r}
#view structure of specs element
str(my_tags$specs)
```

The result contains `r ncol(my_tags$specs)` columns of transmitter characteristics that do not change over time.

```{r}
#view structure of schedule element
str(my_tags$schedule)
```

The result contains `r ncol(my_tags$schedule)` columns of transmitter characteristics that change over time. These may be used to estimate the operating characteristics (e.g., *power*, *min_delay*, *max_delay*, etc.) of a transmitter on a specific date following activation or release.

### Other formats

Transmitter specification and schedule data in any format other than the standard VEMCO format will need to be:

1. loaded into R using some other function (e.g., *read_csv*, *fread*, etc.) and
2. modified to ensure that all requirements of the desired function are met.
This vignette does not include an example of transmitter specification and schedule data loaded from CSV because the methods would be very similar to those described above. Moreover, there are currently no *glatos* functions that require transmitter specification and schedule data other than the columns already present in *glatos_detections* data. Although the *glatos* package currently does not contain any specific requirements for these data, future requirements might be expected to resemble key columns of the output of *read_vemco_tag_specs*.
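As a rough sketch of what the "modify" step above might involve, the chunk below starts from a small in-memory *data.frame* standing in for data already loaded by some other function. All column names and values here are hypothetical (they are not *glatos* requirements, nor the actual column names produced by *read_vemco_tag_specs*); the point is only the general pattern of renaming columns and converting timestamps to *POSIXct* in UTC, as the built-in loaders do.

```{r}
#hypothetical tag specification data already loaded into R
#(column names and values are made up for illustration only)
raw_specs <- data.frame(
  serial = c("1111111", "1111112"),
  id_code = c("1363", "1364"),
  ship_date = c("2016-03-01", "2016-03-01"),
  stringsAsFactors = FALSE
)

#rename a column to follow the naming style used elsewhere in this vignette
names(raw_specs)[names(raw_specs) == "serial"] <- "serial_no"

#convert timestamps to class POSIXct in UTC
raw_specs$ship_date <- as.POSIXct(raw_specs$ship_date, tz = "UTC")

#view structure of the modified data
str(raw_specs)
```

The same pattern (rename, coerce column types, convert timestamps) applies to receiver location and animal tagging data loaded from non-standard sources.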