---
title: "SSM_fitting"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SSM_fitting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
require(dplyr)
require(patchwork)
require(aniMotum)
```

### Processing data
Fitting SSMs to tracking data in `aniMotum` is intended to be as simple as possible using `fit_ssm()`. Provided the data are in an expected format (see `vignette('Overview')`), `fit_ssm` can be invoked immediately to both process the data and fit the SSM. Within `fit_ssm`, data processing consists of:  

1. identifying the most appropriate measurement error model to use based on the format of the data;  
2. formatting dates, ordering observations in time, flagging duplicate date-time observations, and flagging observations that occur within a minimum time interval of the previous observation (default = 60 s);  
3. projecting long,lat coordinates to Mercator x,y (in km) accounting for -180,180 crossing;  
4. using [`trip::sda`](https://CRAN.R-project.org/package=trip) to identify potentially extreme observations based on speed, distance, and angle (Freitas et al. 2008).  
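
The time-based flagging in step 2 can be illustrated with a minimal base R sketch. This is not the code `fit_ssm` runs internally; the data frame `obs`, the variable `min_dt` and the flagging rule are assumptions made for illustration, with only the 60 s default taken from the description above.

```{r flag-min-dt-sketch, eval=FALSE}
# Hypothetical observation times, deliberately unordered, with one
# duplicate and one pair only 30 s apart (not aniMotum example data)
obs <- data.frame(
  id = "a1",
  date = as.POSIXct(c("2020-01-01 02:00:00", "2020-01-01 00:00:00",
                      "2020-01-01 00:00:30", "2020-01-01 02:00:00",
                      "2020-01-01 05:00:00"), tz = "UTC")
)

min_dt <- 60  # seconds; mirrors the documented default of 60 s

# Order observations in time
obs <- obs[order(obs$date), ]

# Seconds elapsed since the previous observation (Inf for the first one)
dt <- c(Inf, diff(as.numeric(obs$date)))

# Keep an observation only if it is at least min_dt after the previous one;
# exact duplicates have dt = 0 and are therefore flagged as well
obs$keep <- dt >= min_dt
obs
```

Flagging rather than deleting rows means the original observations stay available for inspection.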

Several arguments to `fit_ssm` provide a measure of control over data processing. The `vmax`, `ang` and `distlim` arguments control the `trip::sda` filter (see `?fit_ssm` for details). The `sda` filter can occasionally fail (possibly when calculating angles between successive displacements); in this case `fit_ssm` falls back to `trip::speedfilter` and issues a warning. The `trip` filter can be turned off altogether with `spdf = FALSE`. This is usually not advisable when working with older Least-Squares Argos data, but the lower prevalence of extreme observations in newer Kalman-filtered Argos data means that speed-filtering may not always be required prior to fitting an SSM. The minimum time interval allowed between observations is set by `min.dt`. Finally, the processed data can be returned without fitting an SSM, e.g. for careful inspection, by setting `pf = TRUE`:
```{r pre-processing, eval=FALSE, message=FALSE}
fit_ssm(ellie, 
        vmax = 3, 
        pf = TRUE)
```

By setting `pf = TRUE`, `fit_ssm` returns the processed data as a projected `sf` data.frame prior to fitting an SSM. Looking at the `keep` column, we can see that the 3rd, 4th and 7th observations have been flagged to be ignored by the SSM, in this case because they failed the distance limits imposed by the `trip::sda` filter. Also note that because the data included CLS Argos location ellipse error information, the observation type has been designated `KF` for Kalman filter and this ensures that the appropriate measurement error model is used when fitting the SSM to these data.
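
A quick way to see which observations were flagged is to tabulate the `keep` column. The sketch below assumes that `keep` is a logical vector; the object name `pd` is arbitrary.

```{r inspect-processed, eval=FALSE}
library(aniMotum)

# Return the processed data without fitting an SSM
pd <- fit_ssm(ellie, vmax = 3, pf = TRUE)

# Number of observations retained (TRUE) and flagged to be ignored (FALSE)
table(pd$keep)

# Row positions of the flagged observations (assumes `keep` is logical)
which(!pd$keep)
```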

### Fit an SSM
In the simplest case, fitting an SSM consists of specifying the process model and a prediction time step:
```{r model fit, message=FALSE, eval=FALSE}
fit <- fit_ssm(ellie, 
               model = "rw", 
               time.step = 24)
```

This call uses the defaults for data processing and all other arguments (see `?fit_ssm` for details). More control can be exerted over the model fitting process via the arguments: `emf`, `map`, and `control`. 

#### `emf`: specifying error multiplication factors 
The `emf` argument is helpful when modelling older Argos Least-Squares location data, as it allows alternate specification of the error multiplication factors for each location quality class. The error multiplication factors define the magnitude of the location error standard deviation for each Argos location quality class relative to the best quality class, 3. The default emf values used in `aniMotum` (see Jonsen et al. 2020 for further details) are unlikely to be equally applicable to all tracking data sets. Alternate specifications could be derived from Argos location error studies that draw on Argos-GPS double-tagging data, such as Costa et al. (2010). 
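
To make the "relative to class 3" idea concrete, here is a small base R sketch. The multipliers and the class 3 standard deviation are invented for illustration; they are not the `aniMotum` defaults, and the names `emf_x` and `sd_class3` are hypothetical.

```{r emf-scaling-sketch, eval=FALSE}
# Illustrative multipliers by Argos location quality class; these are
# made-up values, not the aniMotum defaults returned by emf()
emf_x <- c("3" = 1, "2" = 2, "1" = 4, "0" = 20)

# Hypothetical error SD (km) for the best quality class, 3
sd_class3 <- 0.5

# Implied error SD for each class: multiplier times the class 3 SD
sd_x <- emf_x * sd_class3
sd_x
```

Under these assumed values, a class 0 location is treated as 20 times less precise than a class 3 location in the x-direction.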

The default emf values are given by `emf()`:
```{r emf, message=FALSE}
emf()
```
These can be adjusted by modifying the data.frame and passing it to `fit_ssm` via the `emf` argument:
```{r emf mod, eval=FALSE, message=FALSE}
emf2 <- dplyr::mutate(emf(), emf.x = c(0.1, 1, 2, 4, 20, 8, 15, 15))

fit1 <- fit_ssm(subset(sese2, id == unique(id)[1]), 
                model = "rw", 
                time.step = 24, 
                control = ssm_control(verbose = 0))

fit2 <- fit_ssm(subset(sese2, id == unique(id)[1]), 
                model = "rw", 
                time.step = 24, 
                emf = emf2, 
                control = ssm_control(verbose = 0))

par(mfrow = c(2,1), mar = c(4,3,1,1))

hist(grab(fit1, "f")$x - grab(fit2, "f")$x, breaks = 20, 
     main = "fit1$x - fit2$x", xlim = c(-20,20), xlab = "")

hist(grab(fit1, "f")$y - grab(fit2, "f")$y, breaks = 20, 
     main = "fit1$y - fit2$y", xlim = c(-20,20), xlab = "Distance (km)")
```
![](images/ssm_fitting/histos.jpeg){width="98%"}

In this contrived example, re-specified `emf.x` values result in differences in the predicted locations of up to about 20 km in the x-direction and a more subtle 4 km in the y-direction, even though the `emf.y` values were not edited. The latter is due to a model-estimated correlation between the x and y errors.

Now that CLS Argos provides locations estimated by their Kalman filter algorithm, users of contemporary Argos data sets should have no need to invoke the `emf` argument as location quality class-based errors are now superseded by the use of location-specific Argos error ellipse information (McClintock et al. 2015; Jonsen et al. 2020).

#### `map`: fixing model parameters
The `map` argument allows some model parameters to be turned off by fixing them at 0 on the log-scale. This can be useful when model simplification aids convergence or simply provides a better fit to the data. For example, the `crw` model has a parameter `psi` that re-scales the semi-major axis of Argos KF error ellipses to account for apparent bias in the error ellipses (Jonsen et al. 2020). Occasionally, there may be insufficient information to reliably estimate this parameter, or one might wish to compare model fits with and without error ellipse bias correction.

Here we fit the `crw` model with (`fit1`) and without (`fit2`) the bias-correction parameter `psi` by supplying a named list to the `map` argument, where `factor(NA)` is a `TMB` convention that ensures `psi` is fixed during optimization.
```{r demonstrate map use, eval=FALSE, message=FALSE, warning=FALSE}
fit1 <- fit_ssm(ellie, 
                model = "crw", 
                time.step = 24,
                control = ssm_control(verbose = 0))

fit2 <- fit_ssm(ellie, 
                model = "crw", 
                time.step = 24, 
                map = list(psi = factor(NA)),
                control = ssm_control(verbose = 0))

c(fit1$ssm[[1]]$AICc, fit2$ssm[[1]]$AICc)
```
```{r demonstrate map use2, message=FALSE, echo=FALSE}
load("data/ssm_fitting/fits_aic.rda")
c(fit1$ssm[[1]]$AICc, fit2$ssm[[1]]$AICc)
```

Comparison of AIC~c~ from the two fits implies that the model including the `psi` parameter (`fit1`) provides a slightly better fit.
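
For readers unfamiliar with AIC~c~, the sketch below shows the standard small-sample correction and how a difference between two fits is read. It is not taken from the `aniMotum` source; the function `aicc()`, the log-likelihoods, the parameter counts and the sample size are all hypothetical.

```{r aicc-sketch, eval=FALSE}
# Standard small-sample AIC correction:
#   AICc = -2 * logLik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
# where k is the number of estimated parameters and n the sample size
aicc <- function(loglik, k, n) {
  stopifnot(n > k + 1)
  -2 * loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

# Hypothetical log-likelihoods for two fits to the same n observations;
# the second fit has one fewer estimated parameter
with_psi    <- aicc(loglik = -1200.0, k = 5, n = 300)
without_psi <- aicc(loglik = -1202.5, k = 4, n = 300)

# A positive difference favours the model with the extra parameter
without_psi - with_psi
```

Lower AIC~c~ indicates the preferred model, so the extra parameter is only favoured when the gain in log-likelihood outweighs the penalty for estimating it.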

#### `control`: controlling the optimization
By default, `aniMotum` uses R's `nlminb` optimizer, but users can also select the more flexible `optim` to minimize the objective function defined by the built-in `TMB` C++ templates. The optimizer is chosen via the `control` argument, which takes a named list created by the utility function `ssm_control()`:
```{r ssm_control, message=FALSE, warning=FALSE, eval = FALSE}
fit <- fit_ssm(ellie, 
               model = "crw", 
               time.step = 24,
               control = ssm_control(optim = "optim",
                                     verbose = 0))
```
Here, we use `control = ssm_control()` to set the optimizer to `optim` and turn off the parameter trace that would otherwise be printed to the console. See `?ssm_control` for further options. Both optimizers accept a list of control parameters that allow fine-tuning of the optimization process, which can aid convergence on tricky data sets. Users can specify any of these control parameters in `ssm_control`, for example:
```{r optimizer params, message=FALSE, warning=FALSE, eval = FALSE}
fit <- fit_ssm(ellie, 
               model = "crw", 
               time.step = 24,
               control = ssm_control(optim = "optim",
                                     maxit = 1000,
                                     verbose = 0))
```

Here we increase the maximum number of optimizer iterations from 100 (the `optim` default for derivative-based methods) to 1000. Users must know the correct name of each optimizer's control arguments and should be familiar with their effect on the optimization before using them; see `?nlminb` and `?optim` for details.
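
Control parameter names differ between the two optimizers, which is easy to trip over. The base R sketch below calls `optim` and `nlminb` directly on a toy objective (not an SSM likelihood) to show the different names each uses for its iteration limit; `obj` and the fitted object names are hypothetical.

```{r optimizer-control-names, eval=FALSE}
# Toy objective with a known minimum at c(2, -1); not an SSM likelihood
obj <- function(p) sum((p - c(2, -1))^2)

# optim() names its iteration cap `maxit`
fit_optim <- optim(par = c(0, 0), fn = obj, method = "BFGS",
                   control = list(maxit = 1000))

# nlminb() uses `iter.max` (iterations) and `eval.max` (function evaluations)
fit_nlminb <- nlminb(start = c(0, 0), objective = obj,
                     control = list(iter.max = 1000, eval.max = 2000))

# Both optimizers return a convergence code of 0 on success
c(optim = fit_optim$convergence, nlminb = fit_nlminb$convergence)
```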

The `control` argument is used to turn off parameter trace printing via `ssm_control(verbose = 0)` in most of the examples above. This keeps the vignette tidy, as the default parameter trace would print as a long list to the vignette. In normal use in the console, the parameter trace overwrites itself on a single line per individual data set and is provided as a guide to model fitting progress. The full inner and outer optimization as reported by `TMB` (Kristensen et al. 2016) can also be printed to the console by setting `ssm_control(verbose = 2)`; this can be helpful in diagnosing convergence failures or optimization errors.

Finally, `ssm_control(se = TRUE)` can be used when fitting the `crw` SSM to turn on computation of SEs for speed-along-track estimates. By default, these SEs are turned off because their calculation requires considerable computation, which increases with the number of prediction intervals and dramatically increases the time required to fit the `crw` model.
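
One way to gauge the extra cost on a given data set is to time both fits. The sketch below reuses the arguments from the examples above and wraps each call in base R's `system.time()`; the object names are arbitrary, and elapsed times will vary with the machine and the data.

```{r se-timing-sketch, eval=FALSE}
library(aniMotum)

# Fit without SEs for speed-along-track (the documented default)
t_default <- system.time(
  fit_nose <- fit_ssm(ellie, model = "crw", time.step = 24,
                      control = ssm_control(verbose = 0))
)

# Same fit with SE computation turned on
t_se <- system.time(
  fit_se <- fit_ssm(ellie, model = "crw", time.step = 24,
                    control = ssm_control(se = TRUE, verbose = 0))
)

# Elapsed wall-clock time in seconds for each fit
c(default = t_default[["elapsed"]], se = t_se[["elapsed"]])
```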


## References

Costa DP, Robinson PW, Arnould JPY, Harrison AL, Simmons SE, Hassrick JL, Hoskins AJ, Kirkman SP, Oosthuizen H, Villegas-Amtmann S, Crocker DE (2010) Accuracy of ARGOS locations of pinnipeds at-sea estimated using Fastloc GPS. _PLOS One_ 5:e8677.

Freitas C, Lydersen C, Fedak MA, Kovacs KM (2008) A simple new algorithm to filter marine mammal Argos locations. _Marine Mammal Science_ 24:315-325.

Jonsen ID, Patterson TA, Costa DP, Doherty PD, Godley BJ, Grecian WJ, Guinet C, Hoenner X, Kienle SS, Robinson PW, Votier SC, Whiting S, Witt MJ, Hindell MA, Harcourt RG, McMahon CR (2020) A continuous-time state-space model for rapid quality control of Argos locations from animal-borne tags. _Movement Ecology_ 8:31.

Kristensen K, Nielsen A, Berg CW, Skaug H, Bell BM (2016) TMB: automatic differentiation and Laplace approximation. _Journal of Statistical Software_ 70:1-21.

McClintock BT, London JM, Cameron MF, Boveng PL (2015) Modelling animal movement using the Argos satellite telemetry location error ellipse. _Methods in Ecology and Evolution_ 6:266-277.