library(sporeg)

Objective

This vignette is a step by step guide to the journal article, A novel process to infer the reliability of ecological information derived from passive acoustic telemetry track reconstruction, published in Methods in Ecology and Evolution (https://doi.org/10.1111/2041-210X.14340). This guide will show you how to simulate tracks, derive acoustic telemetry data from those tracks, reconstruct tracks from those derived detection data, re-route the tracks off land, summarize distribution information, and compare the simulated and modeled tracks. The comparisons represent the statistical bias incurred by the track reconstruction. The accuracy of the track re-constructions depends on the orientation of receivers within the study area.

Let’s begin by clearing our environment.

rm(list=ls())

And clearing any cache we may have remaining from previous work.

gc()
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  606369 32.4    1105524 59.1  1105524 59.1
#> Vcells 1149458  8.8    8388608 64.0  2557734 19.6

Next, let’s load the packages that we will need to run the steps in this vignette.

library(sporeg)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(sp)
library(pathroutr) 
library(parallel)
library(geosphere)
library(mapview)
library(car)
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#> 
#>     recode

Get the objects required for simulating tracks

Get the study site and set the projected coordinate system.

load(system.file("extdata", "study_site.Rda", 
                            package = "sporeg"))

Get the release site.

load(system.file("extdata", "rel_site.Rda", 
                            package = "sporeg"))

Create the objects required for summarizing distribution information

Make the different resolution grid cells and put them all in a list.

HS_10km_grid <- grid_res(10, study_site, 3857, what = "polygons")
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
HS_25km_grid <- grid_res(25, study_site, 3857, what = "polygons")
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
HS_50km_grid <- grid_res(50, study_site, 3857, what = "polygons")
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
HS_100km_grid <- grid_res(100, study_site, 3857, what = "polygons")
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries

HSgrid <- list(HS_100km_grid, HS_50km_grid, HS_25km_grid, HS_10km_grid)

Get the objects required for deriving acoustic telemetry data

Get the stations.

load(system.file("extdata", "stations.Rda", 
                            package = "sporeg"))

Get the objects required for re-routing tracks

Get the land barrier.

load(system.file("extdata", "land_barrier.Rda", 
                            package = "sporeg")) 

land_barrier <- land_barrier %>%
  sf::st_transform(3857) %>%
  sf::st_collection_extract('POLYGON') %>%
  sf::st_cast('POLYGON')
#> Warning in st_collection_extract.sf(., "POLYGON"): x is already of type
#> POLYGON.

Create a visibility graph from the land barrier object.

vis_graph <- pathroutr::prt_visgraph(land_barrier)

Set the parameters values

For simulating tracks:

# Number of animals
anims <- 3
# Number of years
yr <- 1       
# Turning angle - standard deviation of body flexion from blacktip kinematics study
theta <- c(0, 1.74)  
# Lower bound of mean recorded velocity for animal in m/s (ex: blacktip shark) (average - standard deviation)
vmin <- 0.98  
# Upper bound of mean recorded velocity for animal in m/s (ex: blacktip shark) (average + standard deviation)
vmax <- 1.58  
# Intersection of bounding box of release site(s) and study site
rel_site <- rel_site 

# Study site must be an sp object for the track simulations
study_site <- study_site 

# Name the epsg of the coordinate system for study_site
crs <- 3857

# Number of days animal is tracked
n_days <- 365*yr 
# Initial heading of animal
initHeading <- 0

For the iterations:

total_iterations <- 2 # Total number of iterations desired
reps <- 2 # Number of replicates that will result in desired iterations given reps
runs <- 1:(total_iterations/reps) # Interval at which outputs should be written to file

Create a cluster

This will allow us to run these processes in parallel. Here, we find out how many cores we have, not including logical cores, and create a cluster

numCores <- detectCores(logical = F)
numCores

cl <- makeCluster(numCores-4, outfile = "cluster_out")

Creating a cluster will create multiple instances of R on your computer. Now, we need to upload all of the libraries that each R instance will need to run the iterations. Notice that we have omitted the parallel library.

clusterEvalQ(cl, {
  
  library(sporeg)
  library(sf)
  library(sp)
  library(glatos) 
  library(dplyr)
  library(momentuHMM)
  library(crawl)
  library(lubridate)
  library(pathroutr)
  library(sfnetworks)
  library(tidyverse)
  library(stplanr)
  library(future)
  library(doFuture)
  
})

Let’s create an output file where our results will go. We only want to create this file once, otherwise we will overwrite our previous results. Thus, you will need to un-comment the lines to create it the first time. We want to assign the file name every time we run iterations, however.


# Create empty list object and save it
# results <- list()

# Assign the filename
filename <- "results.Rda"

# Save the empty list object to the filename
# save(results, file = filename)

Let’s remove any unnecessary objects we may have created before we load the environment up to the cluster

rm(list = c("HS_100km_grid", "HS_50km_grid", "HS_25km_grid", "HS_10km_grid"))

Now we’re ready to mirror our global environment to the cluster cores.

parallel::clusterExport(cl, varlist = ls(envir = .GlobalEnv))

Run the iterations

We should now have everything in place to iterate our methods. Here, we are using base::replicate to decide how often to save the iterations we have made. We are using parallel::parLapply to create the iterations. So, if we decided to use reps <- 10 and runs <- 100, we would create 1,000 total iterations and save them every 100 iterations.

base::replicate(n = reps, expr = {
    
    iterated_results <- parallel::parLapply(cl, runs, fun = function(I) {
      
      # Simulate tracks 
      df <- simul_trks(anims, study_site, theta, vmin, vmax, rel_site, crs, n_days, initHeading)
      
      # Compare summary distribution info of simulated and modeled tracks across multiple grid cell resolutions
      df <- comp_trks(df, stations, land_barrier, vis_graph, HSgrid, multi.grid = TRUE, snap_tolerance = 650)
      
      return(df)
    }
    )
    # Load the previous results
    load(system.file("extdata", filename, package = "sporeg")) 
    
    # Append the new results to the previous results
    results <- append(results, iterated_results)
    
    # Overwrite the results with both the new and previous results combined
    save(results, file = system.file("extdata", filename, package = "sporeg"))
   
    return(df)
  }
  )

Stop the cluster

You always want to stop the cluster when you are finished using it.

parallel::stopCluster(cl)

Check the results

The number of iterations we want depends on how variable our results are so let’s do a power analysis after we’ve run a few hundred iterations.

First, let’s load the results.

load(system.file("extdata", filename, package = "sporeg"))

Hmm… these are hard to look at. Let’s turn them into a data frame instead.

# Bind the lists by grid cell resolution
results <- lapply(results, data.table::rbindlist, idcol = 'resolution')

# Bind the bound resolution by iteration
results <- data.table::rbindlist(results, idcol = 'iteration')

Remove zero inflation

Remove any cases where both sim_count and mod_count were zero for all iterations within a grid resolution. First, let’s merge grid resolution size names to the main data frame.

results <- results %>%
  dplyr::left_join(dplyr::tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution")

Test for zero variance.

# Remove grid cells with zero variance
df_var <- zero_var(results) 

# Find the grid cell with zero variance
df_zero_var <- anti_join(results, df_var, by = c("res_name", "gid")) %>%
  distinct(res_name, gid)

Prep for power analysis

Prep the results for a power analysis by setting some parameters for our power analysis. We previously set our number of animals, but it’s likely that we’ve stepped away to work on other things before all these iterations finished so we can run it again here by un-commenting anims <- 30 if we need to. Whatever variable you are interested in here, you set as NULL. So, if I want to know how many iterations I will need to achieve a power of 80% and an effect size within 1% of the total number of animals at a 95% significance level, I will leave n <- NULL.

# The number of animals
anims <- anims

# The power that we want to achieve
power <- 0.8

# The effect size we want to achieve
delta <- anims*0.01 # a delta within 1% of the total number of animals

# The level of significance that we want to achieve
sig.level <- 0.95

# The number of iterations we will need to run to achieve the other parameters that we set
n <- NULL

The parameters power, delta, sig.level, and n will need to be chosen based on the study itself. In the past, this statistics book has served as a useful resource to me. It also comes with R code resources.

Let’s calculate standard deviation by grid resolution. Now we’re ready to run our power analysis. We need to run one for each grid cell resolution. So, we’ll do that by setting another variable called res and filtering for the grid cell resolution of interest in the results that we prepped for this power analysis.

pow_stat <- df_var %>% 
  group_by(res_name, gid) %>%
  summarise(sd = sd(dif)) %>%
  ungroup() %>%
  group_by(res_name)
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.

res <- pow_stat %>% dplyr::filter(res_name == "100km")
pwr_100km <- powr(res, sig.level, power, delta)
print("Grid cell resolution: 100 km x 100 km")
#> [1] "Grid cell resolution: 100 km x 100 km"
pwr_100km
#> 
#>      Paired t test power calculation 
#> 
#>               n = 11837.53
#>           delta = 0.03
#>              sd = 3.609324
#>       sig.level = 0.95
#>           power = 0.8
#>     alternative = two.sided
#> 
#> NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

res <- pow_stat %>% dplyr::filter(res_name == "50km")
pwr_50km <- powr(res, sig.level, power, delta)
print("Grid cell resolution: 50 km x 50 km")
#> [1] "Grid cell resolution: 50 km x 50 km"
pwr_50km
#> 
#>      Paired t test power calculation 
#> 
#>               n = 12784.14
#>           delta = 0.03
#>              sd = 3.750861
#>       sig.level = 0.95
#>           power = 0.8
#>     alternative = two.sided
#> 
#> NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

res <- pow_stat %>% dplyr::filter(res_name == "25km")
pwr_25km <- powr(res, sig.level, power, delta)
print("Grid cell resolution: 25 km x 25 km")
#> [1] "Grid cell resolution: 25 km x 25 km"
pwr_25km
#> 
#>      Paired t test power calculation 
#> 
#>               n = 12784.14
#>           delta = 0.03
#>              sd = 3.750861
#>       sig.level = 0.95
#>           power = 0.8
#>     alternative = two.sided
#> 
#> NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs


res <- pow_stat %>% dplyr::filter(res_name == "10km")
pwr_10km <- powr(res, sig.level, power, delta)
print("Grid cell resolution: 10 km x 10 km")
#> [1] "Grid cell resolution: 10 km x 10 km"
pwr_10km
#> 
#>      Paired t test power calculation 
#> 
#>               n = 14404.71
#>           delta = 0.03
#>              sd = 3.981507
#>       sig.level = 0.95
#>           power = 0.8
#>     alternative = two.sided
#> 
#> NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Sanity check

Do we really need THAT few/many iterations?! Let’s do a sanity check here and see how variable our worst grid cell results were. We can also compare these max_sd results to the sd = printout results from the above power analyses.

max_sd <- pow_stat %>%
  ungroup() %>%
  group_by(res_name) %>%
  summarise(max_sd = max(sd))
max_sd
#> # A tibble: 4 × 2
#>   res_name max_sd
#>   <chr>     <dbl>
#> 1 100km      3.61
#> 2 10km       3.98
#> 3 25km       3.75
#> 4 50km       3.75

Decisions, decisions

Here, we need to decide whether we are satisfied with the number of iterations we’ve run so far or if we need to run more iterations to achieve our desired power. If you want to run more iterations, you will return to the “Create a cluster” section. If you’re satisfied, let’s move on to the next section, Satisfaction.

Satisfaction

If you’re satisfied with the results of the power analysis for your simulated acoustic detection data, let’s summarize the differences between simulated and modeled tracks by grid cell within each grid resolution.

If we ran too many iterations, we can filter for the number required for each iteration, as indicated by our previous power analysis.

new_df <- df_var %>%
  dplyr::filter(res_name == "100km" & iteration <= 119 |
                  res_name == "50km" & iteration <= 128 |
                  res_name == "25km" & iteration <= 128 |
                  res_name == "10km" & iteration <= 144)

results <- new_df %>%
  group_by(res_name, gid) %>%
  summarise(avg_dif = mean(dif),
            se = plotrix::std.error(dif)) 
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.
results
#> # A tibble: 6,709 × 4
#> # Groups:   res_name [4]
#>    res_name   gid avg_dif    se
#>    <chr>    <dbl>   <dbl> <dbl>
#>  1 100km        1  11.7   0.237
#>  2 100km        2  11.7   0.237
#>  3 100km        3  27.6   0.162
#>  4 100km        4   7.36  0.202
#>  5 100km        5  26.1   0.166
#>  6 100km        6   1.71  0.112
#>  7 100km        7  21.5   0.219
#>  8 100km        8   1.48  0.123
#>  9 100km        9  22.6   0.193
#> 10 100km       10   0.933 0.183
#> # ℹ 6,699 more rows

Check the methods performance

Let’s create a table that shows how closely the modeled tracks compared to the reconstructed tracks for each grid resolution. Here, we compare counts of grid cells within a certain difference from sim_count divided by the total number of grid cells.

# Counts
how_well_ct <- results %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")),
         wn_0 = ifelse(avg_dif >= -1*0.3 & avg_dif <= 0.3, 1, 0),
         wn_1 = ifelse(avg_dif >= -1*1 & avg_dif <= 1, 1, 0), 
         wn_2 = ifelse(avg_dif >= -1*2 & avg_dif <= 2, 1, 0), 
         wn_3 = ifelse(avg_dif >= -1*3 & avg_dif <= 3, 1, 0), 
         wn_4 = ifelse(avg_dif >= -1*4 & avg_dif <= 4, 1, 0), 
         wn_5 = ifelse(avg_dif >= -1*5 & avg_dif <= 5, 1, 0)) %>%
    group_by(res_name) %>%
  summarise(maxgid = n(),
            tot_wn_0 = sum(wn_0),
            tot_wn_1 = sum(wn_1),
            tot_wn_2 = sum(wn_2),
            tot_wn_3 = sum(wn_3),
            tot_wn_4 = sum(wn_4),
            tot_wn_5 = sum(wn_5)) %>%
  distinct()

# Percentages
how_well <- results %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")),
         wn_0 = ifelse(avg_dif >= -1*0.3 & avg_dif <= 0.3, 1, 0),
         wn_1 = ifelse(avg_dif >= -1*1 & avg_dif <= 1, 1, 0), 
         wn_2 = ifelse(avg_dif >= -1*2 & avg_dif <= 2, 1, 0), 
         wn_3 = ifelse(avg_dif >= -1*3 & avg_dif <= 3, 1, 0), 
         wn_4 = ifelse(avg_dif >= -1*4 & avg_dif <= 4, 1, 0), 
         wn_5 = ifelse(avg_dif >= -1*5 & avg_dif <= 5, 1, 0)) %>%
    group_by(res_name) %>%
  summarise(maxgid = n(),
            tot_wn_0 = sum(wn_0)/maxgid*100,
            tot_wn_1 = sum(wn_1)/maxgid*100,
            tot_wn_2 = sum(wn_2)/maxgid*100,
            tot_wn_3 = sum(wn_3)/maxgid*100,
            tot_wn_4 = sum(wn_4)/maxgid*100,
            tot_wn_5 = sum(wn_5)/maxgid*100) %>%
  distinct()

Track Reconstruction Issue

We had a problem in the track reconstruction where some iterations did not reconstruct tracks for all anims. Let’s find out how many track reconstructions were successful across all the iterations for each grid cell resolution.

recon <- df_var %>%
  group_by(res_name, iteration) %>%
  summarise(max_recon = max(mod_count)) %>%
  ungroup() %>%
  group_by(res_name) %>%
  summarise(max_30 = length(which(max_recon >= 30))/max(iteration)*100,
            max_29 = length(which(max_recon >= 29))/max(iteration)*100,
            max_28 = length(which(max_recon >= 28))/max(iteration)*100,
            max_27 = length(which(max_recon >= 27))/max(iteration)*100,
            max_26 = length(which(max_recon >= 26))/max(iteration)*100,
            max_25 = length(which(max_recon >= 25))/max(iteration)*100)
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.

Additional Bias

Because some tracks were not re-constructed by the movement model, we may have introduced unnecessary bias. Let’s see how the model performance changes if we isolate iterations where all anims tracks were reconstructed. First, let’s see how many iterations actually reconstructed all anims tracks.

recon_res <- df_var %>%
  group_by(res_name, iteration) %>%
  summarise(max_mod = max(mod_count)) %>%
  dplyr::filter(max_mod == anims) %>%
  ungroup() %>%
  group_by(res_name) %>%
  count()
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.

Let’s re-number these iterations where all anims were re-constructed.

new_iters <- df_var %>%
  group_by(res_name, iteration) %>%
  summarise(max_mod = max(mod_count)) %>%
  dplyr::filter(max_mod == 30) %>%
  ungroup() %>%
  distinct(iteration) %>%
  mutate(new_iter = seq_along(iteration))
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.

Now we can see how our results would change if we had selected only iterations in which all anims were re-constructed.

new_iters_df <- merge(new_iters, df_var) %>%
  dplyr::filter(res_name == "100km" & new_iter <= 119 |
                  res_name == "50km" & new_iter <= 128 |
                  res_name == "25km" & new_iter <= 128 |
                  res_name == "10km" & new_iter <= 144)

new_results <- new_iters_df %>%
  group_by(res_name, gid) %>%
  summarise(avg_dif = mean(dif),
            se = plotrix::std.error(dif)) 
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.
new_results
#> # A tibble: 6,709 × 4
#> # Groups:   res_name [4]
#>    res_name   gid avg_dif     se
#>    <chr>    <dbl>   <dbl>  <dbl>
#>  1 100km        1 11.3    0.217 
#>  2 100km        2 11.4    0.219 
#>  3 100km        3 27.7    0.136 
#>  4 100km        4  6.13   0.190 
#>  5 100km        5 26.3    0.155 
#>  6 100km        6  0.429  0.0554
#>  7 100km        7 21.4    0.247 
#>  8 100km        8  0.0756 0.0747
#>  9 100km        9 22.5    0.217 
#> 10 100km       10 -0.454  0.168 
#> # ℹ 6,699 more rows

new_how_well <- new_results %>%
  group_by(res_name) %>%
  mutate(maxgid = max(gid),
         res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")),
         wn_0 = ifelse(avg_dif >= -1*0.1 & avg_dif <= 0.1, 1, 0),
         wn_1 = ifelse(avg_dif >= -1*1 & avg_dif <= 1, 1, 0), 
         wn_2 = ifelse(avg_dif >= -1*2 & avg_dif <= 2, 1, 0), 
         wn_3 = ifelse(avg_dif >= -1*3 & avg_dif <= 3, 1, 0), 
         wn_4 = ifelse(avg_dif >= -1*4 & avg_dif <= 4, 1, 0), 
         wn_5 = ifelse(avg_dif >= -1*5 & avg_dif <= 5, 1, 0)) %>%
  summarise(tot_wn_0 = sum(wn_0)/maxgid*100,
            tot_wn_1 = sum(wn_1)/maxgid*100,
            tot_wn_2 = sum(wn_2)/maxgid*100,
            tot_wn_3 = sum(wn_3)/maxgid*100,
            tot_wn_4 = sum(wn_4)/maxgid*100,
            tot_wn_5 = sum(wn_5)/maxgid*100) %>%
  distinct()
#> `summarise()` has grouped output by 'res_name'. You can override using the
#> `.groups` argument.

new_how_well
#> # A tibble: 4 × 7
#> # Groups:   res_name [4]
#>   res_name tot_wn_0 tot_wn_1 tot_wn_2 tot_wn_3 tot_wn_4 tot_wn_5
#>   <ord>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1 100km      11.8      31.2      37.6     39.8     40.9     44.1
#> 2 50km        3.42     14.0      22.6     28.8     32.2     39.4
#> 3 25km        0.710     7.40     15.7     21.7     26.1     30.7
#> 4 10km        1.05      7.14     14.2     19.8     25.2     29.7

Gather more information

In this section, we want to find out why the results are the way they are. Can we derive some information about the grid cells that tell us what contributes to a better result?

First, let’s gather information on the depth in each grid cell. For this example, I have logged onto the National Oceanic and Atmospheric Administration (NOAA) ERDDAP server and gathered depth information for the bounding box of the study site, which must be in a geographic coordinate system (e.g., NAD 1983). Let’s get the bounding box of our study site.¹

# Objects required for iterations to run
load(system.file("extdata", "site_depth.Rda", package = "sporeg")) 

site_depth <- site_depth %>%
  sf::st_as_sf() %>%
  sf::st_transform(4269)

sf::st_bbox(site_depth)
#>      xmin      ymin      xmax      ymax 
#> -81.65038  24.32100 -69.28000  41.85253

We will use the xmin (Min Longitude), ymin (Min Latitude), xmax (Max Longitude), and ymax (Max Latitude) to search for relevant depth (altitude) data in the ERDDAP server.

NOAA ERDDAP server advanced search parameters for depth measurements in study site.

We are going to use the DatasetID: etopo180

ERDDAP search results

Click on graph and then click on Download the Data or an Image

How to get the ERDDAP data

Save the export.

load(system.file("extdata", "atlcoast.Rda", package = "sporeg")) 

atlcoast <- atlcoast %>%
  sf::st_transform(4269)

Read in depth data. For now, we will load the demonstration depth_data file. Here, we also filter the depth data to only be those depths found inside the study site - right now, it’s everything inside the grid bounding box.

load(system.file("extdata", "depth_data.Rda", package = "sporeg"))

depth_data <- depth_data %>%
  sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4269)

depth_data <- sf::st_filter(depth_data, site_depth) %>% 
  mutate(uid = seq_along(geometry))

We also need to re-create the grids without the false origin.² The study site must be in a projected coordinate system (WGS 84; EPSG: 3857) when it is initially fed into the function grid_res.

site_depth <- site_depth %>%
  sf::st_transform(., 3857)

HS_100km_grid <- grid_res(100, site_depth, 4269, "polygons")
HS_50km_grid <- grid_res(50, site_depth, 4269, "polygons")
HS_25km_grid <- grid_res(25, site_depth, 4269, "polygons")
HS_10km_grid <- grid_res(10, site_depth, 4269, "polygons")

Merge depth data with different grid cell resolutions.

HS_100km_grid <- depth_data(HS_100km_grid, depth_data)
HS_50km_grid <- depth_data(HS_50km_grid, depth_data)
HS_25km_grid <- depth_data(HS_25km_grid, depth_data)
HS_10km_grid <- depth_data(HS_10km_grid, depth_data)

Calculate distance to shore, density of receivers, presence/absence of receivers, and merge with grid cell depth data. We can do all of this using the false origin shapefiles.

load(system.file("extdata", "fo_sts_pts.Rda", package = "sporeg")) 

fo_sts_pts <- fo_sts_pts %>%
  sf::st_centroid()

load(system.file("extdata", "fo_study_site.Rda", package = "sporeg")) 
  
load(system.file("extdata", "fo_land_barrier.Rda", package = "sporeg")) 

fo_land_barrier <- fo_land_barrier %>%
  sf::st_transform(3857) %>%
  sf::st_collection_extract('POLYGON') %>%
  sf::st_cast('POLYGON') %>%
  sf::st_union() %>%
  sf::st_as_sf()

epsg = 3857

HS_100km_grid <- merge((HS_100km_grid %>% as.data.frame() %>% dplyr::select(-geometry)), d_shore_rcvs(100, study_site, fo_land_barrier, epsg, fo_sts_pts), by = "gid") %>%
  sf::st_as_sf() 

HS_50km_grid <- merge((HS_50km_grid %>% as.data.frame %>% dplyr::select(-geometry)), d_shore_rcvs(50, study_site, fo_land_barrier, epsg, fo_sts_pts), by = "gid") %>%
  sf::st_as_sf()

HS_25km_grid <- merge((HS_25km_grid %>% as.data.frame %>% dplyr::select(-geometry)), d_shore_rcvs(25, study_site, fo_land_barrier, epsg, fo_sts_pts), by = "gid") %>%
  sf::st_as_sf()

HS_10km_grid <- merge((HS_10km_grid %>% as.data.frame %>% dplyr::select(-geometry)), d_shore_rcvs(10, study_site, fo_land_barrier, epsg, fo_sts_pts), by = "gid") %>%
  sf::st_as_sf()

Let’s find out how many grid cells lacked receivers.

grids <- list(HS_100km_grid, HS_50km_grid, HS_25km_grid, HS_10km_grid)

grids <- sf::st_as_sf(data.table::rbindlist(grids, idcol = 'resolution')) %>%
  left_join(tibble(resolution = 1:4, res_name = c("100km", "50km", "25km", "10km")), by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

no_rcv <- grids %>%
  group_by(res_name) %>%
  summarise(n_zeroes = sum(p_a == "0"),
            perc_zeroes = n_zeroes/max(gid)*100) %>%
  as.data.frame %>%
  dplyr::select(-x)

Merge the results to these grid cell metadata and determine which grid cells exhibited a good fit.

results <- results %>%
  mutate(g_fit = ifelse(avg_dif>= -1*0.1*anims & avg_dif <= 0.1*anims, 1, 0))

HS_100km_grid <- merge(results %>% dplyr::filter(res_name == "100km"), HS_100km_grid, by = c("gid")) %>% 
  sf::st_as_sf() 

HS_50km_grid <- merge(results %>% dplyr::filter(res_name == "50km"), HS_50km_grid, by = c("gid")) %>%
  sf::st_as_sf() 

HS_25km_grid <- merge(results %>% dplyr::filter(res_name == "25km"), HS_25km_grid, by = c("gid")) %>%
  sf::st_as_sf() 

HS_10km_grid <- merge(results %>% dplyr::filter(res_name == "10km"), HS_10km_grid, by = c("gid")) %>%
  sf::st_as_sf()

Map results

Let’s map these and see what kind of spatial trends we can see.

mapview::mapView(HS_100km_grid)

mapview::mapView(HS_50km_grid)

mapview::mapView(HS_25km_grid)

mapview::mapView(HS_10km_grid)

Map the results

Let’s marry our results to a sf object that lacks a false origin so that we can more easily map them.

# Create grids without false origin
study_site <- study_site %>% # UTM 17N
  sf::st_transform(4269) %>% # Convert to NAD 1983
  sf::st_transform(3857) # Convert to WGS 84

HS100km <- grid_res(100, study_site, 3857, "polygons")
HS50km <- grid_res(50, study_site, 3857, "polygons")
HS25km <- grid_res(25, study_site, 3857, "polygons")
HS10km <- grid_res(10, study_site, 3857, "polygons")

# Merge results data to new geometries
HS100km <- merge((HS_100km_grid %>% as.data.frame() %>% dplyr::select(-x)), HS100km, by = c("gid")) %>% 
  sf::st_as_sf() 

HS50km <- merge((HS_50km_grid %>% as.data.frame() %>% dplyr::select(-x)), HS50km, by = c("gid")) %>%
  sf::st_as_sf()

HS25km <- merge((HS_25km_grid %>% as.data.frame() %>% dplyr::select(-x)), HS25km, by = c("gid")) %>%
  sf::st_as_sf()

HS10km <- merge((HS_10km_grid %>% as.data.frame() %>% dplyr::select(-x)), HS10km, by = c("gid")) %>%
  sf::st_as_sf()

Model the effects

We want to create a model to understand which variables affect a good fit in the grid cells.

Create a model to test for multicolinearity

We need to check for multicollinearity between main effects to see if we need to include any interaction terms between main effects in the model.

vif.fit <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore, family = binomial(), data = HS_10km_grid)

car::vif(vif.fit)
#>    den_rcs        p_a mean_depth    d_shore 
#>   2.148366   2.143100   3.903303   3.917314

# 100 x 100km resolution
fit1.100km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore,
                 family = binomial(), data = HS_100km_grid)

# den_rcs      1.7679  1  0.1836395

fit2.100km <- glm(formula = g_fit ~ p_a + mean_depth + d_shore,
                 family = binomial(), data = HS_100km_grid)

car::Anova(fit2.100km)
#> Analysis of Deviance Table (Type II tests)
#> 
#> Response: g_fit
#>            LR Chisq Df Pr(>Chisq)
#> p_a               0  1          1
#> mean_depth        0  1          1
#> d_shore           0  1          1


# 50 x 50km resolution
fit1.50km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore,
                 family = binomial(), data = HS_50km_grid)

# d_shore      0.1248  1     0.7239

fit2.50km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth,
                 family = binomial(), data = HS_50km_grid)

# den_rcs       0.305  1     0.5809 

fit3.50km <- glm(formula = g_fit ~ p_a + mean_depth,
                 family = binomial(), data = HS_50km_grid)

car::Anova(fit3.50km)
#> Analysis of Deviance Table (Type II tests)
#> 
#> Response: g_fit
#>            LR Chisq Df Pr(>Chisq)   
#> p_a          7.7896  1   0.005255 **
#> mean_depth   2.4360  1   0.118576   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 25 x 25km resolution
fit1.25km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore,
                 family = binomial(), data = HS_25km_grid)

# d_shore       2.161  1    0.14154

fit2.25km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth,
                 family = binomial(), data = HS_25km_grid)

car::Anova(fit2.25km)
#> Analysis of Deviance Table (Type II tests)
#> 
#> Response: g_fit
#>            LR Chisq Df Pr(>Chisq)    
#> den_rcs      4.9048  1  0.0267819 *  
#> p_a          6.5787  1  0.0103209 *  
#> mean_depth  14.5422  1  0.0001371 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 10 x 10km resolution
fit1.10km <- glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore,
                 family = binomial(), data = HS_10km_grid)

car::Anova(fit1.10km)
#> Analysis of Deviance Table (Type II tests)
#> 
#> Response: g_fit
#>            LR Chisq Df Pr(>Chisq)    
#> den_rcs      6.2834  1    0.01219 *  
#> p_a          7.2807  1    0.00697 ** 
#> mean_depth  22.6427  1  1.951e-06 ***
#> d_shore      0.0295  1    0.86362    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Is the model useful?

Now that we’ve constructed the model using backward selection by p-value, let’s evaluate whether or not it is useful. We’ll do this by comparing the null deviance to the residual deviance. p-value = 1 - pchisq(null deviance - residual deviance, degrees of freedom)

summary(fit2.100km)
#> 
#> Call:
#> glm(formula = g_fit ~ p_a + mean_depth + d_shore, family = binomial(), 
#>     data = HS_100km_grid)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.657e+01  7.155e+04       0        1
#> p_a1        -8.972e-15  8.704e+04       0        1
#> mean_depth  -6.579e-17  3.526e+02       0        1
#> d_shore      5.478e-20  8.199e-01       0        1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 0.0000e+00  on 91  degrees of freedom
#> Residual deviance: 5.3375e-10  on 88  degrees of freedom
#>   (1 observation deleted due to missingness)
#> AIC: 8
#> 
#> Number of Fisher Scoring iterations: 25

p_value.100km = 1 - pchisq(120.090-59.388, 3)
p_value.100km
#> [1] 4.161116e-13

summary(fit3.50km)
#> 
#> Call:
#> glm(formula = g_fit ~ p_a + mean_depth, family = binomial(), 
#>     data = HS_50km_grid)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -4.41418    1.10178  -4.006 6.16e-05 ***
#> p_a1         2.47201    1.09383   2.260   0.0238 *  
#> mean_depth  -0.02302    0.02450  -0.940   0.3473    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 65.768  on 285  degrees of freedom
#> Residual deviance: 51.920  on 283  degrees of freedom
#>   (6 observations deleted due to missingness)
#> AIC: 57.92
#> 
#> Number of Fisher Scoring iterations: 10

p_value.50km = 1 - pchisq(327.03-233.64, 3)
p_value.50km
#> [1] 0

summary(fit2.25km)
#> 
#> Call:
#> glm(formula = g_fit ~ den_rcs + p_a + mean_depth, family = binomial(), 
#>     data = HS_25km_grid)
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -3.13200    0.36272  -8.635  < 2e-16 ***
#> den_rcs     -54.74344   38.52510  -1.421  0.15532    
#> p_a1          1.55060    0.58426   2.654  0.00795 ** 
#> mean_depth   -0.03117    0.01399  -2.229  0.02581 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 202.43  on 967  degrees of freedom
#> Residual deviance: 178.25  on 964  degrees of freedom
#>   (18 observations deleted due to missingness)
#> AIC: 186.25
#> 
#> Number of Fisher Scoring iterations: 10

p_value.25km = 1 - pchisq(975.40-750.33, 3)
p_value.25km
#> [1] 0

summary(fit1.10km)
#> 
#> Call:
#> glm(formula = g_fit ~ den_rcs + p_a + mean_depth + d_shore, family = binomial(), 
#>     data = HS_10km_grid)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -3.066e+00  1.588e-01 -19.304  < 2e-16 ***
#> den_rcs     -2.126e+01  1.201e+01  -1.771  0.07659 .  
#> p_a1         1.205e+00  4.359e-01   2.764  0.00570 ** 
#> mean_depth  -3.349e-02  1.189e-02  -2.817  0.00485 ** 
#> d_shore      1.067e-06  6.222e-06   0.171  0.86387    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 948.70  on 5198  degrees of freedom
#> Residual deviance: 863.12  on 5194  degrees of freedom
#>   (139 observations deleted due to missingness)
#> AIC: 873.12
#> 
#> Number of Fisher Scoring iterations: 11

p_value.10km = 1 - pchisq(5012.5-4012.9, 3)
p_value.10km
#> [1] 0

Calculate odds ratios

We want to know how much each main effect increases (or decreases) the odds of a good fit occurring so we need to calculate the odds ratios.

odds_100km <- get_or(fit2.100km) %>%
  mutate(res_name = "100km")

odds_50km <- get_or(fit3.50km) %>%
  mutate(res_name = "50km")

odds_25km <- get_or(fit2.25km) %>%
  mutate(res_name = "25km")

odds_10km <- get_or(fit1.10km) %>%
  mutate(res_name = "10km")

odds_ratios <- rbind(odds_100km, odds_50km, odds_25km, odds_10km)

How many receivers?

Let’s find out how the density of receivers varied for each grid cell resolution.

res <- list(HS_100km_grid, HS_50km_grid, HS_25km_grid, HS_10km_grid)

rcv_dens <- lapply(res, den_rcvs)

rcv_dens <- data.table::rbindlist(rcv_dens, idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

Where do good fits occur?

What are the characteristics of the grid cells that exhibited a good fit?

gd_fit_char <- lapply(res, gd_fts) %>% 
  data.table::rbindlist(., idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

Do the methods close gaps?

Let’s find out how many of the good fits occurred in grid cells that lacked receivers. We can use the results list res that we created in the previous step and apply a function over all resolutions at the same time.

clsd_gps <- lapply(res, gps_fld) %>% 
  data.table::rbindlist(., idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

Where are these closed gaps?

Let’s find out where these closed gaps are located (e.g., are they coastal?).

clsd_gp_chars <- lapply(res, gps_chr) %>%
  data.table::rbindlist(., idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

Where are receivers located?

Let’s find out where most of the receivers are in the current array.

rcv_chars <- lapply(res, rcv_chr) %>%
  data.table::rbindlist(., idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

What if we had chosen a different depth cutoff?

Blacktip sharks, the species chosen to demonstrate the current methods, is known to cross the Gulf Stream, which is in deep waters, on occasion. However, in a mark and recapture study, this species was not caught in a location with a depth greater than 265 m. If we cut off our study to 300 m, how do our results change?

depth_limit <- 300 # [m]

dif_depth <- lapply(res, dif_co) %>%
  data.table::rbindlist(., idcol = 'resolution') %>%
  left_join(tibble(resolution = 1:4, 
                   res_name = c("100km", "50km", "25km", "10km")), 
            by = "resolution") %>%
  mutate(res_name = ordered(res_name, levels = c("100km", "50km", "25km", "10km")))

Demonstrate ecological application

Let’s apply these methods to real acoustic telemetry data so we can derive distribution information.

load(system.file("extdata", "demo_dets.Rda", package = "sporeg"))

demo_dets <- demo_dets %>%
    dplyr::arrange(ID, time) %>%
    dplyr::group_by(ID, time) %>%
    dplyr::summarize(mean_pos = sf::st_combine(geometry)) %>%
    sf::st_centroid() %>%
    dplyr::mutate(time = as.POSIXct(time),
           x = sf::st_coordinates(mean_pos)[,1],
           y = sf::st_coordinates(mean_pos)[,2],
           locType = "o") %>% # Obtain x and y coordinates so that you can merge with predicted locations later
    sf::st_drop_geometry() %>%
    as.data.frame
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.

Now we can run the movement model, a continuous-time correlated random walk model, using the momentuHMM R package, to impute locations between observations.

demo_dets <- momentuHMM::crawlWrap(obsData = demo_dets, timeStep = "15 min",
                                    fixPar = c(NA, NA),
                                    theta = c(8,0),
                                    attempts = 100, retrySD = 5, retryFits = 5,
                                    ncores = 2, retryParallel = TRUE)

demo_dets <- data.frame(demo_dets$crwPredict)

Turn the movement model product into a data frame and label the predicted locations.

demo_dets <- demo_dets %>%
    dplyr::select(ID, locType, time, mu.x, mu.y, speed) %>%
    dplyr::mutate(locType = ifelse(is.na(locType), "p", locType)) %>%  #Specify predicted locs
    as.data.frame

demo_dets <- demo_dets %>%
    dplyr::select(ID, time, mu.x, mu.y) %>%
    dplyr::mutate(time = lubridate::ymd_hms(format(as.POSIXct(time, tz = "UTC"), "%Y-%m-%d %H:%M:%S", ID = as.factor(ID))))

Since the movement model does not recognize land, let’s re-route the parts that crossed the land barrier.

CRS <- 3857

load(system.file("extdata", "atlcoast.Rda", package = "sporeg"))

barrier <- atlcoast %>% sf::st_transform(., 3857)
vis_graph <- pathroutr::prt_visgraph(barrier)
buffer <- 650
  
tbuff650 <- sub_rrt(demo_dets, CRS, barrier, vis_graph, buffer)

# Check map
mapview::mapView(tbuff650)

Calculate grid counts

Here, we need to calculate the number of unique individuals in each grid cell so that we can determine the distribution of the population.

sg <- grid_res(50, study_site, epsg = 3857, what = "polygons")

subset_cts <- cts(sg, tbuff650)

How would it differ if we just used the daily summarized acoustic telemetry data to derive distribution information?

load(system.file("extdata", "at_dly_locs.Rda", package = "sporeg"))

sg <- grid_res(50, study_site, epsg = 3857, what = "polygons")

at_dly_locs_cts <- cts(sg, at_dly_locs)

What if we connected those data with straight lines?

load(system.file("extdata", "at_dly_locs.Rda", package = "sporeg"))

at_lines <- at_dly_locs %>%
  group_by(ID, time) %>%
  sf::st_transform(3857) %>%
  summarise(pt = sf::st_combine(geometry)) %>%
  sf::st_centroid() %>%
  mutate(lat = sf::st_coordinates(pt)[,2],
         lon = sf::st_coordinates(pt)[,1]) %>%
  arrange(ID, time) %>%
  mutate(start_x = lon, start_y = lat, end_x = lead(lon), end_y = lead(lat)) %>% #Prep data for making lines - ensures proper order
  sf::st_as_sf(coords = c("lon", "lat"), crs = 3857) %>%
  filter(!is.na(end_y)) %>%
  tidyr::nest() %>% 
  mutate(
    data = purrr::map(data, 
                      ~ mutate(.x,
                               x = purrr::pmap(.l = list(start_x, start_y, end_x, end_y),
                                               .f = make_line)))) %>%
  mutate(
      data = purrr::map(data,
                        ~ mutate(.x, x = sf::st_sfc(x))),
      x = purrr::map(data, ~ sf::st_union(sf::st_set_geometry(.x, 'x'))), ##Preserves order
      x = purrr::map(x, ~ sf::st_cast(.x, 'MULTILINESTRING')) 
    ) %>% 
    dplyr::select(-data) %>% 
    tidyr::unnest(x) %>% 
    sf::st_as_sf(sf_column_name = 'x', crs = 3857)
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.

sg <- grid_res(50, study_site, epsg = 3857, what = "polygons")

at_line_cts <- cts(sg, at_lines)

Map good fits over ecological data

Let’s merge the results that have a false origin with the distribution data so we know where we can rely on the results.

goodfts_50km <- merge((HS_50km_grid %>% as.data.frame() %>% dplyr::select(-x)), at_line_cts, by = "gid") %>%
  sf::st_as_sf()

Map reconstruction results

Let’s map the mean differences with the corrected map projection.

mapview::mapView(goodfts_50km, zcol = "avg_dif")

Note: Some acoustic telemetry users along the U.S. Eastern Seaboard have experienced theft of their equipment and so they are hesitant to share exact locations of receivers. For that reason, the real receiver locations in this vignette have been obscured with a false origin and the related shapefiles were altered in kind to visually match the obscured receiver locations. However, to get the actual environmental information required to make meaningful inferences about these results, we need to pull information from the actual study site. Therefore, we call in the actual study site for this section.↩︎
Note: Some acoustic telemetry users along the U.S. Eastern Seaboard have experienced theft of their equipment and so they are hesitant to share exact locations of receivers. For that reason, the real receiver locations in this vignette have been obscured with a false origin and the related shapefiles were altered in kind to visually match the obscured receiver locations. However, to get the actual environmental information required to make meaningful inferences about these results, we need to pull information from the actual study site. Therefore, we call in the actual study site for this section.↩︎

sporeg Intended Workflow