---
title: "`matos` for the power user"
date: 2023-02-22
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{`matos` for the power user}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```r
library(matos)
```

If you're a data manager for a lab that participates in fish telemetry, you are likely balancing a number of projects at any one time, and it can be pretty hard to keep track of what's new when an OTN data push occurs. For example, I am responsible for 12 projects.

```r
projects <- list_my_projects()

head(projects)
#>                                             name number
#> 33     Maryland Department of Natural Resources      90
#> 46              Navy Kennebec ME Telemetry Array     192
#> 51         NCBO-MD DNR Chesapeake Backbone North     181
#> 52           NCBO-VMRC Chesapeake Backbone South     164
#> 118 UMCES Black Sea Bass & Offshore Construction      97
#> 119          UMCES BOEM Marine Mammal Monitoring     242
#>                                                 url
#> 33   https://matos.asascience.com/project/detail/90
#> 46  https://matos.asascience.com/project/detail/192
#> 51  https://matos.asascience.com/project/detail/181
#> 52  https://matos.asascience.com/project/detail/164
#> 118  https://matos.asascience.com/project/detail/97
#> 119 https://matos.asascience.com/project/detail/242
```

## Parallel

I like to use the [`future` family of packages](https://www.futureverse.org/packages-overview.html) to run things in parallel, specifically [`future.apply`](https://future.apply.futureverse.org/). When you have more than a few projects, this speeds up pulling your files from MATOS quite a bit.

## Listing

```r
library(future.apply)
plan(multisession)

# List files in all of my projects
extraction_files <- future_lapply(projects$number, function(x){
  list_extract_files(x)
})

# Bind together into one data frame
extraction_files <- do.call(rbind, extraction_files)

head(extraction_files)
#>   project            file_type detection_type detection_year
#> 1      90 Data Extraction File        matched           2015
#> 2      90 Data Extraction File        matched           2016
#> 3      90 Data Extraction File        matched           2017
#> 4      90 Data Extraction File        matched           2018
#> 5      90 Data Extraction File        matched           2019
#> 6      90 Data Extraction File        matched           2020
#>   upload_date                            file_name
#> 1  2023-03-21 mddnr1nr_matched_detections_2015.zip
#> 2  2023-03-21 mddnr1nr_matched_detections_2016.zip
#> 3  2023-03-21 mddnr1nr_matched_detections_2017.zip
#> 4  2023-03-21 mddnr1nr_matched_detections_2018.zip
#> 5  2023-03-21 mddnr1nr_matched_detections_2019.zip
#> 6  2023-07-06 mddnr1nr_matched_detections_2020.zip
#>                                                                 url
#> 1 https://matos.asascience.com/projectfile/downloadExtraction/90_1
#> 2 https://matos.asascience.com/projectfile/downloadExtraction/90_2
#> 3 https://matos.asascience.com/projectfile/downloadExtraction/90_3
#> 4 https://matos.asascience.com/projectfile/downloadExtraction/90_4
#> 5 https://matos.asascience.com/projectfile/downloadExtraction/90_5
#> 6 https://matos.asascience.com/projectfile/downloadExtraction/90_6
```

That's 142 files of which I need to keep track! It really adds up.
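Because everything is now in one data frame, it's also easy to narrow the list down to what's actually new. Here's a minimal sketch, assuming you only care about files uploaded since the most recent data push; the cutoff date below is made up for illustration.

```r
# Keep only the files uploaded on or after a (made-up) cutoff date,
# e.g., the date of the most recent OTN data push
new_files <- subset(
  extraction_files,
  as.Date(upload_date) >= as.Date("2023-07-01")
)

nrow(new_files)
```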
## Downloading

If we want to download all of those files, we can do something similar. We just need to change the function we're running in parallel to `get_extract_file` and provide it the URLs from the list we made via `list_extract_files`. I'll download the first three files for demonstration purposes.

```r
future_lapply(extraction_files$url[1:3], function(x){
  get_extract_file(url = x)
})
#> C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2015.zip
#> C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2015.csv
#> C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#>
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#> C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2016.zip
#> C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2016.csv
#> C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#>
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#> C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2017.zip
#> C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2017.csv
#> C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#>
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#> [[1]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2015.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"
#>
#> [[2]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2016.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"
#>
#> [[3]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2017.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"
```

## Summarizing

We can do the same thing with the receiver and transmitter push summaries, looping through each project. For me, this will create 24 reports! Still a lot, but quite a bit easier to digest than millions of detections spread over 142 files.

```r
future_lapply(projects$number[1:2], function(x){
  matos_receiver_summary(x)
})

future_lapply(projects$number[1:2], function(x){
  matos_tag_summary(x)
})
```
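If you'd rather make a single pass over everything, the two summary calls can be wrapped into one function. This is just a sketch, assuming the one-argument calls shown above are all you need for every project:

```r
# Build both push summaries for every project in one parallel pass
future_lapply(projects$number, function(x){
  matos_receiver_summary(x)
  matos_tag_summary(x)
})
```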