--- title: "VDAT data structure" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{VDAT data structure} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```r library(rvdat) ``` ## Row-by-row data description I'll start off the same way as in `vignette('how-to-use-rvdat')`: 1. Locating `vdat.exe` with `vdat_here` 2. Downloading the example VRL file from the `glatos` package 3. Converting the VRL to a CSV using `vdat_to_csv` ```r library(rvdat) td <- file.path( tempdir(), "vignette" ) dir.create( td ) vdat_here("c:/program files/innovasea/fathom/vdat.exe") download.file( url = file.path( "https://github.com/ocean-tracking-network/glatos/raw/dev/inst/extdata", "detection_files_raw/VR2AR_546310_20190613_1.vrl" ), destfile = file.path(td, "VR2AR_546310_20190613_1.vrl"), mode = "wb", quiet = TRUE ) vdat_to_csv(file.path(td, "VR2AR_546310_20190613_1.vrl"), outdir = td ) ``` ```r list.files(td, pattern = "VR2AR") #> [1] "VR2AR_546310_20190613_1.csv" "VR2AR_546310_20190613_1.vrl" ``` ## Exploring the VDAT CSV's data structure Now for the fun part: what does the csv look like? ```r exported_csv <- read.csv( file.path(td, "VR2AR_546310_20190613_1.csv") ) #> Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names ``` Uh oh! Something weird! `utils::read.csv` guesses how many columns there should be by looking at the first five lines of the data. Here, it took a guess and more columns showed up later. Let's skip that process by telling it that the first row does not contain column names by using `header = FALSE`. ```r exported_csv <- read.csv( file.path(td, "VR2AR_546310_20190613_1.csv"), header = FALSE ) head(exported_csv) #> V1 V2 #> 1 VEMCO DATA LOG 2.0.0 #> 2 RECORD TYPE FIELD #> 3 ATTITUDE_DESC Device Time (UTC) #> 4 BATTERY_DESC Device Time (UTC) #> 5 CFG_CHANNEL_DESC Device Time (UTC) #> 6 CFG_STATION_DESC Device Time (UTC) #> V3 V4 #> 1 vdat-9.13.2-20240607-7a2e9d-release #> 2 FIELD FIELD #> 3 Time Time Offset (h) #> 4 Time Time Offset (h) #> 5 Time Time Offset (h) #> 6 Time Time Offset (h) #> V5 V6 V7 V8 #> 1 #> 2 FIELD FIELD FIELD FIELD #> 3 Time Correction (s) Model Serial Number Tilt (deg) #> 4 Time Correction (s) Model Serial Number Battery Position #> 5 Time Correction (s) Model Serial Number Channel #> 6 Time Correction (s) Model Serial Number Station Name #> V9 V10 V11 #> 1 #> 2 FIELD FIELD FIELD #> 3 #> 4 Battery Type Battery Serial Number Battery Voltage (V) #> 5 Detection Type Frequency (kHz) Blanking (ms) #> 6 Latitude (deg) Longitude (deg) #> V12 V13 V14 V15 V16 V17 V18 V19 #> 1 #> 2 FIELD FIELD FIELD FIELD FIELD FIELD FIELD FIELD #> 3 #> 4 Battery Remaining (%) #> 5 Map ID Coding ID #> 6 #> V20 V21 V22 V23 V24 V25 V26 #> 1 #> 2 FIELD FIELD FIELD FIELD FIELD FIELD FIELD #> 3 #> 4 #> 5 #> 6 ``` I suggest that you skim through the data frame with the `View` function. You'll find that it has a few properties: #### The first row contains metadata about `vdat.exe`, itself. ```r exported_csv[1, ] #> V1 V2 V3 V4 V5 V6 #> 1 VEMCO DATA LOG 2.0.0 vdat-9.13.2-20240607-7a2e9d-release #> V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 #> 1 #> V24 V25 V26 #> 1 ``` While there are 26 columns as a separate row takes that many fields to describe its data structure (see the next section), only the first three are pertinent. 1. This is a VEMCO data log; 2. It is using version 2.0.0 of the schema; 3. It was created with some specific vdat version. a. Note that this does, in fact, match the version of `vdat` we are using. This can be found using `rvdat::vdat_version`. ```r vdat_version() #> vdat-9.13.2-20240607-7a2e9d-release ``` #### The next 20+ lines contain the column names of different subsets of the data. These subsets are by variable, such as tilt angle, temperature, and the like. They are designated as *descriptions* by appending `_DESC` to the variable name in the form `_DESC` (Description -- DESC. Get it?). This is the reason we couldn't import the CSV the first time -- there are a different number of columns for each variable, and thus the data was more jagged than `read.csv` expected. ```r exported_csv[2:5, ] #> V1 V2 V3 V4 #> 2 RECORD TYPE FIELD FIELD FIELD #> 3 ATTITUDE_DESC Device Time (UTC) Time Time Offset (h) #> 4 BATTERY_DESC Device Time (UTC) Time Time Offset (h) #> 5 CFG_CHANNEL_DESC Device Time (UTC) Time Time Offset (h) #> V5 V6 V7 V8 #> 2 FIELD FIELD FIELD FIELD #> 3 Time Correction (s) Model Serial Number Tilt (deg) #> 4 Time Correction (s) Model Serial Number Battery Position #> 5 Time Correction (s) Model Serial Number Channel #> V9 V10 V11 #> 2 FIELD FIELD FIELD #> 3 #> 4 Battery Type Battery Serial Number Battery Voltage (V) #> 5 Detection Type Frequency (kHz) Blanking (ms) #> V12 V13 V14 V15 V16 V17 V18 V19 #> 2 FIELD FIELD FIELD FIELD FIELD FIELD FIELD FIELD #> 3 #> 4 Battery Remaining (%) #> 5 Map ID Coding ID #> V20 V21 V22 V23 V24 V25 V26 #> 2 FIELD FIELD FIELD FIELD FIELD FIELD FIELD #> 3 #> 4 #> 5 ``` #### The next ~7 rows contain receiver configuration After a row designated as `DATA_SOURCE_FILE`, there are a few rows which outline the receiver's configuration; this file has 6 total. These are things like the initialization time, Map ID, whether an internal transmitter is turned on (and its delay), etc.. ```r first_data_row <- which(exported_csv$V1 == "DATA_SOURCE_FILE") exported_csv[first_data_row:(first_data_row + 6), ] #> V1 V2 #> 29 DATA_SOURCE_FILE VR2AR_546310_20190613_1.vrl #> 30 EVENT_INIT 2019-06-11 02:44:16 #> 31 CLOCK_REF 2019-06-11 02:44:16 #> 32 CFG_CHANNEL 2019-06-11 02:44:16 #> 33 CFG_TRANSMITTER 2019-06-11 02:44:16 #> 34 CFG_TRANSMITTER 2019-06-11 02:44:27 #> 35 CFG_TRANSMITTER 2019-06-11 02:44:34 #> V3 #> 29 ffee6ee7-450a-db42-b54c-334b672fddf5 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> V4 V5 V6 V7 #> 29 com.vemco.file.vrl.0207.ff02.ff02/5.0.1 110026 ORIGINAL #> 30 VR2AR-69 546310 #> 31 VR2AR-69 546310 #> 32 VR2AR-69 546310 #> 33 VR2AR-69 546310 #> 34 VR2AR-69 546310 #> 35 VR2AR-69 546310 #> V8 V9 V10 V11 V12 #> 29 #> 30 2019-06-11 02:44:11 UTC-04:00 5.0.1 #> 31 2019-06-11 02:44:16 0 INITIALIZATION #> 32 1 PPM 69 260 MAP-113 #> 33 PPM A69-1601-60774 60774 DISABLED 540 #> 34 PPM A69-1601-60774 60774 DISABLED 540 #> 35 PPM A69-1601-60774 60774 DISABLED 540 #> V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 #> 29 #> 30 #> 31 #> 32 #> 33 660 #> 34 660 #> 35 660 ``` #### Logged data! *"'And now, ' cried Max, 'let the wild rumpus start!'"* Depending on how long you've had your receiver deployed, you could have many thousands of rows after this. If you're logging environmental data you'll have on the order of four rows per minute deployed (temperature, depth, tilt, and general diagnostics), as well as one for every logged detection. ```r exported_csv[(first_data_row + 7):(first_data_row + 7 + 7), ] #> V1 V2 V3 V4 V5 V6 V7 V8 V9 #> 36 DIAG_FAST 2019-06-11 02:45:00 VR2AR-69 546310 17.064 150 #> 37 TEMP 2019-06-11 02:45:00 VR2AR-69 546310 17.064 #> 38 DEPTH 2019-06-11 02:45:00 VR2AR-69 546310 0 #> 39 ATTITUDE 2019-06-11 02:45:00 VR2AR-69 546310 2 #> 40 DIAG_FAST 2019-06-11 02:46:00 VR2AR-69 546310 17.129 161 #> 41 TEMP 2019-06-11 02:46:00 VR2AR-69 546310 17.129 #> 42 DEPTH 2019-06-11 02:46:00 VR2AR-69 546310 0 #> 43 ATTITUDE 2019-06-11 02:46:00 VR2AR-69 546310 3 #> V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 #> 36 2 0 #> 37 #> 38 #> 39 #> 40 3 0 #> 41 #> 42 #> 43 #> V26 #> 36 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 ``` Diagnostic rows (`DIAG`) will have the hourly mean of `TEMP`, `DEPTH`, and `ATTITUDE` rows, as well as detections/pings. If you have fast diagnostics on, rows labeled with `DIAG_FAST` will contain the same per-minute data as `TEMP`/`DEPTH`/`ATTITUDE`. ```r exported_csv[(exported_csv$V1 == "DIAG"), ][1, ] #> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 #> 96 DIAG 2019-06-11 03:00:00 VR2AR-69 546310 17.744 150 87 #> V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 #> 96 0 0 0 ``` #### Detections are logged to the tenth of a millisecond That's $1\text{e-}4$ seconds. Whether that's accurate or just over-precise... I don't know. Detections are indicated by a row labeled with `DET`, followed by the raw logged time, time-corrected time if applicable, and information about the transmitter. ```r exported_csv[(exported_csv$V1 == "DET"), ][1, ] #> V1 V2 V3 V4 V5 V6 V7 V8 V9 #> 2567 DET 2019-06-11 13:03:32.0837 VR2AR-69 546310 1 PPM #> V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 #> 2567 A69-1601-57060 57060 #> V23 V24 V25 V26 #> 2567 ``` #### The last three rows are data associated with the download The internal transmitter configuration, time associated with the internal and downloading-computer clocks, and the download, itself, are all reported. ```r tail(exported_csv) #> V1 V2 V3 V4 V5 V6 V7 #> 16749 TEMP 2019-06-13 17:30:00 VR2AR-69 546310 #> 16750 DEPTH 2019-06-13 17:30:00 VR2AR-69 546310 #> 16751 ATTITUDE 2019-06-13 17:30:00 VR2AR-69 546310 #> 16752 CFG_TRANSMITTER 2019-06-13 17:30:34 VR2AR-69 546310 #> 16753 CLOCK_REF 2019-06-13 17:30:53 VR2AR-69 546310 #> 16754 EVENT_OFFLOAD 2019-06-13 17:30:53 VR2AR-69 546310 #> V8 V9 V10 V11 V12 V13 #> 16749 21.21 #> 16750 0 #> 16751 2 #> 16752 PPM A69-1601-60774 60774 DISABLED 540 660 #> 16753 2019-06-13 17:30:54 1 OFFLOAD #> 16754 2019-06-13 17:30:54 UTC-04:00 1328 99.6235 #> V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 #> 16749 #> 16750 #> 16751 #> 16752 #> 16753 #> 16754 VR2AR_546310_20190613_1.vrl #> V24 V25 V26 #> 16749 #> 16750 #> 16751 #> 16752 #> 16753 #> 16754 ```