--- title: "Introduction to swfscDAS" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{swfscDAS} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, message=FALSE} library(dplyr) library(stringr) library(swfscDAS) ``` This document introduces you to the swfscDAS package, and specifically its functionality and workflow. This package is intended to standardize and streamline processing of DAS data. It was designed for shipboard DAS data generated using the WinCruz software from the Southwest Fisheries Science Center. However, the DAS files and the `swfscDAS` package are and can be used by others, e.g. to process decades of NOAA Fisheries survey data. In DAS data, an event is only recorded when something changes or happens, which can complicate processing. Thus, the main theme of this package is enabling analyses and downstream processing by 1) determining associated state and condition information for each event and 2) pulling out event-specific information from the Data (Field) columns. ## Data This package includes a sample DAS file, which we will use in this document ```{r data} y <- system.file("das_sample.das", package = "swfscDAS") head(readLines(y)) ``` ## Check data format The first step in processing DAS data is to ensure that the DAS file has expected formatting and values. This package contains the `das_check` function, which performs some basic checks. This function is a precursor to a more comprehensive DASCHECK program, which is currently in development. The checks performed by this function are detailed in the function documentation, which can be accessed by running `?das_check`. You can find the PDF with the expected DAS data format at https://swfsc.github.io/swfscDAS/, or see `?das_format_pdf` for how to access a local copy. To check for valid species codes using this function, you can pass the function an SpCodes.dat file ```{r check, eval=FALSE} # Code not run y.check <- das_check(y, skip = 0, print.cruise.nums = TRUE) ``` ## Read and process data Once QA/QC is complete and you have fixed any data entry errors, you can begin to process the DAS data. The backbone of this package is the reading and processing steps: 1) the data from the DAS file are read into the columns of a data frame and 2) state and condition information are extracted for each event. This means that after processing, you can simply look at any event (row) and determine the Beaufort, viewing conditions, etc., at the time of the event. All other functions in the package depend on the DAS data being in this processed state. One other processing note is that the 'DateTime' column in the `das_read` output has a time zone of "UTC" - this default time zone value is meaningless and only present because POSIXct vectors must have exactly one time zone. If you need to do conversions based on time zone, then use the 'OffsetGMT' column and `force_tz` and `with_tz` from the `lubridate` package. One reason to do such conversions would be if data was collected in a non-local time zone that causes the daily recorded effort to span more than one day, since then 'OnEffort' values will be incorrect because `das_process` resets the event sequence to off effort at the beginning of a new day. ```{r readproc} # Read y.read <- das_read(y, skip = 0) glimpse(y.read) # Process y.proc <- das_process(y) glimpse(y.proc) # Note that das_read can read multiple files at once y2.read <- das_read(c(y, y)) ``` Once you have processed the DAS data, you can easily access a variety of information. For instance, you can look at the different events or Beaufort values that occurred in the data, or filter for specific events to get the beginning and ending points of each effort section. ```{r readprocother} # The number of each event table(y.proc$Event) # The number of events per Beaufort value table(y.proc$Bft) # Filter for R and E events to extract lat/lon points y.proc %>% filter(Event %in% c("R", "E")) %>% select(Event, Lat, Lon, Cruise, Mode, EffType) %>% head() ``` ## Sightings The `swfscDAS` package does contain specific functions for extracting and/or summarizing particular information from the processed data. First is `das_sight`, a function that returns a data frame with pertinent sighting data pulled out to their own columns. Due to the different data collected for each type of sighting and the different needs of end-users, this function offers several different return formats. For the "default" format there is one row for each species of each sighting, for the "wide" format, there is one row for each sighting event, and for the "complete" format there is one row for every group size estimate for each sighting. See `?das_sight` for more details. The "complete" format is intended for users looking to do observer corrections, or to combine estimates using a different method than the arithmetic mean (e.g. the geometric mean) ```{r sight} y.sight <- das_sight(y.proc, return.format = "default") y.sight %>% select(Event, SightNo:PerpDistKm) %>% glimpse() y.sight.wide <- das_sight(y.proc, return.format = "wide") y.sight.wide %>% select(Event, SightNo:PerpDistKm) %>% glimpse() y.sight.complete <- das_sight(y.proc, return.format = "complete") y.sight.complete %>% select(Event, SightNo:PerpDistKm) %>% glimpse() ``` You can also easily filter or subset the sighting data for the desired event code(s) ```{r sight2} y.sight.sg <- das_sight(y.proc, return.events = c("S", "G")) # Note that this is equivalent to: y.sight.sg2 <- das_sight(y.proc) %>% filter(Event %in% c("S", "G")) ``` ## Effort In addition, you can chop the effort data into segments, and summarize the conditions and sightings on those segments using `das_effort` and `das_effort_sight`. These effort segments can be used for line transect estimates using the Distance software, species distribution modeling, or summarizing the number of sightings of certain species on each segment, among other uses. `das_effort` chops continuous effort sections (the event sequence from R to E events) into effort segments using one of several different chopping methods: condition (a new effort segment every time a condition changes), equal length (effort segments of equal length), or section (each segment is a full continuous effort section, i.e. it runs from an R event to an E event). `das_effort_sight` takes the output of `das_effort` and returns the number of included sightings and animals per segment for specified species codes. Both functions return a list of three data frames: segdata, sightinfo, and randpicks. These data frames and the different chopping methodologies are described in depth in the function documentation (`?das_effort` and `?das_effort_sight`), but briefly segdata contains information about each effort segment, sightinfo contains information about the sightings such as their corresponding segment, and randpicks contains information specific to the 'equal length' chopping method. `das_effort` and `das_effort_sight` are separate functions to allow the user more control over which sightings should be included in the effort segment summaries (see `?das_effort`). See below for how to chop/split effort lines by strata. ```{r eff} # Chop the effort into 10km segments y.eff.eq <- das_effort( y.proc, method = "equallength", seg.km = 10, dist.method = "greatcircle", num.cores = 1 ) # Chop the effort every time a condition changes y.eff <- das_effort( y.proc, method = "condition", seg.min.km = 0, dist.method = "greatcircle", conditions = c("Bft", "SwellHght", "Vis"), num.cores = 1 ) y.eff.sight <- das_effort_sight(y.eff, sp.codes = c("018", "076")) glimpse(y.eff.sight$segdata) glimpse(y.eff.sight$sightinfo) ``` ## Strata This package contains several methods to incorporate strata into your DAS data processing: `das_intersects_strata` for assigning points to strata and `das_effort` for chopping effort lines by strata. These functions contain a 'strata.files' argument, which expects a (named) list of paths to CSV files; names are automatically generated if not provided. These files must have headers, longitude values in column one and latitude values in column two, and be closed polygons. First, `das_intersects_strata` allows you to add columns to data frames indicating if a point intersects one or more strata polygons. You can pass this function either a data frame or a list. If a list then it must be the output of `das_effort` or `das_effort_sight`, and the function will use the segment midpoints to determine if a segment (and associated sightings) intersected each stratum. ```{r strata_int} stratum.file <- system.file("das_sample_stratum.csv", package = "swfscDAS") y.eff.sight.strata <- das_intersects_strata(y.eff.sight, list(InPoly = stratum.file)) glimpse(y.eff.sight.strata$segdata) ``` In addition, you can chop/split effort lines by strata using the 'strata.files' argument of `das_effort`. Effort lines are chopped/split by strata before then being processed using the specified method (condition, equal length, etc.). The 'segdata' element of the `das_effort` output will contain a 'stratum' column, which contains the name of the stratum that the segment is in. See `?das_effort` and `?das_effort_strata` for more details. ```{r strata_eff} y.eff.strata.section <- das_effort( y.proc, method = "section", strata.files = list(stratum.file), num.cores = 1 ) y.eff.strata.condition <- das_effort( y.proc, method = "condition", seg.min.km = 0, strata.files = list(Poly1 = stratum.file), num.cores = 1 ) glimpse(y.eff.strata.section$segdata) ``` ## Comments In addition, you can use `das_comments` to generate comment strings. This is particularly useful if looking for comments with keywords ```{r comm} y.comm <- das_comments(y.proc) glimpse(select(y.comm, Event, line_num, comment_str)) y.comm[str_detect(y.comm$comment_str, "gear"), ] #Could also use grepl() here ```