Title: | Processing DAS Data Files |
---|---|
Description: | Process and summarize DAS data files. These files are typically, but do not have to be, DAS <https://swfsc-publications.fisheries.noaa.gov/publications/TM/SWFSC/NOAA-TM-NMFS-SWFSC-305.PDF> data produced by the Southwest Fisheries Science Center (SWFSC) program 'WinCruz'. This package standardizes and streamlines basic DAS data processing, and includes a PDF with the DAS data format requirements expected by the package. |
Authors: | Sam Woodman [aut, cre] |
Maintainer: | Sam Woodman <[email protected]> |
License: | Apache License (== 2) |
Version: | 0.6.3 |
Built: | 2025-01-13 05:05:25 UTC |
Source: | https://github.com/swfsc/swfscdas |
Process and summarize shipboard DAS data
This package contains functions designed for processing and analyzing DAS data generated using the WinCruz program by the Southwest Fisheries Science Center. It is intended to standardize and streamline basic DAS data processing.
Sam Woodman [email protected]
https://swfsc.github.io/swfscDAS/
Check if an object is of class das_df, or coerce it if possible.
as_das_df(x)

## S3 method for class 'das_df'
as_das_df(x)

## S3 method for class 'data.frame'
as_das_df(x)
x | an object to be coerced to class das_df |
Only data frames can be coerced to an object of class das_df.
If x does not have column names and classes as specified in das_df-class, then the function returns an error message detailing the first column that does not meet the requirements of a das_df object.
An object of class 'das_df'
Check if an object is of class das_dfr, or coerce it if possible.
as_das_dfr(x)

## S3 method for class 'das_dfr'
as_das_dfr(x)

## S3 method for class 'data.frame'
as_das_dfr(x)
x | an object to be coerced to class das_dfr |
Only data frames can be coerced to an object of class das_dfr.
If x does not have column names and classes as specified in das_dfr-class, then the function returns an error message detailing the first column that does not meet the requirements of a das_dfr object.
An object of class 'das_dfr'
Check that DAS file has accepted formatting and values
das_check(
  file,
  skip = 0,
  file.out = NULL,
  sp.codes = NULL,
  print.cruise.nums = TRUE
)
file |
filename(s) of one or more DAS files |
skip |
integer: see |
file.out | filename to which to write the error log; default is NULL |
sp.codes |
character; filename of .dat file from which to read
accepted species codes. If |
print.cruise.nums |
logical; indicates if a table with all the
cruise numbers in the |
Precursor to a more comprehensive DASCHECK program. This function checks that the following is true:
Event codes are one of the following: #, *, ?, 1, 2, 3, 4, 5, 6, 7, 8, A, B, C, E, F, k, K, N, P, Q, r, R, s, S, t, V, W, g, G, p, X, Y, Z
Latitude values are between -90 and 90 (inclusive; NA values are ignored)
Longitude values are between -180 and 180 (inclusive; NA values are ignored)
The effort dot matches effort determined using B, R, and E events
There are an equal number of R and E events, and they alternate occurrences
A BR event series or R event does not occur while already on effort
An E event does not occur while already off effort
All Data# columns for non-C events are right-justified
Only C events have data past the 99th column in the DAS file
The following events have NA (blank) Data# columns: *
All of *, B, R, E, V, W, N, P, and Q events have NA Data# columns where specified (see format pdf for more details)
Event/column pairs meet the following requirements:
Item | Event | Column | Requirement |
Cruise number | B | Data1 | Can be converted to a numeric value |
Mode | B | Data2 | Must be one of C, P, c, p, or NA (blank) |
Echo sounder | B | Data4 | Must be one of Y, N, y, n, or NA (blank) |
Effort type | R | Data1 | Must be one of F, N, S, or NA (blank) |
ESW sides | R | Data2 | Effective strip width; must be one of F, H, or NA (blank) |
Course | N | Data1 | Can be converted to a numeric value |
Speed | N | Data2 | Can be converted to a numeric value |
Beaufort | V | Data1 | Must be a whole number between 0 and 9 |
Swell height | V | Data2 | Can be converted to a numeric value |
Wind speed | V | Data5 | Can be converted to a numeric value |
Rain or fog | W | Data1 | Must be between 0 and 5 and either a whole number or have decimal value .5 |
Horizontal sun | W | Data2 | Must be a whole number between 0 and 12 |
Vertical sun | W | Data3 | Must be a whole number between 0 and 12 |
Visibility | W | Data5 | Can be converted to a numeric value |
Sighting (mammal) | S, K, M | Data3-7 | Can be converted to a numeric value |
Sighting (mammal) | G | Data5-7 | Can be converted to a numeric value |
Sighting cue (mammal) | S, K, M | Data3 | Must be a whole number between 1 and 6 |
Sighting method (mammal) | S, K, M, G | Data4 | Must be a whole number between 1 and 7 |
Bearing (mammal) | S, K, M, G | Data5 | Must be a whole number between 0 and 360 |
Photos | A | Data3 | Must be one of N, Y, n, y, or NA (blank) |
Birds | A | Data4 | Must be one of N, Y, n, y, or NA (blank) |
Calibration school | S, K, M | Data10 | Must be one of N, Y, n, y, or NA (blank) |
Aerial photos taken | S, K, M | Data11 | Must be one of N, Y, n, y, or NA (blank) |
Biopsy taken | S, K, M | Data12 | Must be one of N, Y, n, y, or NA (blank) |
Species codes | A | Data5-8 | If a species codes file is provided, must be one of the provided codes |
Resight | s, k | Data2-5 | Can be converted to a numeric value |
Turtle species | t | Data2 | If a species codes file is provided, must be one of the provided codes |
Turtle sighting | t | Data3-5, 7 | Can be converted to a numeric value |
Turtle JFR | t | Data6 | Must be one of F, J, N, R, or NA (blank) |
Fishing vessel | F | Data2-4 | Can be converted to a numeric value |
Sighting info | 1-8 | Data2-8 | Can be converted to a numeric value |
Sighting info | 1-8 | Data9 | The Data9 column must be NA (blank) for events 1-8 |
In the table above, 'between' means inclusive.
Long-term items, and checks that are not performed:
Check that datetimes are sequential, meaning they 1) are the same as or 2) come after the previous event
Check that A events come only immediately after a G/S/K/M event, that every G/S/K/M event is followed by an A event, and that each has at least one group size estimate (1:8 event)
A data frame with columns: the file name, line number, cruise number, 'ID' (columns 4-39 from the DAS file), and description of the issue
If file.out is not NULL, then the error log data frame is also written to file.out using write.csv
A warning is printed if any events are r events; see das_process for details about r events
y <- system.file("das_sample.das", package = "swfscDAS")
if (interactive()) das_check(y)
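To make a couple of the checks listed above concrete, here is a minimal base R sketch of the latitude-range check and the R/E alternation check. It is not the package's implementation: the toy events data frame and the names bad_lat and bad_re are hypothetical, and a real DAS file would be read with the package's own functions.

```r
# Hypothetical toy event table; not read from a real DAS file
events <- data.frame(
  line_num = 1:6,
  Event = c("B", "R", "V", "E", "E", "R"),
  Lat = c(32.5, 32.6, 95.0, 32.8, NA, 32.9),
  stringsAsFactors = FALSE
)

# Latitude must be between -90 and 90 (inclusive); NA values are ignored
bad_lat <- events$line_num[
  !is.na(events$Lat) & (events$Lat < -90 | events$Lat > 90)
]

# R and E events must alternate: flag any R/E event that repeats the
# previous R/E event. The series is treated as starting off effort.
re <- events[events$Event %in% c("R", "E"), ]
prev <- c("E", head(re$Event, -1))
bad_re <- re$line_num[re$Event == prev]

bad_lat  # line with an out-of-range latitude
bad_re   # line with an E event while already off effort
```

In this toy table, line 3 has a latitude of 95 and line 5 is a second E event in a row, so those are the two lines flagged.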
Chop DAS data into a new effort segment every time a specified condition changes
das_chop_condition(x, ...)

## S3 method for class 'data.frame'
das_chop_condition(x, ...)

## S3 method for class 'das_df'
das_chop_condition(
  x,
  conditions,
  seg.min.km = 0.1,
  distance.method = NULL,
  num.cores = NULL,
  ...
)
x |
an object of class |
... |
ignored |
conditions |
the conditions that trigger a new segment;
see |
seg.min.km |
numeric; minimum allowable segment length (in kilometers). Default is 0.1. See the Details section below for more information |
distance.method |
character; see |
num.cores |
see |
WARNING - do not call this function directly! It is exported for documentation purposes, but is intended for internal package use only.
This function is intended to be called by das_effort when the "condition" method is specified. Thus, x must be filtered for events (rows) where either the 'OnEffort' column is TRUE or the 'Event' column is "E"; see das_effort for more details.
This function chops each continuous effort section (henceforth 'effort sections') in x into modeling segments (henceforth 'segments') by creating a new segment every time a specified condition changes. Each effort section runs from an "R" event to its corresponding "E" event. After chopping, das_segdata is called (with segdata.method = "maxdist") to get relevant segdata information for each segment.
A change in any one of the conditions specified in the conditions argument triggers a new segment. One exception is if the event at which this condition change occurs is part of an event series, meaning one of several events in a row at the same lat/lon point (such as a PVNW event series). In this situation, the final event of the event series is considered the last event of the current effort segment, and thus also the start of the next effort segment.
Relatedly, multiple condition changes can happen at the same lat/lon point, such as in an "RPVNW" series of events at the beginning of an effort section. When this happens, no segments of length zero are created; rather, a single segment is created that includes all of the condition changes (i.e. all of the events in the event series) that happened during the series of events (i.e. at the same location). Note that this combining of events at the same position happens even if seg.min.km = 0.
In addition, (almost) all segments whose length is less than seg.min.km are combined with the segment immediately following them to ensure that the length of (almost) all segments is at least seg.min.km. This allows users to account for situations where multiple conditions, such as Beaufort and the visibility, change in rapid succession, for instance <0.1 km apart. When segments are combined, a message is printed, and the condition that was recorded for the maximum distance within the new segment is reported. See das_segdata, segdata.method = "maxdist", for more details about how the segdata information is determined. The only exception to this rule is if the short segment ends in an "E" event, meaning it is the last segment of the effort section. Since in this case there is no 'next' segment, this short segment is left as-is.
If the column dist_from_prev does not exist, the distance between subsequent events is calculated as described in das_effort
List of two data frames:
x, with columns added for the corresponding unique segment code and number
segdata: data frame with one row for each segment, and columns with relevant data (see das_effort for specifics)
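The chopping and combining rules described in the Details can be illustrated with a simplified base R sketch. This is not the package's code: it tracks a single condition (Bft), uses a hypothetical cumulative-distance column dist_km for one effort section, and ignores event series at the same position.

```r
# Hypothetical toy effort section: cumulative distance (km) and Beaufort
ev <- data.frame(
  dist_km = c(0, 2.0, 2.05, 5.0, 8.0),
  Bft = c(2, 3, 4, 4, 4)
)
seg.min.km <- 0.1

# A new segment starts wherever the condition differs from the previous event
new_seg <- c(TRUE, diff(ev$Bft) != 0)
starts <- ev$dist_km[new_seg]
ends <- c(starts[-1], max(ev$dist_km))

# Combine each short segment with the segment immediately following it.
# The last segment of the section has no successor, so it is left as-is.
i <- 1
while (i < length(starts)) {
  if (ends[i] - starts[i] < seg.min.km) {
    ends[i] <- ends[i + 1]
    starts <- starts[-(i + 1)]
    ends <- ends[-(i + 1)]
  } else {
    i <- i + 1
  }
}

seg_len <- ends - starts
seg_len
```

Here the Beaufort changes at 2.0 km and again at 2.05 km, so the 0.05 km segment between them is shorter than seg.min.km and is absorbed into the segment that follows it, leaving segments of about 2 km and 6 km.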
Chop DAS data into approximately equal-length effort segments, averaging conditions by segment
das_chop_equallength(x, ...)

## S3 method for class 'data.frame'
das_chop_equallength(x, ...)

## S3 method for class 'das_df'
das_chop_equallength(
  x,
  conditions,
  seg.km,
  randpicks.load = NULL,
  distance.method = NULL,
  num.cores = NULL,
  ...
)
x |
an object of class |
... |
ignored |
conditions |
see |
seg.km |
numeric; target segment length in kilometers |
randpicks.load |
character, data frame, or |
distance.method |
character; see |
num.cores |
see |
WARNING - do not call this function directly! It is exported for documentation purposes, but is intended for internal package use only.
This function is intended to be called by das_effort when the "equallength" method is specified. Thus, x must be filtered for events (rows) where either the 'OnEffort' column is TRUE or the 'Event' column is "E"; see das_effort for more details.
This function chops each continuous effort section (henceforth 'effort sections') in x into modeling segments (henceforth 'segments') of equal length. Each effort section runs from an "R" event to its corresponding "E" event. After chopping, das_segdata is called to get relevant segdata information for each segment.
When chopping the effort sections into segments of length seg.km, there are several possible scenarios:
The extra length remaining after chopping is greater than or equal to half of the target segment length (i.e. >= 0.5*seg.km): the extra length is assigned to a random portion of the effort section as its own segment (see Fig. 1a)
The extra length remaining after chopping is less than half of the target segment length (i.e. < 0.5*seg.km): the extra length is added to one of the (randomly selected) equal-length segments (see Fig. 1b)
The length of the effort section is less than or equal to the target segment length: the entire effort section becomes a single segment (see Fig. 1c)
The length of the effort section is zero: a segment of length zero is created. If there are more than two events (the "B"/"R" and "E" events), the function throws a warning
Therefore, the length of each segment is constrained to be between one half and one and one half of seg.km (i.e. 0.5*seg.km <= segment length <= 1.5*seg.km), and the central tendency is approximately equal to the target segment length. The only exception is when a continuous effort section is less than one half of the target segment length (i.e. < 0.5*seg.km; see Fig. 1c).
Note the PDF with Figs. 1a - 1c is included in the package, and can be found at:
system.file("DAS_chop_equal_figures.pdf", package = "swfscDAS")
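The scenarios above can be sketched as a small base R function that returns the segment lengths for a single effort section. The function chop_lengths is hypothetical and is not part of swfscDAS; it only illustrates the length arithmetic and does not assign events to segments.

```r
# Hypothetical helper (not part of swfscDAS): segment lengths for one
# effort section of length section.km, given target length seg.km
chop_lengths <- function(section.km, seg.km) {
  # Section no longer than the target: the whole section is one segment
  if (section.km <= seg.km) return(section.km)

  n.full <- floor(section.km / seg.km)
  extra <- section.km - n.full * seg.km
  segs <- rep(seg.km, n.full)
  if (extra == 0) return(segs)

  if (extra >= 0.5 * seg.km) {
    # Extra length becomes its own segment at a random position
    pos <- sample(n.full + 1, 1)
    append(segs, extra, after = pos - 1)
  } else {
    # Extra length is added to one randomly selected segment
    pos <- sample(n.full, 1)
    segs[pos] <- segs[pos] + extra
    segs
  }
}

set.seed(1)
chop_lengths(26, 10)  # two 10 km segments and one 6 km segment, random order
chop_lengths(23, 10)  # one 10 km segment and one 13 km segment, random order
chop_lengths(4, 10)   # a single 4 km segment
```

The random position chosen by sample() plays the role that the randpicks values play in the package, which is why saving them allows the same allocation to be recreated.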
'Randpicks' is a record of the random assignments that were made when chopping the effort sections into segments, and can be saved to allow users to recreate the same random allocation of extra km when chopping. The randpicks returned by this function is a data frame with two columns: the number of the effort section and the randpick value. Users should save the randpicks output to a CSV file, which then can be specified using the randpicks.load argument to recreate the same effort segments from x (i.e., using the same DAS data) in the future. Note that when saving with write.csv, users must specify row.names = FALSE so that the CSV file only has two columns. For an example randpicks file, see
system.file("das_sample_randpicks.csv", package = "swfscDAS")
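As a brief illustration of why row.names = FALSE matters when saving randpicks, the base R sketch below writes a stand-in data frame both ways and reads it back. The column names effort_section and randpicks are assumptions made for this example, not confirmed names from the package output.

```r
# Hypothetical randpicks data frame; the column names are illustrative
randpicks <- data.frame(effort_section = 1:3, randpicks = c(2L, NA, 1L))

# Saved as recommended: the CSV has exactly two columns
f.ok <- tempfile(fileext = ".csv")
write.csv(randpicks, f.ok, row.names = FALSE)
reloaded <- read.csv(f.ok)

# Saved with the write.csv default: a row-name column is added
f.bad <- tempfile(fileext = ".csv")
write.csv(randpicks, f.bad)
reloaded.bad <- read.csv(f.bad)

ncol(reloaded)      # two columns, as required
ncol(reloaded.bad)  # three columns, because of the row names
```

The path of a file saved the first way is what would then be supplied through the randpicks.load argument.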
If the column dist_from_prev does not exist, the distance between subsequent events is calculated as described in das_effort
List of three data frames:
x, with columns added for the corresponding unique segment code and number
segdata: data frame with one row for each segment, and columns with relevant data (see das_effort for specifics)
randpicks: data frame with record of length allocations (see Details section above)
Chop DAS data into effort segments by continuous effort section
das_chop_section(x, ...)

## S3 method for class 'data.frame'
das_chop_section(x, ...)

## S3 method for class 'das_df'
das_chop_section(x, conditions, distance.method = NULL, num.cores = NULL, ...)
x |
an object of class |
... |
ignored |
conditions |
see |
distance.method |
character; see |
num.cores |
see |
WARNING - do not call this function directly! It is exported for documentation purposes, but is intended for internal package use only.
This function is simply a wrapper for das_chop_equallength. It calls das_chop_equallength, with seg.km set to a value larger than the longest continuous effort section in x. Thus, the effort is 'chopped' into the continuous effort sections and then summarized.
See the Examples section for an example where the two methods give the same output. Note that the longest continuous effort section in the sample data is ~22 km.
See das_chop_equallength. The randpicks values will all be NA
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)

y.eff1 <- das_effort(y.proc, method = "equallength", seg.km = 25, num.cores = 1)
y.eff2 <- das_effort(y.proc, method = "section", num.cores = 1)

all.equal(y.eff1, y.eff2)
Extract comments from DAS data
das_comments(x)

## S3 method for class 'data.frame'
das_comments(x)

## S3 method for class 'das_df'
das_comments(x)

## S3 method for class 'das_dfr'
das_comments(x)
x |
an object of class |
This function recreates the comment strings by pasting the Data# columns back together for the C events (comments). See the examples section for how to search for comments with certain phrases
x, filtered for C events and with the added column comment_str containing the concatenated comment strings
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)

das_comments(y.proc)

# Extract all comments containing "record" - could also use stringr package
y.comm <- das_comments(y.proc)
y.comm[grepl("record", y.comm$comment_str, ignore.case = TRUE), ]

# Join comments with processed data
dplyr::left_join(y.proc, y.comm[, c("file_das", "line_num", "comment_str")],
                 by = c("file_das", "line_num"))
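For readers curious how comment strings can be rebuilt from the Data# columns, here is a base R sketch of the idea on toy data. It is an assumption-laden illustration rather than the package's implementation: the toy data frame has only three Data columns, whereas real das_df objects have Data1 through Data12.

```r
# Hypothetical toy rows standing in for processed DAS data
x <- data.frame(
  Event = c("C", "V", "C"),
  Data1 = c("Start", "3", "Recor"),
  Data2 = c("ing e", "02", "d che"),
  Data3 = c("ffort", NA, "ck"),
  stringsAsFactors = FALSE
)

# Keep only the comment (C) events
x.c <- x[x$Event == "C", ]

# Paste the Data# columns back together, treating NA as an empty string
data.cols <- grep("^Data[0-9]+$", names(x.c), value = TRUE)
pieces <- x.c[, data.cols]
pieces[is.na(pieces)] <- ""
x.c$comment_str <- do.call(paste0, unname(as.list(pieces)))

x.c$comment_str
# Search the rebuilt comments for a phrase, ignoring case
grepl("record", x.c$comment_str, ignore.case = TRUE)
```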
das_df class
The das_df class is a subclass of data.frame, created to provide a concise and robust way to ensure that the input to downstream DAS processing functions, such as das_sight, adheres to certain requirements. Specifically, objects of class das_df are data frames with specific column names and classes, as detailed in the 'Properties of das_df' section. Objects of class das_df are created by das_process or as_das_df, and are intended to be passed directly to DAS processing functions such as das_sight.
Subsetting, say for a specific date or cruise number, or otherwise altering an object of class das_df will cause the object to drop its das_df class attribute. If this object is then passed to a DAS processing function such as das_sight, the function will try to coerce the object to a das_df object.
das_df objects
All values in the Event column must not be NA.
Objects of class das_df have a class attribute of c("das_df", "data.frame").
In addition, they must have the following column names and classes:
Column name | Column class |
Event | "character" |
DateTime | c("POSIXct", "POSIXt") |
Lat | "numeric" |
Lon | "numeric" |
OnEffort | "logical" |
Cruise | "numeric" |
Mode | "character" |
EffType | "character" |
Course | "numeric" |
SpdKt | "numeric" |
Bft | "numeric" |
SwellHght | "numeric" |
WindSpdKt | "numeric" |
RainFog | "numeric" |
HorizSun | "numeric" |
VertSun | "numeric" |
Glare | "logical" |
Vis | "numeric" |
ObsL | "character" |
Rec | "character" |
ObsR | "character" |
ObsInd | "character" |
Data1 | "character" |
Data2 | "character" |
Data3 | "character" |
Data4 | "character" |
Data5 | "character" |
Data6 | "character" |
Data7 | "character" |
Data8 | "character" |
Data9 | "character" |
Data10 | "character" |
Data11 | "character" |
Data12 | "character" |
EffortDot | "logical" |
EventNum | "integer" |
file_das | "character" |
line_num | "integer" |
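The kind of validation this table implies can be sketched in base R as a check that reports the first column whose name or class does not match. The function first_bad_column is hypothetical and only a subset of the required columns is listed; the package's own coercion function, as_das_df, is the authoritative check.

```r
# A subset of the required columns and classes from the table above
required <- list(
  Event = "character",
  DateTime = c("POSIXct", "POSIXt"),
  Lat = "numeric",
  Lon = "numeric",
  OnEffort = "logical"
)

# Hypothetical checker (not the package's implementation): returns the
# name of the first column that is missing or has the wrong class
first_bad_column <- function(x, required) {
  for (nm in names(required)) {
    if (!nm %in% names(x) || !identical(class(x[[nm]]), required[[nm]])) {
      return(nm)
    }
  }
  NULL
}

ok <- data.frame(
  Event = "R",
  DateTime = as.POSIXct("2013-01-13 06:27:39", tz = "UTC"),
  Lat = 39.3, Lon = -137.6, OnEffort = TRUE,
  stringsAsFactors = FALSE
)
bad <- ok
bad$Lat <- "39.3"  # character instead of numeric

first_bad_column(ok, required)   # NULL: all listed columns match
first_bad_column(bad, required)  # "Lat"
```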
das_dfr class
The das_dfr class is a subclass of data.frame, created to provide a concise and robust way to ensure that the input to das_process adheres to certain requirements. Specifically, objects of class das_dfr are data frames with specific column names and classes, as detailed in the 'Properties of das_dfr' section. Objects of class das_dfr are created by das_read or as_das_dfr, and are intended to be passed directly to das_process.
Subsetting or otherwise altering an object of class das_dfr will cause the object to drop its das_dfr class attribute. das_process will then try to coerce the object to a das_dfr object. It is strongly recommended to pass an object of class das_dfr to das_process before subsetting, e.g. for events from a certain date range.
das_dfr objects
Objects of class das_dfr have a class attribute of c("das_dfr", "data.frame").
In addition, they must have the following column names and classes:
Column name | Column class |
Event | "character" |
EffortDot | "logical" |
DateTime | c("POSIXct", "POSIXt") |
Lat | "numeric" |
Lon | "numeric" |
Data1 | "character" |
Data2 | "character" |
Data3 | "character" |
Data4 | "character" |
Data5 | "character" |
Data6 | "character" |
Data7 | "character" |
Data8 | "character" |
Data9 | "character" |
Data10 | "character" |
Data11 | "character" |
Data12 | "character" |
EventNum | "integer" |
file_das | "character" |
line_num | "integer" |
Chop DAS data into effort segments
das_effort(x, ...)

## S3 method for class 'data.frame'
das_effort(x, ...)

## S3 method for class 'das_df'
das_effort(
  x,
  method = c("condition", "equallength", "section"),
  conditions = NULL,
  strata.files = NULL,
  distance.method = c("greatcircle", "lawofcosines", "haversine", "vincenty"),
  seg0.drop = FALSE,
  comment.drop = FALSE,
  event.touse = NULL,
  num.cores = NULL,
  ...
)
x |
an object of class |
... |
arguments passed to the specified chopping function,
such as |
method |
character; method to use to chop DAS data into effort segments. Can be "condition", "equallength", "section", or any partial match thereof (case sensitive) |
conditions |
character vector of names of conditions to include in segdata output.
These values must be column names from the output of |
strata.files |
list of path(s) of the CSV file(s) with points defining each stratum.
The CSV files must contain headers and be a closed polygon.
The list should be named; see the Details section.
If |
distance.method |
character; method to use to calculate distance between lat/lon coordinates. Can be "greatcircle", "lawofcosines", "haversine", "vincenty", or any partial match thereof (case sensitive). Default is "greatcircle" |
seg0.drop |
logical; flag indicating whether or not to drop segments
of length 0 that contain no sighting (S, K, M, G, t) events.
Default is |
comment.drop |
logical; flag indicating if comments ("C" events)
should be ignored (i.e. position information should not be used)
when segment chopping. Default is |
event.touse |
character vector of events to use to determine
segment lengths; overrides |
num.cores |
Number of CPUs over which to distribute computations.
Defaults to |
This is the top-level function for chopping processed DAS data
into modeling segments (henceforth 'segments'), and assigning sightings
and related information (e.g., weather conditions) to each segment.
This function returns data frames with all relevant information for the
effort segments and associated sightings ('segdata' and 'sightinfo', respectively).
Before chopping, the DAS data is filtered for events (rows) where either the 'OnEffort' column is TRUE or the 'Event' column is "E". In other words, the data is filtered for continuous effort sections (henceforth 'effort sections'), where effort sections run from "R" to "E" events (inclusive), and then passed to the chopping function specified using method. Note that while B events immediately preceding an R are on effort, they are ignored during effort chopping. In addition, all on effort events (other than ? and numeric events) with NA DateTime, Lat, or Lon values are verbosely removed.
If strata.files is not NULL, then the effort lines will be split by the user-provided stratum (strata). In this case, a column 'stratum' will be added to the end of the segdata data frame with the user-provided name of the stratum that the segment was in, or NA if the segment was not in any of the strata. If no name was provided for the stratum in strata.files, then the value will be "Stratum#", where "#" is the index of the applicable stratum in strata.files. The user can provide as many strata as they want; these strata can share boundaries, but they cannot overlap. See das_effort_strata for more details.
The following chopping methods are currently available: "condition", "equallength", and "section".
When using the "condition" method, effort sections are chopped into segments every time a condition changes, thereby ensuring that the conditions are consistent across the entire segment. See das_chop_condition for more details about this method, including arguments that must be passed to it via the argument ...
The "equallength" method consists of chopping effort sections into equal-length segments of length seg.km, and doing a weighted average of the conditions for the length of that segment. See das_chop_equallength for more details about this method, including arguments that must be passed to it via the argument ...
The "section" method involves 'chopping' the effort into continuous effort sections, i.e. each continuous effort section is a single effort segment. See das_chop_section for more details about this method.
The distance between the lat/lon points of subsequent events is calculated using the method specified in distance.method. If "greatcircle", distance_greatcircle is used, while distance is used otherwise.
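To give a sense of what computing the distance between subsequent events involves, here is a base R sketch of the haversine formula, one of the listed distance.method options. The helper haversine_km, the Earth radius of 6371 km, and the NA placed before the first event are assumptions of this sketch; the package's own distance functions may differ in constants and details.

```r
# Hypothetical helper: haversine distance in km between lat/lon points
# given in decimal degrees. The Earth radius r is an assumed constant.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Toy event positions; distance from each event to the previous one
lat <- c(32.0, 32.0, 33.0)
lon <- c(-120.0, -119.0, -119.0)
dist_from_prev <- c(
  NA,
  haversine_km(head(lat, -1), head(lon, -1), lat[-1], lon[-1])
)
dist_from_prev
```

The third value corresponds to one degree of latitude, which is about 111 km with this radius; the second is shorter because a degree of longitude covers less distance at 32 degrees north.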
See das_sight for how the sightings are processed.
The sightinfo data frame includes the column 'included', which is used in das_effort_sight when summarizing the number of sightings and animals for selected species. das_effort_sight is a separate function to allow users to personalize the included values as desired for their analysis. By default, i.e. in the output of this function, 'included' is TRUE if: the sighting was made when on effort, by a standard observer (see das_sight), and in a Beaufort sea state less than or equal to five.
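The default rule for 'included' amounts to a logical AND of three conditions, which can be sketched in base R as follows. The toy data frame and the column name std_obs (standing in for a standard-observer flag) are hypothetical; only OnEffort and Bft are column names taken from this documentation.

```r
# Hypothetical toy sightings; 'std_obs' is an illustrative column name
sight <- data.frame(
  OnEffort = c(TRUE, TRUE, FALSE, TRUE),
  std_obs = c(TRUE, FALSE, TRUE, TRUE),
  Bft = c(3, 2, 1, 6)
)

# Included only if on effort, by a standard observer, and Beaufort <= 5
sight$included <- sight$OnEffort & sight$std_obs & sight$Bft <= 5
sight
```

Only the first toy sighting satisfies all three conditions. Users who want a different rule can overwrite the 'included' column of sightinfo before summarizing.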
List of three data frames:
segdata: one row for every segment, and columns for information including unique segment number (segnum), the corresponding effort section (section_id), the segment index within the corresponding effort section (section_sub_id), the starting and ending line of the segment in the DAS file (stlin, endlin), start/end/midpoint coordinates (lat1/lon1, lat2/lon2, and mlat/mlon, respectively), the start/end/midpoint date/time of the segment (DateTime1, DateTime2, and mDateTime, respectively; mDateTime is the average of DateTime1 and DateTime2), segment length (dist), conditions (e.g. Beaufort), and, if applicable, stratum (InStratumName).
sightinfo: details for all sightings in x, including: the unique segment number it is associated with, segment mid points (lat/lon), the 'included' column described in the 'Details' section, and the output information described in das_sight when return.format is "default"
randpicks: see das_chop_equallength; NULL if using "condition" method
Internal functions called by das_effort: das_chop_condition, das_chop_equallength, das_chop_section, das_segdata
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)

# Using "condition" method
das_effort(
  y.proc, method = "condition", conditions = c("Bft", "SwellHght", "Vis"),
  seg.min.km = 0.05, num.cores = 1
)

# Using "section" method
das_effort(y.proc, method = "section", num.cores = 1)

# Using "equallength" method
y.rand <- system.file("das_sample_randpicks.csv", package = "swfscDAS")
das_effort(
  y.proc, method = "equallength", seg.km = 10, randpicks.load = y.rand,
  num.cores = 1
)

# Using "section" method and chop by strata
stratum.file <- system.file("das_sample_stratum.csv", package = "swfscDAS")
das_effort(
  y.proc, method = "section", strata.files = list(Poly1 = stratum.file),
  num.cores = 1
)
Summarize number of sightings and animals for selected species by segment
das_effort_sight(
  x.list,
  sp.codes,
  sp.events = c("S", "G", "K", "M", "t", "p"),
  gs.columns = c("GsSpBest", "GsSpLow", "GsSpHigh")
)
x.list |
output of |
sp.codes |
character; species code(s) to include in segdata output. These must exactly match the species codes in the data, such as including leading zeros |
sp.events |
character; event code(s) to include in the sightinfo output. This argument supersedes the 'included' value when determining whether a sighting is included in the segment summaries. Must be one or more of: "S", "K", "M", "G", "t", "p" (case-sensitive). The default is that all of these event codes are kept |
gs.columns |
character; the column(s) to use to get the group size values that will be summarized in the segdata output. Must be one or more of 'GsSpBest', 'GsSpLow', and 'GsSpHigh' (case-sensitive). See Details section for more information |
This function takes the output of das_effort and adds columns for the number of sightings (nSI) and number of animals (ANI) for selected species (selected via sp.codes) for each segment to the segdata element of x.list. However, only sightings with an included value of TRUE (included is a column in sightinfo) are included in the summaries. Having this step separate from das_effort allows users to personalize the included values as desired for their analysis.
The ANI columns are the sum of the 'GsSp...' column(s) from das_sight specified using gs.columns. If gs.columns specifies more than one column, then the secondary columns will only be used if the values for the previous columns are NA. For instance, if gs.columns = c('GsSpBest', 'GsSpLow'), then for each row in sightinfo, the value from GsSpLow will be used only if the value from GsSpBest is NA
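The fallback behavior of gs.columns can be sketched in base R as a row-wise "first non-NA value" across the chosen columns. This is an illustration on toy data, not the package's code; in particular, dropping rows with no group size via na.rm = TRUE when summing is an assumption of this sketch.

```r
# Hypothetical toy sightinfo with the three group size columns
sightinfo <- data.frame(
  GsSpBest = c(10, NA, NA),
  GsSpLow = c(8, 5, NA),
  GsSpHigh = c(12, 7, NA)
)
gs.columns <- c("GsSpBest", "GsSpLow")

# For each row, take the first non-NA value in the order of gs.columns
gs <- rep(NA_real_, nrow(sightinfo))
for (col in gs.columns) {
  gs <- ifelse(is.na(gs), sightinfo[[col]], gs)
}

gs
# Sum of animals across the toy sightings (NA rows dropped by assumption)
ani <- sum(gs, na.rm = TRUE)
ani
```

Row 1 uses GsSpBest, row 2 falls back to GsSpLow because GsSpBest is NA, and row 3 stays NA because neither selected column has a value.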
A list, identical to x.list except for 1) the nSI and ANI columns added to x.list$segdata, one each for each element of sp.codes, and 2) the 'included' column of x.list$sightinfo, which has been set as FALSE for sightings of species not listed in sp.codes. Thus, the 'included' column in the output accurately reflects the sightings that were included in the effort segment summaries
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)
y.eff.cond <- das_effort(
  y.proc, method = "condition", conditions = "Bft", seg.min.km = 0.05,
  num.cores = 1
)

das_effort_sight(y.eff.cond, sp.codes = c("013", "076", "DC"),
                 sp.events = c("S", "t"))
Split DAS effort where it intersects with a stratum boundary
das_effort_strata(x, ...)

## S3 method for class 'data.frame'
das_effort_strata(x, ...)

## S3 method for class 'das_df'
das_effort_strata(x, strata.files, ...)
x |
an object of class |
... |
ignored |
strata.files |
list of path(s) of the stratum CSV file(s);
see |
This function should only be called by das_effort, i.e. it should not be called by users in their personal scripts.
Practically speaking, this function splits the effort line wherever it crosses a stratum line. This point of intersection is interpolated; specifically, it is determined using st_intersection. Thus, any effort will be first split at these effort-stratum boundary intersection points, and then using the specified method (e.g. condition).
The data frame x, with 1) columns added that indicate a) if the point was in a particular stratum (see das_intersects_strata), and b) the index of the stratum in strata.files (column name 'stratum'; 0 if the point intersects with no strata), and 2) two rows added for each strata crossing that occurs between something other than an E and R. These rows are necessary because of how das_effort processes effort.
The added rows are the same as the event previous to the strata crossing, except:
They have the event code "strataE" and "strataR", respectively
Their coordinates are the coordinates of the intersection of the effort line and the stratum boundary
Their 'idx_eff' values are plus 0.4 and 0.5, respectively
The second added row has the same stratum info as the point immediately after the stratum boundary crossing
Save the PDF document describing the DAS format required by swfscDAS to a specified file
das_format_pdf(file, ...)
file |
character, the name of the file where the PDF will be saved |
... |
passed on to |
A wrapper function for file.copy. This function saves the PDF document describing the DAS data format requirements by copying the PDF document located at system.file("DAS_Format.pdf", package = "swfscDAS") to file.
This file can also be downloaded from https://github.com/swfsc/swfscDAS/blob/master/inst/DAS_Format.pdf
output of file.copy; TRUE if writing of file was successful, and FALSE otherwise
das_format_pdf(file.path(tempdir(), "DAS_Format.pdf"), overwrite = FALSE)
Determine if swfscDAS outputs intersect with strata polygons
das_intersects_strata(x, ...)

## S3 method for class 'list'
das_intersects_strata(x, strata.files, ...)

## S3 method for class 'data.frame'
das_intersects_strata(
  x,
  strata.files,
  x.lon = "Lon",
  x.lat = "Lat",
  strata.which = FALSE,
  ...
)
x |
a data frame (such as an object of class |
... |
ignored |
strata.files |
list of path(s) of the CSV file(s) with points defining each stratum. The CSV files must contain headers and be a closed polygon. The list may be named; see 'Value' section for how these names are used |
x.lon |
character; name of the longitude column of |
x.lat |
character; name of the latitude column of |
strata.which |
logical; indicates if the numeric column 'strata_which' should
be included in the output data frame.
Ignored if |
Assigns DAS event points or segment midpoints to strata polygons using st_intersects.
If x is a list, then 1) it must be the output of das_effort or das_effort_sight and 2) the segment midpoints (column names mlon and mlat, respectively) are the points checked if they intersect with each provided stratum.
If x is a data frame, then the user must provide the columns that specify the point coordinates to check. x should not be an object of class das_dfr, or an object of class das_df created with add.dtll.sight = FALSE, because the ? and numeric event codes will have NA latitude and longitude values.
If x is a data frame, then logical columns are added to x indicating if each point intersected with the corresponding stratum polygon. The names of these columns are the names of strata.files; any unnamed element(s) of strata.files will have the name InPoly#, where '#' is the index of that stratum polygon in strata.files. If strata.which is TRUE, then the column 'strata_which' is added to the end of the data frame. This column contains either 1) a 0 if the point intersects with no strata or 2) a numeric indicating the index (in strata.files) of the (first) strata polygon that the point intersects with.
Otherwise, i.e. if x is a list and thus the output of one of the effort functions, then the stratum columns are added to both the segdata and sightinfo data frames. However, note that the columns added to the sightinfo data frame still indicate whether or not the segment midpoint was in the corresponding stratum, rather than the sighting point itself.
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)
y.eff <- das_effort(y.proc, method = "section", num.cores = 1)
stratum.file <- system.file("das_sample_stratum.csv", package = "swfscDAS")
das_intersects_strata(y.eff, list(InPoly = stratum.file), x.lon = "Lon", x.lat = "Lat")
das_intersects_strata(y.proc, list(stratum.file))

# Visualize effort midpoints and stratum polygon
require(sf)
y.eff.strata <- das_intersects_strata(y.eff, list(InPoly = stratum.file))
segdata <- st_as_sf(y.eff.strata$segdata, coords = c("mlon", "mlat"), crs = 4326)

# Make stratum polygon
stratum.df <- read.csv(stratum.file)
stratum.sfc <- st_sfc(
  st_polygon(list(matrix(c(stratum.df$Lon, stratum.df$Lat), ncol = 2))),
  crs = 4326
)

plot(segdata["InPoly"], axes = TRUE, reset = FALSE,
     xlim = c(-137, -142.5), ylim = c(42, 47))
plot(stratum.sfc, add = TRUE)
Process DAS data (the output of das_read), including extracting state and condition information for each DAS event
das_process(x, ...)

## S3 method for class 'character'
das_process(x, ...)

## S3 method for class 'data.frame'
das_process(x, ...)

## S3 method for class 'das_dfr'
das_process(
  x,
  days.gap = 20,
  reset.event = TRUE,
  reset.effort = TRUE,
  reset.day = TRUE,
  add.dtll.sight = TRUE,
  ...
)
x |
an object of class |
... |
passed to |
days.gap |
numeric of length 1; default is |
reset.event |
logical; default is |
reset.effort |
logical; default is |
reset.day |
logical; default is |
add.dtll.sight |
logical indicating if the DateTime (dt) and latitude and longitude (ll) columns should be added to the sighting events (?, 1, 2, 3, 4, 5, 6, 7, and 8) from the corresponding (immediately preceding) A event |
If x is a character, it is assumed to be a filepath and first passed to das_read. This output is then passed to das_process.
DAS data is event-based, meaning most events indicate when a state or weather condition changes. For instance, a 'V' event indicates when one or more sea state viewing conditions (such as Beaufort sea state) change, and these conditions are the same for subsequent events until the next 'V' event. For each state/condition: a new column is created, the state/condition information is extracted from relevant events, and extracted information is propagated to appropriate subsequent rows (events). Thus, each row in the output data frame contains all pertinent state/condition information for that row.
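The extract-and-propagate idea can be illustrated with a small base R sketch. The events and values below are made up, and this is not the package's own code (which also handles resets by cruise, day, and effort); it only shows how a value recorded on 'V' events carries forward to subsequent rows.

```r
# Hypothetical events: 'V' rows carry a Beaufort value in Data1
das <- data.frame(
  Event = c("B", "V", "A", "S", "V", "A"),
  Data1 = c("1001", "3", "15", "15", "4", "16"),
  stringsAsFactors = FALSE
)

# Extract the value on V events, leaving NA elsewhere
bft <- ifelse(das$Event == "V", as.numeric(das$Data1), NA)

# Carry the most recent non-NA value forward to later rows
idx <- cumsum(!is.na(bft))  # number of V events seen so far
das$Bft <- c(NA, bft[!is.na(bft)])[idx + 1]
das$Bft
```

Rows before the first 'V' event stay NA, since no condition has been recorded yet.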
The following assumptions/decisions are made during processing:
Event codes are expected to be one of the following: #, *, ?, 1, 2, 3, 4, 5, 6, 7, 8, A, B, C, E, F, k, K, M, N, P, Q, r, R, s, S, t, V, W, g, G, p, X, Y, Z
All '#' events (deleted events) are removed
r events are converted to R events with non-standard effort; see das_format_pdf for more details
An event is considered 'on effort' if it is 1) an R event, 2) a B event immediately preceding an R event, or 3) between corresponding R and E events (not including the E event). The 'EffortDot' column is not used when determining on effort data. Note that effort is reset to 'off effort' at the beginning of a new day.
All state/condition information is reset at the beginning of each cruise. New cruises are identified using days.gap.
All state/condition information relating to B, R, P, V, N, and W events is reset every time there is a BR event sequence if reset.effort == TRUE, because in WinCruz a BR event sequence should always be a BRPVNW event sequence. An event sequence means that all of the events have the same Lat/Lon/DateTime info, and thus previous values for conditions set during the event sequence should not carry over to any part of the event sequence.
'OffsetGMT' is converted to an integer. Values are expected to be consistent within a day for each cruise, so events will have an OffsetGMT value if there is any B event with the offset data on the same day, whether that event is before or after the B event. Thus, if any date/cruise combinations have multiple OffsetGMT values in the data, then a warning message will be printed and the OffsetGMT values will be all NA (for the entire output).
'Mode' is capitalized, and 'Mode' values of NA are assigned a value of "C"
'EffType' is capitalized, and values of NA are assigned a value of "S"
'ESWsides' represents the number of sides being searched during that effort section - a value of NA (for compatibility with older data) or "F" means 2 sides are being searched, and a value of "H" means 1 side is being searched. ESWsides will be NA for values that are not one of "F", NA, or "H"
'Glare': TRUE if 'HorizSun' is 11, 12 or 1 and 'VertSun' is 2 or 3, or if 'HorizSun' is 12 and 'VertSun' is 1; NA if 'HorizSun' or 'VertSun' is NA; otherwise FALSE
Missing values are NA rather than -1
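As a rough check of the 'Glare' rule listed above, the logic can be written as a small base R function. The function name glare_flag is hypothetical and this is a sketch of the stated rule, not the package's internal code.

```r
# Sketch of the Glare rule described above (function name is hypothetical)
glare_flag <- function(horiz.sun, vert.sun) {
  out <- (horiz.sun %in% c(11, 12, 1) & vert.sun %in% c(2, 3)) |
    (horiz.sun %in% 12 & vert.sun %in% 1)
  # Glare is undefined when either sun position is missing
  out[is.na(horiz.sun) | is.na(vert.sun)] <- NA
  out
}

glare <- glare_flag(
  horiz.sun = c(12, 12, 1, 6, NA),
  vert.sun  = c(1, 3, 1, 2, 2)
)
glare
```

Only the first two pairs meet the rule; the third fails because a 'VertSun' of 1 counts only when 'HorizSun' is 12.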
A das_df object, which is also a data frame. It consists of the input data frame, i.e. the output of das_read, with the following columns added:
State/condition | Column name | Data source |
On/off effort | OnEffort | B/R and E events |
Cruise number | Cruise | Event: B; Column: Data1 |
Effort mode | Mode | Event: B; Column: Data2 |
GMT offset of DateTime data | OffsetGMT | Event: B; Column: Data3 |
Effort type | EffType | Event: R; Column: Data1 |
Number of sides with observer | ESWsides | Event: R; Column: Data2 |
Course (ship direction) | Course | Event: N; Column: Data1 |
Speed (ship speed, knots) | SpdKt | Event: N; Column: Data2 |
Beaufort sea state | Bft | Event: V; Column: Data1 |
Swell height (ft) | SwellHght | Event: V; Column: Data2 |
Wind speed (knots) | WindSpdKt | Event: V; Column: Data5 |
Rain/fog/haze code | RainFog | Event: W; Column: Data1 |
Horizontal sun (clock system) | HorizSun | Event: W; Column: Data2 |
Vertical sun (clock system) | VertSun | Event: W; Column: Data3 |
Glare | Glare | HorizSun and VertSun |
Visibility (nm) | Vis | Event: W; Column: Data5 |
Left observer | ObsL | Event: P; Column: Data1 |
Data recorder | Rec | Event: P; Column: Data2 |
Right observer | ObsR | Event: P; Column: Data3 |
Independent observer | ObsInd | Event: P; Column: Data4 |
OffsetGMT represents the difference in hours between the DateTime data (which should be in local time) and GMT (i.e., UTC).
Internal warning messages are printed with row numbers of the input file (NOT of the output data frame) of unexpected event codes and r events, as well as if there are potential issues with the number and/or order of R and E events
y <- system.file("das_sample.das", package = "swfscDAS")
das_process(y)

y.read <- das_read(y)
das_process(y.read)
das_process(y.read, reset.effort = FALSE)
Read one or more fixed-width DAS text file(s) generated by WinCruz into a data frame, where each line is data for a specific event
das_read(file, skip = 0, ...)
file |
filename(s) of one or more DAS files |
skip |
integer; see |
... |
ignored |
Reads/parses DAS data into columns of a data frame. If file contains multiple filenames, then the individual data frames will be concatenated.
The provided DAS file must adhere to the following column number and format specifications:
Item | Columns | Format |
Event number | 1-3 | |
Event | 4 | |
Effort dot | 5 | |
Time | 6-11 | HHMMSS or HHMM |
Date | 13-18 | MMDDYY |
Latitude | 20-28 | NDD:MM.MM |
Longitude | 30-39 | WDDD:MM.MM |
Data1 | 40-44 | |
Data2 | 45-49 | |
Data3 | 50-54 | |
Data4 | 55-59 | |
Data5 | 60-64 | |
Data6 | 65-69 | |
Data7 | 70-74 | |
Data8 | 75-79 | |
Data9 | 80-84 | |
Data10 | 85-89 | |
Data11 | 90-94 | |
Data12 | 95+ | |
See das_format_pdf for more information about DAS format requirements, and note that 'Data#' columns may be referred to as 'Field#' columns in other documentation.
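To make the column positions in the table above concrete, the following base R sketch builds one made-up DAS line and slices it by position. The line contents are hypothetical, and das_read's own parser may work differently; this only illustrates the fixed-width layout.

```r
# Build a hypothetical DAS line so that each field lands in its documented columns
das.line <- paste0(
  "  1", "B", ".", "071500", " ", "010505", " ",
  "N32:42.50", " ", "W117:14.20",
  sprintf("%5s", "1001"), sprintf("%5s", "C"), sprintf("%5s", "-8")
)

# Start and end positions taken from the format table
starts <- c(EventNum = 1, Event = 4, EffortDot = 5, Time = 6, Date = 13,
            Lat = 20, Lon = 30, Data1 = 40, Data2 = 45, Data3 = 50)
ends <- c(3, 4, 5, 11, 18, 28, 39, 44, 49, 54)

# Slice the line by position and trim padding
fields <- trimws(substring(das.line, starts, ends))
names(fields) <- names(starts)
fields
```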
A das_dfr object, which is also a data frame, with DAS data read into columns. The data are read into the data frame as characters as described in 'Details', with the following exceptions:
Name | Class | Details |
EffortDot | logical | TRUE if "." was present, and FALSE otherwise |
DateTime | POSIXct | combination of 'Date' and 'Time' columns |
Lat | numeric | 'Latitude' column converted to decimal degrees in range [-90, 90] |
Lon | numeric | 'Longitude' column converted to decimal degrees in range [-180, 180] |
Data# | character | leading/trailing whitespace trimmed for non-comment events (i.e. where 'Event' is not "C") |
EventNum | character | leading/trailing whitespace trimmed; left as character for some project-specific codes |
file_das | character | base filename, extracted from the file argument |
line_num | integer | line number of each data row |
DateTime values have a (meaningless) time zone value of "UTC". See the OffsetGMT column from das_process for relevant time zone information.
Warnings are printed if any unexpected events have NA DateTime/Lat/Lon values, or if any Lat/Lon values cannot be converted to numeric values. Events that are 'expected' to have NA DateTime/Lat/Lon values are: C, ?, 1, 2, 3, 4, 5, 6, 7, 8
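The conversion of 'Latitude' and 'Longitude' strings to decimal degrees can be sketched as follows. The helper name dm_to_decimal is hypothetical, and treating "S" and "W" as negative is an assumption consistent with the stated output ranges; das_read may implement the conversion differently.

```r
# Hypothetical helper: convert 'NDD:MM.MM' / 'WDDD:MM.MM' strings to decimal degrees
dm_to_decimal <- function(x) {
  hemi <- substr(x, 1, 1)
  parts <- strsplit(substring(x, 2), ":", fixed = TRUE)
  deg <- as.numeric(vapply(parts, `[`, character(1), 1))
  minutes <- as.numeric(vapply(parts, `[`, character(1), 2))
  # Assumption: southern and western hemispheres are negative
  sgn <- ifelse(hemi %in% c("S", "W"), -1, 1)
  sgn * (deg + minutes / 60)
}

ll <- dm_to_decimal(c("N32:42.50", "W117:14.20"))
ll
```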
y <- system.file("das_sample.das", package = "swfscDAS")
das_read(y)
Summarize DAS effort data by effort segment, while averaging or getting the max for each condition
das_segdata(x, ...)

## S3 method for class 'data.frame'
das_segdata(x, ...)

## S3 method for class 'das_df'
das_segdata(
  x,
  conditions,
  segdata.method = c("avg", "maxdist"),
  seg.lengths,
  section.id,
  ...
)
x |
an object of class |
... |
ignored |
conditions |
see |
segdata.method |
character; either "avg" or "maxdist". See Details section for more information |
seg.lengths |
numeric; length of the modeling segments
into which |
section.id |
numeric; the ID of |
WARNING - do not call this function directly! It is exported for documentation purposes, but is intended for internal package use only.
This function was designed to be called by one of the das_chop_ functions, e.g. das_chop_equallength, and thus users should avoid calling it themselves. It loops through the events in x, chopping x into modeling segments while calculating and storing relevant information for each segment. Because x is a continuous effort section, it must begin with a "B" or "R" event and end with the corresponding "E" event.
For each segment, this function reports the segment number, segment ID, cruise number, the start/end/mid coordinates (lat/lon), start/end/mid date/times (DateTime), segment length, year, month, day, midpoint time, mode, effort type, effective strip width sides (number of sides searched), and average conditions (which are specified by conditions). The segment ID consists of section.id and the index of the modeling segment, joined by an underscore. Thus, if section.id is 1, then the segment ID for the second segment from x is "1_2". The start/end coordinates and date/times are interpolated as needed, e.g. when using the 'equallength' method.
When segdata.method is "avg", the condition values are calculated as a weighted average by distance. The reported value for logical columns (e.g. Glare) is the percentage (in decimals) of the segment in which that condition was TRUE. For character columns, the reported value for each segment is the unique value(s) present in the segment, with NAs omitted, pasted together via paste(..., collapse = "; ").
When segdata.method is "maxdist", the reported values are, for each condition, the value recorded for the longest distance during that segment (with NAs omitted).
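The two summary methods can be compared on a small made-up segment. The data frame below is hypothetical and the code is a base R sketch of the described behavior, not the package's implementation.

```r
# Hypothetical stretches of effort within one modeling segment
seg <- data.frame(
  dist.km = c(2, 1, 1),               # distance covered under each set of conditions
  Bft     = c(3, 4, 5),               # numeric condition
  Glare   = c(TRUE, FALSE, FALSE),    # logical condition
  ObsL    = c("125", "125", "007"),   # character condition
  stringsAsFactors = FALSE
)

# "avg": distance-weighted average; for logicals this is the proportion TRUE
bft.avg   <- weighted.mean(seg$Bft, seg$dist.km)
glare.avg <- weighted.mean(seg$Glare, seg$dist.km)

# "avg" for character columns: unique non-NA values pasted together
obsl.avg <- paste(unique(seg$ObsL[!is.na(seg$ObsL)]), collapse = "; ")

# "maxdist": the value that covers the greatest total distance
bft.dist    <- tapply(seg$dist.km, seg$Bft, sum)
bft.maxdist <- as.numeric(names(bft.dist)[which.max(bft.dist)])
```

Here the weighted average Beaufort is 3.75, while "maxdist" reports 3 because that value applied over the longest distance.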
Cruise number, mode, effort type, sides searched, and file name are also included in the segdata output. These values (excluding NAs) must be consistent across the entire effort section, and thus across all segments in x; a warning is printed if there are any inconsistencies.
bearing and destination are used to calculate the segment start, mid, and end points, with method = "vincenty".
Data frame with the segdata information described in Details and in das_effort
Extract sightings and associated information from processed DAS data
das_sight(x, ...)

## S3 method for class 'data.frame'
das_sight(x, ...)

## S3 method for class 'das_df'
das_sight(
  x,
  return.format = c("default", "wide", "complete"),
  return.events = c("S", "K", "M", "G", "s", "k", "m", "g", "t", "p", "F"),
  ...
)
x |
an object of class |
... |
ignored |
return.format |
character; can be one of "default", "wide", "complete", or any partial match thereof (case sensitive). Formats described below |
return.events |
character; event codes included in the output. Must be one or more of: "S", "K", "M", "G", "s", "k", "m", "g", "t", "p", "F" (case-sensitive). The default is all of these event codes |
DAS events contain specific information in the 'Data#' columns, with the information depending on the event code for that row. The output data frame contains columns with this specific information extracted to dedicated columns as described below. This function recognizes the following types of sightings: marine mammal sightings (event codes "S", "K", or "M"), marine mammal resights (codes "s", "k", "m"), marine mammal subgroup sightings (code "G"), marine mammal subgroup resights (code "g"), turtle sightings (code "t"), pinniped sightings (code "p"), and fishing vessel sightings (code "F"). Warnings are printed unless every S, K, M, and G event (and only these events) is followed by an A event and at least one numeric event. See das_format_pdf for more information about events and event formats.
Of specific note - sperm whale sightings (species code 046) often contain additional estimates recorded as "C" events immediately following the S, A, and numeric events. Because these estimates are recorded as "C" events, they are NOT included in the das_sight calculations or output for any return.format.
The return.events argument simply provides a shortcut for filtering the output of das_sight by event codes
Abbreviations used in output column names: Gs = group size, Sp = species, Nm = nautical mile, Perc = percentage, Prob = probable, GsSchool = school-level group size info
This function makes the following assumptions, and alterations to the raw DAS data:
"A" events immediately following an S/K/M/G event have the same sighting number (Data1 value) as the S/K/M/G event
The 'nSp' column is equivalent to the number of non-NA values across the 'Data5', 'Data6', 'Data7', and 'Data8' columns for the pertinent "A" event
The following data are coerced to a numeric using as.numeric: Bearing, Reticle, DistNm, Cue, Method, species percentages, and group sizes (including for t, p, and F events). Note that if there are any formatting errors and these data are not numeric, the function will likely print a warning message
The values for the following columns are capitalized using toupper: 'Birds', 'Photos', 'CalibSchool', 'PhotosAerial', 'Biopsy', 'TurtleAge', and 'TurtleCapt'
Data frame with 1) the columns from x, excluding the 'Data#' columns, and 2) columns with sighting information extracted from 'Data#' columns. See das_format_pdf for more information about the sighting information. If return.format is "default", then there is one row for each species of each sighting event; if return.format is "wide", then there is one row for each sighting event; if return.format is "complete", then there is one row for every group size estimate for each sighting event (excluding sperm whale "C" events - see the Details section).
The format-specific columns are described in their respective sections. The following sighting information columns are included in all return formats:
Sighting information | Column name | Notes |
Sighting number | SightNo | Character |
Subgroup code | Subgroup | Character |
Daily sighting number | SightNoDaily | See below |
Observer that made the sighting | Obs | |
Standard observer | ObsStd | Logical; TRUE if Obs is one of ObsL, Rec or ObsR, and FALSE otherwise |
Bearing to the sighting | Bearing | Numeric; degrees, expected range 0 to 360 |
Number of reticle marks | Reticle | Numeric |
Distance (nautical miles) | DistNm | Numeric |
Sighting cue | Cue | |
Sighting method | Method | |
Photos of school? | Photos | |
Birds present with school? | Birds | |
Calibration school? | CalibSchool | |
Aerial photos taken? | PhotosAerial | |
Biopsy taken? | Biopsy | |
Probable sighting | Prob | Logical indicating if sighting has associated ? event; NA for non-S/K/M/G events |
Number of species in sighting | nSp | NA for non-S/K/M/G events |
Mixed species sighting | Mixed | Logical; TRUE if nSp > 1 |
Group size of school - best estimate | GsSchoolBest | See below |
Group size of school - high estimate | GsSchoolHigh | See below |
Group size of school - low estimate | GsSchoolLow | See below |
Course (true heading) of school at resight | CourseSchool | NA for non-s/k/m events |
Presence of associated JFR | TurtleJFR | NA for non-"t" events; JFR = jellyfish, floating debris, or red tide |
Estimated turtle maturity | TurtleAge | NA for non-"t" events |
Perpendicular distance (km) to sighting | PerpDistKm | Calculated via (abs(sin(Bearing*pi/180) * DistNm) * 1.852) |
SightNoDaily is a running count of the number of S/K/M/G sightings that occurred on each day. It is formatted as 'YYYYMMDD'_'running count', e.g. "20050101_1".
The GsSchoolBest, GsSchoolHigh, and GsSchoolLow columns are either:
1) the arithmetic mean across observer estimates, for the "default" and "wide" formats, or
2) the individual observer estimates, for the "complete" format.
Note that for non-"complete" formats, na.rm = TRUE is used when calculating the mean, and thus blank elements of estimates (but not the whole incomplete estimate) are ignored.
To convert the perpendicular distance back to nautical miles, one would divide PerpDistKm by 1.852
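The PerpDistKm formula from the table above can be applied directly in base R. The bearing and distance values below are made up for illustration.

```r
# Hypothetical sightings: bearing in degrees, radial distance in nautical miles
bearing <- c(30, 90, 330)
dist.nm <- c(2, 1.5, 2)

# Perpendicular distance in km, using the formula from the table above
perp.km <- abs(sin(bearing * pi / 180) * dist.nm) * 1.852

# Convert back to nautical miles
perp.nm <- perp.km / 1.852
```

Bearings of 30 and 330 degrees give the same perpendicular distance, since the absolute value removes the side of the trackline.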
This output data frame contains 'long' sighting data, meaning there is one row for each species of each sighting event. The GsSp... columns are calculated as follows: for each species and for each observer estimate, the best/high/low school size estimate is multiplied by the applicable species percent estimate. The values are grouped by species and then averaged to get single GsSpBest, GsSpHigh, and GsSpLow values for each species (using mean with na.rm = TRUE).
Sighting information columns/formats present specifically in the "default" format output:
Sighting information | Column name | Notes |
Species code | SpCode | Boat type or mammal, turtle, or pinniped species codes |
Probable species code | SpCodeProb | Probable mammal species codes; NA if none or not applicable |
Group size of species - best estimate | GsSpBest | The product of the arithmetic means of GsSchoolBest and the corresponding species percentage |
Group size of species - high estimate | GsSpHigh | The product of the arithmetic means of GsSchoolHigh and the corresponding species percentage |
Group size of species - low estimate | GsSpLow | The product of the arithmetic means of GsSchoolLow and the corresponding species percentage |
Note that for the above calculations, the GsSchoolX value and corresponding species percentages were each averaged across observers, using na.rm = TRUE, before being multiplied to calculate GsSpX. For example, in the workflow:
GsSpBest1 = mean(.data$Data2, na.rm = TRUE) * mean(.data$Data5, na.rm = TRUE)
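A small numeric sketch shows why the order of averaging and multiplying matters. The observer estimates below are made up, and the species share is assumed here to be stored as a proportion; the units used by the package are not stated in this section.

```r
# Hypothetical estimates from three observers for one sighting
school.best <- c(100, 120, NA)   # best school size estimates
sp.perc     <- c(0.6, 0.5, 0.7)  # share of species 1 (assumed to be a proportion)

# Product of the means, as in the workflow line above
gs.sp.best <- mean(school.best, na.rm = TRUE) * mean(sp.perc, na.rm = TRUE)

# For comparison only: mean of the per-observer products
gs.sp.alt <- mean(school.best * sp.perc, na.rm = TRUE)
```

With these values the product of the means is 66, while the mean of the per-observer products is 60, so the two approaches generally differ when estimates are missing or vary between observers.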
The "wide" and "complete" options have very similar columns in their output data frames. There are two main differences: 1) the "wide" format has one row for each sighting event, while the "complete" format has a row for every observer estimate for each sighting, and thus 2) in the "wide" format, all numeric information for which there are multiple observer estimates (school group size, species percentage, etc.) is averaged across estimates via an arithmetic mean (using mean with na.rm = TRUE).
With these formats, note that the species/type code and group size for turtle, pinniped, and boat sightings are in their own column
Sighting information columns present in the "wide" and "complete" format outputs:
Sighting information | Column name | Notes |
Observer code - estimate | ObsEstimate | See below |
Species 1 code | SpCode1 | |
Species 2 code | SpCode2 | |
Species 3 code | SpCode3 | |
Species 4 code | SpCode4 | |
Species 1 probable code | SpCodeProb1 | Extracted from '?' event |
Species 2 probable code | SpCodeProb2 | Extracted from '?' event |
Species 3 probable code | SpCodeProb3 | Extracted from '?' event |
Species 4 probable code | SpCodeProb4 | Extracted from '?' event |
Percentage of Sp 1 in school | SpPerc1 | |
Percentage of Sp 2 in school | SpPerc2 | |
Percentage of Sp 3 in school | SpPerc3 | |
Percentage of Sp 4 in school | SpPerc4 | |
Group size of species 1 | GsSpBest1 | Present in "wide" output only; see below |
Group size of species 2 | GsSpBest2 | Present in "wide" output only; see below |
Group size of species 3 | GsSpBest3 | Present in "wide" output only; see below |
Group size of species 4 | GsSpBest4 | Present in "wide" output only; see below |
Turtle species | TurtleSp | NA for non-"t" events |
Turtle group size | TurtleGs | NA for non-"t" events |
Was turtle captured? | TurtleCapt | NA for non-"t" events |
Pinniped species | PinnipedSp | NA for non-"p" events |
Pinniped group size | PinnipedGs | NA for non-"p" events |
Boat or gear type | BoatType | NA for non-"F" events |
Number of boats | BoatGs | NA for non-"F" events |
ObsEstimate refers to the code of the observer that made the corresponding estimate. For the "wide" format, ObsEstimate is a list-column of all of the observer codes that provided an estimate. Also in the "wide" format, the GsSpBest# columns are the product of the means of GsSchoolBest and the corresponding species percentage (see the Default section for calculation details). These numbers, 1 to 4, correspond to the order of the data as it appears in the DAS file
y <- system.file("das_sample.das", package = "swfscDAS")
y.proc <- das_process(y)

das_sight(y.proc)
das_sight(y.proc, return.format = "complete")
Calculate the great-circle distance between two lat/lon points
distance_greatcircle(lat1, lon1, lat2, lon2)
lat1 |
numeric; starting latitude coordinate(s) |
lon1 |
numeric; starting longitude coordinate(s) |
lat2 |
numeric; ending latitude coordinate(s) |
lon2 |
numeric; ending longitude coordinate(s) |
Distance in kilometers between lat1/lon1 and lat2/lon2
https://en.wikipedia.org/wiki/Great-circle_distance
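For readers who want to see the kind of calculation involved, here is a haversine-based sketch of a great-circle distance in kilometers. The function name gc_distance_km and the mean Earth radius of 6371 km are assumptions; distance_greatcircle may use a different formula or radius, so results can differ slightly.

```r
# Sketch of a haversine great-circle distance (name and radius are assumptions)
gc_distance_km <- function(lat1, lon1, lat2, lon2, radius.km = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  # pmin guards against rounding slightly above 1 before asin
  2 * radius.km * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator
d.km <- gc_distance_km(0, 0, 0, 1)
```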
Convert randpicks file from segchopr format to swfscDAS format
randpicks_convert(x.randpicks, x.segdata, seg.km)
x.randpicks |
Data frame with two columns;
randpick values formatted for segchopr that correspond to |
x.segdata |
Data frame; segdata that corresponds to |
seg.km |
numeric; target segment length used when creating |
Past DAS processing code (segchopr) only recorded the generated random values,
whereas swfscDAS randpicks files contain one line for each continuous effort section.
See das_chop_equallength for more details about the swfscDAS randpicks format.
This function 'converts' a randpicks data frame generated by segchopr
to a data frame that meets the swfscDAS randpicks format requirements
Data frame with one line for each continuous effort section in x.segdata, and two columns: effort_section and randpicks
Subsetting das_dfr or das_df objects
## S3 method for class 'das_dfr'
x[i, j, ..., drop = TRUE]

## S3 replacement method for class 'das_dfr'
x$name <- value

## S3 replacement method for class 'das_dfr'
x[i, j, ...] <- value

## S3 replacement method for class 'das_dfr'
x[[i]] <- value

## S3 method for class 'das_df'
x[i, j, ..., drop = TRUE]

## S3 replacement method for class 'das_df'
x$name <- value

## S3 replacement method for class 'das_df'
x[i, j, ...] <- value

## S3 replacement method for class 'das_df'
x[[i]] <- value
x |
object of class |
i, j, ... |
elements to extract or replace, see |
drop |
logical, see |
name |
A literal character string or ..., see |
value |
A suitable replacement value, see |
When subsetting a das_dfr
or das_df
object, henceforth a das_
object,
using any of the functions described in [.data.frame
,
then then the das_
class is simply dropped and the object is of class data.frame
.
This is because of the strict format requirements of das_
objects;
it is likely that a subsetted das_
object will not have
the format required by subsequent swfscDAS functions,
and thus it is safest to drop the das_
class.
If a data frame is passed to downstream swfscDAS functions that require a das_ object,
those functions will attempt to coerce the object to the necessary das_ class.
See as_das_dfr and as_das_df for more details.
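The drop-then-coerce behavior described above can be sketched in base R. This is only an illustration under stated assumptions: the class demo_df, the function as_demo_df, and the required columns Event, Lat, and Lon are all hypothetical and are not part of swfscDAS, whose actual methods and format checks may differ.

```r
# Hypothetical strict class "demo_df" (NOT part of swfscDAS), for illustration only
required.cols <- c("Event", "Lat", "Lon")

as_demo_df <- function(x) UseMethod("as_demo_df")

# Already the right class: return unchanged
as_demo_df.demo_df <- function(x) x

# Plain data frame: check the (hypothetical) format, then add the class
as_demo_df.data.frame <- function(x) {
  missing.cols <- setdiff(required.cols, names(x))
  if (length(missing.cols) > 0) {
    stop("x is missing required column(s): ",
         paste(missing.cols, collapse = ", "))
  }
  class(x) <- c("demo_df", class(x))
  x
}

# Subsetting drops the strict class, then delegates to the data.frame method
`[.demo_df` <- function(x, ...) {
  class(x) <- setdiff(class(x), "demo_df")
  x[...]
}

obj <- as_demo_df(data.frame(
  Event = c("B", "R", "E"),
  Lat = c(32.1, 32.2, 32.3),
  Lon = c(-117.1, -117.2, -117.3)
))

# Row subset: the strict class is gone, but the format is still valid
sub <- obj[1:2, ]
class(sub)

# Coercing the subset back succeeds because all required columns remain
restored <- as_demo_df(sub)
class(restored)

# Dropping a required column makes coercion fail with an informative error
bad <- tryCatch(as_demo_df(obj[, c("Lat", "Lon")]), error = function(e) e)
conditionMessage(bad)
```

The design choice this illustrates is that the format check runs once, at coercion time, so a subset can never carry the strict class without having passed that check.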
library(swfscDAS)

y <- system.file("das_sample.das", package = "swfscDAS")
y.read <- das_read(y)

# All return a data frame:
class(y.read[1:10, ])
class(y.read[, 1:10])

y.df <- y.read
y.df[, 1] <- "a"
class(y.df)

y.df <- y.read
y.df$Event <- "a"
class(y.df)

y.df <- y.read
y.df[["Event"]] <- "a"
class(y.df)
These functions are exported only to be used internally by swfscAirDAS. They implement functionality that is used when processing both DAS and AirDAS data.
.chop_condition_eff(i, call.x, call.conditions, call.seg.min.km, call.func1)

.chop_equallength_eff(
  i, call.x, call.conditions, call.seg.km, call.r.pos, call.func1
)

.process_num(init.val, das.df, col.name, event.curr, event.na)

.process_chr(init.val, das.df, col.name, event.curr, event.na)

.segdata_proc(
  das.df, conditions, segdata.method, seg.lengths, section.id, df.out1
)

.segdata_aggr(data.list, curr.df, idx, dist.perc)

.dist_from_prev(
  z,
  z.distance.method = c("greatcircle", "lawofcosines", "haversine", "vincenty")
)
i |
ignore |
call.x |
ignore |
call.conditions |
ignore |
call.seg.min.km |
ignore |
call.func1 |
ignore |
call.seg.km |
ignore |
call.r.pos |
ignore |
init.val |
ignore |
das.df |
ignore |
col.name |
ignore |
event.curr |
ignore |
event.na |
ignore |
conditions |
ignore |
segdata.method |
ignore |
seg.lengths |
ignore |
section.id |
ignore |
df.out1 |
ignore |
data.list |
ignore |
curr.df |
ignore |
idx |
ignore |
dist.perc |
ignore |
z |
ignore |
z.distance.method |
ignore |