.. _scardac:

#######
scardac
#######
**Waveform archive data availability collector.**

Description
===========

scardac scans an :term:`SDS waveform archive <SDS>`, e.g., created by
:ref:`slarchive` or :ref:`scart`, for available :term:`miniSEED <miniSeed>`
data. It collects information about:

* ``DataExtents`` -- the earliest and latest times data is available
  for a particular channel,
* ``DataAttributeExtents`` -- the earliest and latest times data is available
  for a particular channel, quality and sampling rate combination,
* ``DataSegments`` -- continuous data segments sharing the same quality and
  sampling rate attributes.

scardac is intended to be executed periodically, e.g., as a cronjob.

The availability information is stored in the SeisComP database under the
root element :ref:`DataAvailability <api-datamodel-python>`. Access to the
availability data is provided by the :ref:`fdsnws` module via the services:

* :ref:`/fdsnws/station <sec-station>` (extent information only, see the
  ``matchtimeseries`` and ``includeavailability`` request parameters),
* :ref:`/fdsnws/ext/availability <sec-avail>` (extent and segment information
  provided in different formats).


.. _scardac_non-sds:

Non-SDS archives
----------------

scardac can be extended by plugins to scan non-SDS archives. For example, the
``daccaps`` plugin provided by :cite:t:`caps` allows scanning archives
generated by a CAPS server. Plugins are added to the global module
configuration, e.g.:

.. code-block:: properties

   plugins = ${plugins}, daccaps

.. _scardac_workflow:

Definitions
-----------

* ``Record`` -- continuous waveform data of the same sampling rate and
  quality, bound by a start and an end time. scardac only reads the record's
  meta data, not the actual samples.
* ``Chunk`` -- container for records, e.g., a :term:`miniSEED <miniSeed>`
  file, with the following properties:

  - overall, theoretical time range of records it may contain
  - contains at least one record, otherwise it must be absent
  - each record of a chunk must fulfill the following conditions:

    - `chunk start <= record start < chunk end`
    - `chunk start < record end < next chunk end`

  - chunks do not overlap; the end time of the current chunk equals the start
    time of the successive chunk, otherwise a ``chunk gap`` is declared
  - records may occur unordered within a chunk or across chunk boundaries,
    resulting in `DataSegments` marked as ``outOfOrder``
* ``Jitter`` -- maximum allowed deviation between the end time of the current
  record and the start time of the next record, expressed in multiples of the
  current record's sample time. E.g., at a sampling rate of 100 Hz a jitter
  of 0.5 allows a maximum end-to-start time difference of 5 ms. If it is
  exceeded, a new `DataSegment` is created.
* ``Mtime`` -- time the content of a chunk was last modified. It is used to

  - decide whether a chunk needs to be read in a subsequent application run
  - calculate the ``updated`` time stamp of a `DataSegment`,
    `DataAttributeExtent` and `DataExtent`
* ``Scan window`` -- time window limiting the synchronization of the archive
  with the database, configured via :confval:`filter.time.start` and
  :confval:`filter.time.end` or the options :option:`--start` and
  :option:`--end`. The scan window is useful to

  - reduce the scan time of larger archives. Depending on the size and
    storage type of the archive it may take some time just to list the
    available chunks and their mtime.
  - prevent deletion of availability information even though parts of the
    archive have been deleted or moved to a different location.
* ``Modification window`` -- the mtime of a chunk is compared with this time
  window to decide whether it needs to be read or not. It is configured via
  :confval:`mtime.start` and :confval:`mtime.end` or the options
  :option:`--modified-since` and :option:`--modified-until`. If no lower
  bound is defined, the ``lastScan`` time stored in the `DataExtent` is used
  instead. The mtime check may be disabled using :confval:`mtime.ignore` or
  :option:`--deep-scan`.

  **Note:** Chunks right before or after a ``chunk gap`` are read in any
  case, regardless of the mtime settings.

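The jitter rule above can be sketched in Python. This is an illustrative
sketch only, not scardac's actual implementation; the record representation
and the function name are hypothetical:

```python
# Illustrative sketch of the jitter rule; NOT the actual scardac code.
# A record is modeled as a (start, end, sampling_rate) tuple, times in seconds.

def split_into_segments(records, jitter=0.5):
    """Group time-sorted records into continuous segments.

    A new segment starts when the sampling rate changes or when the
    difference between the end of the previous record and the start of the
    next one exceeds jitter * sample time of the previous record.
    """
    segments = []
    current = None   # [segment_start, segment_end]
    prev_rate = None
    for start, end, rate in records:
        if current is not None:
            tolerance = jitter / prev_rate  # jitter in multiples of sample time
            if rate != prev_rate or abs(start - current[1]) > tolerance:
                segments.append(tuple(current))
                current = None
        if current is None:
            current = [start, end]
        else:
            current[1] = end
        prev_rate = rate
    if current is not None:
        segments.append(tuple(current))
    return segments
```

A quality change would split a segment in the same way; it is omitted here
for brevity.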
Workflow
--------

#. Read the existing `DataExtents` from the database.
#. Collect a list of available stream IDs, either by

   * scanning the archive for available IDs or
   * reading an ID file defined by :confval:`nslcFile`.

#. Identify the extents to add, update or remove, respecting the
   `scan window`, :confval:`filter.nslc.include` and
   :confval:`filter.nslc.exclude`.
#. Subsequently process the `DataExtents` using :confval:`threads` parallel
   threads. For each `DataExtent`:

   #. Collect all available chunks within the `scan window`.
   #. If the `DataExtent` is new (no database entry yet), store a new and
      empty `DataExtent` to the database; else query the existing
      `DataSegments` from the database:

      * count the segments outside the `scan window`
      * create a database iterator for the segments within the `scan window`

   #. Create two in-memory segment lists which collect the segments to
      remove and the segments to add or update.
   #. For each chunk:

      * determine the `chunk window` and `mtime`
      * decide whether the chunk needs to be read depending on the `mtime`
        and a possible `chunk gap`. If necessary, read the chunk and

        - create chunk segments by analyzing the chunk records for
          gaps/overlaps defined by :confval:`jitter`, sampling rate or
          quality changes,
        - merge the chunk segments with the database segments and update the
          in-memory segment lists.

        If not necessary, advance the database segment iterator to the end
        of the chunk window.

   #. Remove and then add/update the collected segments.
   #. Merge the segment information into `DataAttributeExtents`.
   #. Merge the `DataAttributeExtents` into the overall `DataExtent`.

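The per-chunk read decision described above can be sketched as follows.
This is a hypothetical helper, simplified from the workflow description;
the argument names and defaults are assumptions, not scardac's API:

```python
# Illustrative sketch of the chunk-read decision; NOT the actual scardac code.
# Times are plain numbers here (e.g. Unix timestamps) for simplicity.

def needs_read(chunk, last_scan, mtime_start=None, mtime_end=None,
               ignore_mtime=False, adjacent_to_gap=False):
    """Decide whether a chunk must be (re)read.

    Chunks adjacent to a chunk gap are always read. Otherwise the chunk's
    mtime is compared against the modification window; if no lower bound is
    configured, the extent's lastScan time is used instead.
    """
    if adjacent_to_gap or ignore_mtime:
        return True
    lower = mtime_start if mtime_start is not None else last_scan
    if lower is not None and chunk["mtime"] <= lower:
        return False  # unchanged since the last run / window start
    if mtime_end is not None and chunk["mtime"] > mtime_end:
        return False  # modified after the window end
    return True
```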
Examples
--------

#. Get command line help or execute scardac with default parameters and
   informative debug output:

   .. code-block:: sh

      scardac -h
      scardac --debug

#. Synchronize the availability of waveform data files existing in the
   standard :term:`SDS` archive with the SeisComP database and create an XML
   file using :ref:`scxmldump`:

   .. code-block:: sh

      scardac -d mysql://sysop:sysop@localhost/seiscomp -a $SEISCOMP_ROOT/var/lib/archive --debug
      scxmldump -Yf -d mysql://sysop:sysop@localhost/seiscomp -o availability.xml

#. Synchronize the availability of waveform data files existing in the
   standard :term:`SDS` archive with the SeisComP database. Use :ref:`fdsnws`
   to fetch a flat file containing a list of periods of available data from
   stations of the CX network sharing the same quality and sampling rate
   attributes:

   .. code-block:: sh

      scardac -d mysql://sysop:sysop@localhost/seiscomp -a $SEISCOMP_ROOT/var/lib/archive
      wget -O availability.txt 'http://localhost:8080/fdsnws/ext/availability/1/query?network=CX'

   .. note::

      The |scname| module :ref:`fdsnws` must be running for executing this
      example.

.. _scardac_configuration:

Module Configuration
====================
| :file:`etc/defaults/global.cfg`
| :file:`etc/defaults/scardac.cfg`
| :file:`etc/global.cfg`
| :file:`etc/scardac.cfg`
| :file:`~/.seiscomp/global.cfg`
| :file:`~/.seiscomp/scardac.cfg`

scardac inherits :ref:`global options <global-configuration>`.

.. confval:: archive

   Default: ``@SEISCOMP_ROOT@/var/lib/archive``

   Type: *string*

   The URL to the waveform archive where all data is stored.

   Format: ``[service://]location[#type]``

   "service": The type of the archive. If not given, "sds://" is implied,
   assuming an SDS archive. The SDS archive structure is defined as
   ``YEAR/NET/STA/CHA.TYPE/NET.STA.LOC.CHA.TYPE.YEAR.DAYOFYEAR``, e.g.,
   ``2018/GE/APE/BHZ.D/GE.APE..BHZ.D.2018.125``.

   Other archive types may be considered by plugins.

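For illustration, the documented SDS layout can be reproduced with a small
helper. This is a sketch only; ``sds_path`` is not part of scardac or the
SeisComP API:

```python
# Sketch reproducing the documented SDS path layout; illustrative only.
from datetime import date

def sds_path(net, sta, loc, cha, day, data_type="D"):
    """Build YEAR/NET/STA/CHA.TYPE/NET.STA.LOC.CHA.TYPE.YEAR.DAYOFYEAR."""
    year = day.strftime("%Y")
    doy = day.strftime("%j")  # day of year, zero-padded to 3 digits
    return (f"{year}/{net}/{sta}/{cha}.{data_type}/"
            f"{net}.{sta}.{loc}.{cha}.{data_type}.{year}.{doy}")
```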
.. confval:: threads

   Default: ``1``

   Type: *int*

   Number of threads scanning the archive in parallel.

.. confval:: jitter

   Default: ``0.5``

   Type: *float*

   Acceptable deviation of the end time and the start time of successive
   records, in multiples of the sample time.

.. confval:: maxSegments

   Default: ``1000000``

   Type: *int*

   Maximum number of segments per stream. If the limit is reached, no more
   segments are added to the database and the corresponding extent is
   flagged as too fragmented. Set this parameter to 0 to disable any limit.

.. confval:: nslcFile

   Type: *string*

   Line-based text file of the form NET.STA.LOC.CHA defining the available
   stream IDs. Depending on the archive type, size and storage media used,
   this file may offer a significant performance improvement compared to
   collecting the available streams on each startup. Filters defined under
   `filter.nslc` still apply.

.. note::
   **filter.\***
   *Parameters of this section limit the data processing to either*

   * *reduce the scan time of larger archives or to*
   * *prevent deletion of availability information even though parts of the
     archive have been deleted or moved to a different location.*

.. note::
   **filter.time.\***
   *Limit the processing by record time.*

.. confval:: filter.time.start

   Type: *string*

   Start of data availability check given as date string or
   as number of days before now.

.. confval:: filter.time.end

   Type: *string*

   End of data availability check given as date string or
   as number of days before now.

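Both time parameters accept either form. A hypothetical parser might
distinguish them as below; the absolute date format assumed here,
``YYYY-MM-DD hh:mm:ss``, is an illustration only, and scardac's real parser
may accept other formats:

```python
# Hypothetical helper showing the two accepted value forms; scardac's real
# parser and its accepted date formats may differ.
from datetime import datetime, timedelta

def parse_time_value(value, now=None):
    """Interpret value as a number of days before now, or else as an
    absolute date string (assumed here to be YYYY-MM-DD hh:mm:ss)."""
    now = now or datetime.now()
    try:
        days = float(value)          # e.g. "30" -> 30 days before now
    except ValueError:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return now - timedelta(days=days)
```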
.. note::
   **filter.nslc.\***
   *Limit the processing by stream IDs.*

.. confval:: filter.nslc.include

   Type: *list:string*

   Comma-separated list of stream IDs to process. If empty, all streams are
   accepted unless an exclude filter is defined. The following wildcards
   are supported: '\*' and '?'.

.. confval:: filter.nslc.exclude

   Type: *list:string*

   Comma-separated list of stream IDs to exclude from processing. Excludes
   take precedence over includes. The following wildcards are supported:
   '\*' and '?'.

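The include/exclude semantics can be sketched with Python's ``fnmatch``
wildcard matching. This is an illustration of the documented precedence
rules, not scardac's actual matching code:

```python
# Sketch of the documented include/exclude filtering with '*' and '?'
# wildcards; fnmatch is used here for illustration only.
from fnmatch import fnmatch

def stream_accepted(stream_id, include=(), exclude=()):
    """Return True if the stream ID passes the filters.

    Excludes take precedence over includes; an empty include list accepts
    every stream that is not explicitly excluded.
    """
    if any(fnmatch(stream_id, pat) for pat in exclude):
        return False
    if not include:
        return True
    return any(fnmatch(stream_id, pat) for pat in include)
```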
.. note::
   **mtime.\***
   *Parameters of this section control the rescan of data chunks. By
   default, the last update time of the extent is compared with the record
   file modification time in order to read only files modified since the
   last run.*

.. confval:: mtime.ignore

   Default: ``false``

   Type: *boolean*

   If set to true, all data chunks are read regardless of their mtime.

.. confval:: mtime.start

   Type: *string*

   Only read chunks modified after a specific date, given as a date string
   or as a number of days before now.

.. confval:: mtime.end

   Type: *string*

   Only read chunks modified before a specific date, given as a date string
   or as a number of days before now.

Command-Line Options
====================
.. program:: scardac
:program:`scardac [OPTION]...`
Generic
-------
.. option:: -h, --help

   Show help message.

.. option:: -V, --version

   Show version information.

.. option:: --config-file arg

   Use alternative configuration file. When this option is used, the loading
   of all stages is disabled. Only the given configuration file is parsed
   and used. To use another name for the configuration, create a symbolic
   link of the application or copy it. Example: scautopick -> scautopick2.

.. option:: --plugins arg

   Load given plugins.

Verbosity
---------
.. option:: --verbosity arg

   Verbosity level [0..4]. 0: quiet, 1: error, 2: warning, 3: info,
   4: debug.

.. option:: -v, --v

   Increase verbosity level (may be repeated, e.g. -vv).

.. option:: -q, --quiet

   Quiet mode: no logging output.

.. option:: --print-component arg

   For each log entry print the component right after the log level. By
   default the component output is enabled for file output but disabled for
   console output.

.. option:: --component arg

   Limit the logging to a certain component. This option can be given more
   than once.

.. option:: -s, --syslog

   Use the syslog logging backend. The output usually goes to
   /var/log/messages.

.. option:: -l, --lockfile arg

   Path to lock file.

.. option:: --console arg

   Send log output to stdout.

.. option:: --debug

   Execute in debug mode. Equivalent to --verbosity=4 --console=1.

.. option:: --trace

   Execute in trace mode. Equivalent to --verbosity=4 --console=1
   --print-component=1 --print-context=1.

.. option:: --log-file arg

   Use alternative log file.

Collector
---------
.. option:: -a, --archive arg

   Overrides configuration parameter :confval:`archive`.

.. option:: --threads arg

   Overrides configuration parameter :confval:`threads`.

.. option:: -j, --jitter arg

   Overrides configuration parameter :confval:`jitter`.

.. option:: --nslc arg

   Overrides configuration parameter :confval:`nslcFile`.

.. option:: --start arg

   Overrides configuration parameter :confval:`filter.time.start`.

.. option:: --end arg

   Overrides configuration parameter :confval:`filter.time.end`.

.. option:: --include arg

   Overrides configuration parameter :confval:`filter.nslc.include`.

.. option:: --exclude arg

   Overrides configuration parameter :confval:`filter.nslc.exclude`.

.. option:: --deep-scan

   Overrides configuration parameter :confval:`mtime.ignore`.

.. option:: --modified-since arg

   Overrides configuration parameter :confval:`mtime.start`.

.. option:: --modified-until arg

   Overrides configuration parameter :confval:`mtime.end`.

.. option:: --generate-test-data arg

   Do not scan the archive but generate test data for each stream in the
   inventory. Format: days,gaps,gapslen,overlaps,overlaplen. E.g., the
   following parameter list would generate test data for 100 days (starting
   from now()-100) which includes 150 gaps with a length of 2.5 s followed
   by 50 overlaps with an overlap of 5 s:
   --generate-test-data=100,150,2.5,50,5
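
Parsing this comma-separated argument is straightforward; the sketch below
is a hypothetical parser shown only to clarify the field order and types,
not scardac's actual implementation:

```python
# Hypothetical parser for the --generate-test-data argument format
# days,gaps,gapslen,overlaps,overlaplen; illustration only.

def parse_test_data_spec(arg):
    days, gaps, gapslen, overlaps, overlaplen = arg.split(",")
    return {
        "days": int(days),
        "gaps": int(gaps),
        "gapslen": float(gapslen),        # gap length in seconds
        "overlaps": int(overlaps),
        "overlaplen": float(overlaplen),  # overlap length in seconds
    }
```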