
.. _sec-caps-upgrading:

Upgrading
=========

New file format
---------------

Starting with version 2021.048, CAPS uses a new file storage format. The files
remain compatible and chunk based, but two new chunk types were added. The
upgrade itself should run without interruption, but because of the new format
all existing files must be converted before they can be read. CAPS does that
on the fly whenever a file is opened for reading or writing.

This can reduce performance until all files have been converted, but it should
not cause any outages.

Rationale
---------

The time needed to store an out-of-order record in CAPS grew with the number of
records already stored. The cause was a linear search for the insert position:
the more records were stored, the more records had to be checked and the more
file content had to be paged into system memory, which is a slow operation.
In addition, a second index file had to be maintained, which required an
additional open file descriptor per data file. As we were also looking for a
way to reduce disk fragmentation and to allow file size pre-allocation on any
operating system, we decided to redesign how individual records are stored
within a data file. What we wanted was:

* Fast insert operations
* Fast data retrieval
* Portable file size pre-allocations
* Efficient OS memory paging

CAPS now implements a B+tree index per data file. No additional index file is
required. The index is maintained as additional chunks in the data file itself.
Furthermore, CAPS maintains a meta chunk at the end of the file with information
about the logical and physical file size, the index chunks and so on. If that
chunk is missing or invalid, the data file is re-scanned and converted. This is
what happens to each file after an upgrade.

As a consequence, time window requests require much less CPU time. Files are
also accessed less frequently, and less file content has to be read when
extracting arbitrary time windows.

As the time range stored in the data file is now part of the meta data, a full
re-scan of each file is not necessary when restarting CAPS without its archive
log. With many channels, this speeds up re-scanning an archive considerably.

Manual archive conversion
-------------------------

If a controlled conversion of the archive files is desired then the following
procedure can be applied:

1. Stop caps

   .. code-block:: sh

      $ seiscomp stop caps

2. Enter the configured archive directory

   .. code-block:: sh

      $ cd seiscomp/var/lib/caps/archive

3. Check all files and trigger a conversion

   .. code-block:: sh

      $ find . -name "*.data" -exec rifftest {} check \;

4. Start caps

   .. code-block:: sh

      $ seiscomp start caps

Depending on the size of the archive, step 3 can take some time.
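
For large archives it can help to see how far the conversion has progressed.
The following is a minimal sketch, not part of CAPS: it counts the ``*.data``
files first and then runs the check from step 3 on each file with a progress
prefix. The function names ``count_data_files`` and ``convert_archive`` and the
``CHECK_CMD`` variable are hypothetical helpers introduced for illustration.
The sketch assumes that file names contain no newline characters.

.. code-block:: sh

   #!/bin/sh
   # Sketch only: helper names and CHECK_CMD are assumptions, not CAPS tools.

   # Print the number of *.data files below directory $1.
   count_data_files() {
       find "$1" -type f -name "*.data" | wc -l | tr -d ' '
   }

   # Run the check command on every *.data file below directory $1 and
   # print "[current/total] path" before each file is processed.
   # CHECK_CMD defaults to rifftest, as used in step 3; setting it to
   # another command (e.g. "true") gives a dry run.
   convert_archive() {
       dir=$1
       total=$(count_data_files "$dir")
       n=0
       find "$dir" -type f -name "*.data" | while IFS= read -r f; do
           n=$((n + 1))
           echo "[$n/$total] $f"
           ${CHECK_CMD:-rifftest} "$f" check
       done
   }

With CAPS stopped, calling ``convert_archive .`` from within the archive
directory would process the files one at a time, so the last printed line
shows roughly how much work remains.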