liboml2.conf(5)
===============

NAME
----
liboml2.conf - OML2 client library configuration file format

SYNOPSIS
--------
[verse]
*<oml2-enabled-app>* [APP-OPTIONS] --oml-config <config-file>

DESCRIPTION
-----------

*liboml2* is the client library for OML2.  It provides an API for
application writers to collect measurements from their applications
via user-defined 'Measurement Points' (MPs).  It also provides a
flexible filtering and collection mechanism that allows application
users to customize how measurements are processed and stored by an
OML-enabled application.

This man page documents the format of the configuration file that
*liboml2* uses to configure itself at runtime.  Application writers
who want to learn how to write an application using the *liboml2* API
should consult linkoml:liboml2[3].  OML also supports simpler but less
powerful configuration via the command line, for testing and
debugging.  It is documented in linkoml:liboml2[1].

When an OML application starts up, it passes the command line
arguments to *liboml2*, which scans them for options that it
understands, and uses those to configure itself.  *liboml2* then
removes its own options from the command line so that the application
proper does not get confused by them.  All OML options start with the
*--oml-* prefix.  For more information, see linkoml:liboml2[1].

The *--oml-config* command line option accepts the name of a
configuration file.  If specified, this file is used to configure the
library internally. All other *--oml-** command line options are ignored
when it is specified.  The file is an XML document.  The configuration
file serves two main purposes: configuring the destinations for
measurement outputs, and configuring the filtering of the MP inputs
prior to sending them to their destinations.  It is therefore important
to understand how OML filters work.

The data path between a measurement point and the measurement
destination looks like this:

----
        ---> F1 ---
       /           \
MP *--- ---> F2 --- ---> [ Collector ]---> [ File/Network ]
       \           /
        |   ...   |
        |         |
         --> FN --
----

The samples injected to the MP are sent to N filters. The outputs of
all the filters are then multiplexed together into a combined output
stream by a collector. A "collector" takes measurement streams from
the output of the filters, multiplexes them together, and then sends
the combined measurement stream on to the destination, which is either
a file on the local disk or a remote OML server
(see linkoml:oml2-server[1]). Each collector sends measurements to just one
destination, but the configuration can specify multiple collectors,
each with a different destination.

Note that although the diagram above shows the outputs of the filters
attached to one MP being sent to one collector, that same collector
may be accepting the output streams of filters attached to several
MPs. Further, the samples injected to an MP can be sent to a second
(or third, etc.) set of filters and then on to another collector. This
will become clear in the description of the configuration file format
below.

Note that there is no round-robin scheme operating among the filters;
all injected samples are sent to all filters attached to the MP.

There are two mechanisms available for generating outputs from
filters.  The first is the "samples" method.  It causes a
filter output sample to be generated for every 'n'-th sample injected
into the filter.  The second is the "interval" method.  It causes a
filter output sample to be generated every 't' seconds.  The parameters
't' and 'n' can be specified in the configuration file.  The output
mechanism can be selected independently for each MP (or more
accurately, for each <stream/> element in the configuration file).

An OML filter is a processing block that accepts a stream of typed
tuples from an MP on its input, computes a function of the input
tuples, and outputs the result as another stream of tuples.  The
schema of the input tuples can be different from the schema of the
output tuples, i.e. they can differ in number, name and type.
Further, the output samples can be generated at a different rate to
the input samples.  For instance, an output might be generated only on
every tenth input sample.  For more on MP and filter schemas and how
measurement outputs are generated, see the section *MEASUREMENT OUTPUT
AND SCHEMAS* below.

CONFIGURATION FILE FORMAT
-------------------------

The configuration file must be an XML file. The root element must be
'omlc', and its children must be 'collect' elements. The following
example shows the skeleton of an OML config file:

--------------------------
    <omlc domain="my_experiment" id="my_source_id">
        <collect url="..." encoding="binary">
           ...
        </collect>

        ...

        <collect url="...">
           ...
        </collect>
    </omlc>
--------------------------

The 'omlc' element recognizes two attributes: 'experiment' and 'id'.
The 'domain' attribute names the experimental domain that the client
wants to join.  It names the database that the measurements will be
stored in by the linkoml:oml2-server[1] if one of the destinations is a
server.  This is the same as the *--oml-domain* flag on the command
line. The obsolescent 'experiment' attribute can still be used for this
purpose.

The 'id' attribute identifies the source of these measurements.
Typically it is used to identify the machine, but it is up to the
experimenter what to put in the 'id' field.  This is the same as the
*--oml-id* flag on the command line.

The 'encoding' attribute can be use to specify which protocol mode to
use.  'binary' is the default binary marshalling mechanism, while 'text'
switches to text mode.

The 'collect' elements identify separate destinations for the
measurements generated by the client programme. The 'url' attribute
identifies the destination. It can be either a file, or the IP address
(or hostname) and port of an linkoml:oml2-server[1]. See *URI FORMAT* in
linkoml:liboml2[1,_uri_format].  Within the 'collect' element should be a sequence
of 'stream' elements.  Each 'stream' element defines the sampling policy
and filtering to apply to a measurement stream derived form a particular
named MP. Here is an example of a configuration with two 'stream'
elements:

--------------------------
    <omlc domain="my_experiment" id="my_source_id">
        <collect url="tcp:192.0.2.200">
           <stream mp="radiotap" interval="2">
              <filter field="sig_strength_dBm" />
              <filter field="noise_strength_dBm" />
              <filter field="power" />
           </stream>
           <stream mp="udp" samples="10">
              <filter field="udp_len" />
           </stream>
        </collect>
    </omlc>
--------------------------

This example connects to a server on the default TCP port (3003) at IP
address 192.0.2.200, and extracts measurements from two of the
application's defined measurement points, named 'radiotap' and 'udp',
respectively.  The 'radiotap' measurements are sampled every two
seconds, and the 'udp' measurements are sampled every 10 samples.  For
the 'radiotap' measurements, three filters are defined: one for the
'sig_strength_dBm' field, one for the 'noise_strength_dBm' field, and
one for the 'power' field.  For the 'udp' measurements, a single
filter is established on the 'udp_len' field.

The 'stream' element also accepts a 'name' attribute that allows the
stream name to be controlled by the user.  If 'name' is omitted then
the stream is named according to <app-name>_<mp-name>, where
<app-name> is the application name as set by the application
developer, and <mp-name> is the name of the MP.  If 'name' is used
then its value is used to construct the stream name as
<app-name>_<stream-name>.  A stream must have a unique name, so if the
user wants to create two streams from the same MP then at least one of
them must have a 'name' attribute.  Omitting it, or having two streams
with identical 'name' attributes, will result in the *liboml2*
configuration process aborting with an error message about the
duplicate stream in the OML client log file.

Filters operate on a single scalar input value.  The 'filter' element
establishes a filter and the 'field' attribute selects the field of
the MP that should form the input for the filter.  The 'field'
attribute is mandatory.

Without any further attributes, the 'filter' element establishes a default
filter.  The default filter type is 'avg' for numeric values and
'first' for non-numeric values such as strings.

The 'filter' element also recognizes the 'operation' attribute, which allows
the user to select what type of filter to apply.  For instance, the
following selects a standard deviation filter, 'stddev':

--------------------------
<filter field="udp_len" operation="stddev"/>
--------------------------

The 'rename' attribute allows the user to name the stream output from
this filter:

--------------------------
<filter field="udp_len" operation="stddev" rename="udp_measurements"/>
--------------------------

It is possible to include several 'stream' elements using the same
'mp' attribute value. In that case, to avoid ambiguity the second will
be internally renamed to "<name>_2", the third to "<name>_3",
etc. This renaming will appear in the schema of the measurement
outputs (either in the local file or the database on the server end).
This behaviour may be augmented in a future version to give more
control of the renaming to the user.

CONFIGURATION WITHOUT XML FILE
------------------------------

When no configuration file is given, *liboml2* provides a basic set of
filters for each measurement point, and sends measurements to just one
collection URI (given by either the *--oml-collect* command line
option).  For each measurement point, each element of the measurement
point's injected tuple is given its own filter.  The filter created
depends on the type of the element and the current sampling policy.

For instance, suppose a measurement point is defined with a
measurement tuple as follows:

----------
("source"      : OML_UINT64_VALUE,
 "destination" : OML_UINT64_VALUE,
 "length"      : OML_INT32_VALUE,
 "snr"         : OML_DOUBLE_VALUE,
 "name"        : OML_STRING_VALUE)
----------

Then *liboml2* will create a separate filter for each of "source",
"destination", "length", "snr", and "name".  The filters for the first
four numeric elements will be an averaging filter (filter type
'avg'), and the last string element will be given a 'first' filter.
The 'first' filter keeps the first injected value in the current
sampling period and throws away all others, passing the first value on
to the measurement output stage.

For more information on measurement points and how
they are defined, see linkoml:liboml[3] and linkoml:omlc_add_mp[3].

MEASUREMENT OUTPUT AND SCHEMAS
------------------------------

The measurement output of an OML program goes either to an SQL
database (if using a network address in the 'collect' element's 'url'
attribute) or a file (if using the 'file://' url protocol).
Measurement points are created with a 'schema', as above, a schema
being an ordered list of (name, type) pairs.

OML filters also generate output with a declared schema. For each
measurement stream, *liboml2* generates a single output measurement
that is the union of the outputs of all filters attached to the MP.
The names of the fields (or columns) of the schema are derived from
the names of the original MP fields, and the output schemas of the
filters. The schemas can be observed directly in the file output
(identical schemas are sent to the server when a server is used). For
instance, here is the output schema for stream that takes its inputs
from a simple example MP that measures a string ("label") and an
integer ("seq_no"):

--------
schema: 1 generator_lin label:string seq_no:uint32
--------

The schema name is "generator_lin" -- a combination of the application
name ("generator") and the stream name ("lin").  (The number '1' on
this line is an index used in the output columns to identify a line of
measurement with the schema to which it conforms.)  This output can be
generated using an 'mp' element with 'samples="1"' and no explicit
filter:

--------
<omlc domain="my_experiment" id="my_source_id">
 <collect url="file:-">
   <stream mp="lin" samples="1" />
 </collect>
</omlc>
--------

This creates a 'first' filter for both of the fields of the
measurement point.  The 'first' filter outputs a single value that has
the same type as the filter's input.

If we change the configuration file to use 'samples="2"', then an
averaging filter is used for the numeric "seq_no" field ("label" is
unchanged).  The schema therefore changes as well:

--------
schema: 1 generator_lin label:string seq_no_avg:double seq_no_min:double seq_no_max:double
--------

An 'avg' filter picks one field of the MP to filter (in this case
"seq_no") and then produces a 3-tuple as output (avg, min, max).
Therefore *liboml2* creates a schema for this filter output that looks
like:

--------
("seq_no_avg" : OML_DOUBLE_VALUE,
 "seq_no_min" : OML_DOUBLE_VALUE,
 "seq_no_max" : OML_DOUBLE_VALUE)
--------

This is the general pattern for filters:  their output schemas are
formed by appending the name of the source MP with the name of the
filter output field.  (The 'first' filter is an exception in that it
just takes the name of the input field and uses that as the output
field name.)

When output is sent to a server, a database table is created for each
measurement point using the combined OML output schema as schema for
the table. For instance, the above example would translate to an SQL
CREATE statement like:

--------
CREATE TABLE generator_lin (label TEXT, seq_no_avg REAL, seq_no_min REAL, seq_no_max REAL);
--------

Note that even though an MP field may have an integral type, it may be
represented as a floating point type in the output because the filter
may output floating point values.  For instance, the average of a set
of integers is real valued because of the division in the averaging
operation.

If we use the 'name' attribute of the <stream/> element, the name of
the schema will follow the 'name' attribute rather than the name of
the source MP.  For instance consider this configuration file:

--------
<omlc domain="my_experiment" id="my_source_id">
 <collect url="file:-">
   <stream mp="lin" samples="1" name="foo"/>
 </collect>
</omlc>
--------

It will generate the following schema declaration:

--------
schema: 1 generator_foo label:string seq_no:uint32
--------

AVAILABLE FILTERS
-----------------

The following lists the filters that are available in OML, and
describes how they should be used.  We plan to add more filters with
each release of OML.

First Filter (first)
~~~~~~~~~~~~~~~~~~~~

This filter saves the first sample in a sample set and throws away all
the rest, outputting just the first sample. It accepts any type of
value as its input. It outputs a single value:

--------
("first" : OML_INPUT_VALUE)
--------

The pseudo-type OML_INPUT_VALUE indicates that this filter's output
has the same type as its input.

To use this filter, use 'operation="first"' in the 'filter' element.

Last Filter (last)
~~~~~~~~~~~~~~~~~~~~

This filter saves the last sample in a sample set and throws away all
the rest, outputting just the last sample. It accepts any type of
value as its input. It outputs a single value:

--------
("last" : OML_INPUT_VALUE)
--------

The pseudo-type OML_INPUT_VALUE indicates that this filter's output
has the same type as its input.

To use this filter, use 'operation="last"' in the 'filter' element.

Averaging Filter (avg)
~~~~~~~~~~~~~~~~~~~~~~

This filter computes the average of its input samples. It accepts
numeric inputs only (one of the OML integer types or
OML_DOUBLE_VALUE). It outputs a pair of values, namely:

--------
("avg" : OML_DOUBLE_VALUE,
 "min" : OML_DOUBLE_VALUE,
 "max" : OML_DOUBLE_VALUE)
--------

where 'avg' is the average over the current sample set, 'min' is the
minimum value of the current sample set, and 'max' is the maximum
value of the current sample set.

To use this filter, use 'operation="avg"' in the 'filter' element.

Standard Deviation Filter (stddev)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This filter computes the standard deviation and variance of its inputs
samples. It accepts numeric inputs only (one of the OML integer types
or OML_DOUBLE_VALUE). It outputs a pair of values, namely:

--------
("stddev"   : OML_DOUBLE_VALUE,
 "variance" : OML_DOUBLE_VALUE)
--------

where 'stddev' is the standard deviation over the current sample set
and 'variance' is the variance (i.e. the square of 'stddev').

To use this filter, put 'operation="stddev"' in the 'filter' element.

Sum Filter (sum)
~~~~~~~~~~~~~~~~

This filter computes the sum of its input samples. It accepts
numeric inputs only (one of the OML integer types or
OML_DOUBLE_VALUE). It outputs a single value, namely:

--------
("sum" : OML_DOUBLE_VALUE)
--------

where 'sum' is the sum all the sample values in the current sample
set.

To use this filter, use 'operation="sum"' in the 'filter' element.

Delta Filter (delta)
~~~~~~~~~~~~~~~~~~~~

This filter computes the change in its input value between the end of
the previous sample set and the start of the current one.  If the
value at the end of the previous sample set was 'last' and the value
at the end of the current sample set was 'current', then the filter
computes 'delta=current-last'. It accepts numeric inputs only (one
of the OML integer types or OML_DOUBLE_VALUE). It outputs a pair of
values, namely:

--------
("delta" : OML_DOUBLE_VALUE,
 "last   : OML_DOUBLE_VALUE)
--------

where 'delta' is the change in the input over the current sample set
and 'last' is the value that the input had at the end of the current
sample set.

The value of 'delta' in the first sample set is computed as
'current-first', where 'first' is the first value in the sample set
and 'current' is the final value.

To use this filter, use 'operation="delta"' in the 'filter' element.

NOTES
-----

Prior to OML 2.6, the configuration file format was identical in
structure but used a different set of names for the XML attributes and
elements.  These old names were very confusing, so they have been
replaced names that better reflect the underlying concepts.  The old
names are still supported so old configuration files and tools will
not break.  Here is a summary of the old names and how they relate to
the new ones:
[width="90%",cols="<2,<1"]
|=====================
^| OLD     ^| NEW
|  <omlc exp_id="abc" ...> | <omlc domain="abc" ... >
|  <mp name="def" rename="bar"...> | <stream mp="def" name="bar"...>
|  <f fname="avg" ...> | <filter operation="avg" ...>
|  <f pname="foo" ...> | <filter field="foo" ...>
|  <f sname="bar" ...> | <filter rename="bar" ...>
|=====================

BUGS
----
The selection of the 'first' filter when 'samples=1' is used can
be confusing for numeric MP fields because it results in a different
schema in the measurement output compared to other possible
configurations available from the command line, which use the 'avg'
filter.  It is not clear whether this is a feature or a bug.

include::bugs.txt[]

SECURITY CONSIDERATIONS
-----------------------

'oml2-server' does not use any authentication, and should thus be
considered insecure.  It is intended to be deployed behind firewalls
on a dedicated testbed network.  It should not be run as a daemon on
an open network.  Future versions of OML may be re-designed to be
suitable for use in insecure environments.

SEE ALSO
--------
Manual Pages
~~~~~~~~~~~~
linkoml:oml2-server[1], linkoml:liboml2[1], linkoml:liboml2[3], linkoml:omlc_add_mp[3]

include::manual.txt[]

// vim: ft=asciidoc:tw=72
