Codecs
Codecs are used to describe how to decode data from the wire and encode it back to wire format.
Supported Codecs
json
Encodes and decodes JSON. Encoding produces a minified format (no newlines or spaces).
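As an illustration of what minified output looks like, here is a minimal sketch using Python's standard `json` module. It is not the codec's implementation, and the sample event is made up for the example.

```python
import json

# Hypothetical sample event, used only for illustration.
event = {"measurement": "weather", "fields": {"temperature": 82.0}}

# Compact separators drop the spaces json.dumps adds after "," and ":" by default.
minified = json.dumps(event, separators=(",", ":"))
print(minified)

# Decoding the minified text yields the original structure again.
decoded = json.loads(minified)
```

The round trip is lossless: only insignificant whitespace is removed on the wire.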
string
Treats the event as an unstructured string. The input must be valid UTF-8, otherwise decoding fails.
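The UTF-8 requirement can be sketched as follows. The function name `decode_string` is hypothetical and only mirrors the described behavior: valid UTF-8 passes through as a string, anything else is rejected.

```python
def decode_string(raw: bytes) -> str:
    """Decode an event payload as an unstructured UTF-8 string."""
    # errors="strict" raises UnicodeDecodeError on invalid UTF-8 input.
    return raw.decode("utf-8", errors="strict")


print(decode_string("héllo".encode("utf-8")))

try:
    decode_string(b"\xff\xfe")  # not valid UTF-8
except UnicodeDecodeError as err:
    print("decoding failed:", err.reason)
```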
msgpack
The msgpack codec is based on the msgpack binary format, which is structurally compatible with JSON.
Being a binary format, msgpack is significantly more performant than JSON and requires less space.
It is an excellent candidate for tremor-to-tremor deployments, as well as for any offramp that supports this format.
influx
Encodes and decodes the influx line protocol. The structural representation of the data is as follows:
weather,location=us-midwest temperature=82 1465839830100400200
translates to:
{
"measurement": "weather",
"tags": { "location": "us-midwest" },
"fields": { "temperature": 82.0 },
"timestamp": 1465839830100400200
}
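The mapping above can be sketched with a small parser. This is a simplified illustration, not the codec itself: the function name `parse_influx_line` is hypothetical, and the sketch assumes no escaped characters and no quoted string fields containing spaces or commas.

```python
def parse_influx_line(line: str) -> dict:
    """Parse a simple influx line into the structure shown above."""
    # Three space-separated parts: measurement+tags, fields, timestamp.
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        if value.endswith("i"):
            fields[key] = int(value[:-1])  # line protocol integer, e.g. 82i
        else:
            fields[key] = float(value)     # assumption: everything else is a float
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


event = parse_influx_line(
    "weather,location=us-midwest temperature=82 1465839830100400200"
)
print(event)
```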
binflux
The binflux codec is a binary representation of influx data that encodes and decodes significantly faster and takes less space on the wire.
The format itself does not include framing but can be used with the size-prefix pre/post processors.
For all numbers network byte order is used (big endian). The data is represented as follows:
- 2 byte (u16) length of the measurement in bytes
- n byte (utf8) the measurement (utf8 encoded string)
- 8 byte (u64) the timestamp
- 2 byte (u16) number of tags (key value pairs), followed by repetitions of:
  - 2 byte (u16) length of the tag name in bytes
  - n byte (utf8) tag name (utf8 encoded string)
  - 2 byte (u16) length of tag value in bytes
  - n byte (utf8) tag value (utf8 encoded string)
- 2 byte (u16) number of fields (key value pairs), followed by repetitions of:
  - 2 byte (u16) length of the field name in bytes
  - n byte (utf8) field name (utf8 encoded string)
  - 1 byte (tag) type of the field value, which can be one of:
    - TYPE_I64 = 0 followed by 8 byte (i64)
    - TYPE_F64 = 1 followed by 8 byte (f64)
    - TYPE_TRUE = 2 no following data
    - TYPE_FALSE = 3 no following data
    - TYPE_STRING = 4 followed by 2 byte (u16) length of the string in bytes and n byte string value (utf8 encoded string)
statsd
Like the influx codec, the statsd codec translates a single statsd measurement into a structured format. The structure is as follows:
sam:7|c|@0.1
Translates to:
{
"type": "c",
"metric": "sam",
"value": 7,
"sample_rate": 0.1
}
The following types are supported:
- c for counter
- ms for timing
- g for gauge
- h for histogram
- s for sets
For gauge there is also the field action, which is add if the value was prefixed with a +, or sub if the value was prefixed with a -.
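The statsd translation described in this section can be sketched as a small parser. The function name `parse_statsd` is hypothetical. Two details are assumptions of this sketch rather than documented behavior: values are parsed as floats, and for gauges the sign is stripped from the value once it has been recorded in action.

```python
def parse_statsd(line: str) -> dict:
    """Parse one statsd measurement such as 'sam:7|c|@0.1'."""
    metric, rest = line.split(":", 1)
    parts = rest.split("|")
    raw_value, metric_type = parts[0], parts[1]

    event = {"type": metric_type, "metric": metric}

    # Gauges may carry a sign prefix that becomes the action field.
    if metric_type == "g" and raw_value[:1] in ("+", "-"):
        event["action"] = "add" if raw_value[0] == "+" else "sub"
        raw_value = raw_value[1:]  # assumption: value holds the magnitude only

    event["value"] = float(raw_value)  # assumption: values are parsed as floats

    # An optional '@rate' section carries the sample rate.
    for extra in parts[2:]:
        if extra.startswith("@"):
            event["sample_rate"] = float(extra[1:])
    return event


print(parse_statsd("sam:7|c|@0.1"))
print(parse_statsd("temp:+3|g"))
```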
yaml
Encodes and decodes YAML.
syslog
Encodes and decodes syslog messages in both the standard IETF format and the old BSD format. A syslog message in the BSD format such as:
<13>Jan 5 15:33:03 74794bfb6795 root[8539]: i am foobar
translates to:
{
"severity": "notice",
"facility": "user",
"hostname": "74794bfb6795",
"appname": "root",
"msg": "i am foobar",
"procid": 8539,
"msgid": null,
"protocol": "RFC3164",
"protocol_version": null,
"structured_data": null,
"timestamp": 1609860783000000000
}
A syslog message in the IETF standard format such as:
<165>1 2021-03-18T20:30:00.123Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn application event log entry...
translates to:
{
"severity": "notice",
"facility": "local4",
"hostname": "mymachine.example.com",
"appname": "evntslog",
"msg": "BOMAn application event log entry...",
"procid": null,
"msgid": "ID47",
"protocol": "RFC5424",
"protocol_version": 1,
"structured_data": {
"exampleSDID@32473" :
[
{"iut": "3"},
{"eventSource": "Application"},
{"eventID": "1011"}
]
},
"timestamp": 1616099400123000000
}
A malformed syslog message is treated as an RFC3164 message, and the entire string is placed in the msg field of the resulting object.
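The severity and facility values in the examples above follow from the PRI number in angle brackets at the start of each message: the facility is PRI divided by 8 and the severity is the remainder. The sketch below illustrates that arithmetic. The function name `decode_pri` is hypothetical, and the keyword tables use conventional syslog names as an assumption; only user, local4 and notice are confirmed by the examples in this section.

```python
import re

# Conventional keyword names (assumption); the index is the numeric code.
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
] + [f"local{i}" for i in range(8)]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(message: str) -> tuple:
    """Return (facility, severity) for the <PRI> prefix of a syslog message."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        raise ValueError("message has no PRI prefix")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid PRI
        raise ValueError("PRI out of range")
    return FACILITIES[pri // 8], SEVERITIES[pri % 8]


print(decode_pri("<13>Jan 5 15:33:03 74794bfb6795 root[8539]: i am foobar"))
print(decode_pri("<165>1 2021-03-18T20:30:00.123Z mymachine.example.com evntslog - ID47 - msg"))
```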