Header rules

HeaderParser

class log2seq.header.HeaderParser(items, separator=None, full_format=None, **kwargs)

Parser for header parts in log messages.

Header parts in log messages provide meta-information. For example, the default syslogd configuration records messages with timestamps and hostnames as header information. The remaining part (a free-format statement) is parsed as the statement part (with the item Statement).

A HeaderParser rule is represented as a list of Item objects. Each Item is a component of a regular expression pattern that parses the corresponding variable item. HeaderParser automatically generates one regular expression pattern from the items and tests whether it matches the input log messages. If it matches, HeaderParser extracts the values for the items.

In a HeaderParser rule, one Statement item is mandatory.
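
How a list of items can become a single pattern is easier to see in a small sketch. The following uses only the standard re module and is not log2seq's actual implementation; the names ITEMS and build_pattern are hypothetical, and the sub-patterns are assumptions chosen for illustration.

```python
import re

# Hypothetical (match name, sub-pattern) pairs standing in for Item objects.
ITEMS = [
    ("month", r"[A-Z][a-z]{2}"),
    ("day", r"\d{1,2}"),
    ("time", r"\d{2}:\d{2}:\d{2}"),
    ("host", r"[\w.-]+"),
    ("statement", r".*"),  # the mandatory free-format part
]


def build_pattern(items, separator=r"\s+"):
    """Join named groups with a separator into one anchored pattern."""
    groups = ["(?P<{0}>{1})".format(name, pat) for name, pat in items]
    return re.compile("^" + separator.join(groups) + "$")


pattern = build_pattern(ITEMS)
mo = pattern.match("Feb 25 01:02:03 host1 sshd[123]: session opened")
# One dictionary entry per named group when the line matches.
parsed = mo.groupdict() if mo else None
```

Each named group plays the role of a match name, which is why the names must be unique within one rule.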

If you want to extract the timestamp as a datetime.datetime object (i.e., using the reformat_timestamp option), the items should include ones with the following special value names (see value_name):

  • year (int)
  • month (int)
  • day (int)
  • hour (int, optional)
  • minute (int, optional)
  • second (int, optional)
  • microsecond (int, optional)
  • tzinfo (datetime.tzinfo, optional)

Besides, you can also use aggregated items with the following value names:

  • datetime (datetime.datetime): all
  • date (datetime.date): year, month, day
  • time (datetime.time): hour, minute, second, microsecond, tzinfo

If some timestamp-related items are not given, add the corresponding values (in the specified types) to defaults. Note that “year” is missing in some logging frameworks (e.g., the default syslogd configuration).
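
The role of defaults can be illustrated with a minimal sketch that assembles a datetime.datetime from separately parsed values. The function build_timestamp is hypothetical and only mirrors the value names listed above; it is not how log2seq is implemented internally.

```python
import datetime


def build_timestamp(parsed, defaults=None):
    """Combine separate time-related values into one datetime object."""
    values = dict(defaults or {})
    # Parsed values take precedence over defaults; None means "not found".
    values.update({k: v for k, v in parsed.items() if v is not None})
    return datetime.datetime(
        values["year"], values["month"], values["day"],
        values.get("hour", 0), values.get("minute", 0),
        values.get("second", 0), values.get("microsecond", 0),
        tzinfo=values.get("tzinfo"),
    )


# A syslog-like header has no year, so it is supplied as a default.
parsed = {"month": 2, "day": 25, "hour": 1, "minute": 2, "second": 3}
ts = build_timestamp(parsed, defaults={"year": 2019})
```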

There are two options to define the placement of Items. One is “separator”, which is the easier (and recommended) option. A separator defines the separator characters between Items.

The other is “full_format”, which is similar to log_format in logparser[1]. It is a regular expression with placeholders (replacers) for Items. For example, if full_format is r"<0> <1> <2> [<3>] <4>", <0> will be replaced with the first Item in items (the number corresponds to the index in the given items). If you need a literal “<” or “>”, escape it with a backslash. The number of replacers must be equal to the length of items. Note that optional Items must be manually enclosed with “(” and “)?” in the full_format regular expression (e.g., r"<0> <1> <2> ([<3>] )?<4>" where Item-3 is optional).
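
The replacer mechanism of full_format can be sketched with the standard re module. This is an illustration under assumptions, not log2seq's code: ITEM_PATTERNS and expand_full_format are hypothetical names, and the square brackets are escaped here because the sketch treats the format as a plain regular expression.

```python
import re

# Hypothetical named sub-patterns, indexed like the items list.
ITEM_PATTERNS = [
    r"(?P<date>\d{4}-\d{2}-\d{2})",
    r"(?P<time>\d{2}:\d{2}:\d{2})",
    r"(?P<host>[\w.-]+)",
    r"(?P<pid>\d+)",
    r"(?P<statement>.*)",
]


def expand_full_format(full_format, patterns):
    """Replace each unescaped <n> with the n-th sub-pattern."""
    def repl(mo):
        return patterns[int(mo.group(1))]
    # The lookbehind leaves replacers preceded by a backslash untouched.
    return re.sub(r"(?<!\\)<(\d+)>", repl, full_format)


# Item 3 is optional, so it is enclosed in "(" and ")?" by hand.
expanded = expand_full_format(r"<0> <1> <2> (\[<3>\] )?<4>", ITEM_PATTERNS)
regex = re.compile("^" + expanded + "$")
with_pid = regex.match("2112-09-03 11:22:33 host1 [42] started").groupdict()
without_pid = regex.match("2112-09-03 11:22:33 host1 started").groupdict()
```

When the optional part is absent, its named group is still present in the result but holds None.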

Parameters:
  • items (list of Item) – header format rule.
  • separator (str, optional) – Separators for header part. Defaults to white spaces.
  • full_format (str, optional) – Place format of header part. If given, argument separator is ignored.
  • defaults (dict, optional) – Default values, used for missing values (for optional or missing items) in log messages.
  • reformat_timestamp (bool, optional) – Transform time-related items into a timestamp as a datetime.datetime object. Set to False if log messages do not have timestamps.
  • astimezone (datetime.tzinfo, optional) – Convert timestamp to given new timezone by calling datetime.datetime.astimezone(). Effective only when reformat_timestamp is True.
Reference:
[1] logparser: https://github.com/logpai/logparser
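
The astimezone parameter described above relies on the standard datetime.datetime.astimezone() method. A minimal sketch of that conversion, with hypothetical variable names and an example timestamp assumed to carry a +09:00 offset:

```python
import datetime

# A parsed timestamp that carries its own offset (+09:00).
jst = datetime.timezone(datetime.timedelta(hours=9))
parsed_ts = datetime.datetime(2019, 2, 25, 1, 2, 3, tzinfo=jst)

# Convert to another timezone; the instant in time stays the same.
utc_ts = parsed_ts.astimezone(datetime.timezone.utc)
```

Only the wall-clock representation changes, so the two objects still compare equal.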
process_line(line)

Parse header part of a log message (i.e., a line).

Parameters:line (str) – A log message without line feed code.
Returns:Parsed items.
Return type:dict

Basic items

class log2seq.header.Item(optional=False, dummy=False)

Base class of items, components of header parts.

Parameters:
  • optional (bool, optional) – If true, this item is optional: not all inputs need to include it in their header parts, and Item.pick() returns None when no corresponding part is found.
  • dummy (bool, optional) – Dummy items do not extract any values. If true, log2seq does not try to extract a value for this item, and Item.pick() will not be called for it. For example, if a header part contains the same value more than once (e.g., the year at the beginning and in the middle), all but one of them should be dummy to avoid duplicated re group names.
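
The group-name duplication that dummy items avoid can be reproduced with the standard re module. In this sketch the repeated year is matched with a non-capturing group; that is an assumption about one possible way to realize a dummy item, not a description of log2seq's internals.

```python
import re

# Two capturing groups with the same name cannot be compiled.
try:
    re.compile(r"(?P<year>\d{4}) .* (?P<year>\d{4})")
    duplicate_ok = True
except re.error:
    duplicate_ok = False

# Assumed workaround: the second occurrence is matched but not captured.
pattern = re.compile(
    r"(?P<year>\d{4}) (?P<host>\S+) (?:\d{4}) (?P<statement>.*)"
)
mo = pattern.match("2019 host1 2019 service started")
values = mo.groupdict()
```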
get_regex()

Get regular expression pattern string of this Item instance.

match_name

Match name of this Item.

Match name is used to distinguish the extracted values in the re MatchObject. Match names cannot be duplicated within a set of HeaderParser items.

Type:str
pattern

Get regular expression pattern string for this Item class.

Type:str
pick(mo)

Get value name and the extracted values from re MatchObject in appropriate format.

Parameters:mo – MatchObject for combined pattern of HeaderParser.
Returns:value_name and the value extracted by Item.pick_value().
Return type:tuple
pick_value(mo)

Get a value from re MatchObject in appropriate format.

Parameters:mo – MatchObject for combined pattern of HeaderParser.
Returns:Extracted value for this Item. Any type, depending on the class. If not specified, a matched string value is returned as is.
test(string)

Test whether this Item matches the input string. Note that this function is intended only for debugging your parser script (because it generates an internal re.Pattern on every call).

Parameters:string – Input string to test matching.
Returns:re.Match or None
value_name

Value name of this Item.

Value name is used as the key in the return value of HeaderParser. Also, timestamps are reformatted based on specific value names.

Type:str
class log2seq.header.ItemGroup(items, separator=None, optional=False)

ItemGroup enables hierarchical parsing of Items. One typical use is defining an optional part that includes multiple Items appearing together. Another use is applying a different separator definition within the ItemGroup part.
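
Both uses can be pictured as a nested, optional group inside the combined pattern. The sketch below is written directly with the re module; the field names facility and severity and the "/" separator are hypothetical, and the pattern is not generated by log2seq.

```python
import re

# Outer items are separated by a space; the grouped part uses "/"
# as its own separator and is optional as a whole.
pattern = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<facility>\w+)/(?P<severity>\w+) )?"
    r"(?P<statement>.*)$"
)

full = pattern.match("11:22:33 daemon/info service started").groupdict()
short = pattern.match("11:22:33 service started").groupdict()
```

The grouped fields either appear together or are both None.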

class log2seq.header.Statement(optional=False, dummy=False)

Item for the statement part. Usually it matches, with a greedy match, everything that is not covered by the other items.

Timestamp items

class log2seq.header.UnixTime(optional=False, dummy=False)

Item for a Unix time integer.

e.g., 1551024123 for 2019-02-25 01:02:03 (at UTC+09:00)
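
The example value corresponds to 01:02:03 on 2019-02-25 only at an offset of +09:00; in UTC it is 16:02:03 on the previous day. A short check with the standard library (the variable names are illustrative, and how log2seq itself chooses the timezone is not shown here):

```python
import datetime

jst = datetime.timezone(datetime.timedelta(hours=9))

# Interpret the integer as seconds since the epoch, shown at +09:00.
ts = datetime.datetime.fromtimestamp(1551024123, tz=jst)
ts_utc = ts.astimezone(datetime.timezone.utc)
```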
class log2seq.header.DatetimeISOFormat(optional=False, dummy=False)

Item for datetime in ISO 8601 (or RFC 3339) format. Datetime information (year, month, day, hour, minute, second) is always included. Microseconds and timezone are optionally extracted.

e.g., 2112-09-03T11:22:33
e.g., 2112-09-03T11:22:33.012345+09:00
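
For comparison, both example strings are also accepted by the standard datetime.datetime.fromisoformat() (Python 3.7 or later). This only shows what values the two forms carry; it is not a statement about how DatetimeISOFormat parses them.

```python
import datetime

# Without fractional seconds and offset: a naive datetime.
plain = datetime.datetime.fromisoformat("2112-09-03T11:22:33")

# With microseconds and a +09:00 offset: an aware datetime.
full = datetime.datetime.fromisoformat("2112-09-03T11:22:33.012345+09:00")
```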
class log2seq.header.Date(optional=False, dummy=False)

Item for date, including year, month, and day. Represented as an eight-digit numeric string separated with two hyphens. Similar to the former part of DatetimeISOFormat.

e.g., 2112-09-03
class log2seq.header.Time(optional=False, dummy=False)

Item for time, including hour, minute, and second. It can also include microsecond and timezone, like DatetimeISOFormat.

e.g., 11:22:33
class log2seq.header.MonthAbbreviation(optional=False, dummy=False)

Item for abbreviated month names. The first three characters of a month name, with the first letter capitalized, will match (e.g., Jan, Feb, Mar, …).
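
Since the month value name is an int, an abbreviation has to be mapped to a number at some point. A minimal sketch of such a mapping, assuming English abbreviations (MONTH_NAMES and MONTHS are hypothetical names):

```python
# English three-letter abbreviations in calendar order.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Map each abbreviation to its month number (1 to 12).
MONTHS = {abbr: num for num, abbr in enumerate(MONTH_NAMES, start=1)}

month = MONTHS["Feb"]
```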

class log2seq.header.DemicalSecond(optional=False, dummy=False)

Item for decimal (fractional) seconds.

e.g., 678 as milliseconds
e.g., 123456 as microseconds
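
One way to bring both examples to the microsecond value expected by datetime is to treat the digits as the fractional part of a second and pad them to six places. The helper to_microsecond is hypothetical, and this padding rule is an assumption rather than documented log2seq behavior.

```python
def to_microsecond(digits):
    """Interpret a digit string as the fractional part of a second."""
    # Pad on the right to six digits; extra precision is truncated.
    return int(digits[:6].ljust(6, "0"))


ms = to_microsecond("678")      # three digits: milliseconds
us = to_microsecond("123456")   # six digits: microseconds
```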
class log2seq.header.TimeZone(optional=False, dummy=False)

Item for timezone.

e.g., +0900
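
An offset string of this form can be turned into a datetime.tzinfo object with the standard %z directive of strptime. This is a sketch of the conversion only, not of the Item's implementation.

```python
import datetime

# %z accepts offsets such as +0900 and yields a fixed-offset tzinfo.
tz = datetime.datetime.strptime("+0900", "%z").tzinfo
offset = tz.utcoffset(None)
```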
class log2seq.header.DateConcat(no_century=False, **kwargs)

Item for date without separators.

e.g., 20190225 for 2019-02-25
e.g., 190225 for 2019-02-25 (no_century is True)
Parameters:no_century (bool, optional) – If true, abbreviate year by removing century.
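
Both forms can be checked against the standard strptime directives %Y (four-digit year) and %y (two-digit year). Note that %y follows its own century rule (values 69 to 99 map to 1969 to 1999, values 0 to 68 to 2000 to 2068); whether DateConcat applies the same rule is not stated here.

```python
import datetime

# Eight digits: the century is included.
with_century = datetime.datetime.strptime("20190225", "%Y%m%d").date()

# Six digits: the century is omitted.
no_century = datetime.datetime.strptime("190225", "%y%m%d").date()
```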
class log2seq.header.TimeConcat(optional=False, dummy=False)

Item for time without separators.

e.g., 010203 for 01:02:03

Variable items

class log2seq.header.NamedItem(name, **kwargs)

A base class of namable items. Namable items require an argument for the name. The name is used as both match name and value name. The name must not duplicate the match names of other items (including unnamable items) in one HeaderParser rule.

Parameters:name (string) – name of Item instance, used as match name and value name.
class log2seq.header.Digit(name, **kwargs)

NamedItem for a digit value.

class log2seq.header.Hostname(name, **kwargs)

NamedItem for a hostname (or IP address) string.

Check Hostname.pattern to see the accepted names. If your hostname does not match the pattern, consider using UserItem. (This is because hostname can include various values depending on the devices or OSes.)

class log2seq.header.UserItem(name, pattern, strip=None, **kwargs)

Customizable NamedItem.

The pattern is written in Python regular expression syntax (re). Some special constructs cannot be used in this Item because HeaderParser generates a single re.Pattern by automatically combining the given set of items:

  • Optional parts, such as ?
  • ^ and $
Parameters:
  • name – same as NamedItem.
  • pattern – regular expression pattern of this Item instance.
  • strip (str, optional) – the specified characters are stripped from the parsed value with str.strip().
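
The effect of strip can be shown with a user-defined sub-pattern embedded in a larger pattern. The combined pattern below is written by hand with the re module and the names are hypothetical; it only illustrates why the sub-pattern has no ^ or $ of its own and what str.strip() does to the matched text.

```python
import re

# Hypothetical user-defined sub-pattern: a process tag such as "[sshd]".
tag_pattern = r"\[\w+\]"

# The sub-pattern sits in the middle of the combined pattern, so
# anchors belong to the whole pattern, not to the sub-pattern.
combined = re.compile(
    r"^(?P<host>\S+) (?P<tag>" + tag_pattern + r") (?P<statement>.*)$"
)

mo = combined.match("host1 [sshd] session opened")
raw = mo.group("tag")
tag = raw.strip("[]")  # remove the enclosing brackets
```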