Header rules
HeaderParser
class log2seq.header.HeaderParser(items, separator=None, full_format=None, **kwargs)
Parser for header parts in log messages.
Header parts in log messages provide meta-information. For example, the default syslogd records messages with timestamps and hostnames as header information. The remaining free-format part is parsed as the statement part (with item Statement).
A HeaderParser rule is represented as a list of Item objects. An Item is a component of a regular expression pattern that parses the corresponding variable item. HeaderParser automatically generates one regular expression pattern from the items and tests whether it matches the input log message. If it matches, HeaderParser extracts the variables for the items.
In a HeaderParser rule, one Statement item is mandatory.
If you want to extract the timestamp as a datetime.datetime object (i.e., using the reformat_timestamp option), the items should include ones with the following special value names (see value_name):
- year (int)
- month (int)
- day (int)
- hour (int, optional)
- minute (int, optional)
- second (int, optional)
- microsecond (int, optional)
- tzinfo (datetime.tzinfo, optional)
In addition, you can use aggregated items with the following value names:
- datetime (datetime.datetime): all of the above
- date (datetime.date): year, month, day
- time (datetime.time): hour, minute, second, microsecond, tzinfo
If some timestamp-related items are not available in the log messages, add the corresponding values (in the specified types) to defaults. Note that "year" is missing in some logging frameworks (e.g., the default syslogd configuration).
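The role of defaults for a missing year can be sketched with the standard library alone. This is not log2seq's implementation; `build_timestamp` is a hypothetical helper, and filling absent time fields with 0 is an assumption made for the example.

```python
import datetime

def build_timestamp(values, defaults=None):
    """Combine timestamp-related values into one datetime.datetime.

    `values` maps value names (year, month, ...) to already-typed values;
    `defaults` fills in names that the log line did not provide.
    (Hypothetical helper, not part of log2seq.)
    """
    merged = dict(defaults or {})
    merged.update({k: v for k, v in values.items() if v is not None})
    # Assumption: optional time fields fall back to 0 when absent.
    return datetime.datetime(
        merged["year"], merged["month"], merged["day"],
        merged.get("hour", 0), merged.get("minute", 0),
        merged.get("second", 0), merged.get("microsecond", 0),
        tzinfo=merged.get("tzinfo"),
    )

# A syslog-style line has no year, so it is supplied through defaults.
parsed = {"month": 2, "day": 25, "hour": 1, "minute": 2, "second": 3}
ts = build_timestamp(parsed, defaults={"year": 2019})
```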
There are two options to define the placement of Items. One is "separator", which is the easier (and recommended) option. It defines the separator characters between Items. The other is "full_format", which is similar to log_format in logparser [1]. It is a regular expression with replacers (placeholders) for Items. For example, if full_format is r"<0> <1> <2> [<3>] <4>", <0> will be replaced with the first Item in items (the number corresponds to the index in the given items). If you need a literal "<" or ">", escape it with a backslash. The number of replacers must equal the length of items. Note that optional Items must be manually enclosed with "(" and ")?" in the full_format regular expression (e.g., r"<0> <1> <2> ([<3>] )?<4>" where Item-3 is optional).
Parameters:
- items (list of Item) – header format rule.
- separator (str, optional) – Separators for header part. Defaults to white spaces.
- full_format (str, optional) – Place format of header part. If given, argument separator is ignored.
- defaults (dict, optional) – Default values, used for missing values (for optional or missing items) in log messages.
- reformat_timestamp (bool, optional) – Transform time-related items into a timestamp in datetime.datetime object. Set false if log messages do not have timestamps.
- astimezone (datetime.tzinfo, optional) – Convert timestamp to given new timezone by calling datetime.datetime.astimezone(). Effective only when reformat_timestamp is True.
- Reference:
- [1] logparser: https://github.com/logpai/logparser
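The replacer mechanism of full_format can be illustrated with plain `re`. This is a minimal sketch, not log2seq's code: `expand_full_format` and the item patterns are hypothetical, and the square brackets are escaped here because an unescaped `[...]` is a character class in standard regular expressions.

```python
import re

def expand_full_format(full_format, patterns):
    """Replace <0>, <1>, ... with the pattern at that index.

    A replacer preceded by a backslash (e.g. r"\<0>") is left untouched.
    (Hypothetical helper, not part of log2seq.)
    """
    def substitute(mo):
        return patterns[int(mo.group(1))]
    return re.sub(r"(?<!\\)<(\d+)>", substitute, full_format)

# Hypothetical per-item patterns, one named group each.
patterns = [
    r"(?P<date>\d{4}-\d{2}-\d{2})",
    r"(?P<time>\d{2}:\d{2}:\d{2})",
    r"(?P<host>\S+)",
    r"(?P<pid>\d+)",
    r"(?P<statement>.*)",
]

# Item 3 is optional, so its part is wrapped in "(" and ")?" by hand.
full_format = r"<0> <1> <2> (\[<3>\] )?<4>"
regex = re.compile("^" + expand_full_format(full_format, patterns) + "$")

with_pid = regex.match("2112-09-03 11:22:33 host1 [123] system start").groupdict()
without_pid = regex.match("2112-09-03 11:22:33 host1 system start").groupdict()
```

In this sketch the optional item simply yields None when its part is absent.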
process_line(line)
Parse header part of a log message (i.e., a line).
Parameters: line (str) – A log message without a trailing line feed.
Returns: Parsed items.
Return type: dict
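The idea of generating one combined pattern from the items and returning the extracted values as a dict can be sketched as follows. The `(name, pattern)` pairs, `build_pattern`, and `parse_line` are stand-ins invented for this illustration; they do not reproduce log2seq's internals or its exact return keys.

```python
import re

# Hypothetical (match name, pattern) pairs standing in for Item objects.
items = [
    ("month", r"[A-Z][a-z]{2}"),
    ("day", r"\d{1,2}"),
    ("time", r"\d{2}:\d{2}:\d{2}"),
    ("host", r"\S+"),
    ("statement", r".*"),
]

def build_pattern(items, separator=r"\s+"):
    """Join one named group per item with the separator pattern."""
    groups = ["(?P<{0}>{1})".format(name, pattern) for name, pattern in items]
    return re.compile("^" + separator.join(groups) + "$")

def parse_line(line, regex):
    """Return extracted values as a dict, or None if the line does not match."""
    mo = regex.match(line)
    return mo.groupdict() if mo else None

regex = build_pattern(items)
result = parse_line("Feb 25 01:02:03 host1 sshd[123]: session opened", regex)
```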
Basic items
class log2seq.header.Item(optional=False, dummy=False)
Base class of items, components of header parts.
Parameters:
- optional (bool, optional) – This item is optional; not all inputs need this item in their header parts. If true, Item.pick() returns None when no corresponding part is found.
- dummy (bool, optional) – Dummy items do not extract any values. If true, log2seq does not try to extract a value for this item, and Item.pick() will not be called for it. For example, if a header part contains the same value more than once (e.g., year at the top and in the middle), one of them should be a dummy to avoid duplicated re group names.
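The group name duplication mentioned for dummy items is a restriction of Python's `re` module itself. The sketch below shows the failure and one way around it; using a non-capturing group for the dummy occurrence is an assumption for illustration, not a statement about how log2seq builds its pattern.

```python
import re

year = r"\d{4}"

# Two capturing groups with the same name cannot be compiled.
try:
    re.compile(r"(?P<year>{0}) .* (?P<year>{0})".format(year))
    duplicate_ok = True
except re.error:
    duplicate_ok = False

# Treating the second occurrence as a dummy: it still has to match,
# but it is not captured under a name.
regex = re.compile(
    r"(?P<year>{0}) (?P<host>\S+) (?:{0}) (?P<statement>.*)".format(year)
)
values = regex.match("2019 host1 2019 job finished").groupdict()
```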
match_name
Match name of this Item.
Match name is used to distinguish the extracted values in the re match object. Match names cannot be duplicated within the set of items of one HeaderParser.
Type: str
pattern
Get regular expression pattern string for this Item class.
Type: str
pick(mo)
Get value name and the extracted values from re MatchObject in appropriate format.
Parameters: mo – MatchObject for combined pattern of HeaderParser.
Returns: value_name and the value extracted by Item.pick_value().
Return type: tuple
pick_value(mo)
Get a value from re MatchObject in appropriate format.
Parameters: mo – MatchObject for combined pattern of HeaderParser.
Returns: Extracted value for this Item. Any type, depending on the class. If not specified, a matched string value is returned as is.
test(string)
Test whether this Item matches the input string. Note that this function is only for debugging your parser script (because it generates an internal re.Pattern on every call).
Parameters: string – Input string to test matching.
Returns: re.Match or None
value_name
Value name of this Item.
Value name is used as the key in the return value of HeaderParser. Also, timestamps are reformatted with specific value names.
Type: str
class log2seq.header.ItemGroup(items, separator=None, optional=False)
ItemGroup enables hierarchical parsing of Items. One typical use is defining an optional part that includes multiple Items appearing together. Another use is applying a different separator definition within the ItemGroup part.
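In regular expression terms, both uses correspond to a nested group with its own inner separator. The following is a hand-written sketch of that idea; the header layout and group names are hypothetical, and the pattern log2seq actually generates for an ItemGroup may differ.

```python
import re

# Hypothetical header: "<date> <time> [<process>:<pid>] <statement>",
# where the bracketed part may be absent and uses ":" as its inner separator.
group = r"(?:\[(?P<process>\w+):(?P<pid>\d+)\]\s+)?"
regex = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    + group
    + r"(?P<statement>.*)$"
)

full = regex.match("2112-09-03 11:22:33 [cron:42] job started").groupdict()
bare = regex.match("2112-09-03 11:22:33 job started").groupdict()
```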
class log2seq.header.Statement(optional=False, dummy=False)
Item for the statement part. With a greedy match, it usually captures everything in the message that is not covered by the other items.
Timestamp items
class log2seq.header.UnixTime(optional=False, dummy=False)
Item for unixtime integer.
e.g., 1551024123 for 2019-02-25 01:02:03
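The example value can be checked with the standard library. Note that 1551024123 corresponds to 2019-02-25 01:02:03 in a UTC+9 timezone (it is 2019-02-24 16:02:03 in UTC); the snippet below only demonstrates the conversion and says nothing about which timezone log2seq itself assumes.

```python
import datetime

# UTC+9 offset under which the documented example holds.
jst = datetime.timezone(datetime.timedelta(hours=9))

dt = datetime.datetime.fromtimestamp(1551024123, tz=jst)
# Converting to another timezone keeps the same instant.
utc = dt.astimezone(datetime.timezone.utc)
```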
class log2seq.header.DatetimeISOFormat(optional=False, dummy=False)
Item for datetime in ISO8601 (or RFC3339) format. Datetime information (year, month, day, hour, minute, second) is always included. Microseconds and timezone are optionally extracted.
e.g., 2112-09-03T11:22:33
e.g., 2112-09-03T11:22:33.012345+09:00
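For comparison, Python's own `datetime.datetime.fromisoformat()` accepts both example strings. This shows which fields are present in each form; it is not necessarily how DatetimeISOFormat parses them.

```python
import datetime

basic = datetime.datetime.fromisoformat("2112-09-03T11:22:33")
full = datetime.datetime.fromisoformat("2112-09-03T11:22:33.012345+09:00")

# The basic form has no fractional second and no timezone.
basic_fields = (basic.microsecond, basic.tzinfo)
# The full form carries microseconds and a +09:00 offset.
full_fields = (full.microsecond, full.utcoffset())
```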
class log2seq.header.Date(optional=False, dummy=False)
Item for date, including year, month, and day. Represented as an eight-digit numeric string separated with two hyphens. Similar to the former part of DatetimeISOFormat.
e.g., 2112-09-03
class log2seq.header.Time(optional=False, dummy=False)
Item for time, including hour, minute, and second. It can also include microsecond and timezone, like DatetimeISOFormat.
e.g., 11:22:33
class log2seq.header.MonthAbbreviation(optional=False, dummy=False)
Item for abbreviated month names. Three-letter strings with a capitalized first letter will match (e.g., Jan, Feb, Mar, …).
class log2seq.header.DemicalSecond(optional=False, dummy=False)
Item for decimal (fractional) seconds.
e.g., 678 as milliseconds
e.g., 123456 as microseconds
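One way to read both examples consistently is to treat the digits as the fractional part of a second and scale them to microseconds. The helper below is hypothetical and only illustrates that interpretation; the documentation above does not state how log2seq converts the value.

```python
def fraction_to_microseconds(digits):
    """Interpret a digit string as the fractional part of a second.

    (Hypothetical helper, not part of log2seq.)
    """
    # Pad to six digits on the right, then drop anything finer than 1 microsecond.
    return int(digits.ljust(6, "0")[:6])

millis = fraction_to_microseconds("678")      # three digits: milliseconds
micros = fraction_to_microseconds("123456")   # six digits: microseconds
```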
class log2seq.header.TimeZone(optional=False, dummy=False)
Item for timezone.
e.g., +0900
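An offset string in this form can be turned into a datetime.tzinfo object with the `%z` directive of `strptime`. This is a standard-library sketch; `parse_offset` is a hypothetical name and log2seq may build the tzinfo differently.

```python
import datetime

def parse_offset(text):
    """Turn an offset such as "+0900" into a datetime.tzinfo object.

    (Hypothetical helper, not part of log2seq.)
    """
    return datetime.datetime.strptime(text, "%z").tzinfo

tz = parse_offset("+0900")
stamp = datetime.datetime(2019, 2, 25, 1, 2, 3, tzinfo=tz)
```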
class log2seq.header.DateConcat(no_century=False, **kwargs)
Item for date without separators.
e.g., 20190225 for 2019-02-25
e.g., 190225 for 2019-02-25 (no_century is True)
Parameters: no_century (bool, optional) – If true, the year is abbreviated by removing the century.
class log2seq.header.TimeConcat(optional=False, dummy=False)
Item for time without separators.
e.g., 010203 for 01:02:03
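The concatenated forms handled by DateConcat and TimeConcat map directly onto `strptime` format codes. The snippet is a standard-library illustration only: with `%y`, Python resolves two-digit years 69 to 99 as 1969 to 1999 and 00 to 68 as 2000 to 2068, and whether log2seq restores the century the same way is not stated above.

```python
import datetime

# Date with a four-digit year, as in DateConcat's default form.
full_date = datetime.datetime.strptime("20190225", "%Y%m%d").date()
# Date without the century (the no_century case).
short_date = datetime.datetime.strptime("190225", "%y%m%d").date()
# Time without separators, as in TimeConcat.
clock = datetime.datetime.strptime("010203", "%H%M%S").time()
```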
Variable items
class log2seq.header.NamedItem(name, **kwargs)
A base class of namable items. Namable items require an argument for the name. The name is used as both match name and value name. The name must not duplicate the match names of other items (including unnamable items) in one HeaderParser rule.
Parameters: name (string) – name of the Item instance, used as match name and value name.
class log2seq.header.Hostname(name, **kwargs)
NamedItem for a hostname (or IP address) string.
Check Hostname.pattern to see the accepted names. If your hostname does not match the pattern, consider using UserItem. (This is because hostnames can include various values depending on the devices or OSes.)
class log2seq.header.UserItem(name, pattern, strip=None, **kwargs)
Customizable NamedItem.
The pattern is described in Python Regular Expression Syntax (re). Some special characters cannot be used in this Item because HeaderParser generates a single re.Pattern by automatically combining the given set of items:
- Optional parts, such as ?
- ^ and $
Parameters:
- name – same as NamedItem.
- pattern – regular expression pattern of this Item instance.
- strip (str, optional) – specified characters will be stripped with str.strip() in the parsed object.
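The effect of strip, and the reason anchors do not work inside an item pattern, can be demonstrated with plain `re`. This is a sketch under assumptions: `named_group` is a hypothetical helper that mimics wrapping each item pattern in a named group, which may differ from what log2seq does internally.

```python
import re

def named_group(name, pattern):
    """Wrap an item pattern in a named group (hypothetical helper)."""
    return "(?P<{0}>{1})".format(name, pattern)

# Hypothetical custom item: a process ID written as "[1234]".
regex = re.compile(
    "^" + named_group("host", r"\S+") + r"\s+"
    + named_group("pid", r"\[\d+\]") + r"\s+"
    + named_group("statement", r".*") + "$"
)
mo = regex.match("host1 [1234] service restarted")
raw = mo.group("pid")
# str.strip() removes the listed characters from both ends.
stripped = raw.strip("[]")

# An anchor inside an item pattern breaks the combined expression,
# because "^" can only match at the start of the whole line.
anchored = re.compile(
    "^" + named_group("host", r"\S+") + r"\s+"
    + named_group("pid", r"^\[\d+\]") + "$"
)
no_match = anchored.match("host1 [1234]")
```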