Statement rules

StatementParser

class log2seq.statement.StatementParser(actions)

Parser for statement parts in log messages.

Statement parts in log messages describe the event in free-format text. This parser segments the text into words and their separators. The words are returned as the ‘words’ item, and the separators as the ‘symbols’ item.

The behavior of this parser is defined by a list of actions. The actions are applied to the statement sequentially and separate it into a sequence of words (and separator symbols).

Parameters:	actions (list of any action) – Segmentation rules. The rules are sequentially applied to the input statement.

process_line(statement: str, verbose: bool = False)

Parse statement part of a log message (i.e., a line).

Parameters:

statement (string) – The statement part of a log message.
verbose (bool, optional) – Show the intermediate progress of applying actions.

Returns:	A pair of two components, words and symbols. The first component, words, is a list of words. The second component, symbols, is a list of separator strings. The length of symbols is always len(words)+1, because it includes one entry before the first word and one after the last word. Some symbols can be empty strings.
Return type:	tuple
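
The length rule for symbols means the two lists can be interleaved again. The following is a minimal sketch of that relationship, assuming (as the examples in this section suggest) that no characters of the statement are dropped. The helper name reassemble is hypothetical and not part of log2seq.

```python
def reassemble(words, symbols):
    """Interleave symbols and words: s0 w0 s1 w1 ... w(n-1) s(n)."""
    if len(symbols) != len(words) + 1:
        raise ValueError("symbols must have exactly len(words) + 1 entries")
    pieces = [symbols[0]]
    for word, symbol in zip(words, symbols[1:]):
        pieces.append(word)
        pieces.append(symbol)
    return "".join(pieces)


# Values taken from the Split example in this section.
words = ["This", "is", "a", "statement"]
symbols = ["", " ", " ", " ", "."]
print(reassemble(words, symbols))  # This is a statement.
```

A check like this can be handy when writing new rule sets, since a mismatch between the rebuilt string and the input points at a rule that lost or duplicated characters.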

Standard actions

class log2seq.statement.Split(separators)

Split statement (or its parts) by given separators.

For example, using white space and dot as separators translates

['This is a statement.'] -> ['This', 'is', 'a', 'statement']

The removed separators (white space and dot in this case) will not be considered in further actions.

Example

>>> parser = StatementParser([Split(" .")])
>>> parser.process_line("This is a statement.")
(['This', 'is', 'a', 'statement'], ['', ' ', ' ', ' ', '.'])

Parameters:	separators (str) – Separator symbol characters. If an iterable is given, its elements are joined and all of them are used for segmentation. Escape sequences are added internally, so you don’t need to add them manually.
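
The splitting behavior can be approximated with the standard re module. This is only a sketch of the idea, not log2seq's implementation: split_statement is a hypothetical helper, and it assumes that runs of adjacent separator characters are merged into one symbol.

```python
import re


def split_statement(statement, separators):
    """Split on any of the given separator characters.

    Returns (words, symbols) with len(symbols) == len(words) + 1.
    """
    if not separators:
        raise ValueError("at least one separator character is required")
    # re.escape plays the role of the internally added escape sequences.
    pattern = "([" + re.escape(separators) + "]+)"
    pieces = re.split(pattern, statement)
    # With a capturing group, re.split alternates text, separator, text, ...
    texts, seps = pieces[0::2], pieces[1::2]
    words, symbols = [], [""]
    for i, text in enumerate(texts):
        if text:
            words.append(text)
            symbols.append("")
        if i < len(seps):
            symbols[-1] += seps[i]
    return words, symbols


print(split_statement("This is a statement.", " ."))
# (['This', 'is', 'a', 'statement'], ['', ' ', ' ', ' ', '.'])
```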

class log2seq.statement.Fix(patterns)

Add Fixed flag to matched parts.

Fixed parts will not be segmented by following actions. Fixed parts are selected by matching the given regular expression patterns (see re).

Example

>>> p = StatementParser([Split(" "), Fix(r".+\.txt"), Split(".")])
>>> p.process_line("parsing sample.txt done.")
(['parsing', 'sample.txt', 'done'], ['', ' ', ' ', '.'])

Parameters:	patterns (str or list of str) – Regular expression patterns. If multiple patterns are given, they are matched with every word in order.
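
To see why the order of actions matters, it can help to model a statement as a list of flagged parts. The sketch below is a toy model under stated assumptions, not the library's internals: the flag names "free", "fixed" and "sep", the helper functions, and the use of whole-part matching (re.fullmatch) are all illustrative choices.

```python
import re


def split_parts(parts, separators):
    """Split every "free" part; leave "fixed" and "sep" parts untouched."""
    pattern = "([" + re.escape(separators) + "]+)"
    out = []
    for text, flag in parts:
        if flag != "free":
            out.append((text, flag))
            continue
        for i, piece in enumerate(re.split(pattern, text)):
            if piece:
                # Odd positions are the captured separators.
                out.append((piece, "sep" if i % 2 else "free"))
    return out


def fix_parts(parts, pattern):
    """Flag "free" parts that match the pattern as "fixed"."""
    return [
        (text, "fixed") if flag == "free" and re.fullmatch(pattern, text) else (text, flag)
        for text, flag in parts
    ]


def to_words_symbols(parts):
    """Collect non-separator parts as words and merge separators into symbols."""
    words, symbols = [], [""]
    for text, flag in parts:
        if flag == "sep":
            symbols[-1] += text
        else:
            words.append(text)
            symbols.append("")
    return words, symbols


parts = [("parsing sample.txt done.", "free")]
parts = split_parts(parts, " ")
parts = fix_parts(parts, r".+\.txt")
parts = split_parts(parts, ".")
result = to_words_symbols(parts)
print(result)
# (['parsing', 'sample.txt', 'done'], ['', ' ', ' ', '.'])
```

Dropping the fix_parts step in this model would let the final split cut sample.txt into sample and txt, which is the situation Fix is meant to prevent.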

Extended actions

class log2seq.statement.FixIP(address=True, network=True)

Add Fixed flag to the parts of IP addresses.

This class uses the ipaddress library instead of regular expressions.

Parameters:

address – match IP addresses, defaults to True
network – match IP networks, defaults to True
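
The kind of check that the ipaddress module makes possible can be sketched as follows. The function looks_like_ip is hypothetical, and two details are assumptions of this sketch rather than documented FixIP behavior: networks are only recognized when the text contains a slash, and host bits are tolerated (strict=False).

```python
import ipaddress


def looks_like_ip(text, address=True, network=True):
    """Return True if text parses as an IP address or an IP network."""
    if address:
        try:
            ipaddress.ip_address(text)
            return True
        except ValueError:
            pass
    if network and "/" in text:
        try:
            ipaddress.ip_network(text, strict=False)
            return True
        except ValueError:
            pass
    return False


print(looks_like_ip("192.0.2.1"))                    # True
print(looks_like_ip("2001:db8::1"))                  # True
print(looks_like_ip("192.0.2.0/24"))                 # True
print(looks_like_ip("192.0.2.0/24", network=False))  # False
print(looks_like_ip("h1-i2.example.org"))            # False
```

Parsing with ipaddress handles IPv4 and IPv6 uniformly and rejects out-of-range octets, which is hard to express in a single regular expression.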

class log2seq.statement.FixPartial(patterns, fix_groups, recursive=False, remove_groups=None, rest_remove=False)

Extended Fix action to accept complicated patterns.

The usual Fix considers the matched part as a single word and fixes it. In contrast, FixPartial allows the matched part to include multiple fixed words or separators.

Usecase 1: e.g., source 192.0.2.1.80 initialized.

If you intend to treat 192.0.2.1.80 as a combination of two different words, the IPv4 address 192.0.2.1 and the port number 80, it cannot be segmented with simple Fix and Split actions. The following example with FixPartial fixes these two variables.

Example

>>> pattern = r'^(?P<ipaddr>(\d{1,3}\.){3}\d{1,3})\.(?P<port>\d{1,5})$'
>>> parser = StatementParser([Split(" "), FixPartial(pattern, fix_groups=["ipaddr", "port"]), Split(".")])
>>> parser.process_line("source 192.0.2.1.80 initialized.")
(['source', '192.0.2.1', '80', 'initialized'], ['', ' ', '.', ' ', '.'])
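
The named groups in this pattern can be checked on their own with the standard re module before handing the pattern to FixPartial. The snippet below uses only re and the pattern from the example above.

```python
import re

pattern = r'^(?P<ipaddr>(\d{1,3}\.){3}\d{1,3})\.(?P<port>\d{1,5})$'

match = re.match(pattern, "192.0.2.1.80")
print(match.group("ipaddr"))  # 192.0.2.1
print(match.group("port"))    # 80

# An address without a trailing port does not match, so such a part
# would be left for the following actions.
print(re.match(pattern, "192.0.2.1"))  # None
```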

Usecase 2: e.g., comment added: "This is a comment description".

If you intend to treat the comment (the string between the double quotes) as a single word without segmentation, this cannot be achieved with simple Fix and Split actions. The following example with FixPartial fixes the comment part.

Example

>>> pattern = r'^.*?(?P<left>")(?P<fix>.+?)(?P<right>").*$'
>>> parser = StatementParser([FixPartial(pattern, fix_groups=["fix"], \
... remove_groups=["left", "right"], rest_remove=False), Split(' :.')])
>>> parser.process_line('comment added: "This is a comment description".')
(['comment', 'added', 'This is a comment description'], ['', ' ', ': "', '".'])

Parameters:

patterns (str) – Regular expression patterns. If multiple patterns are given, the first matching pattern is used to fix the part (the other patterns are ignored).
fix_groups (str or list of str) – Named groups in the patterns to fix, e.g., ["ipaddr", "port"] for Usecase 1. Unspecified groups are not fixed, so you can use other group names for other re features such as backreferences.
recursive (bool, optional) – If True, the patterns will be searched recursively.
remove_groups (str or list of str, optional) – Named groups in the patterns that should be treated as separators.
rest_remove (bool, optional) – Determines how to handle strings outside the fixed groups, e.g., 'comment added: "' and '".' in Usecase 2. Defaults to False, which means they are left as parts for further actions. If True, they are treated as separators and will not be segmented or fixed further.
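
How fix_groups, remove_groups and rest_remove divide a matched part can be sketched with group spans. This is an illustration under assumptions, not the library's code: carve and the flag names "free", "fixed" and "sep" are hypothetical, and overlapping or nested groups are not handled.

```python
import re


def carve(text, pattern, fix_groups, remove_groups=(), rest_remove=False):
    """Cut text into (segment, flag) pairs according to named groups."""
    m = re.match(pattern, text)
    if m is None:
        return [(text, "free")]
    spans = [(m.span(g), "fixed") for g in fix_groups]
    spans += [(m.span(g), "sep") for g in remove_groups]
    # Skip groups that did not take part in the match, then order by position.
    spans = sorted(s for s in spans if s[0] != (-1, -1))
    rest_flag = "sep" if rest_remove else "free"
    out, pos = [], 0
    for (start, end), flag in spans:
        if start > pos:
            out.append((text[pos:start], rest_flag))
        out.append((text[start:end], flag))
        pos = end
    if pos < len(text):
        out.append((text[pos:], rest_flag))
    return out


pattern = r'^.*?(?P<left>")(?P<fix>.+?)(?P<right>").*$'
text = 'comment added: "This is a comment description".'
for segment in carve(text, pattern, ["fix"], ["left", "right"]):
    print(segment)
# ('comment added: ', 'free')
# ('"', 'sep')
# ('This is a comment description', 'fixed')
# ('"', 'sep')
# ('.', 'free')
```

In this model the "free" segments are what a later Split would still work on, while rest_remove=True would flag them as "sep" instead.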

class log2seq.statement.FixParenthesis(patterns, recursive=False)

An extension of FixPartial for easily fixing strings between parentheses.

The basic usage is similar to FixPartial, but this class is designed specifically for parentheses, and the format of patterns is simpler. For example, FixParenthesis with pattern ['"', '"'] works the same as FixPartial with pattern r'^.*?"(?P<fix>.+?)".*$'.

Each pattern is a 2-length list of the left and right parentheses. The left and right patterns can each consist of multiple characters, such as [""].

Example

>>> parser = StatementParser([FixParenthesis(['"', '"']), Split(' .:"')])
>>> parser.process_line('comment added: "This is a comment description".')
(['comment', 'added', 'This is a comment description'], ['', ' ', ': "', '".'])

Note: If a statement has multiple pairs of parentheses, you need to add multiple FixParenthesis actions to the StatementParser actions. This is because FixParenthesis accepts only one fix_group to extract per action.
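
The stated equivalence with FixPartial suggests how a left/right pair maps to a regular expression. The sketch below builds such a pattern with re.escape. build_pattern is a hypothetical helper, and the "[[" / "]]" delimiters are only an illustration of multi-character delimiters.

```python
import re


def build_pattern(left, right):
    """Build a FixPartial-style pattern with a single 'fix' group."""
    return "^.*?" + re.escape(left) + "(?P<fix>.+?)" + re.escape(right) + ".*$"


quoted = re.match(build_pattern('"', '"'),
                  'comment added: "This is a comment description".')
print(quoted.group("fix"))  # This is a comment description

# Delimiters made of several characters work the same way.
linked = re.match(build_pattern("[[", "]]"), "see [[link target]] now")
print(linked.group("fix"))  # link target
```

Because the pattern contains only one fix group, only the first delimited span is captured, which is consistent with the note about needing one action per pair.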

class log2seq.statement.Remove(patterns)

Add Separator flag to matched parts.

Separator parts will be ignored by following actions. Separator parts are selected by matching the given regular expression patterns (see re).

Parameters:	patterns (str or list of str) – Regular expression patterns. If multiple patterns are given, they are matched with every word in order.
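
The effect of flagging a whole part as a separator can be pictured on an already segmented result. The following sketch is illustrative only: remove_words is a hypothetical helper, the input values are made up, and whole-word matching with re.fullmatch is an assumption.

```python
import re


def remove_words(words, symbols, pattern):
    """Move words matching pattern into the neighbouring separator symbol."""
    new_words, new_symbols = [], [symbols[0]]
    for word, symbol in zip(words, symbols[1:]):
        if re.fullmatch(pattern, word):
            # The removed word and its trailing symbol join the previous symbol.
            new_symbols[-1] += word + symbol
        else:
            new_words.append(word)
            new_symbols.append(symbol)
    return new_words, new_symbols


words = ["error", "--", "disk", "full"]
symbols = ["", " ", " ", " ", ""]
print(remove_words(words, symbols, r"-+"))
# (['error', 'disk', 'full'], ['', ' -- ', ' ', ''])
```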

class log2seq.statement.RemovePartial(patterns, remove_groups, recursive=False)

Extended Remove action to accept complicated patterns.

The usual Remove considers the whole matched part as a separator. In contrast, RemovePartial allows separators to be removed partially from a word that matches the given patterns.

Example

>>> rpattern = r'^.*([^:](?P<colon>:))$'
>>> fpattern = r'^\d{2}:\d{2}:\d{2}\.\d{3}$'
>>> rules = [Split(" "), RemovePartial(rpattern, remove_groups=["colon"]), Fix(fpattern), Split(":")]
>>> parser = StatementParser(rules)
>>> parser.process_line("2000 Mar 4 12:34:56.789: message: duplicated header")
(['2000', 'Mar', '4', '12:34:56.789', 'message', 'duplicated', 'header'], ['', ' ', ' ', ' ', ': ', ': ', ' ', ''])
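
What rpattern selects can be verified with the re module alone. The sketch below cuts the named group out of a part and returns it separately. strip_trailing_colon is a hypothetical helper and only mimics the idea of partial removal.

```python
import re

rpattern = r'^.*([^:](?P<colon>:))$'


def strip_trailing_colon(part):
    """Return (remaining text, removed separator) for one part."""
    m = re.match(rpattern, part)
    if m is None:
        return part, ""
    start, end = m.span("colon")
    return part[:start] + part[end:], part[start:end]


print(strip_trailing_colon("12:34:56.789:"))  # ('12:34:56.789', ':')
print(strip_trailing_colon("message:"))       # ('message', ':')
print(strip_trailing_colon("header"))         # ('header', '')
```

Only the final colon is taken, so the colons inside the timestamp survive until the Fix action protects them from the later Split(":").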

class log2seq.statement.ConditionalSplit(patterns, separators)

Split parts matching the given patterns by given separators.

Example

>>> parser = StatementParser([
...     Split(" ()"),
...     RemovePartial(r'^.*[^:](?P<colon>:)$', remove_groups=["colon"]),
...     ConditionalSplit(r'^%[A-Z]+-\d+(-[A-Z]+-\d+)?$', r'%-')
... ])
>>> parser.process_line("%KERNEL-4-EVENT-7: host h1-i2.example.org scored -0.035 value (20.0%)")
['KERNEL', '4', 'EVENT', '7', 'host', 'h1-i2.example.org', 'scored', '-0.035', 'value', '20.0%']

Parameters:

patterns (str) – Regular expression patterns. If multiple patterns are given, this action splits parts that match at least one of them.
separators (str) – Separator symbol characters. If an iterable is given, its elements are joined and all of them are used for segmentation. Escape sequences are added internally, so you don’t need to add them manually.
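
The conditional behavior can be sketched on a plain list of words: only words matching the pattern are split, so hyphens and percent signs elsewhere are preserved. conditional_split is a hypothetical helper written with the re module, not the library's implementation, and it ignores the separator symbols for brevity.

```python
import re


def conditional_split(words, pattern, separators):
    """Split only the words that match pattern; keep the others as they are."""
    sep_re = "[" + re.escape(separators) + "]+"
    out = []
    for word in words:
        if re.match(pattern, word):
            out.extend(piece for piece in re.split(sep_re, word) if piece)
        else:
            out.append(word)
    return out


words = ["%KERNEL-4-EVENT-7", "host", "h1-i2.example.org",
         "scored", "-0.035", "value", "20.0%"]
print(conditional_split(words, r'^%[A-Z]+-\d+(-[A-Z]+-\d+)?$', '%-'))
# ['KERNEL', '4', 'EVENT', '7', 'host', 'h1-i2.example.org',
#  'scored', '-0.035', 'value', '20.0%']
```

An unconditional Split("%-") would also break h1-i2.example.org, -0.035 and 20.0%, which is the case this action is designed to avoid.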