Create API for external parsers

Issue #121 resolved

Former user created an issue 2019-11-28

Would you consider a generic API for the parser? This would allow others to contribute to the project without having to learn yacc/lex. Some of the patterns can be trivially parsed with Lua or awk, but without an API there is no way to tell the engine that an attack has occurred.

Comments (10)

Kevin Zheng
I would welcome ideas on what such an API would look like.

Currently, sshg-parser reads log messages from standard input and writes attacks to standard output in a special format – the intention here was that it shouldn’t be too hard to drop in a custom parser. Would that be more or less useful than an API, per se?
- 2019-11-28T08:36:17+00:00
Tommy Lee Jones
Hi Kevin,

Yes, this is along the lines of what I was saying. However, after much thought, I believe the important integration API really belongs in the blocker. The main goal of sshguard is to block a miscreant by either IP address or domain, so those are the key pieces of information sshguard needs.

I was thinking of an open API for the blocker that takes the most fundamental information: time, IP, and whatever else you tell us sshguard needs to block the miscreant. One could then parse logs outside of sshguard and write an API-compliant client for the sshguard blocker.

Fundamentally, it might be easier to expose and document the current special format you mentioned above, so that you could stop maintaining the parser. Does this make sense?
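As a rough sketch of what a documented format gives an external client: the line format Kevin spells out later in this thread (<service code> <address> <'4' for IPv4, '6' for IPv6> <score>) is simple enough to validate in a few lines of C. The struct attack_line and parse_attack_line names below are hypothetical, and the code is not taken from sshguard's source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record mirroring one line of the attack format. */
struct attack_line {
    int service;
    char address[64];
    int kind;  /* 4 = IPv4, 6 = IPv6 */
    int score;
};

/* Parse "<service> <address> <4|6> <score>" (a trailing newline is allowed).
 * Returns 1 on success, 0 if the line does not fit the format. */
int parse_attack_line(const char *line, struct attack_line *out) {
    char extra;
    int n = sscanf(line, "%d %63s %d %d %c", &out->service, out->address,
                   &out->kind, &out->score, &extra);
    if (n != 4) {
        return 0; /* too few fields, or trailing garbage after the score */
    }
    return out->kind == 4 || out->kind == 6;
}
```

A contributed parser could run its own output through a check like this in its test suite before piping it to sshguard.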

The next step would be to add a community-contributed parsers section to the project, where you create a library of these plugin parsers and accept contributions from your users once the blocker API is available. The Bison/Byacc/Yacc requirement would then go away.

There are many advantages to this approach:
- You don’t need to continue maintaining the parser (it can be frozen as is)
- A bunch of currently open parser issues become non-issues
- Users can create parsers in any language they are comfortable with: Rust, Lua, Awk? All are welcome
- Community-created parsers become possible, and users will gravitate to the best of the best.
Thank you for a great project.

- 2019-11-29T17:51:57+00:00
Kevin Zheng
I believe what you’re asking for is already here; I just need to spend some time documenting it.

To put in a custom parser, you need to edit the sshguard script (an improvement would be to make this configurable in sshguard.conf) to call something other than sshg-parser.

Then, you’d write a replacement for sshg-parser that reads logs from standard input and writes attacks to standard output:
```c
/* Print one detected attack per line on standard output. */
static void print_attack(const attack_t *attack) {
    printf("%d %s %d %d\n", attack->service, attack->address.value,
           attack->address.kind, attack->dangerousness);
}
```
In other words, every detected attack is on a separate line, in the format:

<service code> <address> <'4' for IPv4, '6' for IPv6> <score>

For example:

100 8.8.8.8 4 10
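A minimal sketch of the producer side, assuming only the line format above: a replacement parser written in C could build each line with a helper like the hypothetical format_attack_line below. It rejects address families other than 4 and 6, and addresses containing whitespace, since either would break the space-separated format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one attack as
 * "<service code> <address> <4|6> <score>\n" into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the input is invalid or buf is too small. */
int format_attack_line(char *buf, size_t len, int service,
                       const char *address, int kind, int score) {
    if (kind != 4 && kind != 6) {
        return -1; /* the format only defines 4 (IPv4) and 6 (IPv6) */
    }
    if (strpbrk(address, " \t\r\n") != NULL) {
        return -1; /* whitespace would split the address into two fields */
    }
    int n = snprintf(buf, len, "%d %s %d %d\n", service, address, kind, score);
    if (n < 0 || (size_t)n >= len) {
        return -1; /* output would have been truncated */
    }
    return n;
}
```

When the output goes to a pipe, stdout is normally fully buffered, so a long-running parser should call fflush(stdout) after writing each line; otherwise sshguard may not see an attack until the buffer fills.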
- 2019-12-03T17:10:49+00:00
Kevin Zheng
- removed component
Unmark ‘parser’ component, because I use that for new/updated attack signatures.
- 2019-12-03T19:47:29+00:00
Tommy Lee Naija
This is excellent.

Using this information, it was trivial to combine logsurfer (https://www.crypt.gen.nz/logsurfer/) with sshguard, with no need to learn lex, yacc, or bison.

This is what I did:
- logsurfer scans the logfile and emits 100 1.2.3.4 4 <score>
  - I am able to change score depending on the nature of the attack
- Pipe the output from logsurfer to sshguard using daemontools (I am using sshguard as the ‘logger’ in daemontools, so attack information is never lost :)
- Supervise the pipeline with daemontools (or runit, pick your poison)
- Buy Kevin a beer.
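logsurfer does the matching and scoring in its own configuration language. For readers who would rather write the same kind of filter directly, here is a minimal C sketch of the idea; it is not a port of logsurfer. The rule table, the scores, the " from <address>" extraction (modelled on sshd-style messages) and the ':' test for IPv6 are all illustrative assumptions, and filter_log, score_line and extract_from_address are hypothetical names. It reuses service code 100 from the example earlier in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule table: substring to look for and the score to report. */
struct rule {
    const char *needle;
    int score;
};

static const struct rule RULES[] = {
    {"Invalid user", 10},
    {"Failed password", 5},
    {"config.php", 1000}, /* probe for a sensitive file: ban at once */
};

/* Return the score of the first matching rule, or 0 if none matches. */
int score_line(const char *line) {
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (strstr(line, RULES[i].needle) != NULL) {
            return RULES[i].score;
        }
    }
    return 0;
}

/* Copy the whitespace-delimited token after " from " into addr.
 * Returns 1 on success, 0 if the marker is absent or the token is too long. */
int extract_from_address(const char *line, char *addr, size_t len) {
    const char *p = strstr(line, " from ");
    if (p == NULL) {
        return 0;
    }
    p += strlen(" from ");
    size_t n = strcspn(p, " \t\r\n");
    if (n == 0 || n >= len) {
        return 0;
    }
    memcpy(addr, p, n);
    addr[n] = '\0';
    return 1;
}

/* Read log lines from in and write one attack line per match to out.
 * Lines longer than the buffer are processed in pieces. */
void filter_log(FILE *in, FILE *out) {
    char line[4096];
    char addr[64];
    while (fgets(line, sizeof line, in) != NULL) {
        int score = score_line(line);
        if (score > 0 && extract_from_address(line, addr, sizeof addr)) {
            int kind = strchr(addr, ':') != NULL ? 6 : 4; /* crude family guess */
            fprintf(out, "100 %s %d %d\n", addr, kind, score);
            fflush(out); /* don't let pipe buffering delay the report */
        }
    }
}
```

Wrapped in int main(void) { filter_log(stdin, stdout); return 0; }, it reads a log on standard input and could take logsurfer's place in the supervised pipeline described above.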
Thank you for a great project. It works because it applies the UNIX philosophy internally. Here is your virtual beer. 🍺🍺🍺

PS: I would recommend rethinking ‘service’. Could you use the service numbers from /etc/services (22 is SSH, 80 is HTTP, 443 is HTTPS)? 100 is unassigned. Call it service_id, and move the current service to service_name. Just a thought; I am happy where I am at.
- 2019-12-09T20:12:28+00:00
Kevin Zheng
Hi Tommy, I’m glad you’ve been able to write a drop-in replacement for sshg-parser. As you’ve noted, this is exactly why things were designed this way!

I’m not familiar with logsurfer, but if you have a patch or configuration for it that you’d like to contribute, I think others might benefit from seeing what you’ve done.

I agree that using the service names and/or port numbers from /etc/services is a good idea. The best time to make this change might be now, before more people make compatible parsers.
- 2019-12-09T20:27:02+00:00
Tommy Lee Naija
Hi Kevin.

No patch to logsurfer is necessary for this workflow; one only needs to understand how logsurfer matches log lines with regular expressions. Take the following HAProxy log line as an example: line 1 is the log line, and lines 2 and 3 are logsurfer instructions. The logsurfer project provides many examples to learn from.
```
#@ 10.11.12.13 fe_wan~ fe_wan/<NOSRV> ... LR-- 1/1/0/0/3 0/0 "GET //admin/config.php?password
'^.{25,} (.*) fe_wan.*fe_wan/<NOSRV>.*/config.php' - - - 0
  echo "100 $2 4 1000"
```
Yields:
```
100 10.11.12.13 4 1000
```
- Line 1 starts with #, so logsurfer treats it as a comment; it documents a sample line from the HAProxy log.
- Line 2: the expression first matches at least 25 arbitrary characters (.{25,}); the capture (.*) is the IP address, and the rest of the expression makes the line easier to recognize.
- Line 3: echo is a built-in command that emits the format you provided earlier in the thread:

  <service code> <address> <'4' for IPv4, '6' for IPv6> <score>, i.e.

  <100: service code> <10.11.12.13: capture #2, the IP address> <4: IPv4> <1000: score>
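For comparison, the same match-and-emit step can be sketched in C with POSIX regular expressions, assuming logsurfer's expression means the same thing as a POSIX extended regular expression. Here m[1] is the first parenthesized group; the function name haproxy_to_attack is hypothetical. Because of .{25,}, the expression only matches when at least 25 characters precede the space in front of the address, as on a full syslog-prefixed HAProxy line; the shortened #@ sample is documentation rather than input.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* The logsurfer match expression from the example above, as a POSIX ERE. */
static const char *PATTERN =
    "^.{25,} (.*) fe_wan.*fe_wan/<NOSRV>.*/config.php";

/* If line matches PATTERN, write "100 <capture> 4 1000\n" into out
 * (mirroring the echo action) and return 1.
 * Return 0 if the line does not match, -1 on error. */
int haproxy_to_attack(const char *line, char *out, size_t outlen) {
    regex_t re;
    regmatch_t m[2]; /* m[0] = whole match, m[1] = the (.*) capture */
    if (regcomp(&re, PATTERN, REG_EXTENDED) != 0) {
        return -1;
    }
    int rc = regexec(&re, line, 2, m, 0);
    regfree(&re);
    if (rc == REG_NOMATCH) {
        return 0;
    }
    if (rc != 0 || m[1].rm_so < 0) {
        return -1;
    }
    int n = snprintf(out, outlen, "100 %.*s 4 1000\n",
                     (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 1;
}
```

Compiling the expression on every call keeps the sketch short; a real filter would compile it once and reuse it for every line.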
With this, I can manage the score any way I want. Some matches score a measly one (1), others, like the one above, 1000 (immediate ban). I would like to use 80 and 443 as service codes for HTTP and HTTPS respectively. Now you understand why it would be nice to use standardized service codes :)

I normally start sshguard with a threshold of 100 (-a 100). Life is good.
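Why a single 1000-point match bans immediately while 1-point matches take a hundred hits becomes clear with a simplified model of a cumulative threshold. This is not sshguard's code: the tracker struct and report_attack function are hypothetical, the model ignores how long sshguard remembers past attacks, and it assumes an address is blocked once its total reaches the threshold.

```c
#include <string.h>

#define MAX_TRACKED 256

/* Simplified model: each address accumulates the scores reported for it.
 * Assumptions: no expiry window, a fixed-size table, ">=" comparison. */
struct tracker {
    char addr[MAX_TRACKED][64];
    int total[MAX_TRACKED];
    int used;
};

/* Add score for addr. Returns 1 if the address should now be blocked,
 * 0 if not yet, -1 if the table is full or the address is too long. */
int report_attack(struct tracker *t, const char *addr, int score, int threshold) {
    int i;
    for (i = 0; i < t->used; i++) {
        if (strcmp(t->addr[i], addr) == 0) {
            break;
        }
    }
    if (i == t->used) { /* first report for this address */
        if (t->used == MAX_TRACKED || strlen(addr) >= sizeof t->addr[0]) {
            return -1;
        }
        strcpy(t->addr[i], addr);
        t->total[i] = 0;
        t->used++;
    }
    t->total[i] += score;
    return t->total[i] >= threshold;
}
```

With a threshold of 100, one report scored 1000 returns 1 at once, while the same address needs a hundred reports scored 1.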

Thanks again.

- 2019-12-09T20:44:28+00:00
Tommy Lee Naija
Kevin, I would say this issue is resolved.

Thank you for your help.
- 2019-12-09T21:28:13+00:00
Kevin Zheng
- changed status to resolved
- 2020-05-14T00:17:13+00:00
Kevin Zheng
- removed version
Removing version: 2.2 (automated comment)
- 2020-05-14T22:18:45+00:00

Assignee: –

Type: proposal

Priority: minor

Status: resolved

Component: –

Votes: 1

Watchers: 2