anonymizing IPv6 with embedded IPv4

Apart from the normal IPv6 notation, an IPv6 address can also be written with an embedded IPv4 address in place of the last two blocks. This is described in RFC 4291, section 2.2 (the third text form); one example given there is ::FFFF:129.144.52.38.
The format is used by systems that work with both IPv6 and IPv4 addresses. However, the RFC does not make clear whether this format may also be used to display ordinary IPv6 addresses.

It is important that anonymizers recognize this format as well, since privacy regulations require these addresses to be anonymized too. A solution that handles this format therefore needs to be found.
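
To make the format concrete, here is a minimal Python sketch of how an anonymizer could recognize an embedded IPv4 tail and zero its lowest bits. It is only an illustration, not the approach used in mmanon or SLFA; the function name anonymize_embedded_ipv4 and its bits parameter are hypothetical.

```python
import ipaddress


def anonymize_embedded_ipv4(addr: str, bits: int = 16) -> str:
    """Zero the lowest `bits` bits of the IPv4 tail of an IPv6 address.

    Hypothetical helper for illustration only. The IPv6 part in front of
    the IPv4 tail is returned exactly as it was written.
    """
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")

    # The embedded IPv4 address is everything after the last colon.
    head, sep, tail = addr.rpartition(":")
    if not sep or "." not in tail:
        raise ValueError("address has no embedded IPv4 part")

    # Validate the whole address first; raises ValueError if malformed.
    ipaddress.IPv6Address(addr)

    value = int(ipaddress.IPv4Address(tail))
    # Build a 32-bit mask with the lowest `bits` bits cleared.
    mask = ~((1 << bits) - 1) & 0xFFFFFFFF
    return f"{head}:{ipaddress.IPv4Address(value & mask)}"


print(anonymize_embedded_ipv4("::ffff:129.144.52.38", 16))
```

Keeping the front part of the address as written sidesteps the question of which output notation to choose, at the price of only anonymizing the last 32 bits.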


SLFA release

As I mentioned in my recent post about the mmanon rewrite, I have been working on an offline anonymization tool. It is called SLFA (short for simple log file anonymizer) and I have now finished the first usable version.

SLFA is a Java application that lets you anonymize files from the command line. Currently, it can anonymize IPv4 and IPv6 addresses (similar to the mmanon module of rsyslog) and can be configured to work with regular expressions.

It will be released shortly, and I would be glad to receive feedback once it is out.


Mmanon rewrite finished for the time being

I have recently finished the rewrite of the mmanon module and have now also finished implementing support for IPv6. IPv6 offers the same parameters as IPv4, except that the mode parameter is called anonmode. This leaves room for a separate parameter that controls the IPv6 output format, should one be added later. Such a parameter could be named outputmode; one thing it could configure is whether abbreviations (::) are used.

Also, IPv6 does not support anonymization in simple mode because of the different formats possible. It is now also possible to anonymize only IPv4 or only IPv6 addresses by using the new "enable" parameters. These were added because mmanon no longer works with IPv4 alone.
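
As a rough illustration of how these parameters fit together, a configuration might look like the fragment below. The exact names and values (ipv4.enable, ipv6.enable, ipv6.anonmode, ipv6.bits) are taken from the published mmanon documentation and are an assumption with respect to the state described in this post, so please check the documentation of your rsyslog version before using them.

```
module(load="mmanon")

# Anonymize IPv6 only: zero the lowest 96 bits, leave IPv4 untouched.
action(type="mmanon"
       ipv4.enable="off"
       ipv6.enable="on"
       ipv6.anonmode="zero"
       ipv6.bits="96")
```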

Right now, I consider the rewrite and the IPv6 implementation finished until there is more feedback from users. It would be great if people could test the changes once they are merged and post their feedback.

I will now work on an offline tool for log anonymization.


IPv6 anonymization portability problems

IPv6 addresses are 128 bits long, which makes far more addresses available than with IPv4.

However, this size also causes problems when working with IPv6, as I found while working on an IPv6 anonymization function for rsyslog.

Some systems support an unsigned 128-bit integer when compiling with GCC or clang. However, many systems do not support this data type.

Since rsyslog tries to cater to as many systems as possible, the implementation has to work on all platforms. As such, we have to use two unsigned 64-bit integers instead of a single 128-bit one.

The main problem is that the 128-bit handling is then implemented at the software level, as opposed to the hardware level on which a native 128-bit integer is implemented. This creates a conflict between portability and speed, since a software-level implementation is slower than a hardware-level one.

I have decided to make the first implementation as portable as possible, but I might try to speed up the anonymization later. One way to do this would be to check whether the system supports a 128-bit integer and, if it does, to use it instead of two 64-bit integers.
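
The two-integer approach can be sketched as follows. This is only an illustration of the logic, written in Python for brevity (rsyslog itself is written in C), and the names MASK64 and zero_low_bits are hypothetical. Python integers have arbitrary size, so the masks below stand in for the fixed width of a 64-bit unsigned integer.

```python
import ipaddress

MASK64 = (1 << 64) - 1  # emulates the width of an unsigned 64-bit integer


def zero_low_bits(hi, lo, bits):
    """Zero the lowest `bits` bits of a 128-bit value stored as two halves.

    `hi` holds the upper 64 bits and `lo` the lower 64 bits.
    Hypothetical helper for illustration only.
    """
    if not 0 <= bits <= 128:
        raise ValueError("bits must be between 0 and 128")
    if bits >= 64:
        # The lower half is cleared completely; the rest comes off `hi`.
        # Note: in C, shifting a 64-bit value by 64 is undefined, so a
        # C version would have to special-case bits == 128.
        lo = 0
        hi &= (MASK64 << (bits - 64)) & MASK64
    else:
        lo &= (MASK64 << bits) & MASK64
    return hi, lo


value = int(ipaddress.IPv6Address("2001:db8:85a3::8a2e:370:7334"))
hi, lo = value >> 64, value & MASK64
new_hi, new_lo = zero_low_bits(hi, lo, 80)
print(ipaddress.IPv6Address((new_hi << 64) | new_lo))
```

The branch on 64 is the extra work that a single 128-bit integer would avoid, which is where the speed difference mentioned above comes from.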


rewriting mmanon: an update

I have completed the rewrite of the IPv4 function of the mmanon module.
I have managed to keep the original parameters available, although the "rewrite" option is now called "zero". This will not break older configurations that are already running, since the old parameter names still work. However, attaching the "ipv4." prefix to the IPv4 anonymization settings will improve clarity once the IPv6 feature follows.

But what's new? I have added the option to randomize bits in an IP address as a form of anonymization. It is also possible to anonymize IP addresses randomly while still mapping each IP address to the same (randomly generated) alias every time.
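
The idea of a random but consistent alias can be sketched like this. It is a minimal Python illustration under my own assumptions, not the mmanon code; the class name ConsistentRandomizer and its parameters are hypothetical, and the random module is used only for simplicity (it is not a cryptographically secure source).

```python
import ipaddress
import random


class ConsistentRandomizer:
    """Randomize the lowest bits of IPv4 addresses, one alias per address.

    Hypothetical helper for illustration only.
    """

    def __init__(self, bits=16, seed=None):
        if not 0 <= bits <= 32:
            raise ValueError("bits must be between 0 and 32")
        self.bits = bits
        self._rng = random.Random(seed)
        self._aliases = {}  # original address (int) -> alias (int)

    def anonymize(self, addr):
        value = int(ipaddress.IPv4Address(addr))
        if value not in self._aliases:
            # Keep the upper bits, replace the lower bits with random ones.
            kept = (value >> self.bits) << self.bits
            low = self._rng.getrandbits(self.bits) if self.bits else 0
            self._aliases[value] = kept | low
        return str(ipaddress.IPv4Address(self._aliases[value]))


anonymizer = ConsistentRandomizer(bits=16)
first = anonymizer.anonymize("192.168.1.77")
second = anonymizer.anonymize("192.168.1.77")
print(first, second)  # the same alias both times
```

The lookup table is what makes the result consistent: without it, every occurrence of the same address would receive a fresh random alias.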

Next, I plan to implement IPv6 functionality for the mmanon module, with parameters similar to the IPv4 ones. After that I might add other functions, such as the ability to configure other separators for IP addresses like '-', or an option to reverse the direction of anonymization.


Rewriting mmanon

Currently, rsyslog's mmanon module has the task of anonymizing IP addresses. However, since it can only anonymize IPv4 addresses, I decided to overhaul the module so that it can also work with IPv6 addresses (see this feature request). While doing this, I also noticed some bugs in the IPv4 part of the module.

Now there are two options: try to fix these bugs or rewrite the function. While it might seem like more work, I have decided to rewrite the function, since I was already planning to add new configuration options such as the option to randomize IP addresses.

I have already worked on a similar tool for liblognorm which has never been released due to a lack of time on my part. An UNFINISHED VERSION of this work is available in my private repository.

Since that tool already has some of the options I plan to implement, I will reuse it as part of the new mmanon IPv4 anonymization.

But what about the current options? I will try to implement as many as possible in the initial rewrite; however, some may have to be reintroduced later, since I also plan to start on the IPv6 anonymization as soon as possible. That said, I think that not too many people were using the module and, consequently, its options. At least, searching the GitHub issue tracker for mmanon did not bring up many issues, and most of them were not even really related to mmanon. There are also only a few questions regarding mmanon on the mailing list. I think this supports my point, since some of the bugs affect major parts of the function.

If you have used the module or are currently using it, I would be glad if you could tell me which configurations you use and what is important to you when using the module. I have created a GitHub issue for this purpose.


Improving rsyslog debug output